Abstract
We study a very small three player poker game (one-third street Kuhn poker), and a simplified version of the game that is interesting because it has three distinct equilibrium solutions. For one-third street Kuhn poker, we are able to find all of the equilibrium solutions analytically. For large enough pot size, P, there is a degree of freedom in the solution that allows one player to transfer profit between the other two players without changing their own profit. This has potentially interesting consequences in repeated play of the game. We also show that in a simplified version of the game with P > 5, there is one equilibrium solution if 5 < P ≤ P* ≡ (5 + √73)/2, and three distinct equilibrium solutions if P > P*. This may be the simplest non-trivial multiplayer poker game with more than one distinct equilibrium solution and provides us with a test case for theories of dynamic strategy adjustment over multiple realisations of the game.

We then study a third order system of ordinary differential equations that models the dynamics of three players who try to maximise their expectation by continuously varying their betting frequencies. We find that the dynamics of this system are oscillatory, with two distinct types of solution. Finally, we study a difference equation model, based on repeated play of the game, in which each player continually updates their estimates of the other players' betting frequencies. We find that the dynamics are noisy, but basically oscillatory, for short enough estimation periods and slow enough frequency adjustments, but that the dynamics can be very different for other parameter values.

Simplified three player Kuhn poker
J. Billingham
School of Mathematical Sciences, The University of Nottingham, Nottingham NG7 2RD, UK
November 7, 2018
arXiv [math.OC]
Poker is a multiplayer game of imperfect information. Although popular variants of poker, such as Texas Holdem and Omaha, are large, complex, well-defined test problems for researchers in artificial intelligence [4], their size and complexity mask the fact that even small and simple toy poker games pose significant theoretical challenges. Most of the literature on poker, simplified or otherwise, focusses on solving the game, in the sense of finding Nash equilibrium solutions. Two player poker is a zero sum game, so all equilibrium solutions have the same expectation for each player [11]. Recently, two player Limit Holdem, a large and complex game, was numerically solved in this sense [5], and a recently developed, sophisticated AI that combines neural networks with on-the-fly equilibrium computation plays the even larger game of two player No Limit Holdem close to optimally [7]. However, even a simple two player, zero sum game leads to significant theoretical challenges when studied as a repeated game (for example, two player Kuhn poker, also known as the AKQ game [9]).

Full street, three player Kuhn poker with pot size three units was introduced in [8] in order to test the performance of the counterfactual regret algorithm in three player games, a context in which convergence to an equilibrium solution is not guaranteed. Later, a family of equilibrium solutions was found analytically [10], although it was not established whether other equilibrium solutions exist. This game was subsequently proposed as part of the 2015 Annual Computer Poker Competition to test the performance of AI players [1]. An indicator of the jump in dynamic complexity from two player to three player zero sum games is the fact that a repeated game for only three players with a deck of just four cards is a difficult, unsolved problem that attracts significant attention from the AI community.

In this paper we study the one-third street version of three player Kuhn poker.
Full and half street two player games are discussed extensively by Chen and Ankenman in [6]. Following their definitions, in the full street game Players 1, 2 and 3 can all be the first to bet, whilst in the one-third street game, Players 1 and 2 must check, and Player 3 makes the first decision (check or bet). It is this simpler game that we study here, but with the size of the initial pot, P units, arbitrary. The natural choice, P = 3 (each player contributes one unit to the pot before play starts), has been used in previous work [8], [10]. In Section 2 (details in Appendix A), we determine all of the equilibrium solutions of three player, one-third street Kuhn poker. The equilibrium is unique and trivial for P ≤ 2. For P > 2, there is no unique equilibrium solution, but the nature of this non-uniqueness varies with P. When P = 5 or P ≥ P* ≡ (5 + √73)/2 ≈ 6.77, there is a degree of freedom in the solution that allows one player to transfer profit between the other two players without changing their own profit. This has potentially interesting consequences in repeated play of the game. For all other values of P > 2, Player 3 bluffs with J and/or Q at equilibrium with a well-defined total bluffing frequency.

Our analysis of this game suggests that it is also of interest to study a simplified variant (Player 3 must check with K and Q, Player 2 must call a single bet with K, P ≥ 5), which has one equilibrium solution when 5 < P ≤ P*, and three distinct equilibrium solutions when P > P*. Moreover, for P > [...]
The deck contains four cards, A > K > Q > J, and each player is dealt a single card at random. There is a pot of P units. Players 1 and 2 are forced to check. Player 3 can then either check, in which case there is a showdown, or bet one unit. If Player 3 bets, Player 1 must either call or fold. Player 2 must then either call (overcall if Player 1 has called) or fold. If Player 1 and/or Player 2 calls, there is a showdown at which the player with the best card wins the pot and all the bets, otherwise Player 3 wins the pot. This is the one-third street version of three player Kuhn poker, which was introduced in [8]. Figure 1 shows the decision tree. (Note that this is not the full game tree as it does not show the hidden cards (information sets) or the payoffs at the terminal nodes.)

Figure 1: The decision tree for three player, one-third street Kuhn poker. Open circles are decision nodes (labelled by the player making the decision), whilst solid circles are terminal nodes. Betting and calling frequencies with (J, Q, K, A) are also shown.

Although each player is free to make either of their available decisions with any card, assuming rational players:

1. Player 3 will bet with A, and Players 1 and 2 will call or overcall with A (A always wins at showdown).
2. Players 1 and 2 will always fold J when Player 3 bets (J always loses at showdown).
3. Player 2 will not overcall with Q after a call from Player 1 (Q cannot be the best card).
4. Player 3 will not bet with K (either Player 1 or Player 2 will hold A two-thirds of the time, call and win, which leads to a loss that outweighs any profit Player 3 may make from Player 1 or 2 calling with Q).

As an aside, note that point 4 above assumes a level of rationality that exceeds that of many recreational poker players. Betting in situations in which no worse hand will call and no better hand will fold is endemic amongst weak human players [3].

The strategy parameters that we need to consider are:
Player 3: Bluffing frequencies b_J and b_Q with J and Q.
Player 1: Calling frequencies c_Q and c_K with Q and K.
Player 2a: Calling frequencies d_Q and d_K with Q and K after Player 1 folds.
Player 2b: Overcalling frequency o_K with K after Player 1 calls.

When P < 2, b_J = b_Q = c_Q = c_K = d_Q = d_K = 0, with o_K undetermined, is the unique, trivial equilibrium solution. Unless the pot is large enough, Player 3 has no incentive to bluff, and Players 1 and 2 therefore have no incentive to call. We will now assume that P ≥ 2. We show in Appendix A, and confirm by symbolic algebra in Appendix C.2, that the equilibrium solutions are

$$0 \le b_J + b_Q \le \frac{2}{3}, \quad c_Q = c_K = d_Q = d_K = o_K = 0 \tag{1}$$

for P = 2,

$$b_J + b_Q = \frac{2}{P+1}, \quad c_Q = c_K = 0, \quad d_Q = 0, \quad d_K = \frac{2P-4}{P+1}, \quad o_K = 0 \tag{2}$$

for 2 < P < 5,

$$\frac{1}{3} \le b_J + b_Q \le \frac{2}{5}, \quad c_Q = c_K = 0, \quad d_Q = 0, \quad d_K = 1, \quad o_K = 0 \tag{3}$$

for P = 5,

$$b_J + b_Q = \frac{2}{P}, \quad b_Q > \frac{12 + 5P - P^2}{6P(P+1)}, \quad c_Q = 0, \quad c_K = \frac{P-5}{P+1}, \quad d_Q = 0, \quad d_K = 1, \quad o_K = 0 \tag{4}$$

for 5 < P < P* ≡ (5 + √73)/2 ≈ 6.77,

$$b_J + b_Q = \frac{2}{P}, \quad c_Q = 0, \quad c_K = \frac{P-5}{P+1}, \quad d_Q = 0, \quad d_K = 1, \quad o_K = 0 \tag{5}$$

for P ≥ P* (Solution A), and

$$b_J = \frac{2}{P}, \quad b_Q = 0, \quad 0 \le c_Q \le \frac{2}{P+4}, \quad c_K = \frac{P-5}{P+1}, \quad d_Q = 0, \quad d_K = 1, \quad o_K = 0 \tag{6}$$

for P ≥ P* (Solution B).

Note that Player 2 neither calls with Q nor overcalls with K at equilibrium (d_Q = 0 and o_K = 0), which is not obvious a priori. Note also that P* ≡ (5 + √73)/2 is the only positive root of 12 + 5P − P² = 0, so the constraint on b_Q in (4) is satisfied by all positive b_Q when P ≥ P*.

At the bifurcation points P = 2 and P = 5, a range of total bluffing frequencies, b_J + b_Q, is possible for Player 3. Away from these two points, there is a fixed bluffing frequency for Player 3. However, with the exception of Solution B, the choice of J or Q as a bluffing card for Player 3 is not constrained; only b_J + b_Q is prescribed. This indeterminacy arises because Players 1 and 2 never call with Q at equilibrium. In Solution B, which only exists for P > P* ≡ (5 + √73)/2 ≈ 6.77, Player 1 can call with Q with a frequency up to 2/(P + 4) without affecting her expectation, and Players 2 and 3 cannot exploit her, either by overcalling with K (Player 2) or by bluffing more with J (Player 3), provided that Player 3 bluffs only with J (b_J = 2/P, b_Q = 0). The equilibrium frequencies are plotted in Figure 2. For 2 < P < 5, Player 2 uses K to catch Player 3's bluffs. For P > 5, Player 2 cannot call often enough with K, and Player 1 must call with K at frequency (P − 5)/(P + 1), which allows Player 3 to bluff slightly more often.

Figure 2: The equilibrium bluffing and calling frequencies (b_J + b_Q, c_K and d_K, plotted against P) for one-third street, three player Kuhn poker.

The ex-showdown expectations of these solutions are

$$E_1 = E_2 = E_3 = 0 \quad \text{for } P \le 2, \tag{7}$$

$$E_1 = -\frac{P-2}{12(P+1)}, \quad E_2 = -\frac{P-2}{12(P+1)}, \quad E_3 = \frac{P-2}{6(P+1)} \quad \text{for } 2 \le P < 5, \tag{8}$$

$$E_1 = -\frac{1}{8}(b_J + b_Q), \quad E_2 = -\frac{1}{12} + \frac{1}{8}(b_J + b_Q), \quad E_3 = \frac{1}{12} \quad \text{for } P = 5, \tag{9}$$

$$E_1 = -\frac{P-2}{12P}, \quad E_2 = -\frac{(P-1)(P-2)}{12P(P+1)}, \quad E_3 = \frac{P-2}{6(P+1)} \quad \text{for } P > 5 \text{ (Solution A)}, \tag{10}$$

$$E_1 = -\frac{P-2}{12P}, \quad E_2 = -\frac{c_Q}{24} - \frac{(P-1)(P-2)}{12P(P+1)}, \quad E_3 = \frac{c_Q}{24} + \frac{P-2}{6(P+1)} \quad \text{for } P \ge P^* \text{ (Solution B)}, \tag{11}$$

and are plotted in Figure 3 with c_Q = 0. For all P, Player 3 has the chance to check with K and thereby realize its potential at showdown (poker players say that Player 3 has position). In contrast, at equilibrium Players 1 and 2 sometimes fold K when it is the best card and would win the pot at showdown. This means that Player 3's ex-showdown expectation, E_3, is positive, whilst those of Players 1 and 2, E_1 and E_2, are negative. Note that Player 3's expectation, E_3, is a continuous function of P at equilibrium, whilst those of the other players, E_1 and E_2, are discontinuous at P = 5, with Player 1 losing more than Player 2 for P >
5. At P = 5, Player 3 can transfer profit between Players 1 and 2 at equilibrium by varying her total bluffing frequency (1/3 ≤ b_J + b_Q ≤ 2/5). Similarly, when P ≥ P*, Player 1 can transfer profit from Player 2 to Player 3 by choosing 0 ≤ c_Q ≤ 2/(P + 4). Player 3 should therefore choose b_J = 2/P for P ≥ P* in order to maximise her potential profit. We now discuss some possible implications of this for repeated play of the game.

Figure 3: The ex-showdown expectations E_1, E_2 and E_3, plotted against P, for one-third street, three player Kuhn poker with c_Q = 0 (i.e. not solution B).

If P = 5, Player 3 can transfer profit between Players 1 and 2 at equilibrium by appropriately choosing her bluffing frequency, b_J + b_Q. If P > P*, Player 1 can transfer profit between Players 2 and 3 by changing her calling frequency with Q, c_Q. In either case, in repeated play of the game with rotation, if two of the players form an alliance they can transfer profit to each other.

As an example, consider the game with P = 5. Player 3 controls the distribution of profit between Players 1 and 2 at equilibrium. If Players 3 and 1 are in an alliance, Player 1 will use c_K = c_Q = 0, and if Player 3 chooses b_J = 1/3, b_Q = 0, the expectations of the three players are independent of Player 2's choice of strategy and Player 1's expectation is maximised. However, if Players 3 and 2 are in an alliance (perhaps the same two players after rotation of seats), with Player 2 choosing d_Q = 0, d_K = 1 and o_K = 0 and Player 3 choosing b_J + b_Q = 2/5, the expectations are

$$E_1 = \frac{5}{24} c_Q \left(b_J - \frac{2}{5}\right) - \frac{1}{20}, \quad E_2 = -\frac{5}{24} c_Q \left(b_J - \frac{1}{5}\right) - \frac{1}{30} + \frac{1}{60} c_K, \quad E_3 = \frac{1}{12} - \frac{1}{60} c_K + \frac{1}{24} c_Q.$$

By choosing non-zero values of c_Q and/or c_K (i.e. calling some fraction of the time with Q and/or K), Player 1 can transfer profit between Players 2 and 3. Note that by choosing b_J < 1/5, Player 3 can protect Player 2 from this possibility, but that if she chooses b_J > 1/5, Player 1 can choose to target either Player 2 by calling with Q (at some cost to herself) or Player 3 by calling with K (at no cost to herself). Furthermore, as the players' positions rotate, the player who is not in the alliance will be Player 3 one-third of the time, which gives her an opportunity to decide how to treat the other two players.

A further complicating factor is that the strategy parameters are betting and calling frequencies. The actual game is played with the cards dealt, not with publicly available values of the parameters, which must be estimated by the players from the information that they have. Game theory usually assumes rationality in the players, but it is not clear whether the level of rationality assumed when asserting that a player will not fold with A is the same as that assumed when asserting that a player will estimate another player's frequency of calling with Q when they are Player 1 and exploit any opportunity for profit that this might present. This is where skill enters the picture and an AI player should be able to achieve win rates that exceed those of a rational but imperfect human player, or indeed an inferior AI.

In [10], a similar possibility is discussed for an equilibrium solution of the full street version of the game. Note that there is numerical evidence from CFR solutions of the full street game with a range of pot sizes [2] that P = 3 (the pot size used in [10]) is a bifurcation point (along with P = 4 and P = 5) in the same way that P = 5 is a bifurcation point in the one-third street game studied here. It is not clear how much of the complicated equilibrium solution studied in [10] remains when P ≠ 3, and whether there is any analogue of the continuous range of solutions of this type that exist in the one-third street game for P > P*.
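The closed-form expectations quoted in this section can be checked by brute force. The following Python sketch enumerates the 24 deals and every call/fold sequence, under the rational-play assumptions listed earlier (A always bets and calls, K never bets, J never calls, Q never overcalls). The function name `ex_showdown` and its argument order are illustrative choices, not taken from the paper, and exact rational arithmetic is used so that comparisons with the formulas are not affected by rounding.

```python
from fractions import Fraction
from itertools import permutations

RANK = {"J": 0, "Q": 1, "K": 2, "A": 3}


def ex_showdown(P, bJ, bQ, cQ, cK, dQ, dK, oK):
    """Ex-showdown expectations (E1, E2, E3), found by enumerating the 24 deals."""
    bet = {"A": 1, "K": 0, "Q": bQ, "J": bJ}    # Player 3 betting frequencies
    call1 = {"A": 1, "K": cK, "Q": cQ, "J": 0}  # Player 1 calling frequencies
    call2 = {"A": 1, "K": dK, "Q": dQ, "J": 0}  # Player 2, after Player 1 folds
    over2 = {"A": 1, "K": oK, "Q": 0, "J": 0}   # Player 2, after Player 1 calls
    total = [Fraction(0), Fraction(0), Fraction(0)]
    for deal in permutations("AKQJ", 3):        # cards of Players 1, 2, 3
        ranks = [RANK[card] for card in deal]
        best = ranks.index(max(ranks))          # winner if everyone checks
        for a1 in (0, 1):                       # Player 1 folds (0) or calls (1)
            p1 = call1[deal[0]] if a1 else 1 - call1[deal[0]]
            f2 = over2[deal[1]] if a1 else call2[deal[1]]
            for a2 in (0, 1):                   # Player 2 folds (0) or calls (1)
                p2 = f2 if a2 else 1 - f2
                prob = bet[deal[2]] * p1 * p2   # probability of this line of play
                if prob == 0:
                    continue
                staked = (a1, a2, 1)            # units added after the initial pot
                live = [i for i in range(3) if staked[i]]
                winner = max(live, key=lambda i: ranks[i])
                pot = P + sum(staked)
                for i in range(3):
                    payoff = (pot if i == winner else 0) - staked[i]
                    baseline = P if i == best else 0  # value of checking down
                    total[i] += prob * (payoff - baseline)
    return tuple(t / 24 for t in total)


if __name__ == "__main__":
    # Solution A frequencies at P = 9: b_J = 2/9, c_K = (P - 5)/(P + 1) = 2/5.
    print(ex_showdown(9, Fraction(2, 9), 0, 0, Fraction(2, 5), 0, 1, 0))
```

With the Solution A frequencies at P = 9, the three values returned should agree with (10) and sum to zero, since the ex-showdown expectations only redistribute the pot and the bets between the players.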
In this section we consider simplified one-third street three player Kuhn poker (SKP), in which:

• P > 5,
• Player 3 must check with Q (b_Q ≡ 0),
• Player 1 cannot call with Q (c_Q ≡ 0),
• Player 2 must call with K (d_K ≡ 1).

These restrictions are consistent with the equilibrium solutions of the unsimplified game for P > 5, which have d_K = 1 and o_K = 0, and also allow c_Q = 0. The main simplification we make, which leads to the existence of multiple, distinct equilibrium solutions, is that Player 3 is not allowed to bluff with Q (b_Q = 0).

The remaining nontrivial decisions are:

Player 3: bluffing frequency with J, b_J,
Player 1: calling frequency with K, c_K,
Player 2: calling frequency with Q, d_Q.

The simplified betting tree is shown in Figure 4.

Figure 4: The decision tree for simplified three player, one-third street Kuhn poker. Open circles are decision nodes (labelled by the player making the decision), whilst solid circles are terminal nodes. Betting and calling frequencies with (J, Q, K, A) are also shown.

We demonstrate in Appendix B, and confirm numerically in Appendix C.1, that the equilibrium solutions are

$$b_J = \frac{2}{P+1}, \quad c_K = 0, \quad d_Q = \frac{P-5}{P+1}, \quad \text{(Solution 1)} \tag{12}$$

$$b_J = \frac{2}{P}, \quad c_K = \frac{P-5}{P+1}, \quad d_Q = 0, \quad \text{(Solution 2)} \tag{13}$$

for P ≥ P* = (5 + √73)/2 ≈ 6.77, and

$$b_J = \frac{2}{P}, \quad c_K = \frac{2}{P+2}, \quad d_Q = \frac{P^2 - 5P - 12}{P(P+1)}, \quad \text{(Solution 3)} \tag{14}$$

for P ≥ P*.

In Solution 1, Player 2 calls with Q at a non-zero frequency, whilst Player 1 folds K. In Solution 2, Player 1 calls with K at the same non-zero frequency, whilst Player 2 folds Q (this is also a solution in the unsimplified, one-third street game). In Solution 3, Players 1 and 2 call with K and Q respectively at a non-zero frequency. These frequencies are illustrated in Figure 5.

Figure 5: The three equilibrium solutions of simplified, one-third street, three player Kuhn poker (panels: Solution 1, b_J and d_Q; Solution 2, b_J and c_K; Solution 3, b_J, c_K and d_Q; each plotted against P).

The ex-showdown expectations,

$$E_1 = -\frac{P-2}{12(P+1)}, \quad E_2 = -\frac{P-2}{12(P+1)}, \quad E_3 = \frac{P-2}{6(P+1)}, \quad \text{(Solution 1)} \tag{15}$$

$$E_1 = -\frac{P-2}{12P}, \quad E_2 = -\frac{(P-1)(P-2)}{12P(P+1)}, \quad E_3 = \frac{P-2}{6(P+1)}, \quad \text{(Solution 2)} \tag{16}$$

$$E_1 = -\frac{P-2}{12P}, \quad E_2 = -\frac{P^2 - P - 8}{12P(P+2)}, \quad E_3 = \frac{2P^2 - P - 12}{12P(P+2)}, \quad \text{(Solution 3)} \tag{17}$$

are illustrated in Figure 6. Note that Player 1 has her greatest expectation in Solution 1 and Player 2 in Solution 2. Player 3 has her greatest expectation in Solution 3 if P > P*, but has the same expectation in each of Solutions 1 and 2 if P ≤ P*. However, if each player is restricted to choosing one of the three possible options (effectively two options for Player 3), we find that the maxmin strategy (the best worst-case payoff) is:

• Player 1: Solution 1,
• Player 2: Solution 1 for P < 7, Solution 2 for P > 7,
• Player 3: Solution 1.

This was determined by numerical computation. We conclude that for P < 7, Solution 1 is a rational choice for the three players. However, for P > 7, this suggests that Player 1 has an incentive to choose Solution 1 (c_K = 0) and Player 2 to choose Solution 2 (d_Q = 0). Although the maxmin strategy for Player 3 under these constraints is Solution 1, c_K = d_Q = 0 is not part of an equilibrium solution and leaves Players 1 and 2 open to exploitation by Player 3 by increasing b_J. This strongly suggests that the outcome of a repeated game between rational players is unlikely to settle at an equilibrium solution if P > 7, and, depending on the dynamics of the players' decision making, this may also be the case for P ≤ 7. In the following section, we will investigate this further.

Figure 6: The expectations of the three equilibrium solutions of simplified, one-third street, three player Kuhn poker (panels: E_1, E_2 and E_3, each plotted against P).
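As a rough numerical check of Solutions 1 to 3, the sketch below evaluates the coefficient of each player's own frequency in their expectation and tests the usual equilibrium condition: either the coefficient vanishes (indifference), or the frequency sits at the boundary favoured by its sign. The coefficients are those obtained by setting b_Q = 0, c_Q = 0, d_K = 1 and o_K = 0 in the general expectations of Appendix A. The function names (`p_star`, `skp_equilibria`, `payoff_gradients`, `is_equilibrium`) and the tolerance are assumptions made for this illustration.

```python
import math


def p_star():
    """Positive root of 12 + 5P - P^2 = 0."""
    return (5 + math.sqrt(73)) / 2


def skp_equilibria(P):
    """Candidate equilibria (b_J, c_K, d_Q) of SKP for a pot size P > 5."""
    sols = {"Solution 1": (2 / (P + 1), 0.0, (P - 5) / (P + 1))}
    if P >= p_star():
        sols["Solution 2"] = (2 / P, (P - 5) / (P + 1), 0.0)
        sols["Solution 3"] = (2 / P, 2 / (P + 2),
                              (P * P - 5 * P - 12) / (P * (P + 1)))
    return sols


def payoff_gradients(P, bJ, cK, dQ):
    """Coefficients of b_J in 24*E3, of c_K in 24*E1 and of d_Q in 24*E2."""
    g_b = P - 5 - (P + 1) * (dQ + (1 - dQ) * cK)
    g_c = P * bJ - 2
    g_d = cK - 2 + bJ * (1 - cK) * (P + 1)
    return g_b, g_c, g_d


def is_equilibrium(P, bJ, cK, dQ, tol=1e-9):
    """True if no player can gain by changing only their own frequency."""
    for x, g in zip((bJ, cK, dQ), payoff_gradients(P, bJ, cK, dQ)):
        indifferent = abs(g) <= tol
        pinned = (g < 0 and x == 0) or (g > 0 and x == 1)
        if not (indifferent or pinned):
            return False
    return True


if __name__ == "__main__":
    for name, sol in skp_equilibria(9).items():
        print(name, sol, is_equilibrium(9, *sol))
```

Applying `is_equilibrium` to the Solution 2 frequencies at a pot size below P*, for example P = 6, returns False, because Player 2 would then gain by calling with Q. This is consistent with Solution 2 being listed only for P ≥ P*.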
In SKP, as defined in the previous section, there are three distinct equilibrium solutions. In order to understand which, if any, of these equilibria might be selected in repeated play, we will construct and analyse two models. Each is based on the idea that a player knows their ex-showdown expectation in terms of the three betting frequencies defined above (b_J, c_K and d_Q) and adjusts the frequency that they control to try to maximise this expectation. This is a reasonable assumption about how human players might approach this game, i.e. bluff more if Players 1 and/or 2 don't call enough, call more if Player 3 bluffs too much. In section 4.1, we study a model that treats the three betting frequencies as continuous functions of time, t, and write down a third order system of nonlinear ordinary differential equations that controls their evolution and has the same three equilibrium solutions as SKP. Although this model has a number of weaknesses, which we discuss, we will see that in some cases its dynamics are very similar to those of the more realistic model that we study in section 4.2. In particular, none of the three equilibrium states is an asymptotic attractor of the system. The attractors are nested periodic solutions, one set oscillating about the equilibrium that corresponds to Solution 1 and the other about the equilibrium that corresponds to Solution 2, with the selection of the attractor depending on the initial frequencies. In section 4.2, we describe a difference equation model that is linked to repeated play of SKP, with each player storing information about the most recent plays of the game and using it to estimate the betting frequencies of the other two players.

Note that in each game the players remain in the same seats for each deal of the cards. In real three player poker games, the participants' roles rotate after every deal, so that in effect they are successively playing three separate games in rotation.
Human players make deductions about each other's strategies based on their play in similar, but different, situations. We will not consider this possibility here.

The signs of the coefficients of c_K, d_Q and b_J in expressions (38) to (40) respectively, indicate the direction in which each player should change their betting frequency in order to increase their expectation. If we now treat the three frequencies as continuous functions of time, t, this suggests that a rational model for how they vary is given by

$$\dot{b}_J = g(b_J)\, f_3\!\left(\frac{P-5}{P+1} - d_Q - c_K + d_Q c_K\right), \tag{18}$$

$$\dot{c}_K = g(c_K)\, f_1\!\left(b_J - \frac{2}{P}\right), \tag{19}$$

$$\dot{d}_Q = g(d_Q)\, f_2\!\left(\frac{c_K - 2}{P+1} + b_J (1 - c_K)\right), \tag{20}$$

where a dot denotes d/dt, g is a smooth, non-negative function that vanishes at zero and one, thereby ensuring that each frequency lies in [0, 1], and the f_i for i = 1, 2, 3 are non-decreasing functions of their single arguments. The simplest form of this model, which we shall study here, has g(x) = x(1 − x) and f_i(x) = k_i x, with k_i strictly positive constants. This leads to the system

$$\dot{b}_J = k_3\, b_J (1 - b_J) \left(\frac{P-5}{P+1} - d_Q - c_K + d_Q c_K\right), \tag{21}$$

$$\dot{c}_K = k_1\, c_K (1 - c_K) \left(b_J - \frac{2}{P}\right), \tag{22}$$

$$\dot{d}_Q = k_2\, d_Q (1 - d_Q) \left(\frac{c_K - 2}{P+1} + b_J (1 - c_K)\right). \tag{23}$$

We can see immediately that the three equilibrium solutions of SKP are also equilibrium solutions of this system, and also that the six planes b_J = 0, b_J = 1, c_K = 0, c_K = 1, d_Q = 0 and d_Q = 1 are invariant. We are only interested in the dynamics of the system for each of b_J, c_K and d_Q in [0, 1]. [...] The three frequencies are controlled by three individual players, who adjust them in response to the frequencies of the other two players, of which they are assumed to have perfect knowledge.

A local analysis close to the three equilibrium points, S_1 = (2/(P + 1), 0, (P − 5)/(P + 1)), S_2 = (2/P, (P − 5)/(P + 1), 0) and, for P > P*, S_3 = (2/P, 2/(P + 2), (P² − 5P − 12)/(P(P + 1))), which correspond to (12) to (14), shows that

• S_1 has a one-dimensional stable manifold and a two-dimensional centre manifold. This centre manifold is the plane c_K = 0, and here the dynamics are those of a nonlinear centre, i.e. a series of nested limit cycles (periodic solutions).
• S_2 has a one-dimensional manifold that is stable for P > P* and unstable for P < P*, and a two-dimensional centre manifold. This centre manifold is the plane d_Q = 0, and here the dynamics are those of a nonlinear centre, i.e. a series of nested limit cycles.
• S_3 has a one-dimensional unstable manifold and a two-dimensional stable manifold. The dynamics on the stable manifold are oscillatory. The stable manifold separates the two domains of attraction of the planes c_K = 0, which contains S_1, and d_Q = 0, which contains S_2.

Moreover, it is possible to show that this description is also qualitatively correct for the more general system, (18) to (20). We conclude that, generically, the solution is attracted to one of the limit cycles that surrounds either S_1 or S_2. When P < P*, the solution is attracted to a limit cycle surrounding S_1, but for P > P* the selection depends on the initial conditions (initial betting frequencies). We will focus on the more interesting case, P > P*.

In the following, all solutions are plotted for the typical case k_1 = k_2 = k_3 = 1, P = 9 and calculated numerically using the solver ode45 in MATLAB. Figure 7 shows the limit cycles in the plane c_K = 0. On each integral path, the solution evolves anticlockwise. As Player 3 bluffs more frequently, Player 2 calls more frequently, which causes Player 3 to bluff less frequently, then Player 2 to call less frequently, and so on, ad infinitum. The solutions in the invariant plane d_Q = 0 are very similar and represent the same cycle of rise and fall in bluffing and calling frequencies, but for Players 1 and 3.
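The behaviour described above can be explored with a few lines of code. The sketch below integrates the system with g(x) = x(1 − x) and linear f, using a hand-written fixed-step fourth order Runge-Kutta scheme in pure Python as a stand-in for MATLAB's ode45 (which is adaptive), so it is an illustration rather than a reproduction of the paper's computations. The rate constants are given the hypothetical names `kb`, `kc` and `kd` for the b_J, c_K and d_Q equations respectively, and the step size and initial point are arbitrary choices.

```python
def skp_rhs(state, P, kb=1.0, kc=1.0, kd=1.0):
    """Right-hand side of the frequency dynamics for state = (b_J, c_K, d_Q)."""
    bJ, cK, dQ = state
    dbJ = kb * bJ * (1 - bJ) * ((P - 5) / (P + 1) - dQ - cK + dQ * cK)
    dcK = kc * cK * (1 - cK) * (bJ - 2 / P)
    ddQ = kd * dQ * (1 - dQ) * ((cK - 2) / (P + 1) + bJ * (1 - cK))
    return (dbJ, dcK, ddQ)


def rk4_step(f, state, h, *args):
    """One classical fourth order Runge-Kutta step of size h."""
    s1 = f(state, *args)
    s2 = f(tuple(x + 0.5 * h * d for x, d in zip(state, s1)), *args)
    s3 = f(tuple(x + 0.5 * h * d for x, d in zip(state, s2)), *args)
    s4 = f(tuple(x + h * d for x, d in zip(state, s3)), *args)
    return tuple(x + h / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, s1, s2, s3, s4))


def integrate(state, P, t_end, h=0.01):
    """Return the list of states visited from t = 0 to t = t_end."""
    path = [state]
    for _ in range(int(round(t_end / h))):
        state = rk4_step(skp_rhs, state, h, P)
        path.append(state)
    return path


if __name__ == "__main__":
    # Start in the invariant plane c_K = 0, away from S_1 = (0.2, 0, 0.4) at P = 9.
    path = integrate((0.3, 0.0, 0.3), 9, 50.0)
    print(path[-1])
```

Because the right-hand side of the c_K equation carries a factor c_K, a trajectory started with c_K = 0 stays in that plane, where b_J and d_Q cycle around S_1. Starting instead from a point with all three frequencies strictly between zero and one lets one probe which of the two planes the solution approaches.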
By integrating equations (21) to (23) backwards in time from an initial point in phase space close to the unstable equilibrium solution S_3, we can numerically calculate an integral path that lies in the stable manifold of S_3, which is shown in Figure 8. This gives us a visual indication of the boundary of the basins of attraction of the two stable attractors. An example of each type of solution is shown in Figure 9. Finally, Figure 10 shows the ex-showdown expectations of each player for the solutions shown in Figure 9, defined by

$$E_1(t) = \frac{1}{24} \int_0^t \left\{ c_K(s) \left( P b_J(s) - 2 \right) - (P - 2)\, b_J(s) \right\} ds, \tag{24}$$

$$E_2(t) = \frac{1}{24} \int_0^t \left\{ d_Q(s) \left\{ c_K(s) - 2 + b_J(s) (1 - c_K(s)) (P + 1) \right\} + b_J(s) (c_K(s) + 3) - 2 \right\} ds, \tag{25}$$

$$E_3(t) = \frac{1}{24} \int_0^t \left\{ b_J(s) \left[ P - 5 - (P + 1) \left\{ d_Q(s) + (1 - d_Q(s))\, c_K(s) \right\} \right] + 2 c_K(s) + (2 - c_K(s))\, d_Q(s) + 2 \right\} ds. \tag{26}$$

We can see that, for the case shown in Figure 10, these expectations oscillate about those of the S_1 and S_2 equilibrium solutions given by (15) and (16).

Footnote: The limit cycles that surround S_2 are unstable for P < P*, which reflects the fact that S_2 is not an equilibrium solution of SKP.

Figure 7: The limit cycles in the plane c_K = 0.

Figure 8: An integral path in the stable manifold of S_3. This manifold separates the basins of attraction of the two attracting planes.

Figure 9: Two solutions with different final behaviour.

Figure 10: The ex-showdown expectations of the solutions shown in Figure 9.

In this section, we study a more realistic model of repeated play of SKP. In the i-th of N rounds of play, Players 1, 2 and 3 have frequencies c_i, d_i and b_i respectively and stacks S_1^i, S_2^i and S_3^i, with S_p^0 = 0 for p = 1, 2, 3. On each round, each player contributes [...] b_i, c_i and d_i, and stacks updated appropriately after each round of play is complete, when the betting frequencies are also adjusted using a difference equation analogue of (21) to (23), namely

$$b_{i+1} = \max\left\{ 0, \min\left\{ 1,\; b_i + k_3 \left( \frac{P-5}{P+1} - \bar{d}_i - \bar{c}_i + \bar{d}_i \bar{c}_i \right) \right\} \right\}, \tag{27}$$

$$c_{i+1} = \max\left\{ 0, \min\left\{ 1,\; c_i + k_1 \left( \bar{b}_i - \frac{2}{P} \right) \right\} \right\}, \tag{28}$$

$$d_{i+1} = \max\left\{ 0, \min\left\{ 1,\; d_i + k_2 \left( \frac{\bar{c}_i - 2}{P+1} + \bar{b}_i (1 - \bar{c}_i) \right) \right\} \right\}. \tag{29}$$

Here, the barred variables on the right hand side are estimators of the opponents' betting frequencies. For example, \bar{b}_i is Player 2's estimate of Player 3's bluffing frequency after i rounds of play. Player p uses the previous L_p hands to construct unbiased estimators of the other players' frequencies based on the information available to them. Details of these estimators are given in Appendix D.

The strategy of each player is characterised by the adjustment rate parameter, k_p, which determines how rapidly they adjust their betting frequency in response to their estimates of the opponents' strategies, and L_p, the number of recent rounds of play that they use to estimate these strategies. Larger values of L_p lead to more accurate estimates, but a longer delay in the estimates.
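A single application of the update rule (27) to (29) can be sketched as follows. The estimators themselves are defined in Appendix D and are not reproduced here, so the estimates are simply passed in as arguments. All names are hypothetical: `c_hat3` and `d_hat3` stand for Player 3's estimates of c and d, `b_hat1` for Player 1's estimate of b, and `b_hat2` and `c_hat2` for Player 2's estimates of b and c. A common adjustment rate `k` is assumed for all three players.

```python
def clamp01(x):
    """Restrict a frequency to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def update_frequencies(b, c, d, P, k, c_hat3, d_hat3, b_hat1, b_hat2, c_hat2):
    """One round of frequency adjustment; returns the new (b, c, d)."""
    # Player 3 bluffs more with J when the others are estimated to call too little.
    b_new = clamp01(b + k * ((P - 5) / (P + 1) - d_hat3 - c_hat3 + d_hat3 * c_hat3))
    # Player 1 calls more with K when Player 3 is estimated to bluff more than 2/P.
    c_new = clamp01(c + k * (b_hat1 - 2 / P))
    # Player 2 calls more with Q when the estimated bluffing frequency is high enough.
    d_new = clamp01(d + k * ((c_hat2 - 2) / (P + 1) + b_hat2 * (1 - c_hat2)))
    return b_new, c_new, d_new


if __name__ == "__main__":
    # With perfect estimates at Solution 1 for P = 9, nothing should change.
    print(update_frequencies(0.2, 0.0, 0.4, 9, 0.01, 0.0, 0.4, 0.2, 0.2, 0.0))
```

Unlike the differential equation model, there is no factor of the form x(1 − x) here, so the frequencies are kept in [0, 1] by the explicit clamping. In a full simulation the estimates would be recomputed from the last L hands before each call, which is where the noise in the dynamics comes from.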
In this paper, we will focus on the dynamics when each player uses identical parameters, with L_p = L and k_p = k for p = 1, 2, 3, which allows us to illustrate the dynamic complexity of the game, along with the difficulty of predicting this based on either equilibrium considerations or even the differential equation model studied in the previous section.

Figures 11 and 12 show how the frequencies vary for a range of values of k and L when P = 9, with initial conditions close to S_2. When the rate of adaptation, k, is relatively slow (k = 0.001), [...] S_2. Figure 13 shows the solution for small and large L. Note that the solution becomes smoother as L increases, and lies further from S_2. When the rate of adaptation is faster (k = 0.01), [...] S_2 and displays rather larger amplitude oscillations, particularly for larger values of L. Figure 14 shows the time series for large and small L. The solution when L = 192 displays a very regular relaxation oscillation, with Player 3 bluffing more, Players 1 and 2 both calling more in response, Player 3 bluffing less, Players 1 and 2 calling less, and so on. Figure 15 shows that, for P > P* (P = 9 in the Figure), by changing the initial condition, for k and L sufficiently small, the solution can eventually lie close to either S_1 or S_2. This bistability is a feature that we also saw in solutions of the ordinary differential equation model.

The ex-showdown expectations of each player in the solutions shown in Figures 11 and 12 are shown in Table 1. For k = 0.001, [...] solution S_2, given by (16), multiplied by a factor of two to account for the fact that only 12 of the 24 possible combinations of cards are dealt in the simulation. For k = 0.01, [...]

Figure 11: The solution for P = 9, k = 0.01 and 0.001 and L = 6, 12 and 24.

Figure 12: The solution for P = 9, k = 0.01 and 0.001 and L = 48, 96 and 192.

Figure 13: The solution for P = 9, k = 0.001 and L = 6 and 192. The horizontal dashed lines indicate the solution S_2.

Figure 14: The solution for P = 9, k = 0.01 and L = 6 and 192. The horizontal dashed lines indicate the solution S_2.

Figure 15: The solution for P = 9, k = 0.001 and L = 6 for two different initial conditions. The horizontal dashed lines indicate the solution S.

                      k = 0.001    k = 0.01
L = 6      Player 1     [...]        [...]
           Player 2    -0.098       -0.101
           Player 3     0.230        0.235
L = 12     Player 1    -0.130       -0.132
           Player 2    -0.103       -0.105
           Player 3     0.234        0.237
L = 24     Player 1    -0.129       -0.133
           Player 2    -0.105       -0.107
           Player 3     0.234        0.241
L = 48     Player 1    -0.129       -0.138
           Player 2    -0.105       -0.110
           Player 3     0.234        0.248
L = 96     Player 1    -0.129       -0.143
           Player 2    -0.106       -0.112
           Player 3     0.235        0.255
L = 192    Player 1    -0.130       -0.139
           Player 2    -0.107       -0.115
           Player 3     0.237        0.254

Table 1: Ex-showdown expectations when P = 9 for Player 1, 2 and 3, corresponding to solutions shown in Figures 11 and 12. The corresponding values for Solution 2 are -0.130, -0.104 and 0.233.

In this paper we have studied a reduced version of three player Kuhn poker, the one-third street game. We found that we could find the complete set of possible equilibrium solutions analytically, for all positive pot sizes P. For some values of the pot size (P = 5, P > P* = (5 + √73)/2 ≈ 6.77) [...] P > P*, three distinct equilibrium solutions exist, and for P > [...] L deals, we found a variety of possible behaviours, depending on the frequency adjustment rate and memory parameters. For slow enough frequency adjustment, the solutions are noisy, but otherwise similar to the differential equation model, but more rapid frequency adjustment leads to large oscillations in all players' betting frequencies.

Acknowledgements
I would like to acknowledge the contribution of my undergraduate project student at the University of Nottingham in 2015, Richard Farbridge, who discovered two of the three equilibrium solutions in the simplified game.
References
[3] The Education of a Modern Poker Player. D & B Poker, 2013.
[4] Darse Billings, Aaron Davidson, Jonathan Schaeffer, and Duane Szafron. The challenge of poker. Artif. Intell., 134(1-2):201–240, January 2002.
[5] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold'em poker is solved. Science, 347(6218):145–149, January 2015.
[6] Bill Chen and Jerrod Ankenman. The Mathematics of Poker. ConJelCo, 2009.
[7] Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, and Michael Bowling. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 2017.
[8] Nick Abou Risk and Duane Szafron. Using counterfactual regret minimization to create competitive multiplayer poker agents. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: Volume 1, AAMAS '10, pages 159–166, Richland, SC, 2010. International Foundation for Autonomous Agents and Multiagent Systems.
[9] Finnegan Southey, Bret Hoehn, and Robert C. Holte. Effective short-term opponent exploitation in simplified poker. Machine Learning, 74(2):159–189, 2009.
[10] Duane Szafron, Richard Gibson, and Nathan Sturtevant. A parameterized family of equilibrium profiles for three-player Kuhn poker. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, AAMAS '13, pages 247–254, Richland, SC, 2013. International Foundation for Autonomous Agents and Multiagent Systems.
[11] John von Neumann, Oskar Morgenstern, Harold W. Kuhn, and Ariel Rubinstein. Theory of Games and Economic Behavior (60th Anniversary Commemorative Edition). Princeton University Press, 1944.
A Kuhn poker: Unique equilibrium solution for P ≥ 2

There are $^4P_3 = 24$ different ways to deal three from four cards to three players. By considering each of these and the probability and payoffs associated with each possible sequence of actions shown in the game tree in Figure 1, we find that the ex-showdown expectations for each player are given by

$$24E_1 = c_Q\left[-2 + b_J\{P - (P+2)o_K\}\right] + c_K\{-2 + P(b_J + b_Q)\} + (2 - P + o_K)(b_J + b_Q), \tag{30}$$

$$24E_2 = d_Q\{c_K - 2 + b_J(1 - c_K)(P+1)\} + d_K\{c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1)\} + o_K\left[-c_Q - b_Q + b_J\{(P+2)c_Q - 1\}\right] + b_J(2 - P + c_Q + c_K) + b_Q(2 - P + c_K), \tag{31}$$

$$24E_3 = b_J\left[2P - 4 - (P+1)\{c_Q(1 - d_K) + d_K + c_K(1 - d_Q) + d_Q\}\right] + b_Q\{2P - 4 - (P+1)(c_K + d_K)\} + 2(c_Q + c_K) + (2 - c_K)d_Q + (2 - c_Q)d_K + c_Q o_K. \tag{32}$$

By noting that Player 3 chooses $b_J$ and $b_Q$ to maximise $E_3$, Player 1 chooses $c_Q$ and $c_K$ to maximise $E_1$ and Player 2 chooses $d_Q$, $d_K$ and $o_K$ to maximise $E_2$, we can find the seven constraints that must be satisfied by equilibrium solutions, which correspond to the seven strategy parameters. In each case there are three possibilities: either an indifference holds (each equality labelled (c)) or the corresponding parameter is chosen to be one or zero to maximise expectation (each inequality labelled (a) or (b)).

1. (a) $c_Q(1 - d_K) + d_K + c_K(1 - d_Q) + d_Q < \frac{2P-4}{P+1}$ & $b_J = 1$,
   (b) $c_Q(1 - d_K) + d_K + c_K(1 - d_Q) + d_Q > \frac{2P-4}{P+1}$ & $b_J = 0$,
   (c) $c_Q(1 - d_K) + d_K + c_K(1 - d_Q) + d_Q = \frac{2P-4}{P+1}$ & $0 \le b_J \le 1$.
2. (a) $c_K + d_K < \frac{2P-4}{P+1}$ & $b_Q = 1$,
   (b) $c_K + d_K > \frac{2P-4}{P+1}$ & $b_Q = 0$,
   (c) $c_K + d_K = \frac{2P-4}{P+1}$ & $0 \le b_Q \le 1$.
3. (a) $-2 + b_J\{P - (P+2)o_K\} > 0$ & $c_Q = 1$,
   (b) $-2 + b_J\{P - (P+2)o_K\} < 0$ & $c_Q = 0$,
   (c) $-2 + b_J\{P - (P+2)o_K\} = 0$ & $0 \le c_Q \le 1$.
4. (a) $b_J + b_Q > \frac{2}{P}$ & $c_K = 1$,
   (b) $b_J + b_Q < \frac{2}{P}$ & $c_K = 0$,
   (c) $b_J + b_Q = \frac{2}{P}$ & $0 \le c_K \le 1$.
5. (a) $c_K - 2 + b_J(1 - c_K)(P+1) > 0$ & $d_Q = 1$,
   (b) $c_K - 2 + b_J(1 - c_K)(P+1) < 0$ & $d_Q = 0$,
   (c) $c_K - 2 + b_J(1 - c_K)(P+1) = 0$ & $0 \le d_Q \le 1$.
6. (a) $c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) > 0$ & $d_K = 1$,
   (b) $c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) < 0$ & $d_K = 0$,
   (c) $c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) = 0$ & $0 \le d_K \le 1$.
7. (a) $-c_Q - b_Q - b_J + b_J c_Q(P+2) > 0$ & $o_K = 1$,
   (b) $-c_Q - b_Q - b_J + b_J c_Q(P+2) < 0$ & $o_K = 0$,
   (c) $-c_Q - b_Q - b_J + b_J c_Q(P+2) = 0$ & $0 \le o_K \le 1$.

For $P \ge 2$, our task is to find all sets of strategy parameters, $(b_J, b_Q, c_Q, c_K, d_Q, d_K, o_K)$, such that at least one of (a), (b) and (c) holds for each of these seven constraints. These are the equilibrium solutions.

We can eliminate 1.(a), 1.(b), 2.(b) and 7.(a) immediately.

• 1.(a) $\implies b_J = 1 \implies$ 4.(a) $\implies c_K = 1 \implies$ 5.(b) $\implies d_Q = 0$. Then 1.(a) $\implies c_K + d_K < \frac{2P-4}{P+1} - c_Q(1 - d_K) \le \frac{2P-4}{P+1} \implies$ 2.(a) $\implies b_Q = 1 \implies c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) = c_Q + P - 1 + (1 - c_Q)(P+1) > 0 \implies$ 6.(a) $\implies d_K = 1 \implies c_K + d_K = 2 > \frac{2P-4}{P+1}$, a contradiction.
• 1.(b) $\implies b_J = 0 \implies$ ((3.(b) $\implies c_Q = 0$) & (5.(b) $\implies d_Q = 0$)). Then 1.(b) $\implies c_K + d_K > \frac{2P-4}{P+1} \implies$ 2.(b) $\implies b_Q = 0 \implies$ ((4.(b) $\implies c_K = 0$) & (6.(b) $\implies d_K = 0$)) $\implies c_Q(1 - d_K) + d_K + c_K(1 - d_Q) + d_Q = 0$, which contradicts 1.(b).
• Since we now know that 1.(c) must hold, $c_K + d_K - \frac{2P-4}{P+1} = -c_Q(1 - d_K) - d_Q(1 - c_K) \le 0$, so 2.(b) cannot hold.
• 7.(a) $\implies o_K = 1 \implies$ 3.(b) $\implies c_Q = 0$, which contradicts 7.(a).

The remaining constraints can now be written as

1. (c) $c_K + d_K - \frac{2P-4}{P+1} = -c_Q(1 - d_K) - d_Q(1 - c_K)$ & $0 \le b_J \le 1$.
2. (a) $-c_Q(1 - d_K) - d_Q(1 - c_K) < 0$ & $b_Q = 1$,
   (c) $-c_Q(1 - d_K) - d_Q(1 - c_K) = 0$ & $0 \le b_Q \le 1$.
3. (a) $-2 + b_J\{P - (P+2)o_K\} > 0$ & $c_Q = 1$,
   (b) $-2 + b_J\{P - (P+2)o_K\} < 0$ & $c_Q = 0$,
   (c) $-2 + b_J\{P - (P+2)o_K\} = 0$ & $0 \le c_Q \le 1$.
4. (a) $b_J + b_Q > \frac{2}{P}$ & $c_K = 1$,
   (b) $b_J + b_Q < \frac{2}{P}$ & $c_K = 0$,
   (c) $b_J + b_Q = \frac{2}{P}$ & $0 \le c_K \le 1$.
5. (a) $c_K - 2 + b_J(1 - c_K)(P+1) > 0$ & $d_Q = 1$,
   (b) $c_K - 2 + b_J(1 - c_K)(P+1) < 0$ & $d_Q = 0$,
   (c) $c_K - 2 + b_J(1 - c_K)(P+1) = 0$ & $0 \le d_Q \le 1$.
6. (a) $c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) > 0$ & $d_K = 1$,
   (b) $c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) < 0$ & $d_K = 0$,
   (c) $c_Q - 2 + b_J(1 - c_Q)(P+1) + b_Q(P+1) = 0$ & $0 \le d_K \le 1$.
7. (b) $-c_Q - b_Q - b_J + b_J c_Q(P+2) < 0$ & $o_K = 0$,
   (c) $-c_Q - b_Q - b_J + b_J c_Q(P+2) = 0$ & $0 \le o_K \le 1$.

We now consider separately the possible values of $c_Q$.

A.1 $c_Q = 0$

By looking for equilibrium solutions with $c_Q = 0$, after noting that this implies 7.(b), and hence $o_K = 0$, the constraints are greatly simplified to

1. (c) $c_K + d_K - \frac{2P-4}{P+1} = -d_Q(1 - c_K)$, $0 \le b_J \le 1$.
2. (a) $-d_Q(1 - c_K) < 0$ & $b_Q = 1$,
   (c) $-d_Q(1 - c_K) = 0$ & $0 \le b_Q \le 1$.
3. $b_J \le \frac{2}{P}$,
4. (a) $b_J + b_Q > \frac{2}{P}$ & $c_K = 1$,
   (b) $b_J + b_Q < \frac{2}{P}$ & $c_K = 0$,
   (c) $b_J + b_Q = \frac{2}{P}$ & $0 \le c_K \le 1$.
5. (a) $c_K - 2 + b_J(1 - c_K)(P+1) > 0$ & $d_Q = 1$,
   (b) $c_K - 2 + b_J(1 - c_K)(P+1) < 0$ & $d_Q = 0$,
   (c) $c_K - 2 + b_J(1 - c_K)(P+1) = 0$ & $0 \le d_Q \le 1$.
6. (a) $b_J + b_Q > \frac{2}{P+1}$ & $d_K = 1$,
   (b) $b_J + b_Q < \frac{2}{P+1}$ & $d_K = 0$,
   (c) $b_J + b_Q = \frac{2}{P+1}$ & $0 \le d_K \le 1$.

Now 2.(a) $\implies b_Q = 1 \implies$ 4.(a) $\implies c_K = 1$, which contradicts 2.(a). We conclude that only 2.(c) can be true, and therefore that $d_Q = 0$ or $c_K = 1$. Now note from 5.(b) that $c_K = 1 \implies d_Q = 0$, so the least restrictive conclusion is that $d_Q = 0$, and hence from 1.(c)

$$c_K + d_K = \frac{2P-4}{P+1}, \quad 0 \le b_J \le \frac{2}{P}. \tag{33}$$

Now note that 4.(a) $\implies$ 6.(a) $\implies c_K + d_K = 2$, which contradicts (33), so 4.(a) cannot hold. Similarly 6.(b) $\implies$ 4.(b) $\implies c_K + d_K = 0$, which also contradicts (33), so 6.(b) cannot hold. Then, since 4.(c) and 6.(c) cannot hold simultaneously, we have either

$$b_J + b_Q = \frac{2}{P+1}, \quad 0 \le d_K \le 1, \quad c_K = 0, \tag{34}$$

or

$$b_J + b_Q = \frac{2}{P}, \quad 0 \le c_K \le 1, \quad d_K = 1. \tag{35}$$

From (33) we can see that (34) can hold if $2 \le P \le 5$, since then $d_K = \frac{2P-4}{P+1} \in [0, 1]$, whilst (35) can hold if $P \ge 5$, since then $c_K = \frac{P-5}{P+1} \ge 0$. In addition, there is a restriction on $b_Q$ that arises from 5.(b) (shown in (4)). This is the equilibrium solution given by (1) to (6) with $c_Q = 0$.

A.2 $0 < c_Q < 1$

In this case, 3.(c) shows that

$$b_J = \frac{2}{P - (P+2)o_K} \ge \frac{2}{P}, \tag{36}$$

and hence, since $b_Q \ge 0$, 4.(b) cannot hold. It is now easiest to consider 2.(a) and 2.(c) separately.
A.2.1 $b_Q = 1$

Now $b_Q = 1 \implies$ ((4.(a) $\implies c_K = 1$) & (6.(a) $\implies d_K = 1$)). However, $c_K = d_K = 1$ is inconsistent with 1.(c), so there is no equilibrium that satisfies 2.(a).

A.2.2 $b_Q < 1$

Now $b_Q < 1 \implies$ 2.(c) $\implies$ ($d_K = 1$ and $d_Q = 0$) $\implies c_K = \frac{P-5}{P+1} \implies$ ($P > 5$ and 4.(c)) $\implies b_J + b_Q = \frac{2}{P}$. Now (36) $\implies b_J = \frac{2}{P}$, $b_Q = 0$ and $o_K = 0$. This means that the inequalities in 5.(b), 6.(a) and 7.(b) must hold. On substituting these values of the parameters into these inequalities, we find that we also need

$$P > P^* \equiv \tfrac{1}{2}\left(5 + \sqrt{73}\right) \approx 6.77, \quad c_Q < \frac{2}{P+4}. \tag{37}$$

Here 5.(b) reduces to $P^2 - 5P - 12 > 0$, which gives the bound on $P$, whilst 6.(a) gives $c_Q < \frac{2}{P+2}$ and 7.(b) gives the stronger bound $c_Q < \frac{2}{P+4}$. This is Solution B in (6).

A.3 $c_Q = 1$

If 2.(a) is true $\implies b_Q = 1 \implies$ ((4.(a) $\implies c_K = 1$) & (6.(a) $\implies d_K = 1$)), which contradicts 2.(a), and hence 2.(c) must hold, i.e. $-(1 - d_K) - d_Q(1 - c_K) = 0$. Since 5.(b) shows that $c_K = 1 \implies d_Q = 0$, the least restrictive assumption is that $d_Q = 0$ and $d_K = 1$, and hence $c_K = \frac{P-5}{P+1}$ and $P \ge 5$. If $o_K = 0$, 3.(a) contradicts 4.(c), so 7.(c) must hold. Along with 4.(c), this gives $b_J = 1/P$ and $b_Q = 1/P$, in contradiction with 3.(a). We conclude that no equilibrium solution with $c_Q = 1$ is possible.

B Simplified Kuhn poker: Two or three equilibrium solutions
The ex-showdown expectations for this game are given by

$$24E_1 = c_K(P b_J - 2) - (P - 2)b_J, \tag{38}$$

$$24E_2 = d_Q\{c_K - 2 + b_J(1 - c_K)(P+1)\} + b_J(c_K + 3) - 2, \tag{39}$$

$$24E_3 = b_J\left[P - 5 - (P+1)\{d_Q + (1 - d_Q)c_K\}\right] + 2c_K + (2 - c_K)d_Q + 2, \tag{40}$$

and the three constraints that must hold at equilibrium are

1. (a) $d_Q + (1 - d_Q)c_K < \frac{P-5}{P+1}$ & $b_J = 1$,
   (b) $d_Q + (1 - d_Q)c_K > \frac{P-5}{P+1}$ & $b_J = 0$,
   (c) $d_Q + (1 - d_Q)c_K = \frac{P-5}{P+1}$ & $0 \le b_J \le 1$.
2. (a) $b_J > \frac{2}{P}$ & $c_K = 1$,
   (b) $b_J < \frac{2}{P}$ & $c_K = 0$,
   (c) $b_J = \frac{2}{P}$ & $0 \le c_K \le 1$.
3. (a) $(1 - c_K)\{b_J(P+1) - 1\} - 1 > 0$ & $d_Q = 1$,
   (b) $(1 - c_K)\{b_J(P+1) - 1\} - 1 < 0$ & $d_Q = 0$,
   (c) $(1 - c_K)\{b_J(P+1) - 1\} - 1 = 0$ & $0 \le d_Q \le 1$.

We can eliminate 1.(a) and 1.(b) immediately.

• 1.(a) $\implies b_J = 1 \implies$ 2.(a) $\implies c_K = 1 \implies$ 3.(b) $\implies d_Q = 0$, in contradiction with 1.(a),
• 1.(b) $\implies b_J = 0 \implies$ 2.(b) $\implies c_K = 0 \implies$ 3.(b) $\implies d_Q = 0$, in contradiction with 1.(b).

We conclude that only 1.(c) can hold, and hence that equilibrium solutions have

$$d_Q + (1 - d_Q)c_K = \frac{P-5}{P+1}. \tag{41}$$

This immediately shows that $c_K \ne 1$ and $d_Q \ne 1$, eliminating the possibility of either 2.(a) or 3.(a) at equilibrium. This leaves just 2.(b), (c) and 3.(b), (c). Although 2.(b) and 3.(b) cannot hold simultaneously (they would give $c_K = d_Q = 0$, which contradicts (41) for $P > 5$), the other three combinations are all possible, and lead to the equilibrium solutions (12) to (14).

C Solution using symbolic algebra
For the relatively small games that we have studied in this paper, the analytical solution can also be determined using symbolic algebra.
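As an independent cross-check outside a symbolic algebra package, the equilibrium conditions of the simplified game in section B can be tested numerically with exact rational arithmetic. The following Python sketch is illustrative only: the function names `best_response_ok`, `is_equilibrium` and `candidates` are hypothetical, the variables `b`, `c`, `d` stand for $b_J$, $c_K$, $d_Q$, and the candidate formulas are what one obtains by solving the three surviving constraint combinations by hand (they are assumed, not quoted, to correspond to the solutions (12) to (14)).

```python
from fractions import Fraction as F


def best_response_ok(gain, x):
    """True if frequency x in [0, 1] is a best response to marginal gain `gain`."""
    if not 0 <= x <= 1:
        return False
    if gain > 0:
        return x == 1
    if gain < 0:
        return x == 0
    return True  # indifferent: any frequency in [0, 1] is allowed


def is_equilibrium(P, b, c, d):
    """Check constraints 1 to 3 of the simplified game for (b_J, c_K, d_Q)."""
    P = F(P)
    gain_b = (P - 5) / (P + 1) - (d + (1 - d) * c)   # constraint 1
    gain_c = b - 2 / P                               # constraint 2
    gain_d = (1 - c) * (b * (P + 1) - 1) - 1         # constraint 3
    return (best_response_ok(gain_b, b)
            and best_response_ok(gain_c, c)
            and best_response_ok(gain_d, d))


def candidates(P):
    """Candidate (b_J, c_K, d_Q) for each surviving pair of constraints."""
    P = F(P)
    q = (P - 5) / (P + 1)
    return {
        "2(b) & 3(c)": (2 / (P + 1), F(0), q),
        "2(c) & 3(b)": (2 / P, q, F(0)),
        "2(c) & 3(c)": (2 / P, 2 / (P + 2),
                        (P * P - 5 * P - 12) / (P * (P + 1))),
    }


for P in (6, 9):
    print(P, {name: is_equilibrium(P, *x) for name, x in candidates(P).items()})
```

Under these assumptions, all three candidates pass at P = 9, whereas only the first passes at P = 6, which lies below the threshold $P^2 - 5P - 12 = 0$. Exact fractions are used so that the indifference conditions are tested as true equalities rather than to within floating point error.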
C.1 Simplified Kuhn Poker
We used Mathematica to confirm the analysis of section B. Noting that && and || are the logical AND and OR operators, the command

    FullSimplify[Solve[
      ((d + (1 - d) c < (P - 5)/(P + 1) && b == 1) ||
       (d + (1 - d) c > (P - 5)/(P + 1) && b == 0) ||
       (d + (1 - d) c == (P - 5)/(P + 1))) &&
      ((b > 2/P && c == 1) || (b < 2/P && c == 0) || (b == 2/P)) &&
      (((1 - c) (b (P + 1) - 1) - 1 > 0 && d == 1) ||
       ((1 - c) (b (P + 1) - 1) - 1 < 0 && d == 0) ||
       ((1 - c) (b (P + 1) - 1) - 1 == 0)) &&
      (d >= 0) && (d <= 1) && (c >= 0) && (c <= 1) && (b >= 0) && (b <= 1) &&
      (P > 5), {b, c, d}]]

asks Mathematica to find solutions of the problem defined in section B. The resulting solution reproduces the solutions (12) to (14) for P ≠ P∗. In order to find the solution for P = P∗, which Mathematica is unable to locate in this general setting, the value of P must be specified and the command run again.

C.2 Kuhn poker
Using this method directly for the problem defined in section A exhausts the 16 GB of RAM available to us. This suggested that we should set up the problem in separate pieces, using
    S1[1] = cQ (1 - dK) + dK + cK (1 - dQ) + dQ < (2 P - 4)/(P + 1) && bJ == 1;
    S1[2] = cQ (1 - dK) + dK + cK (1 - dQ) + dQ > (2 P - 4)/(P + 1) && bJ == 0;
    S1[3] = cQ (1 - dK) + dK + cK (1 - dQ) + dQ == (2 P - 4)/(P + 1);
    S2[1] = cK + dK < (2 P - 4)/(P + 1) && bQ == 1;
    S2[2] = cK + dK > (2 P - 4)/(P + 1) && bQ == 0;
    S2[3] = cK + dK == (2 P - 4)/(P + 1);
    S3[1] = -2 + bJ (P - (P + 2) oK) > 0 && cQ == 1;
    S3[2] = -2 + bJ (P - (P + 2) oK) < 0 && cQ == 0;
    S3[3] = -2 + bJ (P - (P + 2) oK) == 0;
    S4[1] = bJ + bQ > 2/P && cK == 1;
    S4[2] = bJ + bQ < 2/P && cK == 0;
    S4[3] = bJ + bQ == 2/P;
    S5[1] = cK - 2 + bJ (1 - cK) (P + 1) > 0 && dQ == 1;
    S5[2] = cK - 2 + bJ (1 - cK) (P + 1) < 0 && dQ == 0;
    S5[3] = cK - 2 + bJ (1 - cK) (P + 1) == 0;
    S6[1] = cQ - 2 + bJ (1 - cQ) (P + 1) + bQ (P + 1) > 0 && dK == 1;
    S6[2] = cQ - 2 + bJ (1 - cQ) (P + 1) + bQ (P + 1) < 0 && dK == 0;
    S6[3] = cQ - 2 + bJ (1 - cQ) (P + 1) + bQ (P + 1) == 0;
    S7[1] = -cQ - bQ - bJ + bJ cQ (P + 2) > 0 && oK == 1;
    S7[2] = -cQ - bQ - bJ + bJ cQ (P + 2) < 0 && oK == 0;
    S7[3] = -cQ - bQ - bJ + bJ cQ (P + 2) == 0;
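The seven constraint families S1 to S7 can also be spot-checked for a single numerical strategy profile without any symbolic solving. The Python sketch below is an illustration under stated assumptions, not part of the original Mathematica workflow: the helper names `gains` and `is_equilibrium` are hypothetical, and the trial profile is taken to be $b_J = 2/P$, $b_Q = 0$, $c_K = (P-5)/(P+1)$, $d_Q = 0$, $d_K = 1$, $o_K = 0$ with a free $c_Q$, evaluated at P = 9.

```python
from fractions import Fraction as F


def gains(P, bJ, bQ, cQ, cK, dQ, dK, oK):
    """Expressions whose signs are tested in S1 to S7, keyed by the parameter they control."""
    P = F(P)
    t = (2 * P - 4) / (P + 1)
    return {
        "bJ": t - (cQ * (1 - dK) + dK + cK * (1 - dQ) + dQ),          # S1
        "bQ": t - (cK + dK),                                          # S2
        "cQ": -2 + bJ * (P - (P + 2) * oK),                           # S3
        "cK": bJ + bQ - 2 / P,                                        # S4
        "dQ": cK - 2 + bJ * (1 - cK) * (P + 1),                       # S5
        "dK": cQ - 2 + bJ * (1 - cQ) * (P + 1) + bQ * (P + 1),        # S6
        "oK": -cQ - bQ - bJ + bJ * cQ * (P + 2),                      # S7
    }


def is_equilibrium(P, **s):
    """True if every parameter is 1, 0 or free according to the sign of its gain."""
    for name, gain in gains(P, **s).items():
        x = s[name]
        if not 0 <= x <= 1:
            return False
        if gain > 0 and x != 1:
            return False
        if gain < 0 and x != 0:
            return False
    return True


P = 9
profile = dict(bJ=F(2, P), bQ=F(0), cK=F(P - 5, P + 1),
               dQ=F(0), dK=F(1), oK=F(0))

print(is_equilibrium(P, cQ=F(1, 10), **profile))  # cQ below 2/(P + 4)
print(is_equilibrium(P, cQ=F(1, 5), **profile))   # cQ above 2/(P + 4)
```

With these assumed values the first call reports an equilibrium and the second does not, in line with a bound of the form $c_Q < 2/(P+4)$. A check of this kind only tests given points; it does not replace the exhaustive symbolic search.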
We then ask Mathematica to look for a solution for each of the $3^7 = 2187$ possible combinations of constraints and output every valid solution using

    Do[
      sol = FullSimplify[Solve[
        S1[i] && S2[j] && S3[k] && S4[l] && S5[m] && S6[n] && S7[o] &&
        (dK >= 0) && (dK <= 1) && (cK >= 0) && (cK <= 1) &&
        (bJ >= 0) && (bJ <= 1) && (dQ >= 0) && (dQ <= 1) &&
        (cQ >= 0) && (cQ <= 1) && (bQ >= 0) && (bQ <= 1) &&
        (oK >= 0) && (oK <= 1) && (P > 0),
        {bJ, bQ, cK, cQ, dK, dQ, oK}]];
      If[Length[sol] > 0, {Print[i, j, k, l, m, n, o, sol]}],
      {i, 1, 3}, {j, 1, 3}, {k, 1, 3}, {l, 1, 3}, {m, 1, 3}, {n, 1, 3}, {o, 1, 3}]

The cases P = 5 and P = P∗ do not appear in this solution and need to be computed separately.

D Unbiassed estimators of opponents' betting frequencies