Kolkata Paise Restaurant Problem in Some Uniform Learning Strategy Limits
Asim Ghosh, Anindya Sundar Chakrabarti, Bikas K. Chakrabarti
Asim Ghosh (a), Anindya Sundar Chakrabarti (b), and Bikas K. Chakrabarti (a, b, c)
(a) Theoretical Condensed Matter Physics Division, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata 700 064, India.
(b) Economic Research Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India.
(c) Center for Applied Mathematics and Computational Science and Theoretical Condensed Matter Physics Division, Saha Institute of Nuclear Physics, 1/AF Bidhannagar, Kolkata 700 064, India.
We study the dynamics of some uniform learning strategy limits of a probabilistic version of the “Kolkata Paise Restaurant” problem, where N agents choose every evening among N equally priced but differently ranked restaurants, each agent trying to get dinner in the best possible ranked restaurant (each restaurant serves only one customer, and the rest arriving there go without dinner that evening). We consider the learning to be uniform among the agents and assume that each follows the same probabilistic strategy, dependent on the information about past successes in the game. The numerical results for the utilization of the restaurants in some limiting cases are examined analytically.
I. INTRODUCTION
The Kolkata Paise Restaurant (KPR) problem (see [1]) is a repeated game played between a large number of agents having no interaction among themselves. In KPR, N prospective customers choose from N restaurants each evening (time $t$) in a parallel decision mode. Each restaurant has the same price but a different rank $k$ (agreed upon by all N agents) and can serve only one customer. If more than one agent arrives at any restaurant on any evening, one of them is randomly chosen and served, and the rest do not get dinner that evening. Information regarding the agent distributions etc. on earlier evenings is available to everyone. Each evening, each agent makes his/her decision independently of the others. Each agent's objective is to arrive at the highest possible ranked restaurant while avoiding the crowd, so that he or she gets dinner there. Because of fluctuations (in avoiding herding behavior), more than one agent may choose the same restaurant; all of them except the one randomly chosen by the restaurant then miss dinner that evening, and they are likely to change their strategy for choosing restaurants the next evening. As can easily be seen, no arrangement of the agent distribution among the restaurants can satisfy everybody on any evening, and the dynamics of optimal choice continues forever. On a collective level, we look for the fraction ($f$) of customers getting dinner on any evening and also its distribution for various strategies of the game. It might be interesting to note here that for KPR, most strategies give a low average (over evenings) value of resource utilization (average fraction $\bar{f}$ well below 1), whereas a simple dictated solution, in which each agent goes to a distinct restaurant and shifts to the next ranked restaurant (with periodic boundary) on successive evenings, gives full utilization ($f = \bar{f} = 1$). Also, each one then gets in turn to the best ranked restaurant (with periodicity N). The process starts from the first evening itself.
It is hard to find a strategy in KPR, in which each agent decides independently (democratically) based on past experience and information, that achieves such full utilization even after a long learning time.

Let the strategy chosen by each agent in the KPR game be such that, at any time $t$, the probability $p_k(t)$ of arriving at the $k$-th ranked restaurant is given by

$$p_k(t) = \frac{1}{z}\left[k^{\alpha}\exp\left(-\frac{n_k(t-1)}{T}\right)\right], \qquad z = \sum_{k=1}^{N} k^{\alpha}\exp\left(-\frac{n_k(t-1)}{T}\right), \qquad (1)$$

where $n_k(t-1)$ gives the number of agents arriving at the $k$-th ranked restaurant on the previous evening (time $t-1$), $T$ is a noise scaling factor and $\alpha$ is an exponent. Here, for $\alpha > 0$ and $T > 0$, the probability for any agent to choose a particular restaurant increases with its rank $k$ and decreases with the past popularity of the same restaurant (given by the number $n_k(t-1)$ of agents arriving at that restaurant on the previous evening). For $\alpha = 0$ and $T \to \infty$, $p_k(t) = 1/N$, which corresponds to the random-choice case (independent of rank). For $\alpha = 0$ and $T \to 0$, the agents avoid the restaurants visited on the previous evening and choose randomly among the rest. For $\alpha = 1$ and $T \to \infty$, the game corresponds to a strictly rank-dependent choice. We concentrate on these three special limits.

II. NUMERICAL ANALYSIS

A. Random-choice
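A minimal NumPy sketch of the game defined by Eq. (1), run here in the random-choice limit ($\alpha = 0$, $T \to \infty$), may help fix ideas. It is an illustration under stated assumptions, not the authors' simulation code: the helper names (`choice_probs`, `play_evening`, `simulate`), the empty initial history $n_k(0) = 0$, the use of `T = np.inf` for the $T \to \infty$ limit, and the default number of evenings are all hypothetical choices. Array index $k-1$ corresponds to rank $k$, and the utilization fraction $f$ is the number of restaurants with at least one arrival divided by $N$, since each such restaurant serves exactly one agent.

```python
import numpy as np


def choice_probs(n_prev: np.ndarray, alpha: float, T: float) -> np.ndarray:
    """Eq. (1): p_k proportional to k**alpha * exp(-n_k(t-1) / T), k = 1..N.

    T = np.inf stands for the T -> infinity limit (past crowd ignored);
    T must be positive, so T -> 0 is approximated by a small T.
    """
    k = np.arange(1, len(n_prev) + 1, dtype=float)
    log_w = alpha * np.log(k)
    if not np.isinf(T):
        log_w = log_w - np.asarray(n_prev, dtype=float) / T
    log_w -= log_w.max()          # shift before exponentiating to avoid overflow
    w = np.exp(log_w)
    return w / w.sum()


def play_evening(p: np.ndarray, rng: np.random.Generator):
    """All N agents choose independently from p; each restaurant serves one agent."""
    N = len(p)
    choices = rng.choice(N, size=N, p=p)      # index k-1 <-> rank k
    n = np.bincount(choices, minlength=N)     # n_k(t): arrivals per restaurant
    f = np.count_nonzero(n) / N               # fraction of agents who get dinner
    return n, f


def simulate(alpha: float, T: float, N: int = 1000, evenings: int = 1000,
             seed: int = 0) -> np.ndarray:
    """Return the utilization fraction f for each evening."""
    rng = np.random.default_rng(seed)
    n = np.zeros(N)                           # assumed empty history, n_k(0) = 0
    fs = np.empty(evenings)
    for t in range(evenings):
        n, fs[t] = play_evening(choice_probs(n, alpha, T), rng)
    return fs


fs = simulate(alpha=0.0, T=np.inf)            # random-choice limit
print(f"mean f = {fs.mean():.3f}, std f = {fs.std():.3f}")
```

Other limits follow by changing the arguments, e.g. `simulate(alpha=1.0, T=np.inf)` for strict rank dependence, or `alpha=0.0` with a small positive `T` such as `1e-3` to approximate the avoiding-past-crowd limit; `T = 0` itself is not handled by this sketch.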
For the case where $\alpha = 0$ and $T \to \infty$, the probability $p_k(t)$ becomes independent of $k$ and equal to $1/N$. For the simulation we take 1000 restaurants and 1000 agents, and on each evening $t$ an agent selects any restaurant with equal probability $p = 1/N$. All averages have been made over 10 time steps. We study the probability distribution $D(f)$ of the fraction $f$ of agents getting dinner. The numerical analysis shows that the mean and mode of the distribution occur around $f \simeq 0.63$ and that the distribution $D(f)$ is Gaussian around that value (see Fig. 1).

[Fig. 1 plot: frequency distribution D(f) versus fraction of people getting dinner (f), for random choice, strict-rank-dependent choice and avoiding-past-crowd choice.]

FIG. 1: Numerical simulation results for the distribution $D(f)$ of the fraction $f$ of people getting dinner on any evening (or fraction of restaurants occupied on any evening) against $f$ for different limits of $\alpha$ and $T$. All the simulations have been done for $N = 1000$ (number of restaurants and agents), and the statistics have been obtained after averaging over 10 time steps (evenings) after stabilization.

B. Strict-rank-dependent choice
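For the strict-rank-dependent limit ($\alpha = 1$, $T \to \infty$) the past crowd drops out of Eq. (1), so successive evenings are independent and $p_k = k/\sum_k k$ can be sampled directly. The short NumPy sketch below does this; the function names, seed and number of evenings are hypothetical choices, not taken from the original study. Because the choices are independent, the expected utilization can also be written exactly as $\frac{1}{N}\sum_k [1-(1-p_k)^N]$, which the sketch evaluates as a cross-check on the sampled mean.

```python
import numpy as np


def rank_probs(N: int) -> np.ndarray:
    """p_k = k / sum_k k for ranks k = 1..N (alpha = 1, T -> infinity in Eq. (1))."""
    k = np.arange(1, N + 1, dtype=float)
    return k / k.sum()


def strict_rank_fractions(N: int = 1000, evenings: int = 1000, seed: int = 0) -> np.ndarray:
    """Sampled utilization fraction f on each evening."""
    rng = np.random.default_rng(seed)
    p = rank_probs(N)
    fs = np.empty(evenings)
    for t in range(evenings):
        choices = rng.choice(N, size=N, p=p)            # index k-1 <-> rank k
        fs[t] = np.count_nonzero(np.bincount(choices, minlength=N)) / N
    return fs


def expected_fraction(N: int = 1000) -> float:
    """Exact mean of f: restaurant k is occupied unless all N agents miss it."""
    p = rank_probs(N)
    return float(np.mean(1.0 - (1.0 - p) ** N))


fs = strict_rank_fractions()
print(f"sampled mean f = {fs.mean():.3f}, exact mean = {expected_fraction():.3f}")
```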
For $\alpha = 1$ and $T \to \infty$, $p_k(t) = k/z$ with $z = \sum_{k=1}^{N} k$. In this case, each agent chooses a restaurant of rank $k$ with a probability strictly proportional to its rank $k$. Here also we take 1000 agents and 1000 restaurants and average over 10 time steps to obtain the statistics. Fig. 1 shows that $D(f)$ is again Gaussian and that its maximum occurs at $f \simeq 0.58 \equiv \bar{f}$.

C. Avoiding-past-crowd choice
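In the avoiding-past-crowd limit ($\alpha = 0$, $T \to 0$) each agent picks uniformly among the $N'$ restaurants left vacant on the previous evening. The NumPy sketch below implements this limiting rule directly rather than evaluating Eq. (1) at small $T$. Its assumptions are hypothetical and flagged in the comments: all $N$ restaurants count as vacant on the first evening, a uniform choice over all restaurants is used if none was vacant (a case that should essentially never arise for large $N$), and a short burn-in is discarded before averaging, in the spirit of the averaging after stabilization mentioned in the Fig. 1 caption.

```python
import numpy as np


def avoid_past_crowd_fractions(N: int = 1000, evenings: int = 1000,
                               burn_in: int = 10, seed: int = 0) -> np.ndarray:
    """Utilization fraction f per evening when agents avoid last evening's crowd."""
    rng = np.random.default_rng(seed)
    vacant = np.arange(N)                  # assumption: evening 0 has no history
    fs = np.empty(evenings)
    for t in range(evenings):
        # Each of the N agents picks uniformly among yesterday's vacant restaurants.
        choices = rng.choice(vacant, size=N)
        occupied = np.zeros(N, dtype=bool)
        occupied[choices] = True
        fs[t] = occupied.mean()
        vacant = np.flatnonzero(~occupied)
        if vacant.size == 0:               # hypothetical fallback: choose among all
            vacant = np.arange(N)
    return fs[burn_in:]                    # drop the initial transient


fs = avoid_past_crowd_fractions()
print(f"mean f = {fs.mean():.3f}, std f = {fs.std():.3f}")
```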
In this case an agent chooses randomly among those restaurants which went vacant on the previous evening: with probability $p_k(t) = \exp(-n_k(t-1)/T)/z$, where $z = \sum_k \exp(-n_k(t-1)/T)$, and $T \to 0$, one gets $p_k \to 0$ for those $k$ for which $n_k(t-1) > 0$, and $p_k = 1/N'$ for the other values of $k$, where $N'$ is the number of restaurants vacant at time $t-1$.

[Fig. 2 plot: average fraction of utilization ($\bar{f}$) versus $T$, for different values of $\alpha$.]

FIG. 2: Numerical simulation results for the average resource utilization fraction ($\bar{f}$) against the noise parameter $T$ for different values of $\alpha$.

For the numerical studies we again take $N = 1000$ and average the statistics over 10 time steps. In Fig. 1, the Gaussian distribution $D(f)$ of the restaurant utilization fraction $f$ is shown. The average utilization fraction $\bar{f}$ is seen to be around 0.46.

III. ANALYTICAL RESULTS

A. Random-choice case
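The Poisson limit invoked in this subsection can be checked numerically for a finite system. The sketch below, using only Python's standard `math` module, compares the binomial probability that a given restaurant receives $m$ of the $\lambda N$ agents with its Poisson limit, and evaluates $\bar{f} = 1 - D(0)$; the function names and the choice $N = 1000$ are illustrative.

```python
from math import comb, exp, factorial


def binomial_D(m: int, lam: float, N: int) -> float:
    """Probability that one restaurant is chosen by m of the lam*N agents (p = 1/N)."""
    agents = round(lam * N)
    p = 1.0 / N
    return comb(agents, m) * p**m * (1 - p) ** (agents - m)


def poisson_D(m: int, lam: float) -> float:
    """Large-N limit: lam**m * exp(-lam) / m!."""
    return lam**m * exp(-lam) / factorial(m)


N, lam = 1000, 1.0
for m in range(4):
    print(m, round(binomial_D(m, lam, N), 4), round(poisson_D(m, lam), 4))
f_bar = 1 - poisson_D(0, lam)   # fraction of restaurants with at least one agent
print(f"f_bar = {f_bar:.3f}")
```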
Suppose there are $\lambda N$ agents and $N$ restaurants. An agent can select any restaurant with equal probability. Therefore, the probability that a single restaurant is chosen by $m$ agents is given by a Poisson distribution in the limit $N \to \infty$:

$$D(m) = \binom{\lambda N}{m} p^m (1-p)^{\lambda N - m}, \quad p = \frac{1}{N}; \qquad D(m) = \frac{\lambda^m}{m!}\exp(-\lambda) \quad \text{as } N \to \infty. \qquad (2)$$

Therefore the fraction of restaurants not chosen by any agent is given by $D(m=0) = \exp(-\lambda)$, which implies that the average fraction of restaurants occupied on any evening is given by [1]

$$\bar{f} = 1 - \exp(-\lambda) \simeq 0.63 \quad \text{for } \lambda = 1, \qquad (3)$$

in the KPR problem. The distribution of the utilization fraction will be Gaussian around this average.

B. Strict-rank-dependent choice
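The pair-based estimate of this subsection, Eqs. (4) to (7), can be evaluated for finite $N$ with a few lines of plain Python. Within a pair of ranks $k$ and $N+1-k$, an agent picks rank $k$ with conditional probability $k/(N+1)$; two agents then occupy one restaurant if they coincide and two otherwise, which is what the per-pair term below encodes. The function name and $N = 1000$ are illustrative assumptions.

```python
from math import exp


def pair_estimate(N: int = 1000):
    """Return (f1, f2, f_bar) from the pair argument for p_k = k / sum_k k."""
    f1 = 1 - exp(-2.0)                  # pairs hit by >= 1 agent, lambda = 2
    total = 0.0
    for k in range(1, N // 2 + 1):
        x = k / (N + 1)                 # P(agent picks rank k | it picked this pair)
        same = x**2 + (1 - x) ** 2      # both agents in one restaurant
        different = 2 * x * (1 - x)     # agents in different restaurants
        total += 1 * same + 2 * different   # expected occupied restaurants, E_k
    f2 = total / N
    return f1, f2, f1 * f2


f1, f2, f_bar = pair_estimate()
print(f"f1 = {f1:.3f}, f2 = {f2:.3f}, f_bar = {f_bar:.3f}")
```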
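In this case, an agent goes to the $k$-th ranked restaurant with probability $p_k(t) = k/\sum_k k$; that is, $p_k(t)$ is given by Eq. (1) in the limit $\alpha = 1$, $T \to \infty$. Starting with $N$ restaurants and $N$ agents, we make $N/2$ pairs of restaurants, each pair having ranks $k$ and $N+1-k$, where $1 \le k \le N/2$. Since $p_k + p_{N+1-k} = 2/N$ for every pair, an agent chooses any pair of restaurants with uniform probability $p = 2/N$; equivalently, $N$ agents choose randomly among $N/2$ pairs, so that $\lambda = 2$. Hence, from Eq. (3), the fraction of pairs visited by at least one agent is

$$f_1 = 1 - \exp(-\lambda) \simeq 0.86 \quad \text{for } \lambda = 2. \qquad (4)$$

Also, the expected number of restaurants occupied in a pair of restaurants with ranks $k$ and $N+1-k$ by a pair of agents is

$$E_k = 1 \times \left(\frac{k}{N+1}\right)^2 + 1 \times \left(\frac{N+1-k}{N+1}\right)^2 + 2 \times \frac{2k(N+1-k)}{(N+1)^2}, \qquad (5)$$

since both agents land in the same restaurant (one occupied) with probability $[k/(N+1)]^2 + [(N+1-k)/(N+1)]^2$, and in different restaurants (two occupied) with probability $2k(N+1-k)/(N+1)^2$. Therefore, the fraction of restaurants occupied by pairs of agents is

$$f_2 = \frac{1}{N}\sum_{k=1}^{N/2} E_k \simeq 0.67. \qquad (6)$$

Hence, the actual fraction of restaurants occupied by the agents is

$$\bar{f} = f_1 f_2 \simeq 0.58. \qquad (7)$$

Again, this compares well with the numerically observed most probable position of the distribution (see Figs. 1 and 2).

C. Avoiding-past-crowd choice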
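The self-consistency condition Eq. (8) of this subsection has no closed-form solution, but its left side minus $\bar{f}$ decreases strictly on $0 \le \bar{f} < 1$, so simple bisection finds the unique root. The sketch below is one such numerical route (the text does not say how Eq. (8) was solved); the bracket $[0, 0.9]$ and the tolerance are illustrative choices.

```python
from math import exp


def residual(f: float) -> float:
    """Left side of Eq. (8) minus f_bar: (1 - f)(1 - exp(-1/(1 - f))) - f."""
    u = 1.0 - f                         # fraction of restaurants left to choose from
    return u * (1.0 - exp(-1.0 / u)) - f


def solve_eq8(lo: float = 0.0, hi: float = 0.9, tol: float = 1e-12) -> float:
    """Bisection; residual(lo) > 0 > residual(hi) brackets the unique root."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


f_bar = solve_eq8()
print(f"f_bar = {f_bar:.4f}")
```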
We consider here the case where each agent chooses on any evening ($t$) randomly among the restaurants in which nobody had gone on the previous evening ($t-1$). This corresponds to the limit $\alpha = 0$, $T \to 0$ of Eq. (1). Our numerical simulations show that the distribution $D(f)$ of the fraction $f$ of utilized restaurants is again Gaussian, with a most probable peak at $\bar{f} \simeq 0.46$ (see Figs. 1 and 2). This can be explained in the following way. As the fraction $\bar{f}$ of restaurants visited by the agents on the previous evening is avoided by the agents this evening, the number of available restaurants is $N(1-\bar{f})$ for this evening, and these are chosen randomly by all $N$ agents. Hence, when compared with Eq. (2), $\lambda = 1/(1-\bar{f})$. Therefore, following Eqs. (2) and (3), the occupied fraction $(1-\bar{f})[1-\exp(-\lambda)]$ of all restaurants must equal $\bar{f}$ in the steady state, which gives

$$(1-\bar{f})\left[1 - \exp\left(-\frac{1}{1-\bar{f}}\right)\right] = \bar{f}. \qquad (8)$$

The solution of this equation gives $\bar{f} \simeq 0.46$. This result agrees well with the numerical results for this limit (see Figs. 1 and 2; $\alpha = 0$, $T \to 0$).

IV. SUMMARY AND DISCUSSION
We consider here a game where $N$ agents (prospective customers) attempt to choose every evening ($t$) from $N$ equally priced restaurants (hence no budget consideration is important for any individual agent), each capable of serving only one customer and having a well-defined rank $k$ ($= 1, \ldots, N$) agreed upon by all the agents. The decisions on every evening ($t$) are made by each agent independently, based on the information about the ranks $k$ of the restaurants and their past popularity, given in general by $n_k(t-1), \ldots, n_k(0)$. We consider here cases where each agent chooses the $k$-th ranked restaurant with probability $p_k(t)$ given by Eq. (1). The utilization fraction $f$ of the restaurants on every evening is studied, and the distributions $D(f)$ are shown in Fig. 1 for some special cases. From numerical studies, we find these distributions to be Gaussian, with the most probable utilization fraction $\bar{f} \simeq 0.63$, 0.58 and 0.46 for the cases $\alpha = 0$, $T \to \infty$; $\alpha = 1$, $T \to \infty$; and $\alpha = 0$, $T \to 0$, respectively. Analytic estimates for $\bar{f}$ in these limits are also given, and they agree very well with the numerical observations.

The KPR problem (see also the Kolkata Restaurant Problem [2]) has, in principle, a ‘trivial’ solution (dictated from outside) in which each agent gets into a distinct restaurant (full utilization with $f = 1$) starting on the first evening, and also gets the best possible sharing of the ranks when each one shifts to the next ranked restaurant (with periodic boundary) on successive evenings. However, this can be extremely difficult to achieve in the KPR game, even after a long trial time, when each agent decides in parallel (or democratically) on his/her own, based on past experience and information regarding the history of the entire system of agents and restaurants. The problem becomes truly difficult in the $N \to \infty$ limit. The KPR problem is similar to the Minority Game problem [3, 4], as in both games herding behavior is punished and diversity is encouraged. Also, both involve learning of the agents from past successes etc. Of course, KPR has some simple exact solution limits, a few of which are discussed here. In none of the cases considered here are the learning strategies individualistic; rather, all the agents choose following the probability given by Eq. (1). In a few different limits of such a learning strategy, the average utilization fraction $\bar{f}$ and its distributions are obtained and compared with the analytic estimates, which are reasonably close. Needless to mention, the real challenge is to design algorithms for learning mixed strategies (e.g., from the pool discussed here) by the agents, so that the simple ‘dictated’ solution emerges eventually even when everyone decides independently on the basis of his/her own information.

Acknowledgment:
We are grateful to Arnab Chatterjee and Manipuspak Mitra for their important comments and suggestions.

[1] A. S. Chakrabarti, B. K. Chakrabarti, A. Chatterjee, M. Mitra, The Kolkata Paise Restaurant problem and resource utilization, Physica A (2009) 2420-2426.
[2] B. K. Chakrabarti, Kolkata Restaurant Problem as a generalised El Farol Bar Problem, in Econophysics of Markets and Business Networks, Eds. A. Chatterjee and B. K. Chakrabarti, New Economic Windows Series, Springer, Milan (2007), pp. 239-246.
[3] D. Challet, M. Marsili, Y.-C. Zhang, Minority Games: Interacting Agents in Financial Markets, Oxford University Press, Oxford (2005).
[4] D. Challet,