DKPRG or how to succeed in the Kolkata Paise Restaurant game via TSP


arXiv [econ.TH]

Kalliopi Kastampolidou, Christos Papalitsas and Theodore Andronikos
Ionian University, Department of Informatics, 7 Tsirigoti Square, Corfu, Greece
Emails: {c17kast, c14papa, andronikos}@ionio.gr
January 20, 2021

Abstract

The Kolkata Paise Restaurant Problem is a challenging game, in which n agents must decide where to have lunch during their lunch break. The game is very interesting because there are exactly n restaurants and each restaurant can accommodate only one agent. If two or more agents happen to choose the same restaurant, only one gets served and the others have to return to work hungry. In this paper we tackle this problem from an entirely new angle. We abolish certain implicit assumptions, which allows us to propose a novel strategy that results in greater utilization for the restaurants. We emphasize the spatially distributed nature of our approach, which, for the first time, perceives the locations of the restaurants as uniformly distributed in the entire city area. This critical change in perspective has profound ramifications in the topological layout of the restaurants, which now makes it completely realistic to assume that every agent has a second chance. Every agent may now visit, in case of failure, more than one restaurant, within the predefined time constraints. From the point of view of each agent, the situation now resembles more that of the iconic travelling salesman, who must compute an optimal route through n cities. Following this shift in paradigm, we advocate the use of metaheuristics. This is because exact solutions of the TSP are prohibitively expensive, whereas metaheuristics produce near-optimal solutions in a short amount of time. Thus, via metaheuristics each agent can compute her own personalized solution, incorporating her preferences, and providing alternative destinations in case of successive failures. We analyze rigorously the resulting situation, proving probabilistic formulas that confirm the advantages of this policy and the increase in utilization. The detailed mathematical analysis of our scheme demonstrates that it can achieve utilization ranging from 0.85 to 0.95 from the first day, while rapidly attaining steady state utilization 1.0. Finally, we note that the equations we derive generalize formulas that were previously presented in the literature, which can be seen as special cases of our results.

Keywords: Game theory, Kolkata Paise Restaurant Problem, TSP, metaheuristics, optimization, probabilistic analysis.

The El Farol Bar problem is a well-established problem in Game Theory. It was William Brian Arthur who introduced the El Farol Bar problem in Inductive Reasoning and Bounded Rationality [1]. It can be described as follows: N people, the players, need to decide simultaneously but independently whether they will visit tonight a bar that offers live music. In order to have an enjoyable night the bar must not be too crowded. No potential visitor knows the attendance of a given night in advance, so the visitor must predict it and decide whether she wants to go to the bar or stay home. Although the players decide using previous knowledge, their choice is not affected by previous visits and they cannot communicate with each other [2]. In the El Farol Bar problem, the number of choices n is equal to 2, so the players have to choose between staying home or going out.

The Kolkata Paise Restaurant Problem, as well as the Minority Game, are variants of the El Farol Bar problem. The Minority Game was first introduced in 1997 by Damien Challet and Yi-Cheng Zhang [3]. They developed the mathematical formulation of the El Farol Bar which they named Minority Game. This game has an odd number N of agents and at each stage of the game they decide whether they will go to the bar or stay home. The minority wins and the majority loses. Agents have to decide whether they want to go to the bar or not, regardless of the predictions for the attendance size. The Minority Game is a binary symmetric version of the El Farol Bar problem, with the symmetry relying on the fact that the bar can contain half of the players.

The Kolkata Paise Restaurant Problem (KPRP for short) is a repeated game that was named after the city Kolkata in India. In KPRP there are n cheap restaurants (Paise Restaurants) and N laborers who choose among these places for their quick lunch break. If the restaurant they go to is crowded, they have to return to work hungry, since they do not have time to visit another restaurant, or lack the resources needed to travel to another area. This generalization of the El Farol Bar is described as follows: each of a large number N of laborers has to choose between a large number n of restaurants, where usually N = n. In order for every player to win, that is to eat lunch, only one player should go to each restaurant. If more than one player attends the same restaurant at the same time, an agent is chosen randomly and only this agent is served. The player who gets to eat has a payoff equal to 1, whereas all others who also chose this restaurant have a payoff equal to 0. Each agent prefers to go to an unoccupied restaurant rather than visit a restaurant where there are other agents as well. This realization in turn implies that the pure strategy Nash equilibria of the stage game are Pareto efficient. Consequently, there are exactly n! pure strategy Nash equilibria for the stage game.
This, combined with the rationality of the players, leads to the conclusion that it is possible to sustain a pure strategy Nash equilibrium of the stage game as a sub-game perfect equilibrium of the KPRP.

In [4] each agent has a rational preference over the restaurants and, despite the fact that the first restaurant is the most preferred, all agents prefer to be served even at their least preferable restaurant than not to be served at all. The prices are considered to be identical and each restaurant is allowed to serve only one agent. If more than one laborer attends the same restaurant, one laborer is chosen randomly, while the others remain starved for that day. The Kolkata Paise Restaurant problem is symmetric, given the preferences of the agents over the set of restaurants. The game is non-trivial because there is a hierarchy among the restaurants, with the first being the most preferable. Another approach stipulates that if multiple agents choose the same restaurant they have to share the same meal and as a result, none of them is happy. The choice of each player is secret and they have to choose simultaneously. The players choose their strategy based on the payoffs. It is assumed that the restaurants charge the same price for their meals. There is even a version where some restaurants offer much tastier meals than others. This game is a repeated game with a period of one day, and the choices of each player are known to the other players at the end of the day. The agents have their personal strategy as to where they intend to have lunch. In order to attain the optimal solution, the agents have to communicate and coordinate their actions, something which is forbidden. As a result, some agents may end up hungry and, at the same time, some restaurants may waste their food.

Some authors study the case where the number of restaurants n is small and the agents take coordinated actions.
Then, they analyze the game as a sub-game of KPRP and estimate the possibility to preserve the cyclically fair norm. As a result, punishment schemes need to be designed in this case. Every evening the agent makes up her mind based on her past experiences and the available information about each restaurant, which is supposed to be known to every agent. Each agent decides on her own, with no interaction with the other players. If more than one customer arrives at the same restaurant, an agent is randomly chosen to eat and the rest have to starve. There is a ranking system among the restaurants shared by the customers. The n! Pareto efficient states can be achieved when all customers get served. The probability of this event is very low, due to the absence of cooperation and disclosure among the agents.

In discrete optimization problems, the variables take discrete values and, usually, the objective is to find a graph, or another similar structure, from a finite or infinite set [5]. The Travelling Salesman Problem (TSP) is a famous optimization problem described as follows: a salesman has to visit all the nearby cities starting from a specific city to which the salesman must return [6]. The only constraint is that the salesman must start and finish at the same specific city and visit each city only once. The visiting order is to be determined by the salesman each time the problem arises. The cities are connected through railway or roads and the cost of each travel is modeled by the difficulty in traversing the edges of the graph. The salesman has just one purpose and that is to visit all the cities with the minimum possible travel cost. In this problem, the optimum solution is the fastest, shortest and cheapest solution. TSP is easily expressed as a mathematical problem that typically assumes the form of a graph, whose nodes are the cities that the salesman has to visit.
TSP was formulated during the 1800s by Sir William Rowan Hamilton and Thomas Kirkman and it was first studied by Karl Menger during the 1930s at Harvard and Vienna [6]. The purpose of TSP is for the salesman to determine the route with the lowest possible cost. Some of the typical applications of TSP are network optimization and hardware identification problems. It has kept researchers busy for decades and many solutions have emerged. TSP is an NP-hard problem and the results of the practical, heuristic solutions are not always optimal, but approximate [6]. The simplest “naive” solution to this problem is, of course, to try all possibilities and explore all paths, but the cost in time and complexity is so huge that it is practically impossible. In order to overcome that, when solving a TSP the pragmatic focus is a near-optimal route, instead of always the best. For the graph depicted in Figure 1 the optimal tour is 1 → → → →

The Kolkata Paise Restaurant Problem (KPRP) was initially introduced in an earlier form in 2007 [7]. Its current formulation appeared in 2009 in [8] and [9]. Subsequently, many creative ideas and different lines of thought have been published and even a quantum version of the game has arisen. In [8], the importance of diversity is emphasized while herd behavior is penalized. Furthermore, the differences between the KPRP and the Minority Game are highlighted. One major difference is that in the KPRP the emphasis is placed on the simultaneous move many choice problem, in contrast to the Minority Game, which studies a simultaneous move two choice problem. Another important difference is the existence of a ranking system in the KPRP, but not in the Minority Game. Some of the strategies developed for the KPRP are discussed in [10], which also discusses problems where these strategies can be successfully applied. Ghosh et al. in [11] present a dictator’s, or a social planner’s as they call it, solution.
In this solution the agents form a queue and the planner assigns each of them to a ranked restaurant depending on the queue of the first evening. The following evening the agents go to the next ranked restaurant and the last in the queue goes to the first ranked restaurant. This solution is called the fair social norm. In real life, each agent decides in parallel or democratically every evening, so this solution may be considered somewhat unrealistic. However, the parallel decision or democratic decision strategy is not as efficient as the dictated one, with the latter leading to one of the best solutions to this problem. Banerjee et al. in [4] offer a generalization of the problem in such a way that the cyclically fair norm is sustained. Each strategy is viewed as a sub-game perfect equilibrium of the KPRP. In 2013, Ghosh et al. published an article about stochastic optimization strategies in the Minority Game and the KPRP [12]. There, they point out that a stochastic crowd avoiding strategy results in an efficient utilization in the KPRP. Reinforcement learning was first introduced in the KPRP by [13], together with six revision protocols aiming at efficient resource utilization. These protocols combine local information with reinforcement learning. Each revision protocol has two variants depending on whether or not customers who were once served by a restaurant remain loyal to that restaurant in all subsequent periods. Some of these protocols were experimentally tested and shown to improve the utilization rate. Another generalization was introduced by Yang et al. in [14], aiming at dynamic markets this time. They studied what happens when agents can either divert to another district or stay in the current one. Each agent may replace another agent with no prior knowledge of the game, following a Poisson distribution. Agarwal et al. in [15] showed that the KPRP can be reduced to a Majority Game.
In the latter, capacity is not restricted and agents aim at choosing with the herd. If more than one agent chooses the same option, the utility decreases (see also [16] and [17]). Abergel et al. in [18] applied the KPRP in hospitals and beds. The local patients choose among the local hospitals those with the best ranking and compete with the other patients. If the patients are not treated in time it is a clear case of social waste of service for the rest of the hospitals. A brief presentation of the KPRP was given by Sharma et al. in [19], which included the origin and an overview of the game, strategies that may arise, several extensions and its applications in a variety of phenomena. The authors also presented an experimental analysis. Park et al. in [20] introduced the KPRP in the Internet of Things (IoT) and IoT devices. They used a KPRP approach to develop a scheme for these devices, because it allowed them to model situations where multiple resources are shared among multiple users, each with individual preferences. In [21], Sinha et al. propose a phase transition behavior, where if two or more agents visit the same restaurant, one is randomly picked to eat. The agents evolve their strategy based on the publicly available information about past choices in order for each of them to reach the best minority choice. In the same paper, they also develop two strategies for crowd-avoiding.

A significant trend, which has been quite evident in the last two decades, is to enhance classical games using unconventional means. The most prominent direction is to cast a classical game in a quantum setting. Since the pioneering works of Meyer [22] and Eisert et al. [23], quantum versions for a plethora of well-known classical games have been studied in the literature.
Starting from the most famous of all games, the Prisoners’ Dilemma [23], [24], [25], [26], many researchers have sought to achieve better solutions by employing quantumness (see the recent [27] and [26] and references therein), or other tools, such as automata ([28]). It is not surprising that unconventional approaches to classical games are undertaken, because they promise clear advantages over the classical ones. Another line of research is to turn to biological systems for inspiration. The Prisoners’ Dilemma features prominently in this setting also (see [29] for a brief survey), but in reality most game situations can easily find analogues in biological and bio-inspired processes [30] and [31]. A quantum version of the KPRP was proposed in [32], where the quantum Minority Game was expanded to a multiple choice version. The agents cannot communicate with each other and have to choose among m choices, but an agent wins if she makes a unique choice. Higher payoffs than the classical version were observed due to shared entanglement and quantum operations. In Sharif’s [33] review, quantum protocols for quantum games were introduced, including a protocol for a three-player quantum version of the KPRP. In [34] the authors study the effect of quantum decoherence in a three-player quantum KPRP using tripartite entangled qutrit states. They observe that in the case of maximum decoherence the influence of the amplitude damping channel dominates over depolarizing and flipping channels. Furthermore, the Nash equilibrium of the problem does not change under decoherence.

The Travelling Salesman Problem is a well-known combinatorial optimization problem. In this problem a salesman must compute a route that begins from a particular node (the starting location), passes through all other nodes only once before returning to the starting location, and has the minimum cost. The first appearance of the term “Travelling Salesman Problem” probably occurred between 1931 and 1932.
The core of the TSP, however, was first mentioned over a century before, in an 1832 German book [35]. The mathematical formulation was introduced by Hamilton and Kirkman [35] and is typically expressed as follows. A cycle in a graph is a path that begins and ends at the same node. A Hamiltonian cycle is a cycle that contains all the vertices of the graph, passing through each of the other nodes exactly once. The Travelling Salesman Problem amounts to figuring out the cheapest way to visit every city and return to the starting point. Research efforts on TSP and closely related problems include Ascheuer et al. [36], who addressed the asymmetric TSP-TW using more than three alternative integer programming formulations and more than ten neighborhood structures. Gutin and Punnen [37] studied the effect of sorting-based initialization procedures. The authors claimed that understanding the algorithmic behavior is the best way to find solutions, since this would help in determining the best solution out of those available. Jones and Adamatzky [38] showed experimentally that using a sorting function within their algorithm was not functional and failed to return a feasible solution in some cases.

The difficulty in tackling the TSP motivated researchers to explore other avenues. One such notable and particularly promising approach is based on metaheuristics. A metaheuristic is a high-level heuristic that is designed to recognize, build, or select a lower-level heuristic (such as a local search algorithm) that can provide a fairly good solution, particularly with missing or incomplete information or with limited computing capacity [39]. The term “metaheuristics” was coined by Glover. Metaheuristics can be used for a wide range of problems. Of course, it must be noted that metaheuristic procedures, in contrast to exact methods, do not guarantee a global optimal solution [40]. Papalitsas et al. [41] designed a metaheuristic based on VNS for the TSP with emphasis on Time Windows. Another quantum-inspired method, based on the original General Variable Neighborhood Search (GVNS), was proposed in order to solve the standard TSP [42]. This quantum-inspired procedure was also applied successfully to the solution of real-life problems that can be modeled as TSP instances [43]. A quantum-inspired procedure for solving the TSP with Time Windows was also presented in [44].
More recently, [45] applied a quantum-inspired metaheuristic for tackling the practical problem of garbage collection with time windows that produced particularly promising experimental results, as further comparative analysis demonstrated in [46]. A thorough statistical and computational analysis on asymmetric, symmetric, and national TSP benchmarks from the well known TSPLIB benchmark library was conducted in [47]. Very recently, Papalitsas et al. parameterized the TSPTW into the QUBO (Quadratic Unconstrained Binary Optimization) model [48]. The QUBO formulation enables TSPTW to run on a Quantum Annealer and is a critical step towards the ultimate goal of running the TSPTW with pure quantum optimization methods. Stochastic optimization can be implemented through several metaheuristic processes. The solution generated depends on the set of created random variables [39]. Metaheuristic processes may find successful solutions with less computational effort than exact algorithms, iterative methods or basic heuristic procedures by looking for a wide variety of feasible solutions [40]. Hence, metaheuristic procedures are extremely useful and practical approaches for optimization because they typically produce good solutions in a small amount of time. For example, a problem instance with thousands of nodes can be run for 30-40 seconds and produce a solution with 3-5% deviation from the optimal. This deviation depends on the local search procedures implemented inside the main part of the algorithm. An efficient design and choice of those improvement heuristics will define the deviation from the optimal solution. In view of the small amount of time they require and of the good quality of the solution they produce, we advocate their functional use in the Distributed Kolkata Paise Restaurant game.
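To illustrate the kind of improvement heuristic mentioned above, the following is a minimal sketch of a nearest-neighbour construction followed by 2-opt local search on a Euclidean instance. It is not the GVNS or the quantum-inspired procedures of [41]-[48]; the function names and the random test instance are assumptions made only for this example.

```python
import math
import random

def tour_length(points, tour):
    """Total Euclidean length of the closed tour (returns to the start)."""
    n = len(tour)
    return sum(math.dist(points[tour[k]], points[tour[(k + 1) % n]])
               for k in range(n))

def nearest_neighbour_tour(points, start=0):
    """Greedy construction: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: math.dist(points[last], points[j]))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour

def two_opt(points, tour):
    """Local search: reverse a segment whenever that shortens the tour."""
    best = tour[:]
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):        # position 0 (start) stays fixed
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:  # accept strict improvements only
                    best, best_len = candidate, cand_len
                    improved = True
    return best

if __name__ == "__main__":
    rng = random.Random(0)                       # hypothetical random instance
    cities = [(rng.random(), rng.random()) for _ in range(30)]
    initial = nearest_neighbour_tour(cities)
    refined = two_opt(cities, initial)
    print(tour_length(cities, initial), tour_length(cities, refined))
```

The local search never lengthens the tour it starts from, but, like any heuristic of this kind, it offers no guarantee of reaching the global optimum.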

Let us now briefly summarize the contributions of this paper.

• We study the Kolkata Paise Restaurant Problem from an entirely new perspective. We identify and state explicitly certain implicit assumptions that are inherent in the standard formulation of the game. We then take the unconventional step to abolish them entirely. This provides the opportunity for an entirely new setting and the adoption of a novel approach that leads to a new and more efficient strategy and, ultimately, to greater utilization for the restaurants.

• For the first time, to the best of our knowledge, we focus on the spatial setting of the game and we propose a more realistic and plausible topological layout for the restaurants. We perceive the restaurants to be uniformly distributed in the entire city area. This rather pragmatic, and in reality more probable, situation has profound ramifications on the topological layout of the game: the restaurants now get closer and, as their number n increases, a standard assumption in the literature, the distances between nearby restaurants decrease. Due to the distribution of the restaurants, the resulting version of the game is aptly named the Distributed Kolkata Paise Restaurant Game.

• Thus, now it is realistic to assume that every agent has a second, a third, maybe even a fourth, chance. Every agent may visit, within the predefined time constraints, more than one restaurant. The agent is no longer a single destination and back traveller. The agent now resembles the iconic travelling salesman, who must pass through a network of cities, visiting every city once, coming back to the starting point, and all the time following the optimal route. This leads to the completely novel idea that each agent faces her own personalized TSP. We emphasize that the situation is specific for each agent, since the resulting network will vary. This is because each agent may have a different starting position and a different preference ranking of the restaurants. Of course, it is practically impossible to compute exact solutions for the TSP, as TSP is a famous NP-hard problem. However, this is a very small setback, as we may use metaheuristics. Metaheuristics can produce near-optimal solutions in a very short amount of time and this makes them indispensable tools of great practical value.

• This entirely new setting is formalized and then rigorously analyzed via probabilistic tools. We derive general formulas that mathematically confirm the advantages of this policy and the increase in utilization. Detailed examples of typical instances of the game are given in a series of Tables and the derived equations are graphically depicted in order to demonstrate their qualitative and quantitative characteristics. Our scheme demonstrably achieves utilization ranging from 0.85 to 0.95 and even beyond from the first day. The steady state utilization, to which the game rapidly converges, is, as expected, 1.

• Finally, let us point out that the equations we derive generalize formulas that were previously presented in the literature, showing that the latter are actually special cases of our results.

1.5 Organization of the paper

The structure of this paper is as follows. In section 1 we provide a comprehensive description of the KPRP and the TSP. In subsection 1.3 we mention some important works that deal with the KPRP and the TSP. The rigorous formulations of the KPRP and the TSP are presented in section 2. In section 3 we give a thorough explanation and presentation of the distributed version of the game, which we call Distributed Kolkata Paise Restaurant Game. We analyze mathematically the topological situation regarding the restaurants in section 4, where the profound ramifications of the hypothesis that they follow the uniform probability distribution are developed. In section 5 we formally prove the main results of the paper, which showcase the advantages of the distributed framework in a definitive manner. Finally, in section 6 we summarize our results and discuss future extensions of this work.

In its most usual formulation, the Kolkata Paise Restaurant Problem is a repeated game with infinite rounds. There is a set of players, typically called agents or customers, that is denoted by $A = \{a_1, \ldots, a_n\}$, a set of restaurants that is denoted by $R = \{r_1, \ldots, r_n\}$, and a utility vector $u = (u_1, \ldots, u_n) \in \mathbb{R}^n$, which is associated with the restaurants and is common to every agent. On any given day, all agents decide to go to one of the $n$ restaurants for lunch. If it happens that just one agent arrives at a specific restaurant, then she will have lunch and she will be happy. If, however, two or more agents choose the same restaurant for lunch, then it is generally assumed that just one of them will eat. The one to eat is chosen randomly. So, in such a case all but one will not be happy. Each agent has a utility and if they have lunch their utility is one, otherwise it is zero. In Chakrabarti et al. [8] the KPRP is modeled as a general one-shot restaurant game, where the set of agents is considered to be finite and the utilities are ranked as follows: $0 < u_n \leq \ldots \leq u_2 \leq u_1$. The set of agents $A$ and the ranking of the utilities can be used to define the game. The latter can be represented as $G(u) = (A, S, Q)$, where $A$ is the set of agents, $S$ is the set of strategies available to all agents, and $Q = (Q_1, \ldots, Q_n)$ stands for the payoff vector. Every day each agent decides at which of the $n$ restaurants she will eat; if the $i$th agent $a_i$ decides to go to the $j$th restaurant $r_j$, then the corresponding strategy is $s_i = j$. Given any strategy combination $s = (s_1, \ldots, s_n) \in S^n$, the associated payoff vector is defined as $Q(s) = (Q_1(s), \ldots, Q_n(s))$, where the payoff $Q_i(s)$ of player $a_i$ is $\frac{u_{s_i}}{N_i(s)}$ and $N_i(s)$ is the total number of players that have made the same choice, i.e., restaurant $r_j$, as player $a_i$, including $a_i$.
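The payoff definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: restaurants are indexed from 0 rather than from 1, and the helper names `payoff_vector` and `utilization` are hypothetical, not taken from the paper. Since one of the $N_i(s)$ agents at a restaurant is served at random, $u_{s_i}/N_i(s)$ can be read as an expected payoff.

```python
from collections import Counter

def payoff_vector(strategy, utilities):
    """Payoffs Q_i(s) = u_{s_i} / N_i(s) for one stage game.

    strategy[i] is the 0-based index of the restaurant chosen by agent i;
    utilities[j] is the common utility u_j of restaurant j.
    """
    crowd = Counter(strategy)          # number of agents at each restaurant
    return [utilities[j] / crowd[j] for j in strategy]

def utilization(strategy, n_restaurants):
    """Fraction of restaurants that serve a customer in this round."""
    return len(set(strategy)) / n_restaurants

if __name__ == "__main__":
    s = [0, 0, 2]                      # agents 0 and 1 collide at restaurant 0
    u = [1.0, 1.0, 1.0]
    print(payoff_vector(s, u))         # [0.5, 0.5, 1.0]
    print(utilization(s, len(u)))      # two of the three restaurants are used
```

In the example, two agents collide at the same restaurant, so each of them has payoff 0.5, one restaurant stays empty, and the utilization of the round is 2/3.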
The strategy combination is in fact the list of restaurants the agents chose to eat at, and their payoff depends on their decision and the number of other agents that have made the same choice. In the literature, a game like KPRP, where there are potentially infinite rounds and in each round the same stage game is played, is referred to as a supergame [49]. A supergame is a situation where the same game is repeatedly played as a one-shot game and the agents count the payoff in the long run of the game. This makes the payoff function more complex due to the repetitions.

The problem of finding the shortest Hamiltonian cycle is closely related to the TSP. The Hamiltonian graph problem, i.e., determining if a graph has a Hamiltonian cycle, is reducible to the traveling salesman problem. The trick is to assign zero length to the graph edges and, at the same time, create a new edge of length one for each missing edge. If the TSP solution for the resulting graph is zero, then there is a Hamiltonian cycle in the original graph; if the TSP solution is a positive number, then there is no Hamiltonian cycle in the original graph (see [50]). In different fields, such as operational research and theoretical computer science, TSP, which is NP-hard, is of great significance. Usually TSP is represented by a graph. The fact that TSP is NP-hard implies that there is no known polynomial-time algorithm for finding an optimal solution regardless of the size of the problem instance [51]. There are two types of models for the TSP, symmetric and asymmetric. The former is represented by a complete undirected graph $G = (V, E)$ and the latter by a complete directed graph $G = (V, A)$. Assuming that $n$ denotes the number of cities (nodes), $V = \{1, 2, 3, \ldots, n\}$ is the set of vertices, $E = \{(i, j) : i, j \in V, \text{ where } i < j\}$ is the set of edges, and $A = \{(i, j) : i, j \in V, \text{ where } i \neq j\}$ is the set of arcs.
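The reduction from the Hamiltonian cycle problem to the TSP described above can be made concrete with a small sketch. The exact solver below simply enumerates all tours, so it is only meant for graphs with a handful of nodes; the function names are hypothetical and nodes are numbered from 0.

```python
from itertools import permutations

def tsp_brute_force(cost):
    """Optimal tour length by enumeration; sensible only for tiny instances."""
    n = len(cost)
    best = None
    for perm in permutations(range(1, n)):       # fix node 0 as the start
        tour = (0,) + perm
        length = sum(cost[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if best is None or length < best:
            best = length
    return best

def has_hamiltonian_cycle(n, edges):
    """Reduction from the text: existing edges cost 0, missing edges cost 1."""
    present = {frozenset(e) for e in edges}
    cost = [[0 if frozenset((i, j)) in present else 1 for j in range(n)]
            for i in range(n)]
    # A zero-length optimal tour uses only edges of the original graph.
    return tsp_brute_force(cost) == 0

if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]    # a 4-cycle
    star = [(0, 1), (0, 2), (0, 3)]              # no cycle at all
    print(has_hamiltonian_cycle(4, square), has_hamiltonian_cycle(4, star))
```

The enumeration visits (n - 1)! orderings, which is exactly why this approach does not scale and why heuristics are used in practice.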
A cost matrix $C = [c_{i,j}]$, which satisfies the triangle inequality $c_{i,j} \leq c_{i,k} + c_{k,j}$ for every $i, j, k$, is defined for each edge or arc. If $c_{i,j}$ is equal to $c_{j,i}$, the TSP is symmetric (sTSP), otherwise it is called asymmetric (aTSP). In particular, this is the case for problems where the vertices are points $P_i = (X_i, Y_i)$ of the Euclidean plane, and $c_{i,j} = \sqrt{(X_j - X_i)^2 + (Y_j - Y_i)^2}$ is the Euclidean distance. The triangle inequality holds if the quantity $c_{i,j}$ represents the length of the shortest path from $i$ to $j$ in the graph $G$ [52]. In the case of the symmetric TSP, the number of all possible routes covering all cities and corresponding to all feasible solutions is given by $\frac{(n-1)!}{2}$ (recall that the number of cities is $n$). The cost of the route is the sum of the costs of the edges followed.

Figure 2: The assumed topology in the standard KPRP. The restaurants are located on a “circle,” forming a “regular polygon.” The agents are concentrated in a very narrow region around the center of the “circle.”

The Kolkata Paise Restaurant problem (KPRP) is considered an extension of the minority game, as it involves multiple players ($n$) each having multiple choices ($N$). In its most general form it is possible that $n \neq N$. In this paper we follow the pretty much standard approach that the number of agents is equal to the number of restaurants, i.e., $n = N$.
The novelty of our work lies in the fact that we advocate a spatially distributed and, in our view, more realistic version of the KPRP by taking into account the topology of the restaurants and by allowing the agents to begin their routes from different starting points. We call our version the Distributed Kolkata Paise Restaurant Game, or DKPRG from now on.

In the original formulation of the KPRP one may readily point out the following important underlying assumptions.

(A1) All agents start from the same location.

(A2) All restaurants are near enough to the point of origin of every customer, so that each customer can, in principle, go to any restaurant, eat there and return to work in time, that is within the time window of the lunch break.

(A3) Every restaurant is sufficiently far away from every other restaurant, so as to make prohibitive in terms of time constraints the possibility of any customer trying a second restaurant, in case her first choice proved fruitless.

In the two dimensional setting of Kolkata, or, as a matter of fact, of any city, the above assumptions taken together imply something very close to the situation depicted in Figure 2. There, one can see that the agents are concentrated within a very narrow region, which can be viewed as the center of a conceptual “circle.” The restaurants are located on this “circle” and since no two of them are allowed to be close they form something that resembles a “regular polygon.”

This last remark is significant because it disallows a situation such as the one shown in Figure 3. The spatial layout depicted in this Figure is strictly forbidden. The proximity of two, three or more restaurants would contradict the impossibility of a second chance. In the standard KPRP no agent is allowed a second chance.
We write “circle” and “regular polygon” inside quotation marks because we are obviously not dealing with a perfect geometric circle or a perfect regular polygon, but with two dimensional approximations resembling the aforementioned symmetric shapes. Clearly, this is a very special topological layout, one that is highly unlikely to be observed in practice. There is no compelling reason for the restaurants to exhibit this regularity or the agents to be confined to approximately the same location. On the contrary, it would seem far more reasonable to assume that at least the restaurants and perhaps even the agents are uniformly distributed within a given area. Finally, the usual assumption that the preference ranking of the restaurants is common to all customers seems a bit too special and probably too restrictive.

Figure 3: The spatial layout depicted in this Figure is strictly forbidden. The proximity of two, three or more restaurants would contradict the impossibility of a second chance. In the standard KPRP no agent is allowed a second chance.

With that motivation in mind, in this work we propose to abolish all these assumptions. The resulting game is spatially distributed in terms of restaurants and as such is called the Distributed Kolkata Paise Restaurant Game (DKPRG). In our setting, each customer may have her own starting point, which is, in general, different from the starting locations of the other customers. The starting locations can either be concentrated in a small region of the Kolkata city area, precisely like the standard KPRP, or they may be assumed to follow a random distribution. The fundamental difference with prior approaches is that now the restaurants are viewed as being uniformly distributed over the city of Kolkata. This uniform randomness in the placement of the restaurants implies that there must be clusters of restaurants sufficiently near each other.
This conclusion becomes inescapable, particularly in the case where the number of restaurants is large ($n \to \infty$). As will be shown in the following sections, the expected distance between “adjacent” restaurants will be relatively short and will only decrease as the number $n$ of restaurants increases.

The assumption of the random placement of restaurants leads to a personalized situation for each individual agent: each agent is in effect faced with a personalized Travelling Salesman Problem. To every agent corresponds an individual graph, which is assumed to be complete. The completeness assumption is not absolutely essential for the TSP, but, in any case, seems reasonable in the sense that one can go from any given restaurant to any other. This graph has $n + 1$ nodes, which are the locations of the $n$ restaurants plus the location of the starting point of the customer. The costs assigned to the edges of the graph are also personalized; each agent combines an objective factor, the spatial distances between the restaurants, with a subjective factor, her personal preferences. Recall that in the DKPRG we forego the common preference restriction and we let every customer have a distinct preference, i.e., she may prefer a particular restaurant and dislike another. Let us clarify, however, that getting served, even at the least preferable restaurant, is more desirable than not getting served at all! This in turn will lead to a possibly unique ordering of the restaurants from the most preferable to the least for each agent.
For instance, if two restaurants $r$ and $r'$ are equidistant from the starting point $s$ of a certain customer, something that is obviously an objective fact, but the customer in question has a clear preference for $r$ over $r'$, then she adjusts the costs $c_{s,r}$ and $c_{s,r'}$ corresponding to the edges $(s, r)$ and $(s, r')$, respectively, so that $c_{s,r} < c_{s,r'}$.

Hence, every agent is faced with a distinctive network topology, which is the combined result of the inherent randomness of the spatial locations and the subjectiveness of her preferences. The topology of the restaurants has a further implication of the utmost importance: a customer whose first choice is a particular restaurant will now have, with very high probability, the opportunity to visit a second, a third, or even a fourth restaurant in the same area if need be. For each agent the time cost is dominated by the time taken to visit the first restaurant; the trip to other nearby restaurants in the same region incurs a relatively negligible time cost due to their spatial proximity. The customer has a second (even a third) chance to be served within the time window of the lunch break. Thus, an efficient, if not optimal, method for every customer to make well-informed decisions regarding her first, second, third, etc. choice is to solve the Travelling Salesman Problem for her personalized graph. Since the TSP is NP-hard, exact solutions are impractical for large instances. Nonetheless, near-optimal solutions of great practical value can be obtained in very short time by employing metaheuristics, as we have pointed out in subsection 1.3.

From this perspective, we proceed now to propose an effective distributed strategy that, if adopted by every agent, will lead to an efficient global solution. All of them will use a common high-level strategy that is tailored and fine-tuned according to their individual preferences.
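The personalized graph described above can be illustrated with a minimal Python sketch. The names `personalized_costs` and `nearest_neighbour_tour`, the multiplicative preference penalty, and the `weight` parameter are all assumptions made here for illustration; the text does not prescribe a specific cost function. The greedy nearest-neighbour construction is only a simple stand-in for the metaheuristics advocated in the text, and the scaled costs need not satisfy the triangle inequality.

```python
import math

def personalized_costs(points, pref_rank, weight=0.1):
    """Symmetric cost matrix for one agent (illustrative assumption).

    points[0] is the agent's starting point; points[1..n] are restaurants.
    pref_rank[v] is the preference rank of vertex v (0 = most preferred);
    pref_rank[0] must be 0 because vertex 0 is not a restaurant.
    Each Euclidean distance is scaled up by the ranks of its endpoints.
    """
    size = len(points)
    cost = [[0.0] * size for _ in range(size)]
    for u in range(size):
        for v in range(u + 1, size):
            distance = math.dist(points[u], points[v])
            penalty = 1.0 + weight * (pref_rank[u] + pref_rank[v])
            cost[u][v] = cost[v][u] = distance * penalty
    return cost

def nearest_neighbour_tour(cost):
    """Greedy tour that starts and ends at vertex 0 (the starting point)."""
    remaining = set(range(1, len(cost)))
    tour, current = [0], 0
    while remaining:
        # Move to the cheapest unvisited restaurant; ties broken by index.
        nxt = min(remaining, key=lambda v: (cost[current][v], v))
        tour.append(nxt)
        remaining.remove(nxt)
        current = nxt
    tour.append(0)
    return tour

# Two restaurants equidistant from the start; the agent prefers restaurant 1.
points = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
cost = personalized_costs(points, pref_rank=[0, 0, 1])
tour = nearest_neighbour_tour(cost)
```

In this toy instance the two restaurants are equidistant from the start, yet the preferred one receives the smaller edge cost and is therefore visited first.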
To enhance clarity we explicitly state below the hypotheses that define and characterize the DKPRG variant.

(H1) DKPRG is an infinitely repeated game.

(H2) There are two main protagonists in the game. First, the $n$ agents (also referred to as customers) with different, in general, starting locations. The set of agents is denoted by $A = \{a_1, \dots, a_n\}$. Second, the $n$ restaurants that are uniformly distributed within the same area. The set of restaurants is denoted by $R = \{r_1, \dots, r_n\}$. All agents know the locations of the restaurants, but each one of them need not know the starting locations of the other agents.

(H3) To each agent $a \in A$ corresponds a distinct personal preference ordering $P_a = (r_{j_1}, r_{j_2}, \dots, r_{j_n})$ such that restaurant $r_{j_1}$ is her first preference, $r_{j_2}$ is her second preference, and so on, with $r_{j_n}$ being the least preferable restaurant for $a$.

(H4) We adopt the standard convention that each restaurant can accommodate only one customer at a time. The immediate ramification of this convention is that if two or more customers arrive at a restaurant, only one can be served. The one to be served is chosen randomly.

(H5) The aforementioned hypotheses immediately bring to the front the novelty and contribution of our approach. The positions of the restaurants relative to the starting point of each customer create for every customer a distinct topology, a distinct network of restaurants. Specifically, each agent $a \in A$ perceives a personalized graph $G_a = (V_a, E_a)$. $G_a$ is a complete undirected graph having $n + 1$ vertices $v_0, v_1, \dots, v_n$, where $v_0$ is the starting location of $a$ and $v_j$ is the location of restaurant $j$, $1 \le j \le n$. For each pair of distinct vertices $u, v \in V_a$ there exists an undirected edge $(u, v) \in E_a$.
The graph $G_a$ is complemented with the (symmetric) cost matrix $C_a$, which assigns to each edge $(u, v)$ a cost $c_{u,v}$. We may surmise that the costs are computed by a function $f_a$ that incorporates geographical data, i.e., the distances between the restaurants, and the preference ordering $P_a$. The topological layout of the restaurants is an objective and global reality that is common to all customers and is undeniably crucial to a rational computation of the travel costs. On the other hand, it would be illogical if an agent did not take into account her preferences. The weight assigned to the spatial distances need not be equal to the weight assigned to the preferences. A conservative approach could assign a far greater weight to the distances compared to the preferences. A more idiosyncratic approach would deal with both on an equal footing by assigning equal weights to distances and preferences. It is plausible that for customer $a$ the personal preferences may play a more prominent role than for customer $a'$, in which case we may allow for the possibility that, in the process of computing the costs, each customer assigns completely different weights. In any event, we regard each cost matrix $C_a$ as distinct, which, along with the uniqueness of each $V_a$, explains why the resulting networks $G_a$ are all considered different, that is, every customer is confronted with her own unique and personalized TSP.

(H6) Each customer $a \in A$ solves the corresponding TSP using an efficient metaheuristic that outputs a near-optimal solution. In that manner, $a$ computes a (near-optimal) tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$. The tour is represented by the ordered list $(l_0, l_1, \dots, l_n, l_{n+1})$, where $l_0 = l_{n+1}$ is the starting point of $a$ and $l_k$, $1 \le k \le n$, is the index of the restaurant in the $k$th position of the tour. Endowed with their individual route $T_a$, which is an integral part of their strategy, all customers follow a simple common strategy.
From their starting location $l_0$ they first travel to the restaurant $r_{l_1}$. Once there, those that get served conclude their route successfully. Those that do not get served proceed to the restaurant $r_{l_2}$. If their attempt at getting lunch also fails at $r_{l_2}$, then they proceed to $r_{l_3}$, and so on. Obviously, the time constraints, that is, the fact that the agent must have returned to her starting point by the time the lunch break is over, mean that the agent will not have the opportunity to exhaust the entire tour. The customer must interrupt the tour at some point in order to return. This may happen after travelling unsuccessfully to two, three, or more restaurants, depending on the topology of the network.

(H7) The Revision Strategy. We adopt the standard assumption that the agents operate independently and no communication takes place between any two of them. Therefore, each customer is completely unaware of the routes of the other customers. They revise their strategy every evening taking into account only what happened during the present day. This means that they decide using only information from the last day and no prior information or history needs to be kept. We assume that all agents follow the same policy. If they got served at a specific restaurant this day, then tomorrow they go straight to the same restaurant. This applies even if this restaurant is not in the first place of their tour. For example, those agents that failed to get lunch at their first choice, but managed to do so at their second or third choice, tomorrow go straight to the restaurant that served them despite the fact that this particular restaurant is not their most preferable. Those that failed to get lunch only know which restaurants were left vacant, i.e., not visited by any agent today. Further or more elaborate information, such as the choices of other players or whether they got served and at which restaurant, seems unnecessary.
The unserved agents construct and solve their new personalized TSP, this time using as vertices only the vacant restaurants (plus, of course, their starting location).

Having explained the details of the DKPRG, we shall proceed to analyze the mathematical characteristics and evaluate the resulting utilization of this policy in the following sections.
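The hypotheses (H1) to (H7) can be turned into a small simulation, sketched below in Python. Several points are assumptions made only for this illustration: the function name `simulate` is hypothetical, each active agent's tour is drawn uniformly at random over the vacant restaurants as a stand-in for the output of a metaheuristic on her personalized graph, and a restaurant that has served a customer is treated as occupied for the rest of the day.

```python
import random

def simulate(n, m, days, seed=0):
    """Return the fraction of restaurants in use at the end of each day."""
    rng = random.Random(seed)
    reserved = {}              # restaurant -> agent that returns there daily
    active = list(range(n))    # agents that have not been served yet
    utilization = []
    for _ in range(days):
        vacant = [r for r in range(n) if r not in reserved]
        stops = min(m, len(vacant))
        # Assumption: the first m stops of a uniformly random tour over
        # the vacant restaurants stand in for the agent's TSP solution.
        tours = {a: rng.sample(vacant, stops) for a in active}
        served_today = {}
        unserved = set(active)
        for stop in range(stops):
            arrivals = {}
            for a in sorted(unserved):
                r = tours[a][stop]
                if r not in served_today:      # still vacant today
                    arrivals.setdefault(r, []).append(a)
            for r, queue in arrivals.items():
                winner = rng.choice(queue)     # (H4): random customer served
                served_today[r] = winner
                unserved.discard(winner)
        reserved.update(served_today)          # (H7): served agents return
        active = sorted(unserved)
        utilization.append(len(reserved) / n)
    return utilization

daily = simulate(n=500, m=2, days=5)
```

With `m = 1` and a single day, the sketch reduces to every agent picking one restaurant at random, so the utilization should be close to $1 - (1 - 1/n)^n$, which tends to $1 - 1/e$ for large $n$.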

We begin this section by fixing the notation and giving some definitions to clarify the most important concepts of our exposition.

Definition 4.1.

• The one-shot DKPRG takes place every day. We use the parameter $t = 1, 2, \dots$ to designate the day under consideration.

• To each agent $a \in A$ corresponds the personalized network $G_a = (V_a, E_a)$ together with the personalized cost matrix $C_a$, which are constructed in the way we outlined in the previous section. Agent $a$ follows the tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$, which is the solution to her personalized TSP. As we have emphasized, by using metaheuristics it is possible to obtain near-optimal solutions in a very short amount of time.

• The quality and efficiency of the strategy is measured by the utilization ratio $f$. This is the fraction of agents being served in a day, or, equivalently, the fraction of restaurants serving customers in a day. The equivalence is obvious because there are $n$ customers and $n$ restaurants.

In section 5 we shall revisit the concept of utilization and we shall be more precise by asserting the expected utilization per day as a function of the game parameters.

An agent $a$ who has opted to follow tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$ will initially try to get lunch at restaurant $r_{l_1}$. If she succeeds, she will eat and then return to her starting point. If she fails, she will visit the next restaurant in the tour, i.e., $r_{l_2}$. If she gets lunch there, she will subsequently go back to work. This process will go on until either she gets served or runs out of time, in which case she must interrupt the tour and return to work. If the time constraints allow her to pass through the first $m$ restaurants in her tour, in the worst-case scenario of $m - 1$ failures, we say that $T_a$ is an $m$-stop tour.

To facilitate our mathematical analysis we take for granted that all customers follow $m$-stop tours. We have already explained why, in our view, $m$ must be $\ge 2$. The case where $m = 1$ reduces to the standard treatment of the KPRP, which has already been analyzed extensively in the literature. In the rest of this work we study the case where $m \ge 2$. All these considerations motivate the next definition.

Definition 4.2.

• The tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$ associated with agent $a$ is an $m$-stop tour, $m \ge 2$, if, in the worst case, agent $a$ can visit restaurants $l_1, l_2, \dots, l_m$ in this order without violating her time constraints. In such a tour, $l_1$ is the first stop, $l_2$ is the second stop, and so on, with $l_m$ being the final $m$th stop.

• If $\forall a \in A$, $T_a$ is an $m$-stop tour, then the resulting game is the $m$-stop DKPRG.

Let us now explore the spatial ramifications of our assumption that the restaurants are uniformly distributed within the overall city area. We now give the formal definition of uniform distribution.

Deﬁnition 4.3.

Given a region $B$ on the plane, a random variable $L$ has uniform distribution on $B$, if given any subregion $C$ the following holds:

$$P(L \in C) = \frac{\mathrm{area}(C)}{\mathrm{area}(B)}, \quad C \subset B. \tag{1}$$

We assume of course that $L$ takes values in $B$.

The above definition is adapted from [53]. For a more general and sophisticated definition in terms of measures we refer the interested reader to [54].

Proposition 4.1.

Assuming that the $n$ restaurants are uniformly distributed on the whole city area, then if the city area is partitioned into $n$ regions of equal area, the expected number $N_p$ of restaurants in each region is exactly 1:

$$N_p = 1, \quad 1 \le p \le n. \tag{2}$$

Proof.

Let $B$ stand for the whole city area and let $B_1, \dots, B_n$ be the $n$ regions. The hypotheses assert that:

1. $B_1 \cup \dots \cup B_n = B$,
2. $B_p \cap B_q = \emptyset$, if $1 \le p \neq q \le n$, and
3. $\mathrm{area}(B_1) = \mathrm{area}(B_2) = \dots = \mathrm{area}(B_n) = \frac{\mathrm{area}(B)}{n}$.

Invoking the fact that the $n$ restaurants are uniformly distributed on the whole city, we deduce from (1) that for every restaurant $r_j$, $1 \le j \le n$, and for every region $B_p$, $1 \le p \le n$,

$$P(r_j \in B_p) = \frac{\mathrm{area}(B_p)}{\mathrm{area}(B)} = \frac{1}{n}, \quad 1 \le j, p \le n. \tag{4.1.i}$$

We may now define the following collection of auxiliary random variables $N_{pj}$, where $1 \le p, j \le n$:

$$N_{pj} = \begin{cases} 1 & \text{if } r_j \text{ is located in region } B_p, \\ 0 & \text{otherwise.} \end{cases} \tag{4.1.ii}$$

By combining the result from (4.1.i) with definition (4.1.ii), we may conclude that

$$N_{pj} = \begin{cases} 1 & \text{with probability } \frac{1}{n}, \\ 0 & \text{with probability } \frac{n-1}{n}, \end{cases} \quad 1 \le p, j \le n. \tag{4.1.iii}$$

Then, the random variable

$$N_p = \sum_{j=1}^{n} N_{pj} \quad (1 \le p \le n) \tag{4.1.iv}$$

gives the number of restaurants in region $B_p$, $1 \le p \le n$. We are not interested in the actual value of the random variable $N_p$ per se, but in its expected value $E[N_p]$. The latter can be easily computed if we use the above results and the linearity of the expected value operator.

$$N_p = E[N_p] \overset{(4.1.iv)}{=} E\left[\sum_{j=1}^{n} N_{pj}\right] = \sum_{j=1}^{n} E[N_{pj}] \overset{(4.1.iii)}{=} \sum_{j=1}^{n} \left(1 \cdot \frac{1}{n}\right) = 1 \tag{4.1.v}$$

This establishes that the expected number of restaurants in each region is precisely 1 and proves formula (2).

Footnote: If one wants to be overly technical, one should assume that both $B$ and $C$ are measurable sets. In the current setting, we believe that it is unnecessary to go to such a technical depth.

Figure 4: Kolkata can be conceptually partitioned into $n$ regions $B_1, \dots, B_n$ of equal area. If the $n$ restaurants are uniformly distributed in the overall Kolkata area, then the expected number of restaurants in each region $B_j$, $1 \le j \le n$, is 1.

Partitioning a city area into $n$ disjoint regions of equal area might not be an easy task.
The point is that for large values of $n$, as is the standard assumption in the literature, it is certainly doable. We stress the fact that the shape of the regions need not be the same. Indeed, Proposition 4.1 holds irrespective of whether the regions have the same shape or any particular shape for that matter.

This topological layout of the restaurants is shown in Figures 4 and 5. In these Figures, the regions are drawn as squares, but this is just for convenience and to facilitate their graphic depiction. As we have explained, the regions are not required to have the same shape, nor does their shape need to resemble a regular two dimensional figure. For very large values of $n$, partitioning a city into very small identical squares is a good approximation, as we know from the field of image representation.

It is useful to contrast the two Figures. The latter depicts the situation where the number of restaurants is much larger compared to the number of restaurants in the former Figure. This demonstrates clearly what happens when $n$ increases significantly, i.e., when $n \to \infty$. Irrespective of the order of magnitude of $n$, the expected number of restaurants in each of the $n$ regions (recall that they are pairwise disjoint and of equal area) remains 1. What does change, however, is the area of each region, which decreases with $n$ and, as a consequence, so does the expected distance between restaurants located in adjacent regions.

Figure 5: As $n \to \infty$, the expected number of restaurants in each region remains 1, but the expected distance between restaurants in neighbouring regions decreases.
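Proposition 4.1 can be checked empirically with a short Monte Carlo sketch in Python. The setup is an assumption chosen for convenience: the city is the unit square, it is partitioned into a $k \times k$ grid of equal square cells, and $n = k^2$ restaurants are placed uniformly at random; the function name `mean_counts_per_cell` is hypothetical.

```python
import random

def mean_counts_per_cell(k, trials, seed=0):
    """Average number of restaurants per cell over many random placements.

    The unit square is split into k*k equal cells and n = k*k restaurants
    are dropped uniformly at random in each trial.
    """
    rng = random.Random(seed)
    n = k * k
    totals = [0] * n
    for _ in range(trials):
        for _ in range(n):
            x, y = rng.random(), rng.random()
            col = min(int(x * k), k - 1)   # guard against rounding up to k
            row = min(int(y * k), k - 1)
            totals[row * k + col] += 1
    return [total / trials for total in totals]

means = mean_counts_per_cell(k=4, trials=2000)
```

Each entry of `means` estimates $E[N_p]$ for one cell and should be close to 1, although any single placement will typically leave some cells empty and others with several restaurants.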
Figure 6: This figure shows the expected distances between restaurants located in adjacent regions.

Let us make the rather obvious observation that there is a meaningful notion of distance defined between any two points, or locations if you prefer, in the entire city area. In reality, this can be the geographical distance between any two locations, expressed in meters or kilometers or in some other unit of length. For instance, let us consider two points $x$ and $y$ with spatial coordinates $(x_1, x_2)$ and $(y_1, y_2)$, respectively. A typical manifestation of the notion of distance is the Euclidean distance $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$ between $x$ and $y$. In any event, we take for granted the existence of such a distance function defined on every pair of points $(x, y)$ of the city, which is denoted by $d(x, y)$.

Definition 4.4.

• The distance between two regions $B_p$ and $B_q$ is defined as

$$d(B_p, B_q) = \inf\{d(x, y) : x \in B_p \text{ and } y \in B_q\}. \tag{3}$$

• Two regions $B_p$ and $B_q$ are adjacent if

$$d(B_p, B_q) = 0. \tag{4}$$

• We define the concept of diameter (see [55] for details) for the regions $B_p$, $1 \le p \le n$. In particular, we define

$$\mathrm{diam}B_p = \sup\{d(x, y) : x, y \in B_p\}. \tag{5}$$

Proposition 4.2.

Let the $n$ restaurants be uniformly distributed on the city area and assume that the whole area is partitioned into $n$ regions of equal area. If $r_p$ and $r_q$ are the restaurants located at adjacent regions $B_p$ and $B_q$ respectively, where $1 \le p \neq q \le n$, then the distance $d(r_p, r_q)$ between them is bounded above by $\mathrm{diam}B_p + \mathrm{diam}B_q$:

$$d(r_p, r_q) \le \mathrm{diam}B_p + \mathrm{diam}B_q, \quad 1 \le p \neq q \le n. \tag{6}$$

Proof.

Consider two adjacent regions $B_p$ and $B_q$. By (4), this means that $d(B_p, B_q) = 0$, which in turn implies that $\forall \varepsilon > 0 \ \exists x \in B_p \ \exists y \in B_q$ such that $d(x, y) \le \varepsilon$ ($\star$). In view of Proposition 4.1, one expects to find exactly one restaurant in $B_p$ and exactly one restaurant in $B_q$. So, let $r_p$ and $r_q$ be the restaurants located at regions $B_p$ and $B_q$, respectively, and consider the distance $d(r_p, r_q)$ between them. By the triangle inequality, which is a fundamental property of every distance function, we may write that $d(r_p, r_q) \le d(r_p, x) + d(x, y) + d(y, r_q)$, $\forall x \in B_p \ \forall y \in B_q$ ($\star\star$). From ($\star$) and ($\star\star$) we conclude that $\forall \varepsilon > 0 \ \exists x \in B_p \ \exists y \in B_q$ such that $d(r_p, r_q) \le d(r_p, x) + d(y, r_q) + \varepsilon$ ($\star\star\star$). Now, according to (5), $d(r_p, x) \le \mathrm{diam}B_p$ and $d(y, r_q) \le \mathrm{diam}B_q$. Since $\varepsilon$ is arbitrary, these last two relations combined with ($\star\star\star$) give that $d(r_p, r_q) \le \mathrm{diam}B_p + \mathrm{diam}B_q$, as desired.

The above upper bound can be simplified if we further assume that all regions have the same geometric shape. This regularity does not impose any serious restriction on the overall setting of the game and allows us to assert that $\mathrm{diam}B_1 = \dots = \mathrm{diam}B_n = D$, in which case inequality (6) becomes:

$$d(r_p, r_q) \le 2D, \quad 1 \le p \neq q \le n. \tag{7}$$

In the special case where the regions are squares, as depicted in Figures 6 and 7, one can easily see that the diameter $D$ is proportional to $\sqrt{\frac{1}{n}}$:

$$D \propto \sqrt{\frac{1}{n}}. \tag{8}$$

A comparison between Figures 6 and 7 demonstrates that the expected distance between restaurants which lie in adjacent regions is quite short, as it is bounded above by the sum of the diameters of the corresponding regions. The diameter of the regions decreases as $n$ increases, and in the special case shown in these two Figures, the diameter decreases in proportion to $\frac{1}{\sqrt{n}}$. In layman's terms, this means that adjacent restaurants get very close to each other as $n \to \infty$. Once the agent arrives at a restaurant, then, with high probability, visiting an adjacent restaurant will only incur a negligible extra cost that will not violate her time constraints.

Figure 7: When $n$ increases, the area and the diameter of the regions $B_1, \dots, B_n$ decrease. As a result the expected distances between restaurants located in adjacent regions decrease. In other words, as $n \to \infty$, the restaurants in adjacent regions get closer and closer.

Figure 8: The above figure depicts the situation where all $n$ agents are concentrated within a small region of the Kolkata city, while the $n$ restaurants are uniformly distributed in the overall Kolkata area.

We clarify that we are not making any assumption about the probabilistic distribution of the agents. One possibility is that the agents might be concentrated in the “center,” or in another specific location of the city area, as is tacitly assumed by the original KPRP. Another possibility is that the agents follow a random distribution over the area, for instance they might also follow the uniform distribution. The former case is depicted in Figure 8 and the latter in Figure 9. The crucial observation is that in both cases any of the $n$ agents can, potentially, have lunch in any of the $n$ restaurants and return in time. This fact implies that, assuming each agent follows the (near-optimal) tour produced as a solution to her individual TSP, she may visit a second, or even a third, restaurant if her previous choices proved fruitless. To see why this is indeed so, one may consider for instance agent $a_1$ in both Figures 8 and 9 and the restaurant that is furthest away from her. Without loss of generality let us say that in both cases this is restaurant $r_n$.
Being able to visit $r_n$ while adhering to her time constraints implies being also able to pass through adjacent restaurants within the same time window.

The current section is devoted to the analytic estimation of the evolution of the game parameters and the daily utilization of the proposed strategy scheme. Let us briefly summarize the policy that regulates the $m$-stop DKPRG.

• At the beginning of day 1 all $n$ agents are in the same position, in that they have not got lunch yet, and they are in a precarious state, not knowing if they will manage to eat eventually. So, at this point in time they are all unsatisfied. The situation with the restaurants is symmetrical. All $n$ restaurants face uncertainty in that it is yet unknown whether they will be chosen by at least one customer. Therefore, at this point they are all vacant.

• The situation is quite different at the end of day 1. A significant percentage of the $n$ agents, as will be shown in this section, managed to get lunch. An equal percentage of the $n$ restaurants was utilized. The common strategy followed by all agents ensures that the same agents will get lunch the next day, the day after the next, etc. These agents are satisfied, since they have effectively “won” the game. Symmetrically, the same restaurants will be utilized every day from now on. They will be permanently reserved.

• At the beginning of day 2, only those agents that failed to eat yesterday will essentially play the game. These will be the active players of day 2. The active players will strive to get lunch exclusively at the restaurants that did not serve any customer yesterday. The rest of the agents are already satisfied and will certainly have lunch today, each one at the specific restaurant that (eventually) served her yesterday.

• By the end of day 2, a significant percentage of the active agents will have succeeded in getting lunch. Thus, the total number of satisfied agents will increase by the amount of today's gains. Of course, an equal percentage of yesterday's vacant restaurants will also be utilized for the first time today.

• This process will continue ad infinitum.

Figure 9: This figure reflects the situation where both the $n$ restaurants and the $n$ agents are uniformly distributed in the overall Kolkata area.

The next concepts will prove useful in our analysis.

Definition 5.1.

• The expected number of agents that managed to eat lunch during day 1 is denoted by $A^s_1$ and the expected number of agents that failed to eat lunch during day 1 is denoted by $A^u_1$.

• The expected number of agents that got lunch for the first time during day $t$, $t = 2, 3, \dots$, is denoted by $A^s_t$. The expected number of agents that failed to get lunch during day $t$, $t = 2, 3, \dots$, is denoted by $A^u_t$.

• Symmetrically, the expected number of restaurants that served lunch during day 1 is denoted by $R^r_1$ and the expected number of restaurants that did not serve lunch during day 1 is denoted by $R^v_1$.

• The expected number of restaurants that served a customer for the first time during day $t$, $t = 2, 3, \dots$, is denoted by $R^r_t$. The expected number of restaurants that failed to serve lunch during day $t$, $t = 2, 3, \dots$, is denoted by $R^v_t$.

• The vacancy probability of day 1 is the probability that a restaurant did not accommodate any customer during day 1 and is designated by $VP_1$.

• The vacancy probability of day $t$, $t = 2, 3, \dots$, designated by $VP_t$, is the probability that a restaurant that has not served any customer before day $t$ did not serve a customer during day $t$ either.

• In the $m$-stop DKPRG, only the customers that have yet to get lunch participate actively in today's game. The agents that actually play the game at the beginning of day $t$, seeking a restaurant to get lunch, are called active players and their expected number is denoted by $n_t$.

• The expected utilization of day $t$, $t = 1, 2, \dots$, denoted by $f_t$, is the expected number of agents that were served during day $t$ divided by $n$. The steady state utilization is defined as $f_\infty = \sup\{f_t : t \in \mathbb{N}\}$.

Equiprobability of tours.

The following analysis is based on the premise that all $n!$ tours are equiprobable. In the rest of this paper we shall refer to this assumption as the equiprobability of tours assumption (EPT for short). In view of the discussion in the previous sections, this premise is well justified.

An immediate consequence of the EPT assumption is the equiprobability of each restaurant appearing in any position. In particular, let us recall that in the tour $T_a = (l_0, l_1, \dots, l_n, l_{n+1})$, corresponding to agent $a$, $l_0 = l_{n+1}$ is the starting point of $a$ and $l_k$, $1 \le k \le n$, is the index of the restaurant in the $k$th position of the tour. We may easily calculate the probability that a restaurant is in a specific position of the tour, as well as the probability of the complementary event. For easy reference, these facts are collected in the next Proposition, whose proof is trivial and thus omitted.

Proposition 5.1.

Assuming the equiprobability of tours, the following hold.

$$\forall a \in A \ \forall r \in R \ \forall k,\ 1 \le k \le n, \quad P(r \text{ is in position } k \text{ of } T_a) = \frac{1}{n} \tag{9}$$

$$\forall a \in A \ \forall r \in R \ \forall k,\ 1 \le k \le n, \quad P(r \text{ is not in position } k \text{ of } T_a) = \frac{n-1}{n} \tag{10}$$

The above can be generalized to handle the case of a restaurant $r$ appearing in one of $w$ distinct positions $k_1, k_2, \dots, k_w$, where $1 < w \le n$.

$$\forall a \in A \ \forall r \in R, \quad P(r \text{ is in one of positions } k_1, \dots, k_w \text{ of } T_a) = \frac{w}{n} \tag{11}$$

$$\forall a \in A \ \forall r \in R, \quad P(r \text{ is not in any of positions } k_1, \dots, k_w \text{ of } T_a) = \frac{n-w}{n} \tag{12}$$

We note that the above hold for every restaurant, every position, and, of course, for every tour. Since the probability that restaurant $r \in R$ is in the $k$th position of the tour of agent $a$ is $\frac{1}{n}$, the probability of the complementary event, i.e., that restaurant $r$ is not in the $k$th position of $T_a$, is $\frac{n-1}{n}$. If we deem as "success" the case where $r$ is indeed in the $k$th position of $T_a$ and as "failure" the case where $r$ is not, then this situation is a typical example of a Bernoulli trial, having probability of success $\frac{1}{n}$ (also referred to as parameter, see [56]) and probability of failure $\frac{n-1}{n}$. In view of (9) we denote this as

$$P(r \text{ is in position } k \text{ of } T_a) \sim Ber\left(\frac{1}{n}\right), \quad \forall a \in A \ \forall r \in R \ \forall k,\ 1 \le k \le n. \tag{13}$$

Analogously, the probability that restaurant $r \in R$ appears in one of $w$, $1 < w \le n$, distinct positions of the tour of agent $a$ is $\frac{w}{n}$. The probability of the complementary event, i.e., that restaurant $r$ is not in any one of these $w$ positions of $T_a$, is $\frac{n-w}{n}$. This time, one may view as "success" the case where $r$ is indeed in one of the designated $w$ positions of $T_a$ and as "failure" the case where $r$ is not. So, once again we are faced with a Bernoulli trial, this time with parameter $\frac{w}{n}$.

$$P(r \text{ is in one of positions } k_1, \dots, k_w \text{ of } T_a) \sim Ber\left(\frac{w}{n}\right), \quad \forall a \in A \ \forall r \in R. \tag{14}$$

The fact that the $n$ agents calculate their tours independently implies that $n$ independent Bernoulli trials take place simultaneously, all with the same success and failure probabilities. This situation is described by the binomial distribution with parameters $(n, p)$, denoted by $Bin(n, p)$, where $p = \frac{1}{n}$ in the simple case of one position and $p = \frac{w}{n}$ in the general case of $w$ positions. By employing well-known formulas from probability textbooks we may assert the following Proposition, whose proof is straightforward. We refer the reader to [56] and [53] for a more detailed analysis.

Proposition 5.2. Given a restaurant $r$, if its appearance in a specified position $k$ in one tour counts as one success, whereas its failure to appear in the specified position $k$ in one tour counts as one failure, then the probability of exactly $l$ appearances in position $k$ in total is given by

$$\forall r \in R \ \forall k,\ 1 \le k \le n, \quad P(r \text{ appears } l \text{ times in position } k \text{ in } n \text{ tours}) = \binom{n}{l} \left(\frac{1}{n}\right)^{l} \left(\frac{n-1}{n}\right)^{n-l}. \tag{15}$$

In the special case where $r$ never appears, that is, it appears 0 times, in the specified position $k$, the above formula becomes:

$$\forall r \in R \ \forall k,\ 1 \le k \le n, \quad P(r \text{ never appears in position } k \text{ in } n \text{ tours}) = \binom{n}{0} \left(\frac{1}{n}\right)^{0} \left(\frac{n-1}{n}\right)^{n} = \left(\frac{n-1}{n}\right)^{n}. \tag{16}$$

More generally, the probability that restaurant $r$ appears exactly $l$ times in total in one of the $w$ distinct positions $k_1, \dots, k_w$, $1 < w \le n$, is given by

$$\forall r \in R, \quad P(r \text{ appears } l \text{ times in one of positions } k_1, \dots, k_w \text{ in } n \text{ tours}) = \binom{n}{l} \left(\frac{w}{n}\right)^{l} \left(\frac{n-w}{n}\right)^{n-l}. \tag{17}$$

If $r$ never appears, that is, it appears 0 times, in any one of the $w$ designated positions $k_1, \dots, k_w$, $1 < w \le n$, the previous formula reduces to:

$$\forall r \in R, \quad P(r \text{ never appears in any of positions } k_1, \dots, k_w \text{ in } n \text{ tours}) = \binom{n}{0} \left(\frac{w}{n}\right)^{0} \left(\frac{n-w}{n}\right)^{n} = \left(\frac{n-w}{n}\right)^{n}. \tag{18}$$

We must emphasize that the above hold for every restaurant $r \in R$, for every position $k$, $1 \le k \le n$, and for every set of positions $\{k_1, \dots, k_w\}$, $1 < w \le n$. In other words, for every restaurant, the probability that it does not appear in one specific position in any of the $n$ tours is $\left(\frac{n-1}{n}\right)^{n}$, and the probability that it does not appear in any of $w$ distinct positions in any of the $n$ tours is $\left(\frac{n-w}{n}\right)^{n}$.

According to the strategy scheme employed in the $m$-stop DKPRG, at the start of the second (third, etc.) day, the satisfied customers always go straight to the restaurant that eventually served them the previous day. We stress the word eventually because an agent may have failed to get lunch during stop 1 of the previous day, but she may have succeeded during the second, third, or $m$th stop. This strategy is followed by all agents, which guarantees that those customers that were satisfied on the previous day will remain satisfied today. Effectively, this strategy implies that the satisfied agents have "won" the game and from now on they do not need to solve their personalized TSP. The game will be played competitively by the unsatisfied agents of the previous day. We assume that they are aware of the unoccupied restaurants and, therefore, each one of them will once again solve her personalized TSP to compute her near-optimal tour. Of course, today the network of restaurants will consist of only the unoccupied restaurants, i.e., it will be significantly smaller than yesterday. The one-shot $m$-stop DKPRG of today will differ from the one-shot game of the previous day in one critical factor: the number of "actively competing" players will be significantly smaller.
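Returning briefly to Proposition 5.2, its closed forms can be checked numerically. The following Python sketch evaluates the binomial expression (17) and compares its $l = 0$ case with the closed form (18). The function names `appearance_pmf` and `never_appears` and the sample values $n = 10$, $w = 2$ are illustrative assumptions, not part of the original analysis.

```python
from math import comb


def appearance_pmf(n: int, w: int, l: int) -> float:
    """Formula (17): P(r appears exactly l times in w designated positions over n tours)."""
    p = w / n  # success probability of a single Bernoulli trial, as in (14)
    return comb(n, l) * p**l * (1 - p) ** (n - l)


def never_appears(n: int, w: int) -> float:
    """Formula (18): closed form ((n - w) / n) ** n for zero appearances."""
    return ((n - w) / n) ** n


# Hypothetical sample values, chosen only for illustration.
n, w = 10, 2
print(appearance_pmf(n, w, 0), never_appears(n, w))
# The probabilities over l = 0, ..., n should add up to 1.
print(sum(appearance_pmf(n, w, l) for l in range(n + 1)))
```

Setting `w = 1` gives the single-position formulas (15) and (16).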
By the nature of the game, the number of active players at the beginning of stop 1 of the present day is equal to the number of unsatisfied customers at the end of the previous day. The way the expected number of active players varies with each passing day is captured by the following Theorem 5.1.

Theorem 5.1. The daily progression of the $m$-DKPRG is described by the following formulas, where $t$ stands for the day in question.

$$VP_t = \left(\frac{n_t - m}{n_t}\right)^{n_t}, \quad n_t \ge m,\ t = 1, 2, \dots, \tag{19}$$

$$R^v_t = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}, \quad n_t \ge m,\ t = 1, 2, \dots, \tag{20}$$

$$R^r_t = n_t \left(1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}\right), \quad n_t \ge m,\ t = 1, 2, \dots, \tag{21}$$

$$A^u_t = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}, \quad n_t \ge m,\ t = 1, 2, \dots, \tag{22}$$

$$A^s_t = n_t \left(1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}\right), \quad n_t \ge m,\ t = 1, 2, \dots, \tag{23}$$

$$n_1 = n, \tag{24}$$

$$n_{t+1} = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}, \quad n_t \ge m,\ t = 1, 2, \dots, \tag{25}$$

$$f_t = \frac{\sum_{d=1}^{t} n_d \left(1 - \left(\frac{n_d - m}{n_d}\right)^{n_d}\right)}{n} = \frac{n - n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}}{n}, \quad n_t \ge m,\ t = 1, 2, \dots. \tag{26}$$

Proof.

The proof of the above formulas goes as follows.

1. We first prove the auxiliary result that the vacancy probability at the beginning of stop $z$, $1 \le z \le m$, of day $t$ is

$$VP_{t,z} = \left(\frac{n_t + 1 - z}{n_t}\right)^{n_t}, \quad 1 \le z \le m. \tag{27}$$

• Indeed, at the beginning of stop 1 of day $t$, the expected number of restaurants that have not served any customer yet is equal to the expected number $n_t$ of active agents. On day $t$, the game is all about the active agents and the restaurants that have never been utilized up to now. At this moment in time all these restaurants are still unutilized, so vacancy is a certainty. Thus, indeed $VP_{t,1} = 1$, which is in agreement with (27) when $z = 1$.

• We recall that, according to our strategy, at the beginning of day $t$ the expected number of restaurants that have not served any customer yet is equal to the expected number $n_t$ of active players. At the beginning of stop 2 of day $t$, the probability that one of these restaurants $r$ has not served lunch yet is precisely the probability that $r$ never appears in position 1 in any tour of the active players. This last probability is given by (16), where of course $n$ must now be replaced by $n_t$. Hence,

$$VP_{t,2} = \left(\frac{n_t - 1}{n_t}\right)^{n_t}, \tag{5.1.i}$$

which is also in agreement with (27) when $z = 2$.

• Let us now carefully examine what happens during stop 2 of day $t$ of the game. According to our scheme, those customers who have failed to get lunch at their first destination will immediately proceed to their second destination. For example, if customer $a$, who follows tour $T_a = (l_1, l_2, \dots, l_n, l_{n+1})$, was not served at restaurant $r_{l_1}$, she will try restaurant $r_{l_2}$. However, an added complication arises now. It may well be the case that $r_{l_2}$ is already occupied from stop 1.
In such a case $r_{l_2}$ is completely unavailable, i.e., it is now serving another active agent. In view of this fact, we may conclude that the restaurants that are vacant at the beginning of stop 3 must satisfy two properties:

(P1) they must be vacant at the beginning of stop 2, which means that they must never appear in position 1 in any tour of the active players, and

(P2) they must never appear in position 2 in any tour of the active players.

The above are summarized more succinctly in the following rule.

(C) The restaurants that have not served any customer up to day $t$ and are still vacant at the beginning of stop 3 of day $t$ never appear in position 1 or position 2 in any tour of the active players.

Therefore,

$$VP_{t,3} = P(r \text{ never appears in positions 1 or 2 in } n_t \text{ tours}) \overset{(18)}{=} \left(\frac{n_t - 2}{n_t}\right)^{n_t}, \tag{5.1.ii}$$

which is again in agreement with (27) when $z = 3$.

• The same reasoning can be employed to show that the vacancy probability

$VP_{t,z}$ at the beginning of stop $z$ of day $t$ is

$$VP_{t,z} = P(r \text{ never appears in positions } 1, \dots, z-1) \overset{(18)}{=} \left(\frac{n_t + 1 - z}{n_t}\right)^{n_t}. \tag{5.1.iii}$$

Hence, we have proved the validity of (27).

• Finally, to calculate the probability that one of the restaurants $r$ that have not served any customer up to day $t$ is still vacant at the end of stop $m$ of day $t$, which in effect means at the end of day $t$, we must determine the probability that $r$ never appears in positions 1, or 2, or ..., or $m$ in any tour of the active players. Thus,

$$VP_t = P(r \text{ never appears in positions } 1, \dots, m) \overset{(18)}{=} \left(\frac{n_t - m}{n_t}\right)^{n_t}, \tag{5.1.iv}$$

which verifies (19), as desired. However, there is one final detail that we must emphasize here. Probabilities are real numbers taking values in the real line interval $[0, 1]$. Hence, the expression in (19) requires $n_t \ge m$; otherwise it cannot be regarded as a probability. The physical meaning of this restriction is that (19) is meaningful and correct as long as there are at least as many active players as stops $m$. If on some day $t$ we have that $n_t \le m$, then the strategy we adhere to will make sure that all $n_t$ active players will manage to get lunch during day $t$.

2. Let us clarify that our sample space consists precisely of the restaurants that have not served any customer up to day $t$. The expected number of restaurants in our sample space that remain vacant at the end of day $t$ is given by $R^v_t$. First, we express probabilistically those restaurants of our sample space that remain vacant after all agents visit their first $m$ destinations. We define the family of random variables $R^v_{tj}$, $1 \le j \le n_t$. The random variable $R^v_{tj}$ indicates whether restaurant $r_j$ is vacant or not at the end of day $t$. Specifically, if $R^v_{tj}$ has the value 1, then restaurant $r_j$ is vacant at the end of day $t$, whereas if $R^v_{tj}$ is 0, then $r_j$ is occupied.

$$R^v_{tj} = \begin{cases} 1 & \text{if } r_j \text{ is vacant at the end of day } t \\ 0 & \text{otherwise} \end{cases}, \quad 1 \le j \le n_t. \tag{5.1.v}$$

By combining definition (5.1.v) and equation (19), we deduce that

$$R^v_{tj} = \begin{cases} 1 & \text{with probability } \left(\frac{n_t - m}{n_t}\right)^{n_t} \\ 0 & \text{with probability } 1 - \left(\frac{n_t - m}{n_t}\right)^{n_t} \end{cases}, \quad 1 \le j \le n_t. \tag{5.1.vi}$$

Having done that, we define the random variable $R^v_t$, which counts the number of restaurants that are vacant at the end of day $t$.

$$R^v_t = \sum_{j=1}^{n_t} R^v_{tj}. \tag{5.1.vii}$$

As always in this probabilistic setting, we are interested not in the actual value of the random variable $R^v_t$, but in its expected value $E[R^v_t]$. In view of definition (5.1.vii) and the linearity of the expected value operator, we derive that

$$R^v_t = E[R^v_t] \overset{(5.1.vii)}{=} E\left[\sum_{j=1}^{n_t} R^v_{tj}\right] = \sum_{j=1}^{n_t} E\left[R^v_{tj}\right] \overset{(5.1.vi)}{=} \sum_{j=1}^{n_t} \left(1 \cdot \left(\frac{n_t - m}{n_t}\right)^{n_t}\right) = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}, \tag{5.1.viii}$$

which verifies (20).

3. Recall that our sample space contains exactly those restaurants that have not served any customer up to day $t$. $R^r_t$ denotes the expected number of the restaurants of the sample space that were visited by an agent by the end of day $t$. Now, we define the family of random variables $R^r_{tj}$, $1 \le j \le n_t$, which indicate whether restaurant $r_j$ is occupied or not at the end of day $t$. Specifically, if $R^r_{tj}$ has the value 1, then restaurant $r_j$ is occupied at the end of day $t$, whereas if $R^r_{tj}$ is 0, then $r_j$ is vacant.

$$R^r_{tj} = \begin{cases} 1 & \text{if } r_j \text{ is occupied at the end of day } t \\ 0 & \text{otherwise} \end{cases}, \quad 1 \le j \le n_t. \tag{5.1.ix}$$

By combining definition (5.1.ix) and equation (19), we deduce that

$$R^r_{tj} = \begin{cases} 1 & \text{with probability } 1 - \left(\frac{n_t - m}{n_t}\right)^{n_t} \\ 0 & \text{with probability } \left(\frac{n_t - m}{n_t}\right)^{n_t} \end{cases}, \quad 1 \le j \le n_t. \tag{5.1.x}$$

Having done that, we define the random variable $R^r_t$, which counts the number of restaurants that are occupied at the end of day $t$.

$$R^r_t = \sum_{j=1}^{n_t} R^r_{tj}. \tag{5.1.xi}$$

We are not interested in the actual value of the random variable $R^r_t$, but in its expected value $E[R^r_t]$. In view of definition (5.1.xi) and the linearity of the expected value operator, we derive that

$$R^r_t = E[R^r_t] \overset{(5.1.xi)}{=} E\left[\sum_{j=1}^{n_t} R^r_{tj}\right] = \sum_{j=1}^{n_t} E\left[R^r_{tj}\right] \overset{(5.1.x)}{=} \sum_{j=1}^{n_t} 1 \cdot \left(1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}\right) = n_t \left(1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}\right), \tag{5.1.xii}$$

which verifies (21).

4. The rules of the game stipulate that the number of customers that have not managed to eat lunch at the end of day $t$ is equal to the number of restaurants that have not served any customer at the end of day $t$. Hence, their expected values are also equal, which means that $A^u_t = R^v_t$ and (22) is proved.

5. Likewise, the adopted strategy ensures that the number of active players that succeeded in getting lunch by the end of day $t$ is equal to the number of restaurants that, although they had not served any agent up to day $t$, managed to accommodate a customer by the end of day $t$. Hence, their expected values are also equal, which means that $A^s_t = R^r_t$ and (23) is proved.

6. We are now in a position to assert the expected number of active players.

• At the beginning of the first day, the number of active players is exactly $n$. This trivial observation confirms the initial condition (24).

• As we have previously explained, the adopted strategy in the $m$-DKPRG ensures that the number of agents that have not got lunch at the end of day $t$ is always equal to the number of active players on day $t + 1$. Thus, the expected number of active agents on day $t + 1$ is equal to the expected number of unsatisfied agents at the end of day $t$:

$$n_{t+1} = A^u_t. \tag{5.1.xiii}$$

By combining the previous result (5.1.xiii) with (22), we derive

$$n_{t+1} = n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}, \tag{5.1.xiv}$$

which establishes the validity of (25), as desired.

7. The expected utilization $f_t$ for day $t = 1, 2, \dots$, is the ratio of the expected number of agents that have been served by the end of day $t$ to the total number $n$ of agents.
This last number is equal to the expected number of customers that got lunch on day 1, plus the expected number of the additional customers that got lunch on day 2, and so on. The additional agents of day $t$ are precisely those agents that had failed to get lunch prior to day $t$, but succeeded in eating on day $t$. Their expected number is $A^s_t$, which is given by equation (23). Hence, the total number of agents that have eaten lunch up to and including day $t$ is given by

$$\sum_{d=1}^{t} A^s_d \overset{(23)}{=} \sum_{d=1}^{t} n_d \left(1 - \left(\frac{n_d - m}{n_d}\right)^{n_d}\right), \quad t = 1, 2, \dots. \tag{5.1.xv}$$

An equivalent way to compute this exact number is by subtracting from the total number of agents $n$ the expected number of agents that have still not had lunch at the end of day $t$, which is $A^u_t$, given by equation (22). Thus,

$$n - A^u_t \overset{(22)}{=} n - n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}, \quad t = 1, 2, \dots. \tag{5.1.xvi}$$

Together, the two formulas (5.1.xv) and (5.1.xvi) allow us to conclude that

$$f_t \overset{(5.1.xv)}{=} \frac{\sum_{d=1}^{t} n_d \left(1 - \left(\frac{n_d - m}{n_d}\right)^{n_d}\right)}{n} \overset{(5.1.xvi)}{=} \frac{n - n_t \left(\frac{n_t - m}{n_t}\right)^{n_t}}{n}, \quad t = 1, 2, \dots, \tag{5.1.xvii}$$

which establishes the validity of (26), as desired.

Let us now make an important observation: formula (20) that we derived above, which gives the expected number of vacant restaurants at the end of day $t$, is completely general and subsumes more special formulas found in the literature. Take for example the special case where $t = 1$ and $m = 1$. For these values, (20) computes the expected number of vacant restaurants at the end of day 1 for the standard one-stop KPRP. By subtracting this quantity from $n$, the number of initially available restaurants, and then dividing by $n$, we derive the expected utilization ratio for day 1. Indeed

$$f_1 = \frac{n - n \left(\frac{n-1}{n}\right)^{n}}{n} = 1 - \left(\frac{n-1}{n}\right)^{n}. \tag{28}$$

One assumption that is taken for granted in the literature is that the number of agents $n$ tends to infinity.
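To get a feeling for how much this assumption matters, the exact day-1 value in (28) can be evaluated for a few finite $n$. The Python sketch below does so and prints the gap to $1 - e^{-1}$; the function name `first_day_utilization` and the chosen sample sizes are illustrative assumptions.

```python
from math import exp


def first_day_utilization(n: int) -> float:
    """Formula (28): exact one-stop utilization at the end of day 1."""
    return 1 - ((n - 1) / n) ** n


# Reference value 1 - 1/e for an unbounded number of agents.
limit = 1 - exp(-1)

# Hypothetical sample sizes, chosen only for illustration.
for n in (10, 100, 10_000, 1_000_000):
    exact = first_day_utilization(n)
    print(n, exact, exact - limit)
```

The printed gap is positive for every $n$ and shrinks as $n$ grows, so for finite $n$ the exact utilization lies slightly above $1 - e^{-1}$.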
It is straightforward to see how formula (28) simplifies when $n \to \infty$. We recall a very useful fact from calculus (see for instance [57]), namely that

$$\forall x \in \mathbb{R}, \quad \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^{n} = e^{x}. \tag{29}$$

Under this premise, we see that $\lim_{n \to \infty} f_1 = 1 - e^{-1}$, which is in complete agreement with a well-known result of the literature.

Corollary 5.1.

If we assume that $n \to \infty$, then the following approximations hold, where $t$ is the day in question.

$$n_{t+1} \approx n e^{-tm}, \quad t = 1, 2, \dots, \tag{30}$$

$$A^u_t \approx n e^{-tm}, \quad t = 1, 2, \dots, \tag{31}$$

$$A^s_t \approx n \left(1 - e^{-m}\right) e^{-(t-1)m} = n e^{-(t-1)m} - n e^{-tm}, \quad t = 1, 2, \dots, \tag{32}$$

$$R^v_t \approx n e^{-tm}, \quad t = 1, 2, \dots, \tag{33}$$

$$R^r_t \approx n e^{-(t-1)m} - n e^{-tm}, \quad t = 1, 2, \dots, \tag{34}$$

$$VP_t \approx e^{-m}, \quad t = 1, 2, \dots, \tag{35}$$

$$f_t \approx 1 - e^{-tm}, \quad t = 1, 2, \dots. \tag{36}$$

$$f_\infty \approx 1. \tag{37}$$

Proof. The above approximations are easily proved as shown below.

1. If we invoke property (29), we deduce that $\left(\frac{n - m}{n}\right)^{n} \to e^{-m}$ as $n \to \infty$ and $\left(\frac{n_t - m}{n_t}\right)^{n_t} \to e^{-m}$ as $n_t \to \infty$. When $n$ and $n_t$ do not take very large values, these limits can only serve as good approximations. Thus, it is more accurate to write

$$\left(\frac{n - m}{n}\right)^{n} \approx e^{-m} \quad \text{and} \quad \left(\frac{n_t - m}{n_t}\right)^{n_t} \approx e^{-m}. \tag{5.1.i}$$

Formulas (25) and (24) imply that

$$n_2 = n_1 \left(\frac{n_1 - m}{n_1}\right)^{n_1} \overset{(5.1.i)}{\approx} n e^{-m}. \tag{5.1.ii}$$

In an identical manner, we see that

$$n_3 = n_2 \left(\frac{n_2 - m}{n_2}\right)^{n_2} \overset{(5.1.i)}{\approx} n_2 e^{-m} \overset{(5.1.ii)}{\approx} n e^{-m} e^{-m} = n e^{-2m}. \tag{5.1.iii}$$

Following this line of thought, it is now routine to see that (30) holds.

2. The proof is trivial because the number of customers that have not eaten lunch by the end of day $t$ is equal to the number of active agents at the beginning of day $t + 1$.

3. Considering the fact that $\left(\frac{n_t - m}{n_t}\right)^{n_t} \to e^{-m}$ as $n_t \to \infty$, we may deduce that $1 - \left(\frac{n_t - m}{n_t}\right)^{n_t} \to 1 - e^{-m}$ as $n_t \to \infty$. In this way we have derived the next approximation.

$$1 - \left(\frac{n_t - m}{n_t}\right)^{n_t} \approx 1 - e^{-m}. \tag{5.1.iv}$$

It follows from (30) that

$$n_t \approx n e^{-(t-1)m}. \tag{5.1.v}$$

Together (5.1.iv) and (5.1.v) imply that

$$A^s_t = n_t \left(1 - \left(\frac{n_t - m}{n_t}\right)^{n_t}\right) \overset{(5.1.v),\,(5.1.iv)}{\approx} n \left(1 - e^{-m}\right) e^{-(t-1)m} = n e^{-(t-1)m} - n e^{-(t-1)m} e^{-m} = n e^{-(t-1)m} - n e^{-tm}, \tag{5.1.vi}$$

which proves equation (32), as desired.

4. Again, this result is obvious because the number of vacant restaurants at the end of day $t$ is equal to the number of unsatisfied customers at the end of day $t$.

5. The number of restaurants that served a customer for the first time during day $t$ is equal to the number of agents that managed to get lunch for the first time during day $t$. Hence, the result is immediate.

6. This can be easily shown by substituting in formula (19) the approximation (5.1.i).

7. A good approximation for the total number of restaurants that have been utilized by the end of day $t$ can be found by subtracting from $n$ the approximate expected number of customers that did not have lunch on day $t$. In view of (33), this implies that

$$f_t \overset{(33)}{\approx} \frac{n - n e^{-tm}}{n} = 1 - e^{-tm}, \quad t = 1, 2, \dots. \tag{5.1.vii}$$

Alternatively, one may use the expected number of agents that have eaten lunch by the end of day $t$, which can be computed as the sum $\sum_{d=1}^{t} A^s_d$. Using the relation (32), this sum can be written as

$$\sum_{d=1}^{t} n \left(1 - e^{-m}\right) e^{-(d-1)m} = n \left(1 - e^{-m}\right) \sum_{d=1}^{t} e^{-(d-1)m} = n \left(1 - e^{-m}\right) \frac{1 - e^{-tm}}{1 - e^{-m}} = n \left(1 - e^{-tm}\right), \quad t = 1, 2, \dots. \tag{5.1.viii}$$

In the above derivation, we used a well-known fact (see for instance [58]), namely that the sum of the first $t$ terms of a geometric sequence with first term $g_1$ and ratio $\rho$ is given by the formula $g_1 + g_1 \rho + g_1 \rho^2 + \dots + g_1 \rho^{t-1} = g_1 \frac{1 - \rho^t}{1 - \rho}$. This second way to estimate the expected utilization confirms, as expected, that

$$f_t = \frac{\sum_{d=1}^{t} A^s_d}{n} \overset{(5.1.viii)}{\approx} \frac{n \left(1 - e^{-tm}\right)}{n} = 1 - e^{-tm}, \quad t = 1, 2, \dots, \tag{5.1.ix}$$

which proves equation (36), as desired.

8. A trivial consequence of (36), as $t \to \infty$.

To demonstrate how the exact formulas (19) - (26) reflect the daily evolution of the $m$-DKPRG we study five typical instances of the game. The first four are instances of 2-DKPRG games with substantially different numbers of players. In the first four games, the number of stops $m$ is 2, meaning that each agent may visit two restaurants if the need arises.
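The day-by-day evolution prescribed by the exact formulas can be sketched in a few lines of Python by iterating the recurrence (25). The function name `progression` and the row layout are assumptions of this sketch. Setting the vacancy probability to 0 once $n_t \le m$ is also an assumption, following the remark in the proof of Theorem 5.1 that all remaining active players are then served.

```python
def progression(n: float, m: int):
    """Iterate (25) and return one row (day, n_t, VP_t, A_s, A_u, f_t) per day."""
    rows = []
    n_t, day = float(n), 1
    while n_t > 0:
        if n_t > m:
            vp = ((n_t - m) / n_t) ** n_t  # vacancy probability, formula (19)
        else:
            vp = 0.0  # assumption: with n_t <= m every active player is served
        unserved = n_t * vp       # A^u_t = R^v_t, formulas (22) and (20)
        served = n_t - unserved   # A^s_t = R^r_t, formulas (23) and (21)
        f_t = (n - unserved) / n  # expected utilization, formula (26)
        rows.append((day, n_t, vp, served, unserved, f_t))
        n_t, day = unserved, day + 1  # n_{t+1} = A^u_t, formula (25)
    return rows


# Illustrative run with n = 100 agents and m = 2 stops.
for row in progression(100, 2):
    print(row)
```

Each row describes the end of one day; the next day starts with the `unserved` value of the previous row as its number of active players.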
In the first example the number of agents $n$ is 100, a relatively small number, and its detailed progression is shown in Table 1. The steady state utilization is, as expected, 1 and it is achieved by the end of day 3.

| Number of stops $m = 2$ | Active players $n_t$ | $VP_t$ | $A^s_t = R^r_t$ | $A^u_t = R^v_t$ | $f_t$ |
|---|---|---|---|---|---|
| Start of day 1 | 100 | 1 | 0 | 100 | 0 |
| End of day 1 | 13.26196 | 0.13262 | 86.738 | 13.262 | 0.86738 |
| Start of day 2 | 13.26196 | 1 | 0 | 13.262 | 0.86738 |
| End of day 2 | 1.51737 | 0.11442 | 11.74453 | 1.51743 | 0.98483 |
| Start of day 3 | 1.51737 | 1 | 0 | 1.51743 | 0.98483 |
| End of day 3 | 0 | 0 | 1.51737 | 0 | 1 |

Table 1: This Table demonstrates the progression of the $m$-DKPRG for $m = 2$ and $n = 100$. It can be seen that all restaurants are utilized by the end of day 3.

In the second example the number of agents $n$ is 1000 and its progression is shown in Table 2. The steady state utilization is, as expected, 1 and it is achieved by the end of day 5, i.e., 2 days later compared to the previous example.

| Number of stops $m = 2$ | Active players $n_t$ | $VP_t$ | $A^s_t = R^r_t$ | $A^u_t = R^v_t$ | $f_t$ |
|---|---|---|---|---|---|
| Start of day 1 | 1000 | 1 | 0 | 1000 | 0 |
| End of day 1 | 135.06452 | 0.13506 | 864.94 | 135.06 | 0.86494 |
| Start of day 2 | 135.06452 | 1 | 0 | 135.06 | 0.86494 |
| End of day 2 | 18.00766 | 0.13333 | 117.05637 | 18.00815 | 0.98199 |
| Start of day 3 | 18.00766 | 1 | 0 | 18.00815 | 0.98199 |
| End of day 3 | 2.1614 | 0.12003 | 15.8462 | 2.16146 | 0.99784 |
| Start of day 4 | 2.1614 | 1 | 0 | 2.16146 | 0.99784 |
| End of day 4 | 0.00793 | 0.00367 | 2.15347 | 0.00793 | 0.99999 |
| Start of day 5 | 0.00793 | 1 | 0 | 0.00793 | 0.99999 |
| End of day 5 | 0 | 0 | 0.00793 | 0 | 1 |

Table 2: This Table demonstrates the progression of the $m$-DKPRG for $m = 2$ and $n = 1000$. One may ascertain that all customers eat lunch by the end of day 5.

The third example is more meaningful and interesting because in this case the number $n$ of agents is $10^6$, which may be thought of as representing the average case. Table 3 contains the analytical evolution of this instance.
Even when confronted with a significant number of agents, the steady state utilization 1 is rapidly achieved by the end of day 8. Although it takes longer to reach that stage, utilization upwards of 0.98 is established from day 2.

| Number of stops $m = 2$ | Active players $n_t$ | $VP_t$ | $A^s_t = R^r_t$ | $A^u_t = R^v_t$ | $f_t$ |
|---|---|---|---|---|---|
| Start of day 1 | 1000000 | 1 | 0 | 1000000 | 0 |
| End of day 1 | 135335.01256 | 0.13534 | 864660 | 135340 | 0.86466 |
| Start of day 2 | 135335.01256 | 1 | 0 | 135340 | 0.86466 |
| End of day 2 | 18315.33159 | 0.13533 | 117020.12531 | 18314.88725 | 0.98169 |
| Start of day 3 | 18315.33159 | 1 | 0 | 18314.88725 | 0.98169 |
| End of day 3 | 2478.43991 | 0.13532 | 15836.90092 | 2478.43067 | 0.99752 |
| Start of day 4 | 2478.43991 | 1 | 0 | 2478.43067 | 0.99752 |
| End of day 4 | 335.14966 | 0.13523 | 2143.28048 | 335.15943 | 0.99966 |
| Start of day 5 | 335.14966 | 1 | 0 | 335.15943 | 0.99966 |
| End of day 5 | 45.08663 | 0.13453 | 290.06198 | 45.08768 | 0.99995 |
| Start of day 6 | 45.08663 | 1 | 0 | 45.08768 | 0.99995 |
| End of day 6 | 5.82914 | 0.12929 | 39.25738 | 5.82925 | 0.99999 |
| Start of day 7 | 5.82914 | 1 | 0 | 5.82925 | 0.99999 |
| End of day 7 | 0.50323 | 0.08633 | 5.32591 | 0.50323 | 1 |
| Start of day 8 | 0.50323 | 1 | 0 | 0.50323 | 1 |
| End of day 8 | 0 | 0 | 0.50323 | 0 | 1 |

Table 3: This Table demonstrates the progression of the $m$-DKPRG for $m = 2$ and $n = 1000000$. Although in this case it takes longer to reach the steady state, utilization upwards of 0.98 is established from day 2.

The fourth example is instructive about the behavior of our strategy when a large number of agents is involved. In this case the number $n$ of agents is $10^9$ and, unsurprisingly, it takes 11 days to reach the steady state utilization 1. All the details of the progression of this game are given in Table 4. Careful observation of the data confirms a major characteristic of our distributed game: for $m = 2$ stops the first day utilization is at least 0.86 and it goes over 0.98 from day 2.

| Number of stops $m = 2$ | Active players $n_t$ | $VP_t$ | $A^s_t = R^r_t$ | $A^u_t = R^v_t$ | $f_t$ |
|---|---|---|---|---|---|
| Start of day 1 | 1000000000 | 1 | 0 | 1000000000 | 0 |
| End of day 1 | 135335281.88326 | 0.13534 | 864660000 | 135340000 | 0.86466 |
| Start of day 2 | 135335281.88326 | 1 | 0 | 135340000 | 0.86466 |
| End of day 2 | 18315638.40724 | 0.13534 | 117019004.83318 | 18316277.05008 | 0.98168 |
| Start of day 3 | 18315638.40724 | 1 | 0 | 18316277.05008 | 0.98168 |
| End of day 3 | 2478751.84042 | 0.13534 | 15836799.9052 | 2478838.50204 | 0.99752 |
| Start of day 4 | 2478751.84042 | 1 | 0 | 2478838.50204 | 0.99752 |
| End of day 4 | 335462.31172 | 0.13534 | 2143277.56634 | 335474.27408 | 0.99966 |
| Start of day 5 | 335462.31172 | 1 | 0 | 335474.27408 | 0.99966 |
| End of day 5 | 45399.6163 | 0.13533 | 290064.19707 | 45398.11465 | 0.99995 |
| Start of day 6 | 45399.6163 | 1 | 0 | 45398.11465 | 0.99995 |
| End of day 6 | 6143.89926 | 0.13533 | 39255.68623 | 6143.93007 | 0.99999 |
| Start of day 7 | 6143.89926 | 1 | 0 | 6143.93007 | 0.99999 |
| End of day 7 | 831.21566 | 0.13529 | 5312.69113 | 831.20813 | 1 |
| Start of day 8 | 831.21566 | 1 | 0 | 831.20813 | 1 |
| End of day 8 | 112.22203 | 0.13501 | 718.99323 | 112.22243 | 1 |
| Start of day 9 | 112.22203 | 1 | 0 | 112.22243 | 1 |
| End of day 9 | 14.91613 | 0.13292 | 97.30548 | 14.91655 | 1 |
| Start of day 10 | 14.91613 | 1 | 0 | 14.91655 | 1 |
| End of day 10 | 1.74198 | 0.11679 | 13.17408 | 1.74205 | 1 |
| Start of day 11 | 1.74198 | 1 | 0 | 1.74205 | 1 |
| End of day 11 | 0 | 0 | 1.74198 | 0 | 1 |

Table 4: This Table demonstrates the progression of the $m$-DKPRG for $m = 2$ and $n = 1000000000$. One can see that the first day utilization is at least 0.86 and it goes over 0.98 from day 2.

It is quite straightforward to convince ourselves that playing a 3-stop game is better than playing a 2-stop game. A precise quantitative analysis of the resulting advantages can be performed by considering the exact formulas (19) - (26). Nonetheless, we believe it is expedient to showcase the difference with the following example. The present example resembles the previous one in that the number of agents is the same, namely $10^9$. However, this time each agent may visit up to three restaurants if need be. Such an instance, with a large number of agents, can serve as the best demonstration of the dramatic improvement that can be obtained by an increase in the number of stops. Indeed, the data in Table 5 corroborate this expectation: all restaurants are now utilized by the end of day 8, compared to day 11 before. The utilization at the end of the first day is already an impressive 0.95 and becomes 1, for all practical purposes, at the end of day 6. This last example can be considered a compelling argument for the importance of topological analysis of the network of restaurants.

| Number of stops $m = 3$ | Active players $n_t$ | $VP_t$ | $A^s_t = R^r_t$ | $A^u_t = R^v_t$ | $f_t$ |
|---|---|---|---|---|---|
| Start of day 1 | 1000000000 | 1 | 0 | 1000000000 | 0 |
| End of day 1 | 49787067.67084 | 0.04979 | 950210000 | 49790000 | 0.95021 |
| Start of day 2 | 49787067.67084 | 1 | 0 | 49790000 | 0.95021 |
| End of day 2 | 2478751.91627 | 0.04979 | 47308169.57151 | 2478898.09933 | 0.99752 |
| Start of day 3 | 2478751.91627 | 1 | 0 | 2478898.09933 | 0.99752 |
| End of day 3 | 123409.56708 | 0.04979 | 2355334.85836 | 123417.05791 | 0.99988 |
| Start of day 4 | 123409.56708 | 1 | 0 | 123417.05791 | 0.99988 |
| End of day 4 | 6143.97651 | 0.04979 | 117265.00474 | 6144.56234 | 0.99999 |
| Start of day 5 | 6143.97651 | 1 | 0 | 6144.56234 | 0.99999 |
| End of day 5 | 305.66655 | 0.04975 | 5838.31368 | 305.66283 | 1 |
| Start of day 6 | 305.66655 | 1 | 0 | 305.66283 | 1 |
| End of day 6 | 14.99439 | 0.04905 | 290.67361 | 14.99294 | 1 |
| Start of day 7 | 14.99439 | 1 | 0 | 14.99294 | 1 |
| End of day 7 | 0.52749 | 0.03518 | 14.46689 | 0.5275 | 1 |
| Start of day 8 | 0.52749 | 1 | 0 | 0.5275 | 1 |
| End of day 8 | 0 | 0 | 0.52749 | 0 | 1 |

Table 5: This Table demonstrates the progression of the $m$-DKPRG for $m = 3$ and $n = 1000000000$.

The above examples were studied using the exact formulas (19) - (26). Tables 1 - 5 reflect the daily evolution of the above five instances of the $m$-DKPRG according to the rigorous mathematical description provided by formulas (19) - (26). Figure 10 is a graphical representation of the exact utilization $f_t$ from all the previous examples, as shown in Tables 1 - 5. In this Figure, the generally excellent behavior of this scheme can be easily verified. We point out the rapid convergence to the steady state in a matter of a few days and, especially, the superiority of the three-stop policy. The latter achieves utilization of 0.95 on the first day and 0.99 from the second day.

Figure 10: This figure depicts the exact expected utilization $f_t$, as given by equation (26), for all five instances of the $m$-DKPRG studied in Tables 1 - 5. (Horizontal axis: days $t = 1, 2, \dots$; vertical axis: expected utilization $f_t$; one curve per instance.)

The above remarks must not diminish the value of the approximate formulas (30) - (36). Their value lies in the fact that they provide easy to compute and particularly good approximations for large $n$. A simple comparison of Figure 10 to the approximations shown in Figure 11, which corresponds to the case $m = 2$, and in Figure 12, which depicts the case where $m = 3$, ascertains their accuracy.

Figure 11: This figure depicts the approximate expected utilization $f_t \approx 1 - e^{-2t}$ as given by equation (36) for $m = 2$. (Horizontal axis: days $t = 1, 2, \dots$.)

Figure 12: This figure depicts the approximate expected utilization $f_t \approx 1 - e^{-3t}$ as given by equation (36) for $m = 3$. (Horizontal axis: days $t = 1, 2, \dots$.)

This work explored a completely new angle of the Kolkata Paise Restaurant Problem. The topological layout of the restaurants takes center stage in this new paradigm. Initially, we explicitly stated certain assumptions that are implicitly present in the standard formulation of the game. Having done that, we undertook the radical step to go past them and create an entirely new setting. The critical examination of the topological setting of the game unavoidably enhanced our perception regarding the locations of the restaurants and suggested a more realistic topological layout. We argued that their uniform distribution in the entire city area is the most logical, fair, and probable situation. As a result, we defined a new version of the game that is spatially distributed and, for this reason, it is aptly named the Distributed Kolkata Paise Restaurant Game (DKPRG).

The uniform probabilistic distribution of the restaurants enabled us to rigorously prove that, as their number $n$ increases, the restaurants get closer and the distance between adjacent restaurants decreases. In such a network, every customer has the opportunity to pass through more than one restaurant within the allowed time window. The agents now become travelling salesmen and this led us to suggest the innovative idea that TSP can be used to increase the chances of success in this game. We propose that each agent should use metaheuristics to solve her personalized TSP because metaheuristics produce near-optimal solutions very fast and as such can be easily used in practice. This culminated in the development of a new and more efficient strategy that achieves greater utilization.

After rigorously formulating DKPRG, we proved completely general formulas that assert the increase in utilization of our scheme. We established that utilization ranging from 0.85 to 0.95 is achievable. This was shown in great detail in Tables 1 - 5, which depict the daily progress of characteristic instances of the DKPRG according to the rigorous mathematical description provided by the exact formulas (19) - (26). Apart from the exact formulas, we also derived the approximate formulas (30) - (36). They can be quite useful because they are considerably easier to compute and are exceedingly good approximations for large $n$. This fact is easily corroborated by comparing Figure 10 to the approximations shown in Figures 11 and 12. Let us remark that the derived equations generalize previously presented formulas in the literature. It is worth mentioning that our strategy exhibits very rapid convergence to the steady state of utilization 1.

References

[1] W. B. Arthur, “Inductive reasoning and bounded rationality,”

The American economic review , vol. 84,no. 2, pp. 406–411, 1994.[2] C. H. Yeung and Y. C. Zhang, “Minority games,” 2008.[3] D. Challet and Y.-C. Zhang, “Emergence of cooperation and organization in an evolutionary game,”

Physica A: Statistical Mechanics and its Applications , vol. 246, no. 3-4, pp. 407–418, 1997.[4] P. Banerjee, M. Mitra, and C. Mukherjee, “Kolkata paise restaurant problem and the cyclically fairnorm,” in

Econophysics of Systemic Risk and Network Dynamics , pp. 201–216, Springer, 2013.[5] J. F. Bonnans and A. Shapiro,

Perturbation analysis of optimization problems . Springer Science &Business Media, 2013.[6] D. Feillet, P. Dejax, and M. Gendreau, “Traveling salesman problems with proﬁts,”

TransportationScience , vol. 39, pp. 188–205, may 2005.[7] B. K. Chakrabarti, “Kolkata restaurant problem as a generalised el farol bar problem,” in

Econophysics of Markets and Business Networks, pp. 239–246, Springer, 2007.
[8] A. S. Chakrabarti, B. K. Chakrabarti, A. Chatterjee, and M. Mitra, “The Kolkata Paise Restaurant problem and resource utilization,” Physica A: Statistical Mechanics and its Applications, vol. 388, no. 12, pp. 2420–2426, 2009.
[9] A. Ghosh, A. S. Chakrabarti, and B. K. Chakrabarti, “Kolkata Paise Restaurant problem in some uniform learning strategy limits,” in Econophysics and Economics of Games, Social Choices and Quantitative Techniques, pp. 3–9, Springer, 2010.
[10] B. K. Chakrabarti, A. Chatterjee, A. Ghosh, S. Mukherjee, and B. Tamir, Econophysics of the Kolkata Restaurant Problem and Related Games. Springer International Publishing, 2017.
[11] A. Ghosh, A. Chatterjee, M. Mitra, and B. K. Chakrabarti, “Statistics of the Kolkata Paise Restaurant problem,” New Journal of Physics, vol. 12, no. 7, p. 075033, 2010.
[12] A. Ghosh, S. Biswas, A. Chatterjee, A. S. Chakrabarti, T. Naskar, M. Mitra, and B. K. Chakrabarti, “Kolkata Paise Restaurant problem: An introduction,” in Econophysics of Systemic Risk and Network Dynamics, pp. 173–200, Springer, 2013.
[13] D. Ghosh and A. S. Chakrabarti, “Emergence of distributed coordination in the Kolkata Paise Restaurant problem with finite information,” Physica A: Statistical Mechanics and its Applications, vol. 483, pp. 16–24, 2017.
[14] P. Yang, K. Iyer, and P. I. Frazier, “Mean field equilibria for competitive exploration in resource sharing settings,” in Proceedings of the 25th International Conference on World Wide Web, pp. 177–187, 2016.
[15] S. Agarwal, D. Ghosh, and A. S. Chakrabarti, “Self-organization in a distributed coordination game through heuristic rules,” The European Physical Journal B, vol. 89, no. 12, p. 266, 2016.
[16] I. Milchtaich, “Congestion games with player-specific payoff functions,” Games and Economic Behavior, vol. 13, no. 1, pp. 111–124, 1996.
[17] L. Martin, “Extending Kolkata Paise Restaurant problem to dynamic matching in mobility markets,” Junior Management Science, vol. 4, no. 1, 2019.
[18] F. Abergel, B. K. Chakrabarti, A. Chakraborti, and A. Ghosh,
Econophysics of Systemic Risk and Network Dynamics. Springer, 2012.
[19] K. Sharma, A. S. Chakrabarti, A. Chakraborti, S. Chakravarty, et al., “The saga of KPR: Theoretical and experimental developments,” arXiv preprint arXiv:1712.06358, 2017.
[20] T. Park and W. Saad, “Kolkata Paise Restaurant game for resource allocation in the Internet of Things,” in , pp. 1774–1778, IEEE, 2017.
[21] A. Sinha and B. K. Chakrabarti, “Phase transition in the Kolkata Paise Restaurant problem,” Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 30, p. 083116, Aug. 2020.
[22] D. A. Meyer, “Quantum strategies,” Physical Review Letters, vol. 82, no. 5, p. 1052, 1999.
[23] J. Eisert, M. Wilkens, and M. Lewenstein, “Quantum games and quantum strategies,” Physical Review Letters, vol. 83, no. 15, p. 3077, 1999.
[24] A. Li and X. Yong, “Entanglement guarantees emergence of cooperation in quantum prisoner’s dilemma games on networks,” Scientific Reports, vol. 4, Sep. 2014.
[25] X. Deng, Q. Zhang, Y. Deng, and Z. Wang, “A novel framework of classical and quantum prisoner’s dilemma games on coupled networks,” Scientific Reports, vol. 6, Mar. 2016.
[26] K. Giannakis, G. Theocharopoulou, C. Papalitsas, S. Fanarioti, and T. Andronikos, “Quantum conditional strategies and automata for prisoners’ dilemmata under the EWL scheme,” Applied Sciences, vol. 9, p. 2635, Jun. 2019.
[27] T. Andronikos, A. Sirokofskich, K. Kastampolidou, M. Varvouzou, K. Giannakis, and A. Singh, “Finite automata capturing winning sequences for all possible variants of the PQ penny flip game,” Mathematics, vol. 6, p. 20, Feb. 2018.
[28] K. Giannakis, C. Papalitsas, K. Kastampolidou, A. Singh, and T. Andronikos, “Dominant strategies of quantum games on quantum periodic automata,” Computation, vol. 3, pp. 586–599, Nov. 2015.
[29] K. Kastampolidou, M. N. Nikiforos, and T. Andronikos, “A brief survey of the prisoners’ dilemma game and its potential use in biology,” in Advances in Experimental Medicine and Biology, pp. 315–322, Springer International Publishing, 2020.
[30] G. Theocharopoulou, K. Giannakis, C. Papalitsas, S. Fanarioti, and T. Andronikos, “Elements of game theory in a bio-inspired model of computation,” in , IEEE, Jul. 2019.
[31] K. Kastampolidou and T. Andronikos, “A survey of evolutionary games in biology,” in Advances in Experimental Medicine and Biology, pp. 253–261, Springer International Publishing, 2020.
[32] P. Sharif and H. Heydari, “Strategies in a symmetric quantum Kolkata restaurant problem,” in AIP Conference Proceedings 1508, AIP, 2012.
[33] P. Sharif and H. Heydari, “An introduction to multi-player, multi-choice quantum games: Quantum minority games & Kolkata restaurant problems,” in Econophysics of Systemic Risk and Network Dynamics, pp. 217–236, Springer, 2013.
[34] M. Ramzan, “Three-player quantum Kolkata restaurant problem under decoherence,” Quantum Information Processing, vol. 12, no. 1, pp. 577–586, 2013.
[35] B. F. Voigt, “Der Handlungsreisende, wie er sein soll und was er zu thun hat, um Aufträge zu erhalten und eines glücklichen Erfolgs in seinen Geschäften gewiss zu sein,” Commis-Voyageur, Ilmenau. Neuaufgelegt durch Verlag Schramm, Kiel, 1981.
[36] N. Ascheuer, M. Fischetti, and M. Grötschel, “Solving the asymmetric travelling salesman problem with time windows by branch-and-cut,” Mathematical Programming, vol. 90, pp. 475–506, May 2001.
[37] G. Gutin and A. P. Punnen, The Traveling Salesman Problem and Its Variations, vol. 12. Springer Science & Business Media, 2006.
[38] J. Jones and A. Adamatzky, “Computation of the travelling salesman problem by a shrinking blob,”
Natural Computing, vol. 13, pp. 1–16, Oct. 2013.
[39] L. Bianchi, M. Dorigo, L. M. Gambardella, and W. J. Gutjahr, “A survey on metaheuristics for stochastic combinatorial optimization,” Natural Computing, vol. 8, pp. 239–287, Jun. 2009.
[40] C. Blum and A. Roli, “Metaheuristics in combinatorial optimization: Overview and conceptual comparison,” ACM Computing Surveys (CSUR), vol. 35, no. 3, pp. 268–308, 2003.
[41] C. Papalitsas, K. Giannakis, T. Andronikos, D. Theotokis, and A. Sifaleras, “Initialization methods for the TSP with time windows using variable neighborhood search,” in , IEEE, Jul. 2015.
[42] C. Papalitsas, P. Karakostas, and K. Kastampolidou, “A quantum inspired GVNS: Some preliminary results,” in Advances in Experimental Medicine and Biology, pp. 281–289, Springer International Publishing, 2017.
[43] C. Papalitsas, P. Karakostas, T. Andronikos, S. Sioutas, and K. Giannakis, “Combinatorial GVNS (general variable neighborhood search) optimization for dynamic garbage collection,” Algorithms, vol. 11, p. 38, Mar. 2018.
[44] C. Papalitsas, P. Karakostas, K. Giannakis, A. Sifaleras, and T. Andronikos, “Initialization methods for the TSP with time windows using qGVNS,” in , Jun. 2017.
[45] C. Papalitsas and T. Andronikos, “Unconventional GVNS for solving the garbage collection problem with time windows,” Technologies, vol. 7, p. 61, Aug. 2019.
[46] C. Papalitsas, T. Andronikos, and P. Karakostas, “Studying the impact of perturbation methods on the efficiency of GVNS for the ATSP,” in Variable Neighborhood Search, pp. 287–302, Springer International Publishing, 2019.
[47] C. Papalitsas, P. Karakostas, and T. Andronikos, “A performance study of the impact of different perturbation methods on the efficiency of GVNS for solving TSP,” Applied System Innovation, vol. 2, p. 31, Sep. 2019.
[48] C. Papalitsas, T. Andronikos, K. Giannakis, G. Theocharopoulou, and S. Fanarioti, “A QUBO model for the traveling salesman problem with time windows,” Algorithms, vol. 12, p. 224, Oct. 2019.
[49] J.-F. Mertens, “Supergames,” in Game Theory, pp. 238–241, Springer, 1989.
[50] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. John Wiley & Sons, 1985.
[51] C. Rego, D. Gamboa, F. Glover, and C. Osterman, “Traveling salesman problem heuristics: Leading methods, implementations and latest advances,” European Journal of Operational Research, vol. 211, no. 3, pp. 427–441, 2011.
[52] R. Johnson and M. G. Pilcher, “The traveling salesman problem, edited by E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, John Wiley & Sons, Chichester, 1985, 463 pp,” Networks, vol. 19, pp. 615–616, Aug. 1989.
[53] J. Pitman, Probability. Springer New York, 1993.
[54] A. Klenke, Probability Theory. Springer London, 2014.
[55] J. Munkres, Topology. Upper Saddle River, NJ: Prentice Hall, Inc, 2000.
[56] A. DasGupta, Fundamentals of Probability: A First Course. Springer-Verlag GmbH, 2010.
[57] C. E. Robert Adams, Calculus. Pearson Education (US), 2016.
[58] S. Axler,