The Curse of Shared Knowledge: Recursive Belief Reasoning in a Coordination Game with Imperfect Information
Thomas Bolander, Robin Engelhardt, and Thomas S. Nicolet

Department of Applied Mathematics and Computer Science, Technical University of Denmark, Richard Petersens Plads, building 324, DK-2800 Lyngby, Denmark

Center for Information and Bubble Studies, Department of Communication, University of Copenhagen, Karen Blixens Plads 8, DK-2300 Copenhagen S
Common knowledge is a necessary condition for safe group coordination. When common knowledge cannot be obtained, humans routinely use their ability to attribute beliefs and intentions in order to infer what is known. But such shared knowledge attributions are limited in depth and therefore prone to coordination failures, because any finite-order knowledge attribution allows for an even higher-order attribution that may change what is known by whom. In three separate experiments we investigate to what degree human participants (N=802) are able to recognize the difference between common knowledge and nth-order shared knowledge. We use a new two-person coordination game with imperfect information that is able to cast the recursive game structure and higher-order uncertainties into a simple, everyday-like setting. Our results show that participants have a very hard time accepting the fact that common knowledge is not reducible to shared knowledge. Instead, participants try to coordinate even at the shallowest depths of shared knowledge and in spite of huge payoff penalties.
Introduction
Successful group coordination requires complementary choices among group members, which, in turn, requires communication of beliefs and intentions in such a way that they become common knowledge [1]. A fact is said to be common knowledge if everyone knows it, and everyone knows everyone knows it, and everyone knows everyone knows everyone knows it, and so on, ad infinitum [2–5]. If the premise of "everyone knows" is not infinitely nested, but only nested to finite depth, we instead have shared knowledge. If there is no nested knowledge about knowledge at all and not everyone necessarily knows the fact, we have private knowledge.

Let us illustrate the difference between these notions with an example: Two friends, Agnes and Bertram, are taking different trekking routes to the top of a mountain. In the morning they agree that if the weather gets bad, they will go back down and sleep in the mountain hut at the base. Otherwise, they strongly prefer to stay overnight at the top. The equipment essential for an overnight stay at the top has been divided between their backpacks. It is therefore crucial that if one of them decides to go to the top, the other one does the same. On the way to the top, they both observe a thunderstorm approaching, but are uncertain about whether the other person has seen it. At this point they both know the fact that "a thunderstorm approaches", but don't know whether the other knows. In this situation, we would say that Agnes and Bertram both have private knowledge that a thunderstorm approaches, and since they both know it, it is also shared knowledge between them. More generally, a fact is private knowledge in a group of agents if some non-empty subset of the agents knows the fact. For the special case where everybody in the group knows it, we say that there is shared knowledge to depth one (or first-order shared knowledge) of the fact [3].

Since Bertram doesn't know whether Agnes knows about the thunderstorm, he would like to warn her.
So he sends a text message: "Thunderstorm approaching. Let's meet at base." However, due to the unstable mobile network signals, he is not certain that the message will go through. Therefore he asks Agnes to confirm that she has received the message. A few minutes later, he receives her confirmation. At this point it has become shared knowledge to depth two (or second-order shared knowledge) that a thunderstorm is approaching: She knows that he knows, since she received his message, and he knows that she knows, because he received her confirmation. To have shared knowledge to depth three (third-order shared knowledge), it would additionally be required that A) he knows that she knows that he knows, and B) that she knows that he knows that she knows. In fact, A already holds, since she confirmed receiving his message. However, B doesn't hold, since she will be uncertain about whether her confirmation was received in good order. Thus we have an asymmetry in the level of knowledge of the two agents. If she also asks him to confirm her message, and she receives such a confirmation, then of course she will get to know that he knows that she knows. Then there will be shared knowledge to depth three. However, there will still be a (higher-order) knowledge asymmetry, since Bertram can't be certain that the last message was received.

How many messages back and forth does it take for Agnes and Bertram to coordinate going back to the base in the evening? At first it might seem that it is sufficient for both of them to know that at least one of them plans to go to the base. However, that is not so. After the first message has been received, both know that Bertram plans to go to the base, but he is still uncertain whether she knows. And if she doesn't, he might risk leaving her alone at the top. So shared knowledge to depth one is clearly not sufficient. Even shared knowledge to depth two is not sufficient.
After she has confirmed receiving his original message, from her perspective it is still entirely possible that he doesn't know that she knows, and hence, he might decide to go to the top to not leave her alone. And in this case, she also has to go to the top in order not to leave him alone. This argument can be generalised to prove that even nth-order shared knowledge for any arbitrarily large n is insufficient for safe coordination. The consequence is that no finite number of messages successfully delivered will guarantee that Agnes and Bertram manage to meet at the base. In order to guarantee meeting at the base, they would need to have shared nth-order knowledge for all n ∈ N. Shared nth-order knowledge of some fact for all n ∈ N is called common knowledge of the fact. For a more formal definition of private, shared and common knowledge, see the Supplementary Information.

In practice, shouldn't it be possible to coordinate meeting at the base after one or two messages being sent back and forth? Isn't it a purely mathematical problem with no practical implications for humans trying to coordinate their actions? One of the main goals of this paper is to argue that the answer to both questions is no. In order to make that argument, we have designed a coordination game that has a similar underlying mathematical structure as the example just given, but cast in a simpler, more everyday-like setting, where the higher-order uncertainty is established already when the game is initialised and not, as above, through a series of message passings. The game was originally developed to illustrate how the difference between shared and common knowledge can have a real impact on human behaviour. Humans generally find the concept of common knowledge hard to grasp, and even harder to grasp the practical relevance of, due to the unbounded nesting of knowledge involved.
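These notions can be stated compactly in standard epistemic-logic notation. The following is a sketch of the usual textbook definitions (cf. [3]); the Supplementary Information gives the full formal treatment. Writing K_i φ for "agent i knows φ" and G for the group of agents:

```latex
\begin{align*}
  E_G\,\varphi \;&:=\; \bigwedge_{i \in G} K_i\,\varphi
     && \text{(shared knowledge to depth one)}\\
  E_G^{\,n+1}\varphi \;&:=\; E_G\,E_G^{\,n}\varphi
     && \text{(shared knowledge to depth } n+1\text{)}\\
  C_G\,\varphi \;&:=\; \bigwedge_{n \geq 1} E_G^{\,n}\varphi
     && \text{(common knowledge)}
\end{align*}
```

The insufficiency argument above amounts to saying that no finite conjunct of the last definition can replace the infinite conjunction.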
The game we developed intends to make it clear that human intuitions about common knowledge can be misleading and may have costly consequences.

(The insufficiency of nth-order shared knowledge can be proved as follows. To derive a contradiction, suppose nth-order shared knowledge for some n is sufficient to make it safe to go to the base, in the sense of guaranteeing that the other person will also go there. This implies that the successful delivery of the nth message is sufficient to guarantee meeting at the base. There must then exist a smallest number n such that the successful delivery of the nth message is sufficient to guarantee meeting at the base. Since n is the smallest such number, the successful delivery of the (n − 1)th message is not sufficient. Even after the nth message has been successfully delivered, the sender of this message is still uncertain about whether it was actually received, and hence the sender is only certain that the first n − 1 messages were delivered. So the sender of the nth message considers it possible, even after the successful delivery of the message, that only the first n − 1 messages were delivered, in which case it is not safe to go to the base. Hence that person will, after having sent the nth message, still consider it unsafe to go to the base, and will choose to go to the top. This is a contradiction, completing the proof. This proof is stated in rather informal terms, but can be turned into a formal, mathematical proof [1].)

The curse of shared knowledge
Reasoning about the knowledge of others, their reasoning about you, and your reasoning about their reasoning, and so on, is famous in cognitive science for its presumed computational intractability [6]. Because of this, coordinating species typically use heuristic shortcuts in order to work with nested knowledge states like common knowledge, such as joint perceptual cues and broadcasted signals [7–9]. Humans may obtain common knowledge via mutually accessible first-order sensory experiences [10–13], eye contact [14], public rituals and conventions [2], or salient focal points [15]. What is most prominently believed to distinguish human coordination from that of other animals, however, is the enormous flexibility with which humans can imagine and articulate the mental states of their peers [16, 17]. The abilities to blush, to tell jokes, and to write novels testify that humans readily attribute higher-order beliefs, intentions, and reasoning capabilities to other people, such as thinking explicitly about the mental states of others who think about the thoughts and beliefs of others and so on, while at the same time appreciating that those thoughts and beliefs can differ from each other and from reality. Such higher-order cognition has seen substantial scientific attention, and has brought about various technical terms such as "theory of mind" (ToM) [18], "mentalizing" [19], "mind reading" [20, 21], "mental models" [22], "mind perception" [23], "perspective taking", and "social intelligence" [24], which often are used interchangeably for studying the cognitive mechanisms of shared knowledge, but sometimes focus on slightly different ideas and associated meanings [25], depending on the field of investigation.

Looking at the ToM literature, conclusions about human belief reasoning abilities are rather heterogeneous [26, 27]: Human reasoning about the reasoning processes of other humans is limited [18, 28–33], contextual, and possibly domain specific [34–36].
Three-year-old children tend to fail in the well-known first-order false belief tasks by falsely assuming that their private information is shared by others, while second-order false belief tasks are mastered around ages 5–7 [37, 38]. Adults may reliably master up to four orders [39], but still have difficulties ignoring the private information they possess when assessing the beliefs of others, resulting in a curse of knowledge bias which can compromise their ability to make predictions about other people's beliefs and actions [1, 40]. When the nested mental states represent a succession of different people, such as "Alice thinks that Bob thinks that Carol is contemplating the idea that David is thinking about Evelyn", we have less problems following along than when the nested mental states are successions of the same people over and over again, and thus are truly recursive, such as "I think that you think that I contemplate the idea that you are thinking about me". (This holds especially when those successions of mental states are qualified by psychological attribute words, such as "Alice thinks that Bob is mistakenly worrying that Carol is offended by misunderstanding something Dave had said to Evelyn" [42].) We get confused more easily by the latter formulation, since we need to keep track of several representations of ourselves and of the other, each representation differing in its perspective and in the number of mental states it presupposes [6]. When humans compete or try to detect cheaters, higher-order belief reasoning seems to perform better than lower-order belief reasoning [44]. In negotiations and other mixed-motive situations, where innuendo, threats, bribes and other kinds of indirect propositions are common, humans are very good at the strategic use of higher-order belief reasoning, for instance as a means to prevent common knowledge in certain groups of agents, or as a means to form specific knowledge alliances [45–47]. ToM proficiency may also be facilitated by providing games with a stepwise increase in ToM [48].

In pure coordination problems, such as pedestrians choosing sides, or people agreeing on new words or on new technical standards, common knowledge is the preferred informational state for all members of the group, because it ensures that all sides find an optimal common equilibrium. If there are no or limited means by which to communicate, however, people face an equilibrium-selection problem for which neither game theory nor the ToM literature has any clear solution. Although some experimental evidence [49] suggests that higher-order ToM reasoning may improve coordination efforts, other work seems to suggest that coordination favours lower orders of ToM sophistication [50, 51]. The challenge of tacit coordination is particularly relevant for artificial intelligence research and for social cognitive robotics, where the implementation of ToM-like processes into artificial social agents is believed to be an important step towards reliable human-robot interaction [52–55].

Recently, researchers have investigated whether humans have adapted specifically to recognizing common knowledge as a separate cognitive category, distinct from both private and shared knowledge [6]. Controlled pure coordination experiments in social settings on market collaboration [56], the bystander effect [57], indirect speech [58], self-conscious emotions [59], and charity [60] consistently find that people indeed make strategically different choices under common knowledge conditions (typically presented in the form of public announcements), compared to situations in which there is only private knowledge (in the form of private messages) or shared knowledge (private messages that elaborate on the depth of knowledge of other participants). Apart from seeing a clear benefit of common knowledge, some of these studies also showed that people have a hard time discriminating between various orders of shared knowledge, and that coordination efforts do not correlate with payoff conditions [56], which is in contrast to the assumptions of standard rational choice theory in which payoffs are expected to be maximized [61].

So if humans indeed have adapted to recognize common knowledge in the wild, the question remains if they are also able to recognize the difference between common knowledge and nth-order shared knowledge for some (potentially large) n. In other words, while humans are able to reliably detect proper common knowledge in a wide range of situations, how good are they at refraining from inferring common knowledge in situations with only nth-order shared knowledge? We do not know, in part because many existing experimental designs stop after describing 2–3 orders of belief reasoning to the participants, as higher orders require quite convoluted sentences that tend to become incomprehensible and increase experimental error. Or, as in the mountain trekking example, they require reasoning about the consequences of a high number of (message passing) actions that each change the mental state of the involved agents.

The latter has been explored theoretically in the 'electronic mail game' by Rubinstein [62], a game version of the mountain trekking example presented above (and of the structurally equivalent 'coordinated attack problem' [1]). The Rubinstein paper shows that 'almost common knowledge', in the sense of nth-order shared knowledge for some large n, leads to a very different expected player behaviour than 'absolute common knowledge'.
Essentially his conclusion, translated into the context of the mountain trekking example, is that common knowledge will make the two mountain hikers both go to the base, whereas if there is only nth-order shared knowledge for some n, then both will meet at the top, independent of n and despite the bad weather conditions (resulting in non-maximal payoffs). Rubinstein does a pure game-theoretic analysis of the game with no experiments, and only speculates what people playing the game might do. We find it interesting to dig deeper into how humans would play and reason about such games in practice. What would their intuition recommend them to do? Which depth of shared knowledge (if any) would be enough to attempt risky coordination that would lead to maximal payoff if successful? How would they be certain that the person they try to coordinate with thinks that the same depth is sufficient?

The electronic mail game and the mountain trekking example are complicated in terms of the dynamics of iterated message passing. In this paper, we devise a novel game in which the higher orders of shared knowledge are not achieved dynamically via actions, but are already present at the beginning of the game, using uncertainty about arrival times. This, we believe, makes the game easier to understand. Letting humans play our game, we have been able to address the previous questions in more detail. Our results show that people indeed have a very hard time accepting the fact that common knowledge is not reducible to shared knowledge of finite depth. On the contrary, participants try to coordinate even at the shallowest depths of shared knowledge and in spite of huge payoff risks. The reason, we believe, is that the sole presence of shared knowledge is enough to make participants try to coordinate, and that moderate depths of shared knowledge become effectively indistinguishable from common knowledge due to the recursive nature of the game.
We call this effect "the curse of shared knowledge" because even small depths of shared knowledge raise participants' expectation of being able to coordinate, in spite of repeated payoff penalties for having miscoordinated before.

Experimental Design
The experiment is designed as a two-player coordination game with imperfect information. The game is inspired by the structure of the consecutive number riddle, also called the Conway paradox, see e.g. van Emde Boas et al. [63], van Ditmarsch and Kooi [64]. Our game is framed as an everyday situation, where two colleagues arrive at their workplace in the morning, and have to decide whether to meet in the canteen for a morning coffee or go straight to their offices and start working immediately. We call the game the 'Canteen Dilemma'. The purpose of framing it in an everyday situation is to attempt to make some of the recursive reasoning easier to comprehend [65, 66]. The introductory story of the game goes as follows:

"Every morning you arrive at work between 8:10 am and 9:10 am. You and your colleague will arrive by bus 10 minutes apart. Example: You arrive at 8:30 am. Your colleague may arrive at 8:20 am, or 8:40 am. Both of you like to meet in the canteen for a cup of coffee. If you arrive before 9:00 am, you have time to go to the canteen, but you should only go if your colleague goes to the canteen as well. If you or your colleague arrive at 9:00 am or after, you should go straight to your offices."

The game has 10 rounds on MTurk and participants are told that at the beginning of each round they will know only their own arrival time, and based on this will have to decide whether to go to the canteen or the office. After choosing an option, participants are asked to estimate their certainty that their colleague will choose the same option (on a five-point Likert scale). We call the value chosen the certainty estimate of the participant. A fixed participation fee of $2 is given to all players who finish the game.
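The decision rules in the introductory story can be summarized in a short sketch. This is our own illustration, not part of the experimental software; the function name, the minutes-after-8:00 time encoding, and the "forbidden" label are ours, and the exact penalty amounts are specified in Materials and Methods.

```python
def outcome(my_time, my_choice, their_time, their_choice):
    """Classify one round of the Canteen Dilemma.

    Times are minutes after 8:00 (so 9:00 am is 60); choices are
    "canteen" or "office". Returns the outcome category."""
    # Going to the canteen at 9:00 am or after is a forbidden choice.
    if (my_choice == "canteen" and my_time >= 60) or \
       (their_choice == "canteen" and their_time >= 60):
        return "forbidden"
    if my_choice == their_choice == "canteen":
        return "canteen"          # best outcome: coordinated coffee
    if my_choice == their_choice == "office":
        return "office"           # second best: coordinated work
    return "miscoordination"      # worst outcome: one colleague waits alone
```

For example, arriving at 8:40 and 8:50 with both choosing the canteen coordinates into the canteen, while a canteen/office split counts as miscoordination.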
Additional bonuses are calculated with a logarithmic scoring rule whereby each participant is given an initial bonus of $10, which is then reduced by a variable penalty in each round, depending on the players' decisions and certainty estimates.

The game has three possible outcomes: 1) both choose the canteen, which we refer to as coordination into the canteen; 2) both choose their respective offices, which we refer to as coordination into the offices; 3) one chooses the canteen and the other chooses the office, which we refer to as miscoordination. Penalties are tiered in such a way that a small penalty is deducted for successful coordination into the canteen (achieving the highest payoff), which is doubled for coordination into the offices (achieving the second-highest payoff), while the penalty for miscoordination or forbidden choices, i.e. going to the canteen at 9 am or after, is much larger (up to 921 times larger, meaning a significantly lower payoff than the previous two). See Materials and Methods for details about the payoff structure. In the instructions shown to the participants beforehand, we also include payoff examples of both successful and failed coordinations. Screenshots and full descriptions of the experimental setup can be found in the Supplementary Information.

For the main experiment, we recruited a total of 680 participants from Amazon Mechanical Turk (MTurk) to play for a maximum of 10 rounds, making a total of n = 4260 choices. In addition, we conducted two supplementary classroom experiments with 80 students (DTU1; n = 2160) from the Technical University of Denmark (DTU) taking a course on Artificial Intelligence and Multi-Agent Systems, and 42 additional students (DTU2; n = 1012) taking an introductory course in Artificial Intelligence. The two classroom experiments differed slightly from the MTurk experiment in that the students got an initial bonus (endowment) of $30 instead of $10, and played 30 rounds instead of 10.
Also, in the classroom experiments, all students were told that they would not receive any monetary rewards for playing the game, but they should still try to do their best. The students also had to answer a few additional post-game questions; see the Supplementary Information for a full list of those questions.

Game Strategies
What are the relevant strategies for this game? First note that going to the canteen at 9:00 or after results in the worst possible payoff. So both players should always go to the office if they arrive at 9:00 or after. How about if both arrive strictly before 9:00? If both choose canteen, they get a better payoff than if both choose office. Now consider a case where you are one of the players, and you arrive at 8:50. Then your colleague will be arriving at either 9:00 or 8:40. If your colleague arrives at 9:00, she has to choose office according to the previous argument, and then you would have to choose office as well to avoid the large penalty of miscoordination. However, if your colleague arrives at 8:40, you may both choose the canteen, and this will lead to the highest payoff. In other words, depending on the arrival time of your colleague, a piece of information that you don't have access to, the best choice is either office or canteen. So which one to choose?

Since the penalty of miscoordination is very high, it would seem best to choose office. What if you then instead arrive at 8:40? In this case, your colleague either arrives at 8:30 or 8:50. In both cases, you have time to meet for a cup of coffee in the canteen, and doing so will give you the highest payoff. At first, it might seem like an easy choice. However, we just concluded that the best strategy at 8:50 would be to go to the office. So, if you arrive at 8:40 and contemplate that your colleague might arrive at 8:50, and if you believe your colleague would reason as yourself and go to the office at 8:50, then you also ought to go to the office at 8:40. This argument can of course be iterated, because if the optimal choice at 8:40 is to go to the office, then the optimal choice at 8:30 must also be to go to the office. In other words, the optimal strategy seems to be to always go to the office, independent of arrival time!
And, indeed, so it is. If both players go to the office in all rounds and declare the highest possible certainty in their decision, they will both leave the experiment with $9.80, excluding the $2 participation fee. This is the highest possible payoff that can be guaranteed by any strategy in the game, and very close to the $10 that the players start out with. As we will see later, the payoffs that people actually get when playing the game are significantly lower than this.

The all-office strategy described above, where you always decide to go to the office independent of arrival time, is a safe strategy if both players follow it. By safe is meant that there is never any risk of miscoordination, and hence no risk of getting the highest penalty (the penalty for miscoordination is up to $9.21 in a single round). It is actually the only safe strategy. The reason is that if at least one of the players, say a, has the strategy of going to the canteen at some time t before 9:00, then since they both have to go to the office at 9:00 or later, there must exist at least one pair of arrival times for which the two players are miscoordinated.

The fact that the all-office strategy is the only safe one is counter-intuitive to most people before being presented with the proof, and for some people even after. The issue is that, intuitively, it would seem to be safe to go to the canteen at, say, 8:30. Why would you ever go to the office that early? You know that your colleague will then be arriving at 8:40 at the latest, which still leaves plenty of time to get a cup of coffee before 9:00. The issue is of course that if you take the perspective of your colleague, then your colleague arriving at 8:40 will consider it possible that you arrived at 8:50. And if you had indeed arrived at 8:50, you would consider it possible that your colleague had arrived at 9:00.
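The unravelling argument above, and the claim that the all-office strategy is the only safe one, can be verified by brute force over the six possible arrival-time pairs. The following is a minimal sketch under our own encoding (times as minutes after 8:00; function names are ours), not part of the experiment:

```python
from fractions import Fraction

# The six equally likely arrival pairs (8:10, 8:20), ..., (9:00, 9:10),
# encoded as minutes after 8:00.
PAIRS = [(t, t + 10) for t in range(10, 61, 10)]

def cutoff_strategy(cutoff):
    """Go to the canteen iff arriving at or before `cutoff`
    (and always office from 9:00 onward, as the rules require)."""
    return lambda t: "canteen" if t <= cutoff and t < 60 else "office"

def is_safe(strategy):
    """Safe = two players both following it can never miscoordinate."""
    return all(strategy(a) == strategy(b) for a, b in PAIRS)

def p_miscoordination(strategy):
    """Per-round miscoordination probability under uniform arrival pairs."""
    bad = sum(strategy(a) != strategy(b) for a, b in PAIRS)
    return Fraction(bad, len(PAIRS))

# Cutoff 0 means "never go to the canteen", i.e. the all-office strategy.
safe = [c for c in range(0, 61, 10) if is_safe(cutoff_strategy(c))]
print(safe)                                    # -> [0]: only all-office is safe
print(p_miscoordination(cutoff_strategy(50)))  # canteen-before-9 strategy -> 1/6
```

Every cut-off that ever sends a player to the canteen leaves at least one arrival pair (the one straddling the cut-off) miscoordinated, which is exactly the footnoted induction in prose form.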
In that case you would be forced to choose the office. A major point of our experiments is to test whether this kind of recursive perspective-taking is utilized by human players of the game.

(To see why no strategy involving the canteen is safe: Since a chooses to go to the canteen at time t, player b also has to go to the canteen at time t + 10, since otherwise they would be miscoordinated when a arrives at t and b at t + 10. But if b goes to the canteen at time t + 10, a also has to go to the canteen at time t + 20, since otherwise they would be miscoordinated when b arrives at t + 10 and a at t + 20. This can be generalized to conclude that a would have to go to the canteen at any time t + 20x for x ≥ 0, and b would have to go to the canteen at any time t + 10 + 20y for y ≥ 0. Clearly this implies going to the canteen after 9:00.)

The argument of the all-office strategy being safe of course relies on the other player following the same strategy. Since we don't allow players to agree on a strategy with their co-player beforehand, the all-office strategy doesn't necessarily in practice lead to the highest payoff for a particular player. Another issue is that one might decide to play risky instead of safe. Consider the canteen-before-9 strategy of always going to the canteen before 9:00 and going to the office at later times, all with the highest certainty estimate. If both players choose this strategy and are fortunate to play 10 rounds without any of them arriving at 9:00 or later, they will get the highest possible payoff of $9.90, slightly higher than the guaranteed payoff $9.80 of the all-office strategy. However, if all pairs of arrival times are equally likely, the probability of miscoordination in a single round is 1/6, and the resulting expected payoff falls far below the guaranteed $9.80 of the all-office strategy (see Materials and Methods). More generally, a strategy of going to the canteen whenever arriving at or before some time t_c and always going to the office if arriving after t_c is called a cut-off strategy (with cut-off t_c). The canteen-before-9 strategy is a cut-off strategy with cut-off 8:55 (see Materials and Methods for more details).

Results
The maximal theoretical payoff described in the previous section was never observed in the experiments; actually, the observed payoffs were quite far from it, despite doing the experiment with more than 800 people. Recall that the payoff of the all-office strategy is $9.80 independent of arrival times. The average bonus paid to our MTurk participants was a mere $2.36. Due to the penalty-based payoff structure, only 46 out of 340 MTurk groups (14%) were able to play 10 rounds and still have any bonus left, while the average number of rounds played was 6.3, see Table I. As soon as one of the players had no money left, the game would terminate.
Exp.    N    R    r̄     Ruin (%)   Payoff (%)   s̄ ($)
MTurk   680  10   6.3   52.8       23.6         -1.59
DTU1    80   30   27.0  17.5       27.0         -0.83
DTU2    42   30   24.1  31.0       24.1         -0.98

TABLE I: Exp. = experiment; N = number of subjects; R = maximum number of rounds; r̄ = average number of rounds played; Ruin = percentage of participants losing all their bonus before (or in) round R; Payoff = average earnings (given as the retained percentage of the initial endowment); s̄ = average penalty per player per round.

Comparing the MTurk experiment with the DTU experiments in Table I shows that the DTU participants were slightly better on average. While more than half of the MTurk participants had lost their initial bonus and had to end the game before the last round, only 18% and 31% of the DTU participants, respectively, had done so. Especially the students from the Artificial Intelligence and Multi-Agent Systems course (DTU1) managed well by retaining 27% of the initial endowment and losing only $0.83 per round on average. This may come as no surprise, because these students were later into their studies, and had already been taught about social cognition.

Looking at Figure 1, we see the frequency of participants' canteen choices as a function of their arrival time together with a fitted binary logistic regression line. While DTU students (orange and green) show similar, steep profiles, MTurk participants have a slightly more gradual decline in canteen choices for increasing arrival times. However, the point at which there is a 50% probability of choosing the canteen or the office (see Materials and Methods) is close to 8:50 in all three experiments. In the Discussion we therefore combine all three experiments in Figure 5 in order to understand the experimental results in terms of degrees of shared knowledge.

Figure 2 shows the distribution of certainty estimates for each arrival time. It clearly shows that it is exceedingly rare for any of the participants to consider it even problematic to go to the canteen when arriving early. Arriving at 8:40 or earlier is deemed sufficiently early to visit the canteen with high confidence, arriving at 9:00 or later is deemed office time with high confidence, and arriving at 8:50 is deemed either office or canteen with at least being "somewhat certain". The difference in certainty estimates between MTurk participants and DTU students shows that the latter tend to be more certain that their co-players follow a similar strategy (higher certainty estimates for the early and late arrival times), and also that they are more aware of the danger of miscoordination (lower certainty estimates around the cut-off). This is in particular the case for the DTU1 experiment that has the steepest profile. Being more certain that your co-players follow a similar strategy probably indicates that you believe such a strategy to be optimal.
So, interestingly, the DTU1 participants are both the ones that appear to be most aware of the danger of miscoordination, and at the same time those who most firmly believe a cut-off strategy is optimal, i.e., believing that the risk of miscoordination is unavoidable. The differences between the three experiments are however still relatively minor, and in the following we will combine data from all three experiments.
Group dynamics
Starting with the group dynamics in Figure 3, we see the number of successful group coordinations into the canteen/office (green/purple) together with the number of miscoordinations (red) as a function of all possible arrival time combinations. The figure shows clearly that players are able to coordinate into the canteen more than 80% of the time if both of them arrive before 8:50. As soon as a group has a player who arrives at 8:50, however, the result changes drastically. Suddenly almost half of such groups miscoordinate. As players experience harsh penalties for miscoordinating, one could perhaps expect to see a tendency of choosing office more often when arriving at 8:40 or 8:50 in subsequent rounds. That is, we might expect that players learn and converge to the all-office strategy in order to avoid miscoordination altogether. But this is not what we see.

Figure 4 shows the mean frequency of canteen choices as a function of rounds played for all three experiments. Each color corresponds to a certain arrival time. Clearly, the only arrival times that do not converge towards either the canteen or the office are the arrival times of 8:40 and 8:50, with the former fluctuating around 90% canteen choices and the latter fluctuating around 50% canteen choices.

FIG. 1: Frequency of canteen choices as a function of arrival times. Circles indicate the mean frequency of participants choosing the canteen at a certain arrival time with error bars. Colored lines are logistic regression lines with bootstrapped 95% confidence intervals (10,000 resamples) shown as translucent bands. Fitted parameters show significant differences for all three experiments.

FIG. 2: Violin plots of certainty estimates. In each round, participants were asked how certain they were of successful coordination with their colleague. Blue areas show the results from MTurk (n = 4260) and orange areas show the results from DTU1 and DTU2 combined (n = 3172). We predefined a five-point Likert scale of certainty estimates as: 'very uncertain', 'slightly certain', 'somewhat certain', 'quite certain', and 'very certain', and translated them into the numerical values of probability estimates used in the payoff calculations (see Materials and Methods).

FIG. 3: Number of coordinations and miscoordinations as a function of arrival times. Green means coordinating into the canteen, purple means coordinating into the office, and red means miscoordination. We use the notation 8:00/8:10 to denote the union of the arrival pairs (8:00, 8:10) and (8:10, 8:00).

FIG. 4: Mean frequencies of canteen choices for all possible arrival times as a function of the number of rounds played. The fitted straight lines are weighted least squares (WLS) fits with the weights chosen to be the square root of the number of data points constituting the mean frequencies for each round, also shown by dot size.

This indicates that participants arriving at 8:40 or 8:50 do not feel incentivized to change their behavior significantly in subsequent rounds, even though there is a high risk of miscoordination. This is not to say that participants do not learn that canteen choices at 8:40 or 8:50 are dangerous. Partitioning the data from Figure 1 into two bins, corresponding to groups having had no miscoordination and groups having had one or more miscoordinations (see the supplementary data analysis in the
Supplementary Information), shows somewhat decreas-ing certainty estimates around the critical arrival times,especially for DTU student. However, this does not affecttheir actual choices. MTurk participants do choose thecanteen a little less often after a miscoordination (seeFigure S6 in the Supplementary Information), but thisdoes not translate into better payoffs as later miscoordi-nations just move to earlier arrival times. So even thoughparticipants learn that their choices are risky, they don’tsee any way to improve their strategy. Specifically, theynever converge to the optimal all-office strategy, and alsonot to the alternative canteen-before-9 strategy (cf. The-orem 4 in Materials and Methods). This apparent lack ofbehavioral change in higher-order social reasoning gamesis also shown in Verbrugge & Mol [67].
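The logistic regression fits with bootstrapped confidence bands reported in Figure 1 can be sketched as follows. This is only a minimal illustration on synthetic choice data with an assumed underlying curve and a plain Newton-method fit, not the paper's actual analysis pipeline; the band is the percentile envelope of curves refitted on resampled data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the experimental data (assumed, not the real choices):
# arrival times in minutes after 8:00, and canteen (1) vs office (0) choices.
t = rng.choice(np.arange(0, 80, 10), size=1000).astype(float)
y = rng.binomial(1, 1 / (1 + np.exp(-(6.0 - 0.12 * t)))).astype(float)

def fit_logistic(t, y, iters=25):
    """Fit log-odds = alpha + beta*t by Newton's method; returns (alpha, beta)."""
    X = np.column_stack([np.ones_like(t), t])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ w))
        w += np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))
    return w

# Bootstrap: refit on resampled data and collect the predicted curves.
grid = np.arange(0, 80, 10, dtype=float)
curves = []
for _ in range(200):                        # the paper uses 10,000 resamples
    idx = rng.integers(0, len(t), len(t))
    a, b = fit_logistic(t[idx], y[idx])
    curves.append(1 / (1 + np.exp(-(a + b * grid))))
lo, hi = np.percentile(curves, [2.5, 97.5], axis=0)   # 95% confidence band
```

The inflection point of each fitted curve, where the predicted canteen probability crosses 1/2, lies at t = −α/β (cf. Materials and Methods).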
Discussion
Let us try to analyse the experimental results in terms of the depth of knowledge of the participants. The highest payoff is achieved when coordinating into the canteen before 9 am. With the aim of achieving the highest possible payoff, each participant can be expected to consider her own arrival time and try to assess whether there is still time to meet in the canteen. When a participant arrives strictly before 9 am, i.e. at 8:50 or earlier, she has private knowledge that she arrives sufficiently early to go to the canteen. If participants only make choices based on their private knowledge, we should then expect participants to always go to the canteen at 8:50. This is not what we see, cf. Figure 1. Thus, other considerations in addition to the players' private knowledge must play a role in their decision-making.

When participants know that they both have arrived before 9:00, they have shared knowledge of having arrived in time for going to the canteen. This happens for any arrival pair (t_1, t_2) with t_i ≤ 8:40 for i = 1, 2 (here, again, an arrival pair (t_1, t_2) denotes that player 1 arrives at time t_1 and player 2 at time t_2). Note that for an arrival pair (8:50, 8:40), only the player arriving at 8:40 will know there to be shared knowledge (to depth 1). When arriving at 8:30 or before, the player additionally knows there to be shared knowledge to depth 2. We illustrate this in Figure 5. Note that in general, if a player arrives at time 8:50 − 10n minutes, n > 0, then that player knows that there is nth-order shared knowledge, but the player doesn't know there to be (n + 1)st-order shared knowledge. This follows a similar pattern as the mountain trekking example, except here the depth of shared knowledge is determined by how early ahead of 9 am the agents arrive, rather than by how many messages have successfully been delivered. No number of messages was sufficient to achieve common knowledge in the mountain trekking example. We similarly get that no arrival time is sufficiently early to establish common knowledge about having time to meet in the canteen.

FIG. 5: The solid black lines express indistinguishability for the players, e.g. the arrival time 8:40 for player 1 has a line to both of the arrival times 8:30 and 8:50 for player 2, since these are the two arrival times for player 2 that player 1 will consider possible when herself arriving at 8:40. Below each possible arrival time (8:20 through 9:10 for both players), we have marked the highest level of knowledge concerning whether there is sufficient time to go to the canteen (none, private, or shared knowledge to depth 1, 2 or 3), e.g. when arriving at 8:40 there is shared knowledge to depth 1 of this fact, but not shared knowledge to depth 2. In blue, a binary logistic regression model was used to predict the probability of a participant going to the canteen (upper limit) or to the office (lower limit) at the shown arrival times. The width of the regression line indicates the 95% confidence interval using 10,000 bootstrapped resamples of all choices in all three experiments (N = 7432).

The participants seem to clearly be able to distinguish between private and shared knowledge, which is supported by their significantly different choices at 8:50 and 8:40 (see again Figure 5). However, it is less clear whether they are able to robustly distinguish different levels of shared knowledge, and whether they are able to distinguish that from common knowledge. Indeed, most participants relatively robustly choose the canteen at 8:40 and any time before that, despite the difference in depth of shared knowledge at those possible arrival times. The certainty estimates are however slowly decreasing from 8:10 to 8:50 in all three experiments (see Figure 2), showing that the participants are not completely ignorant of the differences. This could suggest that many participants are aware that it is less safe to go to the canteen based on nth-order shared knowledge than on (n + 1)st-order shared knowledge. However, very few seem to draw the conclusion that it is never safe to go to the canteen. Our game theoretic analysis showed that they ought to only choose the canteen when there is common knowledge that it is safe, which in this case actually means never. Why do participants not regard earlier office choices as viable options?
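The depth-of-shared-knowledge pattern described above (a player arriving at 8:50 − 10n knows nth-order but not (n + 1)st-order shared knowledge) can be made concrete by encoding the worst case directly: a player arriving at time t knows the fact iff t is before 9:00, and knows there to be nth-order shared knowledge iff a colleague who, in the worst case, arrived 10 minutes later knows there to be (n − 1)st-order shared knowledge. The sketch below is our own minimal encoding (times in minutes after 8:00), not the paper's formal model:

```python
def knows_order(t, n):
    """Does a player arriving at minute t (after 8:00) know, to nth order,
    that there is sufficient time for the canteen? 9:00 is minute 60."""
    if n == 0:
        return t < 60                      # the fact itself: arrival before 9:00
    return knows_order(t + 10, n - 1)      # worst case: colleague arrived at t+10

def max_depth(t):
    """Highest n such that the player arriving at t knows nth-order shared knowledge."""
    if not knows_order(t, 0):
        return -1                          # the player doesn't even know the fact
    n = 0
    while knows_order(t, n + 1):
        n += 1
    return n

print(max_depth(50), max_depth(40), max_depth(30))   # arriving at 8:50, 8:40, 8:30
```

Arriving at 8:40 gives depth 1 and 8:30 gives depth 2, matching Figure 5, and no arrival time yields unbounded depth: common knowledge is never reached.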
Why do participants not continue their train of thought and deduce that when 8:50 turns out to be unsafe, 8:40 will become unsafe as well, which means that 8:30 will be unsafe also, etc.? One reason may be that the benefits of an all-office strategy are cognitively unavailable to the participants, in the sense that participants have a limited ability to take the perspective of each other recursively. Another reason may be that the benefits of an all-office strategy are (vaguely) understood, but participants do not believe that their colleague will reason the same way as they do themselves, and instead try to guess what their colleague will choose. One candidate for such a (mixed) strategy may be the following: 1) always go to the canteen before 8:50, 2) always go to the office after 8:50, and 3) do some guesswork at 8:50. To probe where participants place the cut-off, we asked them the following post-game question: "Imagine you could have agreed beforehand with your colleague about a point in time where it is safe to go to the canteen. What time would that be?" ('I don't know', 'There is no such time', 8:00, 8:10, 8:20, 8:30, 8:40, 8:50, 9:00, 9:10)

The results in Figure 6A show that most answers range from 8:30 to 8:50 (approximately 75% of all answers), giving support to the verdict that participants are not able to continue taking the perspective of each other recursively, or at least that they believe that shared knowledge of some modest finite depth is sufficient for the canteen choice to be safe. Rather, they stop after one or two, possibly three, iterations, thus believing that as long as they arrive sufficiently early, they can be sure to coordinate safely in the canteen. Notice that the correct answer "there is no such time" is chosen by less than 4% of all participants, close to the margin of random error.
[FIG. 6 here: panel A, "Cutoff point" (N = 798); panel B, "Common Knowledge"; y-axes show the fraction of participants.]
FIG. 6: A) Frequencies of answers to the question: "Imagine you could have agreed beforehand with your colleague about a point in time where it is safe to go to the canteen. What time would that be?" Due to the pragmatics of language, we assume that an answer like 8:30 entails the belief that all earlier arrival times would also be deemed safe. B) Frequencies of answers to the question: "Imagine you arrive at 8:10. Is it common knowledge between you and your colleague that it is safe to go to the canteen, that is, that you both arrived before 9:00?"
This indicates that participants indeed do believe that there exists a strategy that includes canteen choices without the risk of miscoordination, supporting the conjecture stated in the introduction that moderate depths of shared knowledge become effectively indistinguishable from common knowledge. Participants might of course not necessarily have a precise idea of the technical notion of common knowledge, but as discussed in the introduction, there are actually quite a number of studies demonstrating that humans have adapted to recognize common knowledge and to make distinct strategic choices depending on whether there is common, shared or private knowledge. In our experiments, we see the player behavior stabilizing already at relatively modest depths of shared knowledge, both in terms of action choices and certainty estimates. And that player behavior matches what we would expect to see if they indeed wrongly infer common knowledge from shared knowledge to some finite depth.

To specifically address the issue of whether they wrongly infer common knowledge, we asked a final post-game question: "Imagine you arrive at 8:10 am. Is it common knowledge between you and your colleague that it is safe to go to the canteen, that is, you both arrived before 9:00 am?" ('Yes', 'No', 'Don't know')
This question inquires about participants' understanding of the term 'common knowledge', and how it applies to the given situation. In Figure 6B, the results show that 89% of all participants responded that it was common knowledge that both players arrived before 9:00 when they themselves had arrived at 8:10. The answers may signify that they indeed believe there to be common knowledge in the strict technical (logical) sense. But of course the answers could also pertain to the everyday linguistic usage of the term 'common knowledge', which is less strict.
Conclusion
We have devised a new coordination game, the Canteen Dilemma, to investigate human higher-order social reasoning. Our experimental results show that high levels of recursive perspective-taking are cognitively unavailable to the vast majority of players of the game. We see a significant amount of miscoordination, which seems to occur due to a "curse of shared knowledge": the guise of common knowledge existing in situations where there is only shared knowledge to some limited depth.

Our experience from playing the Canteen Dilemma with many people and explaining to them its unintuitive result is that many players simply do not accept the argument that they cannot at any time coordinate safely into the canteen. On top of this, the certainty of participants that they will coordinate into the canteen at early arrival times indicates that when participants fail at nth-order reasoning, they do not default to agnosticism, but the opposite. That is, when there is a sufficiently large order of shared knowledge about a fact, it is possible that such a fact is mistaken for proper common knowledge. An interesting avenue for future research would be to investigate if there may be any social and psychological benefits of having an illusion of common knowledge, such as a higher willingness to cooperate. Thus, what we have called the 'curse' of shared knowledge in the Canteen Dilemma may turn out to be a blessing in other settings.

An obvious question is how often this illusion of common knowledge occurs in real life. For instance, in the real-world version of the Canteen Dilemma scenario, the two colleagues would be likely to simply coordinate their actions via cell phone ("I'll arrive at 8:50 today. Are you up for a cup of coffee in the canteen?"). This suggests that the advent of modern technology could have made the information asymmetry inherent in shared knowledge situations less widespread.
However, modern technology has also given us the Internet and social media, where the flow of information is much more complex, creating more intricate cases of information asymmetry than ever before.

That humans tend to confuse shared and common knowledge could possibly be due to a limited evolutionary importance of being able to make the distinction. It could also be due to the distinction requiring too many cognitive resources. Or it could be that the confusion actually leads to evolutionary benefits in terms of higher degrees of cooperation in most practically occurring settings. What exactly has led to the confusion, and to what degree it has any practical importance today, we leave as open problems.

Methods

Experimental design and data collection
Experiments on Amazon Mechanical Turk had a total of 714 participants (including dropouts, see Supplementary Information), while the two classroom experiments at the Technical University of Denmark (DTU1 and DTU2) had a total of 106 and 50 participants, respectively. The average payout to MTurk workers was $4.17 (including a general participation fee of $2). After accepting our task and providing informed consent, participants from MTurk were put in a 'waiting room' until they were paired up with another participant. After an instructions page detailing the rules of the game, participants were given an arrival time t (one of eight possible arrival times, spaced 10 minutes apart) and asked to make a decision between going to the canteen or to the office. Next, participants were asked to estimate how certain they were that their 'colleague' made the same choice as them, ranging from 'very uncertain' over 'slightly certain', 'somewhat certain' and 'quite certain' to 'very certain', which were translated into numerical values, e_i, used in the payoff calculations (see below). A results page was shown between each round, showing the results of the previous rounds, including arrival times for both players, their choices, their own certainty estimates and resulting payoffs. After 30 seconds, the game would automatically proceed to the next round.
After the last round, we asked all participants a few final questions about their strategy and their understanding of the game. The experiments were implemented using oTree 2.1.35 [68]. The two classroom experiments DTU1 and DTU2 differed from the MTurk experiment in a few aspects: 1) the maximum number of rounds played was increased from 10 to 30; 2) the initial bonus given each participant was increased accordingly from $10 to $30; 3) three additional questions were asked in order to elicit more explicitly some of the implicit assumptions and explicit behaviours of the students; 4) participants were told that they would not receive any monetary rewards, but that they should try to do their best. DTU1 received prizes. Screenshots, additional questions, experimental settings, and a detailed walk-through can be found in the Supplementary Information.

Payoffs and penalties
All MTurk players finishing the game were paid a participation fee of $2. In addition, a bonus could be earned if players did well. Before the game started, the bonus was set to $10 for all participants. After each round, the bonus was reduced by a personal penalty, depending on the two players' choices. Penalties are calculated using a logarithmic scoring rule, arranged so that they are minimized by successful coordination into the canteen. Penalties are maximized by any type of miscoordination or forbidden choice (i.e. going to the canteen at 9 am or later). Office coordinations are designed to have larger penalties than canteen coordinations, but smaller penalties than miscoordinations, in order to make sure that coordination remains the main objective of the game. Penalties are defined as negative utility values in the following way. First, we define the chosen action a_i by player i, i = 1, 2, to take binary values encoding the canteen option (a_i = 0) and the office option (a_i = 1), and let e_i denote their respective certainty estimates, each taking one of five predefined numerical values corresponding to the five certainty levels. We can then express the utility u_1 received by player 1 as

u_1(e_1, a_1, a_2) = (1 − |a_1 − a_2| + a_1·a_2)·ln(e_1) + 2·|a_1 − a_2|·ln(1 − e_1),

and symmetrically for player 2. If any of the players choose the canteen at 9 am or after, the utility becomes u_1(e_1, a_1, a_2) = 2·ln(1 − e_1) for player 1 (and symmetrically for player 2), corresponding to a miscoordination. As an example, imagine player 1 arrives at 8:40 and chooses the canteen, a_1 = 0. She estimates the probability that her colleague will also go to the canteen as 'somewhat certain', e_1 = 0.75. If her colleague indeed chooses the canteen, a_2 = 0, her utility will be u_1(e_1, a_1, a_2) = ln(e_1) ≈ −0.29, but if her prediction proves false and her colleague chooses the office instead, her utility will be u_1(e_1, a_1, a_2) = 2·ln(1 − e_1) ≈ −2.77. If she goes to the office just like her colleague, her utility is u_1(e_1, a_1, a_2) = 2·ln(e_1) ≈ −0.58. It should be noted that the logarithmic scoring rule used here is not strictly proper, since office and canteen coordinations are penalized differently. Nevertheless, we find a good match between estimates and actual choices at arrival times different from those that are prone to miscoordinations, as seen in Figure 2, indicating that loss minimization remained a central concern and that participants made their choices and estimates as honestly as possible [69, 70].
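As a quick check on the scoring rule, the worked example above can be reproduced directly. This sketch reflects our reading of the rule (a = 0 encodes canteen, a = 1 office), with the forbidden-choice case handled by a separate flag:

```python
import math

def utility(e1, a1, a2, forbidden=False):
    """Player 1's utility under the logarithmic scoring rule described above."""
    if forbidden:                          # canteen chosen at 9:00 or later
        return 2 * math.log(1 - e1)
    mis = abs(a1 - a2)                     # 1 iff the players miscoordinate
    return (1 - mis + a1 * a2) * math.log(e1) + 2 * mis * math.log(1 - e1)

e1 = 0.75                                  # 'somewhat certain'
print(round(utility(e1, 0, 0), 2))         # both canteen:    ln(0.75)   = -0.29
print(round(utility(e1, 0, 1), 2))         # miscoordination: 2 ln(0.25) = -2.77
print(round(utility(e1, 1, 1), 2))         # both office:     2 ln(0.75) = -0.58
```

The three printed values match the worked example in the text.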
Formal Analysis
The game can be represented as a game with three players: nature, player 1 and player 2. Nature is the player that initially decides the arrival times of players 1 and 2. Then players 1 and 2 are each informed of their own arrival time, and each have to choose among two actions: o for going to the office and c for going to the canteen. Based on the choice of actions by all three agents, players 1 and 2 receive a payoff, and they always receive the same payoff (we are disregarding the certainty estimates for now). The action choice of nature can be represented as an arrival pair t = (t_1, t_2) consisting of the arrival time t_1 for player 1 and t_2 for player 2. Any arrival pair t has to satisfy that |t_1 − t_2| = 10 minutes (we will suppress mentioning the unit, minutes, in the following). In our specific version of the game, we additionally have the restriction that 8:10 ≤ t_i ≤ 9:10 for i = 1, 2. The analysis of optimal strategies however doesn't depend on the exact arrival times available, so we will make things a bit more general and only assume that there is an earliest arrival time t_min and a latest arrival time t_max, with t_min ≤ 8:50 and t_max ≥ 9:00. Given t_min and t_max, the set of possible arrival pairs is defined as

T = {(t, t + 10) | t_min ≤ t ≤ t_max − 10} ∪ {(t, t − 10) | t_min + 10 ≤ t ≤ t_max}.

The game starts by nature choosing an element t ∈ T. Nature is not a strategic player, so we assume that t is chosen uniformly at random, which is exactly how t is chosen in our experiments. The participants do not know that the arrival times are chosen uniformly at random, as this is left implicit in the description of the game. The following analysis of optimal strategies in the game could potentially change if arrival times were chosen according to a highly skewed probability distribution.

When nature has chosen its action t ∈ T and players 1 and 2 have chosen their actions a_1 and a_2, players 1 and 2 receive their payoff, which we denote u_t(a_1, a_2) (the utility resulting from player 1 choosing a_1 and player 2 choosing a_2, given that nature played t). We don't need to make any assumptions regarding the exact utility values (payoff values), except that successful coordination into the canteen is always better than successful coordination into the offices, which again is always better than being miscoordinated. Hence, we put the following constraints on the utility function, for all t ∈ T:

(U1) If t_1, t_2 < 9:00, then u_t(c, c) > u_t(o, o) > u_t(c, o) = u_t(o, c).
(U2) If t_i ≥ 9:00 for some i, then u_t(o, o) > u_t(c, o) = u_t(o, c) = u_t(c, c).

A strategy for player i, i = 1, 2, is a mapping from arrival pairs to actions, that is, a mapping s_i : T → {c, o}. A strategy for i simply determines which action i will choose given the arrival pair. Each agent only observes her own arrival time, that is, any two arrival pairs t and t′ with t_i = t′_i will be indistinguishable to player i, i = 1, 2. This immediately leads to the following formal definition of the indistinguishability relation ∼_i for player i: t ∼_i t′ iff t_i = t′_i. We need to require the strategy of each player to be uniform, that is, any two arrival pairs that are indistinguishable by that player should be mapped to the same action: if t ∼_i t′ then s_i(t) = s_i(t′). Due to the uniformity condition, we can allow ourselves to overload the meaning of the symbol s_i and write s_i(t_i) as an abbreviation of s_i(t_1, t_2) for i = 1, 2. Given strategies s_1 and s_2, the pair s = (s_1, s_2) is called a strategy profile. Given an arrival pair t = (t_1, t_2), we use s(t) as a shorthand for (s_1(t_1), s_2(t_2)). Hence s(t) denotes the choices made by players 1 and 2 when their strategies are given by s and their arrival times are given by t. The payoff of those choices is then u_t(s(t)). We will use u_t(s) as an abbreviation of u_t(s(t)), i.e., u_t(s) is the utility received by players 1 and 2 when they play by strategy profile s in the game with arrival pair t. The expected utility EU(s) of a strategy profile s is the average of the payoffs [71]:

EU(s) = (1/|T|) · Σ_{t ∈ T} u_t(s).

Note again that players 1 and 2 get the same payoff (common-payoff game), so there is only one expected utility value to be computed. A strategy profile s′ Pareto dominates another strategy profile s if EU(s′) > EU(s) [71]. A strategy profile is Pareto optimal if there does not exist another strategy profile dominating it. A strategy profile s′ weakly Pareto dominates another strategy profile s if EU(s′) ≥ EU(s). The game is cooperative (between players 1 and 2), so both players should seek to play a Pareto optimal strategy profile.
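To make these definitions concrete, the following sketch instantiates the game on an assumed 8:00-9:10 grid with assumed payoff values satisfying U1/U2 (illustrative numbers, not the experiment's actual penalties) and exhaustively searches all uniform strategy profiles for maximal expected utility:

```python
from itertools import product

TIMES = range(8)                   # indices 0..7 = 8:00..9:10 in 10-minute steps
NINE_AM = 6                        # index of 9:00
PAIRS = [(t, t + 1) for t in range(7)] + [(t + 1, t) for t in range(7)]

U_CANTEEN, U_OFFICE, U_MISS = 3.0, 1.0, -12.0   # assumed values satisfying U1/U2

def utility(t1, t2, a1, a2):
    """Common payoff for actions 'c'/'o' at arrival pair (t1, t2)."""
    if (a1 == 'c' and t1 >= NINE_AM) or (a2 == 'c' and t2 >= NINE_AM):
        return U_MISS              # canteen at or after 9:00 (constraint U2)
    if a1 != a2:
        return U_MISS              # miscoordination (constraint U1)
    return U_CANTEEN if a1 == 'c' else U_OFFICE

def EU(s1, s2):
    """Expected utility of a uniform profile; s_i maps own arrival time to action."""
    return sum(utility(t1, t2, s1[t1], s2[t2]) for t1, t2 in PAIRS) / len(PAIRS)

# Exhaustive search over all 2^8 x 2^8 uniform strategy profiles.
strategies = list(product('co', repeat=len(TIMES)))
best = max(((s1, s2) for s1 in strategies for s2 in strategies),
           key=lambda p: EU(*p))

all_office = ('o',) * 8
cutoff_before_9 = ('c',) * 6 + ('o',) * 2       # canteen strictly before 9:00
print(EU(*best), EU(all_office, all_office), EU(cutoff_before_9, cutoff_before_9))
```

With the harsh miscoordination penalty assumed here, the maximal expected utility is attained by the all-office profile (EU = 1), while the canteen-before-9 cut-off profile only reaches 8/14 ≈ 0.57; softening the penalty (e.g. U_MISS = −3) flips the ranking. These are exactly the two candidate profiles singled out by Theorem 4.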
Also, since it is a common-payoff game, all Pareto optimal strategy profiles have the same expected utility [71]. Define subsets of arrival pairs T_1, T_2 ⊆ T by:

T_1 = {(8:10 + 20x, 8:20 + 20y) ∈ T | y ≤ x ≤ y + 1}
T_2 = {(8:20 + 20y, 8:10 + 20x) ∈ T | y ≤ x ≤ y + 1}

Note that T_1 and T_2 are disjoint and that T = T_1 ∪ T_2. We hence get, for any strategy profile s,

EU(s) = (1/|T|) · Σ_{t ∈ T} u_t(s) = (1/|T|) · Σ_{i ∈ {1,2}} Σ_{t ∈ T_i} u_t(s).   (1)

Given a strategy profile s, we let s↾T_i be the restriction of s to T_i, that is, s↾T_i is as s except it is only defined on the arrival pairs in T_i. So s↾T_i is the strategy profile for the subgame in which only the arrival pairs in T_i can be chosen. We can now rewrite formula (1) as

EU(s) = (1/|T|) · Σ_{t ∈ T} u_t(s) = (1/|T|) · Σ_{i ∈ {1,2}} Σ_{t ∈ T_i} u_t(s↾T_i).   (2)

Note that there exists no t ∈ T_1, t′ ∈ T_2 and i ∈ {1, 2} such that t ∼_i t′. Hence the strategy profile s↾T_1 can be chosen completely independently of the strategy profile s↾T_2. Using formula (2) it then follows that a strategy profile s is Pareto optimal in the full game if and only if each of the strategy profiles s↾T_1 and s↾T_2 is Pareto optimal on the subgames with arrival pairs only in T_1 and T_2, respectively. When looking for Pareto optimal strategy profiles in the game, we can hence look for Pareto optimal strategy profiles on each of the two subgames independently (essentially the game consists of two disjoint subgames). Note also that the two games are completely symmetric, since (t_1, t_2) ∈ T_1 if and only if (t_2, t_1) ∈ T_2. Hence the two subgames necessarily have exactly the same strategy profiles up to symmetry (swapping the roles of player 1 and 2).
It is hence sufficient to investigate Pareto optimal strategies for only one of these subgames, say the subgame with arrival pairs in T_1. We will now try to determine the possible candidates for being Pareto optimal strategy profiles for the subgame with arrival pairs in T_1. We do this by iteratively removing strategy profiles that are not Pareto optimal.

Lemma 1.
Going to the canteen at or after 9:00 can never be part of a Pareto optimal strategy profile. More precisely: No strategy profile s with s_i(t_i) = c for some i ∈ {1, 2} and t_i ≥ 9:00 can be Pareto optimal.

Proof. Consider a strategy profile s with s_i(t_i) = c for some i ∈ {1, 2} and some t ∈ T_1 with t_i ≥ 9:00. We only consider the case of i = 1, the case of i = 2 being proved similarly. Then we have s_1(t_1) = c and t_1 = 9:10 + 20x for some x ≥ 0 (recall that we have restricted attention to the arrival pairs in T_1). Now define a strategy profile s′ which is identical to s except that s′_j(t′_j) = o for all j ∈ {1, 2} and t′_j ≥ 9:00. We claim that s′ Pareto dominates s. First note that

u_t(s′) = u_t(o, o)   (by def. of s′)
        > u_t(c, ·)   (by U2)
        = u_t(s),

where the last equality follows from s_1(t_1) = c. This proves the existence of an arrival pair for which s′ has a strictly higher utility than s. To prove EU(s′) > EU(s), we hence only need to prove that u_{t′}(s′) ≥ u_{t′}(s) for all t′ ∈ T. When t′_1, t′_2 < 9:00, s′(t′) = s(t′) by definition of s′. When t′_j ≥ 9:00 for some j, s′(t′) will by definition of s′ necessarily have an o in each position where s(t′) also has one. It follows by constraint U2 that u_{t′}(s′) ≥ u_{t′}(s), as required.

Lemma 2.
Assume s_i(t) = o and s_{−i}(t + 10) = c for some strategy profile s, some arrival time t, and some i ∈ {1, 2}. Then s is not Pareto optimal.

Proof. Let s, t and i be as stated above. We need to find a strategy profile s′ Pareto dominating s. We only consider the case i = 1, the case of i = 2 being symmetric. Then we have s_1(t) = o and s_2(t + 10) = c. If s_j(t′_j) = c for some j ∈ {1, 2} and t′_j ≥ 9:00, the claim follows immediately from Lemma 1. We can hence in the following assume that s_j(t′_j) = o for all j ∈ {1, 2} and t′_j ≥ 9:00. Since s_2(t + 10) = c, we can thus also conclude that t + 10 < 9:00. Define s′ to be identical to s except that we let s′_1(t − 20x) = c for all x ≥ 0 and s′_2(t − 10 − 20y) = c for all y ≥ 0, recalling that we are only considering arrival pairs in T_1. We want to show that EU(s′) > EU(s). First note that

u_{(t, t+10)}(s′) = u_{(t, t+10)}(c, s_2(t + 10))   (by def. of s′)
                 = u_{(t, t+10)}(c, c)             (by def. of s)
                 > u_{(t, t+10)}(o, c)             (by U1, as t + 10 < 9:00)
                 = u_{(t, t+10)}(s)                (by def. of s).

To prove EU(s′) > EU(s), we hence only need to prove that u_{t′}(s′) ≥ u_{t′}(s) for all t′ ∈ T. The only non-trivial cases are when either t′_1 = t − 20x for some x ≥ 0 or t′_2 = t − 10 − 20y for some y ≥ 0 (in all other cases s′(t′) = s(t′)). Consider first a t′ with t′_2 = t − 10 − 20y for some y ≥ 0. Then t′_1 = t − 20x for some x ≥ 0, and hence u_{t′}(s′) = u_{t′}(c, c). Constraint U1 now immediately gives u_{t′}(c, c) ≥ u_{t′}(s), and hence u_{t′}(s′) ≥ u_{t′}(s), as required. Consider instead t′ with t′_1 = t − 20x for some x ≥ 0. Then either t′_2 = t − 10 − 20y for some y ≥ 0, or t′_1 = t and t′_2 = t + 10. Both cases have already previously been covered. This completes the proof.

Definition 3. A cut-off strategy with cut-off t′ is a strategy s_i with s_i(t) = c for all t < t′ and s_i(t) = o for all t > t′. A cut-off strategy profile with cut-off t′ is a pair (s_1, s_2) where both s_1 and s_2 are cut-off strategies with cut-off t′. A cut-off strategy (profile) with cut-off before t_min is called an all-office strategy (profile). Note that the strategy we in the informal discussions above referred to as the "canteen-before-9" strategy is the cut-off strategy with cut-off 8:55. We now get the result on optimal strategies claimed in the informal discussion.
Theorem 4.
Any Pareto optimal strategy profile is either the all-office strategy profile or the cut-off strategy profile with cut-off 8:55.

Proof.
Let s be a Pareto optimal strategy profile. Assume that t_min is of the form 8:50 − 20x for some x ≥ 0 and that t_max is of the form 9:00 + 20y for some y ≥ 0, the other cases being treated symmetrically. Let σ be the following string over the alphabet {o, c}, where we alternate between the strategy choices of players 1 and 2 from t_min to t_max:

σ = s_1(t_min) s_2(t_min + 10) s_1(t_min + 20) s_2(t_min + 30) ··· s_2(t_max)

Note that when (t_1, t_2) ∈ T_1, then s_1(t_1) and s_2(t_2) both occur in the string σ. From Lemma 2 it follows that σ cannot contain the substring oc. Suppose the first letter of σ is o. Then since σ does not contain the substring oc, we have σ = o^{|σ|} (a string of only o's). Hence s_1(t_1) = s_2(t_2) = o for all (t_1, t_2) ∈ T_1. This means that s is the all-office strategy profile. Suppose alternatively that the first letter of σ is c. The last letter of σ is necessarily o, since t_max ≥ 9:00 and s is Pareto optimal, cf. Lemma 1. Since σ is then a string that starts with c and ends with o, but doesn't contain oc, it must have the form c^n o^m for some n, m ≥ 1 with m + n = |σ|. Hence there exists a t′ such that s_i(t) = c for all t < t′ and s_i(t) = o for all t > t′ (and all i ∈ {1, 2}). In other words, s is a cut-off strategy profile with cut-off t′. What is left to prove is then only that s has cut-off 8:55. First note that we must necessarily have t′ < 9:00, since otherwise s would not be Pareto optimal according to Lemma 1. From this it follows that for all arrival pairs (t_1, t_2) ∈ T_1 we must have:

1. If t_1, t_2 < t′, the two players coordinate into the canteen, receiving the highest possible payoff.
2. If t_1 < t′ < t_2 or t_2 < t′ < t_1, the two players are miscoordinated (one chooses canteen, the other office), receiving the lowest possible payoff.
3. If t_1, t_2 > t′, the two players coordinate into their offices, receiving a payoff strictly between the lowest and highest.

Note that there will always be exactly one arrival pair in T_1 of type 2, independent of t′. Since arrival pairs of type 1 have a higher payoff than arrival pairs of type 3, and s is Pareto optimal, s must have the maximal number of arrival pairs of type 1, that is, it is the cut-off strategy with the latest possible cut-off. That is exactly the cut-off 8:55 (or, more precisely, any cut-off strictly between 8:50 and 9:00).

The theorem proves what was argued in the main text: There are only two candidates for an optimal strategy, the all-office strategy or the canteen-before-9 strategy. This does not in any way imply that we should expect human players to adopt either of these two strategies, but if two perfectly rational players were to play the game, and if they knew they could expect the other player to play perfectly rationally as well, of course the optimal strategy would be played. And, as earlier mentioned, in our particular version of the game, the optimal strategy is the all-office strategy.

Logistic regression
Logistic regression

The experimental results were analyzed using a logistic regression model with the arrival time t as predictor. The model was specified as µ_i = α_i + β_i t, with µ_i being the log-odds µ_i = log(p_i/(1 − p_i)) and i = 1, 2, 3 indexing the three experiments. The arrival time at which the model predicts an even split between canteen and office choices, p(t) = 1/2, is t = −α/β, which for the MTurk and DTU1 experiments is t = 8:48 and for DTU2 is t = 8:52. The regression line in Figure 5 is obtained similarly by combining observations from all three experiments. The high number of observations implies small confidence bands. Hence, conclusions from the models can be viewed as robust.
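A minimal sketch of this analysis on synthetic data; the coefficients and the simulated choices below are illustrative and not the fitted values from the experiments, but the recovery of the 50/50 point as t = −α/β mirrors the method described above.

```python
# Logistic regression of a binary canteen/office choice on arrival time t
# (in minutes), recovering the arrival time where p(t) = 1/2.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
t = rng.uniform(480, 550, size=5000)            # arrival times, 8:00 to 9:10
true_alpha, true_beta = 52.8, -0.1              # 50/50 point at t = 528 (8:48)
p = 1 / (1 + np.exp(-(true_alpha + true_beta * t)))
y = rng.random(5000) < p                        # simulated canteen choices

model = LogisticRegression(C=1e6).fit(t.reshape(-1, 1), y)  # ~unpenalized fit
alpha, beta = model.intercept_[0], model.coef_[0, 0]
crossing = -alpha / beta                        # estimated 50/50 arrival time
print(round(crossing))                          # close to 528, i.e. 8:48
```

A very large C is used to effectively disable scikit-learn's default L2 regularization, so the fit approximates plain maximum likelihood.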
Acknowledgments

Server infrastructure and devops were handled by Mikkel Birkegaard Andersen. The authors wish to thank Vincent F. Hendricks for enabling the project. This research was approved by the Institutional Review Board at the University of Copenhagen and included informed consent by all participants in the study. The authors gratefully acknowledge the support provided by The Carlsberg Foundation under grant number CF 15-0212.
Author contributions
Thomas Bolander developed the game, made the theoretical analysis, and contributed as lead author; Robin Engelhardt designed the experiment, analyzed the data, and contributed as lead author; Thomas S. Nicolet designed the experiment, analyzed the data, and contributed as lead author. The authors declare no conflict of interest.

[1] R. Fagin, J. Y. Halpern, Y. Moses, M. Y. Vardi, Reasoning About Knowledge, MIT Press, 1995.
[2] D. Lewis, Convention: A Philosophical Study, 1969.
[3] H. H. Clark, C. R. Marshall, Definite reference and mutual knowledge (1981).
[4] T. C. Schelling, The Strategy of Conflict, Harvard University Press, 1980.
[5] R. J. Aumann, Agreeing to disagree, The Annals of Statistics (1976) 1236–1239.
[6] I. van de Pol, I. van Rooij, J. Szymanik, Parameterized complexity of theory of mind reasoning in dynamic epistemic logic, Journal of Logic, Language and Information 27 (2018) 255–294.
[7] P. Milgrom, An axiomatic characterization of common knowledge, Econometrica: Journal of the Econometric Society (1981) 219–222.
[8] H. H. Clark, Using Language, Cambridge University Press, 1996.
[9] J. W. Bradbury, S. L. Vehrencamp, et al., Principles of Animal Communication (1998).
[10] M. Tomasello, Joint attention as social cognition, Joint Attention: Its Origins and Role in Development (1995) 103–130.
[11] E. Lorini, L. Tummolini, A. Herzig, Establishing mutual beliefs by joint attention: towards a formal model of public events, in: Proc. of CogSci, pp. 1325–1330.
[12] T. Bolander, H. van Ditmarsch, A. Herzig, E. Lorini, P. Pardo, F. Schwarzentruber, Announcements to attentive agents, Journal of Logic, Language and Information (2015) 1–35.
[13] H. Gintis, Rationality and common knowledge, Rationality and Society 22 (2010) 259–282.
[14] M. F. Friedell, On the structure of shared awareness, Behavioral Science 14 (1969) 28–39.
[15] T. C. Schelling, Bargaining, communication, and limited war, Conflict Resolution 1 (1957) 19–36.
[16] J. Tooby, L. Cosmides, Groups in mind: The coalitional roots of war and morality, Human Morality and Sociality: Evolutionary and Comparative Perspectives (2010) 91–234.
[17] Y. N. Harari, Sapiens: A Brief History of Humankind, Random House, 2014.
[18] D. Premack, G. Woodruff, Does the chimpanzee have a theory of mind?, Behavioral and Brain Sciences 1 (1978) 515–526.
[19] U. Frith, C. D. Frith, Development and neurophysiology of mentalizing, Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 358 (2003) 459–473.
[20] K. Vogeley, P. Bussfeld, A. Newen, S. Herrmann, F. Happé, P. Falkai, W. Maier, N. J. Shah, G. R. Fink, K. Zilles, Mind reading: neural mechanisms of theory of mind and self-perspective, NeuroImage 14 (2001) 170–181.
[21] I. Apperly, Mindreaders: The Cognitive Basis of "Theory of Mind", Psychology Press, 2010.
[22] P. N. Johnson-Laird, Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness, 6, Harvard University Press, 1983.
[23] K. Gray, A. C. Jenkins, A. S. Heberlein, D. M. Wegner, Distortions of mind perception in psychopathology, Proceedings of the National Academy of Sciences 108 (2011) 477–479.
[24] S. Baron-Cohen, H. A. Ring, S. Wheelwright, E. T. Bullmore, M. J. Brammer, A. Simmons, S. C. Williams, Social intelligence in the normal and autistic brain: an fMRI study, European Journal of Neuroscience 11 (1999) 1891–1898.
[25] S. M. Schaafsma, D. W. Pfaff, R. P. Spunt, R. Adolphs, Deconstructing and reconstructing theory of mind, Trends in Cognitive Sciences 19 (2015) 65–72.
[26] I. A. Apperly, S. A. Butterfill, Do humans have two systems to track beliefs and belief-like states?, Psychological Review 116 (2009) 953.
[27] R. Saxe, L. Young, Theory of mind: How brains think about thoughts, The Oxford Handbook of Cognitive Neuroscience 2 (2013) 204–213.
[28] A. Gopnik, J. W. Astington, Children's understanding of representational change and its relation to the understanding of false belief and the appearance-reality distinction, Child Development (1988) 26–37.
[29] D. O. Stahl, P. W. Wilson, On players' models of other players: Theory and experimental evidence, Games and Economic Behavior 10 (1995) 218–254.
[30] R. Nagel, Unraveling in guessing games: An experimental study, The American Economic Review 85 (1995).

Supplementary Information
MTurk Walkthrough and Screenshots
After accepting our 'Human Intelligence Task' (HIT) and providing informed consent, participants from MTurk were put in a 'waiting room' until they were paired up with another participant. After a group of two was formed, participants were directed to an initial instruction page which detailed the rules of the game, with a time limit of 240 seconds; see the screenshot in Fig. 7.

After reading the instructions, participants were directed to round 1 (of 10), where they were given their own arrival time and asked to make a decision between going to the canteen or the office. Each round had a time limit of 61 seconds, and the rules from the instructions were repeated at the bottom of the page; see the example screenshot for round 1 (Fig. 8).

After making their decision ('Canteen' or 'Office'), participants were asked to estimate how certain they were that the other player made the same choice as them, ranging from 'very uncertain' over 'slightly certain', 'somewhat certain' and 'quite certain' to 'very certain'; see Fig. 9.

After both players had made their choices and their certainty estimates, they were directed to a results page showing them the results of the previous rounds, including the arrival times for both players, their choices, their own certainty estimate and the resulting payoff; see the example screenshot after round 6 (Fig. 10). After 30 seconds, the game would automatically proceed to the next round.

In many instances, groups were not able to play the maximal number of rounds, because one or both of the participants had lost all their bonuses. An example of such a situation is shown in Fig. 11. In such cases, the game would end for both players and they were asked to answer a follow-up question:

1. "The game is over. Do you think it was your fault it is over, your colleague's fault, or do you think it was because of some other reason?" (Possible answers: 'My fault', 'Other's fault', 'Other reason')
This question probes into participants' ability to rise above their possibly myopic understanding of the game. In addition to this question, all participants were asked three additional post-game questions about their strategy of play and their understanding of the game. The first was:

2. "What strategy did you use while playing this game?" (open ended)
The answers to this question provided insight into the reasoning and thoughts of the participants. The next question was used to gauge the depth of recursive reasoning and reads:

3. "Imagine you could have agreed beforehand with your colleague about a point in time where it is safe to go to the canteen. What time would that be?" ('I don't know', 'There is no such time', 8:00, 8:10, 8:20, 8:30, 8:40, 8:50, 9:00, 9:10)
A final question, pertaining to all participants' understanding of the concept of common knowledge, was the following:

4. "Imagine you arrive at 8:10 am. Is it common knowledge between you and your colleague that it is safe to go to the canteen, that is, that you both arrived before 9:00 am?" ('Yes', 'No', 'Don't know')
DTU Experiments
The DTU students were asked three additional post-game questions:

5. "Did you ever go to the canteen at an arrival time later than what was safe according to your previous answer? Why or why not?" (open ended)
6. "Did you ever choose differently after seeing the same arrival time again at a later point in the game? Why or why not?" (open ended)
7. "Imagine you arrived at [8:40/9:00] and you have been secretly informed that your colleague's arrival time is 8:50. Where do you think your colleague will go?" ('Canteen', 'Office')
In the last question, half of the participants were given 8:40 as their own arrival time while the other half were given 9:00. The question concerns whether players' own knowledge of the other's arrival time affects their prediction of the other player's decision. It relates to the curse of knowledge [1], since participants might attribute their own belief (that it is early enough or too late to go to the canteen) to the other player.
MTurk Settings
Looking at Table II, the average payout to MTurk workers was $4.17 (including a general participation fee of $2), which amounts to an average of more than $20 per hour. This is considered very generous according to MTurk guidelines and is certainly above the recently estimated average of $6 per hour when excluding unsubmitted and rejected work [2]. Students in the DTU experiments (DTU1 and DTU2) did not receive any monetary reward, but were told to try to maximize their payoff, and were awarded prizes for doing well.
Experiment  Participants  Attrition rate  N    Rounds  Avg payout ($)
MTurk       714           0.02            680  10      4.36
DTU1        106           0.13            80   30      (prizes)
DTU2        50            0.08            42   30      -

TABLE II: The main experiment on Amazon Mechanical Turk (MTurk) had 714 participants, of which 17 participants (2.4%) quit prematurely, some of them quite early in the game. These quitters were told (in the consent form) that they would receive no bonus and no participation fee. They are excluded from the data analysis. Their "lucky" colleagues, however, got both their bonuses and participation fee, but are likewise excluded from the data analysis. Therefore the final number of subjects, N, is reduced by twice the number of quitters. In the two DTU experiments with students from the Technical University of Denmark (DTU1 and DTU2), attrition rates were slightly higher, mainly due to the higher number of rounds played.

Participants quitting a study before completing it is prevalent on MTurk and varies systematically across experimental conditions [3]. In our experiments on Amazon Mechanical Turk, attrition rates were 2%, indicating that we had managed to design the experiment in a way that minimized drop-out rates. A combination of high payouts, a logarithmic scoring rule taking advantage of loss aversion biases, a consent form stipulating the revocation of the participation fee after dropout, and minimization of waiting times may have been the main reasons.

All participants automatically received a 'qualification' when accepting a HIT. This qualification ensured that participants could not play the game again. In addition, we required that participants should have completed at least 500 HITs, have an accepted HIT rate of 98% or above, and be from the United States or Canada.
This ensured that we would get relatively experienced and qualified participants. MTurk participants' attention was expected to be equal to or better than undergraduate participants' attention [4], while various forms of dishonesty (practical joking or trying to pair up with a friend) were expected to be rare, due to the high turnover rate experienced for our HITs. In addition, during the experiment, participants had easy access to our email for questions and possible bug reports. Apart from a few timeouts, participants had no comments or complaints.
Formal definitions of Private, Shared and Common Knowledge
We can define the notion of common knowledge and related notions a bit more precisely as follows, following the conventions from epistemic logic (see e.g. Herzig and Maffre [5]). Given a proposition p and an agent i, we use K_i p to denote that agent i knows p. Given a group of agents G = {1, . . . , m}, we say that p is private knowledge in G if at least one of the agents knows p, that is, if K_i p is true for some i ∈ G. We use E_G p to denote that everybody in G knows p, that is, for all i ∈ G, it is true that K_i p. Whenever it is not necessary to be explicit about the group of agents G, we will just write Ep and say "everybody knows p". For all n, we then recursively define E^n p to be shorthand for E E^{n−1} p, where E^1 p is shorthand for Ep. So for instance E^2 p expresses that "everybody knows that everybody knows that p", and in general E^n p means we have n iterations of "everybody knows that" in front of p. We read E^n p as "everybody knows p to depth/order n". We also call this shared knowledge (of p) to depth/order n, or nth-order shared knowledge. When we say that p is shared knowledge, we mean that it is shared knowledge to depth n for some n ≥ 1. Common knowledge of p then means that E^n p is true for all n ∈ N.

In epistemic logic, the three notions—private, shared and common knowledge—are usually not considered to be mutually exclusive. So if p is common knowledge, it is also automatically both shared and private, since when the conditions for p being common knowledge are satisfied, the conditions for it being shared and private are also satisfied. However, in many cases, as in our paper, we want to make an exclusive distinction between the three types of knowledge. We can define p to be shared knowledge only if it is shared knowledge but not common knowledge. Thus, p is shared knowledge only if for some n we have E^n p but not E^{n+1} p. Similarly, we can say that p is private knowledge only if p is private but not shared knowledge.
Thus, p is private knowledge only if K_i p holds for some, but not all, i. In most texts, as in ours, it is left implicit whether private and shared knowledge are interpreted inclusively or exclusively, that is, one doesn't explicitly distinguish between "shared knowledge" and "shared knowledge only". Normally it is clear from the context whether one intends the concept to be interpreted exclusively or inclusively. In our paper, we interpret the concepts exclusively, although we make an exception for private knowledge. When p is known by all agents, we say that p is both private and (first-order) shared knowledge. The exact border between private and shared knowledge varies significantly between different papers. De Freitas et al. [6] consider the case Ep to still only be private knowledge, and for p to be considered shared knowledge furthermore require that there is at least one agent i knowing Ep to be true (that is, require K_i Ep to be true for some i ∈ G). The point of De Freitas et al. is that if only Ep is true, it is not really shared knowledge, but only private knowledge held by everyone in G. In our paper, we have sought a compromise between the terminology of De Freitas et al. and the standard terminology in epistemic logic, and hence we have the overlap between private and shared knowledge.
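These recursive definitions are easy to operationalize. The following sketch (not part of the paper's analysis; the Kripke model below is an invented illustration in the spirit of the hikers example) checks E^n p at a world of a finite model with equivalence relations:

```python
# Checking n-th order shared knowledge E^n p in a finite Kripke model.
# With reflexive (equivalence) relations, E^n p holds at a world iff p
# holds at every world reachable by a path of length n through any
# agent's indistinguishability relation.

def everybody_knows(worlds_where_p, relations, world, n):
    """True iff E^n p holds at `world`."""
    frontier = {world}
    for _ in range(n):
        # one step of "everybody knows": union over all agents' relations
        frontier = {v for w in frontier for rel in relations for v in rel[w]}
    return frontier <= worlds_where_p

# Invented chain model: p ("a thunderstorm approaches") is true in
# worlds 0..2 but false in world 3. Agent a cannot distinguish {1, 2};
# agent b cannot distinguish {0, 1} and {2, 3}.
rel_a = {0: {0}, 1: {1, 2}, 2: {1, 2}, 3: {3}}
rel_b = {0: {0, 1}, 1: {0, 1}, 2: {2, 3}, 3: {2, 3}}
p_worlds = {0, 1, 2}

# At the actual world 0: Ep and E^2 p hold, but E^3 p fails, so p is
# shared knowledge to depth 2 without being common knowledge.
for n in (1, 2, 3):
    print(n, everybody_knows(p_worlds, [rel_a, rel_b], 0, n))
```

The failure of E^3 p traces the path 0 → 1 → 2 → 3 through alternating relations, mirroring how each extra order of knowledge attribution reaches one world further along the chain.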
Supporting data analysis

When a group experiences a round of miscoordination, we expect some kind of learning to take place. 'Why did my colleague choose differently than I did?' should be an obvious question a player asks herself, prompting deeper perspective-taking and possibly an understanding of the lack of common knowledge. We investigate this by partitioning decisions into those in which a participant has never before experienced a miscoordination with her colleague (m = 0) and those in which a participant has experienced one or more miscoordinations (m > 0).
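The partitioning can be sketched as follows. This is a hypothetical illustration assuming a tidy table with one row per (group, round) decision and a boolean 'miscoordinated' column; the column names are illustrative and not the actual schema of the published data sets.

```python
# Partition decisions by the number m of miscoordinations experienced
# *before* the current round, so a round's own outcome only affects
# the rounds that follow it.
import pandas as pd

df = pd.DataFrame({
    "group":          [1, 1, 1, 2, 2, 2],
    "round":          [1, 2, 3, 1, 2, 3],
    "miscoordinated": [False, True, False, False, False, False],
})

df = df.sort_values(["group", "round"])
df["m"] = df.groupby("group")["miscoordinated"].transform(
    lambda s: s.cumsum().shift(fill_value=0))

never_miscoordinated = df[df["m"] == 0]   # decisions with m = 0
after_miscoordination = df[df["m"] > 0]   # decisions with m > 0
print(list(df["m"]))                      # [0, 0, 1, 0, 0, 0]
```

The cumulative sum counts miscoordinations per group, and the shift excludes the current round, so group 1's miscoordination in round 2 puts only its round-3 decision into the m > 0 partition.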
Datasets

MTurk anonymous.xlsx, DTU1 anonymous.xlsx, and DTU2 anonymous.xlsx: anonymized data sets of the three experiments. Parameters: session = name of experiment; code = anonymized participant id; group = group number in session; id in group = player id in group; round = round number; arrival = arrival time; choice = choice made by participant; certainty = certainty estimate by participant; bonus = penalty in dollars; strategy = free text question after game has ended; simple = answers to question 4; cutoff = answers to question 3; fault = answers to question 1; payoff = money left after game has finished.

[1] S. A. Birch, P. Bloom, The curse of knowledge in reasoning about false beliefs, Psychological Science 18 (2007) 382–386.
[2] K. Hara, et al., A data-driven analysis of workers' earnings on Amazon Mechanical Turk, Proc. of the 2018 Conference on Human Factors in Computing Systems, ACM (2018) 449.
[3] H. Zhou, A. Fishbach, The pitfall of experimenting on the web: How unattended selective attrition leads to surprising (yet false) research conclusions, Journal of Personality and Social Psychology 111 (2016) 493–504.
[4] D. G. Rand, P. Bloom, The promise of Mechanical Turk: How online labor markets can help theorists run behavioral experiments, Journal of Theoretical Biology 299 (2012) 172–179.
[5] A. Herzig, F. Maffre, How to share knowledge by gossiping, Springer (2015) 249–263.
[6] J. De Freitas, K. A. Thomas, P. DeScioli, S. Pinker, Common knowledge, coordination, and strategic mentalizing in human social life, Proceedings of the National Academy of Sciences 116 (2019) 13751–13758.

FIG. 7: Screenshot of instructions page.
FIG. 8: Screenshot of choice page, round 1.
FIG. 9: Screenshot of page where participants had to estimate the probability that their colleague would make the same choice.
FIG. 10: Screenshot of results page, round 6.
FIG. 11: Screenshot after a player has lost all her bonus.
FIG. 12: Participants' decisions of going to the canteen as a function of their arrival time, here partitioned into those groups who have previously experienced zero (blue) or one or more (orange) miscoordinations. MTurk participants are shown on the left, DTU students on the right.

[Figure: participants' certainty of coordination ('very uncertain' to 'very certain') as a function of arrival time, for (a) MTurk and (b) DTU1/DTU2.]