A Game-Theoretic Approach for Hierarchical Policy-Making

Abstract

We present the design and analysis of a multi-level game-theoretic model of hierarchical policy-making, inspired by policy responses to the COVID-19 pandemic. Our model captures the potentially mismatched priorities among a hierarchy of policy-makers (e.g., federal, state, and local governments) with respect to two main cost components that have opposite dependence on the policy strength, such as post-intervention infection rates and the cost of policy implementation. Our model further includes a crucial third factor in decisions: a cost of non-compliance with the policy-maker immediately above in the hierarchy, such as non-compliance of states with federal policies. Our first contribution is a closed-form approximation of a recently published agent-based model to compute the number of infections for any implemented policy. Second, we present a novel equilibrium selection criterion that addresses common issues with equilibrium multiplicity in our setting. Third, we propose a hierarchical algorithm based on best response dynamics for computing an approximate equilibrium of the hierarchical policy-making game consistent with our solution concept. Finally, we present an empirical investigation of equilibrium policy strategies in this game in terms of the extent of free-riding as well as fairness in the distribution of costs, depending on game parameters such as the degree of centralization and disagreements about policy priorities among the agents.



Feiran Jia, Aditya Mate, Zun Li, Shahin Jabbari, Mithun Chakraborty, Milind Tambe, Michael Wellman, Yevgeniy Vorobeychik
February 23, 2021


Democratic governments and institutions typically have a hierarchical structure. For example, policies in the U.S. emerge from complex interactions among the federal and state governments, as well as county boards, city councils, and mayors. Similar structures exist in Canada and in European democracies. Such policy interactions are hierarchical, with higher levels in the hierarchy able to impose some constraints on the policies immediately below (e.g., the U.S. federal government can constrain what state policies can be). Violations of these constraints, in turn, entail a non-compliance cost to the violator, such as legal costs, penalties, or reputation loss. Examples of such hierarchical policy structure commonly arise, such as in educational and vaccination decisions, as well as in devising policies for controlling a pandemic. Take COVID-19 social distancing policies as a concrete example. These policies commonly include recommendations at the national level, guidelines and restrictions at the state/province/district level, and policies for specific counties or cities. Moreover, a common feature of such hierarchical policy-making is that what ultimately matters are the policies actually deployed at the lowest level, since these are often the most practical to enforce.

In general, policies are contentious. Agents at all levels of the policy-making hierarchy may disagree about the best policies or, more fundamentally, about the particular tradeoffs made in devising policies. For example, COVID-19 social distancing measures have considerable costs, both economic and socio-psychological, but a lack thereof results in more people becoming infected; different institutions disagree on how to trade off these concerns.

We propose a general model of hierarchical policy-making as a game among the policy-makers at all levels of the hierarchy.
In this game, policies at the higher levels have an impact by imposing non-compliance costs on lower levels, but the ultimate implementation of policies happens at the lowest level. Each agent in this game trades off two types of costs: policy implementation cost (e.g., socio-psychological or economic impacts of lockdowns) and policy impact cost (e.g., number of COVID-19 infections). Besides its impact on the structure of agent utilities, the hierarchy also determines the sequence of moves: agents at higher levels move before lower levels (e.g., by announcing guidelines), with the latter observing and reacting to the policy recommendations of the levels above them.

In our game-theoretic model, each agent's action/pure strategy is a single, bounded scalar that represents the degree of social distancing in the pandemic-response context. Our first contribution is a novel solution concept which refines the subgame perfect Nash equilibrium, accounting for commonly occurring indifferences. Our second contribution is an analytic version of a recently proposed agent-based model (ABM) for COVID-19 pandemic spread estimation that accounts for social distancing [28]; we show that our analytic model closely mirrors the short-term behavior of the ABM at a much lower computational cost.

We use our modeling framework to experimentally investigate possible phenomena arising from decentralized policy-making. One of our questions relates to policy free-riding: Is it possible that (in equilibrium) a player lower in the hierarchy adopts a weak policy with a low implementation cost while imposing a negative externality on another player (perhaps on the same level), and hence also enjoying lower infection numbers owing to the latter player's stronger policy? Can a higher-level policy-maker mitigate such free-riding via non-compliance penalties?
We show that the answer depends in a complex manner on different parameters, such as initial infection rates, degree of contact among different parts of the population, weights on different types of cost, and the non-compliance cost structure. Our second set of experiments measures the fairness in the distribution of costs as a function of model parameters as well as degrees of centralization.

Our work is related to the line of research applying the social and behavioral sciences to the cost-benefit analysis of both centralized and decentralized decision-making under pandemic/epidemic conditions [24]. Some papers approach these trade-offs from an optimal control perspective [5, 17–19]. Others study the equilibria of various game-theoretic models of individuals deciding whether to follow guidelines for preventive measures (distancing, vaccination, etc.) and treatment, possibly against the (perceived) aggregate behavior of the population, under various models of disease propagation; e.g., the differential game model [15], the "wait and see" model of vaccinating behavior [1], evolutionary game-theoretic models [2, 10], and various others [3, 4, 8, 22] (see, e.g., [23] for a summary). We distinguish our work from these by modeling the strategic interactions among ideologically diverse, hierarchical policy-makers with explicit non-compliance penalties, and by experimentally assessing the impact of such interactions upon the actually implemented policies under various parameter settings.

Also related is the literature on ABMs for pandemic spread and response policies that account for preferences/incentives of individuals [6, 9, 28]. Of these, [28] is of particular importance, since our policy impact cost is computed by a closed-form approximation to their model.
Other recent work includes the assessment of the impact of prevention and containment policies on the spread of COVID-19 via causal analysis [12], Gaussian processes [14], and state-of-the-art data-driven non-pharmaceutical intervention models [20].

Instead of an analytic treatment, we empirically compute the (approximate) equilibrium of our complex, multi-level, continuous-action game using algorithmic approaches that exploit the structure of the problem. Thus, our methods belong to the category of empirical game-theoretic analysis (see, e.g., [7, 25–27]).

Consider as a running example COVID-19 hierarchical policy-making with three levels of decision-makers: Government (a single agent), States, and Counties. Each agent is a player in a game and chooses a social distancing policy (recommended or enforced). Next, we present a formal game-theoretic model of this kind of hierarchical policy-making, focusing on strategic interactions among players both within and across the levels in this hierarchy.

Figure 1: Hierarchy of policy-makers with 3 levels: Government, States and Counties.

Let $[m]$ denote the set $\{1, 2, \ldots, m\}$ for any $m \in \mathbb{Z}^+$. We represent the players in the hierarchical policy-making game (HPMG) by nodes in a directed rooted tree (we use the terms players and nodes interchangeably), as illustrated in Figure 1 for our running example. A general HPMG has $L > 1$ levels; each level $l \in [L]$ is associated with a set of nodes/players, denoted by $\mathcal{L}_l$, with $n_l = |\mathcal{L}_l|$ the number of players in level $l$. Without loss of generality, let $n_1 = 1$ (we can always add a dummy layer with a single player who has a single strategy); we call the player in level 1, denoted by $a_1$, the root node. The $i$-th node/player in an arbitrary level $l$ is denoted by $a_{l,i}$. For each player $a \in \mathcal{L}_l$ in each level $l < L$, let $\chi(a)$ be the set of its children in the tree; clearly, $|\chi(a)| \ge 1$ for every such $a$, and $\sum_{a \in \mathcal{L}_l} |\chi(a)| = n_{l+1}$ for each $l$. Likewise, for every player $a \in \mathcal{L}_l$ and each $l > 1$, let $\pi(a)$ be its unique parent in the tree, where a directed edge goes from a parent to its child. In our running example in Figure 1, $a_1$ is the Government in level 1; level 2 consists of two States, both children of $a_1$, i.e., $\chi(a_1) = \{a_{2,1}, a_{2,2}\}$; and level 3 consists of Counties.

Each player $a$ can take a scalar action $\alpha_a \in [0, 1]$; $\alpha$ denotes the profile of actions of all players, and $\alpha_l$ the restriction of this profile to a particular level $l$. In our pandemic-response example, $\alpha_a$ is an abstraction of the policy adopted by $a$, capturing the extent of overall activity (conversely, $1 - \alpha_a$ represents the extent of social distancing implemented/recommended by $a$). Thus, a small $\alpha_a$ corresponds to the greatest reduction in infection spread (due to stricter social distancing). On the other hand, the same small $\alpha_a$ entails a higher policy implementation cost, such as the socio-economic and psychological costs of social distancing. At the extremes of our illustration, $\alpha_a = 1$ signifies no intervention, while $\alpha_a = 0$ corresponds to a complete lockdown.

HPMG is a sequential game in which players make strategic decisions following the sequence of layers. Specifically, the player in level 1 moves (i.e., chooses a strategy) first, followed by all players in level 2, who observe the strategy of $a_1$ and then simultaneously choose a joint strategy profile in response. This is followed by all players in level 3, and so on. Thus, all players in the same level $l$ make strategic choices simultaneously.

Because all the utilities in our main application of this model (COVID-19 social distancing policies) are negative (i.e., costs), we define the general model in terms of costs (negative utilities). The cost function of each player $a$ has three components: the policy impact cost $C^{inc}_a(\alpha)$, the policy implementation cost $C^{dec}_a(\alpha)$, and, for each player in levels $l > 1$, the non-compliance cost $C^{NC}_a(\alpha_a, \alpha_{\pi(a)})$.
In the COVID-19 example, the policy impact cost is a measure of infection spread (say, the number of people infected in the player's geographic area), while the implementation cost can capture the psychological and economic costs of a lockdown. The non-compliance cost, in turn, is a penalty imposed by a policy-maker upon an agent within its jurisdiction for deviating from its recommendation (e.g., a fine, litigation costs, or reputational harm). An important piece of structure in the policy implementation and impact costs is that, for a player $a$ in level $l$, they directly depend not on the full profile of strategies of all players, but only on the actions in level $l$ if $l = L$, and only on those in the level immediately below otherwise. To formalize this, we introduce for each player $a$ the notion of its share $\mu_a \in [0, 1]$. The root node has share $\mu_{a_1} = 1$, while the shares of the nodes in the lowest level $L$ are arbitrary, except for the constraint $\sum_{a \in \mathcal{L}_L} \mu_a = 1$. For a level $1 < l < L$, we have $\mu_a = \sum_{a' \in \chi(a)} \mu_{a'}$ for every $a \in \mathcal{L}_l$. We now use the notion of shares to formally define the impact and implementation costs of policies.

• For each lowest-level player $a \in \mathcal{L}_L$, $C^{inc}_a(\alpha)$ depends only on $\alpha_L$ and lies in $[0, 1]$; we provide further specifics of this function for our pandemic-response example in Section 2.3. For a higher-level player $a \in \mathcal{L}_l$, $l < L$, this cost is the share-weighted aggregate of those of its child nodes:
$$C^{inc}_a(\alpha) = \frac{1}{\mu_a} \sum_{a' \in \chi(a)} \mu_{a'}\, C^{inc}_{a'}(\alpha).$$
• For each $a \in \mathcal{L}_L$, $C^{dec}_a(\alpha) \in [0, 1]$ depends only on, and is non-increasing in, $\alpha_a$; in particular, in our pandemic-response example, we simply focus on the function $C^{dec}_a(\alpha) = 1 - \alpha_a$. Also, for each $a \in \mathcal{L}_l$, $l < L$,
$$C^{dec}_a(\alpha) = \frac{1}{\mu_a} \sum_{a' \in \chi(a)} \mu_{a'}\, C^{dec}_{a'}(\alpha).$$

Finally, we consider two variants of the non-compliance cost: one-sided, under which there is no penalty for an $\alpha$ lower than that of the parent (capturing scenarios such as a policy-maker only punishing policy responses weaker than its recommendation), and two-sided, under which any deviation is penalized regardless of direction [16], with the discrepancy measured by the squared Euclidean distance in either variant:
$$C^{NC}_a(\alpha, \alpha') = \begin{cases} \left(\max\{0, \alpha - \alpha'\}\right)^2, & \text{if one-sided;} \\ (\alpha - \alpha')^2, & \text{if two-sided.} \end{cases}$$
Each player $a \in \mathcal{L}_l$ for $l > 1$ has weights $\kappa_a \ge 0$ and $\eta_a \ge 0$, and the overall cost of $a$ is given by
$$C_a(\alpha) := \kappa_a C^{inc}_a(\alpha) + \eta_a C^{dec}_a(\alpha) + \gamma_a C^{NC}_a(\alpha_a, \alpha_{\pi(a)}),$$
where $\gamma_a = 1 - \kappa_a - \eta_a$. The root player $a_1$ has no non-compliance issues; hence it has only one weight $\kappa_{a_1} > 0$, its overall cost being
$$C_{a_1}(\alpha) := \kappa_{a_1} C^{inc}_{a_1}(\alpha) + (1 - \kappa_{a_1})\, C^{dec}_{a_1}(\alpha).$$

The solution concept we are primarily interested in is a pure-strategy subgame perfect Nash equilibrium (PSPNE) [21] of our continuous-action game, which is sequential-move between levels and simultaneous-move within a level. However, the game may have multiple such equilibria, leading to an equilibrium selection problem.

An extreme but simple motivating scenario that gives rise to a multiplicity of equilibria, many of which are unreasonable, is when a lowest-level player $a$ has non-compliance weight $\gamma_a = 1$ under a one-sided cost structure: player $a$ would be indifferent among all values $\alpha_a \in [0, \alpha_{\pi(a)}]$, since any such value induces an overall cost of 0. Such indifference could also characterize the best response of a higher-level player. Consider a two-level variant of the game in Figure 1 (e.g., when counties are constrained to be compliant with their respective states); for each state $a \in \{a_{2,1}, a_{2,2}\}$, let $\kappa_a = 0$ and $\eta_a = 0.6$, hence $\gamma_a = 0.4$. Straightforward calculations show that the local minimum $\alpha^*_a$ of the overall cost of any such state $a$ over $[0, \alpha_{a_1}]$ is $\alpha^*_a = \alpha_{a_1}$, with a cost of $0.6(1 - \alpha_{a_1})$, and that over $(\alpha_{a_1}, 1]$ it is
$$\alpha^{*\prime}_a = \begin{cases} 1, \quad \text{cost} = 0.4(1 - \alpha_{a_1})^2, & \text{if } \alpha_{a_1} \ge 0.25; \\ \alpha_{a_1} + 0.75, \quad \text{cost} = 0.375 - 0.6\,\alpha_{a_1}, & \text{otherwise.} \end{cases}$$
Thus, the unique best response of either state (whose costs are independent of each other) to any government policy $\alpha_{a_1} \ge 0.25$ is 1; i.e., there are infinitely many equilibria with the government recommending any $\alpha_{a_1} \ge 0.25$ but each state choosing 1 regardless. The fact that the government would recommend a policy intervention (which could be as strong as $\alpha_{a_1} = 0.25$) knowing full well that both states would choose no intervention, even under the threat of a non-compliance penalty, seems absurd; yet this absurdity cannot be eliminated by the above solution concept.

With this in mind, we propose and use the following equilibrium selection criterion. For any player $a \in \mathcal{L}_l$, $l < L$, define its social cost $SC_a(\alpha_{\chi(a)})$ for any action profile $\alpha_{\chi(a)}$ of its children as the share-weighted aggregate of the overall costs of its children, that is,
$$SC_a(\alpha_{\chi(a)}) := \frac{1}{\mu_a} \sum_{a' \in \chi(a)} \mu_{a'}\, C_{a'}(\alpha).$$
Evidently, this quantity is, in general, distinct from $C_a(\alpha)$.

If multiple values of $\alpha_a$ induce equilibria for a particular $\alpha_{\chi(a)}$, then we pick the $\alpha_a$ that minimizes $SC_a(\alpha_{\chi(a)})$, breaking further ties in favor of a higher $\alpha_a$ (i.e., smaller policy impact). We refer to this solution concept, which is a refinement of PSPNE, as the minimal-impact pure-strategy subgame perfect Nash equilibrium (MI-PSPNE).

In general, an MI-PSPNE need not exist. Consequently, we seek to compute an $\epsilon$-MI-PSPNE, where $\epsilon$ is the highest benefit from deviation by any player $a$. Below (Section 3), we present a general approach for finding such approximate equilibria in our setting.

We now come to the particular instantiation of $C^{inc}_a(\cdot)$ for each of the lowest-level players $a \in \mathcal{L}_L$ (Counties in Figure 1). Recently, Wilder et al. [28] developed and analyzed an agent-based model (ABM) for COVID-19 spread that accounts for the degree of contact (both within and between households) among individuals from different parts of a population. However, this ABM is computationally expensive, making its use for equilibrium computation impractical at scale.
In this section, we derive a closed-form model of infection spread that (as we show below) relatively closely mirrors the expected number of infections of the ABM over a short horizon.

Let $N_a$ and $I_a$ denote the fixed population of County $a$ and the number of infections in $a$ before the policy intervention, respectively. An individual who is not currently infected but can develop an infection on contact with someone infected is susceptible. We call an individual from County $a'$ active in County $a$ if that individual is capable of making contact (through travel, etc.) with a susceptible individual in County $a$; if $a' = a$, we say that the individual is active within County $a$. A major parameter of the ABM is the transport matrix $R = \{r_{aa'}\}_{a, a' \in \mathcal{L}_L}$, where $r_{aa'} \ge 0$ is the fraction of the population of County $a'$ that is active in County $a$ in the absence of an intervention. Thus, in the absence of a policy intervention, the total number of individuals from County $a'$ active in County $a \neq a'$ is $N_{a'} r_{aa'}$, and the total number of infected individuals from County $a'$ active in County $a \neq a'$ is $I_{a'} r_{aa'}$.

The policy $\alpha_a$ affects the population in two ways: it scales down both the susceptible and the active subpopulations. In other words, under the policy intervention, County $a$ has $(N_a - I_a)\alpha_a$ susceptible individuals, and there are $N_{a'} \alpha_{a'} r_{aa'}$ active individuals in County $a$ from County $a'$, of whom $I_{a'} \alpha_{a'} r_{aa'}$ are (initially) infected. Hence, the proportion of infected active individuals in County $a$ is given by
$$\rho_a(\alpha_L) := \frac{\sum_{a' \in \mathcal{L}_L} I_{a'} \alpha_{a'} r_{aa'}}{\sum_{a' \in \mathcal{L}_L} N_{a'} \alpha_{a'} r_{aa'}}.$$
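To make the definition of $\rho_a(\alpha_L)$ concrete, here is a minimal Python sketch that evaluates it for every County at once. It assumes that the transport matrix is a nested list `R` with `R[a][b]` $= r_{ab}$ (the fraction of County $b$'s population active in County $a$) and that populations, initial infections, and policies are plain lists indexed by County. The function name is hypothetical, and returning 0 when nobody is active in a County is an assumption the text does not address.

```python
from typing import List, Sequence


def infected_active_fraction(
    N: Sequence[float],            # N[b]: population of County b
    I: Sequence[float],            # I[b]: pre-intervention infections in County b
    alpha: Sequence[float],        # alpha[b]: policy of County b (1 = no intervention)
    R: Sequence[Sequence[float]],  # R[a][b]: fraction of County b active in County a
) -> List[float]:
    """Return rho_a for every County a: the share of infected among active individuals."""
    n = len(N)
    rho = []
    for a in range(n):
        infected_active = sum(I[b] * alpha[b] * R[a][b] for b in range(n))
        all_active = sum(N[b] * alpha[b] * R[a][b] for b in range(n))
        # Assumption: if nobody is active in County a, there are no infected contacts.
        rho.append(infected_active / all_active if all_active > 0 else 0.0)
    return rho


# Two Counties: County 0 has 50 initial infections, County 1 has none and halves its activity.
print(infected_active_fraction([100, 100], [50, 0], [1.0, 0.5], [[0.5, 0.5], [0.5, 0.5]]))
# → [0.3333333333333333, 0.3333333333333333]
```

Note that a County's own policy enters both the numerator and the denominator, so a strict local policy lowers $\rho_a$ only insofar as the County's own infection rate exceeds that of its visitors.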
We now focus on an arbitrary susceptible individual in County $a$ and lay down our assumptions on the process by which she may contract an infection. This individual makes actual contact with $X$ active individuals, where $X$ is drawn from a Poisson distribution with mean $C$, a parameter of our model [28]; this distribution is fixed across all individuals in all Counties, and all these contacts are mutually independent. We further assume that, in this sample of $X$ contacts of a susceptible individual in County $a$, the proportion of infected individuals is $\rho_a(\alpha_L)$.

Let $p \in (0, 1)$ denote the probability that a susceptible individual becomes infected upon contact with an infected individual, i.e., the probability that contact with an infected individual does not infect a susceptible individual is $(1 - p)$. Since all $X\rho_a(\alpha)$ infected contacts of an arbitrary susceptible individual are mutually independent, the probability that she becomes infected is $1 - (1 - p)^{X\rho_a(\alpha)}$. We also interpret this as the proportion of the $(N_a - I_a)\alpha_a$ susceptible individuals in County $a$ who end up getting infected. Let $Infect_a(\alpha)$ denote the expected number of additional, post-intervention infections in County $a$. Thus,
$$Infect_a(\alpha) = \mathbb{E}_X\!\left[(N_a - I_a)\alpha_a\left(1 - (1 - p)^{X\rho_a(\alpha)}\right)\right] = (N_a - I_a)\alpha_a\left(1 - \mathbb{E}_X\!\left[\left((1 - p)^{\rho_a(\alpha)}\right)^X\right]\right).$$
Define $y_a(\alpha_L) := (1 - p)^{\rho_a(\alpha_L)}$. Since $X \sim \mathrm{Poisson}(C)$, Proposition 1 in Appendix A tells us that
$$Infect_a(\alpha) = (N_a - I_a)\alpha_a\left(1 - e^{-C(1 - y_a(\alpha_L))}\right). \qquad (1)$$
Finally, we define the infection cost to be $C^{inc}_a(\alpha) = Infect_a(\alpha)/N_a$.

Footnote: The model in Wilder et al. [28] is an individual-level variant of the well-known susceptible-exposed-infectious-recovered (SEIR) model, but this paper assumes that every exposed person eventually becomes infected after an incubation period.

(a) New infections vs. policy (shared by Counties). (b) New infections vs. initial infection rate (shared by Counties).
Figure 2: Comparison of ABM output (solid lines) with closed-form approximation (dashed lines).

We ran some preliminary experiments comparing Equation (1) with the actual output of the ABM [28]; partial results are shown in Figure 2. Note that Equation (1) is a one-shot formula, whereas the ABM computes contacts and infections recursively over several time periods with an initial incubation period, so that the effect of the first-period contacts manifests only after a delay. Hence, we contrast the ABM output after 8 periods (to account for the average incubation period of 7 days [11]) with the above closed-form estimation.
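Equation (1) is cheap to evaluate, which is the point of replacing the ABM. The following Python sketch computes it and checks the Poisson identity behind Proposition 1, $\mathbb{E}[y^X] = e^{-C(1-y)}$ for $X \sim \mathrm{Poisson}(C)$, by summing the probability mass function term by term. The function names are hypothetical; the usage line plugs in the parameter values of the experiments described next ($N_a = 250$, $r_{aa'} = 0.25$, $p = 0.047$, $C = 15$) together with a hypothetical 10% initial infection rate and $\alpha = 0.5$ in every County.

```python
import math
from typing import Sequence


def expected_new_infections(a: int, N: Sequence[float], I: Sequence[float],
                            alpha: Sequence[float], R: Sequence[Sequence[float]],
                            p: float, C: float) -> float:
    """Equation (1): (N_a - I_a) * alpha_a * (1 - exp(-C * (1 - y_a))), y_a = (1 - p) ** rho_a."""
    n = len(N)
    infected_active = sum(I[b] * alpha[b] * R[a][b] for b in range(n))
    all_active = sum(N[b] * alpha[b] * R[a][b] for b in range(n))
    rho = infected_active / all_active if all_active > 0 else 0.0  # rho_a(alpha_L)
    y = (1.0 - p) ** rho                                           # y_a(alpha_L)
    return (N[a] - I[a]) * alpha[a] * (1.0 - math.exp(-C * (1.0 - y)))


def infection_cost(a: int, N, I, alpha, R, p: float, C: float) -> float:
    """C^inc_a(alpha) = Infect_a(alpha) / N_a."""
    return expected_new_infections(a, N, I, alpha, R, p, C) / N[a]


def poisson_pgf_series(y: float, C: float, terms: int = 200) -> float:
    """E[y^X] for X ~ Poisson(C), summed directly from the Poisson pmf."""
    term = math.exp(-C)  # P(X = 0) * y**0
    total = term
    for k in range(1, terms):
        term *= C * y / k  # P(X = k) * y**k, obtained from the (k-1)-th term
        total += term
    return total


N, I = [250] * 4, [25] * 4
R = [[0.25] * 4 for _ in range(4)]
print(round(expected_new_infections(0, N, I, [0.5] * 4, R, p=0.047, C=15), 2))  # → 7.82
```

The series check only confirms the generating-function step; it says nothing about how well Equation (1) tracks the ABM, which is what Figure 2 addresses.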
In the experiments we report, we have 2 States under the Government, each State having 2 Counties (4 Counties in total); each County $a$ has a population of $N_a = 250$; and the transport matrix is symmetric, given by $r_{aa'} = 0.25$ for every pair of Counties $a, a'$. We set $p = 0.047$ [28] and $C = 15$ (calculated based on Prem et al. [13]). For each set of experiments (represented by a separate color in Figure 2), each County has the same initial infection rate $I_a/N_a$ and applies the same policy $\alpha$. In Figure 2a, we vary $\alpha$ on the x-axis for different (fixed) values of $I_a/N_a$, which is the same for all Counties; similarly, in Figure 2b, we vary the initial infection rate for different policies. The plots indicate qualitative similarity between the ABM and our approximation; a salient point of similarity is that the number of additional infections decreases as the initial infection rate moves above or below a middling value, everything else remaining the same. This is because a higher infection rate implies less "room for growth" in a fixed population, whereas a lower rate causes fewer further infections over the same horizon.

Our HPMG model is essentially an extensive-form game endowed with a one-dimensional action space for each agent, resulting in a non-convex strategic landscape. To search for a PSPNE in the hierarchical game (HG), we propose a backward induction algorithm that incorporates a payoff point-query interface and a best-response computation component, which solves for a joint policy profile in equilibrium. The algorithm exploits the hierarchical structure by propagating strategic information between consecutive levels, as follows. Given a joint action profile at levels $1, \ldots, l-1$, the players at level $l$ compose a simultaneous-move game whose payoffs emerge from the strategic interactions at the levels below them. To obtain payoffs for a certain action profile at level $l$, we recursively call the next level $l+1$ until we reach the bottom level $L$. Then, at level $l$, we use these payoffs to solve for an approximate Nash equilibrium. Since none of these simultaneous-move games has a tractable analytic payoff structure amenable to gradient-based optimization, in our current implementation we discretize the infinite strategy space and adopt best response dynamics (BRD) for equilibrium computation.

ALGORITHM 1: HG-PSPNE
Input: $\alpha_{1:l-1}$. Parameters: $param_l = \{T_{l:L}, k_{l:L}, e_{l:L}\}$.
1: Let $t \leftarrow 0$, $\epsilon_l \leftarrow \infty$. Initialize $\alpha_l$ randomly.
2: while $t \le T_l$ and $\epsilon_l > e_l$ do
3:   for $a_{l,i}$ in $\mathcal{L}_l$ do
4:     if $l$ is the lowest level $L$ then $\alpha'_{a_{l,i}} \leftarrow \arg\min_{\alpha_{a_{l,i}}} C_{a_{l,i}}(\alpha)$
5:     else $\alpha'_{a_{l,i}} \leftarrow \arg\min_{\alpha_{a_{l,i}}} C_{a_{l,i}}(\text{HG-PSPNE}(\alpha_{1:l}))$ end if
6:   end for; calculate $\epsilon_l$ for profile $\alpha_l$ and update $\epsilon_l$ if lower than the current value.
7:   Pick $k_l$ agents (among those that can improve) and set their actions to their best responses $\alpha'_{a_{l,i}}$.
8:   $t \leftarrow t + 1$.
9: end while
10: return $\alpha^*$, where $\alpha^*_l$ has the lowest $\epsilon_l$.

Algorithm 1 computes the $\epsilon$-equilibrium among the players at a single level $l$ given tunable parameters. An $\epsilon$-equilibrium at level $l$ is an action profile $\alpha_l$ in which no agent $a_{l,i}$ can decrease its cost by more than $\epsilon$ through a unilateral deviation (Lines 3–6). Let $\alpha_{1:l}$ denote the sequence of actions $\alpha_1, \ldots, \alpha_l$; $T_{l:L} = T_l, \ldots, T_L$ the maximum numbers of BRD steps at levels $l, \ldots, L$; $k_{l:L} = k_l, \ldots, k_L$ the numbers of agents selected to best respond in each round at these levels; and $e_{l:L} = e_l, \ldots, e_L$ the limits of $\epsilon$ for each level. At each round $t \le T_l$, we randomly select a subset of $k_l$ out of $n_l$ agents to best respond simultaneously to the existing profile; we call the variant with $k_l = n_l$ synchronous BRD. We report some experimental results on the dependence of the number of BRD steps needed to reach equilibrium on the sample size $k_l$ in Appendix B. To increase efficiency, we pick this subset among agents that can improve their payoffs by best-responding (Line 7). Synchronous BRD might get trapped in a cycle of moves. We address this issue by keeping a memory of the moves and checking whether the new profile already exists in the memory; if it does (i.e., a cycle is detected), we jump to a new profile and resume the BRD.
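The level-$l$ loop of Algorithm 1 can be illustrated with a small, self-contained Python sketch of best response dynamics on a discretized action grid. It is a simplification under stated assumptions: the cost oracle `cost(i, profile)` stands in for the recursive payoff query to the levels below, best responses use grid search with ties broken toward the larger action, and all names (`brd`, `deviation_gains`, the toy cost) are hypothetical rather than taken from the authors' implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[float, ...]
Cost = Callable[[int, Profile], float]  # (player index, joint profile) -> cost


def best_response(i: int, prof: Profile, cost: Cost, grid: Sequence[float]) -> Tuple[float, float]:
    """Grid-search best response of player i; ties go to the larger action (grid is ascending)."""
    best_a, best_c = grid[0], float("inf")
    for a in grid:
        c = cost(i, prof[:i] + (a,) + prof[i + 1:])
        if c <= best_c:
            best_a, best_c = a, c
    return best_a, best_c


def deviation_gains(prof: Profile, cost: Cost, grid: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-player best responses and the cost reduction each player could get by deviating alone."""
    brs, gains = [], []
    for i in range(len(prof)):
        a, c = best_response(i, prof, cost, grid)
        brs.append(a)
        gains.append(max(0.0, cost(i, prof) - c))
    return brs, gains


def brd(n: int, cost: Cost, grid: Sequence[float], max_steps: int = 100,
        eps_limit: float = 1e-9, k: int = 0, seed: int = 0) -> Tuple[Profile, float]:
    """Best response dynamics; k = 0 (or k >= n) gives synchronous BRD."""
    rng = random.Random(seed)
    k = n if k <= 0 else min(k, n)
    prof: Profile = tuple(rng.choice(grid) for _ in range(n))
    best_prof, best_eps = prof, float("inf")
    memory = set()  # visited profiles, used for cycle detection
    t = 0
    while t <= max_steps and best_eps > eps_limit:
        brs, gains = deviation_gains(prof, cost, grid)
        eps = max(gains)  # prof is an eps-equilibrium (on the grid)
        if eps < best_eps:
            best_prof, best_eps = prof, eps
        memory.add(prof)
        improvers = [i for i, g in enumerate(gains) if g > 0]  # only agents that can improve
        movers = rng.sample(improvers, min(k, len(improvers)))
        nxt = list(prof)
        for i in movers:
            nxt[i] = brs[i]
        prof = tuple(nxt)
        if prof in memory:  # cycle detected: jump to a fresh random profile
            prof = tuple(rng.choice(grid) for _ in range(n))
        t += 1
    return best_prof, best_eps


def toy_cost(i: int, prof: Profile) -> float:
    # Hypothetical quadratic cost: each player wants to sit near 0.25 plus half the other's action.
    return (prof[i] - 0.5 * prof[1 - i] - 0.25) ** 2


grid = [j / 4 for j in range(5)]
profile, eps = brd(2, toy_cost, grid)
print(profile, eps)
```

On a coarse grid, several profiles can be exact grid equilibria because of ties, so the sketch only guarantees a profile whose grid-restricted deviation gain is below `eps_limit`; the tie-breaking toward larger actions mirrors, but does not implement, the full MI-PSPNE selection rule.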
When the termination condition is met, Algorithm 1 returns the profile with the lowest $\epsilon_l$.

To search for the best strategy, we discretize the continuous strategy space and use grid search with tie-breaking (in favor of smaller policy impact) to recover the optimal value. However, in the experiments shown in Figure 2a, we observe that our approximation of the infection cost is nearly linear. Although we have no guarantees, it is reasonable to ask whether the overall cost of a lowest-level player is almost convex in its policy (given the particular closed forms we use for the implementation and non-compliance costs), and hence whether we could use binary search (i.e., the bisection method) to speed up our BRD. Figure 3 shows the run-time of a two-level game for $n = 10$ to $100$ players in the second level when we replace the grid search in the lowest level with binary search under a symmetric setting (i.e., equal populations and a symmetric transport matrix in level 2). In all of these experiments, we find PSPNEs with $\epsilon = 0$. Binary search yields the same results as grid search, but is more efficient.
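The speed-up from binary search can be illustrated with a Python sketch comparing the two searches on a discretized policy grid. It assumes, as the text cautions is not guaranteed, that the lowest-level cost is convex in the player's own policy. The cost below is purely illustrative: a hypothetical linear stand-in $0.3\alpha$ for the infection cost, $C^{dec}_a = 1 - \alpha$, a two-sided non-compliance term toward a hypothetical parent recommendation of 0.5, and hypothetical weights $\kappa = 0.5$, $\eta = 0.3$, $\gamma = 0.2$. The function names are also hypothetical.

```python
from typing import Callable, Tuple


def grid_argmin(f: Callable[[float], float], m: int) -> Tuple[float, int]:
    """Scan alpha = 0, 1/m, ..., 1; ties go to the larger alpha. Returns (argmin, #evaluations)."""
    best_i, best_v, evals = 0, float("inf"), 0
    for i in range(m + 1):
        v = f(i / m)
        evals += 1
        if v <= best_v:
            best_i, best_v = i, v
    return best_i / m, evals


def bisect_argmin(f: Callable[[float], float], m: int) -> Tuple[float, int]:
    """Binary search on the grid for a convex f by comparing neighbours f(i/m) and f((i+1)/m)."""
    lo, hi, evals = 0, m, 0
    while lo < hi:
        mid = (lo + hi) // 2
        left, right = f(mid / m), f((mid + 1) / m)
        evals += 2
        if left < right:  # cost rises to the right of mid: a minimizer lies in [lo, mid]
            hi = mid
        else:             # cost does not rise: move right (this also breaks ties toward larger alpha)
            lo = mid + 1
    return lo / m, evals


def lowest_level_cost(alpha: float) -> float:
    # Hypothetical weights and stand-in cost terms (see lead-in); convex in alpha.
    kappa, eta, gamma, parent = 0.5, 0.3, 0.2, 0.5
    return kappa * 0.3 * alpha + eta * (1 - alpha) + gamma * (alpha - parent) ** 2


print(grid_argmin(lowest_level_cost, 200))    # → (0.875, 201)
print(bisect_argmin(lowest_level_cost, 200))  # → (0.875, 14)
```

Both searches return the largest grid minimizer, so they agree whenever the cost really is convex; for a non-convex cost the binary search can stop at a local minimum, which is why grid search remains the default.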

Figure 3: Run-time in seconds (y-axis) of binary search and grid search as a function of $n$ (x-axis) in a symmetric setting.

In this section, we describe two sets of experiments on our HPMG framework using the methodology discussed in Section 3. Given the large and complex set of game parameters, we report the most insightful experimental results we obtained, deferring additional results to the full version. In Section 4.1, we quantify the notion of free-riding and explore conditions under which free-riding appears in equilibrium and can be circumvented by non-compliance penalties. In Section 4.2, we study how different degrees of centralization and mismatched priorities of players in HPMG can impact fairness in the distribution of costs. In all our experiments, we use the 3-level HPMG of Figure 1 with 2 States, denoted simply by 1 and 2 (also in subscripts), and a number of Counties (to be specified) with equal populations. Moreover, we say that a setting has transport symmetry if the transport matrix $R$ is uniform, i.e., $r_{aa'} = 1/n_L$ for any two lowest-level players $a, a'$. There are infinitely many ways in which $R$ could be asymmetric; we focus on a particular type of asymmetry where one subset $F$ of Counties (or, more generally, lowest-level players) are globally favorite destinations (and equally popular) and all others are equally (un)popular, i.e., for each County $j$, $r_{ij} = r_H > r_L = r_{kj}$ for each $i \in F$ and each $k \in \mathcal{L} \setminus F$, for some $0 < r_L < r_H < 1$, and $\sum_{i \in \mathcal{L}} r_{ij} = 1$.

(a) Counties constrained to comply. (b) Counties free to not comply. (c) Counties free to not comply.
Figure 4: Free-riding (y-axis) as a function of non-compliance cost weight (x-axis). Each curve corresponds to a different initial infection rate of State 2 (Init Inf) as indicated in the legend.

We begin with the rationale for our measurement and visualization of free-riding. Suppose State 2 has a higher initial infection rate (than State 1); then, intuitively, it may prefer a weak distancing policy ($\alpha_2 \gg$ […]) […] ($\alpha_1 \ll$ […]), so that $\alpha_1 - \alpha_2$ approaches $-$[…]. […] (i.e., $\kappa_a = 0.5$) and an even split of the population between States (i.e., $N_1 = N_2 = 500$, hence $\mu_1 = \mu_2 = 0.5$) […] $I_a/N_a$ (as we discuss shortly). In all experiments, each State consists of 5 Counties.

Figure 4 depicts our results under transport symmetry and other conditions, which we now detail. First, we assume that all Counties are constrained to comply with their respective States, so that the policies set by States 1 and 2 actually get implemented in their respective jurisdictions (i.e., we essentially have a 2-level HPMG; hence the number and weights of Counties are immaterial). In Figure 4a, we plot the variation in this policy difference against the States' shared non-compliance weight $\gamma_a$ under the following conditions: State 1's initial infection rate is fixed at $0.\ldots$, while State 2's takes three values; for either State $a \in \{1, 2\}$, we have $\kappa_a = 0.\ldots(1 - \gamma_a)$ and $\eta_a = 0.\ldots(1 - \gamma_a)$ for each value of the non-compliance weight. We observe that free-riding is exacerbated as State 2's initial infection rate becomes larger, although a high enough non-compliance weight mitigates the problem. Interestingly, however, lower values of the non-compliance weight also exhibit a lower degree of free-riding. Increasing the non-compliance weight forces both States to monotonically weaken their policies (towards 1), but State 1 is more conservative, maintaining a strict policy (at 0) up to a non-compliance weight of (at least) 0.15 and only then weakening its policy to the level of State 2, as shown in Figures 8 and 9 in Appendix C. This accounts for the non-monotonic dependence of free-riding on the non-compliance cost weight observed in Figure 4a.

What happens when we allow Counties to not comply with their respective States? We report results for a setting where each County's initial infection rate and weight vector are identical to those of its corresponding State. Recall that, with Counties no longer constrained to comply, State policies are recommendations, whereas the policies that are implemented are County actions.
With this in mind, we report in Figures 4b and 4c the difference in State policies $\alpha_1 - \alpha_2$ as well as the difference $\langle \alpha_1 \rangle - \langle \alpha_2 \rangle$, where $\langle \alpha_a \rangle$ is the average of the equilibrium policies set by all Counties in State $a \in \{1, 2\}$, over the same combinations of weights and initial infection rates as in Figure 4a. We find virtually no evidence of free-riding by either measure (in the extreme case represented by the lowest curve, Figure 4c is perhaps better interpreted as State 2 giving up on policy intervention than as State 2 free-riding off State 1). This indicates that distributing autonomous policy-making among several smaller-scale actors may itself mitigate free-riding, which in turn weakens the effect that non-compliance penalties from the highest level have on free-riding.

We now repeat these experiments in a specific setting that violates transport symmetry, with Counties constrained to comply: we make State 1 the favorite destination (via the transport parameter $r_H$); the States' infection weights $\kappa_1$ and $\kappa_2$ are fixed fractions of $(1 - \gamma_1)$ and $(1 - \gamma_2)$, respectively (still with $\gamma_1 = \gamma_2$); and State 1's initial infection rate is held fixed, while State 2's takes one of six values. Although the States' aversion to non-compliance does lessen free-riding monotonically, and more readily as State 2's initial infection rate grows, the most salient feature is the sudden reversal in the identity of the apparent free-rider as State 2's initial infection rate crosses a (high) threshold. Further inspection reveals that, although State 1 has a higher proportion of active individuals, even from State 2, and cares about infections only slightly less than State 2 (but still with $\kappa_1$ as high as 0.…), … ($\alpha_1 \approx 1$), while State 2 weakens its policy more gradually as the non-compliance weight $\gamma_a$ increases, enabling State 1 to free-ride. However, once State 2's initial … $\gamma_a$, forcing State 1 to strengthen its policy; this is reflected in the sign reversal and increased magnitude of the policy difference (Figures 10, 11, and 12 in Appendix C).

We only report results for two-sided compliance costs at all levels; we did not observe any evidence of free-riding mitigation using the one-sided variant in our experiments.

[Figure 6 panels: (a) Transport symmetry; (b) Transport asymmetry (1 favorite County per State). x-axis: non-compliance weight among Counties; y-axis: Gini coefficient; series: Misaligned States, Aligned States, Fully Decentralized.]

Figure 6: Gini coefficient of costs, averaged over (a) 50 trials for each scenario; (b) 30 trials for Aligned States, 50 for each other scenario. Error bars show one standard error.

Another property of the equilibria of an HPMG worth studying is how fairly costs are distributed among the Counties under different degrees of centralization and different State priorities. Of the many fairness concepts in the literature, we apply a popular measure, the Gini coefficient, to the Counties' overall costs at the profile $\alpha$ returned by Algorithm 1:

$$\mathrm{Gini}(\alpha) = \frac{\sum_{a \in \mathcal{L}} \sum_{a' \in \mathcal{L}} \left| C_a(\alpha) - C_{a'}(\alpha) \right|}{n_{\mathcal{L}} \sum_{a \in \mathcal{L}} C_a(\alpha)}.$$

We report experiments with 5 Counties under each of 2 States. For the Government, $\kappa_a = \eta_a$; for each State $b \in \{1, 2\}$, $\gamma_b = 0.5$, and there are two scenarios of the full game based on the ratios $\kappa_b/\eta_b$: (1) Misaligned States, where this ratio is 20/80 for State 1 and 80/20 for State 2; and (2) Aligned States, where it is 50/50 for either State. A third scenario we study is full decentralization, where we set each County's non-compliance weight to 0 so that the HPMG degenerates into a simultaneous-move game among Counties. For each scenario, we apply two treatments with respect to the transport matrix: transport symmetry, and a specific asymmetry where each State has 1 County that is the (universal) favorite destination, with $r_H = 0.35$. In each situation, we vary the shared non-compliance weight $\gamma_a$ of every County $a$ as an independent variable (unless it is fixed at 0), draw a uniform random sample $\kappa'_a \sim U[0, 1]$, and set $\kappa_a = \kappa'_a(1 - \gamma_a)$. Each set of draws for all Counties constitutes one trial. Figure 6 provides Gini coefficient scatter plots for transport symmetry and our specific asymmetry, respectively. The distribution of overall costs appears reasonably and comparably fair (lower is better) across scenarios. See Appendix D for further details.

We have initiated the study of a new game-theoretic model motivated by decentralized, strategic policy-making under pandemic conditions, and experimentally uncovered interesting aspects of its equilibria. There are several immediate directions for future work: more extensive experimentation with other parameter configurations, including the formulation and testing of (causal) hypotheses; using the actual ABM [28] instead of our closed-form infection estimation, and handling the resulting computational efficiency issues; and considering more complex policies (e.g., temporally evolving strategies) and invoking more sophisticated EGTA approaches. It would also be interesting to apply HPMG or its natural variants to other problems of hierarchical decision-making within, say, a corporate or ideological (e.g., political) organization, where "superiors" can impose a non-compliance penalty or offer a compliance bonus.

References

[1] Samit Bhattacharyya and Chris Bauch. "Wait and see" vaccinating behaviour during a pandemic: A game theoretic analysis. Vaccine, 29(33):5519–5525, 2011.
[2] Martin Brüne and Daniel Wilson. Evolutionary perspectives on human behavior during the coronavirus pandemic: insights from game theory. Evolution, Medicine, and Public Health, 2020(1):181–186, 2020.
[3] Frederick Chen. A mathematical analysis of public avoidance behavior during epidemics using game theory. Journal of Theoretical Biology, 302:18–28, 2012.
[4] Frederick Chen, Miaohua Jiang, Scott Rabidoux, and Stephen Robinson. Public avoidance and epidemics: insights from an economic model. Journal of Theoretical Biology, 278(1):107–119, 2011.
[5] Eli Fenichel. Economic considerations for social distancing and behavioral based policies during an epidemic. Journal of Health Economics, 32(2):440–451, 2013.
[6] Eli Fenichel, Carlos Castillo-Chavez, Graziano Ceddia, Gerardo Chowell, et al. Adaptive human behavior in epidemiological models. Proceedings of the National Academy of Sciences, 108(15):6306–6311, 2011.
[7] Nicola Gatti and Marcello Restelli. Equilibrium approximation in simulation-based extensive-form games. In International Conference on Autonomous Agents and Multiagent Systems, pages 199–206, 2011.
[8] Mark Gersovitz. Disinhibition and immiserization in a model of SIS diseases. Unpublished Papers, Johns Hopkins University, 2010.
[9] Nicolas Hoertel, Martin Blachier, Carlos Blanco, Mark Olfson, et al. A stochastic agent-based model of the SARS-CoV-2 epidemic in France. Nature Medicine, 26(9):1417–1421, 2020.
[10] Ariful Kabir and Jun Tanimoto. Evolutionary game theory modelling to represent the behavioural dynamics of economic shutdowns and shield immunity in the COVID-19 pandemic. Royal Society Open Science, 7(9):201095, 2020.
[11] Stephen Lauer, Kyra Grantz, Qifang Bi, Forrest Jones, Qulu Zheng, et al. The incubation period of coronavirus disease 2019 (COVID-19) from publicly reported confirmed cases: estimation and application. Annals of Internal Medicine, 172(9):577–582, 2020.
[12] Atalanti Mastakouri and Bernhard Schölkopf. Causal analysis of COVID-19 spread in Germany. Advances in Neural Information Processing Systems, 33, 2020.
[13] Kiesha Prem, Alex Cook, and Mark Jit. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS Computational Biology, 13(9):e1005697, 2017.
[14] Zhaozhi Qian, Ahmed Alaa, and Mihaela van der Schaar. When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes. Advances in Neural Information Processing Systems, 2020.
[15] Timothy Reluga. Game theory of social distancing in response to an epidemic. PLoS Computational Biology, 6(5):e1000793, 2010.
[16] Rick Rojas. Trump criticizes Georgia governor for decision to reopen state. New York Times, 2020. Accessed: 2020-01-19.
[17] Bob Rowthorn and Flavio Toxvaerd. The optimal control of infectious diseases via prevention and treatment. CEPR Discussion Paper No. DP8925, 2012.
[18] Robert Rowthorn and Jan Maciejowski. A cost–benefit analysis of the covid-19 disease. Oxford Review of Economic Policy, 36(Supplement):S38–S55, 2020.
[19] Suresh Sethi. Optimal quarantine programmes for controlling an epidemic spread. Journal of the Operational Research Society, pages 265–268, 1978.
[20] Mrinank Sharma, Sören Mindermann, Jan M Brauner, Gavin Leech, Anna B Stephenson, Tomáš Gavenčiak, Jan Kulveit, Yee Whye Teh, Leonid Chindelevitch, and Yarin Gal. How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19? Advances in Neural Information Processing Systems, 2020.
[21] Yoav Shoham and Kevin Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.
[22] Flavio Toxvaerd. Rational disinhibition and externalities in prevention. International Economic Review, 60(4):1737–1755, 2019.
[23] Flavio Toxvaerd. Equilibrium Social Distancing, 2020. Cambridge–INET Working Paper Series No. 2020/08.
[24] Jay Van Bavel, Katherine Baicker, Paulo Boggio, Valerio Capraro, et al. Using social and behavioural science to support COVID-19 pandemic response. Nature Human Behaviour, pages 1–12, 2020.
[25] Yevgeniy Vorobeychik and Michael Wellman. Stochastic search methods for Nash equilibrium approximation in simulation-based games. In International Conference on Autonomous Agents and Multiagent Systems, pages 1055–1062, 2008.
[26] Yevgeniy Vorobeychik, Daniel Reeves, and Michael Wellman. Constrained automated mechanism design for infinite games of incomplete information. In Conference on Uncertainty in Artificial Intelligence, pages 400–407, 2007.
[27] Michael Wellman. Methods for empirical game-theoretic analysis. In AAAI Conference on Artificial Intelligence, pages 1552–1556, 2006.
[28] Bryan Wilder, Marie Charpignon, Jackson Killian, Han-Ching Ou, Aditya Mate, Shahin Jabbari, et al. Modeling between-population variation in COVID-19 dynamics in Hubei, Lombardy, and New York City. Proceedings of the National Academy of Sciences, 117(41):25904–25910, 2020.

A Omitted Details from Section 2.3

We will state and prove a useful property of Poisson distributions.
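The property below is the probability generating function of a Poisson random variable, and a quick numerical sanity check may help before the formal statement. The following Python sketch compares a truncated version of the series used in the proof with the closed form $e^{-\lambda(1-b)}$. The function names and the truncation length of 200 terms are illustrative choices, not part of the paper.

```python
import math


def poisson_pgf_series(lam: float, b: float, terms: int = 200) -> float:
    """Truncated series sum_{z=0}^{terms-1} b**z * Pr[Z = z] for Z ~ Poisson(lam)."""
    total = 0.0
    term = math.exp(-lam)  # z = 0 term: b**0 * e^{-lam} * lam**0 / 0!
    for z in range(terms):
        total += term
        # Consecutive terms differ by the factor (b * lam) / (z + 1).
        term *= b * lam / (z + 1)
    return total


def poisson_pgf_closed_form(lam: float, b: float) -> float:
    """Closed form stated in Proposition 1: exp(-lam * (1 - b))."""
    return math.exp(-lam * (1.0 - b))


if __name__ == "__main__":
    for lam, b in [(3.0, 0.4), (2.0, -0.5), (10.0, 0.9)]:
        print(lam, b, poisson_pgf_series(lam, b), poisson_pgf_closed_form(lam, b))
```

The recurrence on consecutive terms avoids computing large powers and factorials separately, so the truncated sum stays numerically stable even for moderately large $\lambda$.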

Proposition 1.

Consider a random variable $Z \sim \mathrm{Poisson}(\lambda)$. Then, for any non-zero real number $b$ independent of $Z$, $\mathbb{E}_Z[b^Z] = e^{-\lambda(1-b)}$. Proof.

From definitions,

$$\mathbb{E}_Z[b^Z] = \sum_{z=0}^{\infty} b^z \Pr[Z = z] = \sum_{z=0}^{\infty} b^z \cdot \frac{e^{-\lambda}\lambda^z}{z!} = e^{-\lambda} \sum_{z=0}^{\infty} \frac{(b\lambda)^z}{z!} = e^{-\lambda} e^{b\lambda},$$

which equals the desired expression.

B Omitted Details from Section 3

Figure 7: The number of best-response steps (y-axis) as a function of the fraction of players that are best responding (x-axis). Each curve corresponds to a game with n players (ranging from 5 to 45) in the second level, which is also the lowest level of the game.

We conduct this experiment on a two-level game with one Government and a variable number n of States. We focus on symmetric settings, i.e., all States have equal population, equal initial infections, and equal weight vectors. The transport matrix is also symmetric. Figure 7 shows that setting k = n in Algorithm 1 (synchronous BRD) results in the fastest convergence to equilibrium for several choices of n.

C Omitted Details from Section 4.1

We first look at the variation in the policies of States 1 and 2 separately (Figures 8 and 9, respectively), rather than their difference, as we vary their shared non-compliance weight for the experimental set-up in Section 4.1. Recall that the initial infection rate of State 1 is held fixed … ($\alpha_2$ close to 1) while State 1 adopts a strong policy ($\alpha_1$ close to 0). The most striking observation is that State 1 maintains a maximally strict policy in equilibrium for a wide range of parameters (up to a non-compliance weight of around 0.15 for all initial infection rates of State 2 studied here), while State 2 jumps to a fairly weak policy quickly over the same range.

Figure 8: The policy of State 1 (y-axis) as a function of non-compliance weight (x-axis) under transport symmetry. Each curve corresponds to a different initial infection rate for State 2, as specified in the legend.

Figure 9: The policy of State 2 (y-axis) as a function of non-compliance weight (x-axis) under transport symmetry. Each curve corresponds to a different initial infection rate for State 2, as specified in the legend.

Turning now to the setting with transport asymmetry, where State 1 is the favorite destination, Figures 10 and 11 provide similar insights into the behaviors of States 1 and 2 individually. Even when State 2's initial infection rate exceeds that of State 1 (which now has a moderate value) …

D Omitted Details from Section 4.2
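As a reference point for the Gini coefficients examined in this appendix, the following is a minimal Python sketch of the formula given in Section 4.2, $\sum_{a}\sum_{a'} |C_a(\alpha) - C_{a'}(\alpha)| / (n_{\mathcal{L}} \sum_a C_a(\alpha))$, applied to a list of per-County overall costs. The function names and the cost values in the example are hypothetical illustrations, not the authors' code; computing the costs $C_a(\alpha)$ themselves requires the full HPMG model and is not shown. Note that the more common textbook normalization divides by $2 n_{\mathcal{L}} \sum_a C_a$, which yields exactly half of this value.

```python
from typing import Sequence


def gini(costs: Sequence[float]) -> float:
    """Gini coefficient of County costs, normalized as in Section 4.2:
    sum over all ordered pairs (a, a') of |C_a - C_a'|, divided by n * sum_a C_a.
    """
    n = len(costs)
    total = sum(costs)
    if n == 0 or total <= 0:
        raise ValueError("expected at least one County and a positive total cost")
    # Double sum over all ordered pairs, including a == a' (those terms are 0).
    pairwise = sum(abs(c_a - c_b) for c_a in costs for c_b in costs)
    return pairwise / (n * total)


def gini_textbook(costs: Sequence[float]) -> float:
    """Common alternative normalization with 2 * n * sum_a C_a in the denominator."""
    return gini(costs) / 2.0


if __name__ == "__main__":
    # Hypothetical overall costs for 10 Counties (5 per State, 2 States).
    equal = [1.0] * 10
    split = [1.0] * 5 + [3.0] * 5
    print(gini(equal))  # 0.0
    print(gini(split), gini_textbook(split))  # 0.5 0.25
```

With equal costs the value is 0; under the normalization written above, the maximum for $n$ Counties (one County bearing all costs) is $2(n-1)/n$ rather than the textbook $(n-1)/n$.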

We first take a closer look at the policies of the Counties and the resulting Gini coefficients for the experiments in Section 4.2. Under transport symmetry, Figures 13 and 14 give scatter plots of the realized Gini coefficient and the equilibrium policies of all Counties for each trial, respectively; Figures 15 and 16 give the corresponding scatter plots for the specific asymmetry where State 1 is the favorite destination. Important observations on these scatter plots are that the policy values tend to be extreme and that the realized Gini coefficients seem to take on values from a small discrete set. The scenario that stands out somewhat is Aligned States, which has a higher variability or granularity in values in the symmetric setting, a property that is lost under our specific asymmetry.

Figure 12: The policy of State 2 (y-axis) as a function of its own initial infection rate when its weight on the non-compliance cost is 0, under our specific transport asymmetry.

Figure 13: 50 trials for each of 3 scenarios.

Figure 14: 50 trials for each of 3 scenarios.

[Figure 15 plot: Gini coefficient (y-axis) vs. non-compliance weight among Counties (x-axis); series: Fully Decentralized, Misaligned States, Aligned States.]

Figure 15: 30 trials for Aligned States, 50 trials for each other scenario.

[Figure 16 plot: County policies $\alpha$ (y-axis) vs. non-compliance weight among Counties (x-axis); series: Fully Decentralized, Misaligned States, Aligned States.]

Figure 16: 30 trials for