A Decomposition Approach to Counterfactual Analysis in Game-Theoretic Models

Abstract

When a government announces a new policy in a strategic setting, such as changes to taxes or tariffs, it creates additional signals through media reports or studies on the policy which may be used by economic agents. This may result in a new predicted outcome that is not captured by the analysis based on Bayes Nash equilibria. In this paper, we adopt Bayes Correlated Equilibrium of Bergemann and Morris (2016) and develop a method of counterfactual analysis that can be used for such situations. We show that, under an invariance condition for the equilibrium selection rule, the equilibrium-based prediction in this setting is interval-identified by the decomposition-based prediction which can be estimated without specifying the details of the game such as payoffs or distribution of unobserved heterogeneity. This approach greatly simplifies the counterfactual analysis in strategic settings, including those with multiple equilibria, as it eliminates the need to recover the set of equilibria in the post-policy game. We illustrate our message by revisiting an empirical analysis in Ciliberto and Tamer (2009) on entry decisions of firms in the airline industry.


Nathan Canen and Kyungchul Song

University of Houston and University of British Columbia

ABSTRACT. Decomposition methods are often used to produce counterfactual predictions of outcomes when a policy variable changes. The methods project the observed relationship between the policy variable and the endogenous variable to a counterfactual environment. However, when the endogenous variable is generated by an agent's strategy in a game-theoretic setting and the agent has an incentive to deviate from his strategy after the policy, such predictions are hard to justify. In this paper, we use a generic model of a Bayesian game with the solution concept of Bayes Correlated Equilibria of Bergemann and Morris (2016) and show that if the information structure is rich enough and the equilibrium selection rule satisfies a weak invariance property, decomposition-based predictions are identical to equilibrium-based predictions. This result opens up the possibility for applying a decomposition method for counterfactual analysis in various strategic settings without specifying and estimating details of the game. We illustrate our message by revisiting an empirical analysis in Ciliberto and Tamer (2009) on entry decisions of firms in the airline industry.

KEY WORDS: Counterfactual Analysis, Multiple Equilibria, Bayes Correlated Equilibria, Decomposition Method

JEL CLASSIFICATION: C30, C57

Date: October 20, 2020.

We thank Aimee Chin, Sukjin Han, Chinhui Juhn, Arvind Magesan, Daniela Scur, Eduardo Souza-Rodrigues, Andrea Szabo, Xun Tang, and participants at the UBC econometrics lunch seminar for valuable comments. All errors are ours. Song acknowledges that this research was supported by Social Sciences and Humanities Research Council of Canada. Corresponding address: Kyungchul Song, Vancouver School of Economics, University of British Columbia, Vancouver, BC, Canada. Email address: [email protected].

arXiv [econ.EM], Oct

1. Introduction

One of the central goals of empirical research in economics is to quantify the effects of new policies yet to be implemented. Examples include analyzing the effects of increasing minimum wages on labor outcomes, the effects of different government legislation in healthcare and the effects of different market characteristics on firm entry, to name only a few.

Decomposition methods have often been used to produce counterfactual predictions of a policy in labor economics; see Oaxaca (1973) and Blinder (1973), Juhn, Murphy, and Pierce (1993), and DiNardo, Fortin, and Lemieux (1996). See Fortin, Lemieux, and Firpo (2011) for a survey. The methods project the observed relationship between the policy variable and the endogenous variable to a counterfactual environment. To the best of our knowledge, however, these methods have rarely been used in strategic settings where agents' payoffs are interdependent.

To illustrate a decomposition method in a strategic setting, consider an environment where there are two firms considering entry into a market. Let $Y_i$ be an indicator of firm $i$'s entry into the market and $X_i$ an observable cost component of the firm's production. We assume that $X_i$ is exogenous, taking values in a finite set, and $Y_i$ is generated as follows: for a map $g_i$,

\[
Y_i = g_i(X_1, X_2, \varepsilon_i), \quad i = 1, 2, \tag{1}
\]

where $\varepsilon_i$ is the unobserved exogenous payoff component of the firm, independent of $X = (X_1, X_2)$. (The map $g_i$ describes how the firm's characteristics affect its entry decisions. This can be made explicit using an equilibrium strategy from a game-theoretic model.)

If the government introduces a new policy that transforms $X$ into $f(X)$, what would be the average counterfactual prediction of $Y_i$? Making use of the causal relationship in (1) (with the assumption of independence between $\varepsilon_i$ and $X$), we can find the prediction as

\[
\sum_{x} E[Y_i \mid X = f(x)] \, P\{X = x\}. \tag{2}
\]

In this paper, we call this a decomposition-based prediction, as this form is used in the decomposition methods in labor economics. Insofar as the primary goal is to obtain this prediction, one does not need to estimate the map $g_i$: the causal relationship between $Y_i$ and $X$ is recovered from the data directly. In other words, one does not need to specify functional forms of payoffs and derive equilibrium strategies.

One of the assumptions justifying this prediction is that the causal relationship (1) remains the same after the new policy. This is no longer true if firms have an incentive to deviate from the relation in (1) after the policy. In other words, we cannot justify the decomposition-based prediction if it is no longer incentive compatible after the policy.

Game-theoretic models are useful for elucidating an incentive problem in a strategic setting. Using such models, we can derive firms' decision rules that are invariant to the policy and use them to make predictions. In doing so, however, we often need to specify further details such as the shape of payoff functions and the information structure. Some details may be well motivated, but others are simply ad hoc. It is generally hard to tell how sensitively the prediction depends on the ad hoc components of the model. Furthermore, the model is usually set-identified due to multiple equilibria. This causes significant computational costs, as one needs to simulate the outcomes for each value of the parameters in the estimated set. It is not unusual that after all this the results turn out to be ambiguous (e.g., see Aguirregabiria and Nevo (2012), p.111).

In contrast, decomposition-based predictions have several practical merits. First, they use results from nature's experiments as reflected in the data, and are not sensitive to specification details of the game, as such details are not used at all to begin with.
Second, decomposition-based predictions are point-identified, and the computation of the predictions is relatively less costly. Third, inference is easy: a standard nonparametric bootstrap procedure usually works. The inference transparently reflects the variations of the data without relying on parametric functional form restrictions.

Footnote: This was noted by Blundell and Powell (2003), p.317-318.

In this paper, we study conditions for decomposition-based predictions to coincide with equilibrium-based predictions in a Bayesian game. As we seek to find a minimal set of conditions, we consider a generic Bayesian game stripped of various specification details used in the literature, and adopt correlated equilibria as the solution concept. A correlated equilibrium, a solution concept originally proposed by Aumann (1974), is more flexible than a Nash equilibrium because it permits communication between agents or commonly observed signals before the play of the game in a way that an analyst cannot describe accurately in a model (Myerson (1994)). In our context, which includes incomplete information games, we find that the framework of Bergemann and Morris (2016) and their solution concept of Bayes Correlated Equilibrium (BCE) are convenient for our purpose for three reasons. First, they distinguish two components in the game: the basic game, which specifies the payoffs and the payoff-relevant state, and the information structure, which specifies how signals to the agents are generated given a value of the payoff state. This separation enables us to see the role of the information structure in counterfactual predictions separately from the change of the payoff state due to the policy. Second, as shown in Bergemann and Morris (2016), the solution concept of BCE exhibits a tight relationship between the set of admitted causal relations between $Y_i$ and $X_i$ and the information structure. Using this relationship, we study the validity of decomposition-based predictions when the policy affects the information structure, for example, through increased communications between the players. This facilitates the same analysis for other solution concepts in a unifying way, as the latter concepts are simply a restricted version of the BCE.

Using a generic version of the Bayesian game, we show that if the following three conditions are satisfied:
(a) the information structure is "rich enough",
(b) the equilibrium selection rule has a weak invariance property, and
(c) the support of $X$ contains that of $f(X)$,
then the decomposition-based predictions are identical to the equilibrium-based predictions. The rich information structure condition (a) is already satisfied in games with complete information. It also permits private information as long as the policy variable is publicly observed. This includes many of the entry-exit games estimated in the empirical IO literature, as well as games that change observed reserve prices in auctions. The condition (b) is a weak regularity condition which requires that if two sets of causal relationships between $Y_i$ and $X$ from the two games agree on the support of $f(X)$, the probability of selecting one set should be the same as the probability of selecting the other set. In other words, if a set of causal relationships remains invariant to the policy, so does the probability of it being selected. The condition prevents the equilibrium selection rule from behaving arbitrarily as we move from one game to another. The support condition (c) is satisfied when policies are set to those observed elsewhere. This support condition is violated if the policy changes $X$ into a value outside of its support in the data. In this case, we show that one can still use the decomposition-based predictions to construct upper and lower bounds for the equilibrium-based predictions. The length of the interval defined by these bounds can be viewed as the maximal extrapolation error that one faces without relying on any functional form restrictions on the causal relation between the endogenous variable and the policy variable.

Footnote: Alternative extensions of correlated equilibria of Aumann (1974) to incomplete information games have been proposed by, for example, Cotter (1991), Forges (1993, 2006), Stinchcombe (2011), and Liu (2015), among others. Correlated equilibria in communication games are explained in Myerson (1994).

Footnote: Later in this paper, we discuss how part of our results extend to other solution concepts including Bayes Nash Equilibria.

Footnote: Although our primary focus in this paper is on strategic settings, our results also hold naturally in single-agent settings.

Using decomposition-based predictions does not eliminate entirely the need to use a game-theoretic model for counterfactual analysis. On the contrary, to check their validity, one needs to clarify the strategic environment and the agents' information structure. We believe that our results are useful for this step, as they show which specifications are needed (and which ones not) for the use of decomposition-based predictions. Once the validity of decomposition-based predictions is confirmed, one does not need to specify further details of the game.

As an illustration, we revisit the empirical application of Ciliberto and Tamer (2009). Using a model of a complete information entry game, they studied the U.S. airline market and assessed the effects on entry from a repeal of the Wright amendment, a law that restricted entry in routes using Dallas Love Field Airport. Due to the multiplicity of equilibria, they pursued a set identification approach, and reported maximum entry probabilities as counterfactual predictions. In our application, we produce a decomposition-based prediction and compare it with the prediction from Ciliberto and Tamer (2009). We also compare these predictions with the actual results after the repeal of the Wright amendment in 2014. We find that the decomposition-based predictions using the pre-policy data perform reasonably well out-of-sample.

Related Literature

The idea that one may not need to identify a full structural model to identify policy effects of interest goes back at least to Marschak (1953), as pointed out by Heckman (2010a). Recent approaches exploring similar insights include the sufficient statistics approach (most notably in public finance, e.g. Chetty (2009); see Kleven (2020) for a recent overview) and the Policy Relevant Treatment Effect (PRTE) of Heckman and Vytlacil (2001). See Heckman (2010a) for details in this literature. We convey a similar message in this paper by studying conditions for the validity of decomposition-based predictions in strategic settings.

The decomposition approach is widely used in economics, most notably in labor economics. In the decomposition approach, the researcher estimates their model, and then decomposes the variation in the outcome into the effects from different covariates. One example is the study of the effects of minimum wages on wage inequality (e.g. DiNardo, Fortin, and Lemieux (1996)): while there might be other mechanisms that affect wage inequality (e.g. unionization), once the model has been estimated, one can check how the distribution of wages would have differed if minimum wages had increased. Salient examples of the decomposition approach include the early work of Oaxaca (1973) and Blinder (1973), Juhn, Murphy, and Pierce (1993), the nonparametric/semiparametric approach of DiNardo, Fortin, and Lemieux (1996), and other extensions which have been very widely applied; see Fortin, Lemieux, and Firpo (2011) for a survey. More recently, Chernozhukov, Fernández-Val, and Melly (2013) provided an approach for inference on the full counterfactual distributions. However, this body of literature does not consider strategic settings.

A common way to generate counterfactual predictions in a strategic environment is to specify and estimate a game-theoretic model and use the predictions from its equilibria. In many cases, point identification of these models is not possible due to multiple equilibria. Researchers often either pick one equilibrium from the game (e.g. Jia (2008)) or conduct inference on the identified set allowing for all the equilibria (e.g. Ciliberto and Tamer (2009)). In light of these challenges, a recent literature studies point-identification of counterfactual predictions without identifying the full details of the model. These developments have been centered around dynamic discrete choice models, e.g. Aguirregabiria and Suzuki (2014), Norets and Tang (2014), Arcidiacono and Miller (2020), Kalouptsidi, Scott, and Souza-Rodrigues (2020). While many structural models within this class are unidentified, some counterfactuals are point identified, such as those characterized by linear changes in payoffs (the so called "additive transfers counterfactuals" in Kalouptsidi, Scott, and Souza-Rodrigues (2017)). See also Kalouptsidi, Kitamura, Lima, and Souza-Rodrigues (2020) for partial identification of counterfactuals in a similar context.

Footnote: This is often due to failure in identifying flow payoffs, or in separately identifying fixed costs, entry costs and exit costs; see the discussion in Aguirregabiria and Suzuki (2014).

The message in this paper is related to that of Kocherlakota (2019). Using a model of a dynamic game between the private sector and the government, he showed that to obtain an optimal policy, one can just use predictions by regressing policy payoffs on policies using historical data, without relying on a structural macroeconomic model. There are major differences between his framework and ours. First, his model is a dynamic model where the policy-maker is a player of the game, whereas ours is a static one in which the policy-maker is outside the model. Second, his model is a macro model where the analysis is based on one equilibrium which generated the data. In our case, many independent copies of a static game are played, so we need to deal with the problem of multiple equilibria. Third, in our framework, the information structure of a game plays a prominent role, whereas its role is limited in his dynamic model where the forward looking private sector needs to form beliefs about future government policies.

There is a small body of literature that uses the solution concept of BCE in an empirical setting. Magnolfi and Roncoroni (2019) use it in studying entry decisions in the supermarket sector. They focus on characterizing the identified set and use the set to study a policy that changes a covariate (presence of large malls). Bergemann, Brooks, and Morris (2019) also consider the problem of computing counterfactuals for games using BCE, providing a characterization of a sharp (counterfactual) identified set under their incentive compatibility constraints. Other empirical examples using BCE and obtaining partial identification include Gualdani and Sinha (2020) on discrete choice models and Syrgkanis, Tamer, and Ziani (2018) on auctions.

This paper is organized as follows. In Section 2, we give an overview of our main insights using a simple entry game model. In Section 3, we present formal results of the validity of decomposition-based predictions for general incomplete information games. In Section 4, we discuss the situations where the payoff state is only partially observed as in many empirical applications. Section 5 is devoted to the empirical application. In Section 6, we conclude. The mathematical proofs and details on the implementation of decomposition-based predictions are found in the online appendix to this paper.
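
As a rough illustration of the decomposition-based prediction in (2), the following Python sketch computes a plug-in estimate from a sample of observed pairs $(X, Y_i)$. The function name `decomposition_prediction`, the toy data, and the simple frequency estimator are assumptions made for this illustration, not the implementation described in the online appendix. The sketch also presumes that $\varepsilon_i$ is independent of $X$ and that the policy keeps $X$ within its observed support.

```python
from collections import defaultdict


def decomposition_prediction(xs, ys, policy):
    """Plug-in estimate of sum_x E[Y | X = policy(x)] * P{X = x}, as in (2).

    xs:     observed (hashable) values of the policy variable X
    ys:     observed outcomes Y, e.g. 0/1 entry indicators (same length as xs)
    policy: the map f; f(x) must lie in the observed support of X
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")

    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1

    # Cell means estimate E[Y | X = x]; cell frequencies estimate P{X = x}.
    cond_mean = {x: sums[x] / counts[x] for x in counts}
    n = len(xs)

    total = 0.0
    for x, c in counts.items():
        fx = policy(x)
        if fx not in cond_mean:
            raise ValueError(f"policy sends {x!r} to {fx!r}, outside the observed support")
        total += cond_mean[fx] * c / n
    return total


# Hypothetical toy data: entry is more frequent when the cost component is "low".
xs = ["high", "high", "high", "high", "low", "low", "low", "low"]
ys = [0, 0, 1, 0, 1, 1, 1, 0]

# Hypothetical policy: every market is moved to the "low" cost level.
prediction = decomposition_prediction(xs, ys, lambda x: "low")
```

No payoff function or equilibrium computation appears anywhere in the sketch; only the observed relationship between $X$ and $Y_i$ is used, which is the practical appeal discussed above.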

2. Overview of the Main Insights

In this section, we present an overview of our main results using a simple setting of $n$ firms which decide on whether to enter a market or not. (Formal definitions and results in a more general setting are presented in Section 3.) The action space is a two-point set $\mathbb{Y}_i = \{0, 1\}$, where $0$ denotes the decision of not entering and $1$ entering the market. The payoff for firm $i$ is given by $u_i(y_i, y_{-i}, w)$, where $w$ denotes the exogenously given payoff-relevant state (called simply 'payoff state' hereafter) such as the fixed cost of entering the market. Nature draws the payoff state $W$ and draws signals $T = (T_i)_{i=1}^n$ from a conditional distribution given $W$. This conditional distribution constitutes the information structure of the game. For example, if the conditional distribution of $T$ given $W = w$ concentrates highly around $w$ for each $w \in \mathbb{W}$, the agents are well informed about the payoff state.

The predictions from this game take the form of a conditional joint probability mass function $p(\cdot \mid W, T)$ of actions given $(W, T)$. Consider what predicted actions are incentive compatible, i.e., no agent has an incentive to deviate from the predictions. Suppose that this function satisfies that for all players $i$,

\[
\sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} E\left[ u_i(y_i, y_{-i}, W) \, p(y_1,\ldots,y_n \mid W, T) \mid T_i \right]
\ge \sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} E\left[ u_i(\tau(y_i), y_{-i}, W) \, p(y_1,\ldots,y_n \mid W, T) \mid T_i \right], \tag{3}
\]

for any map $\tau : \mathbb{Y}_i \to \mathbb{Y}_i$, where $\mathbb{Y} = \mathbb{Y}_1 \times \cdots \times \mathbb{Y}_n$. The term on the right hand side represents the expected payoff of player $i$ when she deviates from the action $y_i$ recommended according to $p(\cdot \mid W, T)$ and chooses instead $\tau(y_i)$ while all the other players follow the actions recommended to them. If $p(\cdot \mid W, T)$ satisfies (3) for each $i$, no player has an incentive to deviate from the action recommended to her.
In this case, $p(\cdot \mid W, T)$ becomes a Bayes Correlated Equilibrium (BCE) introduced by Bergemann and Morris (2016).

If we restrict the conditional joint probability mass function to be of the form

\[
p(y_1,\ldots,y_n \mid W, T) = p(y_1 \mid W, T) \times \cdots \times p(y_n \mid W, T), \tag{4}
\]

then the BCE is reduced to a (mixed strategy) Bayes Nash Equilibrium (BNE). The main distinction of BCE from BNE is that the predicted actions from BCE are permitted to be correlated (conditional on $(W, T)$). This correlation may come from communications between agents before the play of the game. Correlated equilibria as opposed to Nash equilibria accommodate these possibilities without specifying the detailed protocols of communication (Myerson (1994)).

The observed action profile $Y = (Y_1, \ldots, Y_n)$ is generated from BCE as follows. Let $\{p_s : s \in \mathbb{S}\}$ be the set of BCEs. First, nature draws $W = w$, selects a BCE $p_s$ with probability, say, $e(s \mid w)$, and then draws signals $t = (t_1, \ldots, t_n)$ from a conditional distribution of $T$ given $W = w$. The observed action profile $Y$ is drawn from the conditional distribution $p_s(\cdot \mid w, t)$.

The set of BCEs together with an equilibrium selection rule $e(s \mid w)$ gives us a prediction rule as the conditional distribution of $Y$ given $W = w$:

\[
\rho(A \mid w) = \sum_{s \in \mathbb{S}} \sum_{(y_1,\ldots,y_n) \in A} E\left[ p_s(y_1,\ldots,y_n \mid W, T) \mid W = w \right] e(s \mid w) = P\{Y \in A \mid W = w\}, \tag{5}
\]

for any subset $A \subset \mathbb{Y}$. We call the map $\rho(\cdot \mid w)$ the (randomized) reduced form in the paper, because it reveals the (stochastic) causal relationship between the exogenous variable $W$ and the endogenous variable $Y$. Indeed, the reduced form summarizes how $Y$ is generated once the payoff state $W$ is realized to be $w$. In our entry-exit example, the payoff state $W$, including market size and other firm and market characteristics, is drawn first. Then entry decisions $Y$ are drawn from the reduced form $\rho(\cdot \mid W)$.
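
To make the construction in (3) and (5) concrete, here is a minimal Python sketch for a two-firm entry game with complete information, so that $T_i = W$ and the expectation over signals drops out. The payoff function, the three listed equilibria, and the selection probabilities are hypothetical choices for illustration only. In particular, the sketch lists just a few of the game's correlated equilibria rather than the full set $\{p_s : s \in \mathbb{S}\}$.

```python
from itertools import product

ACTIONS = (0, 1)  # 0 = stay out, 1 = enter


def payoff(i, y, w):
    """Hypothetical payoff: w as a monopolist, w - 2 in a duopoly, 0 if out."""
    if y[i] == 0:
        return 0.0
    return w - 2.0 * y[1 - i]


def is_obedient(p, w, tol=1e-12):
    """Check the complete-information version of (3) for a distribution p
    over action profiles: no recommended action has a profitable deviation."""
    for i in (0, 1):
        for rec, dev in product(ACTIONS, ACTIONS):
            gain = 0.0
            for y in product(ACTIONS, ACTIONS):
                if y[i] != rec:
                    continue
                y_dev = tuple(dev if j == i else y[j] for j in (0, 1))
                gain += p.get(y, 0.0) * (payoff(i, y_dev, w) - payoff(i, y, w))
            if gain > tol:
                return False
    return True


def reduced_form(A, equilibria, selection):
    """rho(A | w) = sum_s sum_{y in A} p_s(y | w) e(s | w), as in (5)."""
    return sum(
        selection[s] * sum(p.get(y, 0.0) for y in A)
        for s, p in equilibria.items()
    )


w = 1.0  # monopoly is profitable, duopoly is not

# Three illustrative equilibria at this payoff state (not the full set).
equilibria = {
    "firm1_only": {(1, 0): 1.0},
    "firm2_only": {(0, 1): 1.0},
    "coin_flip": {(1, 0): 0.5, (0, 1): 0.5},  # a public coin picks the entrant
}
# Hypothetical equilibrium selection rule e(s | w).
selection = {"firm1_only": 0.5, "firm2_only": 0.3, "coin_flip": 0.2}

all_obedient = all(is_obedient(p, w) for p in equilibria.values())
both_enter_ok = is_obedient({(1, 1): 1.0}, w)  # joint entry is not incentive compatible

# Reduced-form probability that firm 1 enters: A = {(1, 0), (1, 1)}.
prob_firm1_enters = reduced_form({(1, 0), (1, 1)}, equilibria, selection)
```

The "coin_flip" entry is correlated across firms and cannot be written in the product form (4), which is the sense in which BCE admits more outcomes than BNE.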
Naturally the reduced form represents a prediction rule: for any given set $A$ of action profiles, the probability of the action profile $Y$ realizing in $A$ when the payoff state turns out to be $w$ is $\rho(A \mid w)$.

Footnote: A randomized reduced form corresponds to what Bergemann and Morris (2016) called an outcome. We use the term "reduced form" to emphasize the causal structure: $W \to Y$, where $W$ is an external variable and $Y$ is an internal variable in the sense of Heckman (2010b) (p.56). The term "reduced form" in this sense appears in econometrics at least as early as in Koopmans (1949).

Suppose that the payoff state is changed from $W$ to $f(W)$ by some transform $f : \mathbb{W} \to \mathbb{W}$, where $\mathbb{W}$ denotes the support of $W$. For example, we could change market size (as part of the payoff state $W$) in an entry-exit example, and see its effects on entry. See Jia (2008) for an example.

Let $p^f_s(y_1, \ldots, y_n \mid W, T)$, $s \in \mathbb{S}^f$, be the counterfactual equilibria after the policy. Then the prediction rule in this counterfactual game is obtained by

\[
\rho^f(A \mid w) = \sum_{s \in \mathbb{S}^f} \sum_{(y_1,\ldots,y_n) \in A} E\left[ p^f_s(y_1,\ldots,y_n \mid W, T) \mid W = w \right] e^f(s \mid w), \tag{6}
\]

where $e^f(s \mid w)$ denotes the equilibrium selection rule of the counterfactual game. We call $\rho^f(A \mid w)$ an equilibrium-based prediction. The prediction is calculated by averaging over actions prescribed in equilibria, and over all possible equilibria according to the equilibrium selection rule $e^f$. The average counterfactual prediction is given by

\[
\sum_{w \in \mathbb{W}} \rho^f(A \mid f(w)) \, P\{W = w\}. \tag{7}
\]

Alternatively, we may consider the Rosenbaum-Rubin potential outcome approach as follows. Suppose that $Y^\circ(w)$ is the potential action profile when the payoff state is fixed to be $w$. Then the observed action profile $Y$ is given by

\[
Y = \sum_{w \in \mathbb{W}} Y^\circ(w) \, 1\{W = w\}. \tag{8}
\]

If we reconcile this approach with the way $Y$ is generated from the original game, we must have

\[
P\{Y^\circ(w) \in A\} = \rho(A \mid w). \tag{9}
\]

Now, for counterfactual analysis, we take the reduced form $\rho$ as policy-invariant, and consider the average counterfactual prediction as follows:

\[
\sum_{w \in \mathbb{W}} P\{Y^\circ(f(w)) \in A\} \, P\{W = w\}
= \sum_{w \in \mathbb{W}} \rho(A \mid f(w)) \, P\{W = w\}
= \sum_{w \in \mathbb{W}} P\{Y \in A \mid W = f(w)\} \, P\{W = w\}. \tag{10}
\]

We call $\rho(A \mid f(w))$ a decomposition-based prediction. The conditional probability $P\{Y \in A \mid W = f(w)\}$ originates from the pre-policy game that we observe in the data. For example, if we observe $W$, this probability is readily recovered from data, without specifying details of the game. (Later we discuss how this approach extends to the case where $W$ is only partially observed.)

The approach of decomposition-based predictions is commonly found in labor economics, although it is also widely applied in other fields; see Fortin, Lemieux, and Firpo (2011) for a survey. The estimation of the last term in (10) is simple in practice. We can estimate the conditional probability of $Y \in A$ given $W = w$ using the data nonparametrically and then use the estimates, setting $w$ to be $f(w)$, to obtain counterfactual predictions. We use this procedure in Section 5. The implementation details are found in the online appendix.

Despite the merits mentioned in the introduction, decomposition-based predictions are hard to justify if any of the firms wants to deviate from the predictions after the policy. When do they coincide with the equilibrium-based predictions? They do when

\[
\rho(A \mid w) = \rho^f(A \mid w), \tag{11}
\]

for all $w \in f(\mathbb{W})$. In other words, the reduced forms from the original game and the counterfactual game coincide for every possible payoff state in the support of $f(W)$. In this case, one can obtain the equilibrium-based predictions simply by recovering the decomposition-based predictions from data.

When does condition (11) hold?
To answer this question, define

\[
\rho_s(A \mid w) = \sum_{(y_1,\ldots,y_n) \in A} E\left[ p_s(y_1,\ldots,y_n \mid W, T) \mid W = w \right], \quad s \in \mathbb{S}, \quad \text{and} \quad
\rho^f_{s'}(A \mid w) = \sum_{(y_1,\ldots,y_n) \in A} E\left[ p^f_{s'}(y_1,\ldots,y_n \mid W, T) \mid W = w \right], \quad s' \in \mathbb{S}^f. \tag{12}
\]

These are conditional probabilities given the payoff state, obtained by integrating out the signals from the conditional probabilities of actions given the payoff state and signals. The reduced forms $\rho$ and $\rho^f$ are weighted averages of $\rho_s$ and $\rho^f_s$ weighted by the equilibrium selection rules $e$ and $e^f$. Thus, Condition (11) does not hold unless there is a connection between the sets of equilibria from the original game and the counterfactual game and their equilibrium selection rules.

Footnote: For two recent examples, see Kleven, Landais, and Søgaard (2019) and Stanton and Thomas (2016). Kleven, Landais, and Søgaard (2019) study the role of having children on wage inequality across genders ("children penalties"): they estimate a counterfactual where the number of children is set to 0 for existing parents. They look at the effects of this change on male-female wage inequality. Meanwhile, Stanton and Thomas (2016) study the role of job intermediaries (i.e. being part of a labor agency) in driving up wages, comparing those with the same resume and job characteristics, but different affiliations to agencies.

To motivate such a connection, suppose that $f(\mathbb{W}) = \mathbb{W}$ for simplicity. Then, it is not hard to see that condition (11) is satisfied if, for each $w \in f(\mathbb{W})$, there exists a bijection $h(\cdot \mid w) : \mathbb{S} \to \mathbb{S}^f$ such that

\[
p_s(y_1,\ldots,y_n \mid w, t) = p^f_{h(s \mid w)}(y_1,\ldots,y_n \mid w, t) \tag{13}
\]

for all $t \in \mathbb{T}$, and

\[
e(s \mid w) = e^f(h(s \mid w) \mid w). \tag{14}
\]

The condition (13) says that the set of equilibria from the original game is transferable to the set of equilibria from the counterfactual game. The only difference between the two sets is that they are differently labeled.
The condition (14) requires that if a set of equilibria remains the same after the policy, the probability of selecting the set should remain the same as well. When equilibrium selection rules $e$ and $e^f$ satisfy this condition, we say that they are coherent. In this case, the policy $f$ essentially does not change anything on the set of equilibria other than relabeling them, and hence the equilibrium selection rule should not be affected either.

There is a natural way to extend the transferability and coherence conditions (13) and (14) to the case where the policy $f : \mathbb{W} \to \mathbb{W}$ now alters the support $\mathbb{W}$ so that $f(\mathbb{W}) \subsetneq \mathbb{W}$. Suppose that for each $w \in \mathbb{W}$, there exists a surjection $h(\cdot \mid w) : \mathbb{S} \to \mathbb{S}^f$ such that (13) holds for all $t \in \mathbb{T}$ and $w \in f(\mathbb{W})$. Furthermore, let us assume that the equilibrium selection rules $e$ and $e^f$ are connected as follows:

\[
e^f(s' \mid w) = \sum_{s \in h^{-1}(s' \mid w)} e(s \mid w). \tag{15}
\]

Indeed, in this case, for all $w \in \mathbb{W}$,

\[
\begin{aligned}
\rho(A \mid f(w)) &= \sum_{s \in \mathbb{S}} \rho_s(A \mid f(w)) \, e(s \mid f(w)) \\
&= \sum_{s' \in \mathbb{S}^f} \rho^f_{s'}(A \mid f(w)) \sum_{s \in h^{-1}(s' \mid f(w))} e(s \mid f(w)) \\
&= \sum_{s' \in \mathbb{S}^f} \rho^f_{s'}(A \mid f(w)) \, e^f(s' \mid f(w)) = \rho^f(A \mid f(w)).
\end{aligned} \tag{16}
\]

Hence, condition (11) follows.

[Figure 1 omitted. Left panel title: "The Original Game (Before the Policy)", with labels $p(y \mid w, t)$, $f(\mathbb{W})$, and $w$. Right panel title: "The Counterfactual Game (After the Policy)", with labels $p^f(y \mid w, t)$, $f(\mathbb{W})$, and $w$.]

FIGURE 1. Coherence of the Equilibrium Selection Rules. The right panel depicts $p^f_{s'}(y \mid w, t)$ for some $s' \in \mathbb{S}^f$ as we vary $w$. The left panel depicts multiple $p_s(y \mid w, t)$'s with $s \in h^{-1}(s' \mid w)$ such that $p^f_{s'}(y \mid w, t)$ and $p_s(y \mid w, t)$ coincide on $f(\mathbb{W})$. If $e$ and $e^f$ are coherent, it means that the probability of selecting an equilibrium $p^f_{s'}$ given $W = w$ is the same as the probability of selecting the set of equilibria $\{p_s : s \in h^{-1}(s' \mid w)\}$, each member of which coincides with $p^f_{s'}$ on $f(\mathbb{W})$ by (13).

Under Condition (13), any equilibrium $p_s(\cdot \mid w)$ with $s \in h^{-1}(s' \mid w)$ remains the same after the policy once we fix $w$ to be in $f(\mathbb{W})$. Thus the set of equilibria $\{p_s : s \in h^{-1}(s' \mid w)\}$ is transferable to $p^f_{s'}$ when the payoff state $W$ is restricted to $f(\mathbb{W})$. Then as before, the coherence condition in (15) says that the probability of $p^f_{s'}$ being selected in the counterfactual game is the same as the probability of selecting some equilibrium in $\{p_s : s \in h^{-1}(s' \mid w)\}$ in the original game. Again, this coherence condition is an invariance condition where if a set of equilibria remains the same after the policy when the payoff state $W$ is restricted to $f(\mathbb{W})$, the probability of the set being selected should remain the same. Figure 1 provides an illustration of the coherence condition.

Thus, the remaining question is: when does the transferability condition (13) hold? It holds if the introduction of policy $f$ does not change the posteriors of the firms once the posteriors are restricted to $w \in f(\mathbb{W})$. If posteriors were altered by the new policy, then the agents could wish to deviate from actions extrapolated from the original game, since those actions maximize the expected utility under the posteriors of the original game. This condition of restricted posterior invariance holds if the information structure is rich enough.
To see this, suppose that the game is a complete information game such that $T_i = W$ for each $i = 1, \ldots, n$, and this is common knowledge among the firms. Then, the posterior of $W$ given $T_i = w$ assigns the point mass of one to $w$. Therefore, as for (3),

\begin{align}
& \sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} E\left[ u_i(y_i, y_{-i}, W) \, p(y_1,\ldots,y_n \mid W, T) \mid T_i = w \right] \tag{17} \\
&= \sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} E\left[ u_i(y_i, y_{-i}, W) \, p(y_1,\ldots,y_n \mid W) \mid W = w \right] \tag{18} \\
&= \sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} u_i(y_i, y_{-i}, w) \, p(y_1,\ldots,y_n \mid W = w). \notag
\end{align}

On the other hand, in the counterfactual game,

\begin{align}
& \sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} E\left[ u_i(y_i, y_{-i}, f(W)) \, p^f(y_1,\ldots,y_n \mid f(W), T) \mid T_i = f(w) \right] \tag{19} \\
&= \sum_{(y_1,\ldots,y_n) \in \mathbb{Y}} u_i(y_i, y_{-i}, f(w)) \, p^f(y_1,\ldots,y_n \mid f(W) = f(w)). \notag
\end{align}

This shows that the decomposition-based predictions coincide with the equilibrium-based predictions if the game is a complete information game, the equilibrium selection rule $e^f$ of the counterfactual game is induced by the equilibrium selection rule $e$ in the original game, and $f(\mathbb{W})$ is contained in $\mathbb{W}$.

As we shall see later, the result here can be generalized to the case where each firm $i$ observes a publicly known payoff component $\tilde{W}$ and a private component $W_i$, and the policy affects only the public component $\tilde{W}$. For example, in the case of auctions, one counterfactual of interest is the effect of changing the observable reserve price on bids. Even though players have private information about their valuations, the decomposition-based predictions coincide with the equilibrium-based predictions if the equilibrium selection rule in the counterfactual game is induced by the one in the original game. We revisit this example in Section 2.6.

In practice, we rarely observe the entire payoff state $W$.
Suppose that W = ( X, ε ) , (20) Comparing this with (17), it is not hard to see that for each s (cid:48) ∈ S f and each w ∈ f ( W ) ⊂ W , p fs (cid:48) ( ·| w, t ) is an equilibrium if and only if p fs (cid:48) ( ·| w ) = p s ( ·| w ) for some s ∈ S . Therefore, for each w ∈ f ( W ) , there is a surjective map h ( ·| w ) : S → S f which assigns each s (cid:48) ∈ S f to s ∈ S such that p fs (cid:48) ( ·| w ) = p s ( ·| w ) . where X is observed by the researcher, and ε is not. Furthermore, assume that f ( W ) = ( ˜ f ( X ) , ε ) for some map ˜ f , i.e., the policy variable is observed by the re-searcher.If X and ε are independent, the decomposition-based prediction is equal to (cid:88) w ∈ W P { Y ∈ A | W = f ( w ) } P { W = w } = (cid:88) x ∈ X P { Y ∈ A | X = ˜ f ( x ) } P { X = x } , (21)where X denotes the set of values X can take. However, if X and ε are not in-dependent, but an instrumental variable is available, one can point-identify thedecomposition-based predictions using the control function approach (Blundell andPowell (2003), Imbens and Newey (2009)). Details are found in Section 4.1. So far, we have assumed that the policy affects the payoff state (from W into f ( W ) ) but that it does not change the information structure (i.e., the conditionaldistribution of T given W ). However, the introduction of a new policy could in-duce additional communication between the players, which would affect equilib-rium behavior through changing the players’ ability to coordinate in the counterfac-tual. This is the case, for example, with the literature on sunspots. (For a generaloverview of sunspots, see Shell (1989). See also Peck and Shell (1991) for anexample of sunspot equilibria in strategic settings.)Later in this paper, we present two classes of information structures in which thedecomposition-based prediction identiﬁes the equilibrium-based prediction evenwhen the information structure changes. 
The first case is when the policy expands the information structure without revealing new information on the other players' signals or the payoff state. The second case is when each player observes the payoff state $W$ in the original game $G$. Once the coherence condition on the equilibrium selection rule is appropriately generalized, we show that the decomposition-based predictions coincide with the equilibrium-based predictions even when the information structure is changed by the policy in this way.

Sometimes, one may be interested in the effect of a policy variable which is set outside its support in the data. For example, consider the effect of changing class sizes on student achievement (e.g., Angrist and Lavy (1999)). The policy may involve a class size beyond what is observed in the data (for instance, in Angrist and Lavy (1999) class sizes are always smaller than 40). Another example would be the effect on wage inequality of an increase in the minimum wage beyond what is observed in the data (e.g., DiNardo, Fortin, and Lemieux (1996)). This would include the effects of the much debated increase of the minimum wage in the U.S. to US\$15, a policy that even passed the House of Representatives in 2019 in the “Raise the Wage Act”, with its first implementation set for Seattle in 2021.

Suppose that the policy sends the payoff state potentially outside the support of the state in the original game. We show that decomposition-based predictions constitute upper and lower bounds for the equilibrium-based predictions, under a mild condition:

\[
\sum_{w \in \mathbb{W} : f(w) \in \mathbb{W}} \rho(A \mid f(w))\, P\{W = w\}
\le \sum_{w \in \mathbb{W}} \rho_f(A \mid f(w))\, P\{W = w\}
\le \sum_{w \in \mathbb{W} : f(w) \in \mathbb{W}} \rho(A \mid f(w))\, P\{W = w\} + P\{W \in \mathbb{W}^*\},
\]

where $\mathbb{W}^* = f^{-1}(f(\mathbb{W}) \setminus \mathbb{W})$.
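As a rough numerical illustration of the bounds just displayed, the following Python sketch evaluates the lower and upper bounds for a discrete payoff state. It assumes, purely for illustration, that the reduced form $\rho(A \mid w)$ is already known (or has been estimated) on the support of $W$. The function name `decomposition_bounds`, the toy probabilities, and the shift policy $f(w) = w + 1$ are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def decomposition_bounds(pmf_w, rho, f):
    """Return (lower, upper) bounds on the equilibrium-based prediction.

    pmf_w : dict mapping each support point w to P{W = w}
    rho   : dict mapping each support point w to rho(A | w)
    f     : policy map applied to the payoff state (may leave the support)
    """
    support = set(pmf_w)
    # Lower bound: sum only over states that the policy keeps inside the support.
    lower = sum(rho[f(w)] * p for w, p in pmf_w.items() if f(w) in support)
    # Extrapolation error: probability mass of states sent outside the support,
    # i.e. P{W in W*} with W* = {w : f(w) not in support}.
    extrapolation_error = sum(p for w, p in pmf_w.items() if f(w) not in support)
    return lower, lower + extrapolation_error


# Toy example (all numbers are made up for illustration).
pmf_w = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
rho = {1: Fraction(1, 5), 2: Fraction(2, 5), 3: Fraction(3, 5)}


def shift_by_one(w):
    return w + 1


lower, upper = decomposition_bounds(pmf_w, rho, shift_by_one)
print(lower, upper)
```

In this toy example only the state $w = 3$ is pushed outside the support, so the width of the interval equals $P\{W = 3\} = 1/4$.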
The last probability $P\{W \in \mathbb{W}^*\}$ can be viewed as a maximal extrapolation error which one faces without relying on any parametrization or functional form restrictions in the game, such as restrictions on the payoff functions and the distribution of unobserved heterogeneity. The extrapolation error $P\{W \in \mathbb{W}^*\}$ tends to increase when the payoff state $f(W)$ after the policy is farther away from the support of $W$.

Footnote: To illustrate the usefulness of the bound approach, consider the minimum wage example mentioned above. While a minimum wage of \$15 is beyond the support of the data, there are multiple states with minimum wages between \$12 and \$14 (e.g., California, Minnesota, Washington). As a result, a counterfactual policy of \$15 is close to the support of the data. The extrapolation error bound will be very small, yielding a very tight interval prediction. By comparison, an exercise setting the minimum wage to \$25 would generate bounds that are much wider and less informative.

2.6.1. Entry-Exit Games.

Consider a two-player entry game, commonly studied in the literature since Bresnahan and Reiss (1991). There are two firms, $i = 1, 2$, who choose a binary action $y_i \in \{0, 1\}$, where $y_i = 1$ represents entry in the market, and $y_i = 0$ staying out of the market. The payoff for firm $i$ is given by

\[
u_i(y, w_i) = y_i \left( \Delta y_{-i} + X_i'\beta + \varepsilon_i \right),
\]

where $\Delta < 0$ is a parameter and $\varepsilon_i$ is an i.i.d. shock. Hence the payoff state is given by $W = (W_1, W_2)$, where $W_i = (X_i'\beta, \varepsilon_i)$. As in Ciliberto and Tamer (2009) and Jia (2008), one could endow the game with complete information, so that $T_i = (W_1, W_2)$, $i = 1, 2$. Or, as in Grieco (2014) and Magnolfi and Roncoroni (2019), one could consider a public-private dichotomy of information so that $T_i = (X_1'\beta, X_2'\beta, \varepsilon_i)$, $i = 1, 2$.

A typical counterfactual analysis in this setting considers a change in covariates $X_i$ and studies its effects on firm entry. For example, Ciliberto and Tamer (2009) considered a change in the binary indicator of the Wright Amendment, Jia (2008) a change in the market size, Grieco (2014) a change in the presence of a supercenter, and Magnolfi and Roncoroni (2019) a change in large malls. In these cases, decomposition-based predictions are valid as long as the conditions on the equilibrium selection rules and the support condition for $f(\mathbb{W})$ are satisfied.

2.6.2. Bounds on Labor Elasticities.

Chetty (2012) provides a partial identification approach to recovering the elasticity of labor supply in a model with optimization frictions. This elasticity is useful to study counterfactuals including the effects of changing tax policy on labor supply.

In his set-up, observed labor supply decisions, $x_{i,t}$, can differ from the optimal choices, $x^*_{i,t}$, due to “optimization frictions”. Denote the labor supply elasticity by $\lambda$, the optimal labor supply by $x^*_{i,t}$ and prices (e.g., taxes or wages) by $p_t$. This is a single-agent model where worker $i = 1, \dots, n$ chooses $x^*_{i,t} \in \mathbb{R}_+$ at every period $t = 1, 2, \dots$. The state variable is $W_{i,t} = p_t$, and since each agent has complete information, the signal of each person is the vector of state variables. From the workers' optimization with a quasi-linear utility, it follows that

\[
\log(x_{i,t}) = \alpha - \lambda \log(p_t) + \phi_{i,t} + \nu_{i,t},
\]

where $\nu_{i,t}$ is the deviation of the optimal choices for $i$ from their mean and $\phi_{i,t} = \log(x_{i,t}) - \log(x^*_{i,t})$ are the optimization frictions. Since $\phi_{i,t}$ is unobserved and a function of prices, one cannot generally identify $\lambda$ without further information. Proposition 1 in Chetty (2012) characterizes bounds for this elasticity instead. Despite its advantages (e.g., the identified set can rationalize different estimates of labor elasticity in macro and microeconomic studies), the bounds approach makes counterfactual predictions more difficult. In fact, Chetty (2012) says that the bounds approach “does not permit as rich an analysis of short-run counterfactuals because it only partially identifies the model's parameters” (p. 973). Our results imply that many of his counterfactuals of interest can be estimated using the decomposition approach.

2.6.3. Competition in Prices.

Another example involves price competition among firms producing heterogeneous goods. These games include the supply side of Berry, Levinsohn, and Pakes (1995) and part of the pricing game in bargaining models with competition in prices (see Gowrisankaran, Nevo, and Town (2015) for an example in hospital pricing). In these models, parameters are not identified due to multiple equilibria if one uses only pricing data without making further assumptions. In these examples, pricing equilibria can be written as solutions to equations of the form: for each firm $i = 1, \dots, n$,

\[
p^*_i = mc_i + b_i(p^*, mc, X, \zeta, \theta),
\]

where $p^* = (p^*_i)_{i=1}^n$ is the vector of equilibrium prices, $mc = (mc_i)_{i=1}^n$ is the vector of marginal costs, $X = (X_i)_{i=1}^n$ is a vector of observable characteristics, $\theta$ is a vector of parameters, $\zeta = (\zeta_i)_{i=1}^n$ is a random vector unobserved by the econometrician, and $b_i(\cdot)$ is some known (usually nonlinear) function. In this case, the action is $p_i \in \mathbb{R}_+$, the game is of complete information where the payoff state is the vector $W_i = (X_i, mc_i, \zeta_i)$ and signals are $T_i = (W_j)_{j=1}^n$. If one were interested in the effect of changes to $X_i$ (e.g., market size) on prices, one could use the decomposition approach.

2.6.4. Changing the Reserve Price in Auctions.

The empirical literature on auctions is often interested in the effect of reserve prices on bids and revenue (see Paarsch, 1997; Haile and Tamer, 2003 for two examples). Since the work of Guerre, Perrigne, and Vuong (2000) in particular, a common approach for counterfactuals in first-price auctions has been to first nonparametrically identify and estimate the distribution of valuations from the distribution of bids using equilibrium restrictions, and then use those estimates for counterfactual analysis. However, since the reserve price is often a publicly observed state variable in practice, one could use the decomposition approach to generate counterfactual predictions, if the reserve price does not affect private valuations (and as long as the counterfactual reserve price is within the support of the data). This observation extends to other auction formats, such as English auctions (oral ascending bids) studied by Haile and Tamer (2003). They relied on a partial identification approach to study the effect of alternative reserve prices on revenue and the probability of there being no bids. Instead, one may use the decomposition-based predictions to estimate the effect without computing bounds on the distribution of valuations.

Footnote: Reserve prices are set by the seller before the auction takes place. They are the minimum value for which the seller is willing to sell the good: if no bid is higher than the reserve price during the auction, then the good remains unsold. As a result, in empirical work, they are usually considered as an auction characteristic (primitive). The reserve price is often observed by the bidders and by the econometrician. If the researchers were interested in auctions with unobserved reserve prices (as in Elyakime, Laffont, Loisel, and Vuong (1997)) or if they wanted to implement a reserve price beyond its support in the data (which is suggested in Table 4 of Haile and Tamer (2003)), the decomposition-based prediction would not be incentive compatible, as we describe in the next section.

There are many other situations of counterfactual analysis where decomposition-based predictions are not incentive compatible.

The approach of decomposition-based predictions does not apply when the policy alters the structure of the game such as payoff functions or the set of players. One example is Hortaçsu and McAdams (2010), who study the effect of different auction formats for selling treasuries on bidder expected surplus. Their counterfactual analysis involves the change of the auction format observed in the data (discriminatory auction) into a different one such as a uniform pricing auction or a Vickrey auction. Another example is Roberts and Sweeting (2013), who study the effect of changing the mechanism by which a good is sold (from an auction set-up to a sequential bidding design) on expected revenues and payoffs in an incomplete information game. A change in the mechanism alters (unobserved) payoffs and expected revenues.

Decomposition-based predictions also do not apply in general when the counterfactual analysis involves a change of the action space or the set of players (e.g., mergers). (See Kalouptsidi, Scott, and Souza-Rodrigues (2020) for identification results when the choice sets of agents are changed by the policy.)

Decomposition-based predictions do not apply if the policy affects the payoff component of the agents that is private information. In particular, they do not apply to tax changes on a firm's unobservable costs or profits in strategic settings, or to the effect of a policy that affects an individual's (unobservable) income in a strategic choice of labor supply. This includes the large literature on the effects of corporate tax changes on profits. When firms are competing with each other in certain markets, corporate tax changes affect their (unobserved) cost structure and then their profits (for recent studies, see Dowd, Landefeld, and Moore (2017); Bilicka (2019)).

3. Counterfactual Analysis Using Game-Theoretic Models

We introduce a generic finite-player Bayesian game. Each player $i$ has a set of actions denoted by $\mathbb{Y}_i \subset \mathbb{R}^{d_Y}$, $d_Y \ge 1$. Define $\mathbb{Y} = \mathbb{Y}_1 \times \dots \times \mathbb{Y}_n$. Each player $i$ is endowed with a payoff function $u_i : \mathbb{Y} \times \mathbb{W} \to \mathbb{R}$. Let $u = (u_1, \dots, u_n)$. Let $W$ be the payoff state taking values in $\mathbb{W} \subset \mathbb{R}^{d_W}$ for some $d_W \ge 1$, and be drawn from a distribution $\mu_W$. The payoff state $W$ can include unobserved payoff components. We follow Bergemann and Morris (2016) (BM, hereafter) and call $B = (\mathbb{Y}, \mathbb{W}, u, \mu_W)$ the basic game. Without loss of generality, for any basic game $(\mathbb{Y}, \mathbb{W}, u, \mu_W)$, we assume that $\mathbb{W}$ is equal to the support of the distribution $\mu_W$.

Each player $i$ observes a signal vector $T_i$ taking values in the space $\mathbb{T}_i \subset \mathbb{R}^{d_T}$, $d_T \ge 1$. Define the signal vector $T = (T_1, \dots, T_n) \in \mathbb{T}$, where $\mathbb{T} = \mathbb{T}_1 \times \dots \times \mathbb{T}_n$. These signals are generated according to a conditional distribution $\mu_{T|W}$ of $T$ given $W$. The informativeness of the signals on the payoff state $W$ is summarized by the (conditional) distribution $\mu_{T|W}$. As in BM, we call $I = (\mathbb{W}, \mathbb{T}, \mu_{T|W})$ the information structure. A Bayesian game $G$ consists of the basic game and the information structure, i.e., $G = (B, I)$.

Let $Y_i$ be a random vector taking values in $\mathbb{Y}_i$, and set $Y = (Y_1, \dots, Y_n)$. Let $\sigma(\cdot \mid w, t)$ be the conditional distribution of $Y$ given $(W, T) = (w, t)$. Let $\Sigma(\mathbb{W}, \mathbb{T})$ be a collection of such conditional distributions. For each $i = 1, \dots, n$, we let $\sigma_i(\cdot \mid w, t)$ be the marginal conditional distribution of $Y_i$ given $(W, T) = (w, t)$. Again, following BM, we call each $\sigma \in \Sigma(\mathbb{W}, \mathbb{T})$ a decision rule.

Footnote: BM considered the case where the state space and the signal space are finite sets for simplicity. In this paper, we consider more general spaces for the state and signal spaces, because the econometrician's models often involve both discrete and continuous variables.

Footnote: Throughout the section, we suppress measure-theoretic qualifiers, such as Borel sets, measurable functions, or a statement holding almost everywhere. Details of the mathematical set-up are found in the online appendix.

For each $i = 1, \dots, n$, $t_i \in \mathbb{T}_i$ and $\sigma \in \Sigma(\mathbb{W}, \mathbb{T})$ and any transform $\tau_i : \mathbb{Y}_i \to \mathbb{Y}_i$, we write the expected payoff of player $i$ as

\[
U_i(\tau_i, t_i; \sigma) = \int \int u_i(\tau_i(y_i), y_{-i}, w)\, d\sigma(y_i, y_{-i} \mid w, t_i, t_{-i})\, d\mu_{W, T_{-i} | T_i}(w, t_{-i} \mid t_i),
\]

where $\mu_{W, T_{-i} | T_i}(\cdot, \cdot \mid t_i)$ denotes the conditional distribution of $(W, T_{-i})$ given $T_i = t_i$ under the joint distribution $\mu_{W,T}$ of $(W, T)$ obtained from $(\mu_{T|W}, \mu_W)$. The quantity $U_i(\tau_i, t_i; \sigma)$ denotes the conditional expected payoff of player $i$ given her signal $T_i = t_i$ when player $i$ deviates from the action $y_i$ recommended according to the decision rule $\sigma$ and chooses $\tau_i(y_i)$ instead. We say that a decision rule $\sigma \in \Sigma(\mathbb{W}, \mathbb{T})$ is a Bayes Correlated Equilibrium (BCE) for $G$ if for each $i = 1, \dots, n$, and each $\tau_i : \mathbb{Y}_i \to \mathbb{Y}_i$ and $t_i \in \mathbb{T}_i$,

\[
U_i(\mathrm{Id}, t_i; \sigma) \ge U_i(\tau_i, t_i; \sigma), \tag{22}
\]

where $\mathrm{Id}$ is the identity map on $\mathbb{Y}_i$. Denote by $\Sigma_{BCE}(G)$ the set of BCE's of game $G$.

The notion of BCE here is an adaptation of BCE in BM to our set-up with general action and signal spaces. A decision rule $\sigma$ is a rule that a mediator uses to recommend an action to each individual depending on their signals and payoff states. This rule is a BCE if no individual player has an incentive to deviate from the recommendation.

3.2.1. Predictions from a Bayesian Game.

To generate predictions from a game $G$, the econometrician only needs to know the distribution of observable action profiles $Y$ conditional on the payoff state $W$, when such action profiles are “induced” by an equilibrium of the game $G$. Let $\mathcal{R}$ be the collection of conditional distributions $\rho(\cdot \mid w)$ of $Y \in \mathbb{Y}$ given $W = w$. We call each member $\rho$ of $\mathcal{R}$ a (randomized) reduced form. Given $\sigma \in \Sigma(\mathbb{W}, \mathbb{T})$ and the information structure $I = (\mathbb{W}, \mathbb{T}, \mu_{T|W})$, we say a reduced form $\rho_\sigma \in \mathcal{R}$ is induced by $\sigma$ if for all $A \subset \mathbb{Y}$,

\[
\rho_\sigma(A \mid w) = \int \sigma(A \mid w, t)\, d\mu_{T|W}(t \mid w). \tag{23}
\]

Given $G = (B, I)$, we define $\mathcal{R}_{BCE}(G)$ to be the collection of all reduced forms that are induced by a BCE of game $G$.

The econometrician observes the action profile $Y$ from the game $G$. In order to complete the description of how $Y$ is generated, let us introduce a generic form of the equilibrium selection rule. Given a game $G$ and $w \in \mathbb{W}$, an equilibrium selection rule (denoted by $e_G(\cdot \mid w)$) of game $G$ is a conditional distribution over $\Sigma_{BCE}(G)$ given $W = w$. We define a (randomized) reduced form of game $G$ as

\begin{align*}
\rho_G(A \mid w) &= \int_{\Sigma_{BCE}(G)} \int \sigma(A \mid w, t)\, d\mu_{T|W}(t \mid w)\, de_G(\sigma \mid w) \\
&= \int_{\Sigma_{BCE}(G)} \rho_\sigma(A \mid w)\, de_G(\sigma \mid w), \quad w \in \mathbb{W}.
\end{align*}

Thus the generation of $Y$ is described as follows. First, nature draws the value of the payoff state $W = w$ from the distribution $\mu_W$. Second, nature draws an equilibrium $\sigma \in \Sigma_{BCE}(G)$ from the distribution $e_G(\cdot \mid w)$. Third, the observed action profile $Y$ is drawn from the distribution $\rho_\sigma(\cdot \mid w)$ through the play of the game. This is the causal structure of the game-theoretic model for observed actions $Y$. Hence, the generation of $Y$ is completely described by the couple $(\rho_G, \mu_W)$ and can be represented by first drawing $W = w$ from $\mu_W$, then drawing $Y$ from the distribution $\rho_G(\cdot \mid w)$.

The reduced form $\rho_G$ gives the prediction rule of the game. It shows how we are going to predict actions $Y$ when the exogenous payoff state $W$ is realized as $w$. The probability of $Y$ taking a value in a set $A$ when the payoff state $W$ is $w$ is $\rho_G(A \mid w)$.

3.2.2. Counterfactual Predictions.

Our counterfactual experiment involves a change of the payoff state $W$ into $f(W)$ for some map $f : \mathbb{W} \to \mathbb{W}$. Thus each map $f$ transforms the original game $G$ by transforming the marginal distribution $\mu_W$ to $\mu_W \circ f^{-1}$. This transformation leads to the change of the basic game $B$ into

\[
f(B) = (\mathbb{Y}, f(\mathbb{W}), u, \mu_W \circ f^{-1}).
\]

Here $\mu_W \circ f^{-1}$ is the distribution of the transformed payoff state $f(W)$ when $W$ follows distribution $\mu_W$. Let us denote the transformed Bayesian game by $f(G) = (f(B), I)$. Counterfactual predictions refer to predictions from the counterfactual game $f(G)$ using data generated from the original game $G$. As with $G$, a counterfactual prediction at $f(W) = w'$ in game $f(G)$ can be made from the reduced forms $\rho_{\sigma_f}$ induced by $\sigma_f \in \Sigma_{BCE}(f(G))$, i.e.,

\[
\rho_{\sigma_f}(A \mid w') = \int \sigma_f(A \mid w', t)\, d\mu_{T|W}(t \mid w'). \tag{24}
\]

Let $e_{f(G)}(\cdot \mid w)$ be the equilibrium selection rule as a conditional distribution on $\Sigma_{BCE}(f(G))$ given $W = w$. Then the equilibrium-based prediction of the counterfactual game $f(G)$ is given by

\[
\rho_{f(G)}(A \mid f(w)) = \int_{\Sigma_{BCE}(f(G))} \rho_\sigma(A \mid f(w))\, de_{f(G)}(\sigma \mid f(w)), \quad A \subset \mathbb{Y}.
\]

The pair $(\rho_{f(G)}, \mu_W \circ f^{-1})$ completely describes the generation of the counterfactual actions, say $Y^f$, in a counterfactual game $f(G)$: first, nature draws $W = w'$ from distribution $\mu_W \circ f^{-1}$ and then $Y^f$ is drawn from the conditional distribution $\rho_{f(G)}(\cdot \mid w')$. Thus, the prediction from the counterfactual game is given by $\rho_{f(G)}$, where the predicted probability of observing $Y^f$ in $A$ when the payoff state is $W = f(w)$ is given by $\rho_{f(G)}(A \mid f(w))$. This is similar to the case of $(\rho_G, \mu_W)$ in the original game.

Footnote: The integral with respect to the equilibrium selection rule is an integral of a real function over the space of conditional distributions, which we can topologize appropriately. Details can be found in the online appendix.

3.2.3. Decomposition-Based Predictions.

An alternative way of generating predictions is to use a potential outcome approach. Recall that $\rho_G$ describes the causal relation between the outcome $Y$ and the exogenous payoff state $W$ in the original game. We take this causal structure in game $G$ as policy-invariant and use it for causal inference as we change the payoff state $W$ into $f(W)$. Let $Y^\circ(w) \in \mathbb{Y}$ be the potential outcome of the game $G$ when the payoff state $W$ is fixed to be $w$.

Footnote: Note that we assume that the policy does not alter the way the signals are generated. (This does not mean the distribution of the signals $T$ remains the same after the policy.) In other words, the information structure does not become $(f(\mathbb{W}), \mathbb{T}, \mu_{T|f(W)})$ after the policy, where $\mu_{T|f(W)}$ denotes the conditional distribution of $T$ given $f(W)$ under the joint distribution $\mu_{W,T}$ of $(W, T)$ in the original game $G$. For example, if $f(W) = a$ for a constant $a$, then the conditional distribution $\mu_{T|f(W)}$ becomes a marginal distribution of $T$, as $T$ and $f(W)$ are independent, regardless of the value of $a$. However, in our experiment, signals are generated from the distribution $\mu_{T|W}(\cdot \mid a)$, and hence the distribution of the signal $T$ depends on the constant $a$ fixed in the counterfactual experiment. See the online appendix for its implication on the computation of the posteriors in the counterfactual game.

Using the policy invariance of the causal relation $\rho_G$ in game $G$, we write

\[
P\{Y^\circ(w) \in A\} = \rho_G(A \mid w), \tag{25}
\]

for any set $A \subset \mathbb{Y}$. Then, the potential outcome when the payoff state $w$ is changed to $f(w)$ is given by $Y^\circ(f(w))$ with distribution

\[
P\{Y^\circ(f(w)) \in A\} = \rho_G(A \mid f(w)). \tag{26}
\]

We call $\rho_G(\cdot \mid f(\cdot))$ the decomposition-based prediction from game $G$ along $f$. This prediction can be used to define causal effects. For example, the average treatment effect of a change of $W$ into $f(W)$ can be taken to be

\[
\tau_{ATE} = \int E\left[ Y^\circ(f(w)) - Y^\circ(w) \right] d\mu_W(w).
\]
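To make the decomposition-based prediction concrete, here is a minimal Python sketch of a plug-in estimate of $\int \rho_G(A \mid f(w))\, d\mu_W(w)$. It assumes, for illustration only, that $W$ is discrete, fully observed and exogenous, so that $\rho_G(A \mid w)$ can be estimated by the sample frequency of $\{Y \in A\}$ among observations with $W = w$, and that $f$ maps the observed support into itself. The function name `decomposition_prediction` and the toy data are hypothetical.

```python
from collections import defaultdict


def decomposition_prediction(w_obs, y_obs, f, in_A):
    """Plug-in estimate of sum_w P{Y in A | W = f(w)} * P{W = w}.

    w_obs : observed payoff states (hashable values)
    y_obs : observed outcomes, aligned with w_obs
    f     : policy map on the payoff state
    in_A  : function returning True when an outcome lies in the set A
    """
    n = len(w_obs)
    count_w = defaultdict(int)    # number of observations with W = w
    count_w_A = defaultdict(int)  # number with W = w and Y in A
    for w, y in zip(w_obs, y_obs):
        count_w[w] += 1
        if in_A(y):
            count_w_A[w] += 1

    total = 0.0
    for w, c in count_w.items():
        target = f(w)
        if target not in count_w:
            # Outside the observed support the conditional frequency is undefined.
            raise ValueError(f"f({w!r}) = {target!r} is outside the observed support")
        rho_hat = count_w_A[target] / count_w[target]  # estimate of rho_G(A | f(w))
        total += rho_hat * (c / n)                      # weight by estimate of P{W = w}
    return total


# Toy data (made up): W is a binary market characteristic, Y a binary entry decision.
w_obs = [0, 0, 0, 0, 1, 1, 1, 1]
y_obs = [1, 0, 0, 0, 1, 1, 1, 0]


def entered(y):
    return y == 1


pred_policy = decomposition_prediction(w_obs, y_obs, lambda w: 1, entered)
pred_baseline = decomposition_prediction(w_obs, y_obs, lambda w: w, entered)
print(pred_policy, pred_baseline, pred_policy - pred_baseline)
```

With a binary outcome and $A = \{1\}$, the difference between the prediction under $f$ and the prediction under the identity map is a plug-in analogue of $\tau_{ATE}$.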
Here, the exogeneity of $W$ can be expressed as independence of $\{Y^\circ(w) : w \in \mathbb{W}\}$ and $W$. Then, we can rewrite the average counterfactual prediction as

\begin{align*}
\int \rho_G(A \mid f(w))\, d\mu_W(w) &= \int P\{Y^\circ(f(w)) \in A\}\, d\mu_W(w) \tag{27} \\
&= \int P\{Y \in A \mid W = f(w)\}\, d\mu_W(w).
\end{align*}

To identify the latter quantity, one does not need to identify the details of the game such as the payoff functions or the equilibrium selection rule. All we need is to recover the relation between $Y$ and $W$ from data.

3.2.4. Validity of Decomposition-Based Predictions.

Decomposition-based predictions take the reduced form $\rho_G$ from the original game as policy-invariant. However, if the policy creates incentives for agents to deviate from the BCEs of the original game $G$, the decomposition-based predictions cannot be justified. We say that decomposition-based predictions are valid if they coincide with the equilibrium-based predictions, i.e., if $\rho_G$ agrees with $\rho_{f(G)}$ on $f(\mathbb{W})$, in the sense that for each $A \subset \mathbb{Y}$, $\rho_G(A \mid w) = \rho_{f(G)}(A \mid w)$ for all $w \in f(\mathbb{W})$. In other words, if the decomposition-based predictions are valid, one can simply obtain the equilibrium-based predictions by recovering the decomposition-based predictions from data. As we see from the example below, however, decomposition-based predictions are not generally valid.

Footnote: Various other causal parameters such as quantile treatment effects can be similarly defined using decomposition-based predictions.

Example 3.1 (A Finite Player Entry Game). We consider an entry game of two firms, based on the example in Section 2.2 in Magnolfi and Roncoroni (2019). This game has two firms, $i = 1, 2$, who choose a binary action $y_i \in \{0, 1\}$, where $y_i = 1$ represents entry in the market, and $y_i = 0$ staying out of the market. Payoffs for firm $i$ are given by

\[
u_i(y, W_i) = y_i(\delta y_{-i} + W_i),
\]

where $\delta < 0$ is a parameter and $W_i$ is an i.i.d. random variable following a standard normal distribution (i.e., $\mu_W$ is the product of two standard normal distributions). For simplicity, we consider a model without covariates. Consider the case where $W_i$ is private information and $T_i = W_i$ for $i = 1, 2$. The counterfactual experiment is to change the distribution of the state $W_i$ to $f(W_i) = \alpha + W_i$ for $\alpha > 0$.

Consider the following Bayes Correlated Equilibrium characterized by a mediator who suggests to firm $i$ the strategy of entering the market if $W_i > \bar{w}$, where the threshold $\bar{w}$ is such that firm $i$ is indifferent between entering and not entering the market. (See Magnolfi and Roncoroni (2019).) For simplicity, we will consider an equilibrium selection rule that always picks this BCE in the original game (or its analogue in the counterfactual).

The decomposition-based prediction of $(Y_1, Y_2) = (1, 1)$ along $f$ is

\[
\rho_G(\{(1, 1)\} \mid f(w_1), f(w_2)) = 1\{f(w_1) > \bar{w},\, f(w_2) > \bar{w}\} = 1\{w_1 + \alpha > \bar{w},\, w_2 + \alpha > \bar{w}\}. \tag{28}
\]

However, as we show in the online appendix, there exists a profitable deviation for firm $i$ from the mediator's recommendation. The profitable deviation entails entering the market in the counterfactual only at a strictly larger threshold than $\bar{w}$. On the other hand, if the information structure is complete information such that $T_i = (W_1, W_2)$ for $i = 1, 2$, the decomposition-based prediction can be shown to coincide with the equilibrium-based prediction. ∎

The example above suggests that a decomposition-based prediction does not necessarily coincide with an equilibrium-based prediction in the counterfactual game unless the information structure of the game is “rich enough”. In the next section, we explore the relation between the information structure and the validity of decomposition-based predictions.

The validity of decomposition-based predictions depends on whether the relationships between endogenous and exogenous variables in the original game are “transferable” to the counterfactual game. We first explore transferability between games in terms of their equilibria.

3.3.1. Policy-Invariance of Equilibria.

The policy-invariance of equilibria means that the set of BCE's remains invariant after the policy $f$ once the payoff state $W$ is restricted to $f(\mathbb{W})$. Given any decision rule $\sigma \in \Sigma(\mathbb{W}, \mathbb{T})$, we let $\sigma|_{f(\mathbb{W})}$ be the restriction of $\sigma$ to $f(\mathbb{W}) \times \mathbb{T}$, i.e., for any $A \subset \mathbb{Y}$, $\sigma|_{f(\mathbb{W})}(A \mid \cdot)$ is a map on $f(\mathbb{W}) \times \mathbb{T}$ such that $\sigma|_{f(\mathbb{W})}(A \mid w, t) = \sigma(A \mid w, t)$ for $(w, t) \in f(\mathbb{W}) \times \mathbb{T}$. Define

\[
\Sigma_{BCE}(G)|_{f(\mathbb{W})} = \left\{ \sigma|_{f(\mathbb{W})} : \sigma \in \Sigma_{BCE}(G) \right\}, \tag{29}
\]

so that $\Sigma_{BCE}(G)|_{f(\mathbb{W})}$ is the set of BCE's with the payoff state $W$ restricted to $f(\mathbb{W})$. We say that $G$ is strongly transferable to $f(G)$ if

\[
\Sigma_{BCE}(G)|_{f(\mathbb{W})} = \Sigma_{BCE}(f(G)).
\]

If $G$ is strongly transferable to game $f(G)$, the set of equilibria remains invariant when the payoff state $W$ is restricted to $f(\mathbb{W})$.

The following theorem gives a sufficient condition for the information structure that ensures the strong transferability of $G$ to $f(G)$.

Theorem 3.1. Suppose that $W = (W_1, \dots, W_n)$ and $T = (T_1, \dots, T_n)$ are the payoff state and the signal profile respectively, jointly distributed as $\mu_{W,T}$ in game $G$. Furthermore, suppose that the following conditions hold.
(i) For $i = 1, \dots, n$, $T_i = W_i$ and $W_i = (\tilde{W}, W_{2,i})$, for a common random vector $\tilde{W}$.
(ii) $f(W) = ((\tilde{f}_1(\tilde{W}), W_{2,1}), \dots, (\tilde{f}_n(\tilde{W}), W_{2,n}))$, for some maps $\tilde{f}_i : \tilde{\mathbb{W}} \to \tilde{\mathbb{W}}$, $i = 1, \dots, n$, where $\tilde{\mathbb{W}}$ is the support of $\tilde{W}$.
Then, $G$ is strongly transferable to $f(G)$.

By the theorem, strong transferability holds if (i) each player has a common payoff component ($\tilde{W}$) which is observed by every player in the game, and (ii) the policy is restricted to this commonly observed payoff component. Note that this policy applied to the payoff component $\tilde{W}$ can be heterogeneous across the players. Furthermore, Theorem 3.1 is silent about the dependence structure among the individual components of the payoff state $W$. Hence any additional assumptions on the generation of $\tilde{W}$ can be introduced without conflicting with the conditions of the theorem.

A special case of the set-up in this theorem is considered, for example, by Grieco (2014), who studied entry decisions among firms, where the publicly observed payoff component $\tilde{W}$ is a market characteristic that is commonly observed by all the firms in the market. Another special case of the information structure in the theorem is a complete information game where each player observes the whole state vector, so that $T_i = (\tilde{W}_1, \dots, \tilde{W}_n)$ for all $i = 1, \dots, n$, $\tilde{W} = (\tilde{W}_1, \dots, \tilde{W}_n)$ and $W_i = \tilde{W}$ for all $i = 1, \dots, n$.

3.3.2. Coherence of Equilibrium-Selection Rules Between Games.

In order to propose the decomposition-based prediction as the prediction from the counterfactual game, we need to consider how the equilibrium is selected after the policy. (Recall the data generating process described in Section 3.2.1.) To produce a natural connection between $e_G$ and $e_{f(G)}$, suppose for simplicity that $f(\mathbb{W}) = \mathbb{W}$. The connection we propose is a weak invariance condition which requires that the probability of selecting a set of equilibrium decision rules should remain the same after the policy if the decision rules continue to be equilibria after the policy. We can naturally extend this notion to the case where $f(\mathbb{W}) \subsetneq \mathbb{W}$. Given $G$ and $f(G)$, we say that $e_G$ and $e_{f(G)}$ are strongly coherent if, for any $A \subset \Sigma_{BCE}(G)|_{f(\mathbb{W})} \cap \Sigma_{BCE}(f(G))$, we have

\[
e_{f(G)}(A \mid w) = e_G\left( \left\{ \sigma \in \Sigma_{BCE}(G) : \sigma|_{f(\mathbb{W})} \in A \right\} \mid w \right). \tag{30}
\]

If $e_G$ and $e_{f(G)}$ are strongly coherent, this means that if the policy does not alter a set of equilibrium decision rules, it should not alter the probability of the set being selected.

Theorem 3.2. Suppose that (i) $G$ is strongly transferable to $f(G)$, and (ii) $e_G$ and $e_{f(G)}$ are strongly coherent. Then the decomposition-based predictions coincide with equilibrium-based predictions.

3.3.3. Extensions to Other Solution Concepts.

One might wonder whether theresults in the previous sections extend to other solution concepts. Many other solu-tion concepts including Bayes Nash Equilibria (BNE) are restricted versions of theBCE. Thus, we let Σ (cid:48) ( W , T ) ⊂ Σ( W , T ) be a given subcollection of decision rules σ , and consider the restricted BCE: Σ (cid:48) BCE ( G ) = Σ BCE ( G ) ∩ Σ (cid:48) ( W , T ) . (31)We call this set the set of Bayes Correlated Equilibria (BCE) restricted to Σ (cid:48) ( W , T ) .For example, suppose that Σ (cid:48) ( W , T ) is the collection of decision rules σ of thefollowing form: for any A = A × ... × A n , σ ( A | w, t ) = n (cid:89) i =1 β i ( A i | w, t i ) , (32)where β i ( ·| w, t i ) is a conditional distribution on Y i given ( W, T i ) = ( w, t i ) . Thenthe BCE restricted to Σ (cid:48) ( W , T ) is a BNE. Hence one can view the set of BNE’s as areﬁnement of the set of BCE’s, where a BNE does not permit correlation betweenactions once conditioned on the signal proﬁle and the payoff state, whereas a BCEdoes. Another example is the Bayesian solution of Forges (1993), which assumesthat the proﬁle σ ( ·| w, t ) is independent of the state w .Let us turn to the validity of decomposition-based predictions in terms of a so-lution concept stronger than the BCE. First, deﬁne the equilibrium selection rule e (cid:48) G ( ·| w ) as a conditional distribution on Σ (cid:48) BCE ( G ) given W = w . Similarly as in (30),we say that the equilibrium selection rules e (cid:48) G and e (cid:48) f ( G ) are strongly coherent if forany A ⊂ Σ (cid:48) BCE ( G ) | f ( W ) ∩ Σ (cid:48) BCE ( f ( G )) , e (cid:48) f ( G ) ( A | w ) = e (cid:48) G (cid:0)(cid:8) σ ∈ Σ (cid:48) BCE ( G ) : σ | f ( W ) ∈ A (cid:9) | w (cid:1) . (33)Then, we obtain an analogue of Theorem 3.2 as follows. Theorem 3.3.

Suppose that $G$ is strongly transferable to $f(G)$, and $e'_G$ and $e'_{f(G)}$ are strongly coherent for some map $f : \mathbb{W} \to \mathbb{W}$. Then the decomposition-based predictions coincide with the equilibrium-based predictions in a counterfactual game $f(G)$ when the decision rules are restricted to $\Sigma'(\mathbb{W},\mathbb{T})$.

Therefore the validity of decomposition-based predictions in terms of a more restrictive solution concept follows from that in terms of BCE, once the coherence condition on the equilibrium selection rules is modified properly.

The game in Theorem 3.1 requires that players should not know more than the payoff state $W$ after communicating with each other. This assumption can be unrealistic: through communication or through observing common signals, players may obtain additional information that does not directly enter the payoff function but may still affect their equilibrium actions by creating or altering coordination between the players.

Here we consider a situation where the policy potentially alters the information structure of the game. In this situation, the notion of strong transferability is not adequate, as it requires that the information structure remain the same after the policy. We introduce a weaker notion of transferability between games in terms of reduced forms below.

3.4.1. Policy-Invariance of Reduced Forms.

First, define the set of reduced forms from a game with the payoff state restricted to $f(\mathbb{W}) \subset \mathbb{W}$, i.e.,

$$R_{\mathrm{BCE}}(G)|_{f(\mathbb{W})} = \left\{\rho|_{f(\mathbb{W})} : \rho \in R_{\mathrm{BCE}}(G)\right\},$$

where $\rho|_{f(\mathbb{W})}$ is $\rho$ restricted to $f(\mathbb{W})$, i.e., for each $A \subset \mathbb{Y}$, $\rho|_{f(\mathbb{W})}(A \mid \cdot)$ is a map on $f(\mathbb{W})$ such that $\rho|_{f(\mathbb{W})}(A \mid w) = \rho(A \mid w)$ for all $w \in f(\mathbb{W})$.

Now, consider a counterfactual game $f(G')$ for a map $f : \mathbb{W} \to \mathbb{W}$, where two games $G$ and $G'$ share the same basic game, but can have different information structures. Hence a change of game from $G$ to $f(G')$ represents that the policy $f$ changes not only the payoff state $W$ but also the information structure as a result. We say $G$ is weakly transferable to $f(G')$, if

$$R_{\mathrm{BCE}}(G)|_{f(\mathbb{W})} = R_{\mathrm{BCE}}(f(G')).$$

Thus, when $G$ is weakly transferable to $f(G')$, all predictions after the policy coincide with some predictions from the original game with the support of the payoff state restricted to $f(\mathbb{W})$. By the definition of $R_{\mathrm{BCE}}(G)$, if $G$ is strongly transferable to $f(G)$, it is weakly transferable to $f(G)$.

Unlike strong transferability, weak transferability exhibits a robustness property with variations in the information structures. To see this, let us follow BM and introduce an ordering between information structures. Given two information structures $I = (\mathbb{W}, \mathbb{T}, \mu_{T|W})$ and $I' = (\mathbb{W}, \mathbb{T}', \mu_{T'|W})$, we say that $I$ is individually sufficient for $I'$ (with respect to the prior $\mu_W$) if there exists a joint conditional distribution $\mu_{T,T'|W}$ of $(T, T')$ given $W$ such that for all $B \subset \mathbb{T}$, $B' \subset \mathbb{T}'$ and $B'_i \subset \mathbb{T}'_i$,

$$\begin{aligned} \mu_{T,T'|W}(\mathbb{T} \times B' \mid w) &= \mu_{T'|W}(B' \mid w), \text{ and} \\ \mu_{T,T'|W}(B \times \mathbb{T}' \mid w) &= \mu_{T|W}(B \mid w), \end{aligned} \tag{34}$$

and $\mu_{T'_i | T_i, T_{-i}, W}(B'_i \mid t_i, t_{-i}, w)$ does not depend on $(t_{-i}, w)$. In other words, $I$ is individually sufficient for $I'$ if there exists a joint conditional distribution $\mu_{T,T'|W}$ that is consistent with the marginals in $I$ and $I'$, and for each $i = 1, \dots, n$, $T'_i$ is conditionally independent of $(T_{-i}, W)$ given $T_i$. Hence in this case, the signal $T'_i$ of agent $i$ does not reveal anything new about the payoff state $W$ or other players' signals $T_{-i}$ beyond the signal $T_i$. We interpret the information structure $I$ as “richer” than $I'$ if $I$ is individually sufficient for $I'$.

If $I$ is individually sufficient for $I'$ and $I'$ is individually sufficient for $I$, we say that $I$ and $I'$ are mutually individually sufficient. Note that mutual sufficiency of $I$ and $I'$ does not imply that their information structures are the same. For example, suppose that $T_i = (V_i, S_i)$ and $T'_i = (V_i, S'_i)$, and that $(S_i, S'_i)$'s are independent across $i$'s and independent of $(W, V)$, $V = (V_1, V_2)$. Then $T_i$ is conditionally independent of $(T'_{-i}, W)$ given $T'_i$. It follows by symmetry that $I$ and $I'$ are mutually individually sufficient. However, the information structures $I$ and $I'$ are different.

The following result of BM reveals a tight relationship between the information structures and the set of reduced forms.

Theorem 3.4 (Bergemann and Morris (2016)). Information structure $I = (\mathbb{W}, \mathbb{T}, \mu_{T|W})$ is individually sufficient for another information structure $I' = (\mathbb{W}, \mathbb{T}', \mu'_{T|W})$ if and only if for any basic game $B = (\mathbb{Y}, \mathbb{W}, u, \mu_W)$,

$$R_{\mathrm{BCE}}(G) \subset R_{\mathrm{BCE}}(G'), \tag{35}$$

where $G = (B, I)$ and $G' = (B, I')$.

Footnote: Bergemann and Morris (2016) proved this result for the case where $\mathbb{W}$ and $\mathbb{T}$ are finite sets. We follow their arguments and provide the proof for the case of general spaces $\mathbb{W}$ and $\mathbb{T}$ in the online appendix.

As the information set becomes richer, the set of reduced forms permitted by the equilibria of the game shrinks. This is because with a larger information set, each player is more likely to find it profitable to deviate from the recommended actions implied by the reduced forms.

Since any information structure $I$ is mutually individually sufficient for itself trivially, Theorem 3.4 implies that mutual individual sufficiency yields an equivalence relation among the information structures sharing the same state space $\mathbb{W}$. This observation yields the following robustness result.

Theorem 3.5.

Suppose that $G = (B, I)$ is weakly transferable to $f(G) = (f(B), I)$ and information structures $I$, $I'$ and $I''$ are mutually individually sufficient. Then $G' = (B, I')$ is weakly transferable to $f(G'') = (f(B), I'')$.

Therefore, the weak transferability between $G$ and $f(G)$ is robust to the expansion of the information structure which maintains mutual individual sufficiency.

3.4.2. Examples of Information Structures with Mutual Individual Sufficiency.

We consider special cases where two information structures are mutually individually sufficient. Given a game $G = (B, I)$, we say that $G' = (B, I')$ is an independent expansion of $G$, if $I' = (\mathbb{W}, \mathbb{T}', \mu_{T'|W})$ is obtained by adding new signals $S_i$ to the previous signal $T_i$ in $I$, so that $T'_i = (T_i, S_i)$, and $\mathbb{T}' = \mathbb{T} \times \mathbb{S}$, for some set $\mathbb{S} = \mathbb{S}_1 \times \dots \times \mathbb{S}_n$, and there exists a joint distribution $\mu_{T,S,W}$ of $(T, S, W)$ under which

$$\mu_{S_i | T, W}(A_i \mid t) = \mu_{S_i | T_i}(A_i \mid t_i), \tag{36}$$

for each $i = 1, \dots, n$, and for all $A_i \subset \mathbb{S}_i$. Under an independent expansion, the additional signal does not reveal information about the previous signals of other players or the payoff state beyond what the player already knew. The new signals can nevertheless be correlated across the players, and can thereby affect the way players coordinate with each other.

Proposition 3.1.

Let $\mathcal{I}$ be the class of information structures that are independent expansions of a given information structure $I$. Then any two information structures $I'$ and $I''$ in $\mathcal{I}$ are mutually individually sufficient.

Therefore, any two games with the same basic game but with different information structures that are independent expansions of some information structure induce the same set of reduced forms.

Now, consider a case where the policy causes an expansion of an information structure that is already rich enough. We say that an information structure $I = (\mathbb{W}, \mathbb{T}, \mu_{T|W})$ is full, if there exists a bijection $h_i : \mathbb{W} \to \mathbb{T}_i$ for each $i$ such that for all $A \subset \mathbb{T}$,

$$\mu_{T|W}(A \mid w) = 1\{h(w) \in A\},$$

for $\mu_W$-almost every $w \in \mathbb{W}$, where $h = (h_1, \dots, h_n)$. Hence every player knows the payoff state $W$ and does not have more signals than this state vector. We say that an information structure $I$ is maximal, if $I$ is individually sufficient for some full information structure. The proposition below shows that maximal information structures are mutually individually sufficient.

Proposition 3.2.

Any two maximal information structures $I' = (\mathbb{W}, \mathbb{T}', \mu'_{T|W})$ and $I'' = (\mathbb{W}, \mathbb{T}'', \mu''_{T|W})$ are mutually individually sufficient.

Thus, two games with the same basic game but with two different maximal information structures induce the same set of reduced forms in BCE. Hence the set of predictions from the BCEs of the game remains robust to variations between maximal information structures.

3.4.3. Robustness to Information Structures.

In this subsection, we present a result that shows the validity of decomposition-based predictions when a policy changes the information structure. For a policy analysis that alters the information structure, we introduce a weaker condition relating two equilibrium selection rules for two games. We define a conditional distribution on $R_{\mathrm{BCE}}(G)$ given $W = w$: for each set $B \subset R_{\mathrm{BCE}}(G)$ and $w \in \mathbb{W}$,

$$\gamma_G(B \mid w) = e_G\left(\left\{\sigma \in \Sigma_{\mathrm{BCE}}(G) : \rho_\sigma \in B\right\} \mid w\right),$$

so that $\gamma_G$ is a conditional probability on $R_{\mathrm{BCE}}(G)$ induced by the equilibrium selection rule $e_G$. For two games $G$ and $f(G')$ with the same basic game, we say that $e_G$ and $e_{f(G')}$ are weakly coherent, if for any $A \subset R_{\mathrm{BCE}}(G)|_{f(\mathbb{W})} \cap R_{\mathrm{BCE}}(f(G'))$,

$$\gamma_{f(G')}(A \mid w) = e_G\left(\left\{\sigma \in \Sigma_{\mathrm{BCE}}(G) : \rho_\sigma|_{f(\mathbb{W})} \in A\right\} \mid w\right), \tag{37}$$

for all $w \in f(\mathbb{W})$. Thus, weak coherence of $e_G$ and $e_{f(G')}$ means the following: if a set of reduced forms induced by a set of equilibria in $G$ remains invariant after the policy change (when restricted to $f(\mathbb{W})$), then the probability of selecting that set of reduced forms remains invariant in the counterfactual game $f(G')$. The lemma below shows that when $G$ is strongly transferable to $f(G)$, strong coherence implies weak coherence.

Footnote: A game with a full information structure is different from a complete information game, because individual players cannot have more signals than the payoff state $W$. A game with a maximal information structure is also different from a complete information game, because a maximal information structure does not require players to know all the other players' signals. In a complete information game, each player observes the payoff state and all the other players' signals. When the signals for individual players cannot be more than the payoff state, the three notions of information structures - complete information, maximal information and full information - coincide.

Lemma 3.1.

Suppose that $G$ is strongly transferable to $f(G)$. Then, if $e_G$ and $e_{f(G)}$ are strongly coherent, they are weakly coherent.

The converse does not hold in general, because different equilibria can generate the same reduced form. We are now ready to present the validity of decomposition-based predictions.
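Before turning to the theorem, the remark that different equilibria can generate the same reduced form can be illustrated with a small finite example. The sketch below assumes, as a working convention not restated in this section, that the reduced form averages the decision rule over signals, $\rho_\sigma(A \mid w) = \sum_t \sigma(A \mid w, t)\,\mu_{T|W}(t \mid w)$. The array layout and the function name `reduced_form` are hypothetical, and the two decision rules are not checked for being equilibria, so the example only shows that the map $\sigma \mapsto \rho_\sigma$ is many-to-one.

```python
import numpy as np

def reduced_form(sigma, mu_t_given_w):
    """Average a decision rule over signals (assumed definition of rho_sigma).

    sigma[w, t, a]     : probability of action profile a given state w and signal profile t
    mu_t_given_w[w, t] : probability of signal profile t given state w
    Returns rho[w, a]  : probability of action profile a given state w
    """
    return np.einsum("wta,wt->wa", sigma, mu_t_given_w)

# One payoff state, two equally likely signal profiles, two action profiles.
mu = np.array([[0.5, 0.5]])

# Rule 1 plays the action that matches the signal; rule 2 plays the opposite one.
sigma_match = np.array([[[1.0, 0.0], [0.0, 1.0]]])
sigma_flip = np.array([[[0.0, 1.0], [1.0, 0.0]]])

rho_match = reduced_form(sigma_match, mu)
rho_flip = reduced_form(sigma_flip, mu)
print(rho_match, rho_flip)
```

If both rules were equilibria, a selection rule that moved probability mass from one to the other would change $e_G$ on sets of equilibria while leaving the induced $\gamma_G$ unchanged, which is the sense in which weak coherence asks for less than strong coherence.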

Theorem 3.6.

Suppose that $G$ and $G'$ are two games sharing the same basic game and $f : \mathbb{W} \to \mathbb{W}$ is a given policy, which satisfy the following conditions.
(i) $G$ is weakly transferable to $f(G')$.
(ii) $e_G$ and $e_{f(G')}$ are weakly coherent.
Then the decomposition-based predictions coincide with the equilibrium-based predictions in the counterfactual game $f(G')$.

In this section, we consider a situation where $f(\mathbb{W}) \not\subset \mathbb{W}$, i.e., a situation in which the policy sends the payoff state outside of the support of the state vector $W$ in the original game. First, given the original game $G$ as before, we define a new game with the payoff state space given by $\mathbb{W}'$:

$$G' = (B', I'), \tag{38}$$

where $B' = (\mathbb{Y}, \mathbb{W}', u', \mu'_W)$ and $I' = (\mathbb{W}', \mathbb{T}, \mu'_{T|W})$. Note that $G$ and $G'$ have in common the action space $\mathbb{Y}$ and the signal space $\mathbb{T}$. For a new policy $f : \mathbb{W} \to \mathbb{W}'$, we denote the counterfactual game by $f(G') = (f(B'), I')$, where

$$f(B') = (\mathbb{Y}, f(\mathbb{W}'), u', \mu'_W \circ f^{-1}). \tag{39}$$

Once again, our task is to generate predictions from the counterfactual game $f(G')$ using data from the original game $G$.

In the spirit of ceteris paribus in comparative statics, we consider a counterfactual game $G'$ that is “as close as possible to” the original game $G$, so that the only change we consider is the transform of the payoff state $W$ into $f(W)$. We say that $G$ and $G'$ are agreeable, if $(u, \mu_W, \mu_{T|W})$ and $(u', \mu'_W, \mu'_{T|W})$ agree on $\mathbb{W} \cap \mathbb{W}'$, i.e., $u(w) = u'(w)$ for all $w \in \mathbb{W} \cap \mathbb{W}'$, $\mu_W(A) = \mu'_W(A)$ for all sets $A \subset \mathbb{W} \cap \mathbb{W}'$, and for all $A \subset \mathbb{T}$, $\mu_{T|W}(A \mid w) = \mu'_{T|W}(A \mid w)$ for all $w \in \mathbb{W} \cap \mathbb{W}'$. Note that the condition of agreeability requires that everything about the two games “agrees on” the intersection of their payoff state spaces. If $\mathbb{W}$ and $\mathbb{W}'$ are disjoint, $G$ and $G'$ are trivially agreeable.

Suppose that $G$ and $G'$ are agreeable and a map $f : \mathbb{W} \to \mathbb{W}'$ is given. Then we say that $G$ is strongly transferable to $f(G')$, if

$$\Sigma_{\mathrm{BCE}}(G)|_{f(\mathbb{W})} = \Sigma_{\mathrm{BCE}}(f(G'))|_{\mathbb{W}}. \tag{40}$$

We also extend the notion of strong coherence. We say that $e_G$ and $e_{f(G')}$ are strongly coherent, if for any $A \subset \Sigma_{\mathrm{BCE}}(G)|_{f(\mathbb{W})} \cap \Sigma_{\mathrm{BCE}}(f(G'))|_{\mathbb{W}}$, we have

$$e_{f(G')}(A \mid w) = e_G\left(\left\{\sigma \in \Sigma_{\mathrm{BCE}}(G) : \sigma|_{f(\mathbb{W})} \in A\right\} \mid w\right). \tag{41}$$

The theorem below extends the validity result of decomposition-based predictions.

Theorem 3.7.

Suppose that $G$ and $G'$ are agreeable and $G$ is strongly transferable to $f(G')$ for some $f : \mathbb{W} \to \mathbb{W}'$. Furthermore, assume that $e_G$ and $e_{f(G')}$ are strongly coherent. Then,

$$\begin{aligned} \int_{f(\mathbb{W}) \cap \mathbb{W}} \rho_G(A \mid w)\, d(\mu_W \circ f^{-1})(w) &\le \int_{f(\mathbb{W})} \rho_{f(G')}(A \mid w)\, d(\mu_W \circ f^{-1})(w) \\ &\le \int_{f(\mathbb{W}) \cap \mathbb{W}} \rho_G(A \mid w)\, d(\mu_W \circ f^{-1})(w) + \mu_W\left(f^{-1}(f(\mathbb{W}) \setminus \mathbb{W})\right). \end{aligned} \tag{42}$$

The theorem shows that the counterfactual prediction in this case is interval-identified. The length of the interval is given by the last term:

$$\mu_W\left(f^{-1}(f(\mathbb{W}) \setminus \mathbb{W})\right). \tag{43}$$

This term represents the probability that the policy sends the payoff state outside of the support of the payoff state in the original game. This term tends to increase if the policy $f$ transforms the payoff state further away from the support. We call this term the Extrapolation Error Bound. The Extrapolation Error Bound gives a sense of the maximal extrapolation error when one does not rely on the functional form restrictions in the specification details of the game, such as payoff functions and distributional assumptions on the unobserved heterogeneity.
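For a finite payoff state space, the bounds in (42) and the Extrapolation Error Bound in (43) reduce to sums, which the following sketch computes. It assumes that `rho_g[w]` holds $\rho_G(A \mid w)$ for one fixed set $A$ on the original support. The function name, the dictionary layout and the numbers are illustrative only.

```python
def prediction_bounds(mu_w, policy, rho_g):
    """Bounds on the average post-policy probability of the event {Y in A}.

    mu_w   : dict mapping each state w in the original support to mu_W({w})
    policy : function f mapping a state to its post-policy state
    rho_g  : dict mapping each state w in the original support to rho_G(A | w)
    Returns (lower, upper, extrapolation_error_bound).
    """
    support = set(mu_w)
    lower = 0.0  # part of the prediction identified from the original game
    eeb = 0.0    # probability mass that the policy sends outside the support
    for w, prob in mu_w.items():
        w_new = policy(w)
        if w_new in support:
            lower += rho_g[w_new] * prob
        else:
            eeb += prob
    # On the unidentified part the probability lies in [0, 1], hence the width eeb.
    return lower, lower + eeb, eeb

# Hypothetical three-state example; the policy shifts every state up by one.
mu_w = {0: 0.5, 1: 0.3, 2: 0.2}
rho_g = {0: 0.2, 1: 0.5, 2: 0.8}
lower, upper, eeb = prediction_bounds(mu_w, lambda w: w + 1, rho_g)
print(lower, upper, eeb)
```

In this example only the state 2, which has probability 0.2, is mapped outside the original support, so the interval has length 0.2.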

4. Predictions Using a Partially Observed Payoff State

To implement decomposition-based predictions as in (27), we need to recover the conditional distribution $P\{Y \in A \mid W = w\}$ from data. However, we rarely observe the full payoff state $W$ in practice. Let us assume that $W = (X, \varepsilon)$, where $X = (X_i)_{i=1}^n$ is the observed payoff component which consists of all the covariates of the players in the game, and $\varepsilon = (\varepsilon_i)_{i=1}^n$ is the unobserved component. If $X$ and $\varepsilon$ are independent under $\mu_W$ and $f(W) = (\tilde f(X), \varepsilon)$, the (average) decomposition-based prediction is straightforward to obtain:

$$\begin{aligned} \int \rho_G(A \mid f(w))\, d\mu_W(w) &= \int\!\!\int P\{Y \in A \mid (X, \varepsilon) = (\tilde f(x), \bar\varepsilon)\}\, d\mu_X(x)\, d\mu_\varepsilon(\bar\varepsilon) \\ &= \int P\{Y \in A \mid X = \tilde f(x)\}\, d\mu_X(x). \end{aligned}$$

Hence if $A = A_1 \times \dots \times A_n \subset \mathbb{Y}$ and we let the $i$-th marginal of $\rho_G(A \mid f(w))$ be denoted by $\rho_{G,i}(A_i \mid f(w))$, then for each $i = 1, \dots, n$,

$$\int \rho_{G,i}(A_i \mid f(w))\, d\mu_W(w) = \int P\{Y_i \in A_i \mid X = \tilde f(x)\}\, d\mu_X(x). \tag{44}$$

Thus, the decomposition-based predictions can be carried out for the case with partially observed payoff states, using observed covariates. Below we present three variants of this approach: the first one focuses on a situation with endogeneity (i.e., $X_i$ being potentially correlated with $\varepsilon_i$), the second one uses an index structure of the game, and the third one considers using an additional causal relation among the covariates.

Footnote: While the assumption of independence between $X$ and $\varepsilon$ is far from innocuous, this assumption is often used in structural econometric models for practical convenience.

If $X$ and $\varepsilon$ are potentially correlated, the conditional distribution of $Y$ given $W = w$ cannot be identified from the joint distribution of $(Y, X)$. Nevertheless, the decomposition approach can still be implemented if data has additional information. One possibility is the use of the control function approach (Blundell and Powell (2003) and Imbens and Newey (2009)). Suppose that $X$ and $\varepsilon$ are permitted to be correlated, but that there is a variable $V$ (called a control function) such that the following conditions hold:
(i) $X$ and $\varepsilon$ are independent conditional on $V$.
(ii) The support of the conditional distribution of $V$ given $X = x$ is invariant as $x$ varies in $f(\mathbb{X})$, where $\mathbb{X}$ represents the support of $X$.
Then, we can see that

$$\begin{aligned} \int \rho_G(A \mid f(w))\, d\mu_W(w) &= \int P\{Y \in A \mid (X, \varepsilon) = (\tilde f(x), \bar\varepsilon)\}\, d\mu_{X,\varepsilon}(x, \bar\varepsilon) \\ &= \int\!\!\int P\{Y \in A \mid (X, \varepsilon, V) = (\tilde f(x), \bar\varepsilon, v)\}\, d\mu_{X,\varepsilon,V}(x, \bar\varepsilon, v) \\ &= \int\!\!\int P\{Y \in A \mid (X, V) = (\tilde f(x), v)\}\, d\mu_{X,V}(x, v). \end{aligned}$$

To obtain the control function $V$ from data, we can follow the approach of Imbens and Newey (2009). In the online appendix, we illustrate the control function approach in an applied example for a workhorse model of consumer demand as in Berry, Levinsohn, and Pakes (1995).

In many examples, the payoff state $W_i$ of each player $i$ depends on observed covariates through the index structure as follows:

$$W_i = \varphi_i(X'\beta_i, \varepsilon), \tag{45}$$

for some map $\varphi_i$ that is possibly unknown to the econometrician, and $\beta_i$ is the player-specific coefficient vector. Suppose that the policy $f$ takes the form:

$$f(W) = \left(\varphi_1(\tilde f(X)'\beta_1, \varepsilon), \dots, \varphi_n(\tilde f(X)'\beta_n, \varepsilon)\right), \tag{46}$$

for some map $\tilde f$. If $X$ is independent of $\varepsilon = (\varepsilon_j)_{j=1}^n$, we find that for all $A_i \subset \mathbb{Y}_i$,

$$\int \rho_{G,i}(A_i \mid f(w))\, d\mu_W(w) = \int P\left\{Y_i \in A_i \mid (X'\gamma_1, \dots, X'\gamma_n) = (\tilde f(x)'\gamma_1, \dots, \tilde f(x)'\gamma_n)\right\} d\mu_X(x),$$

where $\gamma_i = \beta_i / \|\beta_i\|$. The conditional probability inside the integral takes the form of a multi-index model (see Donkers and Schafgans (2008) and Xia (2008) and references therein). This reformulation of the decomposition-based prediction can be useful for dimension reduction when the game involves only a few players but the dimension of $X_i$ is relatively large.

Footnote: Note that the covariate $X = (X_j)_{j=1}^n$ in the payoff state $W_i$ of player $i$ includes the covariates of all the players in the game, because other players' covariates affect the payoff of player $i$. The index $X'\beta_i$ involves $X$ as a vectorized version.
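When the covariates are discrete, the right-hand side of (44) can be estimated by cell frequencies. The sketch below is a minimal illustration under the assumptions stated in the text (independence of $X$ and $\varepsilon$, and $\tilde f(x)$ inside the observed support). The function and argument names are hypothetical and the data are made up.

```python
from collections import defaultdict

def decomposition_prediction(x_obs, y_obs, policy, in_event):
    """Sample analogue of the integral of P{Y_i in A_i | X = f~(x)} over mu_X.

    x_obs    : list of hashable covariate values, one per observation
    y_obs    : list of outcomes, one per observation
    policy   : function f~ mapping a covariate value to its post-policy value
    in_event : function returning True when an outcome lies in A_i
    """
    cell = defaultdict(lambda: [0, 0])  # x -> [count, count with Y_i in A_i]
    for x, y in zip(x_obs, y_obs):
        cell[x][0] += 1
        cell[x][1] += int(in_event(y))

    total = 0.0
    for x in x_obs:  # average over the empirical distribution of X
        x_new = policy(x)
        if x_new not in cell:
            raise ValueError(f"policy value {x_new!r} lies outside the observed support")
        count, hits = cell[x_new]
        total += hits / count
    return total / len(x_obs)

# Hypothetical data: a binary covariate and a binary outcome; the policy sets X to 0.
x_obs = [1, 1, 0, 0, 0, 0]
y_obs = [0, 0, 1, 0, 1, 1]
prediction = decomposition_prediction(x_obs, y_obs, lambda x: 0, lambda y: y == 1)
print(prediction)
```

The explicit error for post-policy values outside the observed support reflects the requirement that the policy stay within the support of the data; outside it, only bounds of the kind in Theorem 3.7 are available.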

5. Empirical Application: Ciliberto and Tamer (2009) Revisited

We revisit the counterfactual analysis in Ciliberto and Tamer (2009) using our results from Section 3. They investigated the effect of the repeal of the Wright amendment on airline entry in markets out of Dallas Love Field Airport. The Wright amendment had been implemented in 1979 to stimulate the use of the newer (and not as central) Dallas Fort Worth (DFW) Airport. As of the early 2000's, it restricted the flights out of the central Dallas Love Field to other cities in Texas or those from some neighboring states. A full repeal of the amendment was agreed by the major airlines and DFW Airport in 2008, and was to be fully implemented in 2014. This repeal could have led to significant changes in market entry and, hence, on consumer welfare.

Footnote: The agreement involved, most notably, decreasing the number of gates in Dallas Love Field to restrict its impact on Dallas Fort Worth.

Ciliberto and Tamer (2009) produced a counterfactual prediction of the outcomes after the repeal of the Wright amendment, after estimating the identified set from a complete information entry-exit game permitting multiple equilibria. This is the game presented in Example 2.6.1, but with more than 2 players. A market was defined by a route between two airports, irrespective of directions or layovers. They modeled the Wright amendment as a dummy variable covariate, $X^{Wright}_{i,m}$, which equaled 1 if market $m$ was affected by the Wright amendment (affecting all the firms in the market) and 0 otherwise. The counterfactual experiment of repealing the Wright amendment set $X^{Wright}_{i,m}$ to 0 and studied its effects on market entry. We note that this experiment set $X^{Wright}_{i,m}$ to values within its support in the data. Since the game is a complete information game, if the regularity condition on the equilibrium selection mechanism holds, the decomposition-based prediction coincides with the equilibrium-based prediction.

Footnote: We provide extensive details on the empirical application, including the description of the covariates, estimators and inference procedures in the online appendix.

For this application, we followed their work and focused on the decisions of the 4 main airlines in their analysis (American Airlines, Delta Airlines, Southwest Airlines and United Airlines). We performed a dimension reduction to facilitate nonparametric estimation (the parametric estimator below is robust to this reduction). This reduction is useful because our decomposition-based prediction must include all firm-level covariates within $X_m$ (the covariates $X_{j,m}$ for every $j \neq i$ impact $i$'s entry decision in equilibrium through affecting $j$'s decision to enter). We dropped 3 out of 8 market level variables (market size, per capita income growth and Dallas market) that lacked variation for nonparametric estimation. We dropped 1 additional firm-market level covariate (a proxy for $i$'s cost) using the causal structure of the game, because this variable was a function of other covariates in the analysis (including route distance) by construction.

Footnote: In this context there are 8 market-level covariates and 2 variables at the firm-market level. They are described in detail in the online appendix. This generates a total of 16 covariates to be included in the analysis. While the parametric estimator is robust to including all 16 covariates due to its additional structure, the performance of the nonparametric estimator is improved with a smaller subset.

Footnote: Both market size and per capita income growth appear well predicted by income per capita and market presence (variables that already capture economic performance at the market level and included in the analysis). Meanwhile the binary Dallas market variable was highly correlated with the Wright amendment variable - by construction, any market in Dallas that does not use Dallas Love Field Airport must be using Dallas Fort Worth Airport instead. However, Dallas Fort Worth is the hub for American Airlines - and American Airlines' market presence was already included as a covariate. Details are provided in the online appendix.

We compare the results from the decomposition approach to the original results in Ciliberto and Tamer (2009). The results of this exercise are shown in Columns 1 and 2 of Table 1 using two different estimators (a linear/parametric and a nonparametric estimator), while their original results are shown in Column 3.

Footnote: In this model, unobservable payoff components $\varepsilon_i$ were assumed to be i.i.d. and independent of all covariates, so we do not need to use the control function approach.

In the first column, we assume that the expected entry of $i$ in market $m$ is given by the linear form $E[Y_{i,m} \mid X_m = x_m] = x_m'\gamma_i$ and we estimate it using Ordinary Least Squares in a linear regression framework, where $Y_{i,m}$ denotes the indicator of entry by firm $i$ in market $m$. We then present the counterfactual estimate which is the estimated change in entry in the counterfactual game relative to the data for the $m'$ markets previously affected by the policy. This can be written as

$$\frac{1}{m'} \sum_{m \in \mathcal{M}} \tilde x_m' \hat\gamma_i - \bar Y_i,$$

where $\mathcal{M}$ represents the set of markets previously affected by the Wright amendment, $\tilde x_m$ represents the values of the covariates for market $m$ in the counterfactual, and $\bar Y_i$ is the average outcome in the data for firm $i$ in those $m'$ markets. In the second column, we estimate the conditional expectation $E[Y_{i,m} \mid X_m = x] = g_i(x)$ nonparametrically. We use a leave one out kernel estimator. The bandwidth is chosen by cross-validation. (See the online appendix for details.) By comparing Column 1 and Column 2, we see that the results do not appear to be driven by the nonlinearity in the conditional expectation function.
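The two estimators can be sketched in a few lines of NumPy. The code below is an illustration on made-up data, not the authors' implementation: the Gaussian kernel, the single common bandwidth, the bandwidth grid and all function names are assumptions, and the leave-one-out step is used here only inside the cross-validation criterion.

```python
import numpy as np

def ols_change(X, y, X_cf, affected):
    """Mean OLS prediction at counterfactual covariates minus mean outcome, affected markets only."""
    gamma_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (X_cf[affected] @ gamma_hat).mean() - y[affected].mean()

def kernel_fit(Z, y, z0, h, leave_out=None):
    """Nadaraya-Watson estimate of E[Y | Z = z0] with a Gaussian kernel and bandwidth h."""
    weights = np.exp(-0.5 * ((Z - z0) ** 2).sum(axis=1) / h**2)
    if leave_out is not None:
        weights[leave_out] = 0.0  # drop the observation's own contribution
    return weights @ y / weights.sum()

def cv_bandwidth(Z, y, grid):
    """Pick the bandwidth in grid minimizing the leave-one-out squared prediction error."""
    def cv_score(h):
        fits = [kernel_fit(Z, y, Z[m], h, leave_out=m) for m in range(len(y))]
        return np.mean((y - np.array(fits)) ** 2)
    return min(grid, key=cv_score)

def kernel_change(Z, y, Z_cf, affected, grid):
    """Mean kernel prediction at counterfactual covariates minus mean outcome, affected markets only."""
    h = cv_bandwidth(Z, y, grid)
    fits = [kernel_fit(Z, y, z0, h) for z0 in Z_cf[affected]]
    return np.mean(fits) - y[affected].mean()

# Toy data: eight markets, one policy dummy, binary entry decisions.
wright = np.array([1.0, 1, 1, 1, 0, 0, 0, 0])
entry = np.array([0.0, 0, 1, 0, 1, 1, 0, 1])
affected = wright == 1

X = np.column_stack([np.ones(8), wright])          # constant and policy dummy
X_cf = np.column_stack([np.ones(8), np.zeros(8)])  # policy dummy set to 0
change_ols = ols_change(X, entry, X_cf, affected)

Z, Z_cf = wright[:, None], np.zeros((8, 1))
change_np = kernel_change(Z, entry, Z_cf, affected, grid=[0.3, 1.0, 3.0])
print(change_ols, change_np)
```

With a single dummy regressor the linear fit is saturated, so the OLS estimate is simply the difference in mean entry between unaffected and affected markets. The kernel estimate is pulled toward the pooled mean by an amount that depends on the selected bandwidth.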
In the third column, we restate the results of counterfactual predictions from the main specification in Ciliberto and Tamer (2009) (Table VII, Column 1). These are the maximum predicted increase in the share of Dallas Love Field markets that are served by each airline following the 2014 repeal of the Wright amendment according to their estimates. We find that our results are broadly consistent with theirs (i.e. below their estimated maximum entry).

Now we take this revisited exercise one step further. The Wright amendment was actually repealed in 2014. This means that we can observe how airlines entered the markets after the repeal of the Wright Amendment. We compile 2015 data from the DB1B Market and Ticket Origin and Destination Dataset of the U.S. Department of Transportation (the same source as the original dataset), and treat it the same way as the original authors - see the online appendix for details. We focus on the same 81 markets from the original paper. The change in entry in 2015 in the data relative to the original data is shown in Column 4 of Table 1. We then compare the counterfactual estimates from Ciliberto and Tamer (2009) and our decomposition approach to the realized outcomes.

Footnote: The linear specification used for Column 1 is misspecified because a linear reduced form cannot be induced from equilibria in the entry-exit game. However, it is an approximation that serves as a simple and easily interpretable benchmark.

Table 1. Ciliberto and Tamer (2009) Revisited: Model Predicted and Empirical Counterfactuals of the Repeal of the Wright Amendment

Outcome: Change in Probability of Entry in Dallas-Love Markets

| Airline | (1) Decomposition Method, Linear Model | (2) Decomposition Method, Nonparametric Model | (3) Ciliberto & Tamer (2009), Maximum Predicted Entry | (4) Empirical |
|---|---|---|---|---|
| American Airlines | -0.030 (0.037) | 0.128 (0.036) | 0.463 | -0.04 |
| Delta Airlines | -0.023 (0.042) | 0.174 (0.040) | 0.499 | 0.46 |
| Southwest Airlines | 0.508 (0.038) | 0.451 (0.057) | 0.474 | 0.471 |
| United Airlines | -0.009 (0.032) | 0.043 (0.017) | 0.147 | 0 |

Notes: We report the estimated counterfactual changes to the entry of major airlines into Dallas Love markets following the repeal of the Wright Amendment. In Columns (1)-(2), we use our decomposition approach to provide point estimates of this counterfactual effect, using the same pre-2014 dataset of Ciliberto and Tamer (2009). Column (1) uses a linear model, while Column (2) reports a nonparametric estimate. Standard errors for these columns (in parentheses) are computed by the bootstrap, following the approach in the online appendix with B = 999 replications. In the third column, we restate the results in Table VII, Column 1 of Ciliberto and Tamer (2009), who presented the maximum change in entry of those airlines. Finally, the Wright Amendment was fully repealed in 2014, allowing all airlines to enter those markets. The final column shows the realized values of the change in entry for those airlines in affected markets in 2015, after the repeal.

The results show that the decomposition method (Columns (1)-(2)) using pre-repeal data performs well relative to the empirical outcomes in Column (4). Both the parametric and nonparametric estimates of the decomposition approach capture the large increase in entry by Southwest Airlines, and the negligible post-repeal entry by American Airlines and United Airlines. This lack of entry by American and United post-repeal is broadly consistent with the multiple equilibria in an entry model: Southwest and Delta entered frequently after the repeal, but American and United stayed out of those markets. The empirical values are also within the maximum bounds reported in Ciliberto and Tamer (2009). However, as the authors only reported the maximum predicted entry, their results appear further apart from the realized values for American and United. While it is possible that the lack of entry results for these firms was within Ciliberto and Tamer (2009)'s estimated set, this would imply their counterfactual analysis predicted a range of at least 0 to 50% of markets entered by those airlines, a large range for policy analysis.

While the decomposition-based results perform well for American, United and Southwest, the method performs worse in predicting entry by Delta Airlines. While this could simply be a feature of out-of-sample prediction, it is possible that changes to Delta between 2008-2014 (including the acquisition of Northwest Airlines, which was completed in 2010), and/or the definition of markets in this dataset affected the model's performance. Nevertheless, we consider that the decomposition-based prediction performed well overall in this out-of-sample exercise, particularly as it used data from years before the policy was implemented and matched well with multiple observed outcomes.
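The table notes mention bootstrap standard errors with B = 999 replications. The procedure actually used is described in the online appendix, which is not reproduced here. The sketch below shows only a generic nonparametric bootstrap that resamples markets with replacement, with a hypothetical `estimator` callback and made-up data.

```python
import numpy as np

def bootstrap_se(estimator, data, n_boot=999, seed=0):
    """Standard deviation of an estimator across bootstrap resamples of the rows of data."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # sample row indices with replacement
        draws[b] = estimator(data[rows])
    return draws.std(ddof=1)

# Toy usage: standard error of a sample mean over 100 hypothetical markets.
data = np.arange(100, dtype=float)
se = bootstrap_se(lambda sample: sample.mean(), data)
print(se)
```

In an application along the lines of Table 1, the callback would recompute the whole counterfactual estimate (coefficients and, for the nonparametric column, the bandwidth) on each resampled set of markets, so that the reported standard error reflects every estimation step.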

6. Conclusion

Decomposition methods are attractive in counterfactual analysis for their computational tractability and simplicity. However, those predictions might not be incentive compatible in strategic settings. Leveraging the solution concept of Bayes Correlated Equilibria of Bergemann and Morris (2016), we showed that decomposition-based predictions coincide with equilibrium-based predictions when there is a rich enough information structure (e.g. complete information games, or policies that affect a commonly observed component in incomplete information games), policy changes fall within their support in the data, and equilibrium selection rules behave coherently as we move to a counterfactual game.

Dynamic games are largely absent from our analysis. A policy in these games generally induces a change of the agents' posteriors through a change of the future path of the payoff states, and it seems nontrivial to maintain posterior invariance after the policy. It appears interesting in this regard to note the approach of Kocherlakota (2019) who introduced independent shocks to the policy so that the posterior of the private sector for future policies remains invariant. Future work can expand this insight and explore the validity of decomposition-based predictions in a dynamic setting.

Footnote: Delta Airlines only operates from Dallas Love Field to Atlanta, but there are multiple connecting flights from Atlanta. Routes that include layovers are considered as markets by Ciliberto and Tamer (2009).

References

Aguirregabiria, V., and A. Nevo (2012): “Recent Developments in Empirical IO: Dynamic Demand and Dynamic Games,” Report, University of Toronto.

Aguirregabiria, V., and J. Suzuki (2014): “Identification and counterfactuals in dynamic models of market entry and exit,” Quantitative Marketing and Economics, 12(3), 267–304.

Angrist, J. D., and V. Lavy (1999): “Using Maimonides’ rule to estimate the effect of class size on scholastic achievement,” The Quarterly Journal of Economics, 114(2), 533–575.

Arcidiacono, P., and R. A. Miller (2020): “Identifying dynamic discrete choice models off short panels,” Journal of Econometrics, 215(2), 473–485.

Aumann, R. J. (1974): “Subjectivity and correlation in randomized strategies,” Journal of Mathematical Economics, 1(1), 67–96.

Bergemann, D., B. A. Brooks, and S. Morris (2019): “Counterfactuals with Latent Information,” Cowles Foundation Discussion Paper, 2162, Yale University.

Bergemann, D., and S. Morris (2016): “Bayes Correlated Equilibrium and the Comparison of Information Structures in Games,” Theoretical Economics, 11, 487–522.

Berry, S., J. Levinsohn, and A. Pakes (1995): “Automobile Prices in Market Equilibrium,” Econometrica, 60, 889–917.

Bilicka, K. A. (2019): “Comparing UK tax returns of foreign multinationals to matched domestic firms,” American Economic Review, 109(8), 2921–53.

Blinder, A. S. (1973): “Wage Discrimination: Reduced Form and Structural Estimates,” Journal of Human Resources, pp. 436–455.

Blundell, R., and J. L. Powell (2003): “Endogeneity in Nonparametric and Semiparametric Regression Models,” in Advances in Economics and Econometrics, ed. by L. Dewatripont, L. Hansen, and S. Turnovsky, vol. 2, pp. 312–357. Cambridge University Press, Cambridge.

Bresnahan, T. F., and P. C. Reiss (1991): “Empirical Models of Discrete Games,” Journal of Econometrics, 48, 57–81.

Chernozhukov, V., I. Fernández-Val, and B. Melly (2013): “Inference on counterfactual distributions,” Econometrica, 81(6), 2205–2268.

Chetty, R. (2009): “Sufficient Statistics for Welfare Analysis: A Bridge Between Structural and Reduced-Form Methods,” Annual Review of Economics, 1, 451–487.

Chetty, R. (2012): “Bounds on elasticities with optimization frictions: A synthesis of micro and macro evidence on labor supply,” Econometrica, 80(3), 969–1018.

Ciliberto, F., and E. Tamer (2009): “Market Structure and Multiple Equilibria in Airline Markets,” Econometrica, 77, 1791–1828.

C

OTTER , K. D. (1991): “Correlated Equilibrium in Games with Type-DependentStrategies,”

Journal of Economic Theory , 54, 48–68.D I N ARDO , J., N. M. F

ORTIN , AND

T. L

EMIEUX (1996): “Labor Market Institutionsand the Distribution of Wages, 1973-1992: A Semiparametric Approach,”

Econo-metrica , 64, 1001–1044.D

ONKERS , B.,

AND

M. S

CHAFGHANS (2008): “Speciﬁcation and Estimation of Semi-parametric Multiple-Index Models,”

Econometric Theory , 24, 1584–1606.D

OWD , T., P. L

ANDEFELD , AND

A. M

OORE (2017): “Proﬁt shifting of US multina-tionals,”

Journal of Public Economics , 148, 1–13.E

LYAKIME , B., J.-J. L

AFFONT , P. L

OISEL , AND

Q. V

UONG (1997): “Auctioning andbargaining: An econometric study of timber auctions with secret reservationprices,”

Journal of Business & Economic Statistics , 15(2), 209–220.F

ORGES , F. (1993): “Five legitimate deﬁnitions of correlated equilibrium in gameswith incomplete information,”

Theory and Decision , 35(3), 277–310.(2006): “Correlated equilibrium in games with incomplete informationrevisited,”

Theory and Decision , 61(4), 329–344.F

ORTIN , N., T. L

EMIEUX , AND

S. F

IRPO (2011): “Decomposition Methods in Eco-nomics,” in

Handbook of Labor Economics , vol. 4, pp. 1–102. Elsevier.G

OWRISANKARAN , G., A. N

EVO , AND

R. T

OWN (2015): “Mergers when prices arenegotiated: Evidence from the hospital industry,”

American Economic Review ,105(1), 172–203.G

RIECO , P. L. E. (2014): “Discrete Games with Flexible Information Structures: AnApplication to Local Grocery Markets,”

RAND Journal of Economics , 45, 303–340.G

UALDANI , C.,

AND

S. S

INHA (2020): “Identiﬁcation and inference in discrete choicemodels with imperfect information,”

Working Paper .G UERRE , E., I. P

ERRIGNE , AND

Q. V

UONG (2000): “Optimal nonparametric estima-tion of ﬁrst-price auctions,”

Econometrica , 68(3), 525–574. H AILE , P. A.,

AND

E. T

AMER (2003): “Inference with an Incomplete Model of EnglishAuctions,”

Journal of Political Economy , 111, 1–51.H

ECKMAN , J. J. (2010a): “Building Bridges Between Structural and Program Eval-uation Approaches to Evaluating Policy,”

Journal of Economic Literature , 48, 356–398. (2010b): “Causal Parameters and Policy Analysis in Economists: A Twenti-eth Century Restrospective,”

Quarterly Journal of Economics , 115, 45–97.H

ECKMAN , J. J.,

AND

E. V

YTLACIL (2001): “Policy-relevant treatment effects,”

Amer-ican Economic Review , 91(2), 107–111.H

ORTAC ¸ SU , A., AND

D. M C A DAMS (2010): “Mechanism choice and strategic biddingin divisible good auctions: An empirical analysis of the Turkish treasury auctionmarket,”

Journal of Political Economy , 118(5), 833–865.I

MBENS , G. W.,

AND

W. K. N

EWEY (2009): “Identiﬁcation and Estimation of Tri-angular Simultaneous Equations Models Without Additivity,”

Econometrica , 77,1481–1512.J IA , P. (2008): “What happens when Wal-Mart comes to town: an empirical analy-sis of the discount retailing industry,” Econometrica , 76, 1263–1316.J

UHN , C., K. M. M

URPHY , AND

B. P

IERCE (1993): “Wage inequality and the rise inreturns to skill,”

Journal of Political Economy , 101(3), 410–442.K

ALOUPTSIDI , M., Y. K

ITAMURA , L. L

IMA , AND

E. A. S

OUZA -R ODRIGUES (2020):“Partial Identiﬁcation and Inference for Dynamic Models and Counterfactuals,”Discussion paper, National Bureau of Economic Research.K

ALOUPTSIDI , M., P. T. S

COTT , AND

E. S

OUZA -R ODRIGUES (2017): “On the non-identiﬁcation of counterfactuals in dynamic discrete games,”

International Jour-nal of Industrial Organization , 50, 362–371.(2020): “Identiﬁcation of Counterfactuals in Dynamic Discrete Choice Mod-els,”

Quantitative Economics .K LEVEN , H. (2020): “Sufﬁcient Statistics Revisited,” Discussion Paper 27242, Na-tional Bureau of Economic Research.K

LEVEN , H., C. L

ANDAIS , AND

J. E. S

ØGAARD (2019): “Children and gender in-equality: Evidence from Denmark,”

American Economic Journal: Applied Econom-ics , 11(4), 181–209.K

OCHERLAKOTA , N. R. (2019): “Practical Policy Evaluation,”

Journal of MonetaryEconomics, Forthcoming . K OOPMANS , T. C. (1949): “Identiﬁcation Problems in Economic Model Construc-tion,”

Econometrica , 17, 125–144.L IU , Q. (2015): “Correlation and common priors in games with incomplete infor-mation,” Journal of Economic Theory , 157, 49–75.M

AGNOLFI , L.,

AND

C. R

ONCORONI (2019): “Estimation of Discrete Games withWeak Assumptions on Information,”

Working Paper .M ARSCHAK , J. (1953):

Economic measurements for policy and prediction . CowlesCommission for Research in Economics.M

YERSON , R. B. (1994): “Communication, Correlated Equilibria and IncentiveCompatibility,” in

Handbook of Game Theory with Economic Applications , vol. 2,pp. 827–848. Elsevier, North-Holland.N

ORETS , A.,

AND

X. T

ANG (2014): “Semiparametric inference in dynamic binarychoice models,”

Review of Economic Studies , 81(3), 1229–1262.O

AXACA , R. (1973): “Male-Female Wage Differentials in Urban Labor Markets,”

International Economic Review , pp. 693–709.P

AARSCH , H. J. (1997): “Deriving an estimate of the optimal reserve price: anapplication to British Columbian timber sales,”

Journal of Econometrics , 78(1),333–357.P

ECK , J.,

AND

K. S

HELL (1991): “Market uncertainty: correlated and sunspot equi-libria in imperfectly competitive economies,”

The Review of Economic Studies ,58(5), 1011–1029.R

OBERTS , J. W.,

AND

A. S

WEETING (2013): “When Should Sellers Use Auctions?,”

American Economic Review , 103, 1830–1861.S

HELL , K. (1989): “Sunspot equilibrium,” in

General Equilibrium , pp. 274–280.Springer.S

TANTON , C. T.,

AND

C. T

HOMAS (2016): “Landing the ﬁrst job: The value ofintermediaries in online hiring,”

The Review of Economic Studies , 83(2), 810–854.S

TINCHCOMBE , M. B. (2011): “Correlated Equilibrium Existence for Inﬁnite Gameswith Type-Dependent Strategies,”

Journal of Economic Theory , 146, 638–655.S

YRGKANIS , V., E. T

AMER , AND

J. Z

IANI (2018): “Inference on auctions with weakassumptions on information,”

ArXiv Preprint, arXiv:1710.03830 .X IA , S. (2008): “A Multiple-Index Model and Dimension Reduction,” Journal of theAmerican Statistical Association , 103(484), 1631–1640. O NLINE A PPENDIX TO “A D

ECOMPOSITION A PPROACH TO C OUNTERFACTUAL A NALYSIS IN G AME -T HEORETIC M ODELS ” Nathan Canen and Kyungchul Song

University of Houston and University of British Columbia

This online appendix consists of five appendices. Appendix A briefly introduces a mathematical environment in which the Bayesian game with general state space and signal space is defined. We clarify the meaning of an equilibrium selection rule in this environment. Appendix B provides the mathematical proofs for the results in the main text. Appendix C provides details on the implementation of the decomposition method. This includes a step-by-step overview of the nonparametric estimation method used in Section 5, as well as details about the empirical application (including data collection, empirical specifications and the implementation of the parametric and nonparametric estimators). Appendices D and E provide additional details referenced in the paper. Appendix D gives further details on decomposition-based predictions using the control function approach, and Appendix E provides additional derivations used in Example 3.1.

Appendix A. Equilibrium Selection Rules

Let us give a mathematical set-up of our general Bayesian game in Section 3. We assume that each of the spaces $\mathbb{Y}$, $\mathbb{W}$ and $\mathbb{T}$ is a complete separable metric space, and their Borel $\sigma$-fields are denoted by $\mathcal{B}(\mathbb{Y})$, $\mathcal{B}(\mathbb{W})$, and $\mathcal{B}(\mathbb{T})$. Consider the measurable space $(\mathbb{W} \times \mathbb{T}, \mathcal{B}(\mathbb{W} \times \mathbb{T}))$, where $\mathcal{B}(\mathbb{W} \times \mathbb{T})$ denotes the Borel $\sigma$-field of $\mathbb{W} \times \mathbb{T}$ with respect to the product topology. Throughout the paper, we consider only those transforms $f : \mathbb{W} \to \mathbb{W}$ such that $f$ is Borel measurable and $f(\mathbb{W})$ is Borel. We assume throughout the paper that there is a measure $\nu$ on $\mathcal{B}(\mathbb{W} \times \mathbb{T})$ which dominates all probability measures on $\mathcal{B}(\mathbb{W} \times \mathbb{T})$ that are used in this paper. Furthermore, we also assume that all the densities and conditional densities with respect to $\nu$ used in this paper are square integrable. (This set-up is more general than the set-up in Section 3. The general set-up is used in the proof of Theorem 3.4 below.)

Then $\mu_W$ in the basic game $B$ of $G = (B, I)$ is a probability measure on $\mathcal{B}(\mathbb{W})$. Also, $\mu_{T|W}$ in the information structure $I$ of game $G = (B, I)$ is the conditional distribution as a Markov kernel $\mathcal{B}(\mathbb{T}) \times \mathbb{W} \to [0,1]$. (The existence of such a Markov kernel is ensured here; see, e.g., Theorem 5.3 of Kallenberg (1997).) From this we can construct the joint distribution $\mu_{W,T}$ on $\mathcal{B}(\mathbb{W} \times \mathbb{T})$, so that we have density functions of $\mu_{W,T}$, $\mu_{W|T}$ and $\mu_{T|W}$, etc. (with respect to a dominating measure induced by $\nu$).

Now let the set $\mathcal{K}$ be the class of Markov kernels $\sigma(\cdot|\cdot,\cdot) : \mathcal{B}(\mathbb{Y}) \times \mathbb{W} \times \mathbb{T} \to [0,1]$ such that for each $A \in \mathcal{B}(\mathbb{Y})$, $\sigma(A|\cdot,\cdot)$ is Borel measurable, and for each $(w,t) \in \mathbb{W} \times \mathbb{T}$, $\sigma(\cdot|w,t)$ is a probability measure on $\mathcal{B}(\mathbb{Y})$. We endow $\mathcal{K}$ with the weak topology, which is the weakest topology that makes the maps $\sigma \mapsto \int f \otimes h \, d(\mu_{W,T} \otimes \sigma)$ continuous for all $f \in L^1(\mu_{W,T})$ and all $h \in C_b(\mathbb{Y})$, where $C_b(\mathbb{Y})$ denotes the set of continuous and bounded real maps on $\mathbb{Y}$,

\[ \int f \otimes h \, d(\mu_{W,T} \otimes \sigma) = \int \int f(w,t) h(y) \, d\sigma(y|w,t) \, d\mu_{W,T}(w,t), \tag{47} \]

and $L^1(\mu_{W,T})$ denotes the class of real integrable maps on $\mathbb{W} \times \mathbb{T}$. (See Definition 2.2 of Häusler and Luschgy (2015).) Note that $\mathcal{K}$ is a complete separable metric space with respect to the weak topology (e.g., Proposition A.25.III of Daley and Vere-Jones (2003)).

Let $\Sigma \subset \mathcal{K}$ be a subset. We call each element $\sigma \in \Sigma$ a decision rule. Let us define $\mathcal{B}(\Sigma)$ to be the Borel $\sigma$-field of decision rules. An equilibrium selection rule of game $G$ in our paper is a conditional distribution on the set of Bayes Correlated Equilibria given the payoff state. Since each equilibrium is a Markov kernel, this means that we need to introduce a conditional probability measure on the space of Markov kernels.

Consider the following measurable space

\[ (\mathcal{K} \times \mathbb{W}, \mathcal{B}(\mathcal{K} \times \mathbb{W})). \tag{48} \]

Let $\lambda$ be a nonzero finite Borel measure on $\mathcal{B}(\mathcal{K} \times \mathbb{W})$. Since $\mathcal{K}$ is a complete separable metric space, so is $\mathcal{K} \times \mathbb{W}$. By Theorem 1.3 of Billingsley (1999), this means that $\lambda$ is Radon. Suppose that $\tilde{W} : \mathcal{K} \times \mathbb{W} \to \mathbb{W}$ is a Borel measurable map. Then by the Disintegration Theorem of Chang and Pollard (1997), there exists a set $\{\lambda_{\tilde{w}}\}_{\tilde{w} \in \mathbb{W}}$ such that (a) for each $\tilde{w} \in \mathbb{W}$, $\lambda_{\tilde{w}}$ is a probability measure on $\mathcal{B}(\mathcal{K} \times \mathbb{W})$ such that $\lambda_{\tilde{w}}(\{\tilde{W} \neq \tilde{w}\}) = 0$, (b) for any $A \in \mathcal{B}(\mathcal{K} \times \mathbb{W})$, the map $\tilde{w} \mapsto \lambda_{\tilde{w}}(A)$ is $\mathcal{B}(\mathbb{W})$-measurable, and (c) for any $A \in \mathcal{B}(\mathcal{K} \times \mathbb{W})$, we have

\[ \lambda(A) = \int \lambda_{\tilde{w}}(A) \, d(\lambda \circ \tilde{W}^{-1})(\tilde{w}). \tag{49} \]

For Bayesian games $G$ considered in the paper, we assume that $\Sigma_{\mathrm{BCE}}(G)$ is nonempty and closed with respect to the weak topology. Note that $\{A \cap \Sigma_{\mathrm{BCE}}(G) : A \in \mathcal{B}(\mathcal{K})\} = \mathcal{B}(\Sigma_{\mathrm{BCE}}(G))$. Let us define $e_G(\cdot|\cdot) : \mathcal{B}(\Sigma_{\mathrm{BCE}}(G)) \times \mathbb{W} \to [0,1]$ as follows: for each $(A, w) \in \mathcal{B}(\Sigma_{\mathrm{BCE}}(G)) \times \mathbb{W}$,

\[ e_G(A|w) = \frac{\lambda_w(A \times \mathbb{W})}{\lambda_w(\Sigma_{\mathrm{BCE}}(G) \times \mathbb{W})}. \tag{50} \]

It is not hard to see that for each $w \in \mathbb{W}$, $e_G(\cdot|w)$ is a probability measure on $\Sigma_{\mathrm{BCE}}(G)$ and for each $A \in \mathcal{B}(\Sigma_{\mathrm{BCE}}(G))$, $e_G(A|\cdot)$ is $\mathcal{B}(\mathbb{W})$-measurable. This $e_G$ is our equilibrium selection rule on $\Sigma_{\mathrm{BCE}}(G)$.

Appendix B. Mathematical Proofs

B.1. Existence of Posteriors in the Counterfactual Game

Suppose that we are given a Bayesian game $G = (B, I)$, where $B = (\mathbb{Y}, \mathbb{W}, u, \mu_W)$ and $I = (\mathbb{W}, \mathbb{T}, \mu_{T|W})$. We consider the existence of the posterior in the counterfactual game, i.e., the conditional distribution of the transformed state $f(W)$ given signal vector $T$. For this, we first construct a joint distribution of $(f(W), T)$ in the counterfactual game $f(G)$ and then derive the conditional distribution of $f(W)$ given $T$ using a disintegration theorem. In deriving the joint distribution of $(f(W), T)$ in the counterfactual game, we need to use care.

Note that we are not taking the joint distribution $\mu_{W,T}$ from the original game as policy invariant. Instead, we are taking the signal generator, $\mu_{T|W}$, as policy invariant. Hence we first construct the joint distribution of $(f(W), T)$ using $(\mu_{T|W}, \mu_W \circ f^{-1})$ in the counterfactual game $f(G)$, and then derive the posterior as the conditional distribution of $f(W)$ given $T$ using a disintegration theorem.

First, we let for each measurable $f : \mathbb{W} \to \mathbb{W}$,

\[ \mu_W^f = \mu_W \circ f^{-1}. \tag{51} \]

This is the distribution of the payoff state after the policy. Define a distribution $\mu_T^f$ on $\mathcal{B}(\mathbb{T})$ as follows: for $B \in \mathcal{B}(\mathbb{T})$,

\[ \mu_T^f(B) = \int_{f(\mathbb{W})} \mu_{T|W}(B|w) \, d\mu_W^f(w). \tag{52} \]

Hence the marginal distribution of the signals is affected by the policy in general. Given $\mu_W$, $\mu_W^f$, $\mu_T$ and $\mu_T^f$, let us first define posteriors $\mu_{W|T}(\cdot|t)$ and $\mu_{W|T}^f(\cdot|t)$ as follows: for any Borel $B \subset \mathbb{T}$,

\[ \begin{aligned} \int_B \mu_{W|T}(A|t) \, d\mu_T(t) &= \int_A \mu_{T|W}(B|w) \, d\mu_W(w), \quad \text{for all Borel } A \subset \mathbb{W}, \\ \int_B \mu_{W|T}^f(A|t) \, d\mu_T^f(t) &= \int_A \mu_{T|W}(B|w) \, d\mu_W^f(w), \quad \text{for all Borel } A \subset f(\mathbb{W}). \end{aligned} \tag{53} \]

(Note that we use the same $\mu_{T|W}$ in both equations, which comes from the fact that $G$ and $f(G)$ have the same information structure.)
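The construction in (51)-(53) can be illustrated on finite state and signal spaces, where the disintegration reduces to Bayes' rule. The following is a minimal sketch under that finiteness assumption, not part of the formal argument; the function name `counterfactual_posterior` and the numerical inputs are hypothetical and chosen only for illustration.

```python
import numpy as np

def counterfactual_posterior(mu_W, kernel, f):
    """Finite-space analogue of (51)-(53).

    mu_W   : prior over payoff states, shape (nW,)
    kernel : signal generator mu_{T|W}, shape (nW, nT), rows sum to one
    f      : policy map on state indices, f[w] is the post-policy state
    """
    mu_W = np.asarray(mu_W, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    f = np.asarray(f, dtype=int)
    # (51): push the prior forward through the policy map f.
    mu_W_f = np.zeros_like(mu_W)
    np.add.at(mu_W_f, f, mu_W)
    # (52): post-policy signal marginal, reusing the SAME kernel mu_{T|W}.
    mu_T_f = mu_W_f @ kernel
    # (53): Bayes' rule; signals with zero probability get a zero column.
    joint = mu_W_f[:, None] * kernel
    post = np.divide(joint, mu_T_f, out=np.zeros_like(joint), where=mu_T_f > 0)
    return mu_W_f, mu_T_f, post  # post[w, t] = mu^f_{W|T}({w} | t)

# Hypothetical example: three states, two signals, policy moves state 0 to state 1.
mu_W_f, mu_T_f, post = counterfactual_posterior(
    [0.5, 0.3, 0.2],
    [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]],
    [1, 1, 2],
)
```

In this sketch the posterior puts no mass outside $f(\mathbb{W})$, and the signal marginal changes with the policy even though the kernel is held fixed.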
Existence and uniqueness of $\mu_{W|T}(\cdot|t)$ and $\mu_{W|T}^f(\cdot|t)$ follow from the Disintegration Theorem of Chang and Pollard (1997) in the set-up explained in Appendix A. The posterior $\mu_{W|T}(\cdot|t)$ is a probability measure on $\mathbb{W}$ for $\mu_T$-a.e. $t$, and $\mu_{W|T}^f(\cdot|t)$ is a probability measure on $f(\mathbb{W})$ for $\mu_T^f$-a.e. $t$.

B.2. Policy-Invariance of Equilibria

We first introduce the information structure that is more general than the public-private dichotomy of information in Theorem 3.1.

Assumption B.1 (Information Structure with a Public-Private Dichotomy). There exists a signal profile $T = (T_1, ..., T_n) \in \mathbb{T}$ and payoff state $W \in \mathbb{W}$ such that the conditional distribution of $T$ given $W$ is equal to $\mu_{T|W}$ and for each $i = 1, ..., n$, we have $T_i = (T_{i,1}, T_{i,2})$, and for each $j = 1, ..., n$, $T_{j,1}$ is measurable with respect to the $\sigma$-field generated by $T_i$.

Assumption B.1 says that the signal component $T_{i,1}$ is public information for every player $i = 1, ..., n$. Let us assume the environment of Assumption B.1. Let $\mathbb{T}_i = \mathbb{T}_{i,1} \times \mathbb{T}_{i,2}$, where $\mathbb{T}_{i,1}$ and $\mathbb{T}_{i,2}$ are the signal spaces which $T_{i,1}$ and $T_{i,2}$ take values from respectively. Suppose that the public component of the signal space is partitioned as

\[ \mathbb{T}_{i,1} = \mathbb{T}_{i,1}^{1} \cup (\mathbb{T}_{i,1} \setminus \mathbb{T}_{i,1}^{1}), \tag{54} \]

for some nonempty subset $\mathbb{T}_{i,1}^{1}$ of $\mathbb{T}_{i,1}$. Define

\[ \mathbb{T}^{1} = \times_{i=1}^{n} (\mathbb{T}_{i,1}^{1} \times \mathbb{T}_{i,2}), \quad \text{and} \quad \mathbb{T}^{2} = \times_{i=1}^{n} ((\mathbb{T}_{i,1} \setminus \mathbb{T}_{i,1}^{1}) \times \mathbb{T}_{i,2}). \tag{55} \]

We construct a game whose signal space is restricted to $\mathbb{T}^{\ell}$, $\ell = 1, 2$. For this, define a Markov kernel $\mu_{T|W,\ell} : \mathcal{B}(\mathbb{T}^{\ell}) \times \mathbb{W} \to [0,1]$ as follows: for any Borel $A \subset \mathbb{T}^{\ell}$,

\[ \mu_{T|W,\ell}(A|w) = \frac{\mu_{T|W}(A|w)}{\mu_{T|W}(\mathbb{T}^{\ell}|w)}. \]

(Following the convention, we take $0/0$ to be $0$.) This gives the following information structure: for $\ell = 1, 2$,

\[ I_{\ell} = (\mathbb{W}, \mathbb{T}^{\ell}, \mu_{T|W,\ell}). \tag{56} \]

The game restricted to the signal space $\mathbb{T}^{\ell}$ is given by

\[ G_{\ell} = (B, I_{\ell}). \tag{57} \]

By construction, since we assume that $\Sigma_{\mathrm{BCE}}(G)$ is nonempty, so is $\Sigma_{\mathrm{BCE}}(G_{\ell})$, $\ell = 1, 2$. The following result says that the set of BCE's of $G_1$ consists precisely of $\sigma|_{\mathbb{W} \times \mathbb{T}^{1}}$ with $\sigma \in \Sigma_{\mathrm{BCE}}(G)$, where $\sigma|_{\mathbb{W} \times \mathbb{T}^{1}}$ is $\sigma$ restricted to $\mathbb{W} \times \mathbb{T}^{1}$ so that for any Borel $A \subset \mathbb{Y}$,

\[ \sigma|_{\mathbb{W} \times \mathbb{T}^{1}}(A|w,t) = \sigma(A|w,t), \quad \text{for } \mu_{W,T}\text{-a.e. } (w,t) \in \mathbb{W} \times \mathbb{T}^{1}. \]

Lemma B.1.

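Suppose that Assumption B.1 holds. Then,

\[ \Sigma_{\mathrm{BCE}}(G)|_{\mathbb{W} \times \mathbb{T}^{1}} = \Sigma_{\mathrm{BCE}}(G_1), \]

where $\Sigma_{\mathrm{BCE}}(G)|_{\mathbb{W} \times \mathbb{T}^{1}} = \{\sigma|_{\mathbb{W} \times \mathbb{T}^{1}} : \sigma \in \Sigma_{\mathrm{BCE}}(G)\}$.

Proof: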

First note that

\[ \begin{aligned} U_i(\tau_i, t_i, \sigma) &= \int \int u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma(y|w,t) \, d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \\ &= \int \int u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma(y|w, t_{-i,2}, h_i(t_i)) \, d\mu_{W,T_{-i,2}|T_i}(w, t_{-i,2}|t_i), \end{aligned} \tag{58} \]

for some measurable function $h_i : \mathbb{T}_i \to \mathbb{T}_{-i,1}$ by Assumption B.1, where $\mathbb{T}_{-i,1} = \times_{j=1, j \neq i}^{n} \mathbb{T}_{j,1}$. Whenever $\sigma \in \Sigma_{\mathrm{BCE}}(G)$, we have for $\mu_{T_i}$-a.e. $t_i$,

\[ U_i(\tau_i, t_i, \sigma) \leq U_i(\mathrm{Id}, t_i, \sigma). \tag{59} \]

The inequality is true if we restrict $t_i = (t_{i,1}, t_{i,2})$ so that $t_{i,1} \in \mathbb{T}_{i,1}^{1}$. Altering the information structure $\mu_{T|W}$ into $\mu_{T|W,1}$ does not change the conditional distribution $\mu_{W,T_{-i,2}|T_i}(\cdot|t_i)$, $t_i = (t_{i,1}, t_{i,2})$, as long as $t_{i,1} \in \mathbb{T}_{i,1}^{1}$, because the signal space for $T_{i,2}$ is not affected by this restriction. Therefore, we find that

\[ \Sigma_{\mathrm{BCE}}(G)|_{\mathbb{W} \times \mathbb{T}^{1}} \subset \Sigma_{\mathrm{BCE}}(G_1). \tag{60} \]

Now, let us show that

\[ \Sigma_{\mathrm{BCE}}(G_1) \subset \Sigma_{\mathrm{BCE}}(G)|_{\mathbb{W} \times \mathbb{T}^{1}}. \tag{61} \]

First, we define for $\sigma_1 \in \Sigma_{\mathrm{BCE}}(G_1)$ and $\sigma_2 \in \Sigma_{\mathrm{BCE}}(G_2)$,

\[ c(\sigma_1, \sigma_2)(\cdot|w,t) = \sum_{\ell=1}^{2} \sigma_{\ell}(\cdot|w,t) 1\{t \in \mathbb{T}^{\ell}\}. \tag{62} \]

Let

\[ \bigotimes_{\ell=1}^{2} \Sigma_{\mathrm{BCE}}(G_{\ell}) = \{c(\sigma_1, \sigma_2) : \sigma_1 \in \Sigma_{\mathrm{BCE}}(G_1) \text{ and } \sigma_2 \in \Sigma_{\mathrm{BCE}}(G_2)\}. \tag{63} \]

Then by definition, for $\ell' = 1, 2$,

\[ \left. \bigotimes_{\ell=1}^{2} \Sigma_{\mathrm{BCE}}(G_{\ell}) \right|_{\mathbb{W} \times \mathbb{T}^{\ell'}} = \Sigma_{\mathrm{BCE}}(G_{\ell'}). \tag{64} \]

Hence it suffices to show for (61) that

\[ \left. \bigotimes_{\ell=1}^{2} \Sigma_{\mathrm{BCE}}(G_{\ell}) \right|_{\mathbb{W} \times \mathbb{T}^{1}} \subset \Sigma_{\mathrm{BCE}}(G)|_{\mathbb{W} \times \mathbb{T}^{1}}. \tag{65} \]

Take $c(\sigma_1, \sigma_2) \in \bigotimes_{\ell=1}^{2} \Sigma_{\mathrm{BCE}}(G_{\ell})$ for some $\sigma_1 \in \Sigma_{\mathrm{BCE}}(G_1)$ and $\sigma_2 \in \Sigma_{\mathrm{BCE}}(G_2)$. Then,

\[ \begin{aligned} U_i(\tau_i, t_i, c(\sigma_1, \sigma_2)) &= U_i(\tau_i, t_i, \sigma_1) 1\{t_{i,1} \in \mathbb{T}_{i,1}^{1}\} + U_i(\tau_i, t_i, \sigma_2) 1\{t_{i,1} \in \mathbb{T}_{i,1} \setminus \mathbb{T}_{i,1}^{1}\} \\ &\leq U_i(\mathrm{Id}, t_i, \sigma_1) 1\{t_{i,1} \in \mathbb{T}_{i,1}^{1}\} + U_i(\mathrm{Id}, t_i, \sigma_2) 1\{t_{i,1} \in \mathbb{T}_{i,1} \setminus \mathbb{T}_{i,1}^{1}\} \\ &= U_i(\mathrm{Id}, t_i, c(\sigma_1, \sigma_2)). \end{aligned} \tag{66} \]

The first equality follows because the expected payoff $U_i(\tau_i, t_i, c(\sigma_1, \sigma_2))$ does not depend on $t_{-i,1}$ once $t_i$ is fixed, as seen from (58). The inequality above follows because $\sigma_1 \in \Sigma_{\mathrm{BCE}}(G_1)$ and $\sigma_2 \in \Sigma_{\mathrm{BCE}}(G_2)$. Hence $c(\sigma_1, \sigma_2) \in \Sigma_{\mathrm{BCE}}(G)$. We find that

\[ \bigotimes_{\ell=1}^{2} \Sigma_{\mathrm{BCE}}(G_{\ell}) \subset \Sigma_{\mathrm{BCE}}(G). \tag{67} \]

By restricting both sides to $\mathbb{W} \times \mathbb{T}^{1}$, we obtain the inclusion (65). ∎

The following theorem is a formal version of Theorem 3.1.

Theorem B.1.

Suppose that a game-policy pair $(G, f)$ satisfies the conditions below.

(a) $\mathbb{W} = \mathbb{W}_1 \times ... \times \mathbb{W}_n$, and for each $i = 1, ..., n$, $\mathbb{W}_i = \tilde{\mathbb{W}}_1 \times \mathbb{W}_{2i}$ and $\mathbb{T}_i = \mathbb{W}_i$.

(b) For each $i = 1, ..., n$, there exists a bijection $h_{1i} : \tilde{\mathbb{W}}_1 \to \tilde{\mathbb{W}}_1$ such that for any $A = \times_{i=1}^{n} A_i \in \mathcal{B}(\mathbb{T})$,

\[ \mu_{T|W}(A|w) = 1\{h_i(w_i) \in A_i, \ i = 1, ..., n\}, \quad \text{for } \mu_W\text{-a.e. } w, \tag{68} \]

where $w = (w_i)_{i=1}^{n} \in \mathbb{W}$, $w_i = (w_{1i}, w_{2i})$, and $h_i(w_i) = (h_{1i}(w_{1i}), w_{2i})$.

(c) For any $A = \times_{i=1}^{n} A_i \in \mathcal{B}(\mathbb{T})$, $A_i = (A_{1i}, A_{2i})$, with $A_{1i} \subset \tilde{\mathbb{W}}_1$ and $A_{2i} \subset \mathbb{W}_{2i}$,

\[ \mu_W(A) = 0, \tag{69} \]

whenever the $A_{1i}$'s are not identical across $i$'s.

(d) The transform $f$ takes the following form: $f(w) = (\tilde{f}_{1i}(w_{1i}), w_{2i})_{i=1}^{n}$, for some $\tilde{f}_{1i} : \tilde{\mathbb{W}}_1 \to \tilde{\mathbb{W}}_1$.

Then $G$ is strongly transferable to $f(G)$.

Proof:

Choose any Borel $A \subset f(\mathbb{W})$, and $B = B_1 \times ... \times B_n \subset \mathbb{T}$. First, let us show that

(i) $\mu_{W|T}^f(A|t) = \mu_{W|T}(A|t)$, for $\mu_T^f$-a.e. $t \in \mathbb{T}$, and

(ii) $\mu_{T_{-i}|T_i}^f(B_{-i}|t_i) = \mu_{T_{-i}|T_i}(B_{-i}|t_i)$, for $\mu_{T_i}^f$-a.e. $t_i \in \mathbb{T}_i$,

where $\mu_{T_i}^f$ denotes the $i$-th coordinate of the joint distribution $\mu_T^f$.

Let us show (i) first. From Condition (b) and (52),

\[ \begin{aligned} \mu_T^f(B) &= \int_{f(\mathbb{W})} 1\{h(w) \in B\} \, d(\mu_W \circ f^{-1})(w) \\ &= (\mu_W \circ f^{-1})\left(h^{-1}(B) \cap f(\mathbb{W})\right), \end{aligned} \tag{70} \]

where $h(w) = (h_i(w_i))_{i=1}^{n}$ and $h_i(w_i) = (h_{1i}(w_{1i}), w_{2i})$. By (53) (with $B = \mathbb{T}$ there), any Markov kernel $\tilde{\mu}_{W|T} : \mathcal{B}(f(\mathbb{W})) \times \mathbb{T} \to [0,1]$ that satisfies that for all Borel $A \subset f(\mathbb{W})$,

\[ \int_{\mathbb{T}} \tilde{\mu}_{W|T}(A|t) \, d\mu_T^f(t) = (\mu_W \circ f^{-1})(A) \tag{71} \]

is a version of the desired posterior $\mu_{W|T}^f$ in the counterfactual game $f(G)$. Let us take

\[ \tilde{\mu}_{W|T}(A|t) = 1\{t \in h(A)\}. \tag{72} \]

Then, we have

\[ \begin{aligned} \int_{\mathbb{T}} \tilde{\mu}_{W|T}(A|t) \, d\mu_T^f(t) &= \mu_T^f(h(A)) = (\mu_W \circ f^{-1})(((h^{-1} \circ h)A) \cap f(\mathbb{W})) \\ &= (\mu_W \circ f^{-1})(A). \end{aligned} \tag{73} \]

The equalities above use the fact that $\mathbb{T} = \mathbb{W}$, $f(\mathbb{W}) \subset \mathbb{W} = h(\mathbb{W})$ and $A \subset f(\mathbb{W})$. Hence (71) is fulfilled. It is verified that $\tilde{\mu}_{W|T}$ chosen in (72) is indeed the desired posterior $\mu_{W|T}^f$. By Condition (b), the conditional distribution of signal $T$ given $W = w$ is degenerate at $h(w)$. Hence the conditional distribution of $W$ given $T = t$ is degenerate at $h^{-1}(t)$. This means that

\[ \begin{aligned} \mu_{W|T}(A|t) &= 1\{h^{-1}(t) \in A\} \\ &= 1\{t \in h(A)\}, \quad \text{for } \mu_T\text{-a.e. } t. \end{aligned} \tag{74} \]

Comparing this with (71), we obtain (i) above.

Let us turn to (ii). From (53) and (52), similarly as before, any Markov kernel $\tilde{\mu}_{T_{-i}|T_i} : \mathcal{B}(\mathbb{T}_{-i}) \times \mathbb{T}_i \to [0,1]$ is a version of the posterior $\mu_{T_{-i}|T_i}^f$ if and only if for all $B = \times_{i=1}^{n} B_i \in \mathcal{B}(\mathbb{T})$,

\[ \int_{B_i} \tilde{\mu}_{T_{-i}|T_i}(B_{-i}|t_i) \, d\mu_{T_i}^f(t_i) = \mu_T^f(B). \tag{75} \]

Let $W = (W_1, ..., W_n)$ and $T = (T_1, ..., T_n)$ be random vectors whose joint distribution is given by $\mu_{W,T}$, where

\[ \mu_{W,T}(A \times B) = \int_A \mu_{T|W}(B|w) \, d\mu_W(w), \tag{76} \]

for all $A \times B \in \mathcal{B}(\mathbb{W} \times \mathbb{T})$. By Condition (c), $\mu_W$ concentrates on a set $A$ such that $A_{1,1} = ... = A_{1,n} = \{a\}$ for some $a \in \tilde{\mathbb{W}}_1$. Hence we can choose random vectors $W$ such that $W_i = (\tilde{W}_1, W_{2i})$ for some common random vector $\tilde{W}_1$. Let $W_{2,-i} = (W_{2,j})_{j \neq i}$. Then, we define

\[ \tilde{\mu}_{T_{-i}|T_i}(B_{-i}|t_i) = \mu_{W_{2,-i}|\tilde{W}_1, W_{2i}}(B_{2,-i}|h_i^{-1}(t_i)) 1\{t_{1i} \in B_{1j}, \forall j \neq i\}, \tag{77} \]

where $t_i = (t_{1i}, t_{2i})$ and $h_i^{-1}(t_i) = (h_{1i}^{-1}(t_{1i}), t_{2i})$, and $\mu_{W_{2,-i}|\tilde{W}_1, W_{2i}}$ represents the conditional distribution of $W_{2,-i}$ given $(\tilde{W}_1, W_{2i})$. We will show that with this choice, we obtain (75) above. Note that

\[ \begin{aligned} &\int_{B_i} \tilde{\mu}_{T_{-i}|T_i}(B_{-i}|t_i) \, d\mu_{T_i}^f(t_i) \\ &= \int_{B_i} \mu_{W_{2,-i}|\tilde{W}_1, W_{2i}}(B_{2,-i}|h_i^{-1}(t_i)) 1\{t_{1i} \in B_{1j}, \forall j \neq i\} \, d\mu_{T_i}^f(t_i) \\ &= \int_{B_i} \mu_{W_{2,-i}|\tilde{W}_1, W_{2i}}(B_{2,-i}|h_i^{-1}(t_i)) 1\{t_{1i} \in B_{1j}, \forall j \neq i\} \, d(\mu_{W_i} \circ (f_i^{-1} \circ h_i^{-1}))(t_i). \end{aligned} \tag{78} \]

We can rewrite the last integral as

\[ \begin{aligned} &\mu_{W_{2,-i}, \tilde{W}_1, W_{2,i}}\left( B_{2,-i} \times (\tilde{f}_{1i}^{-1} \circ h_{1i}^{-1})\left( \bigcap_{j=1}^{n} B_{1j} \right) \times B_{2i} \right) \\ &= (\mu_W \circ (f^{-1} \circ h^{-1}))(B) = \mu_T^f(B), \end{aligned} \tag{79} \]

where the last equality follows by (70). Therefore, $\tilde{\mu}_{T_{-i}|T_i}$ satisfies (75) and hence is a version of the posterior $\mu_{T_{-i}|T_i}^f$ in the counterfactual game $f(G)$. On the other hand, we have

\[ \int_{B_i} \mu_{W_{2,-i}|\tilde{W}_1, W_{2i}}(B_{2,-i}|h_i^{-1}(t_i)) 1\{t_{1i} \in B_{1j}, \forall j \neq i\} \, d\mu_{T_i}(t_i) = \mu_T(B). \tag{80} \]

Hence comparing with (79), we obtain (ii).

Let us take

\[ \mathbb{T}_{i,1}^{1} = \tilde{f}_{1i}(\tilde{\mathbb{W}}_1). \tag{81} \]

As in (56), we define $I_1$ and $G_1 = (B, I_1)$. Then by (i) and (ii), we have

\[ \Sigma_{\mathrm{BCE}}(G_1) = \Sigma_{\mathrm{BCE}}(f(G_1)), \tag{82} \]

where $f(G_1) = (f(B), I_1)$. By Lemma B.1, we have

\[ \Sigma_{\mathrm{BCE}}(G)|_{f(\mathbb{W}) \times \mathbb{T}^{1}} = \Sigma_{\mathrm{BCE}}(G_1), \quad \text{and} \quad \Sigma_{\mathrm{BCE}}(f(G))|_{f(\mathbb{W}) \times \mathbb{T}^{1}} = \Sigma_{\mathrm{BCE}}(f(G_1)). \tag{83} \]

Note that the restriction to $f(\mathbb{W})$ above comes from (68) because restricting signal $T_{i,1}$ to $\mathbb{T}_{i,1}^{1}$ for each $i = 1, ..., n$ is equivalent to restricting the payoff state $W$ to $f(\mathbb{W})$. Thus we obtain the desired result. ∎

Proof of Theorem 3.2:

Proof of Theorem 3.4:

We follow the arguments in the proof of Theorem 2 of Bergemann and Morris (2016) (BM). First, we show necessity. Suppose that $I$ is individually sufficient for $I'$. Then we show that for any games $G = (B, I)$ and $G' = (B, I')$ with any basic game $B = (\mathbb{Y}, \mathbb{W}, u, \mu_W)$,

\[ \mathcal{R}_{\mathrm{BCE}}(G) \subset \mathcal{R}_{\mathrm{BCE}}(G'). \tag{85} \]

Fix any such games $G$ and $G'$. Let $\sigma \in \Sigma_{\mathrm{BCE}}(G)$ and $\rho_{\sigma} \in \mathcal{R}_{\mathrm{BCE}}(G)$. Then, for (85), it suffices to find $\sigma' \in \Sigma_{\mathrm{BCE}}(G')$ such that

\[ \rho_{\sigma} = \rho_{\sigma'}. \tag{86} \]

For $\mu_{W,T'}$-a.e. $(w, t') \in \mathbb{W} \times \mathbb{T}'$, define $\sigma'(\cdot|w,t')$ as: for any Borel $A \subset \mathbb{Y}$,

\[ \sigma'(A|w,t') = \int_{\mathbb{T}} \sigma(A|w,t) \, d\mu_{T|T',W}(t|t',w). \tag{87} \]

The expected payoff $U_i(\tau_i, t'_i, \sigma')$ in $G'$ is written as

\[ \begin{aligned} &\int \int_{\mathbb{Y}} u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma'(y_i, y_{-i}|w,t') \, d\mu_{W,T'_{-i}|T'_i}(w, t'_{-i}|t'_i) \\ &= \int \int_{\mathbb{Y}} u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma(y_i, y_{-i}|w,t) \, d\mu_{T,T'_{-i},W|T'_i}(t, t'_{-i}, w|t'_i), \end{aligned} \]

using (87). By integrating out $T'_{-i}$ in the integral, we rewrite the last integral as

\[ \begin{aligned} &\int \int_{\mathbb{Y}} u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma(y_i, y_{-i}|w,t) \, d\mu_{T,W|T'_i}(t, w|t'_i) \\ &= \int \int \int_{\mathbb{Y}} u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma(y_i, y_{-i}|w,t) \, d\mu_{T_{-i},W|T'_i,T_i}(t_{-i}, w|t'_i, t_i) \, d\mu_{T_i|T'_i}(t_i|t'_i) \\ &= \int \int \int_{\mathbb{Y}} u_i(\tau_i(y_i), y_{-i}, w) \, d\sigma(y_i, y_{-i}|w,t) \, d\mu_{T_{-i},W|T_i}(t_{-i}, w|t_i) \, d\mu_{T_i|T'_i}(t_i|t'_i) \\ &= \int_{\mathbb{T}_i} U_i(\tau_i, t_i; \sigma) \, d\mu_{T_i|T'_i}(t_i|t'_i), \end{aligned} \]

where the second equality follows by the individual sufficiency of $I$ for $I'$.
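The construction in (87) averages the original decision rule over the conditional distribution of $T$ given $(T', W)$. On finite spaces this is a weighted sum, and one can check numerically that the two decision rules induce the same reduced form, which is the identity the proof establishes next. The sketch below assumes finite spaces and a strictly positive joint distribution of $(T, T')$ given $W$; the function names and the random inputs are hypothetical. It checks only the reduced-form identity, not obedience, which additionally requires individual sufficiency.

```python
import numpy as np

def garble_decision_rule(sigma, joint_T_Tp):
    """Finite analogue of (87).

    sigma      : sigma[w, t, y], decision rule in G, sums to one over y
    joint_T_Tp : joint[w, t, tp], distribution of (T, T') given W = w
    returns    : sigma_p[w, tp, y], decision rule in G'
    """
    mu_Tp = joint_T_Tp.sum(axis=1)                 # mu_{T'|W}[w, tp]
    cond = joint_T_Tp / mu_Tp[:, None, :]          # mu_{T|T',W}[w, t, tp]
    return np.einsum("wty,wtp->wpy", sigma, cond)

def reduced_form(sigma, mu_signal_given_W):
    """rho[w, y]: integrate the decision rule over signals given W = w."""
    return np.einsum("wsy,ws->wy", sigma, mu_signal_given_W)

# Hypothetical inputs: 2 states, 3 signals in I, 4 signals in I', 5 actions.
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 4))
joint /= joint.sum(axis=(1, 2), keepdims=True)     # normalise per state
sigma = rng.random((2, 3, 5))
sigma /= sigma.sum(axis=2, keepdims=True)          # normalise over actions

sigma_p = garble_decision_rule(sigma, joint)
rho = reduced_form(sigma, joint.sum(axis=2))       # uses mu_{T|W}
rho_p = reduced_form(sigma_p, joint.sum(axis=1))   # uses mu_{T'|W}
```

Here `rho` and `rho_p` agree up to floating-point error, by the law of total probability.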
Since $\sigma \in \Sigma_{\mathrm{BCE}}(G)$,

\[ \int_{\mathbb{T}_i} U_i(\tau_i, t_i; \sigma) \, d\mu_{T_i|T'_i}(t_i|t'_i) \leq \int_{\mathbb{T}_i} U_i(\mathrm{Id}, t_i; \sigma) \, d\mu_{T_i|T'_i}(t_i|t'_i). \tag{88} \]

Hence we have shown that $\sigma' \in \Sigma_{\mathrm{BCE}}(G')$. Note that

\[ \begin{aligned} \rho_{\sigma'}(A|w) &= \int \sigma'(A|w,t') \, d\mu_{T'|W}(t'|w) \\ &= \int \int_{\mathbb{T}} \sigma(A|w,t) \, d\mu_{T|T',W}(t|t',w) \, d\mu_{T'|W}(t'|w) \\ &= \int_{\mathbb{T}} \sigma(A|w,t) \, d\mu_{T|W}(t|w) = \rho_{\sigma}(A|w). \end{aligned} \tag{89} \]

We have shown (85).

Let us show the converse. Suppose that (85) holds for all basic games with the state space $\mathbb{W}$. We will construct a basic game $B$ such that (85) implies individual sufficiency of $I$ for $I'$. First, we set each player $i$'s action space as follows:

\[ \mathbb{Y}_i = \mathbb{T}_i \cup \mathbb{D}_i, \tag{90} \]

where $\mathbb{D}_i$ denotes the space of conditional distributions of $(W, T_{-i})$ given $(T_i, T'_i)$. (We topologize this space as in Appendix A.) Let $d\chi_i(\cdot|y_i)$ be the conditional density of $(W, T_{-i})$ given $Y_i = y_i \in \mathbb{Y}_i$ (with respect to the dominating measure $\nu$ introduced in Appendix A) defined as follows:

\[ d\chi_i(\cdot|y_i) = \begin{cases} d\mu_{W,T_{-i}|T_i}(w, t_{-i}|y_i), & \text{if } y_i \in \mathbb{T}_i, \text{ and} \\ dy_i, & \text{if } y_i \in \mathbb{D}_i, \end{cases} \tag{91} \]

where $dy_i$ denotes the conditional density of the conditional distribution $y_i \in \mathbb{D}_i$ with respect to $\nu$. Let us define the payoff function as

\[ u_i(y_i, y_{-i}, w) = \begin{cases} d\chi_i(w, y_{-i}|y_i) - \frac{1}{2} \int_{\mathbb{W} \times \mathbb{T}_{-i}} (d\chi_i(w, t_{-i}|y_i))^2, & \text{if } y_{-i} \in \mathbb{T}_{-i}, \text{ and} \\ 0, & \text{otherwise.} \end{cases} \tag{92} \]

Fixing a distribution $\mu_W$ on $\mathbb{W}$, we obtain the basic game $B = (\mathbb{Y}, \mathbb{W}, u, \mu_W)$. Define $G = (B, I)$ and $G' = (B, I')$. Let $\sigma(\cdot|w,t) = \delta_t(\cdot)$, the Dirac measure. Then, we will show that $\sigma \in \Sigma_{\mathrm{BCE}}(G)$.
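The payoff in (92) has the form of a quadratic scoring rule: a player who reports a density $q$ when the true conditional density is $p$ earns, in expectation, $\int pq - \frac{1}{2}\int q^2 = \frac{1}{2}\int p^2 - \frac{1}{2}\int (p-q)^2$, which is maximized at $q = p$. The sketch below checks this identity on a finite support. It assumes the factor $\frac{1}{2}$ in the penalty term (the factor is not legible in the extracted text and is an assumption here); the function name and the two example densities are hypothetical.

```python
import numpy as np

def expected_score(report, truth):
    """Expected payoff of reporting `report` when outcomes follow `truth`.

    Per-outcome payoff is report[x] - 0.5 * sum(report**2), the finite
    analogue of the payoff form assumed for (92).
    """
    report = np.asarray(report, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(truth @ report - 0.5 * (report @ report))

# Hypothetical conditional densities on a three-point support.
truth = np.array([0.2, 0.5, 0.3])
other = np.array([0.6, 0.1, 0.3])

truthful = expected_score(truth, truth)   # equals 0.5 * sum(truth**2)
deviating = expected_score(other, truth)
gap = truthful - deviating                # equals 0.5 * sum((truth - other)**2)
```

The gap is nonnegative for any alternative report and is zero only when the report coincides with the true density, which is the mechanism behind the comparison of expected payoffs in the rest of the proof.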
The expected payoff corresponding to $\sigma$ becomes

\[ \begin{aligned} &\int_{\mathbb{W} \times \mathbb{T}_{-i}} \int_{\mathbb{T}} u_i(y, w) \, d\sigma(y|w,t) \, d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \\ &= \int_{\mathbb{W} \times \mathbb{T}_{-i}} u_i(t, w) \, d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \\ &= \frac{1}{2} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \right)^2. \end{aligned} \tag{93} \]

(If $\nu$ is a dominating measure of a probability $\mu$, then $d\mu - \frac{1}{2}\int (d\mu)^2$ is a shorthand notation for $(d\mu/d\nu) - \frac{1}{2}\int (d\mu/d\nu)^2 d\nu$. By making explicit the integral domain, we specify with respect to which measure an integral is taken.)

Now, consider the case where player $i$ with signal $t_i$ chooses action $y'_i \in \mathbb{T}_i$ such that $y'_i \neq t_i$. The expected payoff in this case is equal to

\[ \begin{aligned} &\int_{\mathbb{W} \times \mathbb{T}_{-i}} u_i(y'_i, t_{-i}, w) \, d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \\ &= \frac{1}{2} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \right)^2 - \frac{1}{2} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}|T_i}(w, t_{-i}|y'_i) - d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \right)^2. \end{aligned} \]

Similarly, if the player chooses $y'_i \in \mathbb{D}_i$, then

\[ \begin{aligned} &\int_{\mathbb{W} \times \mathbb{T}_{-i}} u_i(y'_i, t_{-i}, w) \, d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \\ &= \frac{1}{2} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \right)^2 - \frac{1}{2} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( dy'_i(w, t_{-i}|t_i) - d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \right)^2. \end{aligned} \]

By comparing this with (93), we find that $\sigma \in \Sigma_{\mathrm{BCE}}(G)$. Hence the reduced form induced by $\sigma$ is given by $\rho_{\sigma}$, where for any Borel $A \subset \mathbb{Y}$,

\[ \rho_{\sigma}(A|w) = \int_{\mathbb{T}} \sigma(A|w,t) \, d\mu_{T|W}(t|w) = \mu_{T|W}(A \cap \mathbb{T}|w). \tag{94} \]

If we define a conditional distribution $\mu_{T|W}^{E}$ as $\mu_{T|W}^{E}(A|w) = \mu_{T|W}(A \cap \mathbb{T}|w)$, then $\mu_{T|W}^{E} \in \mathcal{R}_{\mathrm{BCE}}(G)$. By (85), this implies that $\mu_{T|W}^{E} \in \mathcal{R}_{\mathrm{BCE}}(G')$. Therefore, there exists $\sigma' \in \Sigma_{\mathrm{BCE}}(G')$ such that

\[ \mu_{T|W}^{E}(A|w) = \int_{\mathbb{T}'} \sigma'(A|w,t') \, d\mu_{T'|W}(t'|w). \tag{95} \]

Let us define the joint conditional distribution of $(T, T')$ given $W$ as follows: for any $A \subset \mathbb{T}$ and $B \subset \mathbb{T}'$,

\[ \mu_{T,T'|W}(A \times B|w) = \int_B \sigma'(A|w,t') \, d\mu_{T'|W}(t'|w). \tag{96} \]

Note that the support of $\mu_{T|W}^{E}(\cdot|w)$ is concentrated in $\mathbb{T}$ and on this support $\mu_{T|W}^{E} = \mu_{T|W}$. Hence the expected payoff of $\sigma'$ under game $G'$ is given by

\[ \begin{aligned} &\int_{\mathbb{W} \times \mathbb{T}'_{-i}} \int_{\mathbb{T}} u_i(y, w) \, d\sigma'(y|w,t') \, d\mu_{W,T'_{-i}|T'_i}(w, t'_{-i}|t'_i) \\ &= \int_{\mathbb{W}} \int_{\mathbb{T}'_{-i}} \int_{\mathbb{T}} u_i(y, w) \, d\sigma'(y|w,t') \, d\mu_{T'_{-i}|W,T'_i}(t'_{-i}|w, t'_i) \, d\mu_{W|T'_i}(w|t'_i) \\ &= \int_{\mathbb{T}_i} \int_{\mathbb{W} \times \mathbb{T}_{-i}} u_i(t, w) \, d\mu_{W,T_{-i}|T_i,T'_i}(w, t_{-i}|t_i, t'_i) \, d\mu_{T_i|T'_i}(t_i|t'_i), \end{aligned} \tag{97} \]

where $\mu_{W,T_{-i}|T_i,T'_i}(w, t_{-i}|t_i, t'_i)$ is the conditional distribution of $(W, T_{-i})$ given $(T_i, T'_i)$ derived from the joint distribution of $(W, T, T')$. The latter joint distribution is determined by the pair $(\mu_{T,T'|W}, \mu_W)$. (The first equality is due to Fubini's Theorem and the assumption that all conditional densities are square-integrable.) Using the definition of $u_i(y, w)$ in (92), we rewrite the last integral in (97) as

\[ \begin{aligned} &\frac{1}{2} \int_{\mathbb{T}_i} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}|T_i,T'_i}(w, t_{-i}|t_i, t'_i) \right)^2 d\mu_{T_i|T'_i}(t_i|t'_i) \\ &- \frac{1}{2} \int_{\mathbb{T}_i} \int_{\mathbb{W} \times \mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}|T_i,T'_i}(w, t_{-i}|t_i, t'_i) - d\mu_{W,T_{-i}|T_i}(w, t_{-i}|t_i) \right)^2 d\mu_{T_i|T'_i}(t_i|t'_i). \end{aligned} \tag{98} \]
On the other hand, if player $i$ takes $y_i' = \mu_{W,T_{-i}\mid T_i,T'_i}$, the expected payoff becomes

$$
\begin{aligned}
&\int_{\mathbb{T}_i} \int_{\mathbb{W}\times\mathbb{T}_{-i}} u_i(y_i', t_{-i}, w)\, d\mu_{W,T_{-i}\mid T_i,T'_i}(w,t_{-i}\mid t_i,t'_i)\, d\mu_{T_i\mid T'_i}(t_i \mid t'_i) \\
&\qquad = \int_{\mathbb{T}_i} \int_{\mathbb{W}\times\mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}\mid T_i,T'_i}(w,t_{-i}\mid t_i,t'_i) \right)^2 d\mu_{T_i\mid T'_i}(t_i \mid t'_i).
\end{aligned}
\tag{99}
$$

Since $\sigma' \in \Sigma_{BCE}(G')$, the integral in (98) should be greater than or equal to the integral in (99). This implies that

$$
- \int_{\mathbb{T}_i} \int_{\mathbb{W}\times\mathbb{T}_{-i}} \left( d\mu_{W,T_{-i}\mid T_i,T'_i}(w,t_{-i}\mid t_i,t'_i) - d\mu_{W,T_{-i}\mid T_i}(w,t_{-i}\mid t_i) \right)^2 d\mu_{T_i\mid T'_i}(t_i \mid t'_i) \geq 0. \tag{100}
$$

Hence we find that

$$
\mu_{W,T_{-i}\mid T_i,T'_i}(\cdot \mid t_i,t'_i) = \mu_{W,T_{-i}\mid T_i}(\cdot \mid t_i), \quad \text{a.e.} \tag{101}
$$

In other words, $I$ is individually sufficient for $I'$. ∎

Proof of Proposition 3.1:

Let $I'$ be an independent expansion of $I$ with the signal vector of each individual player equal to $T_i' = (T_i, S_i)$. Then for each $i = 1, ..., n$, $(T_i, S_i)$ is conditionally independent of $T_{-i}$ given $T_i$. Hence $I$ is individually sufficient for $I'$. On the other hand, for each $i = 1, ..., n$, $T_i$ is conditionally independent of $(T_{-i}, S_{-i}, W)$ given $(T_i, S_i)$ trivially, and hence $I'$ is individually sufficient for $I$. Hence $I$ and $I'$ are mutually individually sufficient. The relation of mutual individual sufficiency among information structures is reflexive and transitive by Theorem 3.4. Hence $I'$ and $I''$ are mutually individually sufficient as well. ∎

Lemma B.2.

Suppose that $G = (B, I)$ is a game with full information with $h$. Then for each $i = 1, ..., n$, there exists a random variable $T_i$ taking values in $\mathbb{T}$ such that

$$
T_i = h_i(W), \quad \mu_W\text{-a.e.},
$$

where the conditional distribution of $T = (T_1, ..., T_n)$ given $W$ is equal to $\mu_{T\mid W}$.

Proof: If $G$ has full information, $\mu_{T\mid W}(\cdot \mid w)$ gives the point mass of 1 to $h(w)$. Hence we obtain the desired result. ∎

Lemma B.3.

Let $I = (\mathbb{W}, \mathbb{T}, \mu_{T\mid W})$ and $I' = (\mathbb{W}, \mathbb{T}', \mu'_{T\mid W})$ be maximal information structures. Then, for any basic game $B = (\mathbb{Y}, \mathbb{W}, u, \mu_W)$,

$$
\mathcal{R}_{BCE}(G) = \mathcal{R}_{BCE}(G'),
$$

where $G = (B, I)$ and $G' = (B, I')$.

Proof: First, we prove the following three claims:

Claim 1: If $G''$ has full information, $\mathcal{R}_{BCE}(G'') \subset \mathcal{R}_{BCE}(G)$ for any game $G = (B, I)$.

Claim 2: For any two full information games $G$ and $G'$ with the same basic game, $\mathcal{R}_{BCE}(G) = \mathcal{R}_{BCE}(G')$.

Claim 3: If $G$ has maximal information and $G''$ has full information, $\mathcal{R}_{BCE}(G) \subset \mathcal{R}_{BCE}(G'')$.

For any bijection $h_i : \mathbb{W} \to \mathbb{T}$, any random variable is conditionally independent of $(h_{-i}(W), W)$ given $h_i(W)$ (under $\mu_W$), where $h_{-i} = (h_j)_{j \neq i}$. Hence by Lemma B.2, a full information structure is individually sufficient for any information structure. By Theorem 3.4, we obtain Claim 1. Claim 2 immediately follows from Claim 1 by taking $G$ in Claim 1 as another full information game and using symmetry of the arguments.

Let us show Claim 3. Since $G = (B, I)$ has maximal information, there exists a full information game $G^* = (B, I^*)$ for which $G$ is individually sufficient for $G^*$. By Theorem 3.4,

$$
\mathcal{R}_{BCE}(G) \subset \mathcal{R}_{BCE}(G^*). \tag{102}
$$

By Claim 2, $\mathcal{R}_{BCE}(G^*) = \mathcal{R}_{BCE}(G'')$. Hence, we obtain Claim 3.

Now let us show Lemma B.3. Let $G''$ be any game with full information. By taking $G$ in Claim 1 to be $G$ in the lemma, and combining the claim with Claim 3, we obtain that $\mathcal{R}_{BCE}(G) = \mathcal{R}_{BCE}(G'')$. Similarly, $\mathcal{R}_{BCE}(G') = \mathcal{R}_{BCE}(G'')$. Thus we obtain the lemma. ∎

Proof of Proposition 3.2:

The proposition follows immediately from Lemma B.3 and Theorem 3.4. ∎

Proof of Lemma 3.1:

Suppose that $e_G$ and $e_{f(G)}$ are strongly coherent. We would like to show that $\gamma_G|_{f(\mathbb{W})}$ and $\gamma_{f(G)}$ agree on $f(\mathbb{W})$. Take any Borel $B \subset \mathcal{R}_{BCE}(G)|_{f(\mathbb{W})}$. Then we define

$$
A = \left\{ \sigma|_{f(\mathbb{W})} : \rho_\sigma|_{f(\mathbb{W})} \in B \right\}. \tag{103}
$$

Note that for each $w \in f(\mathbb{W})$,

$$
\begin{aligned}
\gamma_G|_{f(\mathbb{W})}(B \mid w) &= e_G\left( \left\{ \sigma \in \Sigma_{BCE}(G) : \rho_\sigma|_{f(\mathbb{W})} \in B \right\} \mid w \right) \\
&= e_G\left( \left\{ \sigma \in \Sigma_{BCE}(G) : \sigma|_{f(\mathbb{W})} \in A \right\} \mid w \right) \\
&= e_{f(G)}\left( \left\{ \sigma \in \Sigma_{BCE}(f(G)) : \sigma \in A \right\} \mid w \right),
\end{aligned}
\tag{104}
$$

where the last equality follows because $\Sigma_{BCE}(G)|_{f(\mathbb{W})} = \Sigma_{BCE}(f(G))$ and $e_G$ and $e_{f(G)}$ are strongly coherent. If we let

$$
B' = \left\{ \rho_\sigma \in \mathcal{R}_{BCE}(f(G)) : \sigma \in A \right\},
$$

we can rewrite the last term in (104) as $\gamma_{f(G)}(B' \mid w)$. By the strong transferability of $G$ to $f(G)$, we have $\mathcal{R}_{BCE}(f(G)) = \mathcal{R}_{BCE}(G)|_{f(\mathbb{W})}$. Hence we can rewrite

$$
B' = \left\{ \rho_\sigma \in \mathcal{R}_{BCE}(G)|_{f(\mathbb{W})} : \sigma \in A \right\} = B,
$$

where the last equality follows by (103). This yields that $\gamma_{f(G)}(B' \mid w) = \gamma_{f(G)}(B \mid w)$. Thus we have shown that $\gamma_{f(G)}$ agrees with $\gamma_G|_{f(\mathbb{W})}$ on $f(\mathbb{W})$. ∎

Proof of Theorem 3.6:

Take any Borel $A \subset \mathbb{Y}$ and $w \in \mathbb{W}$. We write

$$
\begin{aligned}
\rho_G(A \mid f(w)) &= \int_{\Sigma_{BCE}(G)} \rho_\sigma(A \mid f(w))\, de_G(\sigma \mid f(w)) \\
&= \int_{\Sigma_{BCE}(G)|_{f(\mathbb{W})}} \rho_\sigma|_{f(\mathbb{W})}(A \mid f(w))\, de_G|_{f(\mathbb{W})}(\sigma \mid f(w)),
\end{aligned}
$$

because $\rho_\sigma(A \mid f(w))$ is equal to $\rho_\sigma|_{f(\mathbb{W})}(A \mid w')$ with $w' = f(w)$. By the definition of $\gamma_G|_{f(\mathbb{W})}$, the last integral is equal to

$$
\begin{aligned}
&\int_{\mathcal{R}_{BCE}(G)|_{f(\mathbb{W})}} \rho(A \mid f(w))\, d\gamma_G|_{f(\mathbb{W})}(\rho \mid f(w)) \\
&\qquad = \int_{\mathcal{R}_{BCE}(f(G'))} \rho(A \mid f(w))\, d\gamma_G|_{f(\mathbb{W})}(\rho \mid f(w)) \quad \text{(by weak transferability)}.
\end{aligned}
$$

Footnote: Since we assume that $\Sigma_{BCE}(G)$ is closed in the weak topology, it follows that $\mathcal{R}_{BCE}(G)$ is also closed in the weak topology and so is its restriction to $f(\mathbb{W}) \subset \mathbb{W}$ as long as $f(\mathbb{W})$ is Borel. We have assumed the latter throughout the paper.

By the weak coherence condition, the last integral is equal to

$$
\begin{aligned}
&\int_{\mathcal{R}_{BCE}(f(G'))} \rho(A \mid f(w))\, d\gamma_{f(G')}(\rho \mid f(w)) \\
&\qquad = \int_{\Sigma_{BCE}(f(G'))} \rho_\sigma(A \mid f(w))\, de_{f(G')}(\sigma \mid f(w)) = \rho_{f(G')}(A \mid f(w)).
\end{aligned}
\tag{105}
$$

Hence we obtain the desired result. ∎

B.4. Counterfactual Analysis with Extrapolation

Lemma B.4.

Suppose that $G$ and $G'$ are agreeable and $G$ is strongly transferable to $f(G')$ for some $f : \mathbb{W} \to \mathbb{W}'$. Then, for any Borel $A \subset \mathbb{Y}$, and for almost every $w \in f^{-1}(f(\mathbb{W}) \cap \mathbb{W})$,

$$
\rho_G(A \mid w) = \rho_{f(G')}(A \mid w). \tag{106}
$$

Proof:

First, we write

$$
\begin{aligned}
\int_{f(\mathbb{W})} \rho_{f(G')}(A \mid w)\, d(\mu_W \circ f^{-1})(w) &= \int_{f(\mathbb{W}) \cap \mathbb{W}} \rho_{f(G')}(A \mid w)\, d(\mu_W \circ f^{-1})(w) \\
&\quad + \int_{f(\mathbb{W}) \setminus \mathbb{W}} \rho_{f(G')}(A \mid w)\, d(\mu_W \circ f^{-1})(w).
\end{aligned}
\tag{107}
$$

Certainly, the last term is bounded by

$$
(\mu_W \circ f^{-1})\left( f(\mathbb{W}) \setminus \mathbb{W} \right) = \mu_W\left( f^{-1}(f(\mathbb{W}) \setminus \mathbb{W}) \right). \tag{108}
$$

We focus on the leading term, which we rewrite as

$$
\int_{f^{-1}(f(\mathbb{W}) \cap \mathbb{W})} \rho_{f(G')}(A \mid f(w))\, d\mu_W(w), \tag{109}
$$

using change of variables. By Lemma B.4 and change of variables, we can rewrite the last integral as

$$
\int_{f(\mathbb{W}) \cap \mathbb{W}} \rho_G(A \mid w)\, d(\mu_W \circ f^{-1})(w). \tag{110}
$$

Thus we obtain the desired result. ∎

Appendix C. Implementation of a Decomposition Method

C.1. Implementation of a Nonparametric Decomposition Method

Decomposition-based predictions can be generated using various nonparametric estimation methods. For illustration, we explain a way to construct such predictions using a kernel estimation method. Let $X = [X_1, X_2]$, where $X_1$ takes values from a finite set $\mathbb{X}_1$ and $X_2 \in \mathbb{R}^d$ is a vector of continuous variables. Let the observed action profile of an $n$-player Bayesian game be denoted by $Y = (Y_1, ..., Y_n)'$. The average effect of the policy $f$ based on the decomposition-based prediction is given by

$$
\Delta = E[g(f(X))] - E[Y], \tag{111}
$$

where

$$
g(x) = E[Y \mid X = x], \tag{112}
$$

which is the expected action profile of the players in the game.

Suppose that we observe i.i.d. draws $(Y_m, X_m)$, $m = 1, ..., M$, from the joint distribution of $(Y, X)$, where $X_m = [X_{m,1}, X_{m,2}]$, $m = 1, ..., M$. Thus the sample size is $M$. To find an estimator of $\Delta$, we first construct a leave-one-out kernel estimator of $g(x)$ for each $m = 1, ..., M$ as

$$
\hat g_{m,h}(x) = \frac{\sum_{k=1, k \neq m}^{M} Y_k\, 1\{X_{k,1} = x_1\}\, K_h(X_{k,2} - x_2)}{\sum_{k=1, k \neq m}^{M} 1\{X_{k,1} = x_1\}\, K_h(X_{k,2} - x_2)}, \quad x = (x_1, x_2), \tag{113}
$$

where $K_h(x_2) = K(x_{2,1}/h) \cdots K(x_{2,d}/h)/h^d$, $K$ is a univariate kernel and $h$ is a bandwidth. For an estimator of $\Delta$, we consider the following:

$$
\hat\Delta = \frac{1}{M} \sum_{m=1}^{M} \hat g_{m,h}(f(X_m)) - \bar Y. \tag{114}
$$

For bandwidths, we use cross-validation for each player $i = 1, ..., n$. Define

$$
CV_i(h) = \frac{1}{M} \sum_{m=1}^{M} \left( Y_{i,m} - \hat g_{i,m,h}(X_m) \right)^2, \tag{115}
$$

where $\hat g_{i,m,h}(x)$ denotes the $i$-th component of $\hat g_{m,h}(x)$. We choose a minimizer of $CV_i(h)$ over a range of $h$'s as our bandwidth.

For the bootstrap standard errors, we first draw $(Y^*_{m,b}, X^*_{m,b})_{m=1}^{M}$, $b = 1, ..., B$, i.i.d. from the empirical distribution of $(Y_m, X_m)_{m=1}^{M}$, where $B$ is the bootstrap number. (We take $B = 999$ in our empirical application.) Using the bootstrap sample, we obtain $\hat g^*_{m,h}(x)$ (where $h$ is the bandwidth chosen from the cross-validation).
Define

$$
\hat\Delta^*_{i,b} = \frac{1}{M} \sum_{m=1}^{M} \hat g^*_{i,m,h}(f(X^*_{m,b})) - \bar Y^*_{i,b}, \tag{116}
$$

where $\bar Y^*_{i,b}$ is the sample mean of $\{Y^*_{i,m,b}\}_{m=1}^{M}$, and $Y^*_{i,m,b}$ denotes the $i$-th component of $Y^*_{m,b}$. The bootstrap standard error for $\hat\Delta_i$, the $i$-th component of $\hat\Delta$, is given as follows:

$$
\sqrt{ \frac{1}{B} \sum_{b=1}^{B} \left( \hat\Delta^*_{i,b} - \frac{1}{B} \sum_{b'=1}^{B} \hat\Delta^*_{i,b'} \right)^2 }. \tag{117}
$$

C.2. Details on the Empirical Application
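As a rough illustration of Section C.1, the following Python sketch combines the leave-one-out kernel estimator in (113), the plug-in estimator in (114), the cross-validation criterion in (115) and the bootstrap standard error in (117). It is a sketch under stated assumptions rather than the implementation behind the reported results: the function names, the argument layout (discrete covariates `Xd`, continuous covariates `Xc`, a policy map `f` returning counterfactual covariates), the squared-error form of the cross-validation criterion, the re-estimation of the kernel estimator on each bootstrap sample, and the use of `nanmean` to skip evaluation points with no neighbours are all hypothetical choices made for illustration. The quartic kernel follows the choice stated for the continuous covariates in the empirical application.

```python
import numpy as np

def quartic(u):
    """Quartic kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.9375 * (1.0 - u**2) ** 2, 0.0)

def g_hat(xd, xc, Y, Xd, Xc, h):
    """Leave-one-out estimate as in (113): row m is evaluated at (xd[m], xc[m])
    using every observation k != m. Rows with no usable neighbour are NaN."""
    same_cell = (xd[:, None, :] == Xd[None, :, :]).all(axis=2)   # 1{X_{k,1} = x_1}
    u = (Xc[None, :, :] - xc[:, None, :]) / h
    w = same_cell * quartic(u).prod(axis=2) / h ** Xc.shape[1]   # times K_h(X_{k,2} - x_2)
    np.fill_diagonal(w, 0.0)                                     # drop k = m
    with np.errstate(invalid="ignore", divide="ignore"):
        return (w @ Y) / w.sum(axis=1, keepdims=True)

def cv_bandwidths(Y, Xd, Xc, grid):
    """One bandwidth per player: the grid point minimising CV_i(h) as in (115)."""
    scores = np.stack([
        np.nanmean((Y - g_hat(Xd, Xc, Y, Xd, Xc, h)) ** 2, axis=0) for h in grid
    ])
    return np.asarray(grid)[scores.argmin(axis=0)]

def delta_hat(Y, Xd, Xc, f, hs):
    """Plug-in estimate as in (114), computed player by player with bandwidth hs[i]."""
    Xd_cf, Xc_cf = f(Xd, Xc)                 # counterfactual covariates f(X_m)
    out = np.empty(Y.shape[1])
    for i, h in enumerate(hs):
        g = g_hat(Xd_cf, Xc_cf, Y[:, [i]], Xd, Xc, h)
        out[i] = np.nanmean(g) - Y[:, i].mean()
    return out

def bootstrap_se(Y, Xd, Xc, f, hs, B=999, seed=0):
    """Bootstrap standard errors as in (117); bandwidths are held fixed at hs."""
    rng = np.random.default_rng(seed)
    M = Y.shape[0]
    draws = np.empty((B, Y.shape[1]))
    for b in range(B):
        idx = rng.integers(0, M, size=M)     # resample markets with replacement
        draws[b] = delta_hat(Y[idx], Xd[idx], Xc[idx], f, hs)
    return draws.std(axis=0)                 # divides by B, matching (117)
```

The sketch builds an M-by-M weight matrix, which is convenient for a few thousand markets but would need to be blocked for much larger samples. Duplicated observations in a bootstrap sample are not removed by the leave-one-out step, which only drops the evaluation point's own row.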

In this empirical application, each game $m$ corresponds to a market $m$ defined by a route (airport ‘A’ to airport ‘B’). Regarding the additional data collection for the out-of-sample comparison, we used the same data sources to collect 2015 data as the authors did for their original (pre-2009) dataset. In particular, we collected all 2015 quarterly data from the DB1B Market and Ticket Origin and Destination Dataset of the U.S. Department of Transportation and then aggregated them to the yearly level. We also followed the same data cleaning choices: most notably, we set entry equal to 0 for markets with fewer than 20 passengers, and dropped any markets with 6 or more market coupons (see Section S.1 in the Supplemental Note of Ciliberto and Tamer (2009)). Then, we merged the 2015 data to the same 2742 markets considered in their work.

The choice of covariates in our application follows Ciliberto and Tamer (2009), pages 1808-1811. Recall that, for firm $i$, their covariates $X_{i,m}$ are given by (1) $i$'s market presence in $m$ (the share of markets served by airline $i$ among all of those starting from $m$'s endpoints), (2) $m$'s “cost” (approximated by the difference between the origin/destination locations to the closest hub of firm $i$ relative to the nonstop distance of this route), (3) whether market $m$ is affected by the Wright Amendment, (4) Dallas market (whether this route includes an airport in Dallas), (5) market size (the geometric mean of population size at the route's endpoints), (6) average per capita income (average income of the cities at market endpoints), (7) income growth rate (similarly defined), (8) market distance (nonstop distance between the airports defining this market), (9) distance to the center of the U.S. (sum of the distances from each endpoint of the route to the geographical center of the U.S.), and (10) close airport (minimum of the distances from each airport to the closest alternative a passenger could use).
Covariates (1)-(2) vary at the firm-market level, while covariates (3)-(10) vary only at the market level.

Our interest is in changing the distribution of the Wright Amendment variable, keeping the others constant. The role of the remaining variables is to control for a variety of confounders (costs, returns from serving those markets, outside options to consumers, geographical characteristics).

Since the conditional expectation in (112) must include all $X_{j,m}$ for every $j$, we are left with possibly 8 (market-specific) plus $2 \times 4$ (firm-level) covariates to be included in the analysis. This is a large number for nonparametric estimation, although our linear results are robust to using the full set. As described in the main text, we perform a dimension reduction to facilitate nonparametric estimation. First, we note that 3 of those 16 variables (the binary Dallas market, market size, and income growth rate, numbered (4), (5) and (7) above) were highly collinear or did not provide enough variation for nonparametric estimation. We dropped them from our analysis. To see this, first note that every market affected by the Wright Amendment was already in Dallas. There are only 110 observations (out of 2742) with different values for the Dallas market variable and the Wright Amendment variable. However, these 110 observations refer to markets using Dallas Fort Worth Airport, the hub for American Airlines. As a result, this variation is already captured by measures of market income and market presence. Regarding market size and income growth rate, they appear to be nonlinearly predicted from market income and market presence. Intuitively, the latter variables already (nonlinearly) reflect economic prosperity of those markets. Finally, we make use of a causal structure argument: we do not include the proxy for a firm's cost (covariate (2) above) because it is a function of market distance and market presence by construction.
As a result, we use the remaining 9 covariates in the specifications.

Let us now describe the implementation of our estimators. Denote the set of markets affected by the Wright Amendment in the original dataset by $\mathcal{M}$, which has cardinality $m' = 81$. Our counterfactual covariates are denoted by $\tilde X_{i,m}$. They are equal to $X_{i,m}$ for all firms and markets, except for the Wright Amendment variable, which is set to 0 for all markets $m \in \mathcal{M}$ and for all firms.

For the linear estimator, we simply parametrize $E[Y_{i,m} \mid X_m = x] = x'\gamma_i$ and estimate this model on the full dataset using Ordinary Least Squares. We then use the recovered estimate $\hat\gamma_i$ with the counterfactual covariates, $\tilde X_{i,m}$. Our linear estimator of $\Delta_i$ for firm $i$ is then given by:

$$
\hat\Delta_i = \frac{1}{m'} \sum_{m \in \mathcal{M}} \tilde X_{i,m}' \hat\gamma_i - \bar Y_i, \tag{118}
$$

where $\bar Y_i$ is the average of the $Y_{i,m}$'s over the $m'$ markets in $\mathcal{M}$.

For the nonparametric case, we follow the previous section in estimating $E[Y_{i,m} \mid X_m = x] = g(x)$ and use a quartic kernel for continuous covariates. Standard errors for both estimators are computed by the bootstrap approach outlined in the previous section.

Appendix D. Details on the Control Function Approach

Suppose that $X$ and $\varepsilon$ are permitted to be correlated, but that there is a variable $V$ called a control function. As we saw in Section 4.1, we have

$$
\int \rho_G(A \mid f(w))\, d\mu_W(w) = \int\!\!\int P\{ Y \in A \mid (X, V) = (\tilde f(x), v) \}\, d\mu_{X,V}(x, v). \tag{119}
$$

Footnote (to Section C.2, on the three dropped variables): Including one of these three variables generated numerical instability in nonparametric estimation.

To obtain the control function $V$ from data, we can follow Imbens and Newey (2009) and introduce an additional causal structure on the payoff state vector which relates $X$ to an instrumental variable $Z$ as follows:

(a) $X$ is generated from the following reduced form $X = h(Z, \eta)$, where $Z$ is observed by the econometrician, $h$ is strictly monotonic in $\eta$, and $\eta$ is a scalar variable.

(b) $Z$ is independent of $(\varepsilon, \eta)$.

This additional structure does not conflict with the generic specification of a Bayesian game, because assumptions (a) and (b) only introduce further structure on the payoff state vector $W$. Theorem 1 in Imbens and Newey (2009) shows that we can use

$$
V = F_{X \mid Z}(X \mid Z)
$$

as a control function, where $F_{X \mid Z}$ is the conditional CDF of $X$ given $Z$. Below we provide an applied example of the control function approach from consumer demand.

Example D.1 (Consumer Demand). Consider the following workhorse model of consumer demand with strategic competition among firms, with notation adapted from Petrin and Train (2010). Household $i$ has preferences over firm $j$'s product given by:

$$
U_{i,j} = V(p_j, x_{i,j}, \beta_i) + \xi_j + \varepsilon_{i,j}, \tag{120}
$$

where $x_{i,j}$ are household-product characteristics, $\beta_i$ are parameters, $p_j$ is the equilibrium price for $j$, and $\xi_j, \varepsilon_{i,j}$ are (unobserved) demand shocks, with $\xi_j$ correlated with $p_j$.

On the supply side, there are $J \geq 2$ firms who compete in prices, each one providing a heterogeneous product.
The Nash equilibrium price of firm $j$ is given by the function $p_j = W(z_j, \gamma, \nu_j)$, where $\gamma$ are parameters to be estimated, $W(\cdot)$ is strictly monotonic in the (unobserved) scalar random variable $\nu_j$, and $z_j$ is independent of $(\xi_j, \nu_j)$ and observed by the econometrician. This set-up nests the cases of monopoly pricing and marginal cost pricing, as well as equations that are additively separable in $\nu_j$ (see Petrin and Train (2010), p.6).

Let $Y_{i,j} = 1\{U_{i,j} > \max_{k \neq j} U_{i,k}\}$ denote $i$'s choice of consuming product $j$. Counterfactuals in this setting include the effect of changing a product or a household characteristic to $\tilde f(x_{i,j})$ (e.g. improving a car's horsepower, or increasing a household's income) on the average $Y_{i,j}$ or on market shares. Using $z_j$ as an instrumental variable, the model is identified and can be estimated using the generalized control function approach in Berry and Haile (2014). Alternatively, one could use a parametric specification and derive a control function, as done by Petrin and Train (2010).

Appendix E. Derivations for Expressions in Example 3.1

E.1. Characterization of the Threshold $\bar w$

The BCE studied in Example 3.1 was defined by a mediator that suggested “Entry” to firm $i$ if $W_i > \bar w$ and “No Entry” if $W_i \leq \bar w$, where the (symmetric) threshold $\bar w$ was such that firm $i$ is indifferent between entering and not entering the market. This is the same BCE studied in Magnolfi and Roncoroni (2019). However, we cannot use the same threshold $\bar w$ from their example because we assume that $W_i$ follows a standard normal distribution instead of a uniform distribution. To make sure this equilibrium is well defined, we now show that $\bar w$ in our setting exists and that it is unique.

The thresholds $\{\bar w_i\}_{i=1,2}$ that equalize the expected utility from entering the market to that from not entering for firm $i$ are the solutions to:

$$
\bar w_1 + \delta(1 - \Phi(\bar w_2)) = 0, \quad \text{and} \quad \bar w_2 + \delta(1 - \Phi(\bar w_1)) = 0,
$$

where $\Phi$ is the CDF of $N(0,1)$. As we seek a symmetric equilibrium, let us guess a symmetric threshold $\bar w_1 = \bar w_2 = \bar w$ such that $\bar w + \delta(1 - \Phi(\bar w)) = 0$. The function $h(w) = w + \delta(1 - \Phi(w))$ is continuous in $w$, negative for $w < 0$ (since $\delta < 0$) and positive for $w > -\delta$. A solution $\bar w$ such that $h(\bar w) = 0$ exists by the Intermediate Value Theorem. This threshold is unique because $h(\cdot)$ is strictly increasing, since $h'(w) = 1 - \delta\phi(w) > 0$, as $\delta < 0$ and $\phi(\cdot) > 0$, where $\phi$ is the density function of $N(0,1)$.

Footnote (to Example D.1, on the instrument $z_j$): Many such instruments have been proposed in the literature, including the prices of non-$j$ products, the characteristics of non-$j$ products and the characteristics of nearby markets. See Berry and Haile (2014) for further discussions.

Footnote (to Example D.1, on Petrin and Train (2010)): They assume that $(\xi_j, \nu_j)$ are jointly normally distributed and i.i.d. over $j$. In this case, one can derive a linear control function from $E(\xi_j \mid \nu_j) = \lambda\nu_j$, where $\lambda$ is a parameter to be estimated.

Footnote (on the normality of $W_i$): We use a normal distribution assumption because it allows for our counterfactual $f(W_i) = \alpha + W_i$ to have the same support as $W_i$, and to also follow a normal distribution.

E.2. Proof that the BCE in Example 3.1 is Not Incentive Compatible in the Counterfactual Game
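Before turning to the argument, the threshold of Section E.1 and the deviation considered in this section can be checked numerically. The Python sketch below is illustrative only: the function names and the bisection routine are assumptions introduced here and are not part of the paper. It solves $h(\bar w) = 0$ on the bracket $[0, -\delta]$, where $h(0) = \delta/2 < 0$ and $h(-\delta) = -\delta\Phi(-\delta) > 0$ whenever $\delta < 0$, and then evaluates the upper bound on $\varepsilon$ and the payoff from entering at a point of the event $\{\bar w < W_i + \alpha < \bar w + \varepsilon\}$.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def h(w, delta):
    """h(w) = w + delta * (1 - Phi(w)); the symmetric threshold is its root."""
    return w + delta * (1.0 - Phi(w))

def threshold(delta, tol=1e-12):
    """Bisection for the unique root of h on [0, -delta], assuming delta < 0."""
    if delta >= 0:
        raise ValueError("the argument in the text assumes delta < 0")
    lo, hi = 0.0, -delta              # h(lo) < 0 < h(hi)
    for _ in range(200):              # hard cap on the number of halvings
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if h(mid, delta) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def epsilon_bound(delta, alpha):
    """Upper bound on epsilon: -delta * (1 - Phi(w_bar - alpha)) - w_bar."""
    w_bar = threshold(delta)
    return -delta * (1.0 - Phi(w_bar - alpha)) - w_bar

def entry_payoff(x, delta, alpha):
    """Payoff from entering when W_i + alpha = x and the rival enters
    if and only if W_{-i} + alpha > w_bar."""
    w_bar = threshold(delta)
    return x + delta * (1.0 - Phi(w_bar - alpha))
```

For instance, with the hypothetical values `delta = -1.0` and `alpha = 0.5`, `threshold(-1.0)` lies between 0.3 and 0.4, `epsilon_bound(-1.0, 0.5)` is strictly positive, and `entry_payoff` is negative at the midpoint of the interval $(\bar w, \bar w + \varepsilon)$, so a firm recommended to enter there would rather stay out.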

Consider the BCE recommendation of the original game whereby $i$ enters if and only if $W_i + \alpha > \bar w$. The expected utility for $i$ in the counterfactual game from following this strategy, conditional on the competitor also doing so, is given by:

$$
\begin{aligned}
&1\{W_i + \alpha > \bar w\} \left( \delta\mu_W(\{W_{-i} + \alpha > \bar w\}) + W_i + \alpha \right) \\
&\qquad = 1\{W_i + \alpha > \bar w\} \left( \delta(1 - \Phi(\bar w - \alpha)) + W_i + \alpha \right).
\end{aligned}
$$

We now show that there exists a profitable deviation from this recommendation for $i$ in the counterfactual game. Consider the following possible deviation from the recommendation for firm $i$: “Enter” if $W_i + \alpha > \bar w + \varepsilon$, where $\varepsilon$ is a constant such that $0 < \varepsilon < -\delta(1 - \Phi(\bar w - \alpha)) - \bar w$. We note that such an $\varepsilon$ exists because:

$$
0 = \bar w + \delta(1 - \Phi(\bar w)) > \bar w + \delta(1 - \Phi(\bar w - \alpha)),
$$

where the inequality comes from $\tilde h(w, a) = w + \delta(1 - \Phi(\bar w - a))$ being strictly decreasing in the second argument, and $\alpha > 0$.

The difference in expected utilities from deviating relative to following the recommendation is given by:

$$
\left( 1\{W_i + \alpha > \bar w + \varepsilon\} - 1\{W_i + \alpha > \bar w\} \right) \left( \delta(1 - \Phi(\bar w - \alpha)) + W_i + \alpha \right). \tag{121}
$$

This difference is equal to 0 except for the event $\{\bar w < W_i + \alpha < \bar w + \varepsilon\}$, in which case the first factor of (121) is equal to $-1$. Let us focus on this subset. Then, deviating to the alternative strategy is a profitable deviation as long as $\delta(1 - \Phi(\bar w - \alpha)) + W_i + \alpha < 0$. This is true if and only if

$$
W_i + \alpha < -\delta(1 - \Phi(\bar w - \alpha)). \tag{122}
$$

However, by the assumption on $\varepsilon$ above, we have that

$$
\bar w + \varepsilon < \bar w - \delta(1 - \Phi(\bar w - \alpha)) - \bar w = -\delta(1 - \Phi(\bar w - \alpha)). \tag{123}
$$

But since we are conditioning on the subset where $\bar w < W_i + \alpha < \bar w + \varepsilon$, using equation (123) we find that

$$
W_i + \alpha < \bar w + \varepsilon < -\delta(1 - \Phi(\bar w - \alpha)). \tag{124}
$$

This is exactly the expression required for there to be a profitable deviation for player $i$; see equation (122). It follows that the decomposition-based prediction in (28) based on this strategy cannot be a BCE in the counterfactual game.

References

BERRY, S. T., AND P. A. HAILE (2014): “Identification in differentiated products markets using market level data,” Econometrica, 82(5), 1749–1797.

BILLINGSLEY, P. (1999): Convergence of Probability Measures. John Wiley & Sons, Inc, New York.

CHANG, J. T., AND D. POLLARD (1997): “Conditioning as Disintegration,” Statistica Neerlandica, 51, 287–317.

CILIBERTO, F., AND E. TAMER (2009): “Market Structure and Multiple Equilibria in Airline Markets,” Econometrica, 77, 1791–1828.

DALEY, D. J., AND D. VERE-JONES (2003): An Introduction to the Theory of Point Processes. Springer, New York.

HÄUSLER, E., AND H. LUSCHGY (2015): Stable Convergence and Stable Limit Theorems. Springer, New York.

IMBENS, G. W., AND W. K. NEWEY (2009): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,” Econometrica, 77, 1481–1512.

KALLENBERG, O. (1997): Foundations of Modern Probability. Springer-Verlag.

MAGNOLFI, L., AND C. RONCORONI (2019): “Estimation of Discrete Games with Weak Assumptions on Information,” Working Paper.

PETRIN, A., AND K. TRAIN (2010): “A control function approach to endogeneity in consumer choice models,”