A Pairwise Strategic Network Formation Model with Group Heterogeneity: With an Application to International Travel
Tadao Hoshino ∗ February 5, 2021
Abstract
In this study, we consider a pairwise network formation model in which each dyad of agents strategically determines the link status between them. Our model allows the agents to have unobserved group heterogeneity in the propensity of link formation. For the model estimation, we propose a three-step maximum likelihood (ML) method. First, we obtain consistent estimates for the heterogeneity parameters at the individual level using the ML estimator. Second, we estimate the latent group structure using the binary segmentation algorithm based on the results obtained from the first step. Finally, based on the estimated group membership, we re-execute the ML estimation. Under certain regularity conditions, we show that the proposed estimator is asymptotically unbiased and distributed as normal at the parametric rate. As an empirical illustration, we focus on the network data of international visa-free travel. The results indicate the presence of significant strategic complementarity and a certain level of degree heterogeneity in the network formation behavior.
Keywords: binary game; binary segmentation; degree heterogeneity; latent group structure; network formation.

∗ School of Political Science and Economics, Waseda University, 1-6-1 Nishi-waseda, Shinjuku-ku, Tokyo 169-8050, Japan. Email: [email protected]. This work is supported financially by JSPS Grant-in-Aid for Scientific Research C-20K01597.

Introduction
Empirical modeling of network formation is an important research topic that has been studied for several decades. While most of these models have been developed in the mathematical statistics literature, the importance of network structure in many economic activities has been increasingly recognized, and a growing number of econometric studies now focus on network formation, in conjunction with significant advances in the related econometric techniques.

Econometric studies on network formation can be classified into two types: those that explicitly incorporate the interaction of individuals, in which the realized network structure endogenously affects the network formation behavior (e.g., Leung, 2015; Mele, 2017; Sheng, 2020), and those that do not account for such simultaneous interactions but emphasize modeling a flexible form of individual heterogeneity (e.g., Graham, 2017; Jochmans, 2018; Dzemski, 2019). For the former type, network formation is modeled as a game in which agents strategically form links to maximize their payoffs. Although this game-theoretic approach is well grounded in economic theory, we often encounter serious analytical difficulties due to the presence of multiple equilibria. To circumvent these difficulties, we typically need to introduce some ad hoc behavioral assumptions into the network formation process, or we simply give up point-identifying the models and resort to partial identification. Compared with the former, the latter is more “descriptive” than “structural”, but has great flexibility in the model specification. These models are relatively easy to implement and, thus, are appealing to empirical researchers. However, they are not suitable for analyzing the interactions of agents in network formation, which should be an essential factor in economic and social network data.

These two types of econometric models have their own advantages.
Hence, it is an ingenuous idea to construct a new model that has the advantages of both approaches by combining them. However, to the best of my knowledge, only a few papers pursue this kind of model extension (e.g., Graham, 2016; Graham and Pelican, 2020; Pelican and Graham, 2020). Graham (2016) gets around the multiple equilibria issue by considering the “dynamic” (rather than the instantaneous) interdependencies in network formation. The latter two papers consider a quite general framework that incorporates both a general form of strategic interaction and unobserved degree heterogeneity into a single model. However, they mainly focus on testing the presence of interactions, but not on the estimation of the models.

In this paper, we propose a new “pairwise” network formation model that is empirically tractable while retaining the nice properties of the above-mentioned approaches. More specifically, we assume that each link connection is determined by the strategic interaction solely between the corresponding dyad of agents, without affecting or being affected by other dyads, rather than regarding the realized network as a consequence of a large $n$-player game. Although ignoring such network externalities limits the range of applications of our model, it would still cover a fairly large number of interesting empirical situations, and, most importantly,

For recent developments regarding econometric approaches for analyzing network formation, we refer readers to, for example, Chandrasekhar (2016) and de Paula (2020).

et al. (2019), we treat the agent-specific preference heterogeneity parameters as the fixed-effect parameters to be estimated. However, note that if we include these heterogeneity parameters directly into the model, as the dimension of the parameters increases proportionally to the sample size, our ML estimator suffers from the incidental parameter problem.
As will be confirmed in the numerical simulations reported later, the incidental parameter bias can be severe. Thus, to avoid the incidental parameter problem, as a second novel part of this study, we focus on a situation where the individuals can be classified into several unknown groups and their specific effects are homogeneous within each group. If the number of the latent groups is fixed, we can expect that the ML estimator becomes asymptotically unbiased at the parametric rate and hence the standard inference procedure can be applied.

In the literature on statistical network data analysis, uncovering such latent group structures in networks has been intensively studied (e.g., Newman and Girvan, 2004; Bickel and Chen, 2009; Fortunato, 2010; Karrer and Newman, 2011; Rohe et al., 2011; Abbe, 2017). Indeed, putting the strategic interaction effect aside, our network formation model can be regarded as a type of stochastic block model, one of the major modeling approaches in the above literature, with additional covariates, as in Zhang et al. (2019). In panel data analysis as well, identification of unobserved grouped heterogeneity is one of the most active research areas (e.g., Bonhomme and Manresa, 2015; Ke et al., 2016; Su et al., 2016; Wang et al., 2018). Although the application of these grouping methods to econometric network models has been relatively limited thus far, it is a promising approach, as discussed in Bonhomme (2020). If we can recover the true group structure with probability approaching one (w.p.a.1) by using any method, the individual fixed-effect parameters can be estimated at a faster rate than in the case of no grouping. In this study, among several alternative methods, we adopt the binary segmentation (BS) method (see, e.g., Bai, 1997; Ke et al., 2016; Lian et al., 2019; Wang and Su, 2020).
Compared to the other grouping methods, the BS method has several favorable properties, including fast computation and robustness, as it does not require us to set initial values.

The whole estimation procedure is divided into three steps. The first step is to obtain the ML estimator without considering the latent group structures. Although this estimator suffers from the incidental parameter bias, it is still possible to produce consistent estimates for the heterogeneity parameters. In the second step, we apply the BS method to the estimated sender effect parameters and the receiver effect parameters separately to identify each agent’s group memberships. In the third step, we re-estimate the model using the ML method given the estimated group structure. Under certain regularity conditions, we show that the proposed estimator is asymptotically unbiased and normal at the parametric rate. Furthermore, the estimator is asymptotically equivalent to the “oracle” estimator that is obtained based on the (unknown) true group memberships.

To illustrate our model empirically, we investigate the formation of international visa-free travel networks, where the dependent variable of interest is defined as follows: $g_{i,j} = 1$ if country $i$ allows the citizens of country $j$ to visit $i$ without visas and $g_{i,j} = 0$ if not. We apply our model framework to the network of 57 countries selected mainly from Asia, the Middle East, the former USSR, and Oceania. As expected, we find the presence of a certain level of degree heterogeneity in terms of both the sender and the receiver effects. Interestingly, there seems to be a negative correlation between the sender effects and the receiver effects (in other words, a country’s sender effect tends to increase as its receiver effect decreases). Our estimation result also suggests that there is a significant strategic complementarity in the network formation behavior.
Another interesting finding is that the countries are homophilous (that is, they tend to connect with similar others) in terms of the political system. These findings highlight the usefulness of the proposed model and method.

Organization of the paper:
The remainder of the paper is organized as follows. In Section 2, we formally introduce the model investigated in this study. In this section, we demonstrate that our pairwise model exhibits multiple equilibria and discuss the conditions under which the model can be point-identified. Section 3 provides a detailed explanation of our three-step ML estimator. We also investigate the asymptotic properties of the proposed estimator in this section. In Section 4, we present a set of Monte Carlo experiments to evaluate the finite sample performance of the proposed estimator. Section 5 presents our empirical analysis, and, finally, Section 6 concludes. All the technical details are relegated to the Appendix.
Notation:
For a natural number $n$, $I_n$ denotes an $n \times n$ identity matrix. $\mathbf{1}\{\cdot\}$ denotes the indicator function, which is one if its argument is true and zero otherwise. For a matrix $A$, we use $\|A\|$ to denote its Frobenius norm: $\|A\| = \sqrt{\mathrm{tr}\{A A^\top\}}$, where $\mathrm{tr}\{\cdot\}$ is the trace of a matrix. When $A$ is a square matrix, we use $\lambda_{\min}(A)$ to denote its smallest eigenvalue. For a vector $a = (a_1, \ldots, a_k)^\top$, $\|a\|_\infty$ denotes its maximum norm: $\|a\|_\infty = \max_{1 \le i \le k} |a_i|$. For a general set $\mathcal{X}$, we use $\mathcal{X}^{\mathrm{int}}$ to denote its interior. In addition, $|\mathcal{X}|$ denotes the cardinality of $\mathcal{X}$. $c$ (possibly with subscript) denotes a generic positive constant whose exact value may vary per case.

Suppose that we have a sample of $n$ agents that form social networks whose connections are represented by an $n \times n$ adjacency matrix $G_n = (g_{i,j})_{1 \le i,j \le n}$. These agents can be individuals, firms, municipalities, or nations depending on the context. The network is directed; that is, regardless of the value of $g_{j,i}$, we observe $g_{i,j} = 1$ if agent $i$ links to $j$ and $g_{i,j} = 0$ otherwise. There are no self-loops; that is, the diagonal elements of $G_n$ are all zero. Throughout the paper, we assume that the status of $(g_{i,j}, g_{j,i})$ is determined solely by the pair of agents $(i, j)$, without considering the status of other network links. Specifically, for each pair $(i, j)$, suppose that $i$’s marginal payoff of forming a link with $j$ given $g_{j,i} = q$ is written as

$$u_{i,j}(q) = Z_{i,j}^\top \beta_0 + \alpha_0 q + A_{0,i} + B_{0,j} - \epsilon_{i,j}, \quad \text{for } i \neq j.$$

Here, $Z_{i,j} \in \mathbb{R}^{d_z}$ is a vector of observed covariates, $A_{0,i} \in \mathbb{R}$ is agent $i$’s individual specific effect as a “sender”, $B_{0,j} \in \mathbb{R}$ is $j$’s individual specific effect as a “receiver”, $\epsilon_{i,j} \in \mathbb{R}$ is an unobservable payoff component, and $\beta_0 \in \mathbb{R}^{d_z}$ and $\alpha_0 \in \mathbb{R}$ are an unknown coefficient vector and the interaction effect parameter, respectively.
The covariates $Z_{i,j}$ and $Z_{j,i}$ may contain common elements; however, in the later discussion, we require that they have some agent-specific elements that can vary across the partners. The individual specific effects $A_{0,i}$ and $B_{0,j}$, which we call the sender and the receiver effect, respectively, can be interpreted as the level of $i$’s willingness to create connections with others and the popularity of $j$; together they generate degree heterogeneity across the agents. Following Dzemski (2019) and Yan et al. (2019), we treat $\{(A_{0,i}, B_{0,i})\}$ as fixed-effect parameters to be estimated.

We assume that the agents have complete information; that is, the realizations of $(Z_{i,j}, Z_{j,i})$ and $(\epsilon_{i,j}, \epsilon_{j,i})$ are common knowledge to both $i$ and $j$. Then, if we assume that the observed network $G_n$ is formed by a collection of Nash equilibrium actions, we obtain the following econometric model:

$$\begin{aligned} g_{i,j} &= \mathbf{1}\left\{ Z_{i,j}^\top \beta_0 + \alpha_0 g_{j,i} + A_{0,i} + B_{0,j} \ge \epsilon_{i,j} \right\} \\ g_{j,i} &= \mathbf{1}\left\{ Z_{j,i}^\top \beta_0 + \alpha_0 g_{i,j} + A_{0,j} + B_{0,i} \ge \epsilon_{j,i} \right\} \end{aligned}, \quad \text{for } i \neq j. \tag{2.1}$$

The following are two examples to which the above framework can potentially be applied.

Example 2.1 (Online social networking). The analysis of online social networking behavior is an active research topic in network science. For some social networking sites, users can easily establish links to others (become a follower) without mutual consent. In addition, whether a person becomes a follower of someone is often irrelevant to who else they are following. Thus, this would be a situation where our framework reasonably fits.
Example 2.2 (International visa-free network). In the research on international migration and tourism, investigating the determinants and impacts of visa policies is one of the central interests (e.g., Neiman and Swagel, 2009; Neumayer, 2010; McKay and Tekleselassie, 2018). As bilateral visa policies are naturally observed as a consequence of strategic (economic and/or political) interactions between the two countries, our model would be an appropriate analytical tool here. In our empirical study presented in Section 5, by setting $g_{i,j} = 1$ if country $i$ allows visa-free entry for the citizens of country $j$ and $g_{i,j} = 0$ if not, we will show that the magnitude of the bilateral interaction in visa policies is significant.

In these examples, we can naturally imagine that the strategic interaction effect $\alpha_0$ is positive (i.e., strategic complements). We consider strategic complementarity to be reasonable for most empirical situations of network formation games. Thus, throughout the paper, we impose this assumption: $\alpha_0 > 0$. Under strategic complementarity, each pair’s Nash equilibrium action can be summarized in Figure 2.1. As shown in the figure, the space of $(\epsilon_{i,j}, \epsilon_{j,i})$ cannot be partitioned into non-overlapping regions associated with the four alternative realizations of $(g_{i,j}, g_{j,i})$. That is, both $(g_{i,j}, g_{j,i}) = (1, 1)$ and $(g_{i,j}, g_{j,i}) = (0, 0)$ can occur in the shaded area in the figure, and the link status is not uniquely determined in this area (i.e., multiple equilibria exist). This non-uniqueness of model-consistent decisions is called incompleteness and has been extensively studied in the literature on simultaneous equation models for discrete outcomes (e.g., Tamer, 2003; Lewbel, 2007; Ciliberto and Tamer, 2009; Chesher and Rosen, 2020).

Figure 2.1: Pure strategy Nash equilibrium

There are several approaches to handle this incompleteness issue in the literature.
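The multiple-equilibria region can be checked by brute force for a single dyad. The sketch below enumerates the four candidate outcomes and keeps those in which each agent's action is a best response under model (2.1). The function name `dyad_equilibria` and the scalar inputs `v_ij`, `v_ji` (standing for the payoff index without the interaction term) are illustrative assumptions, not part of the paper.

```python
from itertools import product


def dyad_equilibria(v_ij, v_ji, alpha, eps_ij, eps_ji):
    """Return all pure-strategy Nash equilibria (g_ij, g_ji) of one dyad.

    v_ij is a hypothetical shorthand for Z_ij' beta + A_i + B_j, and
    alpha is the strategic interaction effect.
    """
    equilibria = []
    for g_ij, g_ji in product((0, 1), repeat=2):
        # An agent links if and only if the marginal payoff is non-negative,
        # taking the partner's action as given.
        best_ij = int(v_ij + alpha * g_ji - eps_ij >= 0)
        best_ji = int(v_ji + alpha * g_ij - eps_ji >= 0)
        if (g_ij, g_ji) == (best_ij, best_ji):
            equilibria.append((g_ij, g_ji))
    return equilibria


# Both shocks fall between v and v + alpha: two equilibria coexist.
print(dyad_equilibria(0.0, 0.0, 1.0, 0.5, 0.5))   # [(0, 0), (1, 1)]
# Low shock for i, high shock for j: a unique one-way link.
print(dyad_equilibria(0.0, 0.0, 1.0, -0.5, 1.5))  # [(1, 0)]
```

With a positive interaction effect, the one-way outcomes (1, 0) and (0, 1) never coexist with another equilibrium in this enumeration, which is why they can be assigned well-defined probabilities.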
Among them, this study adopts the traditional approach developed by Bresnahan and Reiss (1990) and Berry (1992) that focuses only on the unique equilibrium outcomes. That is, we consider estimating the model based only on the information about “one-way links” in the network.

Refer to de Paula (2013) for a comprehensive survey on this topic.

In the following, we assume that $\{\epsilon_{i,j}\}$ are identically distributed with a known cumulative distribution function (CDF) $F$. Further, we assume that the pairs $\{(\epsilon_{i,j}, \epsilon_{j,i})\}$ are independent and identically distributed (i.i.d.) across pairs, and their joint distribution is represented by $H(\cdot, \cdot; \rho_0)$ such that $\Pr(\epsilon_{i,j} \le a_1, \epsilon_{j,i} \le a_2) = H(a_1, a_2; \rho_0)$, where $\rho_0 \in \mathbb{R}$ is a parameter controlling the correlation between $\epsilon_{i,j}$ and $\epsilon_{j,i}$. Define $y^{(1,0)}_{i,j} \equiv \mathbf{1}\{(g_{i,j}, g_{j,i}) = (1, 0)\}$ and $y^{(0,1)}_{i,j} \equiv \mathbf{1}\{(g_{i,j}, g_{j,i}) = (0, 1)\}$. Let $\chi_{n,a}$ be the $a$-th column of $I_n$, $A_0 = (A_{0,1}, \ldots, A_{0,n})^\top$, and $B_0 = (B_{0,1}, \ldots, B_{0,n})^\top$, so that we can write

$$Z_{i,j}^\top \beta_0 + A_{0,i} + B_{0,j} = W_{i,j}^\top \Pi_0,$$

where $W_{i,j} = (Z_{i,j}^\top, \chi_{n,i}^\top, \chi_{n,j}^\top)^\top$ and $\Pi_0 = (\beta_0^\top, A_0^\top, B_0^\top)^\top$. In addition, we denote $\theta_0 = (\beta_0^\top, \alpha_0, \rho_0)^\top$ and $\gamma_0 = (A_0^\top, B_0^\top)^\top$. Then, the conditional probabilities of $\{y^{(1,0)}_{i,j} = 1\}$ and $\{y^{(0,1)}_{i,j} = 1\}$ are respectively given as follows:

$$\begin{aligned} P^{(1,0)}_{i,j}(\theta_0, \gamma_0) &\equiv F(W_{i,j}^\top \Pi_0) - H(W_{i,j}^\top \Pi_0, W_{j,i}^\top \Pi_0 + \alpha_0; \rho_0), \\ P^{(0,1)}_{i,j}(\theta_0, \gamma_0) &\equiv F(W_{j,i}^\top \Pi_0) - H(W_{i,j}^\top \Pi_0 + \alpha_0, W_{j,i}^\top \Pi_0; \rho_0). \end{aligned}$$

Here, note that the equalities $y^{(1,0)}_{i,j} = y^{(0,1)}_{j,i}$ and $P^{(1,0)}_{i,j}(\theta, \gamma) = P^{(0,1)}_{j,i}(\theta, \gamma)$ hold.
Thus, the likelihood function can be expressed solely in terms of $(y^{(1,0)}_{i,j}, P^{(1,0)}_{i,j}(\theta, \gamma))$; hereinafter, we omit the superscripts and denote $y_{i,j} = y^{(1,0)}_{i,j}$ and $P_{i,j}(\theta, \gamma) = P^{(1,0)}_{i,j}(\theta, \gamma)$ when there is no confusion. Then, the log-likelihood function can be written as

$$\begin{aligned} \mathcal{L}_n(\theta, \gamma) &= \frac{2}{N} \sum_{i=1}^{n} \sum_{j > i} \left[ y_{i,j} \ln P_{i,j}(\theta, \gamma) + y_{j,i} \ln P_{j,i}(\theta, \gamma) + (1 - y_{i,j} - y_{j,i}) \ln(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)) \right] \\ &= \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \left[ y_{i,j} \ln P_{i,j}(\theta, \gamma) + y_{j,i} \ln P_{j,i}(\theta, \gamma) + (1 - y_{i,j} - y_{j,i}) \ln(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)) \right] \\ &= \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \left[ 2 y_{i,j} \ln P_{i,j}(\theta, \gamma) + (1 - y_{i,j} - y_{j,i}) \ln(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)) \right], \end{aligned} \tag{2.2}$$

where $N \equiv n(n - 1)$. As above, we can consider three equivalent representations for the log-likelihood function and switch between them according to analytical convenience.

As mentioned in the Introduction, this study considers situations where the agents are grouped into several sub-samples, and the individual fixed effects are heterogeneous across these groups but are homogeneous within the groups in the following manner:

$$A_{0,i} = \sum_{k=1}^{K_A} a_{0,k} \cdot \mathbf{1}\{i \in \mathcal{C}^A_{0,k}\}, \qquad B_{0,i} = \sum_{k=1}^{K_B} b_{0,k} \cdot \mathbf{1}\{i \in \mathcal{C}^B_{0,k}\}. \tag{2.3}$$

That is, the agents can be classified into $K_A$ groups $\mathcal{C}^A_0 \equiv \{\mathcal{C}^A_{0,1}, \ldots, \mathcal{C}^A_{0,K_A}\}$ in terms of the sender effects $\{A_{0,i}\}$, where $K_A$ is the total number of groups, which form a partition of $\{1, \ldots, n\}$ into $K_A$ subsets. Similarly, in terms of the receiver effects $\{B_{0,i}\}$, the agents can be grouped as $\mathcal{C}^B_0 \equiv \{\mathcal{C}^B_{0,1}, \ldots, \mathcal{C}^B_{0,K_B}\}$. When an individual is a member of the intersection $\mathcal{C}^A_{0,k} \cap \mathcal{C}^B_{0,l}$, his/her sender and receiver effects are equal to $a_{0,k}$ and $b_{0,l}$, respectively. This intersection set would correspond to one “community” in the community detection literature. The group to which each individual belongs is unknown to us.
Meanwhile, it is often assumed in the literature that the number of groups is known to researchers (e.g., Bonhomme and Manresa, 2015; Okui and Wang, 2020). Following these studies, we treat $K_A$ and $K_B$ as known values and assume that $K_A, K_B \ge 2$. We discuss how to choose $K_A$ and $K_B$ in practice in Section 5. Note that transforming $(A_0, B_0)$ to $(A_0 + c, B_0 - c)$ for any constant $c$ does not change the model (2.1). Thus, without loss of generality, we assume that $a_{0,1} = 0$ for location normalization.

Under this setup, the full ML estimator solves

$$\max_{(\theta, a, b, \mathcal{C}^A, \mathcal{C}^B)} \mathcal{L}_n(\theta, A, B), \tag{2.4}$$

where $a \equiv (a_1, \ldots, a_{K_A})$ with $a_1 = 0$, $b \equiv (b_1, \ldots, b_{K_B})$, $\mathcal{C}^A \equiv \{\mathcal{C}^A_1, \ldots, \mathcal{C}^A_{K_A}\}$, $\mathcal{C}^B \equiv \{\mathcal{C}^B_1, \ldots, \mathcal{C}^B_{K_B}\}$, $A_i = \sum_{k=1}^{K_A} a_k \cdot \mathbf{1}\{i \in \mathcal{C}^A_k\}$, and $B_i = \sum_{k=1}^{K_B} b_k \cdot \mathbf{1}\{i \in \mathcal{C}^B_k\}$. The maximization problem in (2.4) is clearly a combinatorial (NP-hard) optimization problem. In the context of panel data models, several authors have proposed iterative k-means (like) algorithms to efficiently compute a (local) solution to problems similar to (2.4) (e.g., Bonhomme and Manresa, 2015; Liu et al., 2020). However, the iterative algorithm is still computationally demanding. More importantly, it cannot be directly applied to our network model, where each agent’s heterogeneity parameters affect not only the value of his/her own likelihood function but also that of the others.

Hence, in this paper, we propose to decompose the maximization problem in (2.4) into three steps. The first step is to estimate $\gamma_0 = (A_0^\top, B_0^\top)^\top$ using the full ML estimator based on the log-likelihood function in (2.2) without explicitly considering the group structure. Given the consistent estimates of these parameters, the second step is to estimate the group memberships $\mathcal{C}^A_0$ and $\mathcal{C}^B_0$ using the BS algorithm (e.g., Bai, 1997; Ke et al., 2016; Lian et al., 2019; Wang and Su, 2020).
The final step is to solve (2.4) with the group structure replaced by the estimated $\mathcal{C}^A$ and $\mathcal{C}^B$.

Before presenting the estimation procedure in detail, we discuss identification conditions for the parameters in model (2.1). It is important to note that, even when the individual heterogeneity parameters take only finitely many distinct values across groups, each individual’s parameters must be point-identified separately to estimate the group structure consistently. A practical reason for this is that our three-step estimator based on the BS algorithm requires preliminary consistent estimates of $A_0$ and $B_0$ in the estimation of the group structure. Note that we can always estimate “pseudo-true” group memberships based on the maximum likelihood principle even when some elements of $A_0$ and $B_0$ are not point-identified. However, they may not necessarily coincide with the true group memberships in general.

The assumption that the number of groups is known is debatable. Rather than directly specifying the number of groups, in the literature on the BS algorithm, for example, Ke et al. (2016) and Lian et al. (2019) propose introducing an additional threshold parameter to detect the group structure. However, since the number of groups can vary only discretely with the threshold value, introducing the threshold parameter is essentially equivalent to selecting the number of groups. For the use of the BS algorithm, Wang and Su (2020) formally show that a Bayesian Information Criterion (BIC)-type criterion can consistently select the correct number of groups as the sample size increases. A more formal investigation of this issue is left for future work.

For a related discussion, see Bonhomme and Manresa (2015).

$A_0$ and $B_0$. In the following, similarly as above, we set $A_{0,1} = 0$ without loss of generality. To facilitate the discussion, we also introduce several simplifying assumptions, some of which are mentioned previously.

Assumption 2.1.
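(i) The payoff disturbances $\{\epsilon_{i,j}\}$ are identically distributed on the whole $\mathbb{R}$ with a known strictly increasing marginal CDF $F(\cdot)$. (ii) The pairs $\{(\epsilon_{i,j}, \epsilon_{j,i})\}$ are i.i.d. across dyads with joint CDF $H(\cdot, \cdot; \rho_0)$, and $H(\cdot, \cdot; \rho)$ is strictly increasing in each argument for all $\rho \in \mathcal{R}$. (iii) $F(\cdot)$ is three times continuously differentiable, and $H(\cdot, \cdot; \cdot)$ is three times continuously differentiable with respect to all arguments.

Let $\Theta \equiv \mathcal{B} \times \mathcal{A} \times \mathcal{R}$, $\mathbb{A}_n \equiv \{0\} \times \mathbb{A}^{n-1}$, $\mathbb{B}_n \equiv \mathbb{B}^n$, and $\mathbb{C}_n \equiv \mathbb{A}_n \times \mathbb{B}_n$, where $\mathcal{B} \subset \mathbb{R}^{d_z}$, $\mathcal{A} \subset \mathbb{R}_{++}$, $\mathcal{R} \subset \mathbb{R}$, $\mathbb{A} \subset \mathbb{R}$, and $\mathbb{B} \subset \mathbb{R}$ are parameter spaces for $\beta$, $\alpha$, $\rho$, $A_i$’s, and $B_i$’s, respectively.

Assumption 2.2. (i) $\theta_0 \in \Theta^{\mathrm{int}}$, where $\Theta$ is compact. (ii) For all $i = 2, 3, \ldots$, $A_{0,i} \in \mathbb{A}^{\mathrm{int}}$, and, for all $i = 1, 2, \ldots$, $B_{0,i} \in \mathbb{B}^{\mathrm{int}}$, where $\mathbb{A}$ and $\mathbb{B}$ are compact.

Assumption 2.3.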
The covariates $\{Z_{i,j}\}$ are uniformly bounded.

In Assumption 2.1(i), we assume that the marginal CDF of the error term is known. This assumption is typically adopted in the estimation of complete information games. As shown by Khan and Nekipelov (2018), when the marginal CDFs of $(\epsilon_{i,j}, \epsilon_{j,i})$ are unknown, it is generally impossible to estimate the interaction effect $\alpha_0$ at the parametric rate. Assumption 2.1(ii) requires that the error terms are independent across dyads. Note that the parameters $(A_{0,i}, B_{0,i})$ have the role of accommodating all unobserved payoff components in link formation behavior specific to $i$. Therefore, assuming the independence within the remainders $\{\epsilon_{i,1}, \ldots, \epsilon_{i,n}\}$ should not be too restrictive. The other requirements in Assumption 2.1 are standard in that they are satisfied in most commonly used parametric models (such as logit and probit). In Assumption 2.2(ii), we assume that the fixed-effect parameters $\{(A_{0,i}, B_{0,i})\}$ are bounded. Although imposing boundedness on the degree heterogeneity parameters is commonly accepted in the literature on network formation models, some studies consider a more general framework where $\|A_0\|_\infty$ and $\|B_0\|_\infty$ can grow slowly (e.g., Yan et al., 2016; Yan et al., 2019). The admissible parameter space for the correlation parameter $\rho$, $\mathcal{R}$, depends on the choice of the functional form of $H$, which is typically $\mathcal{R} = [-1, 1]$. Assumption 2.3 should not be restrictive in practice. Hereinafter, we fix the values of $\{Z_{i,j}\}$; that is, we interpret the following analysis as being conditional on the realization of $\{Z_{i,j}\}$. Thus, any randomness in the model is considered to be due to the randomness of $\{\epsilon_{i,j}\}$.

An important implication from Assumptions 2.1–2.3 is that the one-way link-formation probabilities $\{P_{i,j}(\theta, \gamma)\}$ are uniformly bounded away from 0 and 1 for all possible parameter values.
In other words, we are assuming that our networks are dense such that the number of one-way links per agent will be roughly proportional to the number of sampled agents. The plausibility of this assumption depends on the context of application. For example, international trade networks and the visa-free travel networks given in Example 2.2 and Section 5 may be regarded as dense networks.

Assumption 2.4.
For any $(\beta_1, \alpha_1, \gamma_1), (\beta_2, \alpha_2, \gamma_2) \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n$ such that $(\beta_1, \alpha_1, \gamma_1) \neq (\beta_2, \alpha_2, \gamma_2)$, either or both (a) and (b) hold:

$$\text{(a)} \quad \liminf_{n \to \infty} \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \mathbf{1}\left\{ W_{i,j}^\top (\Pi_1 - \Pi_2) > 0, \; W_{j,i}^\top (\Pi_1 - \Pi_2) + \alpha_1 - \alpha_2 < 0 \right\} > 0$$

$$\text{(b)} \quad \liminf_{n \to \infty} \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \mathbf{1}\left\{ W_{i,j}^\top (\Pi_1 - \Pi_2) < 0, \; W_{j,i}^\top (\Pi_1 - \Pi_2) + \alpha_1 - \alpha_2 > 0 \right\} > 0,$$

where $\Pi_1 = (\beta_1^\top, \gamma_1^\top)^\top$, and $\Pi_2 = (\beta_2^\top, \gamma_2^\top)^\top$.

Assumption 2.4 is our main identification condition, which basically requires the following two conditions. The first condition is a standard full-rank condition for $\{W_{i,j}\}$ and $\{W_{j,i}\}$. The second condition is that at least either $Z_{i,j}$ or $Z_{j,i}$ should contain agent-specific covariates that have large enough supports and also have variations across all potential partners. If no such variables exist, since the signs of $Z_{i,j}^\top (\beta_1 - \beta_2)$ and $Z_{j,i}^\top (\beta_1 - \beta_2)$ cannot differ for some parameter values for all $(i, j)$’s, Assumption 2.4 does not hold. It should be noted that this assumption is not inconsistent with Assumption 2.3 under the compactness of the parameter space (i.e., Assumption 2.2). While the existence of player-specific continuous variables with unbounded supports is typically required in the identification of non/semiparametric game models (the so-called identification-at-infinity argument; see, e.g., Tamer, 2003; Kline, 2015), we can develop our identification result under less stringent conditions owing to the full-parametric model specification.

Theorem 2.3 (Identification). (i) Suppose that Assumptions 2.1(i)–(ii) and 2.2–2.4 hold.
Then, if $\rho_0$ is known, $(\beta_0, \alpha_0, \gamma_0)$ can be point-identified on $\mathcal{B} \times \mathcal{A} \times \mathbb{C}_n$. (ii) If $\rho_0$ is unknown, but it is a unique maximizer of $\mathcal{L}^*_n(\rho)$, then $(\theta_0, \gamma_0)$ can be point-identified on $\Theta \times \mathbb{C}_n$, where

$$\mathcal{L}^*_n(\rho) \equiv \mathbb{E} \mathcal{L}_n\big( (\tilde\beta(\rho)^\top, \tilde\alpha(\rho), \rho)^\top, \tilde\gamma(\rho) \big), \tag{2.5}$$

and $(\tilde\beta(\rho), \tilde\alpha(\rho), \tilde\gamma(\rho)) \equiv \operatorname{argmax}_{(\beta, \alpha, \gamma) \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n} \mathbb{E} \mathcal{L}_n\big( (\beta^\top, \alpha, \rho)^\top, \gamma \big)$.

These identification results are similar to those in Theorem 2 of Aradillas-Lopez and Rosen (2019). That is, the model parameters can be point-identified either (i) if the distribution of unobserved payoff disturbances is fully known or (ii) if $\rho_0$ uniquely maximizes the concentrated log-likelihood function (2.5). In the literature on network formation models, it is often assumed that $\epsilon_{i,j}$ and $\epsilon_{j,i}$ are independent (e.g., Hoff et al., 2002; Jochmans, 2018; Yan et al., 2019). If they are independent, since $\rho_0 = 0$ is known, condition (i) is satisfied. Condition (ii) clearly depends on the choice of the $H$ function and is difficult to verify in general; however, this is directly empirically testable. A more primitive sufficient condition for this is that $\mathcal{L}^*_n(\rho)$ is strictly concave, which is satisfied when $\partial^2 \mathcal{L}^*_n(\rho) / (\partial \rho)^2$ is strictly negative under Assumption 2.1(iii). We provide an explicit form of $\partial^2 \mathcal{L}^*_n(\rho) / (\partial \rho)^2$ in Appendix C.1. Even when neither condition (i) nor (ii) is satisfied, it remains possible to partially identify the parameters, as in Aradillas-Lopez and Rosen (2019). For example, if $\mathcal{L}^*_n(\rho)$ has two peaks at $\rho_1$ and $\rho_2$, the resulting identified set is directly given by $\{(\tilde\beta(\rho), \tilde\alpha(\rho), \rho, \tilde\gamma(\rho)) : \rho \in \{\rho_1, \rho_2\}\}$.

Graham (2016) and Jochmans (2018) developed conditional likelihood methods that can be used to estimate the homophily parameters (i.e., $\beta_0$ in our context), even when the networks are sparse. However, their approach cannot be directly applied to our case because of the interdependence between $g_{i,j}$ and $g_{j,i}$.

Remark 2.4 (Other identification strategies). There are several other routes for identification of our model than the one in Theorem 2.3. For example, as our model is fully parametric, a classical parametric identification approach may be used based on the properties of the information matrix (e.g., Rothenberg, 1971; Bjorn and Vuong, 1984), although it is not easy to verify in practice. If one admits the existence of a player-specific continuous variable that has a positive density on the whole $\mathbb{R}$, then the model can be easily identified by the identification-at-infinity approach in the same manner as in Tamer (2003). Since assuming the existence of such unbounded variables is restrictive in practice, several authors have proposed other approaches based on some shape restrictions on the distribution of unobservables, without relying on the identification-at-infinity argument (e.g., Kline, 2016; Zhou, 2019). Investigating whether their approaches can be applied to our model is an interesting topic, but it is left for future work.

The first step of the ML estimation aims to obtain consistent estimates of $\gamma_0 = (A_0^\top, B_0^\top)^\top$.
Let

$$(\hat\theta_n, \hat\gamma_n) = \operatorname{argmax}_{(\theta, \gamma) \in \Theta \times \mathbb{C}_n} \mathcal{L}_n(\theta, \gamma), \tag{3.1}$$

where $\hat\theta_n = (\hat\beta_n^\top, \hat\alpha_n, \hat\rho_n)^\top$, and $\hat\gamma_n = (\hat A_n^\top, \hat B_n^\top)^\top$. Below, we present the asymptotic properties of the initial full ML estimator in (3.1) with the main focus on $\hat\gamma_n$. Instead of introducing particular identification conditions, for generality, we directly assume that the true parameter $(\theta_0, \gamma_0)$ is a unique maximizer of $\mathbb{E} \mathcal{L}_n(\theta, \gamma)$.

Assumption 3.1. $\mathbb{E} \mathcal{L}_n(\theta, \gamma)$ is uniquely maximized at $(\theta_0, \gamma_0) \in \Theta \times \mathbb{C}_n$ for all sufficiently large $n$.

We first establish several consistency results in the next theorem. In particular, we show that the individual specific effects can be uniformly consistently estimated.
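To make the first-step objective in (3.1) concrete, the following sketch evaluates the third representation of the log-likelihood (2.2) for given parameter values. It is only an illustration under stated assumptions: $F$ is taken to be logistic and the errors are taken to be independent within a dyad, so that $H(a, b) = F(a)F(b)$ and $\rho$ drops out. The function names `logistic_cdf`, `one_way_prob`, and `log_likelihood` are hypothetical and do not come from the paper.

```python
import math


def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))


def one_way_prob(w_ij, w_ji, alpha):
    # P_ij = F(w_ij) - H(w_ij, w_ji + alpha), assuming H(a, b) = F(a) * F(b).
    return logistic_cdf(w_ij) * (1.0 - logistic_cdf(w_ji + alpha))


def log_likelihood(G, Z, beta, alpha, A, B):
    """Evaluate (2.2) in its third representation.

    G: n x n nested list of 0/1 links with zero diagonal.
    Z: n x n nested list of covariate vectors, Z[i][j] = Z_ij.
    A, B: sender and receiver effects (lists of length n).
    """
    n = len(G)

    def index(i, j):
        # W_ij' Pi = Z_ij' beta + A_i + B_j
        return sum(z * b for z, b in zip(Z[i][j], beta)) + A[i] + B[j]

    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            p_ij = one_way_prob(index(i, j), index(j, i), alpha)
            p_ji = one_way_prob(index(j, i), index(i, j), alpha)
            # y_ij indicates the one-way link i -> j, i.e. (g_ij, g_ji) = (1, 0).
            y_ij = int(G[i][j] == 1 and G[j][i] == 0)
            y_ji = int(G[j][i] == 1 and G[i][j] == 0)
            total += 2 * y_ij * math.log(p_ij)
            total += (1 - y_ij - y_ji) * math.log(1.0 - p_ij - p_ji)
    return total / (n * (n - 1))


# Toy usage with three agents and one covariate (all values are made up).
G = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]
Z = [[[float(abs(i - j))] for j in range(3)] for i in range(3)]
print(log_likelihood(G, Z, [-0.5], 1.0, [0.0, 0.2, -0.1], [0.1, 0.0, 0.3]))
```

Maximizing this function jointly over all arguments corresponds to the first-step estimator; in practice one would pass its negative to a numerical optimizer while holding the first sender effect at zero for normalization.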
Theorem 3.1.
Suppose that Assumptions 2.1–2.3 and 3.1 hold. Then, we have (i) (cid:98) θ n p → θ , (ii) n (cid:80) ni =1 | (cid:98) A n,i − A ,i | p → , (iii) n (cid:80) ni =1 | (cid:98) B n,i − B ,i | p → , and (iv) || (cid:98) γ n − γ || ∞ p → . (cid:98) γ n . To this end, it is convenient to re-define (cid:98) θ n and θ as (cid:98) θ n = argmax θ ∈ Θ L n ( θ, (cid:101) γ n ( θ )) and θ = argmax θ ∈ Θ E L n ( θ, (cid:101) γ ( θ )) , respectively, where (cid:101) γ n ( θ ) ≡ argmax γ ∈ C n L n ( θ, γ ) and (cid:101) γ ( θ ) ≡ argmax γ ∈ C n E L n ( θ, γ ) for any given θ ∈ Θ , assuming that they are well-defined. Further, we define γ − = ( A , . . . , A n , B , . . . , B n ) (cid:62) , H n, γγ ( θ, γ ) (2 n − × (2 n − ≡ ∂ L n ( θ, γ ) ∂ γ − ∂ γ (cid:62)− , H n,θθ ( θ, γ ) ( d z +2) × ( d z +2) ≡ ∂ L n ( θ, γ ) ∂θ∂θ (cid:62) , H n, γ θ ( θ, γ ) (2 n − × ( d z +2) ≡ ∂ L n ( θ, γ ) ∂ γ ∂θ (cid:62) , and H n,θ γ ( θ, γ ) ≡ H n, γ θ ( θ, γ ) (cid:62) . The exact form of H n, γγ ( θ, γ ) can be found in (A.2) in Appendix A.Finally, let I n,θθ ( θ, γ ) ( d z +2) × ( d z +2) ≡ H n,θθ ( θ, γ ) − H n,θ γ ( θ, γ ) [ H n, γγ ( θ, γ )] − H n, γ θ ( θ, γ ) . This I n,θθ ( θ, γ ) matrix serves as the Hessian matrix for the concentrated ML estimator (cid:98) θ n (see, e.g.,Amemiya, 1985). Now, we introduce the following assumptions. Assumption 3.2. (cid:101) γ ( θ ) uniquely exists uniformly on { Θ : || θ − θ || ≤ ε } , where ε > is an arbitrary smallconstant. Assumption 3.3.
For an arbitrarily small constant $\varepsilon > 0$, there exist constants $c_\gamma, c_\theta > 0$ that may depend on $\varepsilon$ such that (i) $\lambda_{\min}(-n \cdot \mathcal{H}_{n,\gamma\gamma}(\theta, \gamma)) > c_\gamma$ and (ii) $\lambda_{\min}(-\mathcal{I}_{n,\theta\theta}(\theta, \gamma)) > c_\theta$ w.p.a.1 uniformly on $\{\Theta \times C_n : \|\theta - \theta_0\| \le \varepsilon, \|\gamma - \gamma_0\|_\infty \le \varepsilon\}$.

Assumptions 3.2 and 3.3 should be fairly reasonable in practice. Under these additional assumptions, we can derive the $\ell$-norm and max-norm convergence rates for $\hat{\gamma}_n$, as shown in the next theorem.

Theorem 3.2.
Suppose that Assumptions 2.1–2.3 and 3.1–3.3 hold. Then, we have (i) n (cid:80) ni =1 | (cid:98) A n,i − A ,i | = O P ( n − / ) , (ii) n (cid:80) ni =1 | (cid:98) B n,i − B ,i | = O P ( n − / ) , and (iii) || (cid:98) γ n − γ || ∞ = O P ( (cid:112) ln n/n ) . The max-norm convergence rate obtained in Theorem 3.2 is consistent with the result of Theorem 3 inGraham (2017) and that of Theorem 3.1 in Yan et al. (2019).
Given the consistent estimates of $A_0$ and $B_0$, we use the BS algorithm to estimate the group structure. We first sort $\hat{A}_n$ and $\hat{B}_n$ in ascending order and write the order statistics as
$$\hat{A}_{n,(1)} \le \hat{A}_{n,(2)} \le \cdots \le \hat{A}_{n,(n)} \quad \text{and} \quad \hat{B}_{n,(1)} \le \hat{B}_{n,(2)} \le \cdots \le \hat{B}_{n,(n)}.$$
In the following, we mainly describe the estimation of the group membership for the sender effects, $C^A_0$. Exactly the same procedure can be used to estimate $C^B_0$.

The concept behind the BS algorithm is quite simple. If $\{A_{0,i}\}$ are heterogeneous across $K_A$ latent groups but are homogeneous within the groups, there should exist $K_A - 1$ "break points" in the sorted $\{A_{0,i}\}$. Since $\hat{A}_n$ is uniformly consistent for $A_0$, these break points also appear in the sequence $\hat{A}_{n,(1)}, \ldots, \hat{A}_{n,(n)}$ w.p.a.1. For $1 \le i < j \le n$, we define $\hat{\Delta}_A(i, j)$ as the sum of squared variations over $\{\hat{A}_{n,(i)}, \ldots, \hat{A}_{n,(j)}\}$; namely,
$$\hat{\Delta}_A(i, j) \equiv \sum_{l=i}^{j} \left(\hat{A}_{n,(l)} - \bar{A}_{n,i,j}\right)^2, \quad \text{where} \quad \bar{A}_{n,i,j} \equiv \frac{1}{j - i + 1} \sum_{l=i}^{j} \hat{A}_{n,(l)}.$$
Further, we define
$$\hat{S}^A_{i,j}(\kappa) \equiv \begin{cases} \dfrac{1}{j - i + 1}\left(\hat{\Delta}_A(i, \kappa) + \hat{\Delta}_A(\kappa + 1, j)\right) & \text{if } \kappa < j \\[2mm] \dfrac{1}{j - i + 1}\hat{\Delta}_A(i, j) & \text{if } \kappa = j. \end{cases}$$
That is, $\hat{S}^A_{i,j}(\kappa)$ provides the total variance of $\{\hat{A}_{n,(i)}, \ldots, \hat{A}_{n,(j)}\}$ when a break point is placed at $\kappa$. Assuming that $K_A \ge 2$, the BS algorithm proceeds as follows:

Step 1 ($K_A = 2$): We find the first break point, say $\hat{t}_1$, by $\hat{t}_1 = \operatorname{argmin}_{1 \le \kappa} \cdots$
Theorem 3.3.
Suppose that Assumptions 2.1–2.3 and 3.1–3.4 hold. Then, we have
$\Pr(\hat{C}^A_n = C^A_0) \to 1$ and $\Pr(\hat{C}^B_n = C^B_0) \to 1$.

Although the proof of Theorem 3.3 is almost analogous to those in Ke et al. (2016) and Wang and Su (2020), for completeness, we provide it in Appendix B.
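The binary segmentation idea described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation used in this paper: the function names (`sum_sq_dev`, `best_split`, `binary_segmentation`) are hypothetical, the number of groups is taken as given, each new break point is placed in the segment where it yields the largest reduction in the sum of squared deviations (an assumed simplification of the selection rule), and no repartitioning is performed.

```python
def sum_sq_dev(x, i, j):
    """Sum of squared deviations from the mean over x[i..j] (inclusive)."""
    seg = x[i:j + 1]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def best_split(x, i, j):
    """Best break point kappa in x[i..j], splitting it into x[i..kappa]
    and x[kappa+1..j]; returns (kappa, reduction in squared deviations)."""
    total = sum_sq_dev(x, i, j)
    best_k, best_cost = None, float("inf")
    for k in range(i, j):
        cost = sum_sq_dev(x, i, k) + sum_sq_dev(x, k + 1, j)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, total - best_cost


def binary_segmentation(values, n_groups):
    """Assign each entry of `values` to one of `n_groups` groups.

    Labels are 0, 1, ... in ascending order of the group level."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    x = [values[idx] for idx in order]          # sorted first-step estimates
    segments = [(0, len(x) - 1)]
    while len(segments) < n_groups:
        # Candidate splits for every segment that can still be divided.
        candidates = [(best_split(x, i, j), (i, j))
                      for (i, j) in segments if j > i]
        if not candidates:
            break
        (k, _gain), (i, j) = max(candidates, key=lambda c: c[0][1])
        segments.remove((i, j))
        segments.extend([(i, k), (k + 1, j)])
        segments.sort()
    labels = [0] * len(values)
    for g, (i, j) in enumerate(segments):
        for pos in range(i, j + 1):
            labels[order[pos]] = g
    return labels


# Toy first-step estimates clustered around 0, 1, and 2 (hypothetical values).
estimates = [0.02, 1.01, 2.0, -0.03, 0.98, 2.05, 0.0, 1.0, 1.97]
print(binary_segmentation(estimates, 3))
```

Working on the sorted sequence means that every group is a contiguous block of order statistics, so only the break positions need to be searched.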
Remark 3.4 (Fine-tuning). Bai (1997) shows that the BS method tends to over/underestimate the location of break points depending on the share of each group and the gaps between the values of the group effects. To account for this problem, he proposes the following repartitioning method. Let $\{\hat{t}_k\}$ be the estimated break points obtained by the standard BS method. Then, we replace the initial estimate $\hat{t}_k$ with
$$\hat{t}^{\text{repart}}_k \equiv \operatorname*{argmin}_{\hat{t}_{k-1} + 1 \le \kappa < \hat{t}_{k+1}} \hat{S}^A_{\hat{t}_{k-1} + 1,\, \hat{t}_{k+1}}(\kappa)$$
for all $k = 1, \ldots, K_A - 1$. Letting $\hat{C}^{A,\text{repart}}_n$ be the resulting repartitioned estimator of $C^A_0$, given the result of Theorem 3.3, it is straightforward to see that $\hat{C}^{A,\text{repart}}_n$ is also consistent for $C^A_0$. Note that the above repartitioning procedure can be implemented recursively until convergence. The numerical simulations in Section 4 indicate that using this repartitioning method remarkably improves the probability of correctly predicting the group memberships.

In the final step, we solve (2.4) approximately by using $(\hat{C}^A_n, \hat{C}^B_n)$ in place of $(C^A_0, C^B_0)$. Recalling that $a_1$ is pinned at $a_1 = 0$, let $\delta \equiv (\theta^\top, a_2, \ldots, a_{K_A}, b_1, \ldots, b_{K_B})^\top$, and $\mathcal{D} \equiv \Theta \times \mathcal{A}^{K_A - 1} \times \mathcal{B}^{K_B}$ for the parameter space of $\delta$. We denote by $\delta_0$ the true value of $\delta$. Then, our final ML estimator for $\delta_0$ is defined as $\hat{\delta}_n = \operatorname{argmax}_{\delta \in \mathcal{D}} \hat{\mathcal{L}}_n(\delta)$, where
$$\hat{\mathcal{L}}_n(\delta) \equiv \mathcal{L}_n\left(\theta, \left\{\sum_{k=1}^{K_A} a_k \cdot 1\{i \in \hat{C}^A_{n,k}\}\right\}, \left\{\sum_{k=1}^{K_B} b_k \cdot 1\{i \in \hat{C}^B_{n,k}\}\right\}\right).$$
Similarly, we define $\hat{\delta}^{\text{oracle}}_n = \operatorname{argmax}_{\delta \in \mathcal{D}} \mathcal{L}_n(\delta)$, where
$$\mathcal{L}_n(\delta) \equiv \mathcal{L}_n\left(\theta, \left\{\sum_{k=1}^{K_A} a_k \cdot 1\{i \in C^A_{0,k}\}\right\}, \left\{\sum_{k=1}^{K_B} b_k \cdot 1\{i \in C^B_{0,k}\}\right\}\right);$$
that is, $\hat{\delta}^{\text{oracle}}_n$ is the "oracle" estimator that is computed based on the true $C^A_0$ and $C^B_0$. Since $\hat{\delta}^{\text{oracle}}_n$ is the standard parametric ML estimator, it is asymptotically normally distributed at the parametric rate, and its asymptotic covariance matrix is given by the inverse Fisher information matrix. Meanwhile, we have shown in Theorem 3.3 that the estimated group memberships $(\hat{C}^A_n, \hat{C}^B_n)$ are equal to $(C^A_0, C^B_0)$ w.p.a.1. Therefore, we can claim that the final ML estimator $\hat{\delta}_n$ has asymptotically the same statistical performance as the oracle estimator $\hat{\delta}^{\text{oracle}}_n$, and, thus, it is asymptotically fully efficient. We formally state this result in the next theorem.

Theorem 3.5.
Suppose that Assumptions 2.1–2.3 and 3.1–3.4 hold. In addition, we assume that $\mathcal{I}_{\delta\delta} \equiv -\lim_{n \to \infty} E\left[\partial^2 \mathcal{L}_n(\delta_0)/(\partial\delta\,\partial\delta^\top)\right]$ exists and is positive definite. Then, $\hat{\delta}_n$ and $\hat{\delta}^{\text{oracle}}_n$ have the same asymptotic distribution:
$$\sqrt{N/2}\,(\hat{\delta}^{\text{oracle}}_n - \delta_0) \overset{d}{\to} N\left(0_{d_z + K_A + K_B + 1},\, \mathcal{I}^{-1}_{\delta\delta}\right).$$

Recall that the asymptotic equivalence between $\hat{\delta}_n$ and $\hat{\delta}^{\text{oracle}}_n$ relies on the dense network structure, under which each agent's specific effects can be point-identified. When the networks are not dense, $\hat{\delta}_n$ is generally inconsistent, while $\hat{\delta}^{\text{oracle}}_n$ may still be consistent (potentially with a slower convergence rate). Finally, note that the above discussions hold true if the repartitioned estimator $(\hat{C}^{A,\text{repart}}_n, \hat{C}^{B,\text{repart}}_n)$ is used instead of $(\hat{C}^A_n, \hat{C}^B_n)$.

4 Monte Carlo Experiments
In this section, we examine the finite sample performance of the three-step ML estimator. We consider thefollowing data-generating process for the Monte Carlo experiments: u i,j ( q ) = Z i,j, β , + Z i,j, β , + α q + K A (cid:88) k =1 a ,k · { i ∈ C A ,k } + K B (cid:88) k =1 b ,k · { i ∈ C B ,k } − (cid:15) i,j , for i (cid:54) = j, where Z i,j, = | X i − X j | with X i i.i.d. ∼ Uniform[ − , , Z i,j, i.i.d. ∼ N (0 , , ( (cid:15) i,j , (cid:15) j,i ) is i.i.d. across dyadsas the standard bivariate normal with correlation coefficient ρ = 0 . , ( β , , β , , α ) = ( − . , . , . ,and K A = K B = 3 . For the groupwise heterogeneity parameters, we consider ( a , , a , , a , ) = (0 , r, r ) and ( b , , b , , b , ) = ( − . − r, − . , − . r ) for r ∈ { . , . , . } . The smaller (larger) r becomes,the more difficult (easier) the identification of the group structure. The group memberships are determinedrandomly while maintaining the equal size of each group. Exceptionally for observation , ∈ C A , (sothat A , = 0 ) is fixed throughout the experiments. For each model setup, we consider two sample sizes: n ∈ { , } ; thus, the size of each group is 18 in the former case and is 25 in the latter. The number of MonteCarlo repetitions is set to 500 for each single experiment. For the estimation of the group memberships, forcomparison, we use both the standard BS method without repartitions and the repartitioned BS method.We first report the simulation results of estimating the common parameters ( α , β , , β , , ρ ) . Table4.1 presents the bias and RMSE (root mean squared error) for the following four estimators: the initialML estimator given in (3.1) (1st-step ML), the three-step ML estimator based on the BS method withno iterations (BS0) and that with two iterations (BS2), and the oracle estimator based on the true groupmembership (Oracle). 
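For concreteness, the two performance measures used in Table 4.1 can be computed from the Monte Carlo replications as in the following Python sketch. The function name `bias_and_rmse` and the toy inputs are hypothetical and serve only to illustrate the definitions; they are not taken from the experiments.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Bias and root mean squared error of a list of Monte Carlo estimates
    of a scalar parameter whose true value is `true_value`."""
    reps = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / reps
    rmse = math.sqrt(sum(err ** 2 for err in errors) / reps)
    return bias, rmse


# Toy example with three hypothetical replications of one estimator.
print(bias_and_rmse([1.0, 2.0, 3.0], true_value=2.0))
```

The bias averages the signed errors, so positive and negative errors can cancel, whereas the RMSE also reflects the dispersion of the estimates around the true value.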
For estimating $(\beta_{0,1}, \beta_{0,2})$, as expected, the 1st-step ML estimator is largely biased in all scenarios due to the incidental parameter problem. Although the three-step estimators (i.e., BS0 and BS2) also have some biases when $r = 0.4$ and $n = 54$, the biases disappear as either $r$ or $n$ increases. Thus, these biases are probably due to frequent misclassification of group memberships under small $r$ and $n$. In terms of RMSE, although we can observe a certain gap between the oracle estimator and the three-step estimators, the gaps can be reduced by increasing $r$ and $n$, which is consistent with our theory. Interestingly, even when using the 1st-step ML estimator, the strategic interaction effect and the error correlation parameter can be estimated with almost no bias.

The simulation results of estimating the group memberships are summarized in Table 4.2. Here, we compare the performance of BS0 and BS2 in terms of the ratio of correct group classification. First of all, the results indicate that the repartitioned BS method (i.e., BS2) clearly outperforms the standard BS method without repartitions (i.e., BS0). As expected, as $r$ gets smaller, correctly predicting the group membership becomes significantly more difficult. If the gaps between the values of the group effects are sufficiently large and the sample size is not small, BS2 can attain almost 90% of correct classification. We cannot observe any clear difference between the estimation of $C^A_0$ and that of $C^B_0$.

Footnote: One might view that the results reported in Table 4.2 are not particularly good for the BS algorithm. A main reason for that would be that our model is a bivariate binary response model, whereas most of the previous studies using the BS algorithm have focused on models with a continuous outcome.

Table 4.1
                          $\alpha_0$         $\beta_{0,1}$      $\beta_{0,2}$      $\rho_0$
r     n    Estimator      Bias     RMSE      Bias     RMSE      Bias     RMSE      Bias     RMSE
0.4   54   1st-step ML     0.027   0.194    -0.192   0.358     0.222   0.257     0.047   0.196
           BS0             0.056   0.174    -0.113   0.237     0.120   0.162    -0.059   0.176
           BS2             0.057   0.176    -0.128   0.254     0.134   0.175    -0.050   0.176
           Oracle          0.006   0.131    -0.008   0.082     0.009   0.092     0.001   0.126
      75   1st-step ML     0.032   0.122    -0.102   0.261     0.152   0.172     0.022   0.118
           BS0             0.043   0.116    -0.056   0.176     0.080   0.108    -0.054   0.125
           BS2             0.043   0.117    -0.063   0.190     0.091   0.117    -0.043   0.120
           Oracle         -0.001   0.093    -0.002   0.061     0.004   0.066     0.004   0.091
0.7   54   1st-step ML     0.030   0.221    -0.175   0.351     0.221   0.263     0.061   0.234
           BS0             0.020   0.176    -0.056   0.220     0.056   0.125    -0.072   0.193
           BS2             0.035   0.184    -0.074   0.235     0.084   0.143    -0.069   0.197
           Oracle          0.007   0.132    -0.010   0.079     0.010   0.094     0.002   0.128
      75   1st-step ML     0.031   0.127    -0.104   0.267     0.152   0.173     0.031   0.126
           BS0             0.017   0.114    -0.017   0.169     0.025   0.074    -0.074   0.138
           BS2             0.025   0.118    -0.038   0.183     0.048   0.086    -0.058   0.132
           Oracle         -0.004   0.091    -0.001   0.058     0.004   0.065     0.006   0.091
1.0   54   1st-step ML     0.027   0.234    -0.178   0.356     0.227   0.267     0.082   0.268
           BS0            -0.007   0.182     0.003   0.209    -0.018   0.108    -0.120   0.233
           BS2             0.013   0.186    -0.036   0.214     0.029   0.114    -0.092   0.224
           Oracle          0.006   0.136    -0.011   0.080     0.014   0.094     0.000   0.139
      75   1st-step ML     0.033   0.145    -0.111   0.274     0.155   0.179     0.045   0.148
           BS0            -0.009   0.130     0.029   0.173    -0.039   0.085    -0.103   0.172
           BS2             0.010   0.120    -0.012   0.158     0.012   0.079    -0.063   0.137
           Oracle         -0.001   0.094    -0.002   0.059     0.006   0.066     0.005   0.097
Note. 1st-step ML: the initial ML estimator, BS0: the three-step ML estimator based on the BS method with no repartitions, BS2: the three-step ML estimator based on the repartitioned BS method with two iterations, Oracle: the oracle estimator based on the true group membership.

Table 4.2: Correct classification ratio (by $r$, $n$, and estimator, for $C^A_0$ and $C^B_0$)

As an empirical application of our model and method, we analyze the network of international visa-free travels. The dependent variable of interest is $G_n = (g_{i,j})_{1 \le i,j \le n}$, where $g_{i,j} = 1$ if country $i$ allows the citizens of country $j$ to visit $i$ without visas, and $g_{i,j} = 0$ otherwise. Since the bilateral relationship in visa-free policy is expected to be complementary, this setting fits into our model framework.

In this empirical study, we consider 57 countries selected mainly from Asia, the Middle East, the former USSR, and Oceania. The information about the visa policy of each country is taken from Henley and Partners: Passport Index 2020. The total number of dyads in this network is $57(57 - 1)/2 = 1596$. From Table 5.1, which summarizes the distribution of the link connections, we can observe that the number of country pairs with one-way links is smaller than that with mutual links or no links. This suggests the presence of complementarity in the network formation process. According to the above-mentioned passport index, Japan, Singapore, and South Korea are the top three countries among the 57 countries in terms of the number of all countries with visa-free access. For our restricted sample network, South Korea has the largest in-degree, $\sum_i g_{i,\text{South Korea}} = 49$. For the out-degree, Nepal has the largest value, $\sum_j g_{\text{Nepal},j} = 55$; that is, Nepal allows 55 countries (out of 57) to visit Nepal only with on-arrival visas. More detailed information can be found in Table C.1 in Appendix C.2.

Footnote: The list of countries used in this empirical study is as follows: Armenia, Australia, Azerbaijan, Bahrain, Bangladesh, Belarus, Bhutan, Brunei, Cambodia, China, Cyprus, Estonia, Fiji, Georgia, Hong Kong, India, Indonesia, Iran, Iraq, Israel, Japan, Jordan, Kazakhstan, Kiribati, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lithuania, Malaysia, Moldova, Mongolia, Myanmar, Nauru, Nepal, New Zealand, Oman, Pakistan, Papua New Guinea, Philippines, Qatar, Russia, Saudi Arabia, Singapore, South Korea, Sri Lanka, Tajikistan, Thailand, Tonga, Turkey, UAE, Ukraine, Uzbekistan, Vanuatu, Viet Nam, and Yemen. These countries are selected based on geographical proximity and ease of data collection.

Footnote: Based on their definition, we categorize electronic travel authorization (eTA) and on-arrival visa as visa-free access.

Table 5.1: Distribution of $\{(g_{i,j}, g_{j,i}) : 1 \le i < j \le n\}$
                   $g_{j,i} = 0$    $g_{j,i} = 1$
$g_{i,j} = 0$      495              343
$g_{i,j} = 1$      349              409

The network for all the 57 countries is too complicated to grasp as a whole. As one illustration of our data, Figure 5.1 presents the sub-network obtained by restricting the vertices to the Eastern and Southeastern Asian countries. The left panel in the figure shows the whole shape of this sub-network. (Note that the direction of the arrows in the figure is "not" the direction of visa-free access; rather, an arrow indicates that the citizens of the target country are allowed to visit the country at the arrow's origin without visas.) The right panel shows only the one-way links, $(g_{i,j} \cdot (1 - g_{j,i}))_{i,j \in \{\text{Brunei}, \ldots, \text{Viet Nam}\}}$. From this figure, we can expect the existence of a certain level of degree heterogeneity. More specifically, Cambodia, for example, has five outgoing one-way links in this sub-network, suggesting that this country would have a larger sender effect $A_0$. In contrast, countries such as Japan and South Korea would exhibit a larger receiver effect $B_0$.

Figure 5.1: Eastern and Southeastern Asian sub-network (Left panel: the whole sub-network, right panel: only one-way links.)

For estimating the network formation model, we consider five covariates; for their definitions, see Table 5.2. The summary statistics of the covariates are provided in Table C.2 in Appendix C.2. With these variables, we consider the following payoff function:
$$u_{i,j}(q) = (\ln \text{gdp\_pc}_i)(\ln \text{gdp\_pc}_j)\beta_{0,1} + |\text{free}_i - \text{free}_j|\beta_{0,2} + 1\{\text{region}_i = \text{region}_j\}\beta_{0,3} + \ln(\text{export}_{ij} + 1)\beta_{0,4} + \ln(\text{import}_{ij} + 1)\beta_{0,5} + \alpha_0 q + A_{0,i} + B_{0,j} - \epsilon_{i,j}, \quad \text{for } i \neq j,$$
where we assume that $(\epsilon_{i,j}, \epsilon_{j,i})$ have the standard bivariate normal distribution with correlation coefficient $\rho_0$.

To estimate our network formation model with grouped degree heterogeneity, we first need to determine the number of groups for the sender effects $\{A_{0,i}\}$ and that for the receiver effects $\{B_{0,i}\}$, $K_A$ and $K_B$, respectively. Following Ke et al. (2016) and Wang and Su (2020), the optimal $(K_A, K_B)$ is selected as the minimizer of the BIC criterion: $-\hat{\mathcal{L}}_n(\hat{\delta}_n) + (6 + K_A + K_B)\ln(1596)$. Searching over the candidate values of $(K_A, K_B)$, we find that the model with $(K_A, K_B) = (7, 6)$ achieves the smallest BIC (see Table C.3 in Appendix C.2 for more detailed information), and this is the model reported here. For comparison, we estimate not only our proposed model, which we call the grouped heterogeneity model, but also a dyadic bivariate probit model without strategic interaction and degree heterogeneities as a benchmark.
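The counting behind this model selection step can be sketched in Python as follows. This is a hedged illustration rather than the paper's code: the helper names (`n_dyads`, `bic`, `select_groups`) are hypothetical, the textbook form "minus two times the total maximized log-likelihood plus the number of free parameters times the log of the number of dyads" is assumed for the criterion (the scaling of the likelihood term in particular is an assumption), and the candidate log-likelihood values in the last lines are made-up placeholders, not results from the data.

```python
import math


def n_dyads(n):
    """Number of unordered pairs among n agents."""
    return n * (n - 1) // 2


def bic(total_loglik, n_params, n_obs):
    """Textbook BIC (assumed form): -2 * log-likelihood + k * ln(n_obs)."""
    return -2.0 * total_loglik + n_params * math.log(n_obs)


def select_groups(loglik_by_k, n_base, n_obs):
    """Pick the (K_A, K_B) pair with the smallest BIC.

    loglik_by_k maps (K_A, K_B) to the total maximized log-likelihood;
    the number of free parameters is taken to be n_base + K_A + K_B."""
    scores = {k: bic(ll, n_base + k[0] + k[1], n_obs)
              for k, ll in loglik_by_k.items()}
    best = min(scores, key=scores.get)
    return best, scores


dyads = n_dyads(57)  # 57 countries in the sample
# Reported log-likelihood of the selected model with (K_A, K_B) = (7, 6).
print(dyads, bic(-810.798, 6 + 7 + 6, dyads))

# Hypothetical placeholder values, only to show how the search would run.
toy = {(1, 1): -100.0, (2, 2): -90.0}
print(select_groups(toy, n_base=6, n_obs=dyads)[0])
```

The penalty term grows with $K_A + K_B$, so a finer group structure is selected only when it raises the likelihood by enough to offset the additional parameters.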
For the estimation of the group memberships, we employ the repartitioning method with two iterations (i.e., BS2 in the previous section).

Table 5.2: Definitions of Variables
Variables          Definitions
gdp_pc_i           GDP per capita in 2018 (1,000 USD)
free_i             Freedom rating (1 = Most Free, 7 = Least Free) (a)
region_i           Categorical variable: East Asia, Southeast Asia, Central Asia, Europe, Middle East, or Oceania.
export_ij          Total export value from country i to j in 2018 (million USD) (b)
import_ij          Total import value from country i to j in 2018 (million USD) (b)
Sources: (a) Freedom in the World 2018, Freedom House (https://freedomhouse.org/); (b) IMF DATA (https://data.imf.org/).

The estimation results are summarized in Table 5.3. First of all, as expected, our proposed model suggests that there is a significant strategic complementarity in the network formation behavior. We can also find a certain level of degree heterogeneity in terms of both the sender and the receiver effects. Comparing the grouped heterogeneity model and the benchmark model, the log-likelihood value for the former is substantially larger than that for the latter. This large difference in the degree of model fit also demonstrates the significance of the strategic effect and unobserved heterogeneity (note, however, that these models are not nested). For specific parameter estimates, we can observe several non-negligible differences between the two models. For example, the effect of the export amount is predicted to be positive in the grouped heterogeneity model, whereas the benchmark model predicts a significantly negative impact. The error correlation parameter is not significantly different from zero in our model but is weakly positively significant in the benchmark model. This result is understandable since the benchmark model can account for the interdependence of the links only through the error correlation. Additionally, there are several interesting findings. For both models, if countries $i$ and $j$ are located in the same region, they become more likely to allow visa-free access, as expected. Not only in terms of geographical proximity, but we can also observe significant homophily in terms of the political system.

For the estimation results of country-specific effects, we report the estimated group memberships in Table 5.4. As expected from the above discussion, countries such as Cambodia are indeed classified into the highest group (i.e., Group 7) in terms of the sender effect. The other two countries that have the Group-7 sender effect are Nepal and Sri Lanka. For the receiver effect, these two countries are classified as Group 2, and Cambodia is in Group 3. Overall, interestingly, there seems to be a weak negative correlation between the sender effects and the receiver effects. As expected from the above discussion, Japan and South Korea indeed belong to the group with the highest receiver effect (i.e., Group 6). The magnitudes of the receiver effects seem to roughly correlate with the size of the countries' economies (with some exceptions, such as

Table 5.3: Estimation Results
                                                  Grouped Heterogeneity Model     Benchmark Model
                                                  Estimate     t-value            Estimate     t-value
Intercept                                                                         -0.208       -1.196
Strategic effect: $\alpha_0$
(ln gdp_pc_j)(ln gdp_pc_i): $\beta_{0,1}$
|free_i - free_j|: $\beta_{0,2}$                  -0.229       -7.044             -0.075       -1.692
1{region_i = region_j}: $\beta_{0,3}$
ln(export_ij + 1): $\beta_{0,4}$
ln(import_ij + 1): $\beta_{0,5}$
$\rho_0$                                          -0.080       -0.450              0.088        1.656
Sender effects: $A_0$
  Group 1: $a_1$
  Group 2: $a_2$
  Group 3: $a_3$
  Group 4: $a_4$
  Group 5: $a_5$
  Group 6: $a_6$
  Group 7: $a_7$
Receiver effects: $B_0$
  Group 1: $b_1$                                  -5.670       -14.388
  Group 2: $b_2$                                  -4.590       -16.159
  Group 3: $b_3$                                  -3.902       -14.559
  Group 4: $b_4$                                  -3.446       -14.240
  Group 5: $b_5$                                  -2.561       -10.952
  Group 6: $b_6$                                  -1.558       -6.722
Log-likelihood                                    -810.798                        -1531.496

Table 5.4: Estimated Group Memberships
Group 1
  Sender effect $A_0$: Australia, China, Iraq, Nauru, Oman, Russia
  Receiver effect $B_0$: Iraq, Pakistan
Group 2
  Sender effect $A_0$: Bhutan, India, Japan, New Zealand, UAE
  Receiver effect $B_0$: Bangladesh, Iran, Jordan, Lebanon, Nepal, Sri Lanka, Yemen
Group 3
  Sender effect $A_0$: Bahrain, Cyprus, Estonia, Georgia, Kiribati, Kuwait, Latvia, Lithuania, Mongolia, Myanmar, Papua New Guinea, Qatar, Saudi Arabia, South Korea, Tonga, Turkey, Ukraine, Viet Nam, Yemen
  Receiver effect $B_0$: Armenia, Cambodia, Laos, Myanmar, Qatar, Russia, Viet Nam
Group 4
  Sender effect $A_0$: Azerbaijan, Belarus, Brunei, Fiji, Hong Kong, Israel, Kazakhstan, Kyrgyzstan, Moldova, Pakistan, Singapore, Thailand, Uzbekistan
  Receiver effect $B_0$: Belarus, Bhutan, China, Fiji, Georgia, India, Indonesia, Kazakhstan, Kiribati, Kyrgyzstan, Moldova, Mongolia, Nauru, Papua New Guinea, Philippines, Tajikistan, Thailand, Tonga, Ukraine, Uzbekistan
Group 5
  Sender effect $A_0$: Armenia, Bangladesh, Jordan, Lebanon, Malaysia, Philippines, Tajikistan, Vanuatu
  Receiver effect $B_0$: Azerbaijan, Bahrain, Kuwait, Latvia, Lithuania, Oman, Saudi Arabia, Turkey, UAE, Vanuatu
Group 6
  Sender effect $A_0$: Indonesia, Iran, Laos
  Receiver effect $B_0$: Australia, Brunei, Cyprus, Estonia, Hong Kong, Israel, Japan, Malaysia, New Zealand, Singapore, South Korea
Group 7
  Sender effect $A_0$: Cambodia, Nepal, Sri Lanka

This paper proposed a network formation model with pairwise strategic interaction and grouped degree heterogeneity. Assuming some parametric form for the error distribution, we proved that the model parameters can be identified under the availability of agent-specific covariates that have large supports and also have variations across all potential partners. For estimating the model, based on the same idea as in Bresnahan and Reiss (1990) and Berry (1992), we proposed the three-step ML procedure: in the first step, the model is estimated without considering the group structure; subsequently, we estimate the group memberships using the BS algorithm given the estimates for the heterogeneity parameters obtained in the first step; and, finally, based on the estimated group memberships, we re-estimate the model. Under certain regularity conditions, we showed that the proposed estimator is asymptotically unbiased and distributed as normal at the parametric rate. The results of the Monte Carlo simulations show that our estimator performs reasonably well in finite samples. An empirical application to international visa-free travel networks indicates the usefulness of the proposed model.

Several limitations and extensions are as follows. First, our approach can be used only in pairwise network formation games with no network externalities to/from the rest of the links, and this limits the empirical applicability. Therefore, it would be worthwhile to extend our results to network formation models with general network externalities involving more than two agents. However, we conjecture that we would need to resort to partial identification to achieve this. Second, our approach requires that the degree heterogeneity parameters have discrete support, although, in reality, it is possible that they are continuous. To address this issue, it is of interest to modify our model in a similar manner to Bonhomme et al. (2017) and investigate the three-step ML estimator in which $K_A$ and $K_B$ grow slowly to infinity. Third, as our model is a dyadic binary game model, of which a pairwise network formation model is a special case, we can consider its ordered-response game version as a natural extension. For example, we might be interested in analyzing bilateral military relations: non-alliance, quasi-alliance, or alliance. We expect that such an extension can be relatively easily achieved by adopting the ML estimator discussed in Aradillas-Lopez and Rosen (2019). Finally, related to the empirical application in this study, we might be interested in investigating the causal effect of visa policies between two countries on the flows of tourists between them; this is a dyadic treatment evaluation problem in which the treatment variable is determined strategically. To deal with such situations, combining the results of this study and the marginal treatment effect framework developed in Hoshino and Yanagi (2020) would be beneficial. We leave these topics for future research.

Appendix

A Notations
Variables and parameters

$$W_{i,j} \equiv (Z_{i,j}^\top, \chi_{n,i}^\top, \chi_{n,j}^\top)^\top$$
$$\theta \equiv (\beta^\top, \alpha, \rho)^\top$$
$$A \equiv (A_1, \ldots, A_n)^\top, \quad A_- \equiv (A_2, \ldots, A_n)^\top, \quad B \equiv (B_1, \ldots, B_n)^\top$$
$$\gamma \equiv (A^\top, B^\top)^\top, \quad \gamma_- \equiv (A_-^\top, B^\top)^\top$$
$$\Pi \equiv (\beta^\top, \gamma^\top)^\top$$
$$\pi_{i,j} \equiv Z_{i,j}^\top \beta + A_i + B_j = W_{i,j}^\top \Pi$$
$$\delta \equiv (\theta^\top, a_2, \ldots, a_{K_A}, b_1, \ldots, b_{K_B})^\top.$$

Functions and derivatives

Throughout this appendix, for notational simplicity, we denote $\partial_a g(a) = \partial g(a)/(\partial a)$, $\partial_{ab} g(a, b) = \partial^2 g(a, b)/(\partial a\,\partial b)$, and so forth.

$$P_{i,j}(\theta, \gamma) \equiv F(W_{i,j}^\top \Pi) - H(W_{i,j}^\top \Pi, W_{j,i}^\top \Pi + \alpha; \rho)$$
$$\ell_{i,j}(\theta, \gamma) \equiv y_{i,j} \ln P_{i,j}(\theta, \gamma) + y_{j,i} \ln P_{j,i}(\theta, \gamma) + (1 - y_{i,j} - y_{j,i}) \ln(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma))$$
$$\mathcal{L}_n(\theta, \gamma) \equiv \frac{2}{N} \sum_{i=1}^{n} \sum_{j > i} \ell_{i,j}(\theta, \gamma)$$
$$p_{1,i,j}(\theta, \gamma) \equiv \partial_{\pi_{i,j}} P_{i,j}(\theta, \gamma) = f(W_{i,j}^\top \Pi) - H_1(W_{i,j}^\top \Pi, W_{j,i}^\top \Pi + \alpha; \rho)$$
$$p_{2,i,j}(\theta, \gamma) \equiv \partial_{\pi_{j,i}} P_{i,j}(\theta, \gamma) = -H_2(W_{i,j}^\top \Pi, W_{j,i}^\top \Pi + \alpha; \rho)$$
$$H_\rho(\cdot, \cdot; \rho) \equiv \partial_\rho H(\cdot, \cdot; \rho),$$
where $f$ is the derivative of $F$, and $H_l(\cdot, \cdot; \rho)$ is the derivative of $H(\cdot, \cdot; \rho)$ with respect to the $l$-th argument.

$$\partial_{A_k} P_{i,j}(\theta, \gamma) = p_{1,i,j}(\theta, \gamma) 1\{i = k\} + p_{2,i,j}(\theta, \gamma) 1\{j = k\}$$
$$\partial_{B_k} P_{i,j}(\theta, \gamma) = p_{1,i,j}(\theta, \gamma) 1\{j = k\} + p_{2,i,j}(\theta, \gamma) 1\{i = k\}$$
$$s_{1,i,j}(\theta, \gamma) \equiv \partial_{\pi_{i,j}} \ell_{i,j}(\theta, \gamma) = \frac{y_{i,j}\, p_{1,i,j}(\theta, \gamma)}{P_{i,j}(\theta, \gamma)} + \frac{y_{j,i}\, p_{2,j,i}(\theta, \gamma)}{P_{j,i}(\theta, \gamma)} - \frac{(1 - y_{i,j} - y_{j,i})[p_{1,i,j}(\theta, \gamma) + p_{2,j,i}(\theta, \gamma)]}{1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)}$$
$$s_{2,i,j}(\theta, \gamma) \equiv \partial_{\pi_{j,i}} \ell_{i,j}(\theta, \gamma) = \frac{y_{i,j}\, p_{2,i,j}(\theta, \gamma)}{P_{i,j}(\theta, \gamma)} + \frac{y_{j,i}\, p_{1,j,i}(\theta, \gamma)}{P_{j,i}(\theta, \gamma)} - \frac{(1 - y_{i,j} - y_{j,i})[p_{2,i,j}(\theta, \gamma) + p_{1,j,i}(\theta, \gamma)]}{1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)}$$
$$s^{A_k}_{i,j}(\theta, \gamma) \equiv \partial_{A_k} \ell_{i,j}(\theta, \gamma) = s_{1,i,j}(\theta, \gamma) 1\{i = k\} + s_{2,i,j}(\theta, \gamma) 1\{j = k\}$$
$$s^{B_k}_{i,j}(\theta, \gamma) \equiv \partial_{B_k} \ell_{i,j}(\theta, \gamma) = s_{1,i,j}(\theta, \gamma) 1\{j = k\} + s_{2,i,j}(\theta, \gamma) 1\{i = k\}$$
$$s^{A}_{i,j}(\theta, \gamma) \equiv \partial_{A_-} \ell_{i,j}(\theta, \gamma) = s_{1,i,j}(\theta, \gamma) \chi_{n,i,-} + s_{2,i,j}(\theta, \gamma) \chi_{n,j,-}$$
$$s^{B}_{i,j}(\theta, \gamma) \equiv \partial_{B} \ell_{i,j}(\theta, \gamma) = s_{1,i,j}(\theta, \gamma) \chi_{n,j} + s_{2,i,j}(\theta, \gamma) \chi_{n,i}$$
$$\xi_{i,j}(\theta, \gamma) \equiv \partial_\theta P_{i,j}(\theta, \gamma) = \left[p_{1,i,j}(\theta, \gamma) Z_{i,j}^\top + p_{2,i,j}(\theta, \gamma) Z_{j,i}^\top,\; p_{2,i,j}(\theta, \gamma),\; -H_\rho(W_{i,j}^\top \Pi, W_{j,i}^\top \Pi + \alpha; \rho)\right]^\top$$
$$s^{\theta}_{i,j}(\theta, \gamma) \equiv \partial_\theta \ell_{i,j}(\theta, \gamma) = \frac{y_{i,j}\, \xi_{i,j}(\theta, \gamma)}{P_{i,j}(\theta, \gamma)} + \frac{y_{j,i}\, \xi_{j,i}(\theta, \gamma)}{P_{j,i}(\theta, \gamma)} - \frac{(1 - y_{i,j} - y_{j,i})[\xi_{i,j}(\theta, \gamma) + \xi_{j,i}(\theta, \gamma)]}{1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)},$$
where $\chi_{n,i,-}$ and $\chi_{n,j,-}$ are $(n - 1) \times 1$ vectors defined by removing the first element of $\chi_{n,i}$ and $\chi_{n,j}$, respectively. Using these notations, we can write
$$S_{n,\theta}(\theta, \gamma) \equiv \partial_\theta \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \sum_{i=1}^{n} \sum_{j > i} s^{\theta}_{i,j}(\theta, \gamma)$$
$$S_{n,\gamma}(\theta, \gamma) \equiv \partial_{\gamma_-} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \sum_{i=1}^{n} \sum_{j > i} \left[s^{A}_{i,j}(\theta, \gamma)^\top, s^{B}_{i,j}(\theta, \gamma)^\top\right]^\top.$$
Further, writing $\ell_{i,j}(\delta) \equiv \ell_{i,j}\left(\theta, \left\{\sum_{k=1}^{K_A} a_k \cdot 1\{i \in C^A_{0,k}\}\right\}, \left\{\sum_{k=1}^{K_B} b_k \cdot 1\{i \in C^B_{0,k}\}\right\}\right)$ so that $\mathcal{L}_n(\delta) = \frac{2}{N} \sum_{i=1}^{n} \sum_{j > i} \ell_{i,j}(\delta)$, we define
$$s^{a_k}_{i,j}(\delta) \equiv \partial_{a_k} \ell_{i,j}(\delta) = s_{1,i,j}(\delta) 1\{i \in C^A_{0,k}\} + s_{2,i,j}(\delta) 1\{j \in C^A_{0,k}\}$$
$$s^{b_k}_{i,j}(\delta) \equiv \partial_{b_k} \ell_{i,j}(\delta) = s_{1,i,j}(\delta) 1\{j \in C^B_{0,k}\} + s_{2,i,j}(\delta) 1\{i \in C^B_{0,k}\}$$
$$s^{\delta}_{i,j}(\delta) \equiv \partial_\delta \ell_{i,j}(\delta) = \left[s^{\theta}_{i,j}(\delta)^\top, s^{a_2}_{i,j}(\delta), \ldots, s^{a_{K_A}}_{i,j}(\delta), s^{b_1}_{i,j}(\delta), \ldots, s^{b_{K_B}}_{i,j}(\delta)\right]^\top,$$
where the definitions of $s_{1,i,j}(\delta)$, $s_{2,i,j}(\delta)$, and $s^{\theta}_{i,j}(\delta)$ should be clear from the context.

Hessian matrix

Define
$$h_{11,i,j}(\theta, \gamma) \equiv \partial_{\pi_{i,j}} s_{1,i,j}(\theta, \gamma)$$
$$h_{12,i,j}(\theta, \gamma) \equiv \partial_{\pi_{j,i}} s_{1,i,j}(\theta, \gamma) = \partial_{\pi_{i,j}} s_{2,i,j}(\theta, \gamma)$$
$$h_{22,i,j}(\theta, \gamma) \equiv \partial_{\pi_{j,i}} s_{2,i,j}(\theta, \gamma).$$
It is easy to see that
$$E h_{11,i,j}(\theta_0, \gamma_0) = -\frac{p_{1,i,j}^2}{P_{i,j}} - \frac{p_{2,j,i}^2}{P_{j,i}} - \frac{[p_{1,i,j} + p_{2,j,i}]^2}{1 - P_{i,j} - P_{j,i}}$$
$$E h_{12,i,j}(\theta_0, \gamma_0) = -\frac{p_{1,i,j}\, p_{2,i,j}}{P_{i,j}} - \frac{p_{2,j,i}\, p_{1,j,i}}{P_{j,i}} - \frac{[p_{1,i,j} + p_{2,j,i}][p_{2,i,j} + p_{1,j,i}]}{1 - P_{i,j} - P_{j,i}}, \tag{A.1}$$
where we have used $p_{1,i,j}$, $p_{2,i,j}$, and $P_{i,j}$ to denote $p_{1,i,j}(\theta_0, \gamma_0)$, $p_{2,i,j}(\theta_0, \gamma_0)$, and $P_{i,j}(\theta_0, \gamma_0)$, respectively, for simplicity. Hereinafter, when the dependence on the parameters $(\theta, \gamma)$ is suppressed, it means that the functions are evaluated at the true value $(\theta_0, \gamma_0)$.

Note that, since $\ell_{i,j}(\theta, \gamma) = \ell_{j,i}(\theta, \gamma)$, we have $h_{11,i,j}(\theta, \gamma) = \partial_{\pi_{i,j}\pi_{i,j}} \ell_{i,j}(\theta, \gamma) = \partial_{\pi_{i,j}\pi_{i,j}} \ell_{j,i}(\theta, \gamma) = h_{22,j,i}(\theta, \gamma)$ and $h_{12,i,j}(\theta, \gamma) = h_{12,j,i}(\theta, \gamma)$. By tedious calculations, we have
$$\partial_{A_l A_k} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \sum_{j \neq k} h_{11,k,j}(\theta, \gamma) 1\{l = k\} + \frac{2}{N} h_{12,l,k}(\theta, \gamma) 1\{l \neq k\} \quad (\text{for } l, k \ge 2)$$
$$\partial_{B_l B_k} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \sum_{j \neq k} h_{11,j,k}(\theta, \gamma) 1\{l = k\} + \frac{2}{N} h_{12,l,k}(\theta, \gamma) 1\{l \neq k\} \quad (\text{for } l, k \ge 1)$$
$$\partial_{A_l B_k} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \sum_{j \neq k} h_{12,k,j}(\theta, \gamma) 1\{l = k\} + \frac{2}{N} h_{11,l,k}(\theta, \gamma) 1\{l \neq k\} \quad (\text{for } l \ge 2,\ k \ge 1).$$
Hence,
$$\mathcal{H}_{n,AA}(\theta, \gamma) \equiv \partial_{A_- A_-^\top} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \begin{pmatrix} \sum_{j \neq 2} h_{11,2,j}(\theta, \gamma) & \cdots & h_{12,2,n}(\theta, \gamma) \\ \vdots & \ddots & \vdots \\ h_{12,n,2}(\theta, \gamma) & \cdots & \sum_{j \neq n} h_{11,n,j}(\theta, \gamma) \end{pmatrix}$$
$$\mathcal{H}_{n,BB}(\theta, \gamma) \equiv \partial_{B B^\top} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \begin{pmatrix} \sum_{j \neq 1} h_{11,j,1}(\theta, \gamma) & \cdots & h_{12,1,n}(\theta, \gamma) \\ \vdots & \ddots & \vdots \\ h_{12,n,1}(\theta, \gamma) & \cdots & \sum_{j \neq n} h_{11,j,n}(\theta, \gamma) \end{pmatrix}$$
$$\mathcal{H}_{n,AB}(\theta, \gamma) \equiv \partial_{A_- B^\top} \mathcal{L}_n(\theta, \gamma) = \frac{2}{N} \begin{pmatrix} h_{11,2,1}(\theta, \gamma) & \sum_{j \neq 2} h_{12,2,j}(\theta, \gamma) & \cdots & h_{11,2,n}(\theta, \gamma) \\ \vdots & \vdots & \ddots & \vdots \\ h_{11,n,1}(\theta, \gamma) & h_{11,n,2}(\theta, \gamma) & \cdots & \sum_{j \neq n} h_{12,n,j}(\theta, \gamma) \end{pmatrix}$$
$$\mathcal{H}_{n,\gamma\gamma}(\theta, \gamma) = \begin{pmatrix} \mathcal{H}_{n,AA}(\theta, \gamma) & \mathcal{H}_{n,AB}(\theta, \gamma) \\ \mathcal{H}_{n,BA}(\theta, \gamma) & \mathcal{H}_{n,BB}(\theta, \gamma) \end{pmatrix} \tag{A.2}$$

B Proofs of Theorems
Proof of Theorem 2.3 (i) We first confirm that the true parameter vector $(\theta_0,\gamma_0)$ is a maximizer of $\mathbb{E}\mathcal{L}_n(\theta,\gamma)$. We can observe that
$$
\begin{aligned}
&\mathbb{E}\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta_0,\gamma_0) \\
&= \frac{1}{N}\sum_{i=1}^n \sum_{j \neq i} \mathbb{E}\Big\{ \ln\big[ P_{i,j}(\theta,\gamma)^{y_{i,j}} P_{j,i}(\theta,\gamma)^{y_{j,i}} [1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)]^{1 - y_{i,j} - y_{j,i}} \big] - \ln\big[ P_{i,j}^{y_{i,j}} P_{j,i}^{y_{j,i}} [1 - P_{i,j} - P_{j,i}]^{1 - y_{i,j} - y_{j,i}} \big] \Big\} \\
&= \frac{1}{N}\sum_{i=1}^n \sum_{j \neq i} \mathbb{E}\left\{ \ln \frac{P_{i,j}(\theta,\gamma)^{y_{i,j}} P_{j,i}(\theta,\gamma)^{y_{j,i}} [1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)]^{1 - y_{i,j} - y_{j,i}}}{P_{i,j}^{y_{i,j}} P_{j,i}^{y_{j,i}} [1 - P_{i,j} - P_{j,i}]^{1 - y_{i,j} - y_{j,i}}} \right\} \\
&\leq \frac{1}{N}\sum_{i=1}^n \sum_{j \neq i} \ln \mathbb{E}\left\{ \frac{P_{i,j}(\theta,\gamma)^{y_{i,j}} P_{j,i}(\theta,\gamma)^{y_{j,i}} [1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)]^{1 - y_{i,j} - y_{j,i}}}{P_{i,j}^{y_{i,j}} P_{j,i}^{y_{j,i}} [1 - P_{i,j} - P_{j,i}]^{1 - y_{i,j} - y_{j,i}}} \right\},
\end{aligned}
\tag{B.1}
$$
where the last inequality follows from Jensen's inequality. Further,
$$
\begin{aligned}
&\mathbb{E}\left\{ \frac{P_{i,j}(\theta,\gamma)^{y_{i,j}} P_{j,i}(\theta,\gamma)^{y_{j,i}} [1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)]^{1 - y_{i,j} - y_{j,i}}}{P_{i,j}^{y_{i,j}} P_{j,i}^{y_{j,i}} [1 - P_{i,j} - P_{j,i}]^{1 - y_{i,j} - y_{j,i}}} \right\} \\
&= \mathbb{E}[y_{i,j}]\, \frac{P_{i,j}(\theta,\gamma)}{P_{i,j}} + \mathbb{E}[y_{j,i}]\, \frac{P_{j,i}(\theta,\gamma)}{P_{j,i}} + \mathbb{E}[1 - y_{i,j} - y_{j,i}]\, \frac{1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)}{1 - P_{i,j} - P_{j,i}} = 1,
\end{aligned}
$$
implying that the left-hand side of (B.1) is at most zero for any given $(\theta,\gamma)$. Then, since $\rho_0$ is known, it is sufficient to show that
$$
\frac{1}{N}\sum_{i=1}^n \sum_{j \neq i} 1\big\{ P_{i,j}((\beta^\top, \alpha, \rho_0)^\top, \gamma) \neq P_{i,j}(\theta_0,\gamma_0) \big\} > 0
$$
for sufficiently large $n$ and for all $(\beta,\alpha,\gamma) \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n$ such that $(\beta,\alpha,\gamma) \neq (\beta_0,\alpha_0,\gamma_0)$. The existence of pairs satisfying $P_{i,j}((\beta^\top, \alpha, \rho_0)^\top, \gamma) \neq P_{i,j}(\theta_0,\gamma_0)$ contributes to a non-negligible difference between $\mathbb{E}\mathcal{L}_n(\theta,\gamma)$ and $\mathbb{E}\mathcal{L}_n(\theta_0,\gamma_0)$, allowing us to distinguish $(\theta,\gamma)$ from $(\theta_0,\gamma_0)$.
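The Jensen step in (B.1) can be checked numerically for a single dyad. The following is a minimal sketch, assuming that each dyad falls into one of three mutually exclusive categories ($y_{i,j} = 1$, $y_{j,i} = 1$, or neither), as the exponent $1 - y_{i,j} - y_{j,i}$ suggests. The function name `dyad_ratio_moments` and the probability values are hypothetical choices for illustration and are not taken from the paper.

```python
import math


def dyad_ratio_moments(p_true, p_alt):
    """Return (E[ln ratio], E[ratio]) for one dyad.

    p_true, p_alt: pairs (P_ij, P_ji) under the true and an alternative
    parameter value; the third category has probability 1 - P_ij - P_ji.
    Expectations are taken under the true probabilities.
    """
    q0 = (p_true[0], p_true[1], 1.0 - p_true[0] - p_true[1])
    q1 = (p_alt[0], p_alt[1], 1.0 - p_alt[0] - p_alt[1])
    # E[ln(likelihood ratio)]: the summand on the second line of (B.1)
    e_log = sum(a * math.log(b / a) for a, b in zip(q0, q1))
    # E[likelihood ratio]: the quantity shown to equal one after (B.1)
    e_ratio = sum(a * (b / a) for a, b in zip(q0, q1))
    return e_log, e_ratio


# Hypothetical probabilities, for illustration only.
e_log, e_ratio = dyad_ratio_moments((0.30, 0.20), (0.25, 0.35))
print(e_log, e_ratio)
```

Under these assumptions the expected ratio equals one up to floating-point error, while the expected log ratio is strictly negative whenever the two probability vectors differ, which is the content of the inequality in (B.1).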
Here, by Assumptions 2.1(i) and (ii), $F(a) - H(a, b; \rho)$ is strictly increasing in $a$ and decreasing in $b$, respectively. Therefore, we have
$$
\begin{aligned}
W_{i,j}^\top \Pi > W_{i,j}^\top \Pi_0,\ \ W_{j,i}^\top \Pi + \alpha < W_{j,i}^\top \Pi_0 + \alpha_0 &\implies P_{i,j}((\beta^\top, \alpha, \rho_0)^\top, \gamma) > P_{i,j}(\theta_0,\gamma_0), \\
W_{i,j}^\top \Pi < W_{i,j}^\top \Pi_0,\ \ W_{j,i}^\top \Pi + \alpha > W_{j,i}^\top \Pi_0 + \alpha_0 &\implies P_{i,j}((\beta^\top, \alpha, \rho_0)^\top, \gamma) < P_{i,j}(\theta_0,\gamma_0).
\end{aligned}
$$
Then, Assumption 2.4 gives the desired result.

(ii) Since $(\theta_0,\gamma_0)$ is a maximizer of $\mathbb{E}\mathcal{L}_n(\theta,\gamma)$ as confirmed above, $\rho_0$ must be a maximizer of $\mathcal{L}_n^*(\rho)$, where the definition of $\mathcal{L}_n^*(\rho)$ can be found in (2.5), and $(\widetilde{\beta}(\rho_0), \widetilde{\alpha}(\rho_0), \widetilde{\gamma}(\rho_0)) = (\beta_0, \alpha_0, \gamma_0)$ holds. For all $\rho \in \mathcal{R}$, $\mathbb{E}\mathcal{L}_n((\beta^\top, \alpha, \rho)^\top, \gamma) - \mathbb{E}\mathcal{L}_n((\widetilde{\beta}(\rho)^\top, \widetilde{\alpha}(\rho), \rho)^\top, \widetilde{\gamma}(\rho)) \leq 0$ holds by definition. Then, by the same argument as in the proof of (i), we can identify $(\widetilde{\beta}(\rho), \widetilde{\alpha}(\rho), \widetilde{\gamma}(\rho))$ uniquely for all $\rho \in \mathcal{R}$, as Assumption 2.4 is independent of the value of $\rho$. Thus, if $\rho_0$ is identified as a unique maximizer of $\mathcal{L}_n^*(\rho)$, all the parameters of the model are identified.

Proof of Theorem 3.1 (i) First, note that Assumptions 2.1–2.3 imply that there exist constants $\kappa_1, \kappa_2 \in (0, 1)$ such that $P_{i,j}(\theta,\gamma) \in (\kappa_1, 1 - \kappa_1)$ and $1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma) \in (\kappa_2, 1 - \kappa_2)$ for all possible parameter values. Observe that
$$
\begin{aligned}
\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta,\gamma)
&= \frac{1}{N}\sum_{i=1}^n \sum_{j \neq i} \big[ 2(y_{i,j} - \mathbb{E} y_{i,j}) \ln P_{i,j}(\theta,\gamma) - 2(y_{i,j} - \mathbb{E} y_{i,j}) \ln(1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)) \big] \\
&= \frac{2}{N}\sum_{i=1}^n \sum_{j \neq i} (y_{i,j} - \mathbb{E} y_{i,j})\, \psi_{i,j}(\theta,\gamma),
\end{aligned}
$$
where $\psi_{i,j}(\theta,\gamma) \equiv \ln\left[ P_{i,j}(\theta,\gamma) / (1 - P_{i,j}(\theta,\gamma) - P_{j,i}(\theta,\gamma)) \right]$. Let $\bar{\psi} \equiv \ln((1 - \kappa_1)/\kappa_2)$, so that
$$
-(1 - \kappa_1)\bar{\psi} < (y_{i,j} - \mathbb{E} y_{i,j})\, \psi_{i,j}(\theta,\gamma) < (1 - \kappa_1)\bar{\psi},
$$
where the inequalities are uniform in $(\theta,\gamma) \in \Theta \times \mathbb{C}_n$.
By the triangle inequality,
$$
|\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta,\gamma)| \leq \frac{2}{n}\sum_{i=1}^n \left| \frac{1}{n-1}\sum_{j \neq i} (y_{i,j} - \mathbb{E} y_{i,j})\, \psi_{i,j}(\theta,\gamma) \right|.
$$
Further, by Hoeffding's inequality,
$$
\Pr\left( \left| \frac{1}{n-1}\sum_{j \neq i} (y_{i,j} - \mathbb{E} y_{i,j})\, \psi_{i,j}(\theta,\gamma) \right| > t \right)
\leq 2\exp\left( -\frac{2(n-1)^2 t^2}{\sum_{j \neq i} (2(1 - \kappa_1)\bar{\psi})^2} \right)
= 2\exp\left( -\frac{(n-1)\, t^2}{2(1 - \kappa_1)^2 \bar{\psi}^2} \right).
$$
Hence, Boole's inequality gives
$$
\Pr\left( \max_{1 \leq i \leq n} \left| \frac{1}{n-1}\sum_{j \neq i} (y_{i,j} - \mathbb{E} y_{i,j})\, \psi_{i,j}(\theta,\gamma) \right| > t \right)
\leq 2n \exp\left( -\frac{(n-1)\, t^2}{2(1 - \kappa_1)^2 \bar{\psi}^2} \right).
$$
Setting $t = C\sqrt{\ln n / n}$ for a sufficiently large constant $C > 0$, we have
$$
2n \exp\left( -\frac{(n-1)\, t^2}{2(1 - \kappa_1)^2 \bar{\psi}^2} \right)
= 2n \exp\left( -\frac{n-1}{2(1 - \kappa_1)^2 \bar{\psi}^2} \cdot \frac{C^2 \ln n}{n} \right)
= 2 \exp\left( \ln n - \left( \frac{C^2 (n-1)/n}{2(1 - \kappa_1)^2 \bar{\psi}^2} \right) \ln n \right) \to 0
$$
as $n \to \infty$. This implies that
$$
\sup_{(\theta,\gamma) \in \Theta \times \mathbb{C}_n} |\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta,\gamma)| = O_P\left( \sqrt{\frac{\ln n}{n}} \right).
\tag{B.2}
$$
Then, with Assumption 3.1, the rest of the proof follows from the same argument as in the proof of Theorem 2 in Graham (2017).

(ii) (iii) We prove the result by contradiction. Suppose that there exists a positive constant $c$ such that
$$
\max\left\{ \frac{1}{n}\sum_{i=1}^n \left| \widehat{A}_{n,i} - A_{0,i} \right|,\ \frac{1}{n}\sum_{i=1}^n \left| \widehat{B}_{n,i} - B_{0,i} \right| \right\} \geq c > 0
$$
w.p.a.1. Under Assumption 2.2(ii), this implies that there is a non-vanishing portion of observations for which $\widehat{A}_{n,i}$, $\widehat{B}_{n,i}$, or both lie outside the neighborhood of $A_{0,i}$ and $B_{0,i}$, respectively.
Therefore, by Assumption 3.1, there exist a constant $\eta(c) > 0$ and $n(c) < \infty$ such that
$$
4\eta(c) < \mathbb{E}\mathcal{L}_n(\theta_0, \mathbf{A}_0, \mathbf{B}_0) - \mathbb{E}\mathcal{L}_n(\theta_0, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n)
\tag{B.3}
$$
for all $n \geq n(c)$. Note that (B.2) implies that
$$
\mathbb{E}\mathcal{L}_n(\theta_0, \mathbf{A}_0, \mathbf{B}_0) < \mathcal{L}_n(\theta_0, \mathbf{A}_0, \mathbf{B}_0) + \eta(c)
\tag{B.4}
$$
w.p.a.1. By the definition of the ML estimator,
$$
\mathcal{L}_n(\theta_0, \mathbf{A}_0, \mathbf{B}_0) < \mathcal{L}_n(\widehat{\theta}_n, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n) + \eta(c).
\tag{B.5}
$$
In addition, by the continuous mapping theorem and result (i), we have
$$
\mathcal{L}_n(\widehat{\theta}_n, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n) < \mathcal{L}_n(\theta_0, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n) + \eta(c)
\tag{B.6}
$$
w.p.a.1. Now, combining the inequalities (B.3)–(B.6) gives
$$
\begin{aligned}
\mathbb{E}\mathcal{L}_n(\theta_0, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n)
&< \mathbb{E}\mathcal{L}_n(\theta_0, \mathbf{A}_0, \mathbf{B}_0) - 4\eta(c) \\
&< \mathcal{L}_n(\theta_0, \mathbf{A}_0, \mathbf{B}_0) - 3\eta(c) \\
&< \mathcal{L}_n(\widehat{\theta}_n, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n) - 2\eta(c) \\
&< \mathcal{L}_n(\theta_0, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n) - \eta(c)
\end{aligned}
$$
w.p.a.1. The last line implies that $\eta(c) < \mathcal{L}_n(\theta_0, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n) - \mathbb{E}\mathcal{L}_n(\theta_0, \widehat{\mathbf{A}}_n, \widehat{\mathbf{B}}_n)$ w.p.a.1; however, this contradicts (B.2). Hence, as the choice of $c$ is arbitrary, we obtain the desired result.

(iv) Note that, for each $i$ ($i \neq 1$), it holds that
$$
\widehat{A}_{n,i} = \operatorname*{argmax}_{A_i \in \mathbb{A}} \mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n), \qquad
A_{0,i} = \operatorname*{argmax}_{A_i \in \mathbb{A}} \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0),
$$
where $\mathbf{A}_{-i} \equiv (A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_n)^\top$, and $\mathcal{L}_n(\theta, A_i, \mathbf{A}_{-i}, \mathbf{B}) = \mathcal{L}_n(\theta, \mathbf{A}, \mathbf{B})$. Pick any $c > 0$, and let $\mathbb{A}_i^c \equiv \{A \in \mathbb{A} : |A - A_{0,i}| \geq c\}$. Define $\varepsilon_n(c)$ as follows:
$$
\varepsilon_n(c) \equiv \min_{2 \leq i \leq n} \left[ \mathbb{E}\mathcal{L}_n(\theta_0, A_{0,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0) - \max_{A_i \in \mathbb{A}_i^c} \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) \right].
$$
By Assumption 3.1, there exists $n(c) < \infty$ such that $\varepsilon_n(c)$ is strictly larger than zero for all $n \geq n(c)$. By the definition of $\widehat{A}_{n,i}$, we have
$$
\mathcal{L}_n(\widehat{\theta}_n, \widehat{A}_{n,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) > \mathcal{L}_n(\widehat{\theta}_n, A_{0,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \varepsilon_n(c)/5.
\tag{B.7}
$$
By the triangle inequality,
$$
\begin{aligned}
&\left| \mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) \right| \\
&\leq \left| \mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) \right|
+ \left| \mathcal{L}_n(\theta_0, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \widehat{\mathbf{B}}_n) \right| \\
&\quad + \left| \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \widehat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) \right| \\
&\leq \left| \partial_{\mathbf{A}_{-i}^\top} \mathcal{L}_n(\theta_0, A_i, \bar{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) [\widehat{\mathbf{A}}_{n,-i} - \mathbf{A}_{0,-i}] \right|
+ \left| \partial_{\mathbf{B}^\top} \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \bar{\mathbf{B}}_n) [\widehat{\mathbf{B}}_n - \mathbf{B}_0] \right| + o_P(1),
\end{aligned}
$$
where the second inequality follows from the mean value expansion with result (i), $\bar{\mathbf{A}}_{n,-i} \in [\widehat{\mathbf{A}}_{n,-i}, \mathbf{A}_{0,-i}]$, and $\bar{\mathbf{B}}_n \in [\widehat{\mathbf{B}}_n, \mathbf{B}_0]$.
Here, the first term on the right-hand side has the following form:
$$
\begin{aligned}
&\partial_{\mathbf{A}_{-k}^\top} \mathcal{L}_n(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n) [\widehat{\mathbf{A}}_{n,-k} - \mathbf{A}_{0,-k}] \\
&= \frac{2}{N}\sum_{i=1}^n \sum_{j > i} \partial_{\mathbf{A}_{-k}^\top} \ell_{i,j}(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n) [\widehat{\mathbf{A}}_{n,-k} - \mathbf{A}_{0,-k}] \\
&= \frac{2}{N}\sum_{i=1}^n \sum_{j > i} s_{1,i,j}(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n)\, \chi_{n,i,-k}^\top [\widehat{\mathbf{A}}_{n,-k} - \mathbf{A}_{0,-k}]
+ \frac{2}{N}\sum_{i=1}^n \sum_{j > i} s_{2,i,j}(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n)\, \chi_{n,j,-k}^\top [\widehat{\mathbf{A}}_{n,-k} - \mathbf{A}_{0,-k}] \\
&= \frac{2}{N}\sum_{i=1}^n \sum_{j > i} s_{1,i,j}(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n) [\widehat{A}_{n,i} - A_{0,i}]\, 1\{i \neq k\}
+ \frac{2}{N}\sum_{i=1}^n \sum_{j > i} s_{2,i,j}(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n) [\widehat{A}_{n,j} - A_{0,j}]\, 1\{j \neq k\} \\
&= \frac{2}{N}\sum_{i=1}^n \sum_{j \neq i} s_{1,i,j}(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n) [\widehat{A}_{n,i} - A_{0,i}]\, 1\{i \neq k\},
\end{aligned}
$$
where $\chi_{n,i,-k}$ and $\chi_{n,j,-k}$ are $(n-1) \times 1$ vectors defined by removing the $k$-th element of $\chi_{n,i}$ and $\chi_{n,j}$, respectively, and the last equality holds because $s_{2,i,j}(\theta, \mathbf{A}, \mathbf{B}) = \partial_{\pi_{j,i}} \ell_{i,j}(\theta, \mathbf{A}, \mathbf{B}) = \partial_{\pi_{j,i}} \ell_{j,i}(\theta, \mathbf{A}, \mathbf{B}) = s_{1,j,i}(\theta, \mathbf{A}, \mathbf{B})$. Then, for a constant $c > 0$ independent of $A_k$ and $k$, we have
$$
\left| \partial_{\mathbf{A}_{-k}^\top} \mathcal{L}_n(\theta_0, A_k, \bar{\mathbf{A}}_{n,-k}, \widehat{\mathbf{B}}_n) [\widehat{\mathbf{A}}_{n,-k} - \mathbf{A}_{0,-k}] \right| \leq \frac{c}{n}\sum_{i=1}^n \left| \widehat{A}_{n,i} - A_{0,i} \right| = o_P(1)
$$
by result (ii). Based on the same argument, we can also show that $\left| \partial_{\mathbf{B}^\top} \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \bar{\mathbf{B}}_n) [\widehat{\mathbf{B}}_n - \mathbf{B}_0] \right| = o_P(1)$, implying that $\left| \mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) \right| = o_P(1)$ uniformly in $A_i \in \mathbb{A}$ and $i$.
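Similarly, we can show that $\left| \mathbb{E}\mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) \right| = o_P(1)$. Hence, the following inequalities hold w.p.a.1:
$$
\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) > \mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \varepsilon_n(c)/5
\tag{B.8}
$$
$$
\mathbb{E}\mathcal{L}_n(\widehat{\theta}_n, A_i, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) > \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) - \varepsilon_n(c)/5
\tag{B.9}
$$
uniformly in $A_i \in \mathbb{A}$ and $i$. In addition, (B.2) implies that
$$
\mathbb{E}\mathcal{L}_n(\theta_0, \widehat{A}_{n,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0) > \mathcal{L}_n(\theta_0, \widehat{A}_{n,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0) - \varepsilon_n(c)/5
\tag{B.10}
$$
$$
\mathcal{L}_n(\widehat{\theta}_n, A_{0,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) > \mathbb{E}\mathcal{L}_n(\widehat{\theta}_n, A_{0,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - \varepsilon_n(c)/5
\tag{B.11}
$$
w.p.a.1. Then, combining the inequalities (B.7) and (B.8)–(B.11) yields
$$
\begin{aligned}
\mathbb{E}\mathcal{L}_n(\theta_0, \widehat{A}_{n,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0)
&> \mathcal{L}_n(\theta_0, \widehat{A}_{n,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0) - \varepsilon_n(c)/5 \\
&> \mathcal{L}_n(\widehat{\theta}_n, \widehat{A}_{n,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - 2\varepsilon_n(c)/5 \\
&> \mathcal{L}_n(\widehat{\theta}_n, A_{0,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - 3\varepsilon_n(c)/5 \\
&> \mathbb{E}\mathcal{L}_n(\widehat{\theta}_n, A_{0,i}, \widehat{\mathbf{A}}_{n,-i}, \widehat{\mathbf{B}}_n) - 4\varepsilon_n(c)/5 \\
&> \mathbb{E}\mathcal{L}_n(\theta_0, A_{0,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0) - \varepsilon_n(c) \\
&= \max_{A_i \in \mathbb{A}_i^c} \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0)
+ \underbrace{\left[ \mathbb{E}\mathcal{L}_n(\theta_0, A_{0,i}, \mathbf{A}_{0,-i}, \mathbf{B}_0) - \max_{A_i \in \mathbb{A}_i^c} \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0) \right] - \varepsilon_n(c)}_{\geq 0} \\
&\geq \max_{A_i \in \mathbb{A}_i^c} \mathbb{E}\mathcal{L}_n(\theta_0, A_i, \mathbf{A}_{0,-i}, \mathbf{B}_0)
\end{aligned}
$$
w.p.a.1 for all $i$. The last line implies that $\widehat{A}_{n,i} \notin \mathbb{A}_i^c$. As the choice of $c$ is arbitrary, this further implies that $\max_{1 \leq i \leq n} |\widehat{A}_{n,i} - A_{0,i}| \overset{p}{\to} 0$. Analogously, we can also show that $\max_{1 \leq i \leq n} |\widehat{B}_{n,i} - B_{0,i}| \overset{p}{\to} 0$.

Lemma B.1.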
For any $(\theta,\gamma) \in \Theta \times \mathbb{C}_n$ such that $\|\theta - \theta_0\| = o(1)$ and $\|\gamma - \gamma_0\|_\infty = o(1)$,
(i) $\max_{1 \leq k \leq n} \left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta,\gamma) - \mathbb{E} h_{11,k,j}(\theta_0,\gamma_0) \right) \right| = o_P(1)$,
(ii) $\max_{1 \leq k \leq n} \left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,j,k}(\theta,\gamma) - \mathbb{E} h_{11,j,k}(\theta_0,\gamma_0) \right) \right| = o_P(1)$.

Proof. We only prove (i) since (ii) is completely analogous. By the triangle inequality,
$$
\begin{aligned}
\left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta,\gamma) - \mathbb{E} h_{11,k,j}(\theta_0,\gamma_0) \right) \right|
&\leq \left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0) \right) \right| \\
&\quad + \left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta_0,\gamma_0) - \mathbb{E} h_{11,k,j}(\theta_0,\gamma_0) \right) \right|.
\end{aligned}
\tag{B.12}
$$
With Assumption 2.1(iii), the mean value expansion gives
$$
\begin{aligned}
h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0)
&= h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma) + h_{11,k,j}(\theta_0,\gamma) - h_{11,k,j}(\theta_0,\gamma_0) \\
&= \partial_{\theta^\top} h_{11,k,j}(\bar{\theta},\gamma)[\theta - \theta_0] + \partial_{\mathbf{A}_{-1}^\top} h_{11,k,j}(\theta_0,\bar{\gamma})[\mathbf{A}_{-1} - \mathbf{A}_{0,-1}] + \partial_{\mathbf{B}^\top} h_{11,k,j}(\theta_0,\bar{\gamma})[\mathbf{B} - \mathbf{B}_0],
\end{aligned}
$$
where $\bar{\theta} \in [\theta, \theta_0]$, and $\bar{\gamma} \in [\gamma, \gamma_0]$. Further, letting $h_{111,k,j}(\theta,\gamma) \equiv \partial_{\pi_{k,j}} h_{11,k,j}(\theta,\gamma)$ and $h_{112,k,j}(\theta,\gamma) \equiv \partial_{\pi_{j,k}} h_{11,k,j}(\theta,\gamma)$, we have
$$
\begin{aligned}
\partial_{\mathbf{A}_{-1}^\top} h_{11,k,j}(\theta_0,\bar{\gamma})[\mathbf{A}_{-1} - \mathbf{A}_{0,-1}] &= h_{111,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,k,-1}^\top [\mathbf{A}_{-1} - \mathbf{A}_{0,-1}] + h_{112,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,j,-1}^\top [\mathbf{A}_{-1} - \mathbf{A}_{0,-1}], \\
\partial_{\mathbf{B}^\top} h_{11,k,j}(\theta_0,\bar{\gamma})[\mathbf{B} - \mathbf{B}_0] &= h_{111,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,j}^\top [\mathbf{B} - \mathbf{B}_0] + h_{112,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,k}^\top [\mathbf{B} - \mathbf{B}_0].
\end{aligned}
$$
Then, for some large constant $c > 0$,
$$
\begin{aligned}
\left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0) \right) \right|
&\leq \left| \frac{1}{n-1}\sum_{j \neq k} \partial_{\theta^\top} h_{11,k,j}(\bar{\theta},\gamma)[\theta - \theta_0] \right| \\
&\quad + \left| \frac{1}{n-1}\sum_{j \neq k} h_{111,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,k,-1}^\top [\mathbf{A}_{-1} - \mathbf{A}_{0,-1}] \right|
+ \left| \frac{1}{n-1}\sum_{j \neq k} h_{112,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,j,-1}^\top [\mathbf{A}_{-1} - \mathbf{A}_{0,-1}] \right| \\
&\quad + \left| \frac{1}{n-1}\sum_{j \neq k} h_{111,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,j}^\top [\mathbf{B} - \mathbf{B}_0] \right|
+ \left| \frac{1}{n-1}\sum_{j \neq k} h_{112,k,j}(\theta_0,\bar{\gamma})\, \chi_{n,k}^\top [\mathbf{B} - \mathbf{B}_0] \right| \\
&\leq c\, \|\theta - \theta_0\| + 2c\, \|\mathbf{A} - \mathbf{A}_0\|_\infty + 2c\, \|\mathbf{B} - \mathbf{B}_0\|_\infty.
\end{aligned}
$$
As the right-hand side term in the last line is independent of $k$, we have $\left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0) \right) \right| = o(1)$ for all $k$ for any $(\theta,\gamma)$ such that $\|\theta - \theta_0\| = o(1)$ and $\|\gamma - \gamma_0\|_\infty = o(1)$.

For the second term of (B.12), note that the random components involved in $h_{11,k,j}(\theta_0,\gamma_0)$ are only $(y_{k,j}, y_{j,k})$ and, thus, that $\{h_{11,k,j}(\theta_0,\gamma_0)\}_{j \neq k}$ are independent by Assumption 2.1(ii).
Further, as $h_{11,k,j}(\theta_0,\gamma_0)$ is uniformly bounded, using Hoeffding's and Boole's inequalities similarly as above, we can show that
$$
\max_{1 \leq k \leq n} \left| \frac{1}{n-1}\sum_{j \neq k} \left( h_{11,k,j}(\theta_0,\gamma_0) - \mathbb{E} h_{11,k,j}(\theta_0,\gamma_0) \right) \right| = O_P\left( \sqrt{\frac{\ln n}{n}} \right).
$$
This completes the proof.
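The combination of Hoeffding's and Boole's inequalities used above can be illustrated numerically. The following is a minimal sketch under simplifying assumptions that are not part of the paper: the summands are centered Bernoulli draws with a common success probability, so each term lies in $[-b, b]$ with $b = 1$, and the function names and the constant $C = 2$ are hypothetical choices. With $t = C\sqrt{\ln n / n}$ and $C^2 > 2b^2$, the bound $2n\exp(-(n-1)t^2/(2b^2))$ shrinks as $n$ grows.

```python
import math
import random


def union_hoeffding_bound(n, t, b):
    """Boole + Hoeffding bound on Pr(max_i |row mean_i| > t).

    Each row mean averages n - 1 independent, centered terms lying in
    [-b, b], so the bound is 2 n exp(-(n - 1) t^2 / (2 b^2)).
    """
    return 2.0 * n * math.exp(-(n - 1) * t ** 2 / (2.0 * b ** 2))


def max_abs_row_mean(n, p, rng):
    """Simulate max_i |(n-1)^{-1} sum_{j != i} (y_ij - p)|, y_ij ~ Bernoulli(p)."""
    worst = 0.0
    for _ in range(n):
        total = sum((1.0 if rng.random() < p else 0.0) - p for _ in range(n - 1))
        worst = max(worst, abs(total) / (n - 1))
    return worst


C, b = 2.0, 1.0  # hypothetical constants; C**2 > 2 * b**2 makes the bound vanish
for n in (100, 1000, 10000):
    t = C * math.sqrt(math.log(n) / n)
    print(n, t, union_hoeffding_bound(n, t, b))

# Compare one simulated maximum with the threshold t for n = 200.
rng = random.Random(0)
n = 200
print(max_abs_row_mean(n, 0.3, rng), C * math.sqrt(math.log(n) / n))
```

The simulation draws each row independently, which is only for convenience: the union bound itself does not require independence across rows, only within each row.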
Lemma B.2.
(i) $\|\widetilde{\gamma}_n(\theta) - \widetilde{\gamma}_0(\theta_0)\|_\infty = o_P(1)$ for any $\theta \in \Theta$ such that $\|\theta - \theta_0\| = o(1)$,
(ii) $\frac{1}{n}\sum_{i=1}^n \left| \widetilde{A}_{n,i}(\theta_0) - \widetilde{A}_{0,i}(\theta_0) \right| = O_P(n^{-1/2})$,
(iii) $\frac{1}{n}\sum_{i=1}^n \left| \widetilde{B}_{n,i}(\theta_0) - \widetilde{B}_{0,i}(\theta_0) \right| = O_P(n^{-1/2})$,
(iv) $\|\widetilde{\gamma}_n(\theta_0) - \widetilde{\gamma}_0(\theta_0)\|_\infty = O_P(\sqrt{\ln n / n})$.

Proof. (i) By the triangle inequality,
$$
\|\widetilde{\gamma}_n(\theta) - \widetilde{\gamma}_0(\theta_0)\|_\infty \leq \|\widetilde{\gamma}_n(\theta) - \widetilde{\gamma}_0(\theta)\|_\infty + \|\widetilde{\gamma}_0(\theta) - \widetilde{\gamma}_0(\theta_0)\|_\infty.
$$
For the first term on the right-hand side, the same argument as in the proof of Theorem 3.1(iv) yields $\|\widetilde{\gamma}_n(\theta) - \widetilde{\gamma}_0(\theta)\|_\infty = o_P(1)$ for any $\theta$ in the neighborhood of $\theta_0$ under Assumption 3.2. For the second term, Assumption 3.2 and Berge's theorem imply that every element of $\widetilde{\gamma}_0(\theta)$ is continuous in the neighborhood of $\theta_0$ (see, e.g., Corollary A4.8, Kreps, 2012). Thus, $\|\widetilde{\gamma}_0(\theta) - \widetilde{\gamma}_0(\theta_0)\|_\infty = o(1)$ holds.

(ii) (iii) By the first-order condition and the mean value expansion,
$$
\mathbf{0}_{(2n-1) \times 1} = n \cdot S_{n,\gamma}(\theta_0, \widetilde{\gamma}_n(\theta_0)) = n \cdot S_{n,\gamma} - \left( -n \cdot H_{n,\gamma\gamma}(\theta_0, \bar{\gamma}_n) \right) [\widetilde{\gamma}_{n,-1}(\theta_0) - \widetilde{\gamma}_{0,-1}(\theta_0)],
$$
where $\bar{\gamma}_n \in [\widetilde{\gamma}_n(\theta_0), \widetilde{\gamma}_0(\theta_0)]$. Then, by result (i) and Assumption 3.3(i), we have
$$
\widetilde{\gamma}_{n,-1}(\theta_0) - \widetilde{\gamma}_{0,-1}(\theta_0) = \left( -n \cdot H_{n,\gamma\gamma}(\theta_0, \bar{\gamma}_n) \right)^{-1} n \cdot S_{n,\gamma}
\tag{B.13}
$$
w.p.a.1; thus, $\|\widetilde{\gamma}_n(\theta_0) - \widetilde{\gamma}_0(\theta_0)\| = \|\widetilde{\gamma}_{n,-1}(\theta_0) - \widetilde{\gamma}_{0,-1}(\theta_0)\| \leq O_P(1) \cdot \|n \cdot S_{n,\gamma}\|$.
Further, observe that
$$
\begin{aligned}
\mathbb{E}\|n \cdot S_{n,\gamma}\|^2
&= \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \sum_{k=1}^n \sum_{l > k} \mathbb{E}\left[ s_{i,j}^{\mathbf{A}}(\theta_0,\gamma_0)^\top s_{k,l}^{\mathbf{A}}(\theta_0,\gamma_0) + s_{i,j}^{\mathbf{B}}(\theta_0,\gamma_0)^\top s_{k,l}^{\mathbf{B}}(\theta_0,\gamma_0) \right] \\
&= \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \sum_{k=1}^n \sum_{l > k} \mathbb{E}\left[ (s_{1,i,j}\, \chi_{n,i,-1}^\top + s_{2,i,j}\, \chi_{n,j,-1}^\top)(s_{1,k,l}\, \chi_{n,k,-1} + s_{2,k,l}\, \chi_{n,l,-1}) \right] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \sum_{k=1}^n \sum_{l > k} \mathbb{E}\left[ (s_{1,i,j}\, \chi_{n,j}^\top + s_{2,i,j}\, \chi_{n,i}^\top)(s_{1,k,l}\, \chi_{n,l} + s_{2,k,l}\, \chi_{n,k}) \right] \\
&= \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \mathbb{E}\left[ (s_{1,i,j}\, \chi_{n,i,-1}^\top + s_{2,i,j}\, \chi_{n,j,-1}^\top)(s_{1,i,j}\, \chi_{n,i,-1} + s_{2,i,j}\, \chi_{n,j,-1}) \right] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \mathbb{E}\left[ (s_{1,i,j}\, \chi_{n,i,-1}^\top + s_{2,i,j}\, \chi_{n,j,-1}^\top)(s_{1,j,i}\, \chi_{n,j,-1} + s_{2,j,i}\, \chi_{n,i,-1}) \right] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \mathbb{E}\left[ (s_{1,i,j}\, \chi_{n,j}^\top + s_{2,i,j}\, \chi_{n,i}^\top)(s_{1,i,j}\, \chi_{n,j} + s_{2,i,j}\, \chi_{n,i}) \right] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^n \sum_{j > i} \mathbb{E}\left[ (s_{1,i,j}\, \chi_{n,j}^\top + s_{2,i,j}\, \chi_{n,i}^\top)(s_{1,j,i}\, \chi_{n,i} + s_{2,j,i}\, \chi_{n,j}) \right] \\
&= O(1)
\end{aligned}
$$
by Assumption 2.1(ii). Then, it holds that $\|n \cdot S_{n,\gamma}\| = O_P(1)$ by Markov's inequality; thus, we have $\|\widetilde{\gamma}_n(\theta_0) - \widetilde{\gamma}_0(\theta_0)\| = O_P(1)$. Finally, by the basic norm inequality, it holds that
$$
\sum_{i=1}^n |\widetilde{A}_{n,i}(\theta_0) - \widetilde{A}_{0,i}(\theta_0)| + \sum_{i=1}^n |\widetilde{B}_{n,i}(\theta_0) - \widetilde{B}_{0,i}(\theta_0)| \leq \sqrt{2n-1}\, \|\widetilde{\gamma}_n(\theta_0) - \widetilde{\gamma}_0(\theta_0)\| = O_P(\sqrt{n}),
$$
which gives the desired result after dividing by $n$.

(iv) Let $\gamma_{-k} = (\mathbf{A}_{-k}^\top, \mathbf{B}^\top)^\top$ for $k \neq 1$ and write $\mathcal{L}_n(\theta,\gamma)$ as $\mathcal{L}_n(\theta, A_k, \gamma_{-k})$.
By the first-order condition and the mean value expansion,
$$
\begin{aligned}
0 &= n \cdot \partial_{A_k} \mathcal{L}_n(\theta_0, \widetilde{A}_{n,k}(\theta_0), \widetilde{\gamma}_{n,-k}(\theta_0)) \\
&= \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} s_{i,j}^{A_k}
+ n \cdot \partial_{A_k} \mathcal{L}_n(\theta_0, \widetilde{A}_{n,k}(\theta_0), \widetilde{\gamma}_{n,-k}(\theta_0)) - n \cdot \partial_{A_k} \mathcal{L}_n(\theta_0, \widetilde{A}_{0,k}(\theta_0), \widetilde{\gamma}_{n,-k}(\theta_0)) \\
&\quad + n \cdot \partial_{A_k} \mathcal{L}_n(\theta_0, A_{0,k}, \widetilde{\gamma}_{n,-k}(\theta_0)) - n \cdot \partial_{A_k} \mathcal{L}_n(\theta_0, A_{0,k}, \widetilde{\gamma}_{0,-k}(\theta_0)) \\
&= \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} s_{i,j}^{A_k}
+ \frac{2}{n-1}\sum_{j \neq k} h_{11,k,j}(\theta_0, \bar{A}_{n,k}, \widetilde{\gamma}_{n,-k}(\theta_0)) [\widetilde{A}_{n,k}(\theta_0) - \widetilde{A}_{0,k}(\theta_0)] \\
&\quad + \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} \partial_{\gamma_{-k}^\top} s_{i,j}^{A_k}(\theta_0, A_{0,k}, \bar{\gamma}_{n,-k}) [\widetilde{\gamma}_{n,-k}(\theta_0) - \widetilde{\gamma}_{0,-k}(\theta_0)],
\end{aligned}
$$
where $\bar{A}_{n,k} \in [\widetilde{A}_{n,k}(\theta_0), \widetilde{A}_{0,k}(\theta_0)]$, and $\bar{\gamma}_{n,-k} \in [\widetilde{\gamma}_{n,-k}(\theta_0), \widetilde{\gamma}_{0,-k}(\theta_0)]$. In view of (A.1), Lemma B.1(i) and result (i) imply that $\frac{1}{n-1}\sum_{j \neq k} h_{11,k,j}(\theta_0, \bar{A}_{n,k}, \widetilde{\gamma}_{n,-k}(\theta_0))$ is bounded and bounded away from zero w.p.a.1 uniformly in $k$. Then,
$$
\left| \widetilde{A}_{n,k}(\theta_0) - \widetilde{A}_{0,k}(\theta_0) \right| \leq (c + o_P(1)) \left\{ |T_{1,n,k}| + |T_{2,n,k}| \right\}
$$
for some $c > 0$, where
$$
\begin{aligned}
T_{1,n,k} &\equiv \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} s_{i,j}^{A_k}, \\
T_{2,n,k} &\equiv \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} \partial_{\gamma_{-k}^\top} s_{i,j}^{A_k}(\theta_0, A_{0,k}, \bar{\gamma}_{n,-k}) [\widetilde{\gamma}_{n,-k}(\theta_0) - \widetilde{\gamma}_{0,-k}(\theta_0)] \\
&= \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} \partial_{\mathbf{A}_{-k}^\top} s_{i,j}^{A_k}(\theta_0, A_{0,k}, \bar{\gamma}_{n,-k}) [\widetilde{\mathbf{A}}_{n,-k}(\theta_0) - \widetilde{\mathbf{A}}_{0,-k}(\theta_0)] \\
&\quad + \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} \partial_{\mathbf{B}^\top} s_{i,j}^{A_k}(\theta_0, A_{0,k}, \bar{\gamma}_{n,-k}) [\widetilde{\mathbf{B}}_n(\theta_0) - \widetilde{\mathbf{B}}_0(\theta_0)] \\
&\equiv T_{21,n,k} + T_{22,n,k}, \text{ say.}
\end{aligned}
$$
First, observe that
$$
T_{1,n,k} = \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} s_{1,i,j}\, 1\{i = k\} + \frac{2}{n-1}\sum_{i=1}^n \sum_{j > i} s_{2,i,j}\, 1\{j = k\}
= \frac{2}{n-1}\sum_{j > k} s_{1,k,j} + \frac{2}{n-1}\sum_{j < k} s_{2,j,k}
$$
[…]
To simplify the discussion, we focus on the estimation of $\mathcal{C}^A$ with $K_A = 3$ only. The other cases can be proved analogously. In addition, for notational simplicity, we omit the superscript $A$ in this proof.

Now, let
$$
u_{n,i} \equiv \widehat{A}_{n,i} - A_{0,i} \quad \text{for } i = 1, \ldots, n.
$$
In particular, $u_{n,1} = 0$ holds by the normalization. In accordance with the ordering $\widehat{A}_{n,(1)} \leq \cdots \leq \widehat{A}_{n,(n)}$, we permute the $A_{0,i}$'s and obtain $\{A_{0,(i)}\}$. By Theorem 3.2(iii), we have $\max_{1 \leq i \leq n} |u_{n,i}| = O_P(\sqrt{\ln n / n})$. Hence, w.p.a.1, the sequence $\{A_{0,(i)}\}$ contains two "true" break points $(t_1, t_2) \equiv (t_{n,1}, t_{n,2})$ in the following manner:
$$
A_{0,(i)} =
\begin{cases}
a_{0,1} & \text{if } 1 \leq i \leq t_1 \\
a_{0,2} & \text{if } t_1 + 1 \leq i \leq t_2 \\
a_{0,3} & \text{if } t_2 + 1 \leq i \leq n.
\end{cases}
$$
We can assume, without loss of generality, that $\widehat{S}_{1,n}(t_1) < \widehat{S}_{1,n}(t_2)$. If $\widehat{S}_{1,n}(t_1) > \widehat{S}_{1,n}(t_2)$, by reversing the order of $\{\widehat{A}_{n,(i)}\}$ and re-labeling the break points appropriately, we can prove the theorem completely analogously. Recall that $\widehat{t}_1 = \operatorname{argmin}_{1 \leq \kappa \cdots}$ […]
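C.1 Explicit form of $\partial^2 \mathcal{L}_n^*(\rho) / (\partial\rho)^2$

For notational simplicity, let $\mathbf{C} = (\beta^\top, \alpha, \gamma^\top)^\top$ and $\widetilde{\mathbf{C}}(\rho) = (\widetilde{\beta}(\rho)^\top, \widetilde{\alpha}(\rho), \widetilde{\gamma}(\rho)^\top)^\top$, and write $\mathcal{L}_n(\theta,\gamma)$ equivalently as $\mathcal{L}_n(\rho, \mathbf{C})$, so that $\widetilde{\mathbf{C}}(\rho) = \operatorname{argmax}_{\mathbf{C} \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n} \mathbb{E}\mathcal{L}_n(\rho, \mathbf{C})$ and $\mathcal{L}_n^*(\rho) = \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho))$. By the chain rule,
$$
\partial_\rho \mathcal{L}_n^*(\rho) = \partial_{\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) + \partial_{\mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho))\, \partial_\rho \widetilde{\mathbf{C}}(\rho) = \partial_{\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)),
$$
where the second equality holds for any $\rho \in \mathcal{R}$ by the first-order condition for $\widetilde{\mathbf{C}}(\rho)$, and $\partial_{\rho_1}$ denotes the partial derivative with respect to the "first" $\rho$. Then, we also have
$$
\partial_{\rho\rho} \mathcal{L}_n^*(\rho) = \partial_{\rho_1\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) + \partial_{\rho_1 \mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho))\, \partial_\rho \widetilde{\mathbf{C}}(\rho).
$$
In addition, applying the implicit function theorem to $\partial_{\mathbf{C}} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) = \mathbf{0}_{(d_z + 2n) \times 1}$ for $\rho \in \mathcal{R}$ yields
$$
\begin{aligned}
\mathbf{0}_{(d_z + 2n) \times 1} &= \partial_{\mathbf{C}\rho} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) = \partial_{\mathbf{C}\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) + \partial_{\mathbf{C}\mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho))\, \partial_\rho \widetilde{\mathbf{C}}(\rho) \\
\implies \partial_\rho \widetilde{\mathbf{C}}(\rho) &= -\left[ \partial_{\mathbf{C}\mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) \right]^{-1} \partial_{\mathbf{C}\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)).
\end{aligned}
$$
Therefore, we obtain
$$
\partial_{\rho\rho} \mathcal{L}_n^*(\rho) = \partial_{\rho_1\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) - \partial_{\rho_1 \mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) \left[ \partial_{\mathbf{C}\mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)) \right]^{-1} \partial_{\mathbf{C}\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho)).
$$
Note that since $\partial_{\mathbf{C}\mathbf{C}^\top} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho))$ is negative semidefinite by the second-order condition of maximization, the quadratic form $\partial_{\rho_1 \mathbf{C}^\top} \mathbb{E}\mathcal{L}_n [\partial_{\mathbf{C}\mathbf{C}^\top} \mathbb{E}\mathcal{L}_n]^{-1} \partial_{\mathbf{C}\rho_1} \mathbb{E}\mathcal{L}_n$ is non-positive, so the second term on the right-hand side, which enters with a minus sign, is non-negative. Thus, to ensure that $\partial_{\rho\rho} \mathcal{L}_n^*(\rho)$ is strictly negative, $\partial_{\rho_1\rho_1} \mathbb{E}\mathcal{L}_n(\rho, \widetilde{\mathbf{C}}(\rho))$ must be negative and larger in magnitude than the second term.

C.2 Supplementary tables for Section 5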
Table C.1: In-degree and Out-degree

Country             In-degree  Out-degree
Armenia                    21          46
Australia                  43           6
Azerbaijan                 24          24
Bahrain                    33          22
Bangladesh                  6          39
Belarus                    26          33
Bhutan                     15           2
Brunei                     43          26
Cambodia                   15          54
China                      24           9
Cyprus                     41          19
Estonia                    42          19
Fiji                       23          29
Georgia                    29          32
Hong Kong                  43          33
India                      13           3
Indonesia                  25          51
Iran                       10          48
Iraq                        5           5
Israel                     33          21
Japan                      48          14
Jordan                     13          41
Kazakhstan                 29          34
Kiribati                   22          15
Kuwait                     35          23
Kyrgyzstan                 24          33
Laos                       17          49
Latvia                     40          19
Lebanon                    13          34
Lithuania                  31          18
Malaysia                   46          46
Moldova                    26          28
Mongolia                   21          20
Myanmar                    13          16
Nauru                      21           7
Nepal                       9          55
New Zealand                45          18
Oman                       34           5
Pakistan                    5          21
Papua New Guinea           18          21
Philippines                22          38
Qatar                      29          31
Russia                     37          18
Saudi Arabia               31          19
Singapore                  47          34
South Korea                49          25
Sri Lanka                   7          53
Tajikistan                 23          36
Thailand                   31          37
Tonga                      23          21
Turkey                     35          27
UAE                        44          20
Ukraine                    36          29
Uzbekistan                 22          28
Vanuatu                    27          33
Viet Nam                   16          14
Yemen                       7           9

Table C.2: Summary Statistics

Variable     Mean  Std. Dev.  Min.  Max.
gdp_pc_i
free_i
export_ij
import_ij

K_B  K_A

References
Abbe, E., 2017. Community detection and stochastic block models: recent developments, The Journal of Machine Learning Research, 18 (1), 6446–6531.
Amemiya, T., 1985. Advanced Econometrics, Harvard University Press.
Aradillas-Lopez, A. and Rosen, A.M., 2019. Inference in ordered response games with complete information, Working paper.
Bai, J., 1997. Estimating multiple breaks one at a time, Econometric Theory, 315–352.
Berry, S.T., 1992. Estimation of a model of entry in the airline industry, Econometrica, 60 (4), 889–917.
Bickel, P.J. and Chen, A., 2009. A nonparametric view of network models and Newman–Girvan and other modularities, Proceedings of the National Academy of Sciences, 106 (50), 21068–21073.
Billingsley, P., 2012. Probability and Measure: Anniversary Edition, John Wiley & Sons.
Bjorn, P.A. and Vuong, Q.H., 1984. Simultaneous equations models for dummy endogenous variables: a game theoretic formulation with an application to labor force participation, Working paper.
Bonhomme, S., 2020. Econometric analysis of bipartite networks, in: B.S. Graham and Á. de Paula, eds., The Econometric Analysis of Network Data, Elsevier, 83–121.
Bonhomme, S., Lamadon, T., and Manresa, E., 2017. Discretizing unobserved heterogeneity, University of Chicago, Working paper.
Bonhomme, S. and Manresa, E., 2015. Grouped patterns of heterogeneity in panel data, Econometrica, 83 (3), 1147–1184.
Bresnahan, T.F. and Reiss, P.C., 1990. Entry in monopoly market, The Review of Economic Studies, 57 (4), 531–553.
Chandrasekhar, A.G., 2016. Econometrics of network formation, in: Y. Bramoullé, A. Galeotti, and B. Rogers, eds., The Oxford Handbook of the Economics of Networks, Oxford University Press, 303–357.
Chesher, A. and Rosen, A.M., 2020. Structural modeling of simultaneous discrete choice, cemmap Working paper, CWP9/20.
Ciliberto, F. and Tamer, E., 2009. Market structure and multiple equilibria in airline markets, Econometrica, 77 (6), 1791–1828.
de Paula, Á., 2013. Econometric analysis of games with multiple equilibria, Annual Review of Economics, 5, 107–131.
de Paula, Á., 2020. Econometric models of network formation, Annual Review of Economics, 12, 775–799.
Dzemski, A., 2019. An empirical model of dyadic link formation in a network with unobserved heterogeneity, Review of Economics and Statistics, 101 (5), 763–776.
Fortunato, S., 2010. Community detection in graphs, Physics Reports, 486 (3-5), 75–174.
Graham, B.S., 2016. Homophily and transitivity in dynamic network formation, NBER Working Paper, (22186).
Graham, B.S., 2017. An econometric model of network formation with degree heterogeneity, Econometrica, 85 (4), 1033–1063.
Graham, B.S. and Pelican, A., 2020. Testing for externalities in network formation using simulation, in: B.S. Graham and Á. de Paula, eds., The Econometric Analysis of Network Data, Elsevier, 63–82.
Hoff, P.D., Raftery, A.E., and Handcock, M.S., 2002. Latent space approaches to social network analysis, Journal of the American Statistical Association, 97 (460), 1090–1098.
Hoshino, T. and Yanagi, T., 2020. Treatment effect models with strategic interaction in treatment decisions, arXiv, (1810.08350).
Jochmans, K., 2018. Semiparametric analysis of network formation, Journal of Business & Economic Statistics, 36 (4), 705–713.
Karrer, B. and Newman, M.E., 2011. Stochastic blockmodels and community structure in networks, Physical Review E, 83 (1), 016107.
Ke, Y., Li, J., Zhang, W., et al., 2016. Structure identification in panel data analysis, The Annals of Statistics, 44 (3), 1193–1233.
Khan, S. and Nekipelov, D., 2018. Information structure and statistical information in discrete response models, Quantitative Economics, 9 (2), 995–1017.
Kline, B., 2015. Identification of complete information games, Journal of Econometrics, 189 (1), 117–131.
Kline, B., 2016. The empirical content of games with bounded regressors, Quantitative Economics, 7 (1), 37–81.
Kreps, D.M., 2012. Microeconomic Foundations I: Choice and Competitive Markets, Princeton University Press.
Leung, M.P., 2015. Two-step estimation of network-formation models with incomplete information, Journal of Econometrics, 188 (1), 182–195.
Lewbel, A., 2007. Coherency and completeness of structural models containing a dummy endogenous variable, International Economic Review, 48 (4), 1379–1392.
Lian, H., Qiao, X., and Zhang, W., 2019. Homogeneity pursuit in single index models based panel data analysis, Journal of Business & Economic Statistics, in press.
Liu, R., Shang, Z., Zhang, Y., and Zhou, Q., 2020. Identification and estimation in panel models with overspecified number of groups, Journal of Econometrics, 215 (2), 574–590.
McKay, A. and Tekleselassie, T.G., 2018. Tall paper walls: The political economy of visas and cross-border travel, The World Economy, 41 (11), 2914–2933.
Mele, A., 2017. A structural model of dense network formation, Econometrica, 85 (3), 825–850.
Neiman, B. and Swagel, P., 2009. The impact of post-9/11 visa policies on travel to the United States, Journal of International Economics, 78 (1), 86–99.
Neumayer, E., 2010. Visa restrictions and bilateral travel, The Professional Geographer, 62 (2), 171–181.
Newman, M.E. and Girvan, M., 2004. Finding and evaluating community structure in networks, Physical Review E, 69 (2), 026113.
Okui, R. and Wang, W., 2020. Heterogeneous structural breaks in panel data models, Journal of Econometrics.
Pelican, A. and Graham, B.S., 2020. An optimal test for strategic interaction in social and economic network formation between heterogeneous agents, NBER Working Paper, 27793.
Rohe, K., Chatterjee, S., Yu, B., et al., 2011. Spectral clustering and the high-dimensional stochastic blockmodel, The Annals of Statistics, 39 (4), 1878–1915.
Rothenberg, T.J., 1971. Identification in parametric models, Econometrica, 577–591.
Sheng, S., 2020. A structural econometric analysis of network formation games through subnetworks, Econometrica, 88 (5), 1829–1858.
Su, L., Shi, Z., and Phillips, P.C., 2016. Identifying latent structures in panel data, Econometrica, 84 (6), 2215–2264.
Tamer, E., 2003. Incomplete simultaneous discrete response model with multiple equilibria, The Review of Economic Studies, 70 (1), 147–165.
Wang, W., Phillips, P.C., and Su, L., 2018. Homogeneity pursuit in panel data models: Theory and application, Journal of Applied Econometrics, 33 (6), 797–815.
Wang, W. and Su, L., 2020. Identifying latent group structures in nonlinear panels, Journal of Econometrics.
Yan, T., Jiang, B., Fienberg, S.E., and Leng, C., 2019. Statistical inference in a directed network model with covariates, Journal of the American Statistical Association, 114 (526), 857–868.
Yan, T., Leng, C., and Zhu, J., 2016. Asymptotics in directed exponential random graph models with an increasing bi-degree sequence, The Annals of Statistics, 44 (1), 31–57.
Zhang, Y., Chen, K., Sampson, A., Hwang, K., and Luna, B., 2019. Node features adjusted stochastic block model, Journal of Computational and Graphical Statistics, 28 (2), 362–373.
Zhou, Y., 2019. Identification and estimation of entry games under symmetry of unobservables, SSRN Working paper.