A Pairwise Strategic Network Formation Model with Group Heterogeneity: With an Application to International Travel

Abstract

In this study, we consider a pairwise network formation model in which each dyad of agents strategically determines the link status between them. Our model allows the agents to have unobserved group heterogeneity in the propensity of link formation. For the model estimation, we propose a three-step maximum likelihood (ML) method. First, we obtain consistent estimates for the heterogeneity parameters at individual level using the ML estimator. Second, we estimate the latent group structure using the binary segmentation algorithm based on the results obtained from the first step. Finally, based on the estimated group membership, we re-execute the ML estimation. Under certain regularity conditions, we show that the proposed estimator is asymptotically unbiased and distributed as normal at the parametric rate. As an empirical illustration, we focus on the network data of international visa-free travels. The results indicate the presence of significant strategic complementarity and a certain level of degree heterogeneity in the network formation behavior.



Tadao Hoshino∗
February 5, 2021


Keywords: binary game; binary segmentation; degree heterogeneity; latent group structure; network formation.

∗ School of Political Science and Economics, Waseda University, 1-6-1 Nishi-waseda, Shinjuku-ku, Tokyo 169-8050, Japan. Email: [email protected]. This work is supported financially by JSPS Grant-in-Aid for Scientific Research C-20K01597.

Introduction

Empirical modeling of network formation is an important research topic that has been studied for several decades. While most of these models have been developed in the mathematical statistics literature, the importance of network structure in many economic activities has been increasingly recognized, and a growing number of econometric studies now focus on network formation, in conjunction with significant advances in the related econometric techniques.

Econometric studies on network formation can be classified into two types: those that attempt to explicitly incorporate the interaction of individuals, with the realized network structure endogenously affecting the network formation behavior (e.g., Leung, 2015; Mele, 2017; Sheng, 2020), and those that do not account for such simultaneous interactions but emphasize modeling a flexible form of individual heterogeneity (e.g., Graham, 2017; Jochmans, 2018; Dzemski, 2019). For the former type, network formation is modeled as a game in which agents strategically form links to maximize their payoffs. Although this game-theoretic approach is well grounded in economic theory, we often encounter serious analytical difficulties due to the presence of multiple equilibria. To circumvent these difficulties, we typically need to introduce some ad hoc behavioral assumptions into the network formation process, or we give up point-identifying the models and resort to partial identification. Compared with the former, the latter type is more "descriptive" than "structural", but has great flexibility in the model specification. These models are relatively easy to implement and, thus, are appealing to empirical researchers. However, they are not suitable for analyzing the interactions of agents in network formation, which should be an essential factor in economic and social network data.

These two types of econometric models have their own advantages.
Hence, it is a natural idea to construct a new model that combines the advantages of both approaches. However, to the best of my knowledge, only a few papers address this kind of model extension (e.g., Graham, 2016; Graham and Pelican, 2020; Pelican and Graham, 2020). Graham (2016) gets around the multiple equilibria issue by considering the "dynamic" (rather than the instantaneous) interdependencies in network formation. The latter two papers consider a quite general framework that incorporates both a general form of strategic interaction and unobserved degree heterogeneity into a single model. However, they mainly focus on testing the presence of interactions, not on the estimation of the models.

In this paper, we propose a new "pairwise" network formation model that is empirically tractable while retaining the nice properties of the above-mentioned approaches. More specifically, we assume that each link connection is determined by the strategic interaction solely between the corresponding dyad of agents, without affecting or being affected by other dyads, rather than regarding the realized network as a consequence of a large $n$-player game. Although ignoring such network externalities limits the range of applications of our model, it would still cover a fairly large number of interesting empirical situations, and, most importantly, ...

(For recent developments regarding econometric approaches for analyzing network formation, we refer readers to, for example, Chandrasekhar (2016) and de Paula (2020).)

... et al. (2019), we treat the agent-specific preference heterogeneity parameters as the fixed-effect parameters to be estimated. However, note that if we include these heterogeneity parameters directly in the model, the dimension of the parameters increases proportionally to the sample size, and our ML estimator suffers from the incidental parameter problem. As will be confirmed in the numerical simulations reported later, the incidental parameter bias can be severe.
Thus, to avoid the incidental parameter problem, as a second novel part of this study, we focus on a situation where the individuals can be classified into several unknown groups and their specific effects are homogeneous within each group. If the number of the latent groups is fixed, we can expect that the ML estimator becomes asymptotically unbiased at the parametric rate and hence the standard inference procedure can be applied.

In the literature on statistical network data analysis, uncovering such latent group structures in networks has been intensively studied (e.g., Newman and Girvan, 2004; Bickel and Chen, 2009; Fortunato, 2010; Karrer and Newman, 2011; Rohe et al., 2011; Abbe, 2017). Indeed, putting the strategic interaction effect aside, our network formation model can be regarded as a type of stochastic block model, one of the major modeling approaches in the above literature, with additional covariates, as in Zhang et al. (2019). In panel data analysis also, identification of unobserved grouped heterogeneity is one of the most active research areas (e.g., Bonhomme and Manresa, 2015; Ke et al., 2016; Su et al., 2016; Wang et al., 2018). Although the application of these grouping methods to econometric network models has been relatively limited thus far, it is a promising approach, as discussed in Bonhomme (2020). If we can recover the true group structure with probability approaching one (w.p.a.1) by using any method, the individual fixed-effect parameters can be estimated at a faster rate than in the case of no grouping. In this study, among several alternative methods, we adopt the binary segmentation (BS) method (see, e.g., Bai, 1997; Ke et al., 2016; Lian et al., 2019; Wang and Su, 2020). Compared to the other grouping methods, the BS method has several favorable properties, including fast computation and robustness, as it does not require us to set initial values.

The whole estimation procedure is divided into three steps.
The first step is to obtain the ML estimator without considering the latent group structures. Although this estimator suffers from the incidental parameter bias, it is still possible to produce consistent estimates for the heterogeneity parameters. In the second step, we apply the BS method to the estimated sender effect parameters and the estimated receiver effect parameters separately to identify each agent's group memberships. In the third step, we re-estimate the model using the ML method given the estimated group structure. Under certain regularity conditions, we show that the proposed estimator is asymptotically unbiased and normal at the parametric rate. Furthermore, the estimator is asymptotically equivalent to the "oracle" estimator that is obtained based on the (unknown) true group memberships.

To illustrate our model empirically, we investigate the formation of international visa-free travel networks, where the dependent variable of interest is defined as follows: $g_{i,j} = 1$ if country $i$ allows the citizens of country $j$ to visit $i$ without visas and $g_{i,j} = 0$ if not. We apply our model framework to the network of 57 countries selected mainly from Asia, the Middle East, the former USSR, and Oceania. As expected, we find the presence of a certain level of degree heterogeneity in terms of both the sender and the receiver effects. Interestingly, there seems to be a negative correlation between the sender effects and the receiver effects (in other words, a country's sender effect tends to increase as its receiver effect decreases). Our estimation result also suggests that there is a significant strategic complementarity in the network formation behavior. Another interesting finding is that the countries are homophilous (they tend to connect with similar others) in terms of the political system. These findings highlight the usefulness of the proposed model and method.

Organization of the paper:

The remainder of the paper is organized as follows. In Section 2, we formally introduce the model investigated in this study. In this section, we demonstrate that our pairwise model exhibits multiple equilibria and discuss the conditions under which the model can be point-identified. Section 3 provides a detailed explanation of our three-step ML estimator. We also investigate the asymptotic properties of the proposed estimator in this section. In Section 4, we present a set of Monte Carlo experiments to evaluate the finite sample performance of the proposed estimator. Section 5 presents our empirical analysis, and, finally, Section 6 concludes. All the technical details are relegated to the Appendix.

Notation:

For a natural number $n$, $I_n$ denotes an $n \times n$ identity matrix. $\mathbf{1}\{\cdot\}$ denotes the indicator function, which is one if its argument is true and zero otherwise. For a matrix $A$, we use $\|A\|$ to denote its Frobenius norm: $\|A\| = \sqrt{\mathrm{tr}\{AA^\top\}}$, where $\mathrm{tr}\{\cdot\}$ is the trace of a matrix. When $A$ is a square matrix, we use $\lambda_{\min}(A)$ to denote its smallest eigenvalue. For a vector $a = (a_1, \ldots, a_k)^\top$, $\|a\|_\infty$ denotes its maximum norm: $\|a\|_\infty = \max_{1 \le i \le k} |a_i|$. For a general set $\mathcal{X}$, we use $\mathcal{X}^{\mathrm{int}}$ to denote its interior. In addition, $|\mathcal{X}|$ denotes the cardinality of $\mathcal{X}$. $c$ (possibly with a subscript) denotes a generic positive constant whose exact value may vary from case to case.

Suppose that we have a sample of $n$ agents that form social networks whose connections are represented by an $n \times n$ adjacency matrix $G_n = (g_{i,j})_{1 \le i,j \le n}$. These agents can be individuals, firms, municipalities, or nations depending on the context. The network is directed; that is, regardless of the value of $g_{j,i}$, we observe $g_{i,j} = 1$ if agent $i$ links to $j$ and $g_{i,j} = 0$ otherwise. There are no self-loops; that is, the diagonal elements of $G_n$ are all zero. Throughout the paper, we assume that the status of $(g_{i,j}, g_{j,i})$ is determined solely by the pair of agents $(i, j)$, without considering the status of other network links. Specifically, for each pair $(i, j)$, suppose that $i$'s marginal payoff of forming a link with $j$ given $g_{j,i} = q$ is written as
\[
u_{i,j}(q) = Z_{i,j}^\top \beta_0 + \alpha_0 q + A_{0,i} + B_{0,j} - \epsilon_{i,j}, \quad \text{for } i \neq j.
\]
Here, $Z_{i,j} \in \mathbb{R}^{d_z}$ is a vector of observed covariates, $A_{0,i} \in \mathbb{R}$ is agent $i$'s individual specific effect as a "sender", $B_{0,j} \in \mathbb{R}$ is $j$'s individual specific effect as a "receiver", $\epsilon_{i,j} \in \mathbb{R}$ is an unobservable payoff component, and $\beta_0 \in \mathbb{R}^{d_z}$ and $\alpha_0 \in \mathbb{R}$ are the unknown coefficient vector and the interaction effect parameter, respectively.
The covariates $Z_{i,j}$ and $Z_{j,i}$ may contain common elements; however, in the later discussion, we require that they have some agent specific elements that can vary across the partners. The individual specific effects $A_{0,i}$ and $B_{0,j}$, which we call the sender and the receiver effect, respectively, can be interpreted as the level of $i$'s willingness to create connections with others and the popularity of $j$, respectively; they generate degree heterogeneity across the agents. Following Dzemski (2019) and Yan et al. (2019), we treat $\{(A_{0,i}, B_{0,i})\}$ as fixed-effect parameters to be estimated.

We assume that the agents have complete information; that is, the realizations of $(Z_{i,j}, Z_{j,i})$ and $(\epsilon_{i,j}, \epsilon_{j,i})$ are common knowledge to both $i$ and $j$. Then, if we assume that the observed network $G_n$ is formed by a collection of Nash equilibrium actions, we obtain the following econometric model:
\[
\begin{aligned}
g_{i,j} &= \mathbf{1}\left\{ Z_{i,j}^\top \beta_0 + \alpha_0 g_{j,i} + A_{0,i} + B_{0,j} \ge \epsilon_{i,j} \right\} \\
g_{j,i} &= \mathbf{1}\left\{ Z_{j,i}^\top \beta_0 + \alpha_0 g_{i,j} + A_{0,j} + B_{0,i} \ge \epsilon_{j,i} \right\}
\end{aligned}
\quad \text{for } i \neq j. \tag{2.1}
\]
The following are two examples to which the above framework can potentially be applied.

Example 2.1 (Online social networking). The analysis of online social networking behavior is an active research topic in network science. For some social networking sites, users can easily establish links to others (become a follower) without mutual consent. In addition, whether a person becomes a follower of someone is often irrelevant to who else they are following. Thus, this would be a situation where our framework reasonably fits.
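The equilibrium conditions in model (2.1) can be checked mechanically for a single dyad. The following minimal Python sketch enumerates the pure-strategy Nash equilibria for given payoff indices and disturbances. The function name `dyad_equilibria` and the arguments `v_ij` and `v_ji` (standing for $Z_{i,j}^\top\beta_0 + A_{0,i} + B_{0,j}$ and its counterpart for $(j,i)$) are hypothetical names introduced only for this illustration.

```python
from itertools import product


def dyad_equilibria(v_ij, v_ji, alpha, eps_ij, eps_ji):
    """Return all (g_ij, g_ji) in {0,1}^2 that satisfy both equations of (2.1).

    v_ij, v_ji : payoff indices excluding the interaction term (assumed given)
    alpha      : interaction effect
    eps_ij, eps_ji : realized payoff disturbances
    """
    equilibria = []
    for g_ij, g_ji in product((0, 1), repeat=2):
        # Best response of each agent given the other's action.
        best_ij = int(v_ij + alpha * g_ji >= eps_ij)
        best_ji = int(v_ji + alpha * g_ij >= eps_ji)
        if (g_ij, g_ji) == (best_ij, best_ji):
            equilibria.append((g_ij, g_ji))
    return equilibria


# With alpha > 0 and both disturbances between v and v + alpha,
# both "no link" and "mutual link" are equilibria.
print(dyad_equilibria(0.0, 0.0, 1.0, 0.5, 0.5))   # [(0, 0), (1, 1)]
# A one-way link arises as the unique equilibrium here.
print(dyad_equilibria(0.0, 0.0, 1.0, -1.0, 2.0))  # [(1, 0)]
```

In this sketch, a one-way outcome such as $(1, 0)$ is never accompanied by a second equilibrium when `alpha` is positive, whereas $(0, 0)$ and $(1, 1)$ can coexist.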

Example 2.2 (International visa-free network). In the research on international migration and tourism, investigating the determinants and impacts of visa policies is one of the central interests (e.g., Neiman and Swagel, 2009; Neumayer, 2010; McKay and Tekleselassie, 2018). As bilateral visa policies are naturally observed as a consequence of strategic (economic and/or political) interactions between the two countries, our model would be an appropriate analytical tool here. In our empirical study presented in Section 5, by setting $g_{i,j} = 1$ if country $i$ allows visa-free entry for the citizens of country $j$ and $g_{i,j} = 0$ if not, we will show that the magnitude of the bilateral interaction in visa policies is significant.

In these examples, we can naturally imagine that the strategic interaction effect $\alpha_0$ is positive (i.e., strategic complements). We consider strategic complementarity to be reasonable for most empirical situations of network formation games. Thus, throughout the paper, we impose this assumption: $\alpha_0 > 0$. Under strategic complementarity, each pair's Nash equilibrium action can be summarized as in Figure 2.1. As shown in the figure, the space of $(\epsilon_{i,j}, \epsilon_{j,i})$ cannot be partitioned into non-overlapping regions associated with the four alternative realizations of $(g_{i,j}, g_{j,i})$. That is, both $(g_{i,j}, g_{j,i}) = (1, 1)$ and $(g_{i,j}, g_{j,i}) = (0, 0)$ can occur in the shaded area in the figure, and the link status is not uniquely determined in this area (i.e., multiple equilibria exist). This non-uniqueness of model-consistent decisions is called incompleteness and has been extensively studied in the literature on simultaneous equation models for discrete outcomes (e.g., Tamer, 2003; Lewbel, 2007; Ciliberto and Tamer, 2009; Chesher and Rosen, 2020).

Figure 2.1: Pure strategy Nash equilibrium

There are several approaches to handle this incompleteness issue in the literature.
Among them, this study adopts the traditional approach developed by Bresnahan and Reiss (1990) and Berry (1992) that focuses only on the unique equilibrium outcomes. That is, we consider estimating the model based only on the information about "one-way links" in the network.

(Refer to de Paula (2013) for a comprehensive survey on this topic.)

In the following, we assume that $\{\epsilon_{i,j}\}$ are identically distributed with a known cumulative distribution function (CDF) $F$. Further, we assume that the pairs $\{(\epsilon_{i,j}, \epsilon_{j,i})\}$ are independent and identically distributed (i.i.d.) across pairs, and their joint distribution is represented by $H(\cdot, \cdot; \rho_0)$ such that $\Pr(\epsilon_{i,j} \le a_1, \epsilon_{j,i} \le a_2) = H(a_1, a_2; \rho_0)$, where $\rho_0 \in \mathbb{R}$ is a parameter controlling the correlation between $\epsilon_{i,j}$ and $\epsilon_{j,i}$. Define $y^{(1,0)}_{i,j} \equiv \mathbf{1}\{(g_{i,j}, g_{j,i}) = (1, 0)\}$ and $y^{(0,1)}_{i,j} \equiv \mathbf{1}\{(g_{i,j}, g_{j,i}) = (0, 1)\}$. Let $\chi_{n,a}$ be the $a$-th column of $I_n$, $A_0 = (A_{0,1}, \ldots, A_{0,n})^\top$, and $B_0 = (B_{0,1}, \ldots, B_{0,n})^\top$, so that we can write
\[
Z_{i,j}^\top \beta_0 + A_{0,i} + B_{0,j} = W_{i,j}^\top \Pi_0,
\]
where $W_{i,j} = (Z_{i,j}^\top, \chi_{n,i}^\top, \chi_{n,j}^\top)^\top$ and $\Pi_0 = (\beta_0^\top, A_0^\top, B_0^\top)^\top$. In addition, we denote $\theta_0 = (\beta_0^\top, \alpha_0, \rho_0)^\top$ and $\gamma_0 = (A_0^\top, B_0^\top)^\top$. Then, the conditional probabilities of $\{y^{(1,0)}_{i,j} = 1\}$ and $\{y^{(0,1)}_{i,j} = 1\}$ are respectively given as follows:
\[
\begin{aligned}
P^{(1,0)}_{i,j}(\theta_0, \gamma_0) &\equiv F(W_{i,j}^\top \Pi_0) - H(W_{i,j}^\top \Pi_0, W_{j,i}^\top \Pi_0 + \alpha_0; \rho_0), \\
P^{(0,1)}_{i,j}(\theta_0, \gamma_0) &\equiv F(W_{j,i}^\top \Pi_0) - H(W_{i,j}^\top \Pi_0 + \alpha_0, W_{j,i}^\top \Pi_0; \rho_0).
\end{aligned}
\]
Here, note that the equalities $y^{(1,0)}_{i,j} = y^{(0,1)}_{j,i}$ and $P^{(1,0)}_{i,j}(\theta, \gamma) = P^{(0,1)}_{j,i}(\theta, \gamma)$ hold.
Thus, the likelihood function can be concentrated with respect to $(y^{(1,0)}_{i,j}, P^{(1,0)}_{i,j}(\theta, \gamma))$; hereinafter, we omit the superscripts and denote $y_{i,j} = y^{(1,0)}_{i,j}$ and $P_{i,j}(\theta, \gamma) = P^{(1,0)}_{i,j}(\theta, \gamma)$ when there is no confusion. Then, the log-likelihood function can be written as
\[
\begin{aligned}
\mathcal{L}_n(\theta, \gamma)
&= \frac{2}{N} \sum_{i=1}^{n} \sum_{j > i} \left[ y_{i,j} \ln P_{i,j}(\theta, \gamma) + y_{j,i} \ln P_{j,i}(\theta, \gamma) + (1 - y_{i,j} - y_{j,i}) \ln\left(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)\right) \right] \\
&= \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \left[ y_{i,j} \ln P_{i,j}(\theta, \gamma) + y_{j,i} \ln P_{j,i}(\theta, \gamma) + (1 - y_{i,j} - y_{j,i}) \ln\left(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)\right) \right] \\
&= \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \left[ 2 y_{i,j} \ln P_{i,j}(\theta, \gamma) + (1 - 2 y_{i,j}) \ln\left(1 - P_{i,j}(\theta, \gamma) - P_{j,i}(\theta, \gamma)\right) \right],
\end{aligned}
\tag{2.2}
\]
where $N \equiv n(n - 1)$. As above, we can consider three equivalent representations for the log-likelihood function and switch between them according to analytical convenience.

As mentioned in the Introduction, this study considers situations where the agents are grouped into several sub-samples, and the individual fixed effects are heterogeneous across these groups but homogeneous within the groups in the following manner:
\[
A_{0,i} = \sum_{k=1}^{K_A} a_{0,k} \cdot \mathbf{1}\{ i \in \mathcal{C}^A_{0,k} \}, \qquad
B_{0,i} = \sum_{k=1}^{K_B} b_{0,k} \cdot \mathbf{1}\{ i \in \mathcal{C}^B_{0,k} \}.
\tag{2.3}
\]
That is, the agents can be classified into $K_A$ groups $\mathcal{C}^A_0 \equiv \{\mathcal{C}^A_{0,1}, \ldots, \mathcal{C}^A_{0,K_A}\}$ in terms of the sender effects $\{A_{0,i}\}$, where $K_A$ is the total number of groups, which form a partition of $\{1, \ldots, n\}$ into $K_A$ subsets. Similarly, in terms of the receiver effects $\{B_{0,i}\}$, the agents can be grouped as $\mathcal{C}^B_0 \equiv \{\mathcal{C}^B_{0,1}, \ldots, \mathcal{C}^B_{0,K_B}\}$. When an individual is a member of the intersection $\mathcal{C}^A_{0,k} \cap \mathcal{C}^B_{0,l}$, his/her sender and receiver effects are equal to $a_{0,k}$ and $b_{0,l}$, respectively. This intersection set would correspond to one "community" in the community detection literature. The group to which each individual belongs remains unknown to us.
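The equivalence of the three representations of the log-likelihood in (2.2) follows from a relabeling argument. The short derivation below is only an illustrative sketch; the shorthand $p_{i,j}$, $f_{i,j}$, and $\ell_{i,j}$ is introduced for this purpose and is not notation used elsewhere in the paper.

```latex
% Shorthand (for this illustration only):
%   p_{i,j} = P_{i,j}(\theta,\gamma), \quad
%   f_{i,j} = \ln(1 - p_{i,j} - p_{j,i}) = f_{j,i}.
\begin{align*}
\ell_{i,j} &\equiv y_{i,j}\ln p_{i,j} + y_{j,i}\ln p_{j,i}
              + (1 - y_{i,j} - y_{j,i})\, f_{i,j} = \ell_{j,i}, \\
\sum_{i=1}^{n}\sum_{j\neq i}\ell_{i,j}
  &= 2\sum_{i=1}^{n}\sum_{j>i}\ell_{i,j}
  && \text{(each unordered pair is counted twice)}, \\
\sum_{i=1}^{n}\sum_{j\neq i} y_{j,i}\ln p_{j,i}
  &= \sum_{i=1}^{n}\sum_{j\neq i} y_{i,j}\ln p_{i,j}
  && \text{(swap the labels $i$ and $j$)}, \\
\sum_{i=1}^{n}\sum_{j\neq i} y_{j,i}\, f_{i,j}
  &= \sum_{i=1}^{n}\sum_{j\neq i} y_{i,j}\, f_{i,j}
  && \text{(swap labels and use $f_{i,j}=f_{j,i}$)}, \\
\text{hence}\quad
\sum_{i=1}^{n}\sum_{j\neq i}\ell_{i,j}
  &= \sum_{i=1}^{n}\sum_{j\neq i}\bigl[\,2\,y_{i,j}\ln p_{i,j}
     + (1 - 2\,y_{i,j})\, f_{i,j}\bigr].
\end{align*}
```

Dividing each side by $N = n(n-1)$ recovers the corresponding expressions for $\mathcal{L}_n(\theta, \gamma)$.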
Meanwhile, it is often assumed in the literature that the number of groups is known to researchers (e.g., Bonhomme and Manresa, 2015; Okui and Wang, 2020). Following these studies, we treat $K_A$ and $K_B$ as known values and assume that $K_A, K_B \ge 2$. We discuss how to choose $K_A$ and $K_B$ in practice in Section 5. Note that transforming $(A_0, B_0)$ to $(A_0 + c, B_0 - c)$ for any constant $c$ does not change the model (2.1). Thus, without loss of generality, we assume that $a_{0,1} = 0$ for location normalization.

Under this setup, the full ML estimator solves
\[
\max_{(\theta, a, b, \mathcal{C}^A, \mathcal{C}^B)} \mathcal{L}_n(\theta, A, B),
\tag{2.4}
\]
where $a \equiv (a_1, \ldots, a_{K_A})$ with $a_1 = 0$, $b \equiv (b_1, \ldots, b_{K_B})$, $\mathcal{C}^A \equiv \{\mathcal{C}^A_1, \ldots, \mathcal{C}^A_{K_A}\}$, $\mathcal{C}^B \equiv \{\mathcal{C}^B_1, \ldots, \mathcal{C}^B_{K_B}\}$, $A_i = \sum_{k=1}^{K_A} a_k \cdot \mathbf{1}\{i \in \mathcal{C}^A_k\}$, and $B_i = \sum_{k=1}^{K_B} b_k \cdot \mathbf{1}\{i \in \mathcal{C}^B_k\}$. The maximization problem in (2.4) is clearly a combinatorial (NP-hard) optimization problem. In the context of panel data models, several authors have proposed iterative k-means-type algorithms that efficiently compute a (local) solution to problems similar to (2.4) (e.g., Bonhomme and Manresa, 2015; Liu et al., 2020). However, the iterative algorithm is still computationally demanding. More importantly, it cannot be directly applied to our network model, where each agent's heterogeneity parameters affect not only the value of his/her own likelihood function but also that of the others.

Hence, in this paper, we propose to decompose the maximization problem in (2.4) into three steps. The first step is to estimate $\gamma_0 = (A_0^\top, B_0^\top)^\top$ using the full ML estimator based on the log-likelihood function in (2.2) without explicitly considering the group structure. Given the consistent estimates of these parameters, the second step is to estimate the group memberships $\mathcal{C}^A_0$ and $\mathcal{C}^B_0$ using the BS algorithm (e.g., Bai, 1997; Ke et al., 2016; Lian et al., 2019; Wang and Su, 2020).
The final step is to solve (2.4) with the group structure replaced by the estimated $\mathcal{C}^A_0$ and $\mathcal{C}^B_0$.

Before presenting the estimation procedure in detail, we discuss identification conditions for the parameters in model (2.1). It is important to note that, even when individual heterogeneity parameters have only finite groupwise variation, each individual's parameters must be point-identified separately to estimate the group structure consistently. A practical reason for this is that our three-step estimator based on the BS algorithm requires preliminary consistent estimates of $A_0$ and $B_0$ in the estimation of the group structure. Note that we can always estimate "pseudo-true" group memberships based on the maximum likelihood principle even when some elements of $A_0$ and $B_0$ are not point-identified. However, they may not necessarily coincide with the true group memberships in general.

(This assumption is debatable. Rather than directly specifying the number of groups, in the literature on the BS algorithm, for example, Ke et al. (2016) and Lian et al. (2019) propose introducing an additional threshold parameter to detect the group structure. However, since the number of groups can vary only discretely with the threshold value, introducing the threshold parameter is essentially equivalent to selecting the number of groups. For the use of the BS algorithm, Wang and Su (2020) formally show that a Bayesian Information Criterion (BIC)-type criterion can consistently select the correct number of groups as the sample size increases. A more formal investigation of this issue is left for future work.)

(For a related discussion, see Bonhomme and Manresa (2015).)

... $A_0$ and $B_0$. In the following, similarly as above, we set $A_{0,1} = 0$ without loss of generality. To facilitate the discussion, we also introduce several simplifying assumptions, some of which were mentioned previously.

Assumption 2.1.
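(i) The payoff disturbances $\{\epsilon_{i,j}\}$ are identically distributed on the whole $\mathbb{R}$ with a known strictly increasing marginal CDF $F(\cdot)$. (ii) The pairs $\{(\epsilon_{i,j}, \epsilon_{j,i})\}$ are i.i.d. across dyads with joint CDF $H(\cdot, \cdot; \rho_0)$, and $H(\cdot, \cdot; \rho)$ is strictly increasing in each argument for all $\rho \in \mathcal{R}$. (iii) $F(\cdot)$ is three times continuously differentiable, and $H(\cdot, \cdot; \cdot)$ is three times continuously differentiable with respect to all arguments.

Let $\Theta \equiv \mathcal{B} \times \mathcal{A} \times \mathcal{R}$, $\mathbb{A}_n \equiv \{0\} \times \mathbb{A}^{n-1}$, $\mathbb{B}_n \equiv \mathbb{B}^n$, and $\mathbb{C}_n \equiv \mathbb{A}_n \times \mathbb{B}_n$, where $\mathcal{B} \subset \mathbb{R}^{d_z}$, $\mathcal{A} \subset \mathbb{R}_{++}$, $\mathcal{R} \subset \mathbb{R}$, $\mathbb{A} \subset \mathbb{R}$, and $\mathbb{B} \subset \mathbb{R}$ are parameter spaces for $\beta$, $\alpha$, $\rho$, the $A_i$'s, and the $B_i$'s, respectively.

Assumption 2.2. (i) $\theta_0 \in \Theta^{\mathrm{int}}$, where $\Theta$ is compact. (ii) For all $i = 2, 3, \ldots$, $A_{0,i} \in \mathbb{A}^{\mathrm{int}}$, and, for all $i = 1, 2, \ldots$, $B_{0,i} \in \mathbb{B}^{\mathrm{int}}$, where $\mathbb{A}$ and $\mathbb{B}$ are compact.

Assumption 2.3.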

The covariates $\{Z_{i,j}\}$ are uniformly bounded.

In Assumption 2.1(i), we assume that the marginal CDF of the error term is known. This assumption is typically adopted in the estimation of complete information games. As shown by Khan and Nekipelov (2018), when the marginal CDFs of $(\epsilon_{i,j}, \epsilon_{j,i})$ are unknown, it is generally impossible to estimate the interaction effect $\alpha_0$ at the parametric rate. Assumption 2.1(ii) requires that the error terms are independent across dyads. Note that the parameters $(A_{0,i}, B_{0,i})$ have the role of accommodating all unobserved payoff components in link formation behavior specific to $i$. Therefore, assuming independence within the remainders $\{\epsilon_{i,1}, \ldots, \epsilon_{i,n}\}$ should not be too restrictive. The other requirements in Assumption 2.1 are standard in that they are satisfied in most commonly used parametric models (such as logit and probit). In Assumption 2.2(ii), we assume that the fixed-effect parameters $\{(A_{0,i}, B_{0,i})\}$ are bounded. Although imposing boundedness on the degree heterogeneity parameters is commonly accepted in the literature on network formation models, some studies consider a more general framework where $\|A_0\|_\infty$ and $\|B_0\|_\infty$ can grow slowly (e.g., Yan et al., 2016; Yan et al., 2019). The admissible parameter space for the correlation parameter $\rho$, $\mathcal{R}$, depends on the choice of the functional form of $H$; typically, $\mathcal{R} = [-1, 1]$. Assumption 2.3 should not be restrictive in practice. Hereinafter, we fix the values of $\{Z_{i,j}\}$; that is, we interpret the following analysis as being conditional on the realization of $\{Z_{i,j}\}$. Thus, any randomness in the model is considered to be due to the randomness of $\{\epsilon_{i,j}\}$.

An important implication of Assumptions 2.1–2.3 is that the one-way link-formation probabilities $\{P_{i,j}(\theta, \gamma)\}$ are uniformly bounded away from 0 and 1 for all possible parameter values.
In other words, we are assuming that our networks are dense, such that the number of one-way links per agent is roughly proportional to the number of sampled agents. The plausibility of this assumption depends on the context of application. For example, international trade networks and the visa-free travel networks given in Example 2.2 and Section 5 may be regarded as dense networks.

Assumption 2.4.

For any $(\beta_1, \alpha_1, \gamma_1), (\beta_2, \alpha_2, \gamma_2) \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n$ such that $(\beta_1, \alpha_1, \gamma_1) \neq (\beta_2, \alpha_2, \gamma_2)$, either or both of (a) and (b) hold:
\[
\begin{aligned}
\text{(a)}\quad & \liminf_{n \to \infty} \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \mathbf{1}\left\{ W_{i,j}^\top (\Pi_1 - \Pi_2) > 0,\; W_{j,i}^\top (\Pi_1 - \Pi_2) + \alpha_1 - \alpha_2 < 0 \right\} > 0 \\
\text{(b)}\quad & \liminf_{n \to \infty} \frac{1}{N} \sum_{i=1}^{n} \sum_{j \neq i} \mathbf{1}\left\{ W_{i,j}^\top (\Pi_1 - \Pi_2) < 0,\; W_{j,i}^\top (\Pi_1 - \Pi_2) + \alpha_1 - \alpha_2 > 0 \right\} > 0,
\end{aligned}
\]
where $\Pi_1 = (\beta_1^\top, \gamma_1^\top)^\top$ and $\Pi_2 = (\beta_2^\top, \gamma_2^\top)^\top$.

Assumption 2.4 is our main identification condition, which basically requires the following two conditions. The first condition is a standard full-rank condition for $\{W_{i,j}\}$ and $\{W_{j,i}\}$. The second condition is that at least one of $Z_{i,j}$ and $Z_{j,i}$ should contain agent-specific covariates that have large enough supports and also vary across all potential partners. If no such variables exist, since the signs of $Z_{i,j}^\top(\beta_1 - \beta_2)$ and $Z_{j,i}^\top(\beta_1 - \beta_2)$ cannot differ for some parameter values for all $(i, j)$'s, Assumption 2.4 does not hold. It should be noted that this assumption is not inconsistent with Assumption 2.3 under the compactness of the parameter space (i.e., Assumption 2.2). While the existence of player-specific continuous variables with unbounded supports is typically required in the identification of non/semiparametric game models (the so-called identification-at-infinity argument; see, e.g., Tamer, 2003; Kline, 2015), we can develop our identification result under less stringent conditions owing to the fully parametric model specification.

Theorem 2.3 (Identification). (i) Suppose that Assumptions 2.1(i)–(ii) and 2.2–2.4 hold. Then, if $\rho_0$ is known, $(\beta_0, \alpha_0, \gamma_0)$ can be point-identified on $\mathcal{B} \times \mathcal{A} \times \mathbb{C}_n$. (ii) If $\rho_0$ is unknown, but it is a unique maximizer of $\mathcal{L}^*_n(\rho)$, then $(\theta_0, \gamma_0)$ can be point-identified on $\Theta \times \mathbb{C}_n$, where
\[
\mathcal{L}^*_n(\rho) \equiv \mathbb{E}\, \mathcal{L}_n\left( (\tilde{\beta}(\rho)^\top, \tilde{\alpha}(\rho), \rho)^\top, \tilde{\gamma}(\rho) \right),
\tag{2.5}
\]
and $(\tilde{\beta}(\rho), \tilde{\alpha}(\rho), \tilde{\gamma}(\rho)) \equiv \operatorname{argmax}_{(\beta, \alpha, \gamma) \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n} \mathbb{E}\, \mathcal{L}_n((\beta^\top, \alpha, \rho)^\top, \gamma)$.

These identification results are similar to those in Theorem 2 of Aradillas-Lopez and Rosen (2019). That is, the model parameters can be point-identified either (i) if the distribution of unobserved payoff disturbances is fully known or (ii) if $\rho_0$ uniquely maximizes the concentrated log-likelihood function (2.5). In the literature on network formation models, it is often assumed that $\epsilon_{i,j}$ and $\epsilon_{j,i}$ are independent (e.g., Hoff et al., 2002; Jochmans, 2018; Yan et al., 2019). If they are independent, since $\rho_0 = 0$ is known, condition (i) is satisfied. Condition (ii) clearly depends on the choice of the $H$ function and is difficult to verify in general; however, it is directly testable empirically. A more primitive sufficient condition for this is that $\mathcal{L}^*_n(\rho)$ is strictly concave, which is satisfied when $\partial^2 \mathcal{L}^*_n(\rho) / (\partial \rho)^2$ is strictly negative under Assumption 2.1(iii). We provide an explicit form of $\partial^2 \mathcal{L}^*_n(\rho) / (\partial \rho)^2$ in Appendix C.1. Even when neither condition (i) nor (ii) is satisfied, it remains possible to partially identify the parameters, as in Aradillas-Lopez and Rosen (2019). For example, if $\mathcal{L}^*_n(\rho)$ has two peaks at $\rho_1$ and $\rho_2$, the resulting identified set is directly given by $\{(\tilde{\beta}(\rho), \tilde{\alpha}(\rho), \rho, \tilde{\gamma}(\rho)) : \rho \in \{\rho_1, \rho_2\}\}$.

(Graham (2016) and Jochmans (2018) developed conditional likelihood methods that can be used to estimate the homophily parameters (i.e., $\beta_0$ in our context), even when the networks are sparse. However, their approach cannot be directly applied to our case because of the interdependence between $g_{i,j}$ and $g_{j,i}$.)

Remark 2.4 (Other identification strategies). There are several routes to identification of our model other than the one in Theorem 2.3. For example, as our model is fully parametric, a classical parametric identification approach based on the properties of the information matrix may be used (e.g., Rothenberg, 1971; Bjorn and Vuong, 1984), although it is not easy to verify in practice. If one admits the existence of a player-specific continuous variable that has a positive density on the whole $\mathbb{R}$, then the model can be easily identified by the identification-at-infinity approach in the same manner as in Tamer (2003). Since assuming the existence of such unbounded variables is restrictive in practice, several authors have proposed other approaches based on some shape restrictions on the distribution of unobservables, without relying on the identification-at-infinity argument (e.g., Kline, 2016; Zhou, 2019). Investigating whether their approaches can be applied to our model is an interesting topic, but it is left for future work.

The first step of the ML estimation aims to obtain consistent estimates of $\gamma_0 = (A_0^\top, B_0^\top)^\top$.
Let

$$(\hat\theta_n, \hat\gamma_n) = \operatorname*{argmax}_{(\theta, \gamma) \in \Theta \times \mathbb{C}_n} \mathcal{L}_n(\theta, \gamma), \tag{3.1}$$

where $\hat\theta_n = (\hat\beta_n^\top, \hat\alpha_n, \hat\rho_n)^\top$ and $\hat\gamma_n = (\hat A_n^\top, \hat B_n^\top)^\top$. Below, we present the asymptotic properties of the initial full ML estimator in (3.1), with the main focus on $\hat\gamma_n$. Instead of introducing particular identification conditions, for generality, we directly assume that the true parameter $(\theta_0, \gamma_0)$ is the unique maximizer of $\mathbb{E}\mathcal{L}_n(\theta, \gamma)$.

Assumption 3.1. $\mathbb{E}\mathcal{L}_n(\theta, \gamma)$ is uniquely maximized at $(\theta_0, \gamma_0) \in \Theta \times \mathbb{C}_n$ for all sufficiently large $n$.

We first establish several consistency results in the next theorem. In particular, we show that the individual-specific effects can be uniformly consistently estimated.

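Theorem 3.1. Suppose that Assumptions 2.1–2.3 and 3.1 hold. Then, we have (i) $\hat\theta_n \overset{p}{\to} \theta_0$, (ii) $\frac{1}{n}\sum_{i=1}^n |\hat A_{n,i} - A_{0,i}| \overset{p}{\to} 0$, (iii) $\frac{1}{n}\sum_{i=1}^n |\hat B_{n,i} - B_{0,i}| \overset{p}{\to} 0$, and (iv) $\|\hat\gamma_n - \gamma_0\|_\infty \overset{p}{\to} 0$.

Next, we derive the convergence rates of $\hat\gamma_n$. To this end, it is convenient to re-define $\hat\theta_n$ and $\theta_0$ as $\hat\theta_n = \operatorname*{argmax}_{\theta \in \Theta} \mathcal{L}_n(\theta, \tilde\gamma_n(\theta))$ and $\theta_0 = \operatorname*{argmax}_{\theta \in \Theta} \mathbb{E}\mathcal{L}_n(\theta, \tilde\gamma(\theta))$, respectively, where $\tilde\gamma_n(\theta) \equiv \operatorname*{argmax}_{\gamma \in \mathbb{C}_n} \mathcal{L}_n(\theta, \gamma)$ and $\tilde\gamma(\theta) \equiv \operatorname*{argmax}_{\gamma \in \mathbb{C}_n} \mathbb{E}\mathcal{L}_n(\theta, \gamma)$ for any given $\theta \in \Theta$, assuming that they are well-defined. Further, we define $\gamma_- = (A_2, \ldots, A_n, B_1, \ldots, B_n)^\top$ and

$$H_{n,\gamma\gamma}(\theta, \gamma) \equiv \frac{\partial^2 \mathcal{L}_n(\theta, \gamma)}{\partial\gamma_- \partial\gamma_-^\top}, \qquad H_{n,\theta\theta}(\theta, \gamma) \equiv \frac{\partial^2 \mathcal{L}_n(\theta, \gamma)}{\partial\theta \partial\theta^\top}, \qquad H_{n,\gamma\theta}(\theta, \gamma) \equiv \frac{\partial^2 \mathcal{L}_n(\theta, \gamma)}{\partial\gamma_- \partial\theta^\top},$$

which are of dimensions $(2n-1) \times (2n-1)$, $(d_z+2) \times (d_z+2)$, and $(2n-1) \times (d_z+2)$, respectively, and $H_{n,\theta\gamma}(\theta, \gamma) \equiv H_{n,\gamma\theta}(\theta, \gamma)^\top$. The exact form of $H_{n,\gamma\gamma}(\theta, \gamma)$ can be found in (A.2) in Appendix A. Finally, let

$$\mathcal{I}_{n,\theta\theta}(\theta, \gamma) \equiv H_{n,\theta\theta}(\theta, \gamma) - H_{n,\theta\gamma}(\theta, \gamma) \left[H_{n,\gamma\gamma}(\theta, \gamma)\right]^{-1} H_{n,\gamma\theta}(\theta, \gamma),$$

which is a $(d_z+2) \times (d_z+2)$ matrix. This $\mathcal{I}_{n,\theta\theta}(\theta, \gamma)$ matrix serves as the Hessian matrix for the concentrated ML estimator $\hat\theta_n$ (see, e.g., Amemiya, 1985). Now, we introduce the following assumptions.

Assumption 3.2. $\tilde\gamma(\theta)$ uniquely exists uniformly on $\{\theta \in \Theta : \|\theta - \theta_0\| \le \varepsilon\}$, where $\varepsilon > 0$ is an arbitrary small constant.

Assumption 3.3. For an arbitrary small constant $\varepsilon > 0$, there exist constants $c_\gamma, c_\theta > 0$ that may depend on $\varepsilon$ such that (i) $\lambda_{\min}(-n \cdot H_{n,\gamma\gamma}(\theta, \gamma)) > c_\gamma$ and (ii) $\lambda_{\min}(-\mathcal{I}_{n,\theta\theta}(\theta, \gamma)) > c_\theta$ w.p.a.1 uniformly on $\{(\theta, \gamma) \in \Theta \times \mathbb{C}_n : \|\theta - \theta_0\| \le \varepsilon, \|\gamma - \gamma_0\|_\infty \le \varepsilon\}$.

Assumptions 3.2 and 3.3 should be fairly reasonable in practice. Under these additional assumptions, we can derive the $\ell_1$-norm and max-norm convergence rates for $\hat\gamma_n$, as shown in the next theorem.

Theorem 3.2. Suppose that Assumptions 2.1–2.3 and 3.1–3.3 hold. Then, we have (i) $\frac{1}{n}\sum_{i=1}^n |\hat A_{n,i} - A_{0,i}| = O_P(n^{-1/2})$, (ii) $\frac{1}{n}\sum_{i=1}^n |\hat B_{n,i} - B_{0,i}| = O_P(n^{-1/2})$, and (iii) $\|\hat\gamma_n - \gamma_0\|_\infty = O_P(\sqrt{\ln n / n})$.

The max-norm convergence rate obtained in Theorem 3.2 is consistent with the result of Theorem 3 in Graham (2017) and that of Theorem 3.1 in Yan et al. (2019).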

Given the consistent estimates of $A_0$ and $B_0$, we use the BS algorithm to estimate the group structure. We first sort $\hat A_n$ and $\hat B_n$ in ascending order and write the order statistics as $\hat A_{n,(1)} \le \hat A_{n,(2)} \le \cdots \le \hat A_{n,(n)}$ and $\hat B_{n,(1)} \le \hat B_{n,(2)} \le \cdots \le \hat B_{n,(n)}$.

In the following, we mainly describe the estimation of the group membership for the sender effects, $\mathcal{C}^A_0$. Exactly the same procedure can be used to estimate $\mathcal{C}^B_0$.

The concept behind the BS algorithm is quite simple. If $\{A_{0,i}\}$ are heterogeneous across $K_A$ latent groups but are homogeneous within the groups, there should exist $K_A - 1$ "break points" in the sorted $\{A_{0,i}\}$. Since $\hat A_n$ is uniformly consistent for $A_0$, these break points also appear in the sequence $\hat A_{n,(1)}, \ldots, \hat A_{n,(n)}$ w.p.a.1. For $1 \le i < j \le n$, we define $\hat\Delta_A(i, j)$ as the sum of squared variations over $\{\hat A_{n,(i)}, \ldots, \hat A_{n,(j)}\}$; namely,

$$\hat\Delta_A(i, j) \equiv \sum_{l=i}^{j} \left(\hat A_{n,(l)} - \bar A_{n,i,j}\right)^2, \quad \text{where} \quad \bar A_{n,i,j} \equiv \frac{1}{j-i+1} \sum_{l=i}^{j} \hat A_{n,(l)}.$$

Further, we define

$$\hat S^A_{i,j}(\kappa) \equiv \begin{cases} \dfrac{1}{j-i+1}\left(\hat\Delta_A(i, \kappa) + \hat\Delta_A(\kappa+1, j)\right) & \text{if } \kappa < j \\[2mm] \dfrac{1}{j-i+1}\hat\Delta_A(i, j) & \text{if } \kappa = j. \end{cases}$$

That is, $\hat S^A_{i,j}(\kappa)$ provides the total variance of $\{\hat A_{n,(i)}, \ldots, \hat A_{n,(j)}\}$ when a break point is placed at $\kappa$. Assuming that $K_A \ge 2$, the BS algorithm proceeds as follows:

Step 1 ($K_A = 2$): We find the first break point, say $\hat t_1$, by $\hat t_1 = \operatorname*{argmin}_{1 \le \kappa < n} \hat S^A_{1,n}(\kappa)$. If $K_A > 2$, we proceed to the next step.

Step 2 ($K_A = 3$): Now, if $K_A = 3$, there exists one more break point either in $\{\hat A_{n,(i)}\}_{i=1}^{\hat t_1}$ or in $\{\hat A_{n,(i)}\}_{i=\hat t_1+1}^{n}$. In other words, either one of the two converges to a sequence of constants as the sample size increases. Then, when we compute $\hat S^A_{1,\hat t_1}(\hat t_1)$ and $\hat S^A_{\hat t_1+1,n}(n)$, and if $\hat S^A_{1,\hat t_1}(\hat t_1) > \hat S^A_{\hat t_1+1,n}(n)$ for example, the break point is likely to lie in the former subset. Thus, the second break point, say $\hat t_2$, can be found by

$$\hat t_2 = \begin{cases} \operatorname*{argmin}_{1 \le \kappa < \hat t_1} \hat S^A_{1,\hat t_1}(\kappa) & \text{if } \hat S^A_{1,\hat t_1}(\hat t_1) > \hat S^A_{\hat t_1+1,n}(n) \\[2mm] \operatorname*{argmin}_{\hat t_1+1 \le \kappa < n} \hat S^A_{\hat t_1+1,n}(\kappa) & \text{otherwise.} \end{cases}$$

Assumption 3.4. (i) There exist constants $c_A, c_B > 0$ such that $\min_{k \ne k'} |a_{0,k} - a_{0,k'}| > c_A$ and $\min_{k \ne k'} |b_{0,k} - b_{0,k'}| > c_B$. (ii) $|\mathcal{C}^A_{0,k}|/n \to \tau^A_k \in (0, 1)$ for all $k = 1, \ldots, K_A$ and $|\mathcal{C}^B_{0,k}|/n \to \tau^B_k \in (0, 1)$ for all $k = 1, \ldots, K_B$.

Assumption 3.4 is parallel to Assumption A2 in Wang and Su (2020). Similar assumptions are commonly used in the literature on panel data models with latent group structure. The following theorem provides the consistency result of the BS algorithm:

Theorem 3.3. Suppose that Assumptions 2.1–2.3 and 3.1–3.4 hold. Then, we have

$$\Pr(\hat{\mathcal{C}}^A_n = \mathcal{C}^A_0) \to 1 \quad \text{and} \quad \Pr(\hat{\mathcal{C}}^B_n = \mathcal{C}^B_0) \to 1.$$

Although the proof of Theorem 3.3 is almost analogous to those in Ke et al. (2016) and Wang and Su (2020), we provide it in Appendix B for completeness.
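To make the procedure concrete, the following is a minimal Python sketch of binary segmentation applied to a vector of first-step estimates, together with the repartitioning refinement of Bai (1997) discussed in Remark 3.4. It is an illustration under stated assumptions, not the implementation used in this paper: all function names are hypothetical, indices are 0-based (a break point is stored as the index of the last element of the left segment), the rule for more than three groups (always split the segment with the largest $\hat S$) is assumed by analogy with Step 2, and the repartitioning step updates the break points one at a time using the current neighbors.

```python
from typing import List, Sequence


def ssq(x: Sequence[float], i: int, j: int) -> float:
    """Sum of squared deviations of x[i..j] (inclusive) from its mean."""
    seg = x[i:j + 1]
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def s_hat(x: Sequence[float], i: int, j: int, kappa: int) -> float:
    """Average variation of x[i..j] with a break after kappa (kappa == j: no break)."""
    if kappa == j:
        total = ssq(x, i, j)
    else:
        total = ssq(x, i, kappa) + ssq(x, kappa + 1, j)
    return total / (j - i + 1)


def best_break(x: Sequence[float], i: int, j: int) -> int:
    """Break point in [i, j) that minimizes s_hat over the segment x[i..j]."""
    return min(range(i, j), key=lambda k: s_hat(x, i, j, k))


def binary_segmentation(x_sorted: Sequence[float], n_groups: int) -> List[int]:
    """Return n_groups - 1 sorted break points for an ascending sequence."""
    n = len(x_sorted)
    if not 1 <= n_groups <= n:
        raise ValueError("n_groups must lie between 1 and len(x_sorted)")
    breaks: List[int] = []
    while len(breaks) < n_groups - 1:
        bounds = [-1] + sorted(breaks) + [n - 1]
        segments = [(lo + 1, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
        # Assumed rule: split the segment whose unbroken variation is largest.
        splittable = [s for s in segments if s[1] > s[0]]
        i, j = max(splittable, key=lambda s: s_hat(x_sorted, s[0], s[1], s[1]))
        breaks.append(best_break(x_sorted, i, j))
    return sorted(breaks)


def repartition(x_sorted: Sequence[float], breaks: Sequence[int],
                n_iter: int = 1) -> List[int]:
    """Re-estimate each break point between its two neighboring break points."""
    n = len(x_sorted)
    b = sorted(breaks)
    for _ in range(n_iter):
        for k in range(len(b)):
            lo = b[k - 1] + 1 if k > 0 else 0
            hi = b[k + 1] if k + 1 < len(b) else n - 1
            b[k] = best_break(x_sorted, lo, hi)
    return b


def group_labels(values: Sequence[float], n_groups: int,
                 n_iter: int = 0) -> List[int]:
    """Group label (0 = smallest effects) for each entry of the unsorted input."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    x_sorted = [values[i] for i in order]
    breaks = repartition(x_sorted, binary_segmentation(x_sorted, n_groups), n_iter)
    labels = [0] * len(values)
    for pos, idx in enumerate(order):
        labels[idx] = sum(pos > b for b in breaks)
    return labels


# Hypothetical first-step estimates scattered around three group effects.
estimates = [1.0, 0.0, 2.0, 0.1, 1.1, 2.1, -0.1, 0.9, 1.9]
print(group_labels(estimates, n_groups=3, n_iter=2))
```

Calling `group_labels` with `n_iter=0` corresponds to the standard BS method, and `n_iter=2` to two repartitioning passes. The sketch recomputes the sums of squares from scratch at every candidate break, which is simple but quadratic in the segment length; cumulative sums would be the natural optimization for large $n$.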

Remark 3.4 (Fine-tuning). Bai (1997) shows that the BS method tends to over/underestimate the location of break points depending on the share of each group and the gaps between the values of the group effects. To account for this problem, he proposes the following repartitioning method. Let $\hat t_1 < \cdots < \hat t_{K_A - 1}$ be the estimated break points obtained by the standard BS method, sorted in ascending order, and set $\hat t_0 = 0$ and $\hat t_{K_A} = n$. Then, we replace the initial estimate $\hat t_k$ with

$$\hat t^{\text{repart}}_k \equiv \operatorname*{argmin}_{\hat t_{k-1}+1 \le \kappa < \hat t_{k+1}} \hat S^A_{\hat t_{k-1}+1, \hat t_{k+1}}(\kappa)$$

for all $k = 1, \ldots, K_A - 1$. Letting $\hat{\mathcal{C}}^{A,\text{repart}}_n$ be the resulting repartitioned estimator of $\mathcal{C}^A_0$, given the result of Theorem 3.3, it is straightforward to see that $\hat{\mathcal{C}}^{A,\text{repart}}_n$ is also consistent for $\mathcal{C}^A_0$. Note that the above repartitioning procedure can be implemented recursively until convergence. The numerical simulations in Section 4 indicate that using this repartitioning method remarkably improves the probability of correctly predicting the group memberships.

In the final step, we solve (2.4) approximately by using $(\hat{\mathcal{C}}^A_n, \hat{\mathcal{C}}^B_n)$ in the place of $(\mathcal{C}^A_0, \mathcal{C}^B_0)$. Recalling that $a_1$ is pinned at $a_1 = 0$, let $\delta \equiv (\theta^\top, a_2, \ldots, a_{K_A}, b_1, \ldots, b_{K_B})^\top$, and $\mathbb{D} \equiv \Theta \times \mathbb{A}^{K_A - 1} \times \mathbb{B}^{K_B}$ for the parameter space of $\delta$. We denote $\delta_0$ as the true value of $\delta$. Then, our final ML estimator for $\delta_0$ is defined as $\hat\delta_n = \operatorname*{argmax}_{\delta \in \mathbb{D}} \hat{\mathcal{L}}_n(\delta)$, where

$$\hat{\mathcal{L}}_n(\delta) \equiv \mathcal{L}_n\left(\theta, \left\{\sum_{k=1}^{K_A} a_k \cdot \mathbf{1}\{i \in \hat{\mathcal{C}}^A_{n,k}\}\right\}, \left\{\sum_{k=1}^{K_B} b_k \cdot \mathbf{1}\{i \in \hat{\mathcal{C}}^B_{n,k}\}\right\}\right).$$

Similarly, we define $\hat\delta^{\text{oracle}}_n = \operatorname*{argmax}_{\delta \in \mathbb{D}} \mathcal{L}_n(\delta)$, where

$$\mathcal{L}_n(\delta) \equiv \mathcal{L}_n\left(\theta, \left\{\sum_{k=1}^{K_A} a_k \cdot \mathbf{1}\{i \in \mathcal{C}^A_{0,k}\}\right\}, \left\{\sum_{k=1}^{K_B} b_k \cdot \mathbf{1}\{i \in \mathcal{C}^B_{0,k}\}\right\}\right);$$

that is, $\hat\delta^{\text{oracle}}_n$ is the "oracle" estimator that is computed based on the true $\mathcal{C}^A_0$ and $\mathcal{C}^B_0$. Since $\hat\delta^{\text{oracle}}_n$ is the standard parametric ML estimator, it is asymptotically normal at the parametric rate, and its asymptotic covariance matrix is given by the inverse Fisher information matrix. Meanwhile, we have shown in Theorem 3.3 that the estimated group memberships $(\hat{\mathcal{C}}^A_n, \hat{\mathcal{C}}^B_n)$ are equal to $(\mathcal{C}^A_0, \mathcal{C}^B_0)$ w.p.a.1. Therefore, we can claim that the final ML estimator $\hat\delta_n$ has asymptotically the same statistical performance as the oracle estimator $\hat\delta^{\text{oracle}}_n$, and, thus, it is asymptotically fully efficient. We formally state this result in the next theorem.

Theorem 3.5. Suppose that Assumptions 2.1–2.3 and 3.1–3.4 hold. In addition, we assume that $\mathcal{I}_{\delta\delta} \equiv -\lim_{n \to \infty} \mathbb{E}\left[\partial^2 \mathcal{L}_n(\delta_0)/(\partial\delta \partial\delta^\top)\right]$ exists and is positive definite. Then, $\hat\delta_n$ and $\hat\delta^{\text{oracle}}_n$ have the same asymptotic distribution:

$$\sqrt{N}(\hat\delta^{\text{oracle}}_n - \delta_0) \overset{d}{\to} N\left(0_{d_z + K_A + K_B + 1}, \mathcal{I}^{-1}_{\delta\delta}\right).$$

Recall that the asymptotic equivalence between $\hat\delta_n$ and $\hat\delta^{\text{oracle}}_n$ relies on the dense network structure where each agent's specific effects can be point-identified. When the networks are not dense, $\hat\delta_n$ is generally inconsistent, while $\hat\delta^{\text{oracle}}_n$ may still be consistent (potentially with a slower convergence rate). Finally, note that the above discussions hold true if the repartitioned estimator $(\hat{\mathcal{C}}^{A,\text{repart}}_n, \hat{\mathcal{C}}^{B,\text{repart}}_n)$ is used instead of $(\hat{\mathcal{C}}^A_n, \hat{\mathcal{C}}^B_n)$.

4 Monte Carlo Experiments

In this section, we examine the finite sample performance of the three-step ML estimator. We consider the following data-generating process for the Monte Carlo experiments:

$$u_{i,j}(q) = Z_{i,j,1}\beta_{0,1} + Z_{i,j,2}\beta_{0,2} + \alpha_0 q + \sum_{k=1}^{K_A} a_{0,k} \cdot \mathbf{1}\{i \in \mathcal{C}^A_{0,k}\} + \sum_{k=1}^{K_B} b_{0,k} \cdot \mathbf{1}\{j \in \mathcal{C}^B_{0,k}\} - \epsilon_{i,j}, \quad \text{for } i \ne j,$$

where $Z_{i,j,1} = |X_i - X_j|$ with $X_i \overset{\text{i.i.d.}}{\sim} \text{Uniform}[-\ldots, \ldots]$, $Z_{i,j,2} \overset{\text{i.i.d.}}{\sim} N(0, \ldots)$, $(\epsilon_{i,j}, \epsilon_{j,i})$ is i.i.d. across dyads as the standard bivariate normal with correlation coefficient $\rho_0 = 0.\ldots$, $(\beta_{0,1}, \beta_{0,2}, \alpha_0) = (-\ldots, \ldots, \ldots)$, and $K_A = K_B = 3$. For the groupwise heterogeneity parameters, we consider $(a_{0,1}, a_{0,2}, a_{0,3}) = (0, r, \ldots r)$ and $(b_{0,1}, b_{0,2}, b_{0,3}) = (-\ldots - r, -\ldots, -\ldots\, r)$ for $r \in \{0.4, 0.7, 1.0\}$. The smaller (larger) $r$ becomes, the more difficult (easier) the identification of the group structure. The group memberships are determined randomly while maintaining the equal size of each group. As an exception, $1 \in \mathcal{C}^A_{0,1}$ (so that $A_{0,1} = 0$) is fixed for observation 1 throughout the experiments. For each model setup, we consider two sample sizes: $n \in \{54, 75\}$; thus, the size of each group is 18 in the former case and 25 in the latter. The number of Monte Carlo repetitions is set to 500 for each experiment. For the estimation of the group memberships, for comparison, we use both the standard BS method without repartitions and the repartitioned BS method.

We first report the simulation results of estimating the common parameters $(\alpha_0, \beta_{0,1}, \beta_{0,2}, \rho_0)$. Table 4.1 presents the bias and RMSE (root mean squared error) for the following four estimators: the initial ML estimator given in (3.1) (1st-step ML), the three-step ML estimator based on the BS method with no iterations (BS0) and that with two iterations (BS2), and the oracle estimator based on the true group membership (Oracle). For estimating $(\beta_{0,1}, \beta_{0,2})$, as expected, the 1st-step ML estimator is largely biased for all scenarios due to the incidental parameter problem.
Although the three-step estimators (i.e., BS0 and BS2) also have some biases when $r = 0.4$ and $n = 54$, the biases disappear as either $r$ or $n$ increases. Thus, these biases are probably due to frequent misclassification of group memberships under small $r$ and $n$. In terms of RMSE, although we can observe a certain gap between the oracle estimator and the three-step estimators, the gaps can be reduced by increasing $r$ and $n$, which is consistent with our theory. Interestingly, even when using the 1st-step ML estimator, the strategic interaction effect and the error correlation parameter can be estimated with almost no bias.

The simulation results of estimating the group memberships are summarized in Table 4.2. Here, we compare the performance of BS0 and BS2 in terms of the ratio of correct group classification. First of all, the results indicate that the repartitioned BS method (i.e., BS2) clearly outperforms the standard BS method without repartitions (i.e., BS0). As expected, as $r$ gets smaller, correctly predicting the group membership becomes significantly more difficult. If the gaps between the values of the group effects are sufficiently large and the sample size is not small, BS2 can attain almost 90% of correct classification. We cannot observe any clear difference between the estimation of $\mathcal{C}^A_0$ and that of $\mathcal{C}^B_0$.

Footnote. One might view that the results reported in Table 4.2 are not particularly good for the BS algorithm. A main reason would be that our model is a bivariate binary response model, whereas most of the previous studies using the BS algorithm have only focused on models with a continuous outcome.

Table 4.1

                            α_0              β_{0,1}           β_{0,2}           ρ_0
r    n    Estimator      Bias    RMSE     Bias    RMSE     Bias    RMSE     Bias    RMSE
0.4  54   1st-step ML    0.027   0.194   -0.192   0.358    0.222   0.257    0.047   0.196
          BS0            0.056   0.174   -0.113   0.237    0.120   0.162   -0.059   0.176
          BS2            0.057   0.176   -0.128   0.254    0.134   0.175   -0.050   0.176
          Oracle         0.006   0.131   -0.008   0.082    0.009   0.092    0.001   0.126
     75   1st-step ML    0.032   0.122   -0.102   0.261    0.152   0.172    0.022   0.118
          BS0            0.043   0.116   -0.056   0.176    0.080   0.108   -0.054   0.125
          BS2            0.043   0.117   -0.063   0.190    0.091   0.117   -0.043   0.120
          Oracle        -0.001   0.093   -0.002   0.061    0.004   0.066    0.004   0.091
0.7  54   1st-step ML    0.030   0.221   -0.175   0.351    0.221   0.263    0.061   0.234
          BS0            0.020   0.176   -0.056   0.220    0.056   0.125   -0.072   0.193
          BS2            0.035   0.184   -0.074   0.235    0.084   0.143   -0.069   0.197
          Oracle         0.007   0.132   -0.010   0.079    0.010   0.094    0.002   0.128
     75   1st-step ML    0.031   0.127   -0.104   0.267    0.152   0.173    0.031   0.126
          BS0            0.017   0.114   -0.017   0.169    0.025   0.074   -0.074   0.138
          BS2            0.025   0.118   -0.038   0.183    0.048   0.086   -0.058   0.132
          Oracle        -0.004   0.091   -0.001   0.058    0.004   0.065    0.006   0.091
1.0  54   1st-step ML    0.027   0.234   -0.178   0.356    0.227   0.267    0.082   0.268
          BS0           -0.007   0.182    0.003   0.209   -0.018   0.108   -0.120   0.233
          BS2            0.013   0.186   -0.036   0.214    0.029   0.114   -0.092   0.224
          Oracle         0.006   0.136   -0.011   0.080    0.014   0.094    0.000   0.139
     75   1st-step ML    0.033   0.145   -0.111   0.274    0.155   0.179    0.045   0.148
          BS0           -0.009   0.130    0.029   0.173   -0.039   0.085   -0.103   0.172
          BS2            0.010   0.120   -0.012   0.158    0.012   0.079   -0.063   0.137
          Oracle        -0.001   0.094   -0.002   0.059    0.006   0.066    0.005   0.097

Note. 1st-step ML: the initial ML estimator, BS0: the three-step ML estimator based on the BS method with no repartitions, BS2: the three-step ML estimator based on the repartitioned BS method with two iterations, Oracle: the oracle estimator based on the true group membership.
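The summary measures used in this section (bias, RMSE, and the ratio of correct group classification) can be computed from the Monte Carlo draws as in the short Python sketch below. The function names and the toy numbers are hypothetical and serve only as an illustration. The classification ratio compares labels position by position, which assumes that the estimated and the true groups are both numbered in ascending order of the group effects.

```python
import math
from typing import Sequence


def bias(estimates: Sequence[float], truth: float) -> float:
    """Mean of (estimate - truth) over the Monte Carlo repetitions."""
    return sum(e - truth for e in estimates) / len(estimates)


def rmse(estimates: Sequence[float], truth: float) -> float:
    """Root mean squared error over the Monte Carlo repetitions."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


def correct_classification_ratio(estimated: Sequence[int],
                                 true: Sequence[int]) -> float:
    """Share of agents whose estimated group label equals the true label."""
    if len(estimated) != len(true):
        raise ValueError("label vectors must have the same length")
    return sum(e == t for e, t in zip(estimated, true)) / len(true)


# Toy inputs: four replications of one parameter, and labels of four agents.
draws = [1.0, 1.2, 0.8, 1.4]
print(bias(draws, truth=1.0), rmse(draws, truth=1.0))
print(correct_classification_ratio([0, 0, 1, 2], [0, 1, 1, 2]))
```

In an actual experiment, `draws` would hold the 500 replication estimates of a single parameter, and the classification ratio would additionally be averaged over replications.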
[Table 4.2: correct classification ratio by $r$, $n$, and estimator, for $\mathcal{C}^A$ and $\mathcal{C}^B$]

As an empirical application of our model and method, we analyze the network of international visa-free travel. The dependent variable of interest is $G_n = (g_{i,j})_{1 \le i,j \le n}$, where $g_{i,j} = 1$ if country $i$ allows the citizens of country $j$ to visit $i$ without visas, and $g_{i,j} = 0$ otherwise. Since the bilateral relationship regarding visa-free policy is expected to be complementary, this setting fits into our model framework.

In this empirical study, we consider 57 countries selected mainly from Asia, the Middle East, the former USSR, and Oceania. The information about the visa policy of each country is taken from Henley and Partners: Passport Index 2020. The total number of dyads in this network is $57(57-1)/2 = 1596$. From Table 5.1, which summarizes the distribution of the link connections, we can observe that the number of country pairs with one-way links is smaller than that with mutual links or no links. This suggests the presence of complementarity in the network formation process. According to the above-mentioned passport index, Japan, Singapore, and South Korea are the top three countries among the 57 countries in terms of the number of all countries with visa-free access. For our restricted sample network, South Korea has the largest in-degree, $\sum_i g_{i,\text{South Korea}} = 49$. For the out-degree, Nepal has the largest value, $\sum_j g_{\text{Nepal},j} = 55$; that is, Nepal allows 55 countries (out of 57) to visit Nepal only with on-arrival visas. More detailed information can be found in Table C.1 in Appendix C.2.

Footnote. The list of countries used in this empirical study is as follows: Armenia, Australia, Azerbaijan, Bahrain, Bangladesh, Belarus, Bhutan, Brunei, Cambodia, China, Cyprus, Estonia, Fiji, Georgia, Hong Kong, India, Indonesia, Iran, Iraq, Israel, Japan, Jordan, Kazakhstan, Kiribati, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lithuania, Malaysia, Moldova, Mongolia, Myanmar, Nauru, Nepal, New Zealand, Oman, Pakistan, Papua New Guinea, Philippines, Qatar, Russia, Saudi Arabia, Singapore, South Korea, Sri Lanka, Tajikistan, Thailand, Tonga, Turkey, UAE, Ukraine, Uzbekistan, Vanuatu, Viet Nam, and Yemen. These countries are selected based on geographical proximity and ease of data collection.

Footnote. Based on their definition, we categorize electronic travel authorization (eTA) and on-arrival visa as visa-free access.

Table 5.1: Distribution of $\{(g_{i,j}, g_{j,i}) : 1 \le i < j \le n\}$

               g_{j,i} = 0    g_{j,i} = 1
g_{i,j} = 0        495            343
g_{i,j} = 1        349            409

The network for all the 57 countries is quite complicated, and it is difficult to grasp the entire picture. As one illustration of our data, Figure 5.1 presents the sub-network obtained by restricting the vertices to the Eastern and Southeastern Asian countries. The left panel in the figure shows the whole shape of this sub-network. (Note that the direction of the arrows in the figure is "not" the direction of visa-free access, but it represents that the target country is allowed to visit the country at the arrow's origin without visas.) The right panel shows only the one-way links, $(g_{i,j} \cdot (1 - g_{j,i}))_{i,j \in \{\text{Brunei}, \ldots, \text{Viet Nam}\}}$. From this figure, we can expect the existence of a certain level of degree heterogeneity. More specifically, Cambodia, for example, has five outgoing one-way links in this sub-network, suggesting that this country would have a larger sender effect $A_0$. In contrast, countries such as Japan and South Korea would exhibit a larger receiver effect $B_0$.

Figure 5.1: Eastern and Southeastern Asian sub-network (Left panel: the whole sub-network, right panel: only one-way links.)

For estimating the network formation model, we consider five covariates; for their definitions, see Table 5.2. The summary statistics of the covariates are provided in Table C.2 in Appendix C.2. With these variables, we consider the following payoff function:

$$u_{i,j}(q) = (\ln \text{gdp\_pc}_i)(\ln \text{gdp\_pc}_j)\beta_{0,1} + |\text{free}_i - \text{free}_j|\beta_{0,2} + \mathbf{1}\{\text{region}_i = \text{region}_j\}\beta_{0,3} + \ln(\text{export}_{ij} + 1)\beta_{0,4} + \ln(\text{import}_{ij} + 1)\beta_{0,5} + \alpha_0 q + A_{0,i} + B_{0,j} - \epsilon_{i,j}, \quad \text{for } i \ne j,$$

where we assume that $(\epsilon_{i,j}, \epsilon_{j,i})$ have the standard bivariate normal distribution with correlation coefficient $\rho_0$.

To estimate our network formation model with grouped degree heterogeneity, we first need to determine the number of groups for the sender effects $\{A_{0,i}\}$ and that for the receiver effects $\{B_{0,i}\}$, $K_A$ and $K_B$, respectively. Following Ke et al. (2016) and Wang and Su (2020), the optimal $(K_A, K_B)$ is selected as the minimizer of the BIC criterion, which trades off the maximized log-likelihood $\hat{\mathcal{L}}_n(\hat\delta_n)$ against the penalty term $(6 + K_A + K_B)\ln(1596)$. As a result of searching over a grid of models with different $(K_A, K_B)$, we find that the model with $(K_A, K_B) = (7, 6)$ achieves the smallest BIC (see Table C.3 in Appendix C.2 for more detailed information), and this is the model reported here. For comparison, we estimate not only our proposed model, which we call the grouped heterogeneity model, but also a dyadic bivariate probit model without strategic interaction and degree heterogeneities as a benchmark.
For the estimation of the group memberships, we employ the repartitioning method with two iterations (i.e., BS2 in the previous section).

Table 5.2: Definitions of Variables

Variables      Definitions
gdp_pc_i       GDP per capita in 2018 (1,000 USD)
free_i         Freedom rating (1 = Most Free, 7 = Least Free) (a)
region_i       Categorical variable: East Asia, Southeast Asia, Central Asia, Europe, Middle East, or Oceania
export_ij      Total export value from country i to j in 2018 (million USD) (b)
import_ij      Total import value from country i to j in 2018 (million USD) (b)

Sources: (a) Freedom in the World 2018, Freedom House (https://freedomhouse.org/); (b) IMF DATA (https://data.imf.org/).

The estimation results are summarized in Table 5.3. First of all, as expected, our proposed model suggests that there is a significant strategic complementarity in the network formation behavior. We can also find a certain level of degree heterogeneity in terms of both the sender and the receiver effects. Comparing the grouped heterogeneity model and the benchmark model, the log-likelihood value for the former is substantially larger than that for the latter. This large difference in model fit also points to the importance of the strategic effect and unobserved heterogeneity (note, however, that these models are not nested). For specific parameter estimates, we can observe several non-negligible differences between the two models. For example, the effect of the export amount is predicted to be positive in the grouped heterogeneity model, whereas the benchmark model predicts a significantly negative impact. The error correlation parameter is not significantly different from zero in our model but is weakly positively significant in the benchmark model. This result is understandable since the benchmark model can account for the interdependence of the links only through the error correlation. Additionally, there are several interesting findings. For both models, if countries $i$ and $j$ are located in the same region, they become more likely to allow visa-free access, as expected. In addition to geographical proximity, we can also observe significant homophily in terms of the political system.

For the estimation results of country-specific effects, we report the estimated group memberships in Table 5.4. As expected from the above discussion, countries such as Cambodia are indeed classified into the highest group (i.e., Group 7) in terms of the sender effect. The other two countries that have a Group-7 sender effect are Nepal and Sri Lanka.
For the receiver effect, these two countries are classified as Group 2, and Cambodia is in Group 3. Overall, interestingly, there seems to be a weak negative correlation between the sender effects and the receiver effects. As expected from the above discussion, Japan and South Korea indeed belong to the group with the highest receiver effect (i.e., Group 6). The magnitudes of the receiver effects seem to roughly correlate with the size of the countries' economies (with some exceptions).

Table 5.3: Estimation Results

                                      Grouped Heterogeneity Model     Benchmark Model
                                      Estimate     t-value            Estimate     t-value
Intercept                                                             -0.208       -1.196
Strategic effect: α_0                 …            …
(ln gdp_pc_j)(ln gdp_pc_i): β_1       …            …                  …            …
|free_i − free_j|: β_2                -0.229       -7.044             -0.075       -1.692
1{region_i = region_j}: β_3           …            …                  …            …
ln(export_ij + 1): β_4                …            …                  …            …
ln(import_ij + 1): β_5                …            …                  …            …
ρ                                     -0.080       -0.450             0.088        1.656
Sender effects: A
  Group 1: a_1                        …            …
  Group 2: a_2                        …            …
  Group 3: a_3                        …            …
  Group 4: a_4                        …            …
  Group 5: a_5                        …            …
  Group 6: a_6                        …            …
  Group 7: a_7                        …            …
Receiver effects: B
  Group 1: b_1                        -5.670       -14.388
  Group 2: b_2                        -4.590       -16.159
  Group 3: b_3                        -3.902       -14.559
  Group 4: b_4                        -3.446       -14.240
  Group 5: b_5                        -2.561       -10.952
  Group 6: b_6                        -1.558       -6.722
Log-likelihood                        -810.798                        -1531.496

Table 5.4: Estimated Group Memberships

Group 1. Sender effect (A): Australia, China, Iraq, Nauru, Oman, Russia. Receiver effect (B): Iraq, Pakistan.
Group 2. Sender effect (A): Bhutan, India, Japan, New Zealand, UAE. Receiver effect (B): Bangladesh, Iran, Jordan, Lebanon, Nepal, Sri Lanka, Yemen.
Group 3. Sender effect (A): Bahrain, Cyprus, Estonia, Georgia, Kiribati, Kuwait, Latvia, Lithuania, Mongolia, Myanmar, Papua New Guinea, Qatar, Saudi Arabia, South Korea, Tonga, Turkey, Ukraine, Viet Nam, Yemen. Receiver effect (B): Armenia, Cambodia, Laos, Myanmar, Qatar, Russia, Viet Nam.
Group 4. Sender effect (A): Azerbaijan, Belarus, Brunei, Fiji, Hong Kong, Israel, Kazakhstan, Kyrgyzstan, Moldova, Pakistan, Singapore, Thailand, Uzbekistan. Receiver effect (B): Belarus, Bhutan, China, Fiji, Georgia, India, Indonesia, Kazakhstan, Kiribati, Kyrgyzstan, Moldova, Mongolia, Nauru, Papua New Guinea, Philippines, Tajikistan, Thailand, Tonga, Ukraine, Uzbekistan.
Group 5. Sender effect (A): Armenia, Bangladesh, Jordan, Lebanon, Malaysia, Philippines, Tajikistan, Vanuatu. Receiver effect (B): Azerbaijan, Bahrain, Kuwait, Latvia, Lithuania, Oman, Saudi Arabia, Turkey, UAE, Vanuatu.
Group 6. Sender effect (A): Indonesia, Iran, Laos. Receiver effect (B): Australia, Brunei, Cyprus, Estonia, Hong Kong, Israel, Japan, Malaysia, New Zealand, Singapore, South Korea.
Group 7. Sender effect (A): Cambodia, Nepal, Sri Lanka.

This paper proposed a network formation model with pairwise strategic interaction and grouped degree heterogeneity. Assuming some parametric form for the error distribution, we proved that the model parameters can be identified under the availability of agent-specific covariates that have large supports and also have variations across all potential partners. For estimating the model, based on the same idea as in Bresnahan and Reiss (1990) and Berry (1992), we proposed the three-step ML procedure: in the first step, the model is estimated without considering the group structure; subsequently, we estimate the group memberships using the BS algorithm given the estimates for the heterogeneity parameters obtained in the first step; and, finally, based on the estimated group memberships, we re-estimate the model. Under certain regularity conditions, we showed that the proposed estimator is asymptotically unbiased and distributed as normal at the parametric rate. The results of the Monte Carlo simulations show that our estimator performs reasonably well in finite samples. An empirical application to international visa-free travel networks indicates the usefulness of the proposed model.

Several limitations and extensions are as follows. First, our approach can be used only in pairwise network formation games with no network externalities to/from the rest of the links, and this limits the empirical applicability. Therefore, it would be worthwhile to extend our results to network formation models with general network externalities involving more than two agents. However, we conjecture that we would have to resort to partial identification to achieve this. Second, our approach requires that the degree heterogeneity parameters have discrete support, although, in reality, it is possible that they are continuous. To address this issue, it is of interest to modify our model in a similar manner to Bonhomme et al. (2017) and investigate the three-step ML estimator in which $K_A$ and $K_B$ grow slowly to infinity. Third, as our model is a dyadic binary game model, of which a pairwise network formation model is a special case, we can consider its ordered-response game version as a natural extension. For example, we might be interested in analyzing bilateral military relations: non-alliance, quasi-alliance, or alliance. We expect that such an extension can be relatively easily achieved by adopting the ML estimator discussed in Aradillas-Lopez and Rosen (2019). Finally, related to the empirical application in this study, we might be interested in investigating the causal effect of visa policies between two countries on the flows of tourists between them; this is a dyadic treatment evaluation problem when the treatment variable is determined strategically. To deal with such situations, combining the results of this study and the marginal treatment effect framework developed in Hoshino and Yanagi (2020) would be beneficial. We leave these topics for future research.

Appendix A Notations

Variables and parameters W i,j ≡ ( Z (cid:62) i,j , χ (cid:62) n,i , χ (cid:62) n,j ) (cid:62) θ ≡ ( β (cid:62) , α, ρ ) (cid:62) A ≡ ( A , . . . , A n ) (cid:62) , A − ≡ ( A , . . . , A n ) (cid:62) , B ≡ ( B , . . . , B n ) (cid:62) γ ≡ ( A (cid:62) , B (cid:62) ) (cid:62) , γ − ≡ ( A (cid:62)− B (cid:62) ) (cid:62) Π ≡ ( β (cid:62) , γ (cid:62) ) (cid:62) π i,j ≡ Z (cid:62) i,j β + A i + B j = W (cid:62) i,j Π δ ≡ ( θ (cid:62) , a , . . . , a K A , b , . . . , b K B ) (cid:62) . Functions and derivatives

Throughout this appendix, for notational simplicity, we denote ∂ a g ( a ) = ∂g ( a ) / ( ∂a ) , ∂ ab g ( a, b ) = ∂ g ( a, b ) / ( ∂a∂b ) , and so fourth. P i,j ( θ, γ ) ≡ F ( W (cid:62) i,j Π) − H ( W (cid:62) i,j Π , W (cid:62) j,i Π + α ; ρ ) (cid:96) i,j ( θ, γ ) ≡ y i,j ln P i,j ( θ, γ ) + y j,i ln P j,i ( θ, γ ) + (1 − y i,j − y j,i ) ln(1 − P i,j ( θ, γ ) − P j,i ( θ, γ )) L n ( θ, γ ) ≡ N n (cid:88) i =1 (cid:88) j>i (cid:96) i,j ( θ, γ ) p ,i,j ( θ, γ ) ≡ ∂ π i,j P i,j ( θ, γ ) = f ( W (cid:62) i,j Π) − H ( W (cid:62) i,j Π , W (cid:62) j,i Π + α ; ρ ) p ,i,j ( θ, γ ) ≡ ∂ π j,i P i,j ( θ, γ ) = − H ( W (cid:62) i,j Π , W (cid:62) j,i Π + α ; ρ ) H ρ ( · , · , ; ρ ) ≡ ∂ ρ H ( · , · ; ρ ) , where f is the derivative of F , and H l ( · , · ; ρ ) is the derivative of H ( · , · ; ρ ) with respect to the l -th argument. ∂ A k P i,j ( θ, γ ) = p ,i,j ( θ, γ ) { i = k } + p ,i,j ( θ, γ ) { j = k } ∂ B k P i,j ( θ, γ ) = p ,i,j ( θ, γ ) { j = k } + p ,i,j ( θ, γ ) { i = k } s ,i,j ( θ, γ ) ≡ ∂ π i,j (cid:96) i,j ( θ, γ ) = y i,j p ,i,j ( θ, γ ) P i,j ( θ, γ ) + y j,i p ,j,i ( θ, γ ) P j,i ( θ, γ ) − (1 − y i,j − y j,i )[ p ,i,j ( θ, γ ) + p ,j,i ( θ, γ )]1 − P i,j ( θ, γ ) − P j,i ( θ, γ ) s ,i,j ( θ, γ ) ≡ ∂ π j,i (cid:96) i,j ( θ, γ ) = y i,j p ,i,j ( θ, γ ) P i,j ( θ, γ ) + y j,i p ,j,i ( θ, γ ) P j,i ( θ, γ ) − (1 − y i,j − y j,i )[ p ,i,j ( θ, γ ) + p ,j,i ( θ, γ )]1 − P i,j ( θ, γ ) − P j,i ( θ, γ ) s A k i,j ( θ, γ ) ≡ ∂ A k (cid:96) i,j ( θ, γ ) = s ,i,j ( θ, γ ) { i = k } + s ,i,j ( θ, γ ) { j = k } s B k i,j ( θ, γ ) ≡ ∂ B k (cid:96) i,j ( θ, γ ) = s ,i,j ( θ, γ ) { j = k } + s ,i,j ( θ, γ ) { i = k } s A i,j ( θ, γ ) ≡ ∂ A − (cid:96) i,j ( θ, γ ) = s ,i,j ( θ, γ ) χ n,i, − + s ,i,j ( θ, γ ) χ n,j, − s B i,j ( θ, γ ) ≡ ∂ B (cid:96) i,j ( θ, γ ) = s ,i,j ( θ, γ ) χ n,j + s ,i,j ( θ, γ ) χ n,i ξ i,j ( θ, γ ) ≡ ∂ θ P i,j ( θ, γ ) = (cid:2) p ,i,j ( θ, γ ) Z (cid:62) i,j + p ,i,j ( θ, γ ) Z (cid:62) j,i , p ,i,j ( θ, γ ) , − H ρ ( W (cid:62) i,j Π , W 
(cid:62) j,i Π + α ; ρ ) (cid:3) (cid:62) θi,j ( θ, γ ) ≡ ∂ θ (cid:96) i,j ( θ, γ ) = y i,j ξ i,j ( θ, γ ) P i,j ( θ, γ ) + y j,i ξ j,i ( θ, γ ) P j,i ( θ, γ ) − (1 − y i,j − y j,i )[ ξ i,j ( θ, γ ) + ξ j,i ( θ, γ )]1 − P i,j ( θ, γ ) − P j,i ( θ, γ ) , where χ n,i, − and χ n,j, − are ( n − × vectors deﬁned by removing the ﬁrst element of χ n,i and χ n,j , respectively.Using these notations, we can write S n,θ ( θ, γ ) ≡ ∂ θ L n ( θ, γ ) = 2 N n (cid:88) i =1 (cid:88) j>i s θi,j ( θ, γ ) S n, γ ( θ, γ ) ≡ ∂ γ − L n ( θ, γ ) = 2 N n (cid:88) i =1 (cid:88) j>i (cid:2) s A i,j ( θ, γ ) (cid:62) , s B i,j ( θ, γ ) (cid:62) (cid:3) (cid:62) . Further, writing (cid:96) i,j ( δ ) ≡ (cid:96) i,j (cid:16) θ, (cid:110)(cid:80) K A k =1 a k · { i ∈ C A ,k } (cid:111) , (cid:110)(cid:80) K B k =1 b k · { i ∈ C B ,k } (cid:111)(cid:17) so that L n ( δ ) = N (cid:80) ni =1 (cid:80) j>i (cid:96) i,j ( δ ) ,we deﬁne s a k i,j ( δ ) ≡ ∂ a k (cid:96) i,j ( δ ) = s ,i,j ( δ ) { i ∈ C A ,k } + s ,i,j ( δ ) { j ∈ C A ,k } s b k i,j ( δ ) ≡ ∂ b k (cid:96) i,j ( δ ) = s ,i,j ( δ ) { j ∈ C B ,k } + s ,i,j ( δ ) { i ∈ C B ,k } s δi,j ( δ ) ≡ ∂ δ (cid:96) i,j ( δ ) = (cid:104) s θi,j ( δ ) (cid:62) , s a i,j ( δ ) , . . . , s a KA i,j ( δ ) , s b i,j ( δ ) , . . . , s b KB i,j ( δ ) (cid:105) (cid:62) , where the deﬁnitions of s ,i,j ( δ ) , s ,i,j ( δ ) , and s θi,j ( δ ) should be clear from the context. Hessian matrix

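Define

\[
\begin{aligned}
h_{11,i,j}(\theta,\gamma) &\equiv \partial_{\pi_{i,j}}s_{1,i,j}(\theta,\gamma) \\
h_{12,i,j}(\theta,\gamma) &\equiv \partial_{\pi_{j,i}}s_{1,i,j}(\theta,\gamma) = \partial_{\pi_{i,j}}s_{2,i,j}(\theta,\gamma) \\
h_{22,i,j}(\theta,\gamma) &\equiv \partial_{\pi_{j,i}}s_{2,i,j}(\theta,\gamma).
\end{aligned}
\]

It is easy to see that

\[
\begin{aligned}
\mathbb{E}h_{11,i,j}(\theta_0,\gamma_0) &= -\frac{p_{1,i,j}^2}{P_{i,j}} - \frac{p_{2,j,i}^2}{P_{j,i}} - \frac{[p_{1,i,j}+p_{2,j,i}]^2}{1-P_{i,j}-P_{j,i}} \\
\mathbb{E}h_{12,i,j}(\theta_0,\gamma_0) &= -\frac{p_{1,i,j}\,p_{2,i,j}}{P_{i,j}} - \frac{p_{2,j,i}\,p_{1,j,i}}{P_{j,i}} - \frac{[p_{1,i,j}+p_{2,j,i}][p_{2,i,j}+p_{1,j,i}]}{1-P_{i,j}-P_{j,i}},
\end{aligned}
\tag{A.1}
\]

where we have used $p_{1,i,j}$, $p_{2,i,j}$, and $P_{i,j}$ to denote $p_{1,i,j}(\theta_0,\gamma_0)$, $p_{2,i,j}(\theta_0,\gamma_0)$, and $P_{i,j}(\theta_0,\gamma_0)$, respectively, for simplicity. Hereinafter, when the dependence on the parameters $(\theta,\gamma)$ is suppressed, the functions are evaluated at the true value $(\theta_0,\gamma_0)$.

Note that, since $\ell_{i,j}(\theta,\gamma) = \ell_{j,i}(\theta,\gamma)$, we have $h_{11,i,j}(\theta,\gamma) = \partial_{\pi_{i,j}\pi_{i,j}}\ell_{i,j}(\theta,\gamma) = \partial_{\pi_{i,j}\pi_{i,j}}\ell_{j,i}(\theta,\gamma) = h_{22,j,i}(\theta,\gamma)$ and $h_{12,i,j}(\theta,\gamma) = h_{12,j,i}(\theta,\gamma)$. By tedious calculations, we have

\[
\begin{aligned}
\partial_{A_lA_k}\mathcal{L}_n(\theta,\gamma) &= \frac{2}{N}\sum_{j\neq k}h_{11,k,j}(\theta,\gamma)\mathbf{1}\{l=k\} + \frac{2}{N}h_{12,l,k}(\theta,\gamma)\mathbf{1}\{l\neq k\} && (\text{for } l,k\ge 2) \\
\partial_{B_lB_k}\mathcal{L}_n(\theta,\gamma) &= \frac{2}{N}\sum_{j\neq k}h_{11,j,k}(\theta,\gamma)\mathbf{1}\{l=k\} + \frac{2}{N}h_{12,l,k}(\theta,\gamma)\mathbf{1}\{l\neq k\} && (\text{for } l,k\ge 1) \\
\partial_{A_lB_k}\mathcal{L}_n(\theta,\gamma) &= \frac{2}{N}\sum_{j\neq k}h_{12,k,j}(\theta,\gamma)\mathbf{1}\{l=k\} + \frac{2}{N}h_{11,l,k}(\theta,\gamma)\mathbf{1}\{l\neq k\} && (\text{for } l\ge 2,\ k\ge 1).
\end{aligned}
\]

Hence,

\[
\mathcal{H}_{n,AA}(\theta,\gamma) \equiv \partial_{\mathbf{A}_{-1}\mathbf{A}_{-1}^\top}\mathcal{L}_n(\theta,\gamma) = \frac{2}{N}
\begin{pmatrix}
\sum_{j\neq 2}h_{11,2,j}(\theta,\gamma) & \cdots & h_{12,2,n}(\theta,\gamma) \\
\vdots & \ddots & \vdots \\
h_{12,n,2}(\theta,\gamma) & \cdots & \sum_{j\neq n}h_{11,n,j}(\theta,\gamma)
\end{pmatrix}
\]

\[
\mathcal{H}_{n,BB}(\theta,\gamma) \equiv \partial_{\mathbf{B}\mathbf{B}^\top}\mathcal{L}_n(\theta,\gamma) = \frac{2}{N}
\begin{pmatrix}
\sum_{j\neq 1}h_{11,j,1}(\theta,\gamma) & \cdots & h_{12,1,n}(\theta,\gamma) \\
\vdots & \ddots & \vdots \\
h_{12,n,1}(\theta,\gamma) & \cdots & \sum_{j\neq n}h_{11,j,n}(\theta,\gamma)
\end{pmatrix}
\]

\[
\mathcal{H}_{n,AB}(\theta,\gamma) \equiv \partial_{\mathbf{A}_{-1}\mathbf{B}^\top}\mathcal{L}_n(\theta,\gamma) = \frac{2}{N}
\begin{pmatrix}
h_{11,2,1}(\theta,\gamma) & \sum_{j\neq 2}h_{12,2,j}(\theta,\gamma) & \cdots & h_{11,2,n}(\theta,\gamma) \\
\vdots & \vdots & \ddots & \vdots \\
h_{11,n,1}(\theta,\gamma) & h_{11,n,2}(\theta,\gamma) & \cdots & \sum_{j\neq n}h_{12,n,j}(\theta,\gamma)
\end{pmatrix}
\]

\[
\mathcal{H}_{n,\gamma\gamma}(\theta,\gamma) =
\begin{pmatrix}
\mathcal{H}_{n,AA}(\theta,\gamma) & \mathcal{H}_{n,AB}(\theta,\gamma) \\
\mathcal{H}_{n,BA}(\theta,\gamma) & \mathcal{H}_{n,BB}(\theta,\gamma)
\end{pmatrix}
\tag{A.2}
\]

B Proofs of Theorems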

Proof of Theorem 2.3

(i) We first confirm that the true parameter vector $(\theta_0,\gamma_0)$ is a maximizer of $\mathbb{E}\mathcal{L}_n(\theta,\gamma)$. We can observe that

\[
\begin{aligned}
&\mathbb{E}\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta_0,\gamma_0) \\
&= \frac{1}{N}\sum_{i=1}^{n}\sum_{j\neq i}\mathbb{E}\Bigl\{\ln\bigl[P_{i,j}(\theta,\gamma)^{y_{i,j}}P_{j,i}(\theta,\gamma)^{y_{j,i}}[1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)]^{1-y_{i,j}-y_{j,i}}\bigr] - \ln\bigl[P_{i,j}^{y_{i,j}}P_{j,i}^{y_{j,i}}[1-P_{i,j}-P_{j,i}]^{1-y_{i,j}-y_{j,i}}\bigr]\Bigr\} \\
&= \frac{1}{N}\sum_{i=1}^{n}\sum_{j\neq i}\mathbb{E}\left\{\ln\frac{P_{i,j}(\theta,\gamma)^{y_{i,j}}P_{j,i}(\theta,\gamma)^{y_{j,i}}[1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)]^{1-y_{i,j}-y_{j,i}}}{P_{i,j}^{y_{i,j}}P_{j,i}^{y_{j,i}}[1-P_{i,j}-P_{j,i}]^{1-y_{i,j}-y_{j,i}}}\right\} \\
&\le \frac{1}{N}\sum_{i=1}^{n}\sum_{j\neq i}\ln\mathbb{E}\left\{\frac{P_{i,j}(\theta,\gamma)^{y_{i,j}}P_{j,i}(\theta,\gamma)^{y_{j,i}}[1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)]^{1-y_{i,j}-y_{j,i}}}{P_{i,j}^{y_{i,j}}P_{j,i}^{y_{j,i}}[1-P_{i,j}-P_{j,i}]^{1-y_{i,j}-y_{j,i}}}\right\},
\end{aligned}
\tag{B.1}
\]

where the last inequality follows from Jensen's inequality. Further,

\[
\begin{aligned}
&\mathbb{E}\left\{\frac{P_{i,j}(\theta,\gamma)^{y_{i,j}}P_{j,i}(\theta,\gamma)^{y_{j,i}}[1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)]^{1-y_{i,j}-y_{j,i}}}{P_{i,j}^{y_{i,j}}P_{j,i}^{y_{j,i}}[1-P_{i,j}-P_{j,i}]^{1-y_{i,j}-y_{j,i}}}\right\} \\
&= \mathbb{E}[y_{i,j}]\frac{P_{i,j}(\theta,\gamma)}{P_{i,j}} + \mathbb{E}[y_{j,i}]\frac{P_{j,i}(\theta,\gamma)}{P_{j,i}} + \mathbb{E}[1-y_{i,j}-y_{j,i}]\frac{1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)}{1-P_{i,j}-P_{j,i}} = 1,
\end{aligned}
\]

implying that the left-hand side of (B.1) is at most zero for any given $(\theta,\gamma)$. Then, since $\rho_0$ is known, it is sufficient to show that

\[
\frac{1}{N}\sum_{i=1}^{n}\sum_{j\neq i}\mathbf{1}\bigl\{P_{i,j}((\beta^\top,\alpha,\rho_0)^\top,\gamma)\neq P_{i,j}(\theta_0,\gamma_0)\bigr\} > 0
\]

for sufficiently large $n$ and for all $(\beta,\alpha,\gamma)\in\mathcal{B}\times\mathcal{A}\times\mathcal{C}_n$ such that $(\beta,\alpha,\gamma)\neq(\beta_0,\alpha_0,\gamma_0)$. The existence of pairs satisfying $P_{i,j}((\beta^\top,\alpha,\rho_0)^\top,\gamma)\neq P_{i,j}(\theta_0,\gamma_0)$ contributes to a non-negligible difference between $\mathbb{E}\mathcal{L}_n(\theta,\gamma)$ and $\mathbb{E}\mathcal{L}_n(\theta_0,\gamma_0)$, allowing us to distinguish $(\theta,\gamma)$ from $(\theta_0,\gamma_0)$.
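The Jensen step above can be checked numerically for a single dyad. The following is a minimal sketch, not the paper's implementation: it assumes a standard logistic marginal for $F$ and a Farlie-Gumbel-Morgenstern copula for $H$ (both are illustrative choices made here, and the function names are hypothetical), builds the three outcome probabilities $(P_{i,j}, P_{j,i}, 1-P_{i,j}-P_{j,i})$, and compares the expected log-likelihood at a wrong parameter value with its value at the truth.

```python
import math


def F(a):
    """Standard logistic CDF (an assumed choice for the marginal F)."""
    return 1.0 / (1.0 + math.exp(-a))


def H(a, b, rho):
    """FGM-copula joint CDF with logistic marginals (an assumed form of H).

    Valid for -1 <= rho <= 1; rho = 0 gives independence.
    """
    u, v = F(a), F(b)
    return u * v * (1.0 + rho * (1.0 - u) * (1.0 - v))


def outcome_probs(pi_ij, pi_ji, alpha, rho):
    """Return (P_ij, P_ji, 1 - P_ij - P_ji) for one dyad, with alpha >= 0."""
    p_ij = F(pi_ij) - H(pi_ij, pi_ji + alpha, rho)
    p_ji = F(pi_ji) - H(pi_ji, pi_ij + alpha, rho)
    return p_ij, p_ji, 1.0 - p_ij - p_ji


def expected_loglik(candidate, truth):
    """Expected dyad log-likelihood at `candidate` when outcomes follow `truth`."""
    return sum(p0 * math.log(p) for p0, p in zip(truth, candidate))


truth = outcome_probs(0.2, -0.4, alpha=0.5, rho=0.3)
other = outcome_probs(0.9, -0.1, alpha=0.2, rho=0.3)
gap = expected_loglik(other, truth) - expected_loglik(truth, truth)
print("expected log-likelihood gap is non-positive:", gap <= 0.0)
```

The gap is the negative of a Kullback-Leibler divergence between two three-point distributions, which is why it cannot be positive whatever candidate values are plugged in.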
Here, by Assumptions 2.1(i) and (ii), $F(a) - H(a,b;\rho)$ is strictly increasing in $a$ and decreasing in $b$, respectively. Therefore, we have

\[
\begin{aligned}
W_{i,j}^\top\Pi > W_{i,j}^\top\Pi_0,\ \ W_{j,i}^\top\Pi+\alpha < W_{j,i}^\top\Pi_0+\alpha_0 &\implies P_{i,j}((\beta^\top,\alpha,\rho_0)^\top,\gamma) > P_{i,j}(\theta_0,\gamma_0) \\
W_{i,j}^\top\Pi < W_{i,j}^\top\Pi_0,\ \ W_{j,i}^\top\Pi+\alpha > W_{j,i}^\top\Pi_0+\alpha_0 &\implies P_{i,j}((\beta^\top,\alpha,\rho_0)^\top,\gamma) < P_{i,j}(\theta_0,\gamma_0).
\end{aligned}
\]

Then, Assumption 2.4 gives the desired result.

(ii) Since $(\theta_0,\gamma_0)$ is a maximizer of $\mathbb{E}\mathcal{L}_n(\theta,\gamma)$ as confirmed above, $\rho_0$ must be a maximizer of $\mathcal{L}^*_n(\rho)$, where the definition of $\mathcal{L}^*_n(\rho)$ can be found in (2.5), and $(\tilde\beta(\rho_0),\tilde\alpha(\rho_0),\tilde\gamma(\rho_0)) = (\beta_0,\alpha_0,\gamma_0)$ holds. For all $\rho\in\mathcal{R}$, $\mathbb{E}\mathcal{L}_n((\beta^\top,\alpha,\rho)^\top,\gamma) - \mathbb{E}\mathcal{L}_n((\tilde\beta(\rho)^\top,\tilde\alpha(\rho),\rho)^\top,\tilde\gamma(\rho)) \le 0$ holds by definition. Then, by the same argument as in the proof of (i), we can identify $(\tilde\beta(\rho),\tilde\alpha(\rho),\tilde\gamma(\rho))$ uniquely for all $\rho\in\mathcal{R}$, as Assumption 2.4 is independent of the value of $\rho$. Thus, if $\rho_0$ is identified as a unique maximizer of $\mathcal{L}^*_n(\rho)$, all the parameters of the model are identified.

Proof of Theorem 3.1

(i) First, note that Assumptions 2.1–2.3 imply that there exist constants $\kappa_1,\kappa_2\in(0,1)$ such that $P_{i,j}(\theta,\gamma)\in(\kappa_1,1-\kappa_1)$ and $1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)\in(\kappa_2,1-\kappa_2)$ for all possible parameter values. Observe that

\[
\begin{aligned}
\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta,\gamma)
&= \frac{1}{N}\sum_{i=1}^{n}\sum_{j\neq i}\bigl[2(y_{i,j}-\mathbb{E}y_{i,j})\ln P_{i,j}(\theta,\gamma) - 2(y_{i,j}-\mathbb{E}y_{i,j})\ln\bigl(1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma)\bigr)\bigr] \\
&= \frac{2}{N}\sum_{i=1}^{n}\sum_{j\neq i}(y_{i,j}-\mathbb{E}y_{i,j})\,\psi_{i,j}(\theta,\gamma),
\end{aligned}
\]

where $\psi_{i,j}(\theta,\gamma)\equiv\ln\bigl[P_{i,j}(\theta,\gamma)/(1-P_{i,j}(\theta,\gamma)-P_{j,i}(\theta,\gamma))\bigr]$. Let $\bar\psi\equiv\ln((1-\kappa_1)/\kappa_2)$, so that

\[
-(1-\kappa_1)\bar\psi < (y_{i,j}-\mathbb{E}y_{i,j})\,\psi_{i,j}(\theta,\gamma) < (1-\kappa_1)\bar\psi,
\]

where the inequalities are uniform in $(\theta,\gamma)\in\Theta\times\mathcal{C}_n$.
By the triangle inequality,

\[
|\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta,\gamma)| \le \frac{2}{n}\sum_{i=1}^{n}\left|\frac{1}{n-1}\sum_{j\neq i}(y_{i,j}-\mathbb{E}y_{i,j})\,\psi_{i,j}(\theta,\gamma)\right|.
\]

Further, by Hoeffding's inequality,

\[
\Pr\left(\left|\frac{1}{n-1}\sum_{j\neq i}(y_{i,j}-\mathbb{E}y_{i,j})\,\psi_{i,j}(\theta,\gamma)\right| > t\right)
\le 2\exp\left(-\frac{2(n-1)^2t^2}{\sum_{j\neq i}(2(1-\kappa_1)\bar\psi)^2}\right)
= 2\exp\left(-\frac{(n-1)t^2}{2(1-\kappa_1)^2\bar\psi^2}\right).
\]

Hence, Boole's inequality gives

\[
\Pr\left(\max_{1\le i\le n}\left|\frac{1}{n-1}\sum_{j\neq i}(y_{i,j}-\mathbb{E}y_{i,j})\,\psi_{i,j}(\theta,\gamma)\right| > t\right)
\le 2n\exp\left(-\frac{(n-1)t^2}{2(1-\kappa_1)^2\bar\psi^2}\right).
\]

Setting $t = C\sqrt{\ln n/n}$ for a sufficiently large constant $C>0$, we have

\[
2n\exp\left(-\frac{(n-1)t^2}{2(1-\kappa_1)^2\bar\psi^2}\right)
= 2n\exp\left(-\frac{n-1}{2(1-\kappa_1)^2\bar\psi^2}\cdot\frac{C^2\ln n}{n}\right)
= 2\exp\left(\ln n - \left(\frac{C^2(n-1)/n}{2(1-\kappa_1)^2\bar\psi^2}\right)\ln n\right) \to 0
\]

as $n\to\infty$. This implies that

\[
\sup_{(\theta,\gamma)\in\Theta\times\mathcal{C}_n}|\mathcal{L}_n(\theta,\gamma) - \mathbb{E}\mathcal{L}_n(\theta,\gamma)| = O_P\left(\sqrt{\frac{\ln n}{n}}\right).
\tag{B.2}
\]

Then, with Assumption 3.1, the rest of the proof follows from the same argument as in the proof of Theorem 2 in Graham (2017).

(ii) (iii) We prove the result by contradiction. Suppose that there exists a positive constant $c$ such that

\[
\max\left\{\frac{1}{n}\sum_{i=1}^{n}\bigl|\hat A_{n,i}-A_{0,i}\bigr|,\ \frac{1}{n}\sum_{i=1}^{n}\bigl|\hat B_{n,i}-B_{0,i}\bigr|\right\} \ge c > 0
\]

w.p.a.1. Under Assumption 2.2(ii), this implies that there is a non-vanishing portion of observations for which $\hat A_{n,i}$, $\hat B_{n,i}$, or both lie outside the neighborhood of $A_{0,i}$ and $B_{0,i}$, respectively.
Therefore, by Assumption 3.1, there exist a constant $\eta(c)>0$ and $n(c)<\infty$ such that

\[
4\eta(c) < \mathbb{E}\mathcal{L}_n(\theta_0,\mathbf{A}_0,\mathbf{B}_0) - \mathbb{E}\mathcal{L}_n(\theta_0,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n)
\tag{B.3}
\]

for all $n\ge n(c)$. Note that (B.2) implies that

\[
\mathbb{E}\mathcal{L}_n(\theta_0,\mathbf{A}_0,\mathbf{B}_0) < \mathcal{L}_n(\theta_0,\mathbf{A}_0,\mathbf{B}_0) + \eta(c)
\tag{B.4}
\]

w.p.a.1. By the definition of the ML estimator,

\[
\mathcal{L}_n(\theta_0,\mathbf{A}_0,\mathbf{B}_0) < \mathcal{L}_n(\hat\theta_n,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n) + \eta(c).
\tag{B.5}
\]

In addition, by the continuous mapping theorem and result (i), we have

\[
\mathcal{L}_n(\hat\theta_n,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n) < \mathcal{L}_n(\theta_0,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n) + \eta(c)
\tag{B.6}
\]

w.p.a.1. Now, combining the inequalities (B.3)–(B.6) gives

\[
\begin{aligned}
\mathbb{E}\mathcal{L}_n(\theta_0,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n)
&< \mathbb{E}\mathcal{L}_n(\theta_0,\mathbf{A}_0,\mathbf{B}_0) - 4\eta(c) \\
&< \mathcal{L}_n(\theta_0,\mathbf{A}_0,\mathbf{B}_0) - 3\eta(c) \\
&< \mathcal{L}_n(\hat\theta_n,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n) - 2\eta(c) \\
&< \mathcal{L}_n(\theta_0,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n) - \eta(c)
\end{aligned}
\]

w.p.a.1. The last line implies that $\eta(c) < \mathcal{L}_n(\theta_0,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n) - \mathbb{E}\mathcal{L}_n(\theta_0,\hat{\mathbf{A}}_n,\hat{\mathbf{B}}_n)$ w.p.a.1; however, this contradicts (B.2). Hence, as the choice of $c$ is arbitrary, we obtain the desired result.

(iv) Note that, for each $i$ ($i\neq 1$), it holds that

\[
\hat A_{n,i} = \operatorname*{argmax}_{A_i\in\mathbb{A}}\mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n),
\qquad
A_{0,i} = \operatorname*{argmax}_{A_i\in\mathbb{A}}\mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0),
\]

where $\mathbf{A}_{-i}\equiv(A_1,\ldots,A_{i-1},A_{i+1},\ldots,A_n)^\top$, and $\mathcal{L}_n(\theta,A_i,\mathbf{A}_{-i},\mathbf{B}) = \mathcal{L}_n(\theta,\mathbf{A},\mathbf{B})$. Pick any $c>0$, and let $\mathbb{A}^c_i\equiv\{A\in\mathbb{A}: |A-A_{0,i}|\ge c\}$. Define $\varepsilon_n(c)$ as follows:

\[
\varepsilon_n(c) \equiv \min_{2\le i\le n}\left[\mathbb{E}\mathcal{L}_n(\theta_0,A_{0,i},\mathbf{A}_{0,-i},\mathbf{B}_0) - \max_{A_i\in\mathbb{A}^c_i}\mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)\right].
\]

By Assumption 3.1, there exists $n(c)<\infty$ such that $\varepsilon_n(c)$ is strictly larger than zero for all $n\ge n(c)$. By the definition of $\hat A_{n,i}$, we have

\[
\mathcal{L}_n(\hat\theta_n,\hat A_{n,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) > \mathcal{L}_n(\hat\theta_n,A_{0,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \varepsilon_n(c)/5.
\tag{B.7}
\]

By the triangle inequality,

\[
\begin{aligned}
&\bigl|\mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)\bigr| \\
&\le \bigl|\mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n)\bigr|
+ \bigl|\mathcal{L}_n(\theta_0,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\hat{\mathbf{B}}_n)\bigr| \\
&\quad + \bigl|\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\hat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)\bigr| \\
&\le \bigl|\partial_{\mathbf{A}_{-i}^\top}\mathcal{L}_n(\theta_0,A_i,\bar{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n)[\hat{\mathbf{A}}_{n,-i}-\mathbf{A}_{0,-i}]\bigr|
+ \bigl|\partial_{\mathbf{B}^\top}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\bar{\mathbf{B}}_n)[\hat{\mathbf{B}}_n-\mathbf{B}_0]\bigr| + o_P(1),
\end{aligned}
\]

where the second inequality follows from the mean value expansion with result (i), $\bar{\mathbf{A}}_{n,-i}\in[\hat{\mathbf{A}}_{n,-i},\mathbf{A}_{0,-i}]$, and $\bar{\mathbf{B}}_n\in[\hat{\mathbf{B}}_n,\mathbf{B}_0]$.
Here, the first term on the right-hand side has the following form:

\[
\begin{aligned}
&\partial_{\mathbf{A}_{-k}^\top}\mathcal{L}_n(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)[\hat{\mathbf{A}}_{n,-k}-\mathbf{A}_{0,-k}] \\
&= \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}\partial_{\mathbf{A}_{-k}^\top}\ell_{i,j}(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)[\hat{\mathbf{A}}_{n,-k}-\mathbf{A}_{0,-k}] \\
&= \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}s_{1,i,j}(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)\,\chi_{n,i,-k}^\top[\hat{\mathbf{A}}_{n,-k}-\mathbf{A}_{0,-k}]
+ \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}s_{2,i,j}(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)\,\chi_{n,j,-k}^\top[\hat{\mathbf{A}}_{n,-k}-\mathbf{A}_{0,-k}] \\
&= \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}s_{1,i,j}(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)[\hat A_{n,i}-A_{0,i}]\mathbf{1}\{i\neq k\}
+ \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}s_{2,i,j}(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)[\hat A_{n,j}-A_{0,j}]\mathbf{1}\{j\neq k\} \\
&= \frac{2}{N}\sum_{i=1}^{n}\sum_{j\neq i}s_{1,i,j}(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)[\hat A_{n,i}-A_{0,i}]\mathbf{1}\{i\neq k\},
\end{aligned}
\]

where $\chi_{n,i,-k}$ and $\chi_{n,j,-k}$ are $(n-1)\times 1$ vectors defined by removing the $k$-th element of $\chi_{n,i}$ and $\chi_{n,j}$, respectively, and the last equality holds because $s_{2,i,j}(\theta,\mathbf{A},\mathbf{B}) = \partial_{\pi_{j,i}}\ell_{i,j}(\theta,\mathbf{A},\mathbf{B}) = \partial_{\pi_{j,i}}\ell_{j,i}(\theta,\mathbf{A},\mathbf{B}) = s_{1,j,i}(\theta,\mathbf{A},\mathbf{B})$. Then, for a constant $c>0$ independent of $A_k$ and $k$, we have

\[
\bigl|\partial_{\mathbf{A}_{-k}^\top}\mathcal{L}_n(\theta_0,A_k,\bar{\mathbf{A}}_{n,-k},\hat{\mathbf{B}}_n)[\hat{\mathbf{A}}_{n,-k}-\mathbf{A}_{0,-k}]\bigr| \le \frac{c}{n}\sum_{i=1}^{n}\bigl|\hat A_{n,i}-A_{0,i}\bigr| = o_P(1)
\]

by result (ii). Based on the same argument, we can also show that $\bigl|\partial_{\mathbf{B}^\top}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\bar{\mathbf{B}}_n)[\hat{\mathbf{B}}_n-\mathbf{B}_0]\bigr| = o_P(1)$, implying that $\bigl|\mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)\bigr| = o_P(1)$ uniformly in $A_i\in\mathbb{A}$ and $i$.
Similarly, we can show that $\bigl|\mathbb{E}\mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)\bigr| = o_P(1)$. Hence, the following inequalities hold w.p.a.1:

\[
\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0) > \mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \varepsilon_n(c)/5
\tag{B.8}
\]

\[
\mathbb{E}\mathcal{L}_n(\hat\theta_n,A_i,\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) > \mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0) - \varepsilon_n(c)/5
\tag{B.9}
\]

uniformly in $A_i\in\mathbb{A}$ and $i$. In addition, (B.2) implies that

\[
\mathbb{E}\mathcal{L}_n(\theta_0,\hat A_{n,i},\mathbf{A}_{0,-i},\mathbf{B}_0) > \mathcal{L}_n(\theta_0,\hat A_{n,i},\mathbf{A}_{0,-i},\mathbf{B}_0) - \varepsilon_n(c)/5
\tag{B.10}
\]

\[
\mathcal{L}_n(\hat\theta_n,A_{0,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) > \mathbb{E}\mathcal{L}_n(\hat\theta_n,A_{0,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - \varepsilon_n(c)/5
\tag{B.11}
\]

w.p.a.1. Then, combining the inequalities (B.7) and (B.8)–(B.11) yields

\[
\begin{aligned}
\mathbb{E}\mathcal{L}_n(\theta_0,\hat A_{n,i},\mathbf{A}_{0,-i},\mathbf{B}_0)
&> \mathcal{L}_n(\theta_0,\hat A_{n,i},\mathbf{A}_{0,-i},\mathbf{B}_0) - \varepsilon_n(c)/5 \\
&> \mathcal{L}_n(\hat\theta_n,\hat A_{n,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - 2\varepsilon_n(c)/5 \\
&> \mathcal{L}_n(\hat\theta_n,A_{0,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - 3\varepsilon_n(c)/5 \\
&> \mathbb{E}\mathcal{L}_n(\hat\theta_n,A_{0,i},\hat{\mathbf{A}}_{n,-i},\hat{\mathbf{B}}_n) - 4\varepsilon_n(c)/5 \\
&> \mathbb{E}\mathcal{L}_n(\theta_0,A_{0,i},\mathbf{A}_{0,-i},\mathbf{B}_0) - \varepsilon_n(c) \\
&= \max_{A_i\in\mathbb{A}^c_i}\mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)
+ \underbrace{\left[\mathbb{E}\mathcal{L}_n(\theta_0,A_{0,i},\mathbf{A}_{0,-i},\mathbf{B}_0) - \max_{A_i\in\mathbb{A}^c_i}\mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)\right] - \varepsilon_n(c)}_{\ge 0} \\
&\ge \max_{A_i\in\mathbb{A}^c_i}\mathbb{E}\mathcal{L}_n(\theta_0,A_i,\mathbf{A}_{0,-i},\mathbf{B}_0)
\end{aligned}
\]

w.p.a.1 for all $i$. The last line implies that $\hat A_{n,i}\notin\mathbb{A}^c_i$. As the choice of $c$ is arbitrary, this further implies that $\max_{1\le i\le n}|\hat A_{n,i}-A_{0,i}|\overset{p}{\to}0$. Analogously, we can also show that $\max_{1\le i\le n}|\hat B_{n,i}-B_{0,i}|\overset{p}{\to}0$.

Lemma B.1.

For any $(\theta,\gamma)\in\Theta\times\mathcal{C}_n$ such that $\|\theta-\theta_0\| = o(1)$ and $\|\gamma-\gamma_0\|_\infty = o(1)$,

(i) $\max_{1\le k\le n}\bigl|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,k,j}(\theta,\gamma) - \mathbb{E}h_{11,k,j}(\theta_0,\gamma_0)\bigr)\bigr| = o_P(1)$,

(ii) $\max_{1\le k\le n}\bigl|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,j,k}(\theta,\gamma) - \mathbb{E}h_{11,j,k}(\theta_0,\gamma_0)\bigr)\bigr| = o_P(1)$.

Proof. We only prove (i) since (ii) is completely analogous. By the triangle inequality,

\[
\begin{aligned}
\left|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,k,j}(\theta,\gamma) - \mathbb{E}h_{11,k,j}(\theta_0,\gamma_0)\bigr)\right|
&\le \left|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0)\bigr)\right| \\
&\quad + \left|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,k,j}(\theta_0,\gamma_0) - \mathbb{E}h_{11,k,j}(\theta_0,\gamma_0)\bigr)\right|.
\end{aligned}
\tag{B.12}
\]

With Assumption 2.1(iii), the mean value expansion gives

\[
\begin{aligned}
h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0)
&= h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma) + h_{11,k,j}(\theta_0,\gamma) - h_{11,k,j}(\theta_0,\gamma_0) \\
&= \partial_{\theta^\top}h_{11,k,j}(\bar\theta,\gamma)[\theta-\theta_0] + \partial_{\mathbf{A}_{-1}^\top}h_{11,k,j}(\theta_0,\bar\gamma)[\mathbf{A}_{-1}-\mathbf{A}_{0,-1}] + \partial_{\mathbf{B}^\top}h_{11,k,j}(\theta_0,\bar\gamma)[\mathbf{B}-\mathbf{B}_0],
\end{aligned}
\]

where $\bar\theta\in[\theta,\theta_0]$, and $\bar\gamma\in[\gamma,\gamma_0]$. Further, letting $h_{111,k,j}(\theta,\gamma)\equiv\partial_{\pi_{k,j}}h_{11,k,j}(\theta,\gamma)$ and $h_{112,k,j}(\theta,\gamma)\equiv\partial_{\pi_{j,k}}h_{11,k,j}(\theta,\gamma)$, we have

\[
\begin{aligned}
\partial_{\mathbf{A}_{-1}^\top}h_{11,k,j}(\theta_0,\bar\gamma)[\mathbf{A}_{-1}-\mathbf{A}_{0,-1}] &= h_{111,k,j}(\theta_0,\bar\gamma)\,\chi_{n,k,-1}^\top[\mathbf{A}_{-1}-\mathbf{A}_{0,-1}] + h_{112,k,j}(\theta_0,\bar\gamma)\,\chi_{n,j,-1}^\top[\mathbf{A}_{-1}-\mathbf{A}_{0,-1}] \\
\partial_{\mathbf{B}^\top}h_{11,k,j}(\theta_0,\bar\gamma)[\mathbf{B}-\mathbf{B}_0] &= h_{111,k,j}(\theta_0,\bar\gamma)\,\chi_{n,j}^\top[\mathbf{B}-\mathbf{B}_0] + h_{112,k,j}(\theta_0,\bar\gamma)\,\chi_{n,k}^\top[\mathbf{B}-\mathbf{B}_0].
\end{aligned}
\]

Then, for some large constant $c>0$,

\[
\begin{aligned}
\left|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0)\bigr)\right|
&\le \left|\frac{1}{n-1}\sum_{j\neq k}\partial_{\theta^\top}h_{11,k,j}(\bar\theta,\gamma)[\theta-\theta_0]\right| \\
&\quad + \left|\frac{1}{n-1}\sum_{j\neq k}h_{111,k,j}(\theta_0,\bar\gamma)\,\chi_{n,k,-1}^\top[\mathbf{A}_{-1}-\mathbf{A}_{0,-1}]\right|
+ \left|\frac{1}{n-1}\sum_{j\neq k}h_{112,k,j}(\theta_0,\bar\gamma)\,\chi_{n,j,-1}^\top[\mathbf{A}_{-1}-\mathbf{A}_{0,-1}]\right| \\
&\quad + \left|\frac{1}{n-1}\sum_{j\neq k}h_{111,k,j}(\theta_0,\bar\gamma)\,\chi_{n,j}^\top[\mathbf{B}-\mathbf{B}_0]\right|
+ \left|\frac{1}{n-1}\sum_{j\neq k}h_{112,k,j}(\theta_0,\bar\gamma)\,\chi_{n,k}^\top[\mathbf{B}-\mathbf{B}_0]\right| \\
&\le c\,\|\theta-\theta_0\| + 2c\,\|\mathbf{A}-\mathbf{A}_0\|_\infty + 2c\,\|\mathbf{B}-\mathbf{B}_0\|_\infty.
\end{aligned}
\]

As the right-hand side of the last line is independent of $k$, we have $\bigl|\frac{1}{n-1}\sum_{j\neq k}(h_{11,k,j}(\theta,\gamma) - h_{11,k,j}(\theta_0,\gamma_0))\bigr| = o(1)$ for all $k$ and for any $(\theta,\gamma)$ such that $\|\theta-\theta_0\| = o(1)$ and $\|\gamma-\gamma_0\|_\infty = o(1)$.

For the second term of (B.12), note that the only random components involved in $h_{11,k,j}(\theta_0,\gamma_0)$ are $(y_{k,j},y_{j,k})$ and, thus, that $\{h_{11,k,j}(\theta_0,\gamma_0)\}_{j\neq k}$ are independent by Assumption 2.1(ii). Further, as $h_{11,k,j}(\theta_0,\gamma_0)$ is uniformly bounded, using Hoeffding's and Boole's inequalities in the same way as above, we can show that

\[
\max_{1\le k\le n}\left|\frac{1}{n-1}\sum_{j\neq k}\bigl(h_{11,k,j}(\theta_0,\gamma_0) - \mathbb{E}h_{11,k,j}(\theta_0,\gamma_0)\bigr)\right| = O_P\left(\sqrt{\frac{\ln n}{n}}\right).
\]

This completes the proof.
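The Hoeffding-plus-Boole step used in the proofs above can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the paper: each row average is taken over $n-1$ independent, mean-zero terms bounded in $[-b,b]$, the threshold is $t = C\sqrt{\ln n/n}$, and the function names and the values $b=1$, $C=2$ are hypothetical. It evaluates the union bound $2n\exp(-(n-1)t^2/(2b^2))$ on the probability that the largest row average exceeds $t$.

```python
import math


def union_hoeffding_bound(n, t, b):
    """Boole + Hoeffding bound on Pr(max over n rows of |row average| > t).

    Each row average is over n - 1 independent mean-zero terms in [-b, b].
    """
    m = n - 1
    return 2.0 * n * math.exp(-m * t * t / (2.0 * b * b))


def threshold(n, C):
    """The rate t = C * sqrt(ln n / n) used in the proofs."""
    return C * math.sqrt(math.log(n) / n)


b, C = 1.0, 2.0  # illustrative values; the bound vanishes whenever C**2 > 2 * b**2
sizes = (10, 100, 1000, 10000)
bounds = [union_hoeffding_bound(n, threshold(n, C), b) for n in sizes]
for n, bound in zip(sizes, bounds):
    print(n, bound)
```

Because the bound behaves like $2n^{1-C^2(n-1)/(2b^2n)}$, it shrinks to zero only when $C$ is chosen large enough relative to $b$, which is the role of the phrase "for a sufficiently large constant $C$".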

Lemma B.2.

(i) $\|\tilde\gamma_n(\theta) - \tilde\gamma_0(\theta_0)\|_\infty = o_P(1)$ for any $\theta\in\Theta$ such that $\|\theta-\theta_0\| = o(1)$,

(ii) $\frac{1}{n}\sum_{i=1}^{n}\bigl|\tilde A_{n,i}(\theta_0) - \tilde A_{0,i}(\theta_0)\bigr| = O_P(n^{-1/2})$,

(iii) $\frac{1}{n}\sum_{i=1}^{n}\bigl|\tilde B_{n,i}(\theta_0) - \tilde B_{0,i}(\theta_0)\bigr| = O_P(n^{-1/2})$,

(iv) $\|\tilde\gamma_n(\theta_0) - \tilde\gamma_0(\theta_0)\|_\infty = O_P(\sqrt{\ln n/n})$.

Proof. (i) By the triangle inequality,

\[
\|\tilde\gamma_n(\theta) - \tilde\gamma_0(\theta_0)\|_\infty \le \|\tilde\gamma_n(\theta) - \tilde\gamma_0(\theta)\|_\infty + \|\tilde\gamma_0(\theta) - \tilde\gamma_0(\theta_0)\|_\infty.
\]

For the first term on the right-hand side, the same argument as in the proof of Theorem 3.1(iv) gives $\|\tilde\gamma_n(\theta) - \tilde\gamma_0(\theta)\|_\infty = o_P(1)$ for any $\theta$ in the neighborhood of $\theta_0$ under Assumption 3.2. For the second term, Assumption 3.2 and Berge's theorem imply that every element of $\tilde\gamma_0(\theta)$ is continuous in the neighborhood of $\theta_0$ (see, e.g., Corollary A4.8, Kreps, 2012). Thus, $\|\tilde\gamma_0(\theta) - \tilde\gamma_0(\theta_0)\|_\infty = o(1)$ holds.

(ii) (iii) By the first-order condition and the mean value expansion,

\[
\begin{aligned}
\mathbf{0}_{(2n-1)\times 1} &= n\cdot\mathcal{S}_{n,\gamma}(\theta_0,\tilde\gamma_n(\theta_0)) \\
&= n\cdot\mathcal{S}_{n,\gamma} - \bigl(-n\cdot\mathcal{H}_{n,\gamma\gamma}(\theta_0,\bar\gamma_n)\bigr)[\tilde\gamma_{n,-1}(\theta_0) - \tilde\gamma_{0,-1}(\theta_0)],
\end{aligned}
\]

where $\bar\gamma_n\in[\tilde\gamma_n(\theta_0),\tilde\gamma_0(\theta_0)]$. Then, by result (i) and Assumption 3.3(i), we have

\[
\tilde\gamma_{n,-1}(\theta_0) - \tilde\gamma_{0,-1}(\theta_0) = \bigl(-n\cdot\mathcal{H}_{n,\gamma\gamma}(\theta_0,\bar\gamma_n)\bigr)^{-1}n\cdot\mathcal{S}_{n,\gamma}
\tag{B.13}
\]

w.p.a.1; thus, $\|\tilde\gamma_n(\theta_0) - \tilde\gamma_0(\theta_0)\| = \|\tilde\gamma_{n,-1}(\theta_0) - \tilde\gamma_{0,-1}(\theta_0)\| \le O_P(1)\cdot\|n\cdot\mathcal{S}_{n,\gamma}\|$. Further, observe that

\[
\begin{aligned}
\mathbb{E}\|n\cdot\mathcal{S}_{n,\gamma}\|^2
&= \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\mathbb{E}\bigl[s^{\mathbf{A}}_{i,j}(\theta_0,\gamma_0)^\top s^{\mathbf{A}}_{k,l}(\theta_0,\gamma_0) + s^{\mathbf{B}}_{i,j}(\theta_0,\gamma_0)^\top s^{\mathbf{B}}_{k,l}(\theta_0,\gamma_0)\bigr] \\
&= \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\mathbb{E}\bigl[(s_{1,i,j}\chi_{n,i,-1}^\top + s_{2,i,j}\chi_{n,j,-1}^\top)(s_{1,k,l}\chi_{n,k,-1} + s_{2,k,l}\chi_{n,l,-1})\bigr] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\mathbb{E}\bigl[(s_{1,i,j}\chi_{n,j}^\top + s_{2,i,j}\chi_{n,i}^\top)(s_{1,k,l}\chi_{n,l} + s_{2,k,l}\chi_{n,k})\bigr] \\
&= \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\mathbb{E}\bigl[(s_{1,i,j}\chi_{n,i,-1}^\top + s_{2,i,j}\chi_{n,j,-1}^\top)(s_{1,i,j}\chi_{n,i,-1} + s_{2,i,j}\chi_{n,j,-1})\bigr] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\mathbb{E}\bigl[(s_{1,i,j}\chi_{n,i,-1}^\top + s_{2,i,j}\chi_{n,j,-1}^\top)(s_{1,j,i}\chi_{n,j,-1} + s_{2,j,i}\chi_{n,i,-1})\bigr] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\mathbb{E}\bigl[(s_{1,i,j}\chi_{n,j}^\top + s_{2,i,j}\chi_{n,i}^\top)(s_{1,i,j}\chi_{n,j} + s_{2,i,j}\chi_{n,i})\bigr] \\
&\quad + \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\mathbb{E}\bigl[(s_{1,i,j}\chi_{n,j}^\top + s_{2,i,j}\chi_{n,i}^\top)(s_{1,j,i}\chi_{n,i} + s_{2,j,i}\chi_{n,j})\bigr] \\
&= O(1)
\end{aligned}
\]

by Assumption 2.1(ii). Then, it holds that $\|n\cdot\mathcal{S}_{n,\gamma}\| = O_P(1)$ by Markov's inequality; thus, we have $\|\tilde\gamma_n(\theta_0) - \tilde\gamma_0(\theta_0)\| = O_P(1)$. Finally, by the basic norm inequality, it holds that

\[
\sum_{i=1}^{n}|\tilde A_{n,i}(\theta_0) - \tilde A_{0,i}(\theta_0)| + \sum_{i=1}^{n}|\tilde B_{n,i}(\theta_0) - \tilde B_{0,i}(\theta_0)| \le \sqrt{2n-1}\,\|\tilde\gamma_n(\theta_0) - \tilde\gamma_0(\theta_0)\| = O_P(\sqrt{n}),
\]

which gives the desired result.

(iv) Let $\gamma_{-k} = (\mathbf{A}_{-k}^\top,\mathbf{B}^\top)^\top$ for $k\neq 1$ and write $\mathcal{L}_n(\theta,A_k,\gamma_{-k})$ as $\mathcal{L}_n(\theta,\gamma)$.
By the first-order condition and mean value expansion,

\[
\begin{aligned}
0 &= n\cdot\partial_{A_k}\mathcal{L}_n(\theta_0,\tilde A_{n,k}(\theta_0),\tilde\gamma_{n,-k}(\theta_0)) \\
&= \frac{2}{n-1}\sum_{i=1}^{n}\sum_{j>i}s^{A_k}_{i,j}
+ n\cdot\partial_{A_k}\mathcal{L}_n(\theta_0,\tilde A_{n,k}(\theta_0),\tilde\gamma_{n,-k}(\theta_0)) - n\cdot\partial_{A_k}\mathcal{L}_n(\theta_0,\tilde A_{0,k}(\theta_0),\tilde\gamma_{n,-k}(\theta_0)) \\
&\quad + n\cdot\partial_{A_k}\mathcal{L}_n(\theta_0,A_{0,k},\tilde\gamma_{n,-k}(\theta_0)) - n\cdot\partial_{A_k}\mathcal{L}_n(\theta_0,A_{0,k},\tilde\gamma_{0,-k}(\theta_0)) \\
&= \frac{2}{n-1}\sum_{i=1}^{n}\sum_{j>i}s^{A_k}_{i,j}
+ \frac{2}{n-1}\sum_{j\neq k}h_{11,k,j}(\theta_0,\bar A_{n,k},\tilde\gamma_{n,-k}(\theta_0))[\tilde A_{n,k}(\theta_0) - \tilde A_{0,k}(\theta_0)] \\
&\quad + \frac{2}{n-1}\sum_{i=1}^{n}\sum_{j>i}\partial_{\gamma_{-k}^\top}s^{A_k}_{i,j}(\theta_0,A_{0,k},\bar\gamma_{n,-k})[\tilde\gamma_{n,-k}(\theta_0) - \tilde\gamma_{0,-k}(\theta_0)],
\end{aligned}
\]

where $\bar A_{n,k}\in[\tilde A_{n,k}(\theta_0),\tilde A_{0,k}(\theta_0)]$, and $\bar\gamma_{n,-k}\in[\tilde\gamma_{n,-k}(\theta_0),\tilde\gamma_{0,-k}(\theta_0)]$. In view of (A.1), Lemma B.1(i) and result (i) imply that $\frac{1}{n-1}\sum_{j\neq k}h_{11,k,j}(\theta_0,\bar A_{n,k},\tilde\gamma_{n,-k}(\theta_0))$ is bounded and bounded away from zero w.p.a.1 uniformly in $k$.
Then, (cid:12)(cid:12)(cid:12) (cid:101) A n,k ( θ ) − (cid:101) A ,k ( θ ) (cid:12)(cid:12)(cid:12) ≤ ( c + o p (1)) {| T ,n,k | + | T ,n,k |} or some c > , where T ,n,k ≡ n − n (cid:88) i =1 (cid:88) j>i s A k i,j ,T ,n,k ≡ n − n (cid:88) i =1 (cid:88) j>i ∂ γ (cid:62)− k s A k i,j ( θ , A ,k , ¯ γ n, − k )[ (cid:101) γ n, − k ( θ ) − (cid:101) γ , − k ( θ )]= 2 n − n (cid:88) i =1 (cid:88) j>i ∂ A (cid:62)− k s A k i,j ( θ , A ,k , ¯ γ n, − k )[ (cid:101) A n, − k ( θ ) − (cid:101) A , − k ( θ )]+ 2 n − n (cid:88) i =1 (cid:88) j>i ∂ B (cid:62) s A k i,j ( θ , A ,k , ¯ γ n, − k )[ (cid:101) B n ( θ ) − (cid:101) B ( θ )] ≡ T ,n,k + T ,n,k , say.First, observe that T ,n,k = 2 n − n (cid:88) i =1 (cid:88) j>i s ,i,j { i = k } + 2 n − n (cid:88) i =1 (cid:88) j>i s ,i,j { j = k } = 2 n − (cid:88) j>k s ,k,j + 2 n − (cid:88) ji ∂ A (cid:62)− k s A k i,j ( θ, γ ) = n (cid:88) i =1 (cid:88) j>i s ,i,j ( θ, γ ) { i = k } χ (cid:62) n,i, − k + n (cid:88) i =1 (cid:88) j>i s ,i,j ( θ, γ ) { i = k } χ (cid:62) n,j, − k + n (cid:88) i =1 (cid:88) j>i s ,i,j ( θ, γ ) { j = k } χ (cid:62) n,i, − k + n (cid:88) i =1 (cid:88) j>i s ,i,j ( θ, γ ) { j = k } χ (cid:62) n,j, − k = (cid:88) j>k s ,k,j ( θ, γ ) χ (cid:62) n,j, − k + (cid:88) j , the triangle nequality gives | T ,n,k | = (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n − (cid:88) j (cid:54) = k s ,k,j ( θ , A ,k , ¯ γ n, − k ) χ (cid:62) n,j, − k [ (cid:101) A n, − k ( θ ) − (cid:101) A , − k ( θ )] (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ cn − (cid:88) j (cid:54) = k (cid:12)(cid:12)(cid:12) χ (cid:62) n,j, − k [ (cid:101) A n, − k ( θ ) − (cid:101) A , − k ( θ )] (cid:12)(cid:12)(cid:12) = cn − (cid:88) j (cid:54) = k (cid:12)(cid:12)(cid:12) (cid:101) A n,j ( θ ) − (cid:101) A ,j ( θ ) (cid:12)(cid:12)(cid:12) ≤ cn − n (cid:88) j =1 (cid:12)(cid:12)(cid:12) (cid:101) A n,j ( θ ) − (cid:101) A ,j ( θ ) (cid:12)(cid:12)(cid:12) = O P ( n − / ) by result (ii). 
Note that the last inequality is independent of $k$. Analogously, we can show that $|T_{22,n,k}| = O_P(n^{-1/2})$ uniformly in $k$ by result (iii). Thus, we have

\[
\max_{1\le k\le n}|T_{2,n,k}| = O_P(n^{-1/2}).
\tag{B.15}
\]

Combining (B.14) and (B.15) yields $\max_{1\le k\le n}|\tilde A_{n,k}(\theta_0) - \tilde A_{0,k}(\theta_0)| = O_P(\sqrt{\ln n/n}) + O_P(n^{-1/2})$. In the same way, using Lemma B.1(ii), we can also show that $\max_{1\le k\le n}|\tilde B_{n,k}(\theta_0) - \tilde B_{0,k}(\theta_0)| = O_P(\sqrt{\ln n/n}) + O_P(n^{-1/2})$. This completes the proof.

Lemma B.3. $\|\hat\theta_n - \theta_0\| = O_P(n^{-1/2})$.

Proof. Applying the implicit function theorem to $\mathcal{S}_{n,\gamma}(\theta,\tilde\gamma_n(\theta)) = \partial_{\gamma_{-1}}\mathcal{L}_n(\theta,\tilde\gamma_n(\theta)) = \mathbf{0}_{(2n-1)\times 1}$ for $\theta\in\Theta$ yields

\[
\begin{aligned}
\mathbf{0}_{(2n-1)\times(d_z+2)} &= \partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\theta,\tilde\gamma_n(\theta))
= \mathcal{H}_{n,\gamma\theta}(\theta,\tilde\gamma_n(\theta)) + \mathcal{H}_{n,\gamma\gamma}(\theta,\tilde\gamma_n(\theta))\,\partial_{\theta^\top}\tilde\gamma_n(\theta) \\
\implies \partial_{\theta^\top}\tilde\gamma_n(\theta) &= -\bigl[n\cdot\mathcal{H}_{n,\gamma\gamma}(\theta,\tilde\gamma_n(\theta))\bigr]^{-1}n\cdot\mathcal{H}_{n,\gamma\theta}(\theta,\tilde\gamma_n(\theta)),
\end{aligned}
\]

where the right-hand side exists w.p.a.1 for $\theta$ in a neighborhood of $\theta_0$ by Lemma B.2(i) and Assumption 3.3(i). Then, by the second-order Taylor expansion,

\[
\begin{aligned}
0 &\le \mathcal{L}_n(\hat\theta_n,\tilde\gamma_n(\hat\theta_n)) - \mathcal{L}_n(\theta_0,\tilde\gamma_n(\theta_0)) \\
&= \mathcal{S}_{n,\theta}(\theta_0,\tilde\gamma_n(\theta_0))^\top(\hat\theta_n-\theta_0)
+ \frac{1}{2}(\hat\theta_n-\theta_0)^\top\bigl[\mathcal{H}_{n,\theta\theta}(\bar\theta_n,\tilde\gamma_n(\bar\theta_n)) + \mathcal{H}_{n,\theta\gamma}(\bar\theta_n,\tilde\gamma_n(\bar\theta_n))\{\partial_{\theta^\top}\tilde\gamma_n(\bar\theta_n)\}\bigr](\hat\theta_n-\theta_0) \\
&\le \|\mathcal{S}_{n,\theta}(\theta_0,\tilde\gamma_n(\theta_0))\|\cdot\|\hat\theta_n-\theta_0\| - \lambda_{\min}\bigl(-\mathcal{I}_{n,\theta\theta}(\bar\theta_n,\tilde\gamma_n(\bar\theta_n))\bigr)\cdot\|\hat\theta_n-\theta_0\|^2 \\
\implies \|\hat\theta_n-\theta_0\| &\le \frac{\|\mathcal{S}_{n,\theta}(\theta_0,\tilde\gamma_n(\theta_0))\|}{\lambda_{\min}\bigl(-\mathcal{I}_{n,\theta\theta}(\bar\theta_n,\tilde\gamma_n(\bar\theta_n))\bigr)},
\end{aligned}
\]

where $\bar\theta_n\in[\hat\theta_n,\theta_0]$. Here, since Theorem 3.1(i) and Lemma B.2(i) imply that $\tilde\gamma_n(\bar\theta_n)$ is uniformly consistent for $\gamma_0$, $\lambda_{\min}\bigl(-\mathcal{I}_{n,\theta\theta}(\bar\theta_n,\tilde\gamma_n(\bar\theta_n))\bigr) > c_\theta$ w.p.a.1 by Assumption 3.3(ii). Thus, it suffices to show that $\|\mathcal{S}_{n,\theta}(\theta_0,\tilde\gamma_n(\theta_0))\| = O_P(n^{-1/2})$. We decompose $\mathcal{S}_{n,\theta}(\theta_0,\tilde\gamma_n(\theta_0))$ into the following two terms:

\[
\mathcal{S}_{n,\theta}(\theta_0,\tilde\gamma_n(\theta_0)) = \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}s^{\theta}_{i,j}(\theta_0,\gamma_0) + \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}\bigl[s^{\theta}_{i,j}(\theta_0,\tilde\gamma_n(\theta_0)) - s^{\theta}_{i,j}(\theta_0,\tilde\gamma_0(\theta_0))\bigr] \equiv s^{\theta}_{1,n} + s^{\theta}_{2,n}, \text{ say.}
\]

Since $\mathbb{E}s^{\theta}_{1,n} = \mathbf{0}_{(d_z+2)\times 1}$, by Assumption 2.1(ii),

\[
\mathrm{Var}\bigl[s^{\theta}_{1,n}\bigr] = \frac{4}{N^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\mathbb{E}\bigl[s^{\theta}_{i,j}s^{\theta\top}_{k,l}\bigr]
= \frac{4}{N^2}\sum_{i=1}^{n}\sum_{j>i}\mathbb{E}\bigl[s^{\theta}_{i,j}s^{\theta\top}_{i,j} + s^{\theta}_{i,j}s^{\theta\top}_{j,i}\bigr] = O(n^{-2}).
\]

Thus, by Chebyshev's inequality, $\|s^{\theta}_{1,n}\| = O_P(n^{-1})$. For $s^{\theta}_{2,n}$, observe that there exist bounded functions $s^{\theta}_{1,i,j}(\theta,\gamma)\equiv\partial_{\pi_{i,j}}s^{\theta}_{i,j}(\theta,\gamma)$ and $s^{\theta}_{2,i,j}(\theta,\gamma)\equiv\partial_{\pi_{j,i}}s^{\theta}_{i,j}(\theta,\gamma)$ such that

\[
\begin{aligned}
&s^{\theta}_{i,j}(\theta_0,\tilde\gamma_n(\theta_0)) - s^{\theta}_{i,j}(\theta_0,\tilde\gamma_0(\theta_0)) \\
&= \partial_{\mathbf{A}_{-1}^\top}s^{\theta}_{i,j}(\theta_0,\bar\gamma_n)[\tilde{\mathbf{A}}_{n,-1}(\theta_0) - \tilde{\mathbf{A}}_{0,-1}(\theta_0)] + \partial_{\mathbf{B}^\top}s^{\theta}_{i,j}(\theta_0,\bar\gamma_n)[\tilde{\mathbf{B}}_n(\theta_0) - \tilde{\mathbf{B}}_0(\theta_0)] \\
&= s^{\theta}_{1,i,j}(\theta_0,\bar\gamma_n)\,\chi_{n,i,-1}^\top[\tilde{\mathbf{A}}_{n,-1}(\theta_0) - \tilde{\mathbf{A}}_{0,-1}(\theta_0)] + s^{\theta}_{2,i,j}(\theta_0,\bar\gamma_n)\,\chi_{n,j,-1}^\top[\tilde{\mathbf{A}}_{n,-1}(\theta_0) - \tilde{\mathbf{A}}_{0,-1}(\theta_0)] \\
&\quad + s^{\theta}_{1,i,j}(\theta_0,\bar\gamma_n)\,\chi_{n,j}^\top[\tilde{\mathbf{B}}_n(\theta_0) - \tilde{\mathbf{B}}_0(\theta_0)] + s^{\theta}_{2,i,j}(\theta_0,\bar\gamma_n)\,\chi_{n,i}^\top[\tilde{\mathbf{B}}_n(\theta_0) - \tilde{\mathbf{B}}_0(\theta_0)] \\
&\equiv t_{1,i,j} + t_{2,i,j} + t_{3,i,j} + t_{4,i,j}, \text{ say,}
\end{aligned}
\tag{B.16}
\]

where $\bar\gamma_n\in[\tilde\gamma_n(\theta_0),\tilde\gamma_0(\theta_0)]$. Thus, for some $c>0$,

\[
\left\|\frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}t_{1,i,j}\right\| \le \frac{c}{N}\sum_{i=1}^{n}\sum_{j>i}\bigl|\chi_{n,i,-1}^\top[\tilde{\mathbf{A}}_{n,-1}(\theta_0) - \tilde{\mathbf{A}}_{0,-1}(\theta_0)]\bigr| \le \frac{c}{n}\sum_{i=2}^{n}\bigl|\tilde A_{n,i}(\theta_0) - \tilde A_{0,i}(\theta_0)\bigr| = O_P(n^{-1/2})
\]

by Lemma B.2(ii). Similarly, it is straightforward to see that $\|\frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}t_{2,i,j}\| = O_P(n^{-1/2})$, and that $\|\frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}t_{3,i,j}\| = O_P(n^{-1/2})$ and $\|\frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}t_{4,i,j}\| = O_P(n^{-1/2})$ by Lemma B.2(iii). Hence, we have $\|s^{\theta}_{2,n}\| = O_P(n^{-1/2})$, and this completes the proof.
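The variance calculation in the proof of Lemma B.3 rests on the fact that an average of independent, mean-zero dyad-level terms, scaled by $2/N$ with $N = n(n-1)$, has standard deviation of order $n^{-1}$. The following Monte Carlo sketch illustrates only that rate. As a loud assumption, the score terms are replaced by independent Rademacher draws with unit variance, which is not the paper's score function, and the function names are hypothetical. Under that assumption the exact variance is $2/N$, so $n$ times the standard deviation should stay close to $\sqrt{2n/(n-1)}$.

```python
import random
import statistics


def dyadic_average(n, rng):
    """(2/N) * sum over dyads i < j of independent +/-1 draws, with N = n(n-1)."""
    N = n * (n - 1)
    total = 0.0
    for i in range(n):
        for _ in range(i + 1, n):
            total += rng.choice((-1.0, 1.0))
    return 2.0 * total / N


def scaled_sd(n, reps, seed=0):
    """Monte Carlo estimate of n * sd(dyadic average); roughly constant in n."""
    rng = random.Random(seed)
    draws = [dyadic_average(n, rng) for _ in range(reps)]
    return n * statistics.pstdev(draws)


if __name__ == "__main__":
    for n in (10, 20, 40):
        exact = (2.0 * n / (n - 1)) ** 0.5
        print(n, round(scaled_sd(n, reps=400), 3), round(exact, 3))
```

The simulated and exact values differ only by Monte Carlo noise, which shrinks as `reps` grows.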
Proof of Theorem 3.2

(i) (ii) By the first-order condition and the mean value expansion,

\[
\begin{aligned}
\mathbf{0}_{(2n-1)\times 1} &= n\cdot\mathcal{S}_{n,\gamma}(\hat\theta_n,\hat\gamma_n) \\
&= n\cdot\mathcal{S}_{n,\gamma} + n\cdot\mathcal{S}_{n,\gamma}(\hat\theta_n,\hat\gamma_n) - n\cdot\mathcal{S}_{n,\gamma}(\theta_0,\hat\gamma_n) + n\cdot\mathcal{S}_{n,\gamma}(\theta_0,\hat\gamma_n) - n\cdot\mathcal{S}_{n,\gamma}(\theta_0,\gamma_0) \\
&= n\cdot\mathcal{S}_{n,\gamma} + n\cdot\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\bar\theta_n,\hat\gamma_n)[\hat\theta_n-\theta_0] - \bigl(-n\cdot\mathcal{H}_{n,\gamma\gamma}(\theta_0,\bar\gamma_n)\bigr)[\hat\gamma_{n,-1}-\gamma_{0,-1}],
\end{aligned}
\]

where $\bar\theta_n\in[\hat\theta_n,\theta_0]$, and $\bar\gamma_n\in[\hat\gamma_n,\gamma_0]$. Thus, under Assumption 3.3(i),

\[
\hat\gamma_{n,-1}-\gamma_{0,-1} = \bigl(-n\cdot\mathcal{H}_{n,\gamma\gamma}(\theta_0,\bar\gamma_n)\bigr)^{-1}n\cdot\mathcal{S}_{n,\gamma} + \bigl(-n\cdot\mathcal{H}_{n,\gamma\gamma}(\theta_0,\bar\gamma_n)\bigr)^{-1}n\cdot\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\bar\theta_n,\hat\gamma_n)[\hat\theta_n-\theta_0].
\]

As shown in the proof of Lemma B.2(ii)–(iii), $\|n\cdot\mathcal{S}_{n,\gamma}\| = O_P(1)$. For the second term, note that $\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\theta,\gamma) = \frac{2}{N}\sum_{i=1}^{n}\sum_{j>i}\bigl[\partial_{\mathbf{A}_{-1}^\top}s^{\theta}_{i,j}(\theta,\gamma),\ \partial_{\mathbf{B}^\top}s^{\theta}_{i,j}(\theta,\gamma)\bigr]^\top$. Hence,

\[
\begin{aligned}
\|n\cdot\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\theta,\gamma)\|^2
&= n^2\cdot\mathrm{tr}\bigl\{[\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\theta,\gamma)]^\top\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\theta,\gamma)\bigr\} \\
&= \mathrm{tr}\left\{\frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\bigl[\partial_{\mathbf{A}_{-1}^\top}s^{\theta}_{i,j}(\theta,\gamma)\bigr]\bigl[\partial_{\mathbf{A}_{-1}^\top}s^{\theta}_{k,l}(\theta,\gamma)\bigr]^\top\right\} \\
&\quad + \mathrm{tr}\left\{\frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\bigl[\partial_{\mathbf{B}^\top}s^{\theta}_{i,j}(\theta,\gamma)\bigr]\bigl[\partial_{\mathbf{B}^\top}s^{\theta}_{k,l}(\theta,\gamma)\bigr]^\top\right\} \\
&\equiv u^{A}_n(\theta,\gamma) + u^{B}_n(\theta,\gamma), \text{ say.}
\end{aligned}
\]

Further,

\[
\begin{aligned}
u^{A}_n(\theta,\gamma)
&= \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\mathrm{tr}\bigl\{\bigl[s^{\theta}_{1,i,j}(\theta,\gamma)\chi_{n,i,-1}^\top + s^{\theta}_{2,i,j}(\theta,\gamma)\chi_{n,j,-1}^\top\bigr]\bigl[\chi_{n,k,-1}s^{\theta}_{1,k,l}(\theta,\gamma)^\top + \chi_{n,l,-1}s^{\theta}_{2,k,l}(\theta,\gamma)^\top\bigr]\bigr\} \\
&= \frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\bigl[\chi_{n,i,-1}^\top\chi_{n,k,-1}\cdot s^{\theta}_{1,k,l}(\theta,\gamma)^\top s^{\theta}_{1,i,j}(\theta,\gamma) + \chi_{n,j,-1}^\top\chi_{n,k,-1}\cdot s^{\theta}_{1,k,l}(\theta,\gamma)^\top s^{\theta}_{2,i,j}(\theta,\gamma) \\
&\qquad + \chi_{n,i,-1}^\top\chi_{n,l,-1}\cdot s^{\theta}_{2,k,l}(\theta,\gamma)^\top s^{\theta}_{1,i,j}(\theta,\gamma) + \chi_{n,j,-1}^\top\chi_{n,l,-1}\cdot s^{\theta}_{2,k,l}(\theta,\gamma)^\top s^{\theta}_{2,i,j}(\theta,\gamma)\bigr].
\end{aligned}
\]

Noting that $\chi_{n,i,-1}^\top\chi_{n,k,-1} = \mathbf{1}\{i=k>1\}$, we have

\[
\frac{4}{(n-1)^2}\sum_{i=1}^{n}\sum_{j>i}\sum_{k=1}^{n}\sum_{l>k}\chi_{n,i,-1}^\top\chi_{n,k,-1}\cdot s^{\theta}_{1,k,l}(\theta,\gamma)^\top s^{\theta}_{1,i,j}(\theta,\gamma)
= \frac{4}{(n-1)^2}\sum_{i=2}^{n}\sum_{j>i}\sum_{l>i}s^{\theta}_{1,i,l}(\theta,\gamma)^\top s^{\theta}_{1,i,j}(\theta,\gamma) = O(n)
\]

for any $(\theta,\gamma)\in\Theta\times\mathcal{C}_n$. Applying the same argument to the other terms, we obtain $u^{A}_n(\theta,\gamma) = O(n)$. In the same way, we can easily show that $u^{B}_n(\theta,\gamma) = O(n)$ for any $(\theta,\gamma)\in\Theta\times\mathcal{C}_n$. Then, combined with Lemma B.3, we obtain $\|n\cdot\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\bar\theta_n,\hat\gamma_n)[\hat\theta_n-\theta_0]\| \le \|n\cdot\partial_{\theta^\top}\mathcal{S}_{n,\gamma}(\bar\theta_n,\hat\gamma_n)\|\cdot\|\hat\theta_n-\theta_0\| = O_P(1)$.

From these results, under Assumption 3.3(i), it holds that $\|\hat\gamma_n-\gamma_0\| = O_P(1)$. Finally, we obtain the desired result by the basic norm inequality, as in the proof of Lemma B.2(ii)–(iii).

(iii) Recall that, as in the proof of Lemma B.2(iv), we write $\mathcal{L}_n(\theta,A_k,\gamma_{-k}) = \mathcal{L}_n(\theta,\gamma)$ for $k\neq 1$.
By the first-order condition and mean value expansion,

\[
\begin{aligned}
0 &= n \cdot \partial_{A_k} L_n(\hat\theta_n, \hat A_{n,k}, \hat\gamma_{n,-k}) \\
&= \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} s^{A_k}_{i,j}
 + n \cdot \partial_{A_k} L_n(\hat\theta_n, \hat A_{n,k}, \hat\gamma_{n,-k}) - n \cdot \partial_{A_k} L_n(\hat\theta_n, A_{0,k}, \hat\gamma_{n,-k}) \\
&\quad + n \cdot \partial_{A_k} L_n(\hat\theta_n, A_{0,k}, \hat\gamma_{n,-k}) - n \cdot \partial_{A_k} L_n(\theta_0, A_{0,k}, \hat\gamma_{n,-k})
 + n \cdot \partial_{A_k} L_n(\theta_0, A_{0,k}, \hat\gamma_{n,-k}) - n \cdot \partial_{A_k} L_n(\theta_0, A_{0,k}, \gamma_{0,-k}) \\
&= \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} s^{A_k}_{i,j}
 + \frac{2}{n-1} \sum_{j \neq k} h_{11,k,j}(\hat\theta_n, \bar A_{n,k}, \hat\gamma_{n,-k})[\hat A_{n,k} - A_{0,k}] \\
&\quad + \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \partial_{\theta^\top} s^{A_k}_{i,j}(\bar\theta_n, A_{0,k}, \hat\gamma_{n,-k})[\hat\theta_n - \theta_0]
 + \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \partial_{\gamma_{-k}^\top} s^{A_k}_{i,j}(\theta_0, A_{0,k}, \bar\gamma_{n,-k})[\hat\gamma_{n,-k} - \gamma_{0,-k}],
\end{aligned}
\]

where $\bar\theta_n \in [\hat\theta_n, \theta_0]$, $\bar A_{n,k} \in [\hat A_{n,k}, A_{0,k}]$, and $\bar\gamma_{n,-k} \in [\hat\gamma_{n,-k}, \gamma_{0,-k}]$. Then, similar to the proof of Lemma B.2(iv), we have

\[
\left| \hat A_{n,k} - A_{0,k} \right| \le (c + o_P(1)) \left\{ |T_{1,n,k}| + |T_{2,n,k}| + |T_{3,n,k}| \right\}
\]

for some $c > 0$, where

\[
\begin{aligned}
T_{1,n,k} &\equiv \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} s^{A_k}_{i,j}, \\
T_{2,n,k} &\equiv \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \partial_{\gamma_{-k}^\top} s^{A_k}_{i,j}(\theta_0, A_{0,k}, \bar\gamma_{n,-k})[\hat\gamma_{n,-k} - \gamma_{0,-k}], \\
T_{3,n,k} &\equiv \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \partial_{\theta^\top} s^{A_k}_{i,j}(\bar\theta_n, A_{0,k}, \hat\gamma_{n,-k})[\hat\theta_n - \theta_0].
\end{aligned}
\]

As shown in (B.14) and (B.15), $\max_{1 \le k \le n} |T_{1,n,k}| = O_P(\sqrt{\ln n / n})$ and $\max_{1 \le k \le n} |T_{2,n,k}| = O_P(n^{-1/2})$. For $T_{3,n,k}$, observe that

\[
\begin{aligned}
\frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \partial_{\theta^\top} s^{A_k}_{i,j}(\theta, \gamma)
&= \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \left[ \mathbf{1}\{i = k\}\, s^{\theta}_{1,i,j}(\theta, \gamma)^\top + \mathbf{1}\{j = k\}\, s^{\theta}_{2,i,j}(\theta, \gamma)^\top \right] \\
&= \frac{2}{n-1} \sum_{j>k} s^{\theta}_{1,k,j}(\theta, \gamma)^\top + \frac{2}{n-1} \sum_{i<k} s^{\theta}_{2,i,k}(\theta, \gamma)^\top ,
\end{aligned}
\]

so that $\left\| \frac{2}{n-1} \sum_{i=1}^{n} \sum_{j>i} \partial_{\theta^\top} s^{A_k}_{i,j}(\theta, \gamma) \right\| = O(1)$ for any $(\theta, \gamma) \in \Theta \times \mathbb{C}_n$. Then, with Lemma B.3, $\max_{1 \le k \le n} |T_{3,n,k}| = O_P(n^{-1/2})$. Combining these results yields

\[
\max_{1 \le k \le n} |\hat A_{n,k} - A_{0,k}| = O_P(\sqrt{\ln n / n}) + O_P(n^{-1/2}).
\]

Similarly, we can also show that $\max_{1 \le k \le n} |\hat B_{n,k} - B_{0,k}| = O_P(\sqrt{\ln n / n}) + O_P(n^{-1/2})$. This completes the proof.

Proof of Theorem 3.3

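As a reading aid for the argument that follows, the binary segmentation step can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names (`sse`, `best_split`, `binary_segmentation`) are hypothetical, the number of groups is treated as known, and the rule "split next the segment with the larger average squared deviation" is our reading of the comparison between the two sub-segment criteria made in the proof. Each split point minimizes the sum of within-segment squared deviations divided by the segment length.

```python
import random

def sse(xs):
    """Sum of squared deviations of xs from its mean (0.0 for an empty list)."""
    if not xs:
        return 0.0
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def best_split(xs):
    """Split point k (1 <= k < len(xs)) minimizing [sse(xs[:k]) + sse(xs[k:])] / len(xs)."""
    n = len(xs)
    return min(range(1, n), key=lambda k: (sse(xs[:k]) + sse(xs[k:])) / n)

def binary_segmentation(a_hat, n_groups):
    """Return (breaks, labels): break positions in the sorted sequence, group label per agent."""
    order = sorted(range(len(a_hat)), key=lambda i: a_hat[i])
    xs = [a_hat[i] for i in order]          # estimates in ascending order
    segments = [(0, len(xs))]               # half-open index ranges [lo, hi)
    while len(segments) < n_groups:
        # Assumption: split the segment with the largest average squared deviation.
        candidates = [s for s in segments if s[1] - s[0] >= 2]
        lo, hi = max(candidates, key=lambda s: sse(xs[s[0]:s[1]]) / (s[1] - s[0]))
        k = lo + best_split(xs[lo:hi])
        segments.remove((lo, hi))
        segments += [(lo, k), (k, hi)]
        segments.sort()
    labels = [0] * len(a_hat)
    for g, (lo, hi) in enumerate(segments):
        for pos in range(lo, hi):
            labels[order[pos]] = g          # map sorted position back to the agent
    breaks = [hi for _, hi in segments[:-1]]
    return breaks, labels

# Simulated first-step estimates: three latent levels plus small estimation error.
rng = random.Random(0)
levels, sizes = [0.0, 1.0, 2.5], [20, 25, 15]
truth = [g for g, m in enumerate(sizes) for _ in range(m)]
a_hat = [levels[g] + rng.uniform(-0.05, 0.05) for g in truth]
breaks, labels = binary_segmentation(a_hat, 3)
```

In this simulated design the estimation errors are much smaller than the gaps between group levels, which mirrors the role played by the uniform rate from Theorem 3.2(iii) in the proof. Without errors, a segment holding two levels with p and q members has average squared deviation pq/(p+q)^2 times the squared gap, which `sse` reproduces.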
To simplify the discussion, we focus on the estimation of $C^A$ with $K^A = 3$ only. The other cases can be proved analogously. In addition, for notational simplicity, we omit the superscript $A$ in this proof.

Now, let $u_{n,i} \equiv \hat A_{n,i} - A_{0,i}$ for $i = 1, \ldots, n$. In particular, $u_{n,1} = 0$ holds by the normalization. In accordance with the ordering $\hat A_{n,(1)} \le \cdots \le \hat A_{n,(n)}$, we permute the $A_{0,i}$'s and obtain $\{A_{0,(i)}\}$. By Theorem 3.2(iii), we have $\max_{1 \le i \le n} |u_{n,i}| = O_P(\sqrt{\ln n / n})$. Hence, w.p.a.1, the sequence $\{A_{0,(i)}\}$ contains two "true" break points $(t_1, t_2) \equiv (t_{n,1}, t_{n,2})$ in the following manner:

\[
A_{0,(i)} =
\begin{cases}
a_{0,1} & \text{if } 1 \le i \le t_1 \\
a_{0,2} & \text{if } t_1 + 1 \le i \le t_2 \\
a_{0,3} & \text{if } t_2 + 1 \le i \le n.
\end{cases}
\]

We can assume, without loss of generality, that $\hat S_{1,n}(t_1) < \hat S_{1,n}(t_2)$. If $\hat S_{1,n}(t_1) > \hat S_{1,n}(t_2)$, by reversing the order of $\{\hat A_{n,(i)}\}$ and re-labeling the break points appropriately, we can prove the theorem completely analogously. Recall that $\hat t_1 = \operatorname{argmin}_{1 \le \kappa}$ […] for any $m < t_1$ under Assumption 3.4(i). Meanwhile,

\[
\begin{aligned}
r(m) - r(t_1)
&= \frac{1}{n} \left[ \sum_{l=1}^{m} (u_{n,(l)} - \bar u_{n,1,m})^2 - \sum_{l=1}^{t_1} (u_{n,(l)} - \bar u_{n,1,t_1})^2 \right]
 + \frac{1}{n} \left[ \sum_{l=m+1}^{n} (u_{n,(l)} - \bar u_{n,m+1,n})^2 - \sum_{l=t_1+1}^{n} (u_{n,(l)} - \bar u_{n,t_1+1,n})^2 \right] \\
&\quad + \frac{2 a_{1m}}{n} \sum_{l=m+1}^{t_1} (u_{n,(l)} - \bar u_{n,m+1,n})
 + \frac{2}{n} \left[ a_{2m} \sum_{l=t_1+1}^{t_2} (u_{n,(l)} - \bar u_{n,m+1,n}) - a_{2t_1} \sum_{l=t_1+1}^{t_2} (u_{n,(l)} - \bar u_{n,t_1+1,n}) \right] \\
&\quad + \frac{2}{n} \left[ a_{3m} \sum_{l=t_2+1}^{n} (u_{n,(l)} - \bar u_{n,m+1,n}) - a_{3t_1} \sum_{l=t_2+1}^{n} (u_{n,(l)} - \bar u_{n,t_1+1,n}) \right].
\end{aligned}
\]

Then, by carefully examining each term on the right-hand side, we can find that

\[
r(m) - r(t_1) = \frac{t_1 - m}{n} \cdot \left\{ O_P(\ln n / n) + O_P(\sqrt{\ln n / n}) \right\}.
\]
Hence, we have

\[
\hat S_{1,n}(m) - \hat S_{1,n}(t_1) = \underbrace{\mu(m) - \mu(t_1)}_{= n^{-1}(t_1 - m) \cdot c_n \ (c_n > 0)} + \underbrace{r(m) - r(t_1)}_{= n^{-1}(t_1 - m) \cdot o_P(1)} \ge 0
\]

w.p.a.1 for any $m < t_1$. Thus, since $\hat t_1$ is a minimizer of $\hat S_{1,n}(m)$,

\[
\begin{aligned}
\Pr(\hat t_1 < t_1)
&= \Pr(\hat S_{1,n}(\hat t_1) - \hat S_{1,n}(t_1) \ge 0,\ \hat t_1 < t_1) + \Pr(\hat S_{1,n}(\hat t_1) - \hat S_{1,n}(t_1) < 0,\ \hat t_1 < t_1) \\
&= \Pr(\hat S_{1,n}(\hat t_1) - \hat S_{1,n}(t_1) < 0,\ \hat t_1 < t_1) \\
&\le \Pr(\hat S_{1,n}(m) - \hat S_{1,n}(t_1) < 0,\ m < t_1 \text{ for some } m) \to 0
\end{aligned}
\]

as $n \to \infty$. Thus, (i) follows.

(ii) For a given $t_1 < m \le t_2$, we have

\[
\begin{aligned}
\bar A_{n,1,m} &= \frac{1}{m} \sum_{l=1}^{m} (A_{0,(l)} + u_{n,(l)}) = \frac{t_1}{m} a_{0,1} + \frac{m - t_1}{m} a_{0,2} + \bar u_{n,1,m}, \\
\bar A_{n,m+1,n} &= \frac{1}{n - m} \sum_{l=m+1}^{n} (A_{0,(l)} + u_{n,(l)}) = \frac{(t_2 - m)\, a_{0,2}}{n - m} + \frac{(n - t_2)\, a_{0,3}}{n - m} + \bar u_{n,m+1,n}.
\end{aligned}
\]

Hence, since

\[
\hat A_{n,(l)} - \bar A_{n,1,m} =
\begin{cases}
a_{1m} + u_{n,(l)} - \bar u_{n,1,m} & \text{if } 1 \le l \le t_1 \\
a_{2m} + u_{n,(l)} - \bar u_{n,1,m} & \text{if } t_1 + 1 \le l \le m,
\end{cases}
\]

where $a_{1m} \equiv \frac{(m - t_1)(a_{0,1} - a_{0,2})}{m}$ and $a_{2m} \equiv \frac{t_1 (a_{0,2} - a_{0,1})}{m}$, we have

\[
\hat\Delta(1, m) = t_1 a_{1m}^2 + (m - t_1) a_{2m}^2 + \sum_{l=1}^{m} (u_{n,(l)} - \bar u_{n,1,m})^2 + 2 a_{1m} \sum_{l=1}^{t_1} (u_{n,(l)} - \bar u_{n,1,m}) + 2 a_{2m} \sum_{l=t_1+1}^{m} (u_{n,(l)} - \bar u_{n,1,m}).
\]

Similarly, since

\[
\hat A_{n,(l)} - \bar A_{n,m+1,n} =
\begin{cases}
a_{3m} + u_{n,(l)} - \bar u_{n,m+1,n} & \text{if } m + 1 \le l \le t_2 \\
a_{4m} + u_{n,(l)} - \bar u_{n,m+1,n} & \text{if } t_2 + 1 \le l \le n,
\end{cases}
\]

where $a_{3m} \equiv \frac{(n - t_2)(a_{0,2} - a_{0,3})}{n - m}$ and $a_{4m} \equiv \frac{(t_2 - m)(a_{0,3} - a_{0,2})}{n - m}$, we have

\[
\hat\Delta(m + 1, n) = (t_2 - m) a_{3m}^2 + (n - t_2) a_{4m}^2 + \sum_{l=m+1}^{n} (u_{n,(l)} - \bar u_{n,m+1,n})^2 + 2 a_{3m} \sum_{l=m+1}^{t_2} (u_{n,(l)} - \bar u_{n,m+1,n}) + 2 a_{4m} \sum_{l=t_2+1}^{n} (u_{n,(l)} - \bar u_{n,m+1,n}).
\]

[Footnote: For this calculation, see pp. 8–9 in the Supplementary Material for Wang and Su (2020).]
Then, it holds that $\hat S_{1,n}(m) = \frac{1}{n} \left( \hat\Delta(1, m) + \hat\Delta(m + 1, n) \right) = \mu(m) + r(m)$, where

\[
\begin{aligned}
\mu(m) &\equiv \frac{t_1}{n} a_{1m}^2 + \frac{m - t_1}{n} a_{2m}^2 + \frac{t_2 - m}{n} a_{3m}^2 + \frac{n - t_2}{n} a_{4m}^2, \\
r(m) &\equiv \frac{1}{n} \left[ \sum_{l=1}^{m} (u_{n,(l)} - \bar u_{n,1,m})^2 + \sum_{l=m+1}^{n} (u_{n,(l)} - \bar u_{n,m+1,n})^2 \right]
 + \frac{2 a_{1m}}{n} \sum_{l=1}^{t_1} (u_{n,(l)} - \bar u_{n,1,m}) + \frac{2 a_{2m}}{n} \sum_{l=t_1+1}^{m} (u_{n,(l)} - \bar u_{n,1,m}) \\
&\quad + \frac{2 a_{3m}}{n} \sum_{l=m+1}^{t_2} (u_{n,(l)} - \bar u_{n,m+1,n}) + \frac{2 a_{4m}}{n} \sum_{l=t_2+1}^{n} (u_{n,(l)} - \bar u_{n,m+1,n}).
\end{aligned}
\]

To be consistent with the above notations, we re-write $\hat S_{1,n}(t_1) = \frac{1}{n} \left( \hat\Delta(1, t_1) + \hat\Delta(t_1 + 1, n) \right) = \mu(t_1) + r(t_1)$, where

\[
\begin{aligned}
\mu(t_1) &\equiv \frac{t_2 - t_1}{n} a_{3t_1}^2 + \frac{n - t_2}{n} a_{4t_1}^2, \\
r(t_1) &\equiv \frac{1}{n} \left[ \sum_{l=1}^{t_1} (u_{n,(l)} - \bar u_{n,1,t_1})^2 + \sum_{l=t_1+1}^{n} (u_{n,(l)} - \bar u_{n,t_1+1,n})^2 \right]
 + \frac{2 a_{3t_1}}{n} \sum_{l=t_1+1}^{t_2} (u_{n,(l)} - \bar u_{n,t_1+1,n}) + \frac{2 a_{4t_1}}{n} \sum_{l=t_2+1}^{n} (u_{n,(l)} - \bar u_{n,t_1+1,n}),
\end{aligned}
\]

with $a_{3t_1} \equiv \frac{(n - t_2)(a_{0,2} - a_{0,3})}{n - t_1}$ and $a_{4t_1} \equiv \frac{(t_2 - t_1)(a_{0,3} - a_{0,2})}{n - t_1}$. By tedious calculation, we can find that

\[
\begin{aligned}
\mu(m) - \mu(t_1)
&= \frac{m - t_1}{n} \left[ \frac{t_1}{m} (a_{0,1} - a_{0,2})^2 - \frac{(n - t_2)^2}{(n - m)(n - t_1)} (a_{0,2} - a_{0,3})^2 \right] \\
&\ge \frac{m - t_1}{n} \left[ \frac{t_1}{m} (a_{0,1} - a_{0,2})^2 - \frac{t_2 (n - t_2)}{m (n - t_1)} (a_{0,2} - a_{0,3})^2 \right] \\
&= \frac{m - t_1}{n} \frac{t_2}{m} \left[ \frac{t_1}{t_2} (a_{0,1} - a_{0,2})^2 - \frac{n - t_2}{n - t_1} (a_{0,2} - a_{0,3})^2 \right],
\end{aligned}
\]

where the inequality follows because $(n - t_2)/(n - m) \le t_2/m$. Further, it can be shown that $r(m) - r(t_1) = n^{-1}(m - t_1) \cdot o_P(1)$ uniformly in $t_1 < m \le t_2$. Here, recall that we have assumed $\hat S_{1,n}(t_1) < \hat S_{1,n}(t_2)$. Through straightforward calculation, we can find that

\[
\begin{aligned}
0 < \hat S_{1,n}(t_2) - \hat S_{1,n}(t_1)
&= \frac{t_1}{n} a_{1t_2}^2 + \frac{t_2 - t_1}{n} a_{2t_2}^2 - \frac{t_2 - t_1}{n} a_{3t_1}^2 - \frac{n - t_2}{n} a_{4t_1}^2 + \frac{t_2 - t_1}{n} \cdot o_P(1) \\
&= \frac{t_2 - t_1}{n} \left[ \frac{t_1}{t_2} (a_{0,1} - a_{0,2})^2 - \frac{n - t_2}{n - t_1} (a_{0,2} - a_{0,3})^2 \right] + \frac{t_2 - t_1}{n} \cdot o_P(1),
\end{aligned}
\]

where $a_{1t_2} \equiv \frac{(t_2 - t_1)(a_{0,1} - a_{0,2})}{t_2}$ and $a_{2t_2} \equiv \frac{t_1 (a_{0,2} - a_{0,1})}{t_2}$.
This implies that

\[
\frac{t_1}{t_2} (a_{0,1} - a_{0,2})^2 - \frac{n - t_2}{n - t_1} (a_{0,2} - a_{0,3})^2 \to \frac{\tau_1}{\tau_1 + \tau_2} (a_{0,1} - a_{0,2})^2 - \frac{\tau_3}{\tau_2 + \tau_3} (a_{0,2} - a_{0,3})^2 > 0
\]

by Assumption 3.4(ii). Hence, $\mu(m) - \mu(t_1) > 0$ holds for any $t_1 < m \le t_2$ for sufficiently large $n$. Then, combining these results implies that $\hat S_{1,n}(m) - \hat S_{1,n}(t_1) > 0$ w.p.a.1 uniformly in $t_1 < m \le t_2$, leading to the desired result:

\[
\begin{aligned}
\Pr(t_1 < \hat t_1 \le t_2)
&= \Pr(\hat S_{1,n}(\hat t_1) - \hat S_{1,n}(t_1) \ge 0,\ t_1 < \hat t_1 \le t_2) + \Pr(\hat S_{1,n}(\hat t_1) - \hat S_{1,n}(t_1) < 0,\ t_1 < \hat t_1 \le t_2) \\
&= \Pr(\hat S_{1,n}(\hat t_1) - \hat S_{1,n}(t_1) < 0,\ t_1 < \hat t_1 \le t_2) \\
&\le \Pr(\hat S_{1,n}(m) - \hat S_{1,n}(t_1) < 0,\ t_1 < m \le t_2 \text{ for some } m) \to 0.
\end{aligned}
\]

Then, (ii) also follows. For case (iii), since this case is symmetric to (i), we can safely omit the proof.

Once the first break point is obtained, we can partition $\{\hat A_{n,(i)}\}$ into two subregions $\{\hat A_{n,(i)}\}_{i=1}^{\hat t_1}$ and $\{\hat A_{n,(i)}\}_{i=\hat t_1+1}^{n}$. We next estimate in which group the second break point exists. Given the above consistency result, w.p.a.1, we have $\hat S_{1,\hat t_1}(\hat t_1) = \hat S_{1,t_1}(t_1)$ and $\hat S_{\hat t_1+1,n}(n) = \hat S_{t_1+1,n}(n)$. Since $t_1 < t_2$, it suffices to show that $\Pr(\hat S_{1,t_1}(t_1) < \hat S_{t_1+1,n}(n)) \to 1$. We can observe that $\hat S_{1,t_1}(t_1) = \frac{1}{t_1} \hat\Delta(1, t_1) = r^*_{t_1}$ and $\hat S_{t_1+1,n}(n) = \frac{1}{n - t_1} \hat\Delta(t_1 + 1, n) = \mu^*_n + r^*_n$, where

\[
\begin{aligned}
r^*_{t_1} &\equiv \frac{1}{t_1} \sum_{l=1}^{t_1} (u_{n,(l)} - \bar u_{n,1,t_1})^2, \\
\mu^*_n &\equiv \frac{t_2 - t_1}{n - t_1} (a^*_{1n})^2 + \frac{n - t_2}{n - t_1} (a^*_{2n})^2, \\
r^*_n &\equiv \frac{1}{n - t_1} \sum_{l=t_1+1}^{n} (u_{n,(l)} - \bar u_{n,t_1+1,n})^2 + \frac{2 a^*_{1n}}{n - t_1} \sum_{l=t_1+1}^{t_2} (u_{n,(l)} - \bar u_{n,t_1+1,n}) + \frac{2 a^*_{2n}}{n - t_1} \sum_{l=t_2+1}^{n} (u_{n,(l)} - \bar u_{n,t_1+1,n}),
\end{aligned}
\]

with $a^*_{1n} \equiv \frac{(n - t_2)(a_{0,2} - a_{0,3})}{n - t_1}$ and $a^*_{2n} \equiv \frac{(t_2 - t_1)(a_{0,3} - a_{0,2})}{n - t_1}$.
Through some calculation, we can find that

\[
\mu^*_n = \frac{(t_2 - t_1)(n - t_2)}{(n - t_1)^2} (a_{0,2} - a_{0,3})^2 \to \frac{\tau_2 \tau_3}{(\tau_2 + \tau_3)^2} (a_{0,2} - a_{0,3})^2 > 0
\]

under Assumptions 3.4(i) and (ii). It can be easily seen that $r^*_n - r^*_{t_1} = o_P(1)$. Hence, $\hat S_{t_1+1,n}(n) - \hat S_{1,t_1}(t_1) > 0$ holds w.p.a.1, as desired. Then, once the subset $\{\hat A_{n,(i)}\}_{i=\hat t_1+1}^{n}$ is selected for the detection of the second break point, it can be estimated by $\hat t_2 = \operatorname{argmin}_{\hat t_1 + 1 \le \kappa}$ […] $\sum_{j>i} s^{\delta}_{i,j}(\delta_0) + o_P(1)$.

Note that, under the assumptions made, $\{s^{\delta}_{i,j}(\delta_0)\}$ are uniformly bounded, and $\sum_{i=1}^{n} \sum_{j>i} s^{\delta}_{i,j}(\delta_0)$ is a sum of independent random variables. Thus, the asymptotic normality result follows from the central limit theorem for bounded random variables (see, e.g., Example 27.4 in Billingsley (2012)).

The asymptotic distributional equivalence result is straightforward from

\[
\begin{aligned}
\Pr\left( \sqrt{N} (\hat\delta_n - \delta_0) \in C \right)
&= \Pr\left( \sqrt{N} (\hat\delta_n - \delta_0) \in C,\ (\hat C^A_n, \hat C^B_n) = (C^A, C^B) \right) + \Pr\left( \sqrt{N} (\hat\delta_n - \delta_0) \in C,\ (\hat C^A_n, \hat C^B_n) \neq (C^A, C^B) \right) \\
&= \Pr\left( \sqrt{N} (\hat\delta^{oracle}_n - \delta_0) \in C \right) + o(1)
\end{aligned}
\]

for any $C \subseteq \mathbb{R}^{d_z + K^A + K^B + 1}$.

C Supplementary Materials

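The curvature identity derived in Section C.1 below, namely that the second derivative of a profiled objective equals the own second partial minus the cross partial times the inverse inner Hessian times the cross partial, can be checked numerically on a toy problem. The sketch below is purely illustrative: the objective `L`, its closed-form inner maximizer `c_tilde`, and the finite-difference helpers are assumptions chosen for convenience (scalar nuisance parameter, no expectation operator) and are not taken from the paper.

```python
import math

def L(rho, c):
    """Toy objective (an assumption for illustration), strictly concave in c for each rho."""
    return -(c - math.sin(rho)) ** 2 - rho ** 2

def c_tilde(rho):
    """Inner maximizer argmax_c L(rho, c); available in closed form for this toy objective."""
    return math.sin(rho)

def second_partials(rho, c, h=1e-4):
    """Central finite-difference second partials (L_rr, L_rc, L_cc) of L at (rho, c)."""
    L_rr = (L(rho + h, c) - 2 * L(rho, c) + L(rho - h, c)) / h ** 2
    L_cc = (L(rho, c + h) - 2 * L(rho, c) + L(rho, c - h)) / h ** 2
    L_rc = (L(rho + h, c + h) - L(rho + h, c - h)
            - L(rho - h, c + h) + L(rho - h, c - h)) / (4 * h ** 2)
    return L_rr, L_rc, L_cc

def profile_curvature_formula(rho):
    """L_rr - L_rc * L_cc^{-1} * L_cr, evaluated at c = c_tilde(rho)."""
    L_rr, L_rc, L_cc = second_partials(rho, c_tilde(rho))
    return L_rr - L_rc * L_rc / L_cc

def profile_curvature_direct(rho, h=1e-4):
    """Second derivative of the profiled objective rho -> L(rho, c_tilde(rho))."""
    f = lambda r: L(r, c_tilde(r))
    return (f(rho + h) - 2 * f(rho) + f(rho - h)) / h ** 2
```

For this toy objective the profiled function is -rho^2, so both routines should return approximately -2 at any rho. The own second partial alone is more negative than -2, and the correction term, which is non-negative because the inner Hessian is negative, brings it back up.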
C.1 Explicit form of $\partial^2 L^*_n(\rho) / (\partial\rho)^2$

For notational simplicity, let $C = (\beta^\top, \alpha, \gamma^\top)^\top$ and $\tilde C(\rho) = (\tilde\beta(\rho)^\top, \tilde\alpha(\rho), \tilde\gamma(\rho)^\top)^\top$, and write $L_n(\theta, \gamma)$ equivalently as $L_n(\rho, C)$, so that $\tilde C(\rho) = \operatorname{argmax}_{C \in \mathcal{B} \times \mathcal{A} \times \mathbb{C}_n} L_n(\rho, C)$ and $L^*_n(\rho) = E L_n(\rho, \tilde C(\rho))$. By the chain rule,

\[
\partial_\rho L^*_n(\rho) = \partial_{\rho_1} E L_n(\rho, \tilde C(\rho)) + \partial_{C^\top} E L_n(\rho, \tilde C(\rho))\, \partial_\rho \tilde C(\rho) = \partial_{\rho_1} E L_n(\rho, \tilde C(\rho)),
\]

where the second equality holds for any $\rho \in \mathcal{R}$ by the first-order condition for $\tilde C(\rho)$, and $\partial_{\rho_1}$ denotes the partial derivative with respect to the "first" $\rho$. Then, we also have

\[
\partial_{\rho\rho} L^*_n(\rho) = \partial_{\rho_1 \rho_1} E L_n(\rho, \tilde C(\rho)) + \partial_{\rho_1 C^\top} E L_n(\rho, \tilde C(\rho))\, \partial_\rho \tilde C(\rho).
\]

In addition, applying the implicit function theorem to $\partial_C E L_n(\rho, \tilde C(\rho)) = 0_{(d_z + 2n) \times 1}$ for $\rho \in \mathcal{R}$ yields

\[
\begin{aligned}
0_{(d_z + 2n) \times 1} &= \partial_{C\rho} E L_n(\rho, \tilde C(\rho)) = \partial_{C\rho_1} E L_n(\rho, \tilde C(\rho)) + \partial_{CC^\top} E L_n(\rho, \tilde C(\rho))\, \partial_\rho \tilde C(\rho) \\
\Longrightarrow \ \partial_\rho \tilde C(\rho) &= - \left[ \partial_{CC^\top} E L_n(\rho, \tilde C(\rho)) \right]^{-1} \partial_{C\rho_1} E L_n(\rho, \tilde C(\rho)).
\end{aligned}
\]

Therefore, we obtain

\[
\partial_{\rho\rho} L^*_n(\rho) = \partial_{\rho_1 \rho_1} E L_n(\rho, \tilde C(\rho)) - \partial_{\rho_1 C^\top} E L_n(\rho, \tilde C(\rho)) \left[ \partial_{CC^\top} E L_n(\rho, \tilde C(\rho)) \right]^{-1} \partial_{C\rho_1} E L_n(\rho, \tilde C(\rho)).
\]

Note that since $\partial_{CC^\top} E L_n(\rho, \tilde C(\rho))$ is negative semidefinite by the second-order condition of maximization, the quadratic form $\partial_{\rho_1 C^\top} E L_n \left[ \partial_{CC^\top} E L_n \right]^{-1} \partial_{C\rho_1} E L_n$ is non-positive, so the second term on the right-hand side, which enters with a minus sign, is non-negative. Thus, to ensure that $\partial_{\rho\rho} L^*_n(\rho)$ is strictly negative, $\partial_{\rho_1 \rho_1} E L_n(\rho, \tilde C(\rho))$ must be negative and larger in magnitude than the second term.

C.2 Supplementary tables for Section 5

Table C.1: In-degree and Out-degree

Country             In-degree   Out-degree
Armenia                    21           46
Australia                  43            6
Azerbaijan                 24           24
Bahrain                    33           22
Bangladesh                  6           39
Belarus                    26           33
Bhutan                     15            2
Brunei                     43           26
Cambodia                   15           54
China                      24            9
Cyprus                     41           19
Estonia                    42           19
Fiji                       23           29
Georgia                    29           32
Hong Kong                  43           33
India                      13            3
Indonesia                  25           51
Iran                       10           48
Iraq                        5            5
Israel                     33           21
Japan                      48           14
Jordan                     13           41
Kazakhstan                 29           34
Kiribati                   22           15
Kuwait                     35           23
Kyrgyzstan                 24           33
Laos                       17           49
Latvia                     40           19
Lebanon                    13           34
Lithuania                  31           18
Malaysia                   46           46
Moldova                    26           28
Mongolia                   21           20
Myanmar                    13           16
Nauru                      21            7
Nepal                       9           55
New Zealand                45           18
Oman                       34            5
Pakistan                    5           21
Papua New Guinea           18           21
Philippines                22           38
Qatar                      29           31
Russia                     37           18
Saudi Arabia               31           19
Singapore                  47           34
South Korea                49           25
Sri Lanka                   7           53
Tajikistan                 23           36
Thailand                   31           37
Tonga                      23           21
Turkey                     35           27
UAE                        44           20
Ukraine                    36           29
Uzbekistan                 22           28
Vanuatu                    27           33
Viet Nam                   16           14
Yemen                       7            9

Table C.2: Summary Statistics (columns: Mean, Std. Dev., Min., Max.; variables: gdp_pc_i, free_i, export_ij, import_ij)

References

Abbe, E., 2017. Community detection and stochastic block models: recent developments, The Journal of Machine Learning Research, 18 (1), 6446–6531.
Amemiya, T., 1985. Advanced Econometrics, Harvard University Press.
Aradillas-Lopez, A. and Rosen, A.M., 2019. Inference in ordered response games with complete information, Working paper.
Bai, J., 1997. Estimating multiple breaks one at a time, Econometric Theory, 315–352.
Berry, S.T., 1992. Estimation of a model of entry in the airline industry, Econometrica, 60 (4), 889–917.
Bickel, P.J. and Chen, A., 2009. A nonparametric view of network models and Newman–Girvan and other modularities, Proceedings of the National Academy of Sciences, 106 (50), 21068–21073.
Billingsley, P., 2012. Probability and Measure: Anniversary Edition, John Wiley & Sons.
Bjorn, P.A. and Vuong, Q.H., 1984. Simultaneous equations models for dummy endogenous variables: a game theoretic formulation with an application to labor force participation, Working paper.
Bonhomme, S., 2020. Econometric analysis of bipartite networks, in: B.S. Graham and Á. de Paula, eds., The Econometric Analysis of Network Data, Elsevier, 83–121.
Bonhomme, S., Lamadon, T., and Manresa, E., 2017. Discretizing unobserved heterogeneity, University of Chicago, Working paper.
Bonhomme, S. and Manresa, E., 2015. Grouped patterns of heterogeneity in panel data, Econometrica, 83 (3), 1147–1184.
Bresnahan, T.F. and Reiss, P.C., 1990. Entry in monopoly markets, The Review of Economic Studies, 57 (4), 531–553.
Chandrasekhar, A.G., 2016. Econometrics of network formation, in: Y. Bramoullé, A. Galeotti, and B. Rogers, eds., The Oxford Handbook of the Economics of Networks, Oxford University Press, 303–357.
Chesher, A. and Rosen, A.M., 2020. Structural modeling of simultaneous discrete choice, cemmap Working paper, CWP9/20.
Ciliberto, F. and Tamer, E., 2009. Market structure and multiple equilibria in airline markets, Econometrica, 77 (6), 1791–1828.
de Paula, Á., 2013. Econometric analysis of games with multiple equilibria, Annual Review of Economics, 5, 107–131.
de Paula, Á., 2020. Econometric models of network formation, Annual Review of Economics, 12, 775–799.
Dzemski, A., 2019. An empirical model of dyadic link formation in a network with unobserved heterogeneity, Review of Economics and Statistics, 101 (5), 763–776.
Fortunato, S., 2010. Community detection in graphs, Physics Reports, 486 (3-5), 75–174.
Graham, B.S., 2016. Homophily and transitivity in dynamic network formation, NBER Working Paper, (22186).
Graham, B.S., 2017. An econometric model of network formation with degree heterogeneity, Econometrica, 85 (4), 1033–1063.
Graham, B.S. and Pelican, A., 2020. Testing for externalities in network formation using simulation, in: B.S. Graham and Á. de Paula, eds., The Econometric Analysis of Network Data, Elsevier, 63–82.
Hoff, P.D., Raftery, A.E., and Handcock, M.S., 2002. Latent space approaches to social network analysis, Journal of the American Statistical Association, 97 (460), 1090–1098.
Hoshino, T. and Yanagi, T., 2020. Treatment effect models with strategic interaction in treatment decisions, arXiv, (1810.08350).
Jochmans, K., 2018. Semiparametric analysis of network formation, Journal of Business & Economic Statistics, 36 (4), 705–713.
Karrer, B. and Newman, M.E., 2011. Stochastic blockmodels and community structure in networks, Physical Review E, 83 (1), 016107.
Ke, Y., Li, J., Zhang, W., et al., 2016. Structure identification in panel data analysis, The Annals of Statistics, 44 (3), 1193–1233.
Khan, S. and Nekipelov, D., 2018. Information structure and statistical information in discrete response models, Quantitative Economics, 9 (2), 995–1017.
Kline, B., 2015. Identification of complete information games, Journal of Econometrics, 189 (1), 117–131.
Kline, B., 2016. The empirical content of games with bounded regressors, Quantitative Economics, 7 (1), 37–81.
Kreps, D.M., 2012. Microeconomic Foundations I: Choice and Competitive Markets, Princeton University Press.
Leung, M.P., 2015. Two-step estimation of network-formation models with incomplete information, Journal of Econometrics, 188 (1), 182–195.
Lewbel, A., 2007. Coherency and completeness of structural models containing a dummy endogenous variable, International Economic Review, 48 (4), 1379–1392.
Lian, H., Qiao, X., and Zhang, W., 2019. Homogeneity pursuit in single index models based panel data analysis, Journal of Business & Economic Statistics, in press.
Liu, R., Shang, Z., Zhang, Y., and Zhou, Q., 2020. Identification and estimation in panel models with overspecified number of groups, Journal of Econometrics, 215 (2), 574–590.
McKay, A. and Tekleselassie, T.G., 2018. Tall paper walls: The political economy of visas and cross-border travel, The World Economy, 41 (11), 2914–2933.
Mele, A., 2017. A structural model of dense network formation, Econometrica, 85 (3), 825–850.
Neiman, B. and Swagel, P., 2009. The impact of post-9/11 visa policies on travel to the United States, Journal of International Economics, 78 (1), 86–99.
Neumayer, E., 2010. Visa restrictions and bilateral travel, The Professional Geographer, 62 (2), 171–181.
Newman, M.E. and Girvan, M., 2004. Finding and evaluating community structure in networks, Physical Review E, 69 (2), 026113.
Okui, R. and Wang, W., 2020. Heterogeneous structural breaks in panel data models, Journal of Econometrics.
Pelican, A. and Graham, B.S., 2020. An optimal test for strategic interaction in social and economic network formation between heterogeneous agents, NBER Working Paper, 27793.
Rohe, K., Chatterjee, S., Yu, B., et al., 2011. Spectral clustering and the high-dimensional stochastic blockmodel, The Annals of Statistics, 39 (4), 1878–1915.
Rothenberg, T.J., 1971. Identification in parametric models, Econometrica, 577–591.
Sheng, S., 2020. A structural econometric analysis of network formation games through subnetworks, Econometrica, 88 (5), 1829–1858.
Su, L., Shi, Z., and Phillips, P.C., 2016. Identifying latent structures in panel data, Econometrica, 84 (6), 2215–2264.
Tamer, E., 2003. Incomplete simultaneous discrete response model with multiple equilibria, The Review of Economic Studies, 70 (1), 147–165.
Wang, W., Phillips, P.C., and Su, L., 2018. Homogeneity pursuit in panel data models: Theory and application, Journal of Applied Econometrics, 33 (6), 797–815.
Wang, W. and Su, L., 2020. Identifying latent group structures in nonlinear panels, Journal of Econometrics.
Yan, T., Jiang, B., Fienberg, S.E., and Leng, C., 2019. Statistical inference in a directed network model with covariates, Journal of the American Statistical Association, 114 (526), 857–868.
Yan, T., Leng, C., and Zhu, J., 2016. Asymptotics in directed exponential random graph models with an increasing bi-degree sequence, The Annals of Statistics, 44 (1), 31–57.
Zhang, Y., Chen, K., Sampson, A., Hwang, K., and Luna, B., 2019. Node features adjusted stochastic block model, Journal of Computational and Graphical Statistics, 28 (2), 362–373.
Zhou, Y., 2019. Identification and estimation of entry games under symmetry of unobservables, SSRN Working paper.