Estimating Unobserved Individual Heterogeneity Using Pairwise Comparisons

Abstract

We propose a new method for studying environments with unobserved individual heterogeneity. Based on model-implied pairwise inequalities, the method classifies individuals in the sample into groups defined by discrete unobserved heterogeneity with unknown support. We establish conditions under which the groups are identified and consistently estimated through our method. We show that the method performs well in finite samples through Monte Carlo simulation. We then apply the method to estimate a model of lowest-price procurement auctions with unobserved bidder heterogeneity, using data from the California highway procurement market.

Elena Krasnokutskaya, Kyungchul Song, and Xun Tang

Johns Hopkins University, University of British Columbia, Rice University

Key words: Unobserved Individual Heterogeneity, Discrete Unobserved Heterogeneity, Pairwise Comparisons, Nonparametric Classification, Consistency

JEL Classification: C12, C21, C31

1. Introduction

The empirical analysis of many economic settings requires accounting for unobserved individual heterogeneity (UIH), which reflects agent-specific factors that influence agents' decisions but are not recorded in the data. Failing to account for UIH generally leads to biased estimates and affects the validity of counterfactual prediction.

In this paper, we consider a generic economic model where UIH induces a group structure among agents according to their types. We provide conditions for identification of the group structure, and propose a method to recover the group structure from data.

[Footnote: Date: April 7, 2020. We are grateful to the Associate Editor and three referees for valuable comments. All errors are ours. Corresponding address: Elena Krasnokutskaya, Department of Economics, Johns Hopkins University, 3100 Wyman Park Drive, Baltimore, MD 21218, USA. Email address: [email protected]. arXiv [econ.EM].]

Our main idea is based on the insight that UIH often implies pairwise inequality restrictions on endogenous observable quantities. For instance, in multi-attribute auctions, bidders with higher (unobserved) quality levels have a greater chance of winning the auction, controlling for the bid and the set of competitors. In a labor market setting, agents with higher (unobserved) productivity receive higher wages than less productive ones. In Section 2.2 we provide further examples of economic applications in which pairwise inequality restrictions arise naturally from the behavior of agents in equilibrium.

We develop a statistical method to recover the group structure (that is, to classify individuals into groups defined by UIH) using a pairwise comparison approach. Our method treats UIH as individual-specific discrete parameters which may affect the distribution of other observed or unobserved variables. We assume that the support of UIH is a finite, ordered set that is not known to the econometrician. Such flexibility is important in structural models where individuals interact strategically and the UIH of all agents jointly affects the equilibrium outcome.

One may attempt to order individuals on a pair-by-pair basis using pairwise inequality tests. However, doing so may not produce an ordering that is transitive in finite samples. Our method recovers the whole group structure by sequentially sub-dividing the set of agents on the basis of the $p$-values of tests of pairwise inequality restrictions.
The method recovers the group structure for each assumed number of groups, and then selects the number of groups (and the associated group structure) using a penalization scheme. We show that our estimator of the group structure is consistent under mild regularity conditions and performs well in small samples.

In many settings, classifying individuals into groups defined by UIH offers key economic insights. For example, our method can be used to identify colluding bidders in auctions, and firms' cost asymmetries or product quality differences. In addition, recovering the group structure often serves as the first step for estimating structural models with strategic interactions, such as dynamic industry models or auctions with asymmetric bidders.

[Footnote: Estimation of discrete unobserved individual heterogeneity does not affect subsequent estimation of other structural parameters in terms of pointwise asymptotics. However, establishing uniform asymptotics remains an open question. This problem is analogous to that of post-model-selection inference. For discussion of these issues, see Pötscher (1991), Leeb and Pötscher (2005), and Andrews and Guggenberger (2009) and the references therein. Uniform asymptotics in our setup is complex because of the need to consider every possible direction of local perturbation from the actual group structure in the data-generating process. A full theoretical investigation of the issue in our context merits a separate paper.]

Our approach offers a feasible way to identify and estimate games with UIH. Specifically, a traditional approach in a setting with UIH would be to treat UIH as "fixed effects" and estimate them jointly with other structural parameters. The traditional approach poses practical challenges in a setting with strategic interdependence among agents, especially when equilibrium outcomes admit no closed-form expressions.
First, it is generally not obvious what variation in the data may identify model components including the fixed effects in such settings. One of the contributions of our paper is to point out the variation which identifies the group structure associated with "fixed effects."

Further, we propose to separate the recovery of the group structure from the estimation of other structural parameters. Recovering the group membership of every agent facilitates, and is often needed for, identifying the remaining structural elements in models with strategic interactions. A classical example is English auctions among bidders with unobserved types (in the sense that bidders' private values are drawn independently from the distributions "labeled" by bidder type) and where the data only report the transaction price and the identity of the winner. Athey and Haile (2007) show that the distributions of private values cannot be identified in this model if the type of the auction winner is unknown. We provide details and additional examples in the supplemental note to this paper.

Finally, our approach offers a way to estimate games with UIH with low computational cost, compared with the alternative approach of estimating the fixed effects jointly with other structural parameters. For example, consider an environment with a large number of players where many independent games (each containing only a small subset of players) are observed in the data. Numerical optimization (such as simulated GMM or MLE) requires evaluating an objective function, which involves computing the equilibrium for each value of the fixed effects and other structural parameters. This can be computationally prohibitive in practice. In contrast, our method requires the game to be solved only for the estimated configuration of the agents' group memberships rather than for every possible configuration, as is required under joint estimation.
Our method is advantageous especially in settings where the number of agents is moderately large but each market (observation) in the data involves only a small subset of participants. For example, the total number of participants may be several hundred but each market may contain only several participants. In this case, despite the large number of markets observed, the researcher may have only a small number of markets which contain the same set of participants. We call this issue the problem of the sparsely common set of agents. In such settings, the researcher cannot build inference on the conditional moments given the full set of participating agents in a market, as is typically done in the structural empirical literature, because there are not many such observations. Hence the researcher needs to "aggregate" the markets or the agents in order to conduct reliable inference with sufficiently many observations. Pairwise restrictions are testable with accuracy even when the data exhibit sparsely common sets of agents, since the number of markets where a given pair of agents is present tends to be large even if the number of markets with the same set of participants is small. Thus, pairwise restrictions and the classification procedure offer a natural way to aggregate agents into groups which permits estimation of other primitives.

[Footnote: Krasnokutskaya, Song, and Tang (2020b) used this classification method as a first step in the structural analysis of an online service market for computer coding.]

We investigate the finite sample performance of our classification method in Monte Carlo simulation. The data-generating process (DGP) is a lowest-price procurement auction among asymmetric bidders whose independent private values are drawn from distributions with different means. We report the outcome of classification for DGPs with various numbers of bidders and group structures.
Our classification method works well. Its performance is better when the numbers of bidders and groups are smaller relative to the number of observed markets/games, and when the differences between groups are larger. We also find that the impact of classification errors on subsequent estimation of other structural parameters in the game is not substantial.

We analyze the California highway procurement market by applying our classification method to a model of asymmetric lowest-price procurement auctions. Existing empirical studies of auction markets typically emphasized asymmetry in bidders' private values associated with their observable characteristics. In comparison, we allow the bidders' private values to be drawn from heterogeneous distributions with different means. To account for other sources of cost heterogeneity, we control for bidders' distance to the project site. We also accommodate possible endogeneity in the competitive structure. We use the classification method to recover the unknown group structure (i.e., partition bidders into groups with different mean costs). Then, using this estimated group structure, we estimate group-specific cost distributions using GMM.

Our estimates indicate that the bidders in the data come from several unobserved groups with substantial differences in mean costs. We also find that ignoring such unobserved bidder heterogeneity would lead to biased estimates of how bidders' costs depend on various factors.

Related Literature.

One of the popular methods of accounting for UIH in structural modeling is to adopt finite mixtures. (See Hu (2008), Hu and Schennach (2008), Kasahara and Shimotsu (2009), Hu and Shum (2012), Hu and Shiu (2013), Hu, McAdams, and Shum (2013), and Henry, Kitamura, and Salanié (2014). See also Kasahara and Shimotsu (2014) and Kasahara and Shimotsu (2015) for estimating and testing for the number of mixture components in finite mixture models.) The finite mixture modeling assumes that the UIH is a random variable drawn from some unknown distribution. The goal is to identify this distribution and estimate it from data. It does not require each individual to appear in many independent games. In contrast, our approach aims to classify individual agents in the sample into disjoint groups defined by their realized unobserved types, using their participation outcomes in many games. Thus the two approaches are fundamentally different both in their aims and their data requirements. While general identification results have been developed in this literature on finite mixture models (see, e.g., Bonhomme, Jochmans, and Robin (2016)), implementation of the finite-mixture method is impractical in our set-up due to the issue of sparse commonality and technical issues associated with high dimensionality.

[Footnote on existing empirical studies of auction markets: For example, Athey, Levin, and Seira (2011), Roberts and Sweeting (2013) and Aradillas-Lopez, Gandhi, and Quint (2013) accounted for the bidder heterogeneity associated with size in the timber market ('mills' vs 'loggers'); Krasnokutskaya and Seim (2011), Jofre-Bonet and Pesendorfer (2003) and Gentry, Komarova, and Schiraldi (2016) incorporated bidder participation differences in the highway procurement market ('regular' vs 'fringe' bidders); Conley and Decarolis (2016), Asker (2010) and Pesendorfer (2000) allowed for bidder heterogeneity in collusive behaviors.]
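
To give a sense of the dimensionality concern, the following is a minimal arithmetic sketch, not part of the paper's procedure. It assumes, purely for illustration, that each of I agents in a market can take one of S types, so that a market-level mixture would involve S^I type configurations. The function name `num_mixture_components` and the values S = 3 and I = 5, 10 are hypothetical.

```python
def num_mixture_components(num_types: int, num_agents: int) -> int:
    """Count type configurations when each of `num_agents` agents
    independently takes one of `num_types` types (S ** I)."""
    return num_types ** num_agents


# Hypothetical values: three types, and five or ten agents per market.
for num_agents in (5, 10):
    print(num_agents, num_mixture_components(3, num_agents))
```

Even with only three types, moving from five to ten agents multiplies the number of configurations by 3^5, which suggests why a component-by-component likelihood quickly becomes hard to handle.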
The classification algorithm we propose is related to the clustering method in statistics. (See, e.g., Chapter 14 of Hastie, Tibshirani, and Friedman (2009).) The main difference is that the clustering methods aim to group individuals based on the similarity of their observed attributes, whereas in our setting, a researcher's objective is to group individuals according to testable implications of their unobserved attributes. To accomplish this, we exploit the relationship between endogenous outcomes and the unobserved types of individuals implied by an economic model. Our method also requires a data structure different from clustering methods. The literature on clustering methods mostly considers a set-up in which each cross-sectional unit is observed once, whereas our method uses many observations per individual in the sample.

Also related to our approach is the literature on panel models with group-level heterogeneity. For example, Sun (2005) introduced a linear panel model where parameters take values in a finite set according to a logistic probability, and offered methods of estimating the group structure. Song (2005) considered a panel model with finite-valued nonstochastic parameters and produced an algorithm to recover the unobserved group structure in large panel models. Lin and Ng (2012) provided a method of estimating a panel model using threshold variables when the group membership is unknown. Su, Shi, and Phillips (2016) developed a new Lasso method to recover the unknown group-specific parameters. Bonhomme and Manresa (2014) proposed a k-means clustering algorithm to recover the group structure in a linear panel model. These papers often focus on models which admit a reduced form for the dependent variable in which its functional relation to UIH is made explicit.
In contrast, our method targets a set-up where the dependence of the outcome variables on the UIH arises only implicitly through equilibrium constraints in games, and the group structure of UIH is identified only through pairwise inequality restrictions. Thus, the approaches developed in the panel literature are not applicable in the settings our proposal focuses on.

[Footnote on the finite-mixture approach: When each market is drawn from a finite mixture distribution, and there are $I$ agents with each having a type from $S$ values, the number of the mixture components becomes $S^I$, which can be very large in practice, even when $I$ is a moderate number such as five or ten. In addition, to construct a likelihood or moment condition, one would need to solve for a different market equilibrium for each component.]

Roadmap.

This paper is organized as follows. Section 2 introduces the basic environment and defines pairwise inequality restrictions. This section also provides several examples from various contexts to motivate our classification method. Section 3 establishes identification of the unobserved group structure using pairwise inequality restrictions. Section 4 proposes a consistent estimator of the group structure. Section 5 provides results from Monte Carlo simulation. Section 6 presents the empirical application. Section 7 concludes. Further examples and mathematical proofs are provided in the supplemental note of this paper. The note also contains further simulation results and details of our empirical application.

2. The Model and Examples

We consider a setting where the econometrician observes $L$ games, and in each game, a set of agents interact with each other. Each agent $i$ is associated with a non-stochastic type, $q_i$, which is not observed by the researcher. We assume that the type is finite-valued so that $q_i \in Q = \{\bar q_1, ..., \bar q_K\}$, with $\bar q_1 < \cdots < \bar q_K$. This induces an (ordered) partition $(N_1, N_2, ..., N_K)$ of the set $N$ of agents such that for each $k = 2, ..., K$, $N_k$ consists of agents with higher type than those in $N_{k-1}$. The group structure is characterized by a function $\tau : N \to \{1, ..., K\}$ that links the identity of a player to his unobserved type so that $q_i = \bar q_{\tau(i)}$ and for each $k = 1, ..., K$, $N_k = \{i \in N : \tau(i) = k\}$. The group structure defined by $\tau$ is represented as an ordered collection of sets $N_k$:

$$T = (N_1, N_2, ..., N_K). \tag{2.1}$$

The data available to the researcher contain for every observation $\ell = 1, ..., L$: a vector of observable characteristics for all the agents involved, $\{X_{j,\ell}\}_{j \in S_\ell}$, as well as at least one but possibly multiple vectors of outcome variables, $Y_\ell = \{Y_{j,\ell}\}_{j \in S_\ell}$, where $S_\ell$ is the set of players involved in each observation (e.g., a game or a market). Our main focus is on recovering the ordered partition $(N_1, N_2, ..., N_K)$ of agents from data.

The main insight of our paper begins with the observation that in many structural models, the ordering among $q_i$'s (or equivalently $\tau(i)$'s) coincides with the ordering between indexes that can be estimated consistently. Specifically, we use pairwise indexes $\delta^{+}_{ij}$ and $\delta^{0}_{ij}$ that satisfy the following relations:

$$\delta^{+}_{ij} > 0 \text{ if and only if } \tau(i) > \tau(j); \qquad \delta^{0}_{ij} = 0 \text{ if and only if } \tau(i) = \tau(j), \tag{2.2}$$

where the indexes $\delta^{+}_{ij}$ and $\delta^{0}_{ij}$ can be consistently estimated using the sample.
In many applications, we can take the indexes as (a variant of) the following form:

$$\delta^{+}_{ij} = \int \max\{E[Y_{i,\ell} \mid X_{i,\ell} = x] - E[Y_{j,\ell} \mid X_{j,\ell} = x], 0\}\, dF(x), \text{ and} \tag{2.3}$$

$$\delta^{0}_{ij} = \int \left| E[Y_{i,\ell} \mid X_{i,\ell} = x] - E[Y_{j,\ell} \mid X_{j,\ell} = x] \right| dF(x),$$

where $F$ is a known distribution or the distribution of an observable random vector.

For example, suppose that the outcome $Y_{i,\ell}$ admits the following reduced form:

$$Y_{i,\ell} = g(\tau(i), X_{i,\ell}, \eta_{i,\ell}),$$

where $g$ is a function that is strictly increasing in $\tau(i)$ and $\eta_{i,\ell}$ is an unobserved component that is independent of $X_{i,\ell}$. Then under regularity conditions, we obtain the pairwise relations (2.2) with (2.3). The main advantage of our approach is that we do not require an explicit characterization of the reduced form $g$. Due to this flexibility, our approach is most useful for analyzing UIH in structural models where the reduced form for outcomes arises only implicitly through equilibrium constraints. In such a setting, the sign of the index $\delta^{+}_{ij}$ represents the pairwise relation which says that between any two agents, one agent's type is higher than the other's if and only if his outcome tends to be higher than that of the other. As we demonstrate through examples below, many structural models imply these pairwise relations through indexes $\delta^{+}_{ij}$ and $\delta^{0}_{ij}$.

The main goal of this paper is to develop a statistical procedure to recover the group structure $\tau$ from data. Our method relies only on the pairwise inequality restrictions in (2.2). Thus so far as the group structure is concerned, the pairwise comparison indexes $\delta^{+}_{ij}$ and $\delta^{0}_{ij}$ play the role of a sufficient statistic; the recovery of the group structure does not rely on other details of the structural model.

We now provide examples of how pairwise inequality restrictions arise as equilibrium implications in a variety of commonly studied empirical contexts.
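
Before the examples, the indexes of the form (2.3) can be made concrete with a minimal plug-in sketch. It is an illustration under assumptions that are not taken from the paper: the covariate is discrete, conditional expectations are replaced by cell means, and F is taken to be uniform over the covariate values at which both agents are observed. The function names `conditional_means` and `pairwise_indexes` and the data are hypothetical. The paper's procedure relies on tests of the pairwise restrictions, which this sketch does not implement.

```python
from collections import defaultdict


def conditional_means(outcomes, covariates):
    """Map each covariate value x to the sample mean of outcomes observed at x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for y, x in zip(outcomes, covariates):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def pairwise_indexes(y_i, x_i, y_j, x_j):
    """Return plug-in values (delta_plus, delta_zero) for agents i and j.

    Assumption: F is uniform over covariate values observed for both agents.
    """
    mean_i = conditional_means(y_i, x_i)
    mean_j = conditional_means(y_j, x_j)
    common = sorted(set(mean_i) & set(mean_j))
    if not common:
        raise ValueError("no common covariate values to compare on")
    weight = 1.0 / len(common)
    diffs = [mean_i[x] - mean_j[x] for x in common]
    delta_plus = sum(max(d, 0.0) for d in diffs) * weight  # one-sided gap
    delta_zero = sum(abs(d) for d in diffs) * weight       # absolute gap
    return delta_plus, delta_zero


# Hypothetical data: agent i's outcomes exceed agent j's at every covariate value.
y_i, x_i = [3.0, 5.0, 4.0, 6.0], [0, 0, 1, 1]
y_j, x_j = [1.0, 3.0, 2.0, 4.0], [0, 0, 1, 1]
print(pairwise_indexes(y_i, x_i, y_j, x_j))
print(pairwise_indexes(y_j, x_j, y_i, x_i))
```

In this toy data the one-sided index is positive when agent i is listed first and zero when the roles are swapped, while the absolute index is the same in both directions, which mirrors the roles the two indexes play in (2.2).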
[Footnote: For the sake of concreteness, our exposition in the paper focuses on this form of indexes. Our procedures rely on the indexes only through the availability of consistent tests of pairwise inequality restrictions: $\delta^{+}_{ij} > 0$ and $\delta^{0}_{ij} = 0$. As long as such consistent tests are available, one can use our method for other forms of pairwise indexes.]

2.2.1. Unobserved Quality in Multi-attribute Auctions.

Consider a simplified version of multi-attribute auctions in ? that abstracts away from observed auction and seller heterogeneity. Let $N$ denote the total set of sellers and $S_\ell$ the set of sellers who submitted bids for a project $\ell$. Each seller has a discrete unobservable quality: $q_i \in \{\bar q_1, ..., \bar q_K\}$, with $\bar q_k < \bar q_{k'}$ whenever $k < k'$. Such a quality is known to buyers but not reported in data. The buyer for project $\ell$ selects a seller among those who submitted bids or chooses an outside option to maximize his payoff. The payoff to the buyer from engaging services of seller $i \in S_\ell$ is given by $U_{i,\ell} = \alpha_\ell q_i + \epsilon_{i,\ell} - B_{i,\ell}$ whereas the payoff from an outside option is $U_{0,\ell}$. Here $\alpha_\ell$ is a non-negative weight the buyer gives to the seller's quality relative to the seller's bid, whereas $\epsilon_{i,\ell}$ reflects a buyer-seller match-specific stochastic component. Let us suppress the auction subscript $\ell$ and define for any two sellers $i, j$,

$$\rho_{ij}(b) = P\{i \text{ wins} \mid B_{i,\ell} = b, i \in S_\ell, j \notin S_\ell\},$$

for all $b$ on the intersection of the supports of $B_i$ and $B_j$. Suppose $\alpha$, $S_\ell$, $\{C_{i,\ell}, \epsilon_{i,\ell}\}_{i \in S_\ell}$ are mutually independent. Proposition 1 of ? showed that

$$\mathrm{sign}(\rho_{ij}(b) - \rho_{ji}(b)) = \mathrm{sign}(q_i - q_j), \tag{2.4}$$

for any $b$ on the intersection of bid supports.

On the basis of this property the comparison indexes can be constructed as follows: $\delta^{+}_{ij} \equiv \int \max\{\rho_{ij}(b) - \rho_{ji}(b), 0\}\, db$ and $\delta^{0}_{ij} \equiv \int |\rho_{ij}(b) - \rho_{ji}(b)|\, db$. Note that the comparison indexes do not depend on other details of the structural model such as specific parametric assumptions for the distribution of buyers' tastes.

2.2.2. Firms' Cost Efficiency and Pricing Decisions.

Consider a population of $n$ firms or brands, each of which produces a single brand of product. The data consist of independent markets indexed by $\ell = 1, ..., L$. The marginal cost for firm $i$ in market $\ell$ is $c_{i,\ell} = \varphi(w_{i,\ell}, q_i, \eta_{i,\ell})$, where $w_{i,\ell}$ are observable cost shifters, $q_i$ is a brand-specific unobserved heterogeneity that is fixed across markets, and $\eta_{i,\ell}$'s are i.i.d. idiosyncratic noises independent from $w_{i,\ell}$ and $q_i$. We may interpret $q_i$ as a measure of firm $i$'s cost efficiency. Firms have complete information about each other's cost efficiencies. Firms in the population are partitioned into groups with different levels of $q_i$: $N = \cup_k N_k$ where $i \in N_k$ if $\tau(i) = k$.

Let $\sigma_{i,\ell}(x_\ell, p_\ell, \Omega_\ell)$ denote firm $i$'s market share, which is a function of product attributes ($x_\ell = \{x_{i,\ell}\}_{i \in S_\ell}$, where $S_\ell$ denotes the set of brands in market $\ell$) and prices ($p_\ell = \{p_{i,\ell}\}_{i \in S_\ell}$) conditional on the set of products available in market $\ell$ and other market factors denoted by $\Omega_\ell$. The profit for firm $i$ in market $\ell$ is $\pi_{i,\ell} = (p_{i,\ell} - c_{i,\ell})\, \sigma_{i,\ell}(x_\ell, p_\ell, \Omega_\ell)\, M_\ell$, where $M_\ell$ is a measure of potential consumers in market $\ell$. In any pricing equilibrium with an interior solution, the first-order condition $\sigma_{i,\ell} + (p_{i,\ell} - c_{i,\ell})\, \partial\sigma_{i,\ell}/\partial p_{i,\ell} = 0$ implies

$$c_{i,\ell} = p_{i,\ell} + \frac{\sigma_{i,\ell}}{\partial\sigma_{i,\ell}/\partial p_{i,\ell}}. \tag{2.5}$$

[Footnote on the mutual independence assumption in the multi-attribute auction example: This holds, for example, if sellers are not informed of the weights or outside option of the buyer, or the identities of other sellers in $S_\ell$.]

[Footnote on complete information about cost efficiencies: This assumption is plausible in certain industries where production efficiency is mostly determined by firms' technology or equipment that is publicly observable.]
Notice that if $\eta_{i,\ell}$ is independent from $q_i$ and $w_{i,\ell}$, and $\varphi(w_{i,\ell}, q_i, \eta_{i,\ell})$ is strictly monotone in $q_i$ then so is the right-hand side of (2.5), which can be constructed from estimates of the demand system. Hence, for any pair $i, j \in N$, $q_i \geq q_j$ if and only if $E[z_{i,\ell} \mid w_{i,\ell} = w] \geq E[z_{j,\ell} \mid w_{j,\ell} = w]$ for all $w$, where $z_{i,\ell}$ is defined as the quantity on the right-hand side of (2.5). The statement is also true when both inequalities are strict. Thus we can define a pairwise comparison index

$$\delta^{+}_{ij} \equiv \int \max\{E[z_{i,\ell} \mid w_{i,\ell} = w] - E[z_{j,\ell} \mid w_{j,\ell} = w], 0\}\, dF(w), \tag{2.6}$$

where $F$ is the distribution of $w_{i,\ell}$. In equilibrium $\delta^{+}_{ij} > 0$ if and only if $q_i > q_j$. Likewise, define $\delta^{0}_{ij}$ by replacing $\max\{\cdot, 0\}$ in the integral in $\delta^{+}_{ij}$ with the absolute value. These pairwise comparison indexes do not condition on specific identities of firms in a market.

2.2.3. Assortative Matching in Labor Market.

Sorting of heterogeneous employees across heterogeneous firms has been studied in Lentz and Mortensen (2010), Abowd, Kramarz, and Margolis (1999), and Lise, Meghir, and Robin (2011). In a typical setting, firms are heterogeneous in the productivity from a given worker ceteris paribus. Workers differ in their unobservable ability $q_i$. Under further restrictions (see Eeckhout and Kircher (2011) and Hagedorn, Law, and Manovskii (2016)), workers with higher ability would in equilibrium earn higher wages than co-workers at the same firm, holding other things equal.

This forms a basis for pairwise comparisons. Specifically, let $w_{i,f,t} = W(q_i, X_{i,t}, \Omega_{f,t})$ denote the wage worker $i$ earns at time $t$ while employed by firm $f$, where $W$ is a non-stochastic function. Here $\Omega_{f,t}$ captures all the relevant firm-specific unobservable factors while $X_{i,t}$ reflects worker $i$'s characteristics other than $q_i$. We assume that $(X_{i,t}, \Omega_{f,t})$ is identically distributed across $f$'s and $t$'s. Using $N_{f,t}$ to denote the set of workers employed by firm $f$ at time $t$, we define the comparison index as

$$\delta^{+}_{ij} = \int \max\{E[w_{i,f,t} \mid X_{i,t} = x] - E[w_{j,f,t} \mid X_{j,t} = x], 0\}\, dF(x), \quad i, j \in N_{f,t},$$

where $F$ is the distribution of $X_{i,t}$. Then $\delta^{+}_{ij} > 0$ if and only if $q_i > q_j$, under regularity conditions such as strict monotonicity of $W$ in $q_i$. As before, define $\delta^{0}_{ij}$ by replacing the max operator in $\delta^{+}_{ij}$ with the absolute value. In this setting comparison of workers is complicated by the (unobserved) firm heterogeneity and sorting of workers across firms. Pairwise comparisons allow researchers to circumvent these issues by focusing on workers' wages earned while they are employed by the same firm.

[Figure 1. Three panels with axes Agents and Markets: "Common Sets of Agents across the Markets", "Sparsely Common Sets of Agents across the Markets", and "Pairwise Comparisons across the Markets". Caption: The first panel shows an example of a data set where the set of participants is the same across all markets. The second panel shows another example of data where only a small fraction of markets share the same set of participants. As illustrated here, there are only three markets where the set of participants is precisely one particular set of three agents. The third panel shows that there are many more markets in the second example where both agents of a given pair (represented by white ellipses) participate. Thus a researcher can estimate population quantities that condition on the joint participation of these two agents with better accuracy than quantities that condition on the whole set of participants.]

Our pairwise comparison method is most useful in a setting where players appear in a market only sparsely. To express this data feature, define for any $S \subset N$,

$$\mathcal{L}(S) = \{1 \leq \ell \leq L : S_\ell = S\}.$$

Thus $\mathcal{L}(S)$ represents the set of markets where the set of participants $S_\ell$ is precisely $S$. In this paper, we refer to the setting as that of a sparsely common set of agents if the proportion $\max_{S \subset N} |\mathcal{L}(S)| / L$ is negligible in finite samples. In other words, only a small fraction of the markets in the sample share exactly the same set of participants.

The setting with a sparsely common set of agents is illustrated in Figure 1, where each column symbolizes a "market" and each row an individual agent. The ellipses in each column represent agents participating in a market. The first panel shows a standard set-up where all the agents appear in all the markets.
The second panel shows an example of a data set where only very few markets share exactly the same three-agent set of participants. Therefore, the conditional choice probability given the same set of agents simultaneously participating in the market cannot be accurately estimated. However, if we focus only on a subset consisting of two agents, there are many more markets in which the two agents participate. If each pair of agents appears in many markets simultaneously, we may aggregate over these markets, and infer accurately the ordering between the two agents using an inequality test. Given the $p$-values from inequality tests across pairs of agents, it remains to recover the whole group structure of the agents from these pairwise $p$-values. We develop an algorithm that recovers the group structure from the pairwise $p$-values consistently.
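
The contrast between common participant sets and common pairs can be checked directly on participation data. The sketch below is a minimal illustration, assuming each market is recorded as a set of agent labels. The function names `max_common_set_share` and `pair_counts` and the toy list `markets` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def max_common_set_share(markets):
    """Largest share of markets that have exactly the same participant set,
    i.e. the maximum over S of |L(S)| / L."""
    counts = Counter(frozenset(m) for m in markets)
    return max(counts.values()) / len(markets)


def pair_counts(markets):
    """For each pair of agents, count the markets in which both participate."""
    counts = Counter()
    for m in markets:
        for pair in combinations(sorted(set(m)), 2):
            counts[pair] += 1
    return counts


# Hypothetical participation data: six markets with three agents each.
markets = [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 2, 6}, {1, 3, 4}, {1, 2, 3}]
print(max_common_set_share(markets))  # share of the most frequent participant set
print(pair_counts(markets)[(1, 2)])   # number of markets containing agents 1 and 2
```

In this toy example the most frequent participant set appears in only two of six markets, while agents 1 and 2 appear together in five, which is the kind of gap that makes pairwise quantities easier to estimate than quantities conditioned on the full participant set.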

3. Identiﬁcation of the Ordered Group Structure

Let us discuss conditions for the identification of the group structure. First, let $P$ be the distribution of observed random variables that belong to a market. (We assume that $P$ is the same across the markets.) We say that agents $(i, j)$ are comparable if there exist pairwise indexes $\delta^{+}_{ij}$ and $\delta^{0}_{ij}$ such that the indexes are identified by $P$ and (2.2) holds. In this identification analysis, we assume that a researcher knows whether or not each pair of agents is comparable through some pairwise comparison index. In practice, comparability can be determined by checking whether the data contain sufficiently many markets so that the pairwise indexes may be accurately estimated.

Let $E$ be the collection of pairs $(i, j)$ that are comparable. We refer to comparable agents as adjacent, so that the set $E$ forms the set of edges in a graph on the set of agents $N$. We call this graph (denoted by $G = (N, E)$) the comparability graph. We say a group structure $\tau$ is identified if it is uniquely determined once the comparability graph $G$ and the vectors of pairwise indexes $(\delta^{+}_{ij}, \delta^{0}_{ij})_{ij \in E}$ are known. Thus when $\tau$ is identified, it is only through the identification of the comparability graph $G$ and the pairwise indexes $(\delta^{+}_{ij}, \delta^{0}_{ij})_{ij \in E}$, not through other specification details of the structural model.

Let us explore the identification of $\tau$ given the comparability graph $G$ and the vector of pairwise indexes. It is easy to see that if $E$ contains only a small subset of possible pairs, we may not be able to identify the group structure. The identification of the ordered group structure $\tau$ is not guaranteed even when many pairs of agents are comparable. For example, even if $G$ is a connected graph (where any two agents are connected at least indirectly), the ordered group structure $\tau$ may not be identified. This is illustrated in a counterexample in Figure 2. Certainly, when every pair of agents is adjacent in the graph $G$, i.e., $G$ is a complete graph, the ordered group structure $\tau$ is identified.

[Footnote: In a graph (or network) $G = (N, E)$ the set $N$ represents the set of vertices (or nodes) and $E$ consists of some pairs $ij$, with $i, j \in N$, where each pair $ij$ is called an edge (or link). Thus, if $(i, j) \in E$, $i$ and $j$ are adjacent. A path is a set of vertices $\{i_1, i_2, ..., i_M\}$ such that $i_1 i_2, i_2 i_3, ..., i_{M-1} i_M \in E$. Two vertices $i$ and $j$ are called connected if there is a path having $i$ and $j$ as end vertices. A graph is called connected if all pairs of vertices are connected in the graph.]

[Figure 2. Two panels, each placing Nodes 1 to 5 on a Type axis (Low, Med, High). Caption: This figure illustrates an example where the group structure is not identified even when all nodes are connected. The comparability graph is $G = (N, E)$, where $N = \{1, 2, ..., 5\}$ and $E$ consists of four edges. Pairwise comparison is feasible only between nodes linked by solid black lines (a.k.a. links). The two different group structures in this figure are compatible with the same pairwise ordering. Therefore we cannot identify the group structure from pairwise orderings, given this particular comparability graph.]

Below we establish a necessary and sufficient condition for the group structure to be identified from a potentially incomplete graph $G$ and the pairwise comparison indexes. Let us introduce some definitions.

Definition 3.1.
(i) A graph $G_\tau$ is the $\tau$-collapsed graph of $G$ if (a) any two adjacent vertices $i$ and $j$ in $G$ with $\tau(i) = \tau(j)$ collapse to a single vertex (denoted by $(ij)$) in $G_\tau$, (b) any edge in $G$ joining a vertex $k$ to either $i$ or $j$ joins vertex $k$ to $(ij)$ in $G_\tau$ when $\tau(i) = \tau(j)$, and (c) all the remaining vertices and edges in $G_\tau$ consist of the remaining vertices and edges in $G$.
(ii) A path in $G_\tau$ is monotone if $\tau(i)$ is monotone as $i$ runs along the path.
(iii) A vertex $i$ is said to be identified if its type $\tau(i)$ is identified.

The $\tau$-collapsed graph of $G$ is constructed by reducing any comparable pair of agents in $G$ who have the same type to a single "agent", and retaining edges as in the original graph $G$. Certainly, a $\tau$-collapsed graph $G_\tau$ is uniquely determined by the $\delta^0_{ij}$'s and $G$. Any pair of adjacent agents in the $\tau$-collapsed graph must have different types, and hence the types of agents on a monotone path are strictly monotone. This means that every vertex on a monotone path in $G_\tau$ of length $K_0 - 1$ is identified. By similar logic, every vertex on a monotone path with end vertices $i_H$ and $i_L$ is identified if the path has length $\tau(i_H) - \tau(i_L)$ and the end vertices $i_H$ and $i_L$ are identified. Using these two facts, we can recover the set of vertices that are identified as follows.

Footnote: If all pairs of agents are comparable, we can split the set of agents into one group with the lowest type and the other group with the remaining agents. Then we split these remaining agents into one group with the lowest type within these agents and the remaining agents. By continuing this process, we can identify the whole group structure.

Figure 3 (three panels, each placing nodes on a vertical Type axis with levels Low, Med, High). This figure shows an example where the condition $N = N^*$ in Theorem 3.1 is violated. The first panel depicts the comparability graph as one connecting 6 vertices (or nodes). The second panel shows the $\tau$-collapsed graph where the two comparable nodes 2 and 3 that have the same type are collapsed into one node named 23. The last panel shows that Nodes 23, 4, and 5 (drawn as solid black nodes) are identified, because they are on a monotone path of length $K_0 - 1$. In this example, the group membership of Nodes 1 and 6 is not identified and thus the comparability graph does not lead to the identification of the group structure.

First, let $N^{[1]} \subset N$ denote the set of vertices in the $\tau$-collapsed graph $G_\tau$ each of which is on a monotone path in $G_\tau$ of length $K_0 - 1$. For $j \ge 1$ generally, let $N^{[j+1]}$ be the set of vertices each of which belongs to a monotone path, say, $P$, such that its end vertices $i_H$ and $i_L$ are from $N^{[j]}$ and $\tau(i_H) - \tau(i_L)$ is equal to the length of the monotone path $P$. Then define

$$N^* \equiv \bigcup_{j \ge 1} N^{[j]}.$$

Given $G_\tau$, $N^*$ is uniquely determined as a subset of $N$. It is not hard to see that if $N = N^*$ and $K_0$ is identified, the type structure $\tau$ is identified. The following theorem shows that this condition is in fact necessary for the identification of $\tau$ as well. The proof of the theorem is found in the appendix.

Theorem 3.1.

Let $G$ be a given comparability graph and $G_\tau$ be its $\tau$-collapsed graph. The group structure $\tau$ is identified if and only if there exists a monotone path in $G_\tau$ whose length is equal to $K_0 - 1$ and $N = N^*$.

Footnote: The length of a path is defined as the number of the edges in the path.

No monotone path in $G_\tau$ can have length greater than $K_0 - 1$. Note that there exists a monotone path in $G_\tau$ whose length is equal to $K_0 - 1$ if and only if $K_0$ is identified. The conditions in the theorem are obviously satisfied if $G$ contains a monotone path that covers all the vertices. The latter condition is trivially satisfied when $G$ is a complete graph. Figure 3 gives a counterexample where the condition that there exists a monotone path in $G_\tau$ whose length is equal to $K_0 - 1$ is satisfied, but $N \ne N^*$, so that the comparability graph does not lead to the identification of the group structure.
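The construction of $N^*$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the edge-list input and the four-vertex example are hypothetical and not taken from the paper, and the true types $\tau$ are supplied as if known. The sketch relies on one observation from the text: a monotone path of length $\tau(i_H) - \tau(i_L)$ between $i_L$ and $i_H$ must raise the type by exactly one along each edge, so only edges joining consecutive types matter.

```python
from collections import defaultdict

def reachable(starts, succ):
    """Return all vertices reachable from `starts` by following `succ` edges."""
    seen, stack = set(starts), list(starts)
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def identified_vertices(edges, tau, k0):
    """Compute N* for a tau-collapsed graph given as an undirected edge list.

    `tau` maps each vertex to its type in {1, ..., k0} (hypothetical input).
    """
    up, down = defaultdict(list), defaultdict(list)
    for u, v in edges:
        if tau[u] > tau[v]:
            u, v = v, u
        if tau[v] - tau[u] == 1:   # only unit steps can lie on a qualifying path
            up[u].append(v)
            down[v].append(u)
    lowest = [i for i in tau if tau[i] == 1]
    highest = [i for i in tau if tau[i] == k0]
    # N^[1]: vertices on a monotone path of length k0 - 1.
    current = reachable(lowest, up) & reachable(highest, down)
    # N^[j+1]: vertices on a unit-step path between two members of N^[j].
    while True:
        nxt = reachable(current, up) & reachable(current, down)
        if nxt == current:
            return current
        current = nxt

# Hypothetical example: vertex "d" is only known to lie below "c".
tau = {"a": 1, "b": 2, "c": 3, "d": 2}
edges = [("a", "b"), ("b", "c"), ("d", "c")]
n_star = identified_vertices(edges, tau, k0=3)
```

In this example `n_star` omits `"d"`, so $N \ne N^*$ and the condition of Theorem 3.1 fails; adding the edge `("a", "d")` places `"d"` on a monotone path of length $K_0 - 1$ and restores $N = N^*$.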

4. Consistent Estimation of the Ordered Group Structure

In this section, we develop a method to estimate the group structure consistently for the case where the comparability graph is complete, so that we take $E$ to be all $ij$ with $i, j \in N$, $i \ne j$. We first formulate three pairwise hypothesis testing problems for each comparable pair $ij \in E$:

$$H^+_{0,ij}: \delta^+_{ij} \le 0 \quad \text{against} \quad H^+_{1,ij}: \delta^+_{ij} > 0,$$
$$H^0_{0,ij}: \delta^0_{ij} = 0 \quad \text{against} \quad H^0_{1,ij}: \delta^0_{ij} \ne 0, \quad \text{and} \tag{4.1}$$
$$H^-_{0,ij}: \delta^+_{ji} \le 0 \quad \text{against} \quad H^-_{1,ij}: \delta^+_{ji} > 0.$$

In most examples, we have various tests available. Instead of committing ourselves to a particular method of hypothesis testing, let us assume generally that we are given $p$-values $\hat p^+_{ij}$, $\hat p^0_{ij}$ and $\hat p^-_{ij}$ from the testing of $H^+_{0,ij}$, $H^0_{0,ij}$ and $H^-_{0,ij}$ against $H^+_{1,ij}$, $H^0_{1,ij}$ and $H^-_{1,ij}$ respectively. Let $L$ be the size of the sample (i.e., the number of the markets or games) that is used to construct these $p$-values. We state conditions for the $p$-values later and give details of their construction using bootstrap in Section 4.3.

4.2.1. The Selection-Split Algorithm.

Let us introduce a method of obtaining an ordered partition $(\hat N'_1, \hat N'_2)$ of a given set $N'$ using $p$-values $\hat p^s_{ij}$, $s \in \{+, 0, -\}$.

Definition 4.1. For a subset $N' \subset N$, we say that the ordered partition of $N'$ into $(\hat N'_1, \hat N'_2)$ is obtained by the Split Algorithm if it is obtained as follows. For each $i \in N'$, we let

$$\hat N'_1(i) = \{ j \in N' \setminus \{i\} : \log \hat p^+_{ij} \le \log \hat p^-_{ij} - r_L \} \quad \text{and} \quad \hat N'_2(i) = \{ j \in N' \setminus \{i\} : \log \hat p^-_{ij} \le \log \hat p^+_{ij} - r_L \},$$

where $r_L \to \infty$ satisfies Assumption 4.1 below. Set $i^* = \arg\min_{i \in N'} \min\{ s_1(i), s_2(i) \}$, where

$$s_1(i) = \frac{1}{|\hat N'_1(i)|} \sum_{j \in \hat N'_1(i)} \log \hat p^+_{ij}, \quad \text{and} \quad s_2(i) = \frac{1}{|\hat N'_2(i)|} \sum_{j \in \hat N'_2(i)} \log \hat p^-_{ij}.$$

(We set $s_1(i) = 0$ if $\hat N'_1(i)$ is empty, and similarly with $s_2(i)$.) Then we take

$$(\hat N'_1, \hat N'_2) = (\hat N'_1(i^*), \, N' \setminus \hat N'_1(i^*)), \quad \text{if } s_1(i^*) \le s_2(i^*);$$
$$(\hat N'_1, \hat N'_2) = (N' \setminus \hat N'_2(i^*), \, \hat N'_2(i^*)), \quad \text{if } s_1(i^*) > s_2(i^*).$$

The set $\hat N'_1(i)$ estimates the set of agents of lower type than $i$, and the set $\hat N'_2(i)$ estimates the set of agents of higher type than $i$. Let

$$N'_1(i) = \{ j \in N' \setminus \{i\} : \tau(i) > \tau(j) \}, \quad \text{and} \quad N'_2(i) = \{ j \in N' \setminus \{i\} : \tau(i) < \tau(j) \}.$$

A necessary condition for $\hat N'_1(i)$ to coincide with $N'_1(i)$ is that $i$ has higher type than those in $\hat N'_1(i)$. The more negative the quantity $s_1(i)$ is, the more likely it is that this necessary condition is met. A similar observation applies to $s_2(i)$ as well. Thus we choose a partition based on $i^*$ that minimizes $\min\{ s_1(i), s_2(i) \}$ over $i$.

Footnote: In many cases, it suffices to consider a sequence such that $r_L / \log L \to 0$.

Suppose that we are given an ordered partition $(\hat N'_1, \ldots, \hat N'_s)$ of $N$.
The Selection-Split Algorithm that we propose produces an ordered partition $(\hat N''_1, \ldots, \hat N''_{s+1})$ of $N$ from $(\hat N'_1, \ldots, \hat N'_s)$ using two steps, the Selection Step and the Split Step, as follows.

1. The Selection Step: Let $\hat p_k = \min_{i, j \in \hat N'_k : i \ne j} \hat p^0_{ij}$, $k = 1, \ldots, s$, and select $\hat N'_{k^*}$ such that $\hat p_{k^*} = \min_{1 \le k \le s} \hat p_k$.

2. The Split Step: We split $\hat N'_{k^*}$ into $(\hat N'_{k^*,1}, \hat N'_{k^*,2})$ using the Split Algorithm, and relabel the partition: $(\hat N'_1, \ldots, \hat N'_{k^*-1}, \hat N'_{k^*,1}, \hat N'_{k^*,2}, \hat N'_{k^*+1}, \ldots, \hat N'_s) = (\hat N''_1, \ldots, \hat N''_{s+1})$.

The Selection Step chooses a group $\hat N'_{k^*}$ that is most likely to contain agents with heterogeneous types and the Split Step splits this group into two sets using the Split Algorithm. The Selection-Split Algorithm depends on the data only through the $p$-values $\hat p^s_{ij}$, $s \in \{+, 0, -\}$.

4.2.2. The Classification Method.

For a given positive integer $K$, partition $N$ into $K$ groups as follows. First, split $N$ into $(\hat N^{[2]}_1, \hat N^{[2]}_2)$ by applying the Split Algorithm to $N$, and apply the Selection-Split Algorithm sequentially to obtain $(\hat N^{[3]}_1, \hat N^{[3]}_2, \hat N^{[3]}_3)$, $(\hat N^{[4]}_1, \ldots, \hat N^{[4]}_4)$, and so on, until we have $(\hat N^{[K]}_1, \ldots, \hat N^{[K]}_K)$ for a given number $K$. For each $K$, we define

$$\hat V(K) = \frac{1}{K} \sum_{k=1}^{K} \left| \min_{i, j \in \hat N^{[K]}_k} \log \hat p^0_{ij} \right|,$$

and then select

$$\hat K = \arg\min_{1 \le K \le n} \hat V(K) + K g(L),$$

where $g(L)$ is slowly increasing in $L$. We take $\hat T_{\hat K} = (\hat N^{[\hat K]}_1, \ldots, \hat N^{[\hat K]}_{\hat K})$ to be our estimated group structure. The component $\hat V(K)$ measures the goodness-of-fit of the classification, and the second component $K g(L)$ represents a penalty term that prevents overfitting. We show that $\hat T_{\hat K}$ is consistent for the underlying group structure $T$ defined in (2.1) under regularity conditions.

Footnote: In practice, we propose taking $r_L$ to be a fractional power of $\log L$, which satisfies Assumption 4.1 below under lower level regularity conditions. See Section C.3 in the supplemental note for details.

Footnote: The choice of $g(L) = \log \log L$ appears to work very well in our numerous Monte Carlo simulation experiments.

4.3. $p$-Values Using Bootstrap

In most applications, we can use bootstrap to construct $p$-values for testing the inequality restrictions of (4.1). For the sake of concreteness, we explain the bootstrap procedure along the lines of the proposal made by Lee, Song, and Whang (2018). Suppose that we are given observations $\{Z_\ell\}_{\ell=1}^{L}$, where $Z_\ell = (Z_{i,\ell})_{i=1}^{n}$ denotes the observations pertaining to market $\ell$ and $Z_{i,\ell}$ denotes the vector of observations specific to agent $i$. Suppose that for each pair of agents $i$ and $j$, there exists a nonparametric function, say, $r_{ij}(x)$ such that $\tau(i) \ge \tau(j)$ if and only if $r_{ij}(x) \ge 0$ for all $x \in \mathcal{X}$, where $\mathcal{X}$ is the common domain of the functions $r_{ij}(\cdot)$, $i, j \in N$.

Footnote: See Bugni (2010), Andrews and Shi (2013), Chernozhukov, Lee, and Rosen (2013), Lee, Song, and Whang (2013), and Lee, Song, and Whang (2018), among many others, and references therein.

To construct a test statistic, we first estimate $r_{ij}(x)$ using the sample $\{Z_\ell\}_{\ell=1}^{L}$ to obtain $\hat r_{ij}(x)$ (e.g., using a kernel regression estimator). Then we construct the following indexes:

$$\hat\delta^+_{ij} = \int \max\{ \hat r_{ij}(x), 0 \} \, dx \quad \text{and} \quad \hat\delta^0_{ij} = \int | \hat r_{ij}(x) | \, dx. \tag{4.2}$$

For $p$-values, we re-sample $\{Z^*_\ell\}_{\ell=1}^{L}$ (with replacement) from the empirical distribution of $\{Z_\ell\}_{\ell=1}^{L}$ and construct a nonparametric estimator $\hat r^*_{ij}(x)$ for each pair $(i, j)$ in the same way as we did using the original sample. Using these bootstrap estimators, we construct the following bootstrap test statistics:

$$\hat\delta^{+*}_{ij} = \int \max\{ \hat r^*_{ij}(x) - \hat r_{ij}(x), 0 \} \, dx \quad \text{and} \quad \hat\delta^{0*}_{ij} = \int | \hat r^*_{ij}(x) - \hat r_{ij}(x) | \, dx.$$

Note that the bootstrap test statistic involves recentering to impose the null hypothesis. Now, the $p$-values $\hat p^+_{ij}$, $\hat p^-_{ij}$, and $\hat p^0_{ij}$ can be constructed from the bootstrap distributions of $\hat\delta^{+*}_{ij}$, $\hat\delta^{+*}_{ji}$, and $\hat\delta^{0*}_{ij}$ respectively, using $\hat\delta^+_{ij}$, $\hat\delta^+_{ji}$ and $\hat\delta^0_{ij}$ as test statistics.

We prove consistency of the estimated classification $\hat T_{\hat K}$ as $L \to \infty$ while $n$ is fixed. (Consistency results and the proof for the case of both $n$ and $L$ increasing to infinity are found in the supplemental note.) Let $\mathcal{P}$ be the collection of the distributions $P$ of the whole vector of the observations in each market $\ell$.
For each $\varepsilon > 0$, $ij \in E$ and $s \in \{+, 0, -\}$, we define

$$\mathcal{P}^s_{0,ij} = \{ P \in \mathcal{P} : \delta^s_{ij}(P) \le 0 \}, \quad \text{and} \quad \mathcal{P}^s_{\varepsilon,ij} = \{ P \in \mathcal{P} : \delta^s_{ij}(P) \ge \varepsilon \},$$

where we write the pairwise indexes $\delta^s_{ij}$ as $\delta^s_{ij}(P)$ to reflect that the pairwise indexes depend on $P$. Thus $\mathcal{P}^s_{0,ij}$ is the collection of probabilities under the pairwise null hypothesis $H^s_{0,ij}$, and $\mathcal{P}^s_{\varepsilon,ij}$ is the collection of probabilities under the pairwise alternative hypothesis $H^s_{1,ij}$ such that $\delta^s_{ij}(P)$ is away from zero by at least $\varepsilon$. Then we define

$$\mathcal{P}_{0,\varepsilon} = \bigcup_{s \in \{+, 0, -\}} \bigcup_{ij \in E} \left( \mathcal{P}^s_{0,ij} \cup \mathcal{P}^s_{\varepsilon,ij} \right).$$

We assume that the $p$-value takes the following form:

$$\hat p^s_{ij} = 1 - \tilde F^s_{ij}(\tilde T^s_{ij}), \quad s \in \{+, 0, -\},$$

where $\tilde F^s_{ij}$ is a CDF and $\tilde T^s_{ij}$ is a random variable, both of which depend on the data. Typically, $\tilde T^s_{ij}$ represents an appropriately normalized test statistic and $\tilde F^s_{ij}$ represents the CDF of the bootstrap distribution of the test statistic after recentering. We make the following assumption.

Assumption 4.1. There exist sequences $\lambda_L \to \infty$ and $\rho_L \to 0$ and constants $c^s_{ij}$, $s \in \{+, 0, -\}$, such that along each sequence of probabilities $P_L \in \mathcal{P}_{0,\varepsilon}$ and for each pair $i, j \in N$, the following holds for all $s \in \{+, 0, -\}$, as $L \to \infty$.
(i) If $\tau(i) = \tau(j)$, $\tilde T^s_{ij} \to_d W^s_{ij}$, for some random variable $W^s_{ij}$.
(ii) If $\tau(i) > \tau(j)$, $\tilde T^+_{ij} / \lambda_L \to_P c^+_{ij}$ and $\tilde T^-_{ij} = O_P(1)$.
(iii) If $\tau(i) < \tau(j)$, $\tilde T^-_{ij} / \lambda_L \to_P c^-_{ij}$ and $\tilde T^+_{ij} = O_P(1)$.
(iv) If $\tau(i) \ne \tau(j)$, $\tilde T^0_{ij} / \lambda_L \to_P c^0_{ij}$.
(v) $\sup_{t \in \mathbb{R}} | \tilde F^s_{ij}(t) - F^s_{ij,\infty}(t) | = O_P(\rho_L)$, where $F^s_{ij,\infty}$ is the CDF of $W^s_{ij}$.
(vi) $r_L^{-1} \log\left( 1 - F^s_{ij,\infty}(c_1 \lambda_L) + c_2 \rho_L \right) \to -\infty$, for all constants $c_1, c_2 > 0$.

Here we are assuming that for each pair of agents the data have enough observations on markets in which the pair of agents participate so that we can perform consistent tests based on pairwise comparison. Assumption 4.1 is typically satisfied for various choices of test statistics that arise in the literature on moment inequality testing. Assumption 4.1 is a high level assumption designed to accommodate various testing procedures. We provide lower level conditions in the case of nonparametric tests based on Lee, Song, and Whang (2018) in the next subsection.

Theorem 4.1.

Suppose that Assumption 4.1 holds, and that $g(L) \to \infty$ and $g(L) / r_L \to 0$ as $L \to \infty$. Then, for any $\varepsilon > 0$, along a sequence of probabilities $P_L$ from $\mathcal{P}_{0,\varepsilon}$,

$$P_L\{ \hat K = K_0 \} \to 1, \quad \text{as } L \to \infty,$$

and the estimated group structure $\hat T_{\hat K}$ satisfies that as $L \to \infty$,

$$P_L\{ \hat T_{\hat K} = T \} \to 1.$$

The proof of Theorem 4.1 is in the supplemental note. It proceeds in two steps. First, we show that $\hat T_{K_0}$ is consistent for $T$. Second, we show that $\hat K$ is consistent for $K_0$. To see the intuition for this second step, note that when $K \ge K_0$, the component $\hat V(K)$ is $O_P(1)$, and when $K < K_0$, the component $\hat V(K)$ diverges at a rate faster than $g(L)$. From this, we obtain that $\hat K$ is consistent for $K_0$.

Lower level conditions for Assumption 4.1 can be found in the literature on testing for moment inequality restrictions. For the sake of concreteness, we focus on the situation where the nonparametric function $r_{ij}(x)$ introduced in Section 4.3 arises from the difference between two nonparametric regression functions and the testing procedure follows the method proposed in Lee, Song, and Whang (2018). Suppose that we have

$$g_i(x) > g_j(x), \ \forall x \in \mathcal{X} \quad \text{if and only if} \quad \tau(i) > \tau(j);$$
$$g_i(x) = g_j(x), \ \forall x \in \mathcal{X} \quad \text{if and only if} \quad \tau(i) = \tau(j); \tag{4.3}$$
$$g_i(x) < g_j(x), \ \forall x \in \mathcal{X} \quad \text{if and only if} \quad \tau(i) < \tau(j),$$

where $g_i(x) = E[Y_{i,\ell} \mid X_{i,\ell} = x]$, $i \in N$, and $\ell = 1, \ldots, L$ is the sample unit index. We take $r_{ij}(x) = g_i(x) - g_j(x)$. Let us define a kernel estimator of $g_i(x)$ as follows:

$$\hat g_i(x) = \frac{\sum_{\ell=1}^{L} Y_{i,\ell} K_h(X_{i,\ell} - x)}{\sum_{\ell=1}^{L} K_h(X_{i,\ell} - x)},$$

where $K_h(x) = K(x/h)/h$, $K(\cdot)$ is a multivariate kernel and $h$ is a bandwidth. We let $\hat r_{ij}(x) = \hat g_i(x) - \hat g_j(x)$. Then the test statistics we use are defined as

$$\hat\delta^+_{ij} = \int_{\mathcal{X}} \max\{ \hat r_{ij}(x), 0 \} \, dx, \quad \hat\delta^-_{ij} = \int_{\mathcal{X}} \max\{ \hat r_{ji}(x), 0 \} \, dx, \quad \text{and} \quad \hat\delta^0_{ij} = \int_{\mathcal{X}} | \hat r_{ij}(x) | \, dx. \tag{4.4}$$

As for the bootstrap test statistics, we first obtain bootstrap samples $\{ (Y^*_{i,\ell}, X^*_{i,\ell})_{i \in N} \}_{\ell=1}^{L}$ by resampling the vector $(Y^*_{i,\ell}, X^*_{i,\ell})_{i \in N}$ from the empirical distribution of $\{ (Y_{i,\ell}, X_{i,\ell})_{i \in N} \}_{\ell=1}^{L}$ with replacement. Using the bootstrap sample, we construct $\hat r^*_{ij}(x) = \hat g^*_i(x) - \hat g^*_j(x)$, where

$$\hat g^*_i(x) = \frac{\sum_{\ell=1}^{L} Y^*_{i,\ell} K_h(X^*_{i,\ell} - x)}{\sum_{\ell=1}^{L} K_h(X^*_{i,\ell} - x)}.$$

Then the bootstrap test statistics we use are defined as

$$\hat\delta^{+*}_{ij} = \int_{\mathcal{X}} \max\{ \hat r^*_{ij}(x) - \hat r_{ij}(x), 0 \} \, dx, \quad \hat\delta^{-*}_{ij} = \int_{\mathcal{X}} \max\{ \hat r^*_{ji}(x) - \hat r_{ji}(x), 0 \} \, dx, \quad \text{and} \quad \hat\delta^{0*}_{ij} = \int_{\mathcal{X}} | \hat r^*_{ij}(x) - \hat r_{ij}(x) | \, dx. \tag{4.5}$$

Footnote: One could also use alternative bootstrap statistics using estimated contact sets as in Lee, Song, and Whang (2018) to enhance the power. For simplicity of exposition, here we present the case where we use the least favorable configurations.

Let the CDF of the bootstrap distribution of $\hat\delta^{s*}_{ij}$ be denoted by $F^s_{ij}$. Then we set the pairwise $p$-value to be $\hat p^s_{ij} = 1 - F^s_{ij}(\hat\delta^s_{ij})$. In this situation, we provide a low level condition for Assumption 4.1.

Let $a^s_{ij,L}$ and $\sigma^s_{ij,L}$ be sequences of constants such that

$$a^s_{ij,L} = O(1), \quad \text{and} \quad \sigma^s_{ij,L} \to \sigma^s_{ij} > 0, \tag{4.6}$$

as $n, L \to \infty$. We take

$$\tilde T^s_{ij} = \left( \sqrt{L} \, \hat\delta^s_{ij} - h^{-d/2} a^s_{ij,L} \right) / \sigma^s_{ij,L}, \quad \text{and} \quad \tilde T^{s*}_{ij} = \left( \sqrt{L} \, \hat\delta^{s*}_{ij} - h^{-d/2} a^s_{ij,L} \right) / \sigma^s_{ij,L}. \tag{4.7}$$

(The researcher does not need to know, estimate, or use the constants $a^s_{ij,L}$ and $\sigma^s_{ij,L}$ for the construction of the pairwise $p$-values and for the implementation of the classification algorithm of this paper.) Then we can rewrite

$$\hat p^s_{ij} = 1 - \tilde F^s_{ij}(\tilde T^s_{ij}),$$

where $\tilde F^s_{ij}$ is the CDF of the bootstrap distribution of $\tilde T^{s*}_{ij}$. Let $\| \cdot \|_\infty$ denote the sup norm, i.e., $\| f \|_\infty = \sup_x | f(x) |$ for any real function $f$. Let us consider the following set of assumptions.

Assumption 4.2.
(i) For each $s \in \{+, 0, -\}$ and each $i, j \in N$, there exist sequences of constants $a^s_{ij,L}$ and $\sigma^s_{ij,L}$ such that the conditions in (4.6) hold, and $\tilde T^s_{ij} \to_d N(0, 1)$, and $\| \tilde F^s_{ij} - \Phi \|_\infty = O_P(L^{-\alpha})$ for some $\alpha > 0$, where $\Phi$ is the CDF of $N(0, 1)$.
(ii) $\sup_{x \in \mathcal{X}} | \hat r_{ij}(x) - r_{ij}(x) | = o_P(1)$, as $L \to \infty$.
(iii) $L h^d \to \infty$ while $h \to 0$ as $L \to \infty$.

The lower level conditions for Condition (i) can be found in Lee, Song, and Whang (2018). Condition (ii) follows if the kernel regression estimators $\hat g_i(x)$ are uniformly consistent. (See, e.g., Hansen (2008).) Condition (iii) is a standard bandwidth condition in the literature on kernel estimators. Then, we obtain the following lemma.

Lemma 4.1.

Suppose that Assumption 4.2 holds, and that the sequence $r_L \to \infty$ is such that $r_L / \log L \to 0$ as $L \to \infty$. Then Assumption 4.1 holds.

The proof of this lemma is found in the appendix.
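The bootstrap construction of the pairwise $p$-values in (4.4) and (4.5) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a scalar covariate, a Gaussian kernel, an evenly spaced evaluation grid that approximates the integrals by Riemann sums, and the least favorable (recentred) bootstrap. The function names and the simulated data are hypothetical.

```python
import numpy as np

def nw(y, x, grid, h):
    """Kernel regression estimate of E[y | x] on `grid` (Gaussian kernel)."""
    w = np.exp(-0.5 * ((x[:, None] - grid[None, :]) / h) ** 2)
    return (w * y[:, None]).sum(axis=0) / w.sum(axis=0)

def indexes(r, dx):
    """Riemann-sum versions of (delta^+, delta^-, delta^0) for a curve r."""
    return dx * np.array([np.maximum(r, 0).sum(),
                          np.maximum(-r, 0).sum(),   # r_ji = -r_ij
                          np.abs(r).sum()])

def pairwise_pvalues(Y, X, i, j, grid, h, n_boot=200, seed=0):
    """Return (p^+, p^-, p^0) for agents i and j; rows of Y, X are markets."""
    rng = np.random.default_rng(seed)
    L = Y.shape[0]
    dx = grid[1] - grid[0]
    r_hat = nw(Y[:, i], X[:, i], grid, h) - nw(Y[:, j], X[:, j], grid, h)
    observed = indexes(r_hat, dx)
    boot = np.empty((n_boot, 3))
    for b in range(n_boot):
        idx = rng.integers(0, L, size=L)          # resample whole markets
        r_star = (nw(Y[idx, i], X[idx, i], grid, h)
                  - nw(Y[idx, j], X[idx, j], grid, h))
        boot[b] = indexes(r_star - r_hat, dx)     # recentring imposes the null
    # p-value = 1 - F*(observed), estimated by the bootstrap exceedance rate.
    return (boot > observed).mean(axis=0)

# Hypothetical data: agent 0 has a uniformly higher regression function.
rng = np.random.default_rng(1)
L = 200
X = rng.uniform(0.0, 1.0, size=(L, 2))
Y = np.column_stack([1.0 + 0.5 * rng.standard_normal(L),
                     0.5 * rng.standard_normal(L)])
p_plus, p_minus, p_zero = pairwise_pvalues(Y, X, 0, 1, np.linspace(0.1, 0.9, 41), h=0.1)
```

With finitely many bootstrap draws a $p$-value can be exactly zero, so an implementation that takes $\log \hat p^s_{ij}$ would need some floor on the $p$-values; the choice of floor is left open here.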

The estimated group structure can be used as a first-step estimator in a two-step procedure for estimating a structural parameter. Recall that $\tau : N \to \{1, \ldots, K_0\}$ defines the group structure. Let us write $\tau(\cdot; P)$ to indicate that the group structure is identified. Suppose that $\theta_0$ is a structural parameter that is identified as follows: $Q(\theta, \tau(\cdot; P); P)$ as a function of $\theta$ has a unique minimizer in a parameter space $\Theta$ and

$$\theta_0 = \arg\min_{\theta \in \Theta} Q(\theta, \tau(\cdot; P); P).$$

In many applications $Q(\theta, \tau(\cdot; P); P)$ is a population objective function that arises from Generalized Method of Moments (GMM) estimation or Maximum Likelihood Estimation (MLE). The two-step estimator $\hat\theta$ of $\theta_0$ is defined as

$$\hat\theta = \arg\min_{\theta \in \Theta} \hat Q(\theta, \hat\tau(\cdot)),$$

where $\hat\tau(\cdot)$ is the first-step estimator of the group structure that is consistent, i.e., for all $\varepsilon > 0$,

$$P\left\{ \max_{1 \le i \le n} | \hat\tau(i) - \tau(i; P) | > \varepsilon \right\} \to 0, \tag{4.8}$$

as $L \to \infty$. As we saw before, one can obtain such an estimator $\hat\tau$ using our Selection-Split algorithm. Let $\tilde\theta$ be such that

$$\tilde\theta = \arg\min_{\theta \in \Theta} \hat Q(\theta, \tau(\cdot; P)),$$

so that $\tilde\theta$ is an infeasible estimator that uses the true group structure $\tau(\cdot; P)$ rather than the estimated version $\hat\tau(\cdot)$. The asymptotic normality of $\sqrt{L}(\tilde\theta - \theta_0)$ can be derived using standard arguments, for example, using the general results in Newey and McFadden (1994). Then it is not hard to see that $\sqrt{L}(\hat\theta - \theta_0)$ has the same limit distribution. To see this, note that the types are integer-valued, so by taking $\varepsilon < 1$, we find from (4.8) that

$$P\left\{ \max_{1 \le i \le n} | \hat\tau(i) - \tau(i; P) | > 0 \right\} \to 0.$$

Therefore,

$$P\{ \hat\theta \ne \tilde\theta \} = P\left\{ \arg\min_{\theta \in \Theta} \hat Q(\theta, \hat\tau(\cdot)) \ne \arg\min_{\theta \in \Theta} \hat Q(\theta, \tau(\cdot; P)) \right\} \le P\left\{ \max_{1 \le i \le n} | \hat\tau(i) - \tau(i; P) | > 0 \right\} \to 0,$$

as $L \to \infty$. Hence $\hat\theta = \tilde\theta$ with probability approaching one, as $L \to \infty$. This means that if the inference based on $\tilde\theta$ is asymptotically valid, so is that based on $\hat\theta$.

Note that this asymptotic validity holds pointwise in $P$. Given the known failure of uniform validity (uniform in $P$) for post-model selection inference (Leeb and Pötscher (2005)), it is likely that the inference based on this two-step estimator $\hat\theta$ fails to satisfy asymptotic validity uniformly in $P$, unless one modifies the procedure appropriately. In this general set-up, it is far from trivial to find such a modification. We leave it to future research.
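The classification method of Section 4.2, which supplies the first-step estimator $\hat\tau$, can be sketched in plain Python. The sketch takes matrices of log $p$-values as given and follows Definition 4.1, the Selection and Split Steps, and the penalized choice of $\hat K$. Several details are assumptions of this illustration rather than statements from the paper: the function names, the convention that a group with no distinct pair contributes zero to $\hat V(K)$, the rule of stopping when a split returns an empty set, and the stylized log $p$-values in the example.

```python
def mean(values):
    """Average of a list, with the convention that an empty list gives 0."""
    return sum(values) / len(values) if values else 0.0

def split(members, lp_plus, lp_minus, r):
    """Split Algorithm: ordered partition (lower, higher) of `members`."""
    best = None
    for i in members:
        others = [j for j in members if j != i]
        lower = [j for j in others if lp_plus[i][j] <= lp_minus[i][j] - r]
        higher = [j for j in others if lp_minus[i][j] <= lp_plus[i][j] - r]
        s1 = mean([lp_plus[i][j] for j in lower])
        s2 = mean([lp_minus[i][j] for j in higher])
        if best is None or min(s1, s2) < min(best[0], best[1]):
            best = (s1, s2, lower, higher)
    s1, s2, lower, higher = best
    if s1 <= s2:
        return lower, [j for j in members if j not in lower]
    return [j for j in members if j not in higher], higher

def min_lp0(group, lp_zero):
    """Smallest log p^0 over distinct pairs in `group` (0 if there is no pair)."""
    return min((lp_zero[i][j] for i in group for j in group if i != j), default=0.0)

def classify(lp_plus, lp_minus, lp_zero, r, g):
    """Return the ordered partition minimizing V_hat(K) + K * g over K."""
    n = len(lp_zero)
    partition = [list(range(n))]
    best_crit, best_partition = None, None
    for k in range(1, n + 1):
        if k > 1:
            # Selection Step: the group containing the smallest p^0.
            q = min(range(len(partition)), key=lambda a: min_lp0(partition[a], lp_zero))
            # Split Step: apply the Split Algorithm to the selected group.
            low, high = split(partition[q], lp_plus, lp_minus, r)
            if not low or not high:
                break  # degenerate split; stopping here is a choice of this sketch
            partition = partition[:q] + [low, high] + partition[q + 1:]
        crit = mean([abs(min_lp0(grp, lp_zero)) for grp in partition]) + k * g
        if best_crit is None or crit < best_crit:
            best_crit, best_partition = crit, [list(grp) for grp in partition]
    return best_partition

# Stylized example: log p-values are very negative exactly when a null is false.
types = [1, 1, 2, 2, 3, 3]
def lp(rejects):
    return -20.0 if rejects else -0.5
idx = range(len(types))
lp_plus = [[lp(types[i] > types[j]) for j in idx] for i in idx]
lp_minus = [[lp(types[i] < types[j]) for j in idx] for i in idx]
lp_zero = [[lp(types[i] != types[j]) for j in idx] for i in idx]
groups = classify(lp_plus, lp_minus, lp_zero, r=3.0, g=2.0)
```

In the example the three groups are recovered in increasing order of type. The penalty `g` plays the role of $g(L)$ and `r` that of $r_L$; the values used here are arbitrary.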

5. Monte Carlo Simulations

We use a model of a first-price procurement auction with asymmetric independent private costs to study the performance of our classification procedure. (See Appendix B.1 of the supplemental note.) Bidders are classified into $K_0$ groups. We abstract away from the formation of equilibrium strategies, and draw bids from a normal distribution $N(\mu_k, \sigma^2)$ for each group $k = 1, \ldots, K_0$. Let $L$ denote the number of auctions in which any given pair of bidders participate. We consider two specifications of the $\mu_k$'s. In one specification, $\mu_1 = 2.$, $\mu_2 = 2.$, $\mu_3 = 3.$, and $\mu_4 = 3.$ with increment $D_\mu = 0.6$, and in the other specification, $\mu_1 = 2.$, $\mu_2 = 2.$, $\mu_3 = 2.$, and $\mu_4 = 2.$ with increment $D_\mu = 0.2$. The variance $\sigma^2$ is taken to be .

Table 1: Group Structure in Experiments

| Structure | $n$ | $K_0$ | $n_k$ |
|---|---|---|---|
| S1 | 12 | 2 | 6 |
| S2 | 12 | 4 | 3 |
| S3 | 40 | 2 | 20 |
| S4 | 40 | 4 | 10 |

Note: $n$ denotes the total number of the bidders; $K_0$ denotes the number of the groups; $n_k$ denotes the number of actual bidders from group $k$. For each structure in the simulation design, groups all have the same number of bidders.

Table 1 summarizes the designs of group structures in our simulation. The first two structures involve a total of 12 bidders and the last two 40 bidders. The first and third are designed to be coarser group structures than the second and fourth respectively. We construct $p$-values using the procedure in Section 4.3 and obtain group classification from 500 simulated samples. For each estimate, we used 200 bootstrap iterations to calculate $p$-values.

To evaluate the performance of our classification method, we define a measure of discrepancy between two ordered partitions $T_1$ and $T_2$:

$$\delta(T_1, T_2) = \frac{1}{K_1} \sum_{k=1}^{K_1} \min_{1 \le j \le K_2} | N_{1k} \triangle N_{2j} |, \tag{5.1}$$

where $T_1 = (N_{11}, \ldots, N_{1K_1})$ and $T_2 = (N_{21}, \ldots, N_{2K_2})$ are ordered partitions of $N$ and $\triangle$ denotes the symmetric set difference: $A \triangle B = (A \setminus B) \cup (B \setminus A)$. We evaluate our classification method using two criteria: (1) Expected Average Discrepancy (EAD), defined as $E(\delta(T, \hat T_{\hat K}))$, and (2) $\mathrm{HAD}(\lambda) \equiv P\{ \delta(T, \hat T_{\hat K}) > \lambda n \}$ for $0 < \lambda < 1$.

Table 2 reports estimates when there is no unobserved heterogeneity among bidders ($K_0 = 1$). In this case, our procedure detects the absence of unobserved heterogeneity effectively. For a given $n$, there is a moderate increase in the accuracy of classification as $L$ increases, both in terms of EAD and HAD($\lambda$).

Table 3 reports results for $K_0 = 2$ and $K_0 = 4$.
In both cases, the estimates for $K_0$ are mostly correct. Estimation accuracy increases with the difference between group means. For a given number of groups, the performance in terms of EAD and HAD is better with greater group differences and larger sample sizes.

Table 2: Performance of the Classification with One Group ($K_0 = 1$ and unknown)

| $n$ | $L$ | $\hat K$ | EAD | HAD(.10) | HAD(.25) | HAD(.50) |
|---|---|---|---|---|---|---|
| 12 | 400 | 1.002 | 0.012 | 0.001 | 0.000 | 0.000 |
| 12 | 200 | 1.003 | 0.014 | 0.002 | 0.000 | 0.000 |
| 12 | 100 | 1.003 | 0.018 | 0.002 | 0.001 | 0.000 |
| 40 | 400 | 1.003 | 0.082 | 0.005 | 0.003 | 0 |
| 40 | 200 | 1.006 | 0.084 | 0.008 | 0.002 | 0 |
| 40 | 100 | 1.008 | 0.096 | 0.010 | 0.004 | 0 |

Note: $n$ is the number of bidders in the data, and $L$ the number of markets. $\hat K$ is the average number of estimated groups in 500 simulation samples. EAD is the average number of mismatched bidders across true groups and simulated samples. HAD($\lambda$) is the hazard rate of average discrepancy. For example, HAD(.10) = 0.002 means that in 499 simulated samples (out of a total of 500) the average number of mismatched bidders is less than 10 percent of the total number of bidders.

Table 3: Performance of the Classification with Multiple Groups ($K_0 \ge 2$ and unknown)

| $n$ | $L$ | $D_\mu$ | $\hat K$ ($K_0 = 2$) | EAD ($K_0 = 2$) | HAD(.25) ($K_0 = 2$) | HAD(.75) ($K_0 = 2$) | $\hat K$ ($K_0 = 4$) | EAD ($K_0 = 4$) | HAD(.25) ($K_0 = 4$) | HAD(.75) ($K_0 = 4$) |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 400 | 0.6 | 2.00 | 0.00 | 0.00 | 0.000 | 3.96 | 0.03 | 0.01 | 0.00 |
| 12 | 400 | 0.2 | 2.00 | 0.01 | 0.00 | 0.000 | 3.94 | 0.04 | 0.03 | 0.00 |
| 12 | 100 | 0.6 | 2.00 | 0.00 | 0.00 | 0.000 | 3.98 | 0.01 | 0.01 | 0.00 |
| 12 | 100 | 0.2 | 2.03 | 0.52 | 0.07 | 0.004 | 3.24 | 1.53 | 0.24 | 0.00 |
| 40 | 400 | 0.6 | 2.00 | 0.01 | 0.00 | 0.000 | 3.97 | 0.08 | 0.02 | 0.00 |
| 40 | 400 | 0.2 | 2.01 | 0.01 | 0.00 | 0.000 | 3.83 | 0.43 | 0.09 | 0.00 |
| 40 | 100 | 0.6 | 2.01 | 0.01 | 0.00 | 0.000 | 3.95 | 0.13 | 0.03 | 0.00 |
| 40 | 100 | 0.2 | 2.18 | 1.91 | 0.02 | 0.000 | 3.06 | 1.93 | 0.49 | 0.11 |

Note: $\hat K$, EAD and HAD($\lambda$) are defined as in Table 2. $D_\mu$ is the increment between consecutive group means. Conditional on the number of markets ($L$) and the number of bidders in the population ($n$), the classification task is harder when the difference between group means $D_\mu$ is smaller.

Misclassification errors tend to arise less often when the number of true groups is smaller, with the exception of $D_\mu = 0.2$ and $L = 100$. Intuitively, this is because when the set of bidders is partitioned into fewer groups, we can use pairwise inequalities for classification.

In this section we use a simple structural model of procurement auctions to investigate the impact of classification errors on subsequent estimation of structural parameters. A set of $N$ providers (bidders) is partitioned into $K_0$ groups, each with a distinct distribution of private costs. Let $N_k$ denote the set of providers in $N$ with type $k \in \{1, 2, \ldots, K_0\}$, and let $|N_k|$ denote its cardinality. The cost for a provider $i$ with type $\tau(i) \in \{1, 2, \ldots, K_0\}$ is given by $c_{i,\ell} = \mu_{\tau(i)} + \epsilon_{i,\ell}$, where $\epsilon_{i,\ell}$ follows $N(0, \sigma^2)$ with the support $[\underline{c}, \bar{c}]$.

Auction participants are determined in two steps. First, two out of $K_0$ groups, $\tau_{l,1}$ and $\tau_{l,2}$, are chosen at random. Next, $n_1$ and $n_2$ providers are randomly drawn from the corresponding groups $N_{\tau_{l,1}}$ and $N_{\tau_{l,2}}$ and their costs are constructed as above. Here $n_1$ and $n_2$ denote the numbers of actual participants (those who submitted bids in the auction). Then participants bid based on their realized costs. The participant with the lowest bid wins. The identity of each participant and its bid are both reported in the data. We consider two specifications: ($S_1$) with $K_0 = 4$, $|N_k| = 4$ for all $k$, and ($S_2$) with $K_0 = 4$, $|N_k| = 10$ for all $k$. For both specifications, we set $\mu = (2, ., ., .)$ and $\sigma = 0.$ .
We run the following four experiments with different specifications and sample sizes: (A) $S_1$, $L = 200$; (B) $S_2$, $L = 200$; (C) $S_1$, $L = 400$; and (D) $S_2$, $L = 400$. We set $n_1 = 1$ and $n_2 = 1$ so that each auction has actual participants chosen from $K_0$ different types.

Structural parameters $K_0$, $\tau(\cdot)$, $\theta = \{ (\mu_k)_{k=1}^{K_0}, \sigma \}$ are estimated via two steps. First, the group structure ($\hat\tau(\cdot)$ and $\hat K$) is estimated using our classification algorithm. Next, we apply GMM to estimate the remaining structural parameters using the following moments for $k = 1, \ldots, K_0$: (1) within-group means of bids: $\sum_{i=1}^{n} E[ B_{i,\ell} - \mu_{B,k}(\theta; I) ] 1\{ \hat\tau(i) = k \} = 0$; and (2) within-group second moments of bids: $\sum_{i=1}^{n} E[ B_{i,\ell}^2 - ( \mu_{B,k}(\theta; I)^2 + \sigma_{B,k}(\theta; I)^2 ) ] 1\{ \hat\tau(i) = k \} = 0$, where $\mu_{B,k}(\theta; I)$ and $\sigma_{B,k}(\theta; I)$ denote the mean and standard deviation of the equilibrium bid distribution for bidders from group $k$, $\theta$ the vector of parameters and $I$ the profile of participant types. Standard errors are computed from the analytic expression for the covariance matrix in the asymptotic distribution.

To compute $\mu_{B,k}(\theta; I)$ and $\sigma_{B,k}(\theta; I)$ for a given $\theta$ and profile of participant types $I = (\tau_{l,1}, \tau_{l,2}, n_1, n_2)$ we simulate the equilibrium bidding functions. The bidding functions are then combined with the cost distributions implied by a vector of trial parameters, $\theta$, to obtain the distribution of bids: $F_{B,k}(b \mid \theta, I) = F_{C,k}(\beta_k^{-1}(b) \mid \theta)$. Here $F_{C,k}(\cdot \mid \theta)$ denotes the distribution of the project's cost for a bidder belonging to group $k$ which corresponds to a parameter vector $\theta$, and $\beta_k(c)$, $\beta_k^{-1}(b)$ are the bid and the inverse bid functions used by such a bidder. We then compute the mean and the standard deviation of this bid distribution.

Footnote: We set the lower and upper bounds of costs to $\underline{c} = \frac{1}{K_0} \sum_k (\mu_k - . \times \sigma)$ and $\bar{c} = \frac{1}{K_0} \sum_k (\mu_k + 1. \times \sigma)$. True parameters are chosen so that $\underline{c}$ is strictly positive.

Footnote: Specifically, we start from the analytical bidding function when all participants belong to the same group, and use a modified version of the numerical method in Marshall, Meurer, Richard, and Stromquist (1994) to solve for bidding strategies in the presence of multiple groups. We impose a sample version of $\underline{c}$ and $\bar{c}$ by replacing $\sigma$ with its sample analog.

Tables 4 and 5 report the bias and mean squared errors (MSEs) of two estimators for $(\mu_k)_{k=1}^{K_0}$ and $\sigma$. The first is an "infeasible" estimator that uses the knowledge of the true group structure. The second is the two-step estimator we propose, which requires bidder classification in the first step. These two tables also report the rejection probabilities from $t$-tests of individual parameters.

Table 4: Simulation Results from Specifications A and B

| Spec | Groups used | Statistic | $\mu_1$ | $\mu_2$ | $\mu_3$ | $\mu_4$ | $\sigma$ |
|---|---|---|---|---|---|---|---|
| A | True | Rej. Prob. | 0.0150 | 0.0515 | 0.0523 | 0.0546 | 0.0149 |
| A | True | Bias | -0.0189 | -0.0252 | -0.0610 | -0.0511 | 0.0242 |
| A | True | MSE | 0.0005 | 0.0008 | 0.0035 | 0.0039 | 0.0039 |
| A | Estimated | Rej. Prob. | 0.0148 | 0.0542 | 0.0510 | 0.0485 | 0.0151 |
| A | Estimated | Bias | 0.0059 | 0.0329 | -0.0241 | -0.0225 | -0.0549 |
| A | Estimated | MSE | 0.0041 | 0.0083 | 0.0035 | 0.0027 | 0.0383 |
| B | True | Rej. Prob. | 0.0120 | 0.0515 | 0.0512 | 0.0514 | 0.0111 |
| B | True | Bias | -0.0211 | -0.0233 | -0.0621 | -0.0622 | 0.0236 |
| B | True | MSE | 0.0005 | 0.0007 | 0.0039 | 0.0039 | 0.0034 |
| B | Estimated | Rej. Prob. | 0.0131 | 0.0550 | 0.0540 | 0.0530 | 0.0160 |
| B | Estimated | Bias | -0.0213 | -0.0218 | -0.0763 | -0.0765 | 0.0211 |
| B | Estimated | MSE | 0.0004 | 0.0015 | 0.0411 | 0.0441 | 0.0023 |

Note: Specification A uses $K_0 = 4$, $n_k = 4$, and $L = 200$ and Specification B uses $K_0 = 4$, $n_k = 10$, and $L = 200$. Here $n_k$ is the number of bidders in group $k$, and $L$ the number of markets. The rejection probabilities are from $t$-tests for the individual parameters. The nominal rejection probability is set to 0.05.

Table 4 reports results for a smaller sample with $L = 200$. The rejection probabilities are close to the nominal rejection rate 0.05, except for parameters $\mu_1$ and $\sigma$. The MSE and bias for all parameters are reasonably small. Table 4 also shows that the rejection probabilities for the infeasible estimator using the true groups and those for the actual estimator using the estimated groups are very similar. There is some minor difference between these two estimators in the MSE of some group means. The discrepancy seems more pronounced when the size of each group is increased from $|N_k| = 4$ to $|N_k| = 10$.

Table 5 reports the same results for larger samples with $L = 400$. The performance of the estimators improves slightly relative to Table 4. Again, the rejection probabilities for the two estimators are similar. There is also evidence that with a larger number of markets, our classification method performs better given the same number of within-group bidders. Overall, Tables 4 and 5 provide simulation evidence that the classification errors in the first step do not have any major impact on the finite sample performance of the two-step estimators for structural parameters.

Table 5: Simulation Results from Specifications C and D

| Spec | Groups used | Statistic | $\mu_1$ | $\mu_2$ | $\mu_3$ | $\mu_4$ | $\sigma$ |
|---|---|---|---|---|---|---|---|
| C | True | Rej. Prob. | 0.0149 | 0.0500 | 0.0513 | 0.0520 | 0.0149 |
| C | True | Bias | -0.0214 | -0.0222 | -0.0615 | -0.0713 | 0.0237 |
| C | True | MSE | 0.0005 | 0.0007 | 0.0039 | 0.0039 | 0.0006 |
| C | Estimated | Rej. Prob. | 0.0151 | 0.0485 | 0.0526 | 0.0545 | 0.0151 |
| C | Estimated | Bias | 0.0131 | 0.0412 | -0.0625 | -0.0656 | -0.0241 |
| C | Estimated | MSE | 0.0060 | 0.0008 | 0.0053 | 0.0031 | 0.0054 |
| D | True | Rej. Prob. | 0.0133 | 0.0480 | 0.0510 | 0.0520 | 0.0108 |
| D | True | Bias | -0.0227 | -0.0219 | -0.0611 | -0.0231 | 0.0236 |
| D | True | MSE | 0.0005 | 0.0007 | 0.0039 | 0.0039 | 0.0006 |
| D | Estimated | Rej. Prob. | 0.0126 | 0.0498 | 0.0520 | 0.0520 | 0.0128 |
| D | Estimated | Bias | -0.0229 | -0.0123 | -0.0761 | -0.0361 | 0.0098 |
| D | Estimated | MSE | 0.0093 | 0.0056 | 0.0068 | 0.0061 | 0.0007 |

Note: Speciﬁcation C uses K = 4 , n k = 4 , and L = 400 , and Speciﬁcation D, K = 4 , n k = 10 , and L = 400 . Here n k is the number of bidders in group k and L the number of markets. The nominalrejection probability is set to 0.05.
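The three statistics reported in Tables 4 and 5 can be computed from Monte Carlo replications along the following lines. This is only a minimal sketch, not the authors' code: the function name `mc_summary`, the two-sided test with normal critical value 1.96, and the synthetic normal draws used in the usage lines are all assumptions made for illustration.

```python
import numpy as np

def mc_summary(estimates, std_errors, true_value, critical_value=1.96):
    """Summarize Monte Carlo replications of one scalar parameter.

    estimates, std_errors: one entry per replication.
    critical_value: assumed two-sided 5% normal critical value.
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    errors = estimates - true_value
    t_stats = errors / std_errors  # t-statistic for H0: parameter = true value
    return {
        "rej_prob": float(np.mean(np.abs(t_stats) > critical_value)),
        "bias": float(np.mean(errors)),
        "mse": float(np.mean(errors ** 2)),
    }

# Hypothetical usage: a correctly sized test on synthetic, exactly normal estimates.
rng = np.random.default_rng(0)
true_mu, se, reps = 1.0, 0.1, 5000
draws = true_mu + se * rng.standard_normal(reps)
summary = mc_summary(draws, np.full(reps, se), true_mu)
print(summary)
```

In this synthetic setting the rejection probability should be close to the nominal 0.05, which is the benchmark against which the "Rej. Prob." rows of the tables are read.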

6. Empirical Application: California Market for Highway Procurement

We apply our methodology to analyze procurement auctions conducted by the California Department of Transportation (CalTrans) to allocate projects for highway repair work. Our goal is to demonstrate the performance of our method in the empirical setting, and to highlight the consequences of ignoring agent unobserved heterogeneity in estimation.

Model.

We follow the literature in modeling the auction market so that each project attracts a set of potential bidders who decide whether to participate in the auction and, if deciding to participate, choose a bid to submit. Our main innovation is to allow for contractors participating in this market to differ in a way that is not observed by the researcher. Specifically, each contractor is characterized by a contractor-specific cost factor (invariant across projects) $q_i$ which takes discrete values in $\{\bar{q}_1, \bar{q}_2, \ldots, \bar{q}_K\}$. This unobserved cost factor captures the difference in cost efficiencies across firms generated perhaps by the differences in managerial ability or other factors associated with the firm organization. As in our basic setup, this cost factor induces a partition of the population of firms participating in this market into groups: $N = \cup_{k=1}^{K} N_k$ with $N_k = \{i : q_i = \bar{q}_k\}$, so that $\tau(i) = k$ if and only if $i \in N_k$.

Following the convention in the empirical auction literature, we assume that each project $\ell$ auctioned in this market is summarized by a set of observable characteristics $X_\ell$ and an unobservable factor $U_\ell$. The latter is distributed according to a normal distribution with mean zero and standard deviation $\sigma_U$. The set of firms which are potentially interested in project $\ell$ (potential bidders), denoted here by $S_\ell$, is exogenously drawn from $N$. A contractor $i$ that is a potential bidder for project $\ell$ is characterized by private entry costs, $E_{i,\ell}$, and the private cost of completing the project, $C_{i,\ell}$. We assume that private costs vary independently across bidders and auctions. The entry costs additionally are independent of $U_\ell$, and are distributed according to the exponential distribution with a rate parameter $\lambda_{i,\ell}$. The costs of completing the work are drawn from a log normal distribution with mean $\mu_{i,\ell}$ and standard deviation $\sigma_C$.
The mean of the cost distribution depends on project characteristics including the distance between the project and the bidder's locations, $D_{i,\ell}$, as well as an unobserved cost factor $q_i$. Reflecting these features, we set

$$\mu_{i,\ell} = X_\ell \alpha_1 + D_{i,\ell} \alpha_2 + \sum_{k=1}^{K} \bar{q}_k \mathbf{1}\{\tau(i) = k\} + U_\ell, \quad \text{and} \quad \lambda_{i,\ell} = X_\ell \gamma + \sum_{k=1}^{K} \tilde{q}_k \mathbf{1}\{\tau(i) = k\}.$$

A potential bidder decides to participate in the auction for project $\ell$ if his ex-ante expected profit conditional on participation exceeds entry costs. The set of such bidders is denoted by $A_\ell$. A bidder who decides to participate observes the realization of his costs and the identities of other contractors who decided to participate. He chooses a bid to maximize his interim profit, which reflects the probability of winning the project conditional on his costs and the set of competitors. We assume that the observed outcomes reflect a type-symmetric pure-strategy Bayesian Nash equilibrium (psBNE).

Estimation Details.

Footnote: The groups reflect differences in the contractors' cost efficiencies related to the project work. While entry costs may also vary across groups, there is no reason for the group differences in project costs to coincide with the group differences in entry costs. For this reason, we explicitly distinguish between the parameters capturing the former ($\bar{q}_k$) and the latter ($\tilde{q}_k$) effects.

Footnote: Following the literature, we distinguish between the bidders who regularly participate in the procurement market (regular bidders) and those who only appear in a very small number of auctions (fringe bidders). We assume that all fringe bidders are associated with the same fixed level of the unobserved cost factor $\bar{q}$.

Footnote: The expected profit reflects his expectation over the participation decisions of other potential bidders, the expectation over his costs of completing the project, and the expected probability of winning the project, which depends on the cost draws of his competitors.

Footnote: In such an equilibrium, participants who are ex ante identical in an auction $\ell$ (i.e., $i, j \in S_\ell$ such that $q_i = q_j$ and $D_{i,\ell} = D_{j,\ell}$) adopt the same strategies.

The estimation methodology consists of two steps. In the first step we use the pairwise comparison indexes to recover the unobserved group structure. In the second step the parameters of the model are estimated through a GMM procedure while imposing the group structure recovered in the first step. We assume bids are rationalized by a single equilibrium.

In the first step we use the pairwise comparison indexes derived in Appendix B.1 in the supplemental note to recover the unobserved group structure. Specifically, in accordance with the notation used in the paper, we define

$$\delta^{+}_{ij} \equiv \int \max\{r_{ij}(b), 0\}\, db \quad \text{and} \quad \delta^{0}_{ij} \equiv \int |r_{ij}(b)|\, db,$$

with $r_{ij}(b) = G_{ji}(b \mid d) - G_{ij}(b \mid d)$ and $G_{ij}(b \mid d) = P\{B_{i,\ell} \ge b \mid D_{i,\ell} = d, D_{j,\ell} = d, i \in A_\ell, j \in A_\ell\}$.
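A sample version of the two indexes defined in the estimation details can be sketched as follows. This is an illustrative sketch under simplifying assumptions rather than the authors' implementation: the bid arrays are assumed to have already been restricted to auctions in which both bidders participate at the same distance $d$, the integral is approximated by a trapezoid rule on an evenly spaced grid, and the names `survival`, `trapezoid`, and `pairwise_indexes` are hypothetical.

```python
import numpy as np

def survival(bids, grid):
    """Empirical P{B >= b} evaluated at each grid point b."""
    bids = np.sort(np.asarray(bids, dtype=float))
    # searchsorted(side="left") gives the index of the first bid >= b
    return (bids.size - np.searchsorted(bids, grid, side="left")) / bids.size

def trapezoid(y, x):
    """Trapezoid-rule approximation of the integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def pairwise_indexes(bids_i, bids_j, n_grid=2001):
    """Return sample analogs of (delta_plus_ij, delta_zero_ij)."""
    lo = min(np.min(bids_i), np.min(bids_j))
    hi = max(np.max(bids_i), np.max(bids_j))
    grid = np.linspace(lo, hi, n_grid)
    # r_ij(b) = G_ji(b) - G_ij(b), with G the empirical survival function
    r_hat = survival(bids_j, grid) - survival(bids_i, grid)
    delta_plus = trapezoid(np.maximum(r_hat, 0.0), grid)
    delta_zero = trapezoid(np.abs(r_hat), grid)
    return delta_plus, delta_zero

# Hypothetical toy data: bidder j's bids are bidder i's bids shifted up by one.
bids_i = [1.0, 2.0, 3.0]
bids_j = [2.0, 3.0, 4.0]
dp_ij, dz_ij = pairwise_indexes(bids_i, bids_j)
dp_ji, dz_ji = pairwise_indexes(bids_j, bids_i)
print(dp_ij, dz_ij, dp_ji, dz_ji)
```

In the toy data the one-sided index is positive in one direction and approximately zero in the other, while the absolute-value index is the same in both directions. That asymmetry is what lets the indexes order a pair of bidders rather than merely separate them.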
We obtain empirical counterparts of these indexes by replacing $G_{ij}(b \mid d)$ with its sample analog $\hat{G}_{ij}(b \mid d)$. We implement classification using the bootstrap testing procedure described previously.

In the second stage, we consider the following moments: (a) the first and the second moment of the bid distribution for a given level of $d$ and for a given group of bidders; (b) the covariance between bids and the observable project characteristics; (c) the covariance between any two bids submitted in the same auction; (d) the expected number of participants in any given auction for every $(d, q)$-group; (e) the covariance between the number of participants and the observable project characteristics. We search for the set of parameters which minimizes the distance between the empirical and theoretical counterparts of these moments subject to participation constraints.

Estimation Results.

Footnote: The pairwise comparison indexes are derived using Corollary 3 of Lebrun (1999), which for $G_{ij}(b)$ defined as $P\{B_{i,\ell} \ge b \mid i, j \in S_\ell\}$ establishes that $G_{ij}(b) \le G_{ji}(b)$ for all $b$ in the common support of $B_{i,\ell}$ and $B_{j,\ell}$ whenever $\tau(i) \le \tau(j)$. The inequality holds strictly at least over some interval with positive Lebesgue measure and holds unconditionally when aggregated over bidder identities and auction characteristics.

Footnote: We recover group structure on the basis of the indexes which aggregate over the values of the distance $d$. As a robustness check we also compute groupings on the basis of subsets of distances. We find that the results of classification are very similar across these approaches.

Footnote: We do not explicitly solve for participation strategies. Instead, we discretize the support of auction characteristics $(X_\ell, U_\ell)$ and treat the probabilities of participation for bidders of various $(q, d)$-types corresponding to these grid values as auxiliary parameters. We follow the spirit of Dube, Fox, and Su (2012) by maximizing a moment-based objective function subject to the constraints that the optimality of the participation strategies is satisfied on the grid of the project characteristics' values.

We implement the analysis using the data for California Highway Procurement projects auctioned between 2002 and 2012. The projects in our sample are worth $523,000 and last for around three months on average; 38% of these projects are partially supported through federal funds. There are 25 firms that participate regularly in this market. The other firms are referred to as "fringes". An average auction attracts six regular potential bidders and eight fringe bidders. Since only a fraction of potential bidders submits bids, the entry decision plays an important role in this market. Finally, the distance to the company location varies quite a bit and is around 28 miles on average for regular potential bidders.

Table 6. Parameter Estimates

                         With unobserved heterogeneity    Without unobserved heterogeneity
                         Estimate        Std. Error       Estimate        Std. Error        P-value
The Distribution of Project Costs
Constant ($\bar{q}$)     0.127***        (0.0129)         0.113***        (0.0119)          0.216
Eng. Estimate            -0.0004***      (0.0002)         -0.0005***      (0.0002)          0.392
Duration                 0.00026*        (0.00036)        0.00022*        (0.00027)         0.212
Distance                 0.0012***       (0.00022)        0.00086***      (0.00019)         0.041
Bridge                   -0.0092***      (0.0018)         -0.012***       (0.0011)          0.074
Federal Aid              -0.043***       (0.0103)         -0.078***       (0.009)           0.012
Regular Bidder                                            -0.035***       (0.003)
$\sigma_C$               ***             (0.032)          0.112***        (0.022)           0.087
$\sigma_U$               ***             (0.009)          0.0207***       (0.008)           0.452
The Distribution of Entry Costs
Constant ($\tilde{q}$)   -0.0114*        (0.0078)         -0.0161*        (0.0091)          0.212
Eng. Estimate            0.0055***       (0.0016)         0.0051***       (0.0012)          0.333
Number of Items          0.0018*         (0.0011)         0.0011***       (0.0005)          0.082
Regular Bidder                                            -0.022***       (0.004)

Note: In the results above the distance is measured in miles. The fringe bidders are the reference group. The results are based on the data for 1,054 medium-sized projects that involve paving and bridge work. Standard errors are computed using bootstrap. The first two columns correspond to the specification which allows for the unobserved bidder heterogeneity; the next two columns correspond to the specification without unobserved bidder heterogeneity. The last column reports the p-value of the bootstrap-based test of the equality of coefficients estimated under the specifications with and without unobserved bidder heterogeneity. Details of the test can be found in the Supplemental Note to the paper.

In the first step, we obtain through our classification method the grouping of the bidders into eight groups that consist of 2, 3, 8, 3, 2, 3, 2 and 2 bidders respectively. The parameter estimates obtained in the second stage of our estimation procedure and their standard errors are summarized in Table 6. We normalize bids by the engineer's estimate in the estimation.
Therefore all the parameters measure the effects relative to the project size.

The first two columns present the estimates which are obtained when the unobserved group structure is taken into account in the estimation. The results indicate significant differences in bidders' costs across the groups. Specifically, fringe bidders (the reference group) tend to have the highest costs, whereas the difference in costs between the group of fringe bidders and the groups of regular bidders is comparable in impact to the shortening of the distance to the project site by 42.5 (i.e., by 0.051/0.0012), 10.1, 26.67, 48.33, 11.67, 6.67, 7.5, and 41.67 miles respectively. The distance increases project costs (an additional 8.33 miles results in costs which are 1% higher on average). The entry costs of regular bidders are significantly lower than the entry costs of fringe bidders. However, they appear to be quite similar across the groups of regular bidders.

Footnote: The estimates of the group-specific fixed effects are omitted for brevity. The full table that contains these estimates is found in the supplemental note to the paper.

Footnote: Recall that the coefficients reflect the impact on costs in terms of the fraction of the engineer's estimate. The distance resulting in a 0.01 increase of average costs can thus be computed as 0.01/0.0012.

The next two columns of Table 6 show the parameter estimates under the specification in which the unobserved group structure of the regular bidders is ignored in the estimation. The parameter estimates are obtained by the GMM estimation procedure using the same set of moments by imposing that only two groups of sellers are present in the data: fringe and regular bidders. Under this specification, the cost reduction due to the federal aid is estimated to be much higher (7.8% rather than 4.3%), and the impact of the distance is estimated to be lower (the distance to the project has to be 11.67 miles higher in order to increase the average cost by 1%).
Additionally, the entry costs are estimated to be lower relative to the baseline specification.

The last column reports the results of the bootstrap-based test of the equality of coefficients estimated under the specifications with and without unobserved bidder heterogeneity. The results indicate that the difference is significant for the coefficients on the distance, the indicator for the federal aid, the indicator that a project entails bridge-related work, and the standard deviation for the distribution of project costs. The effect of the number of items on the distribution of entry costs is also significant. Our results thus confirm that regular participants in the highway procurement market are characterized by important unobserved cost differences that persist in the data.
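The distance-equivalent figures quoted in the discussion of Table 6 follow from dividing a cost effect (expressed as a fraction of the engineer's estimate) by the distance coefficient. The short sketch below reproduces that arithmetic from the rounded coefficients printed in Table 6; the helper name `miles_equivalent` is hypothetical, and small differences from the figures in the text may arise because the table coefficients are rounded.

```python
def miles_equivalent(cost_effect, distance_coef):
    """Extra miles of distance that change mean cost by `cost_effect`.

    Both arguments are in units of the engineer's estimate
    (the distance coefficient is per mile).
    """
    return cost_effect / distance_coef

# Distance coefficients as printed in Table 6 (rounded values).
DIST_WITH_UIH = 0.0012      # specification with unobserved heterogeneity
DIST_WITHOUT_UIH = 0.00086  # specification without unobserved heterogeneity

one_pct_with = miles_equivalent(0.01, DIST_WITH_UIH)        # miles for a 1% cost increase
one_pct_without = miles_equivalent(0.01, DIST_WITHOUT_UIH)  # same, second specification
first_group_gap = miles_equivalent(0.051, DIST_WITH_UIH)    # the 0.051/0.0012 example

print(round(one_pct_with, 2), round(one_pct_without, 2), round(first_group_gap, 2))
```

With the rounded coefficients, the first and third quantities match the 8.33 and 42.5 miles reported in the text, while the second comes out slightly below the reported 11.67 miles.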

7. Conclusion

This paper makes a number of contributions to the literature. First, for models with strategic interdependence between multiple agents, we develop a method to classify these agents based on their discrete unobserved individual heterogeneity, using pairwise inequalities implied by an economic model. Second, we show that such pairwise inequalities arise in a number of game-theoretical settings where identification of model primitives is challenging. Third, we propose a computationally feasible method which consistently estimates the group structure defined by unobserved heterogeneity. We apply this method to California highway procurement data to show that unobserved bidder heterogeneity plays an important role in this procurement market.

The classification method proposed in this paper is especially useful in settings where the analysis of unobserved individual heterogeneity is complicated by the presence of strategic interdependence in the model. We offer new insights into the identification and estimation of such models. Specifically, classification could be used as a first step in the structural studies of many environments where analyses would otherwise be infeasible due to identification or computational challenges.

8. Appendix: Mathematical Proofs

Proof of Theorem 3.1: Sufficiency is obvious. We focus on necessity. Let us assume that $\tau$ is identified. First consider the two facts:

Fact 1: If $G_\tau$ does not contain a monotone path of length $K - 1$, $\tau$ is not identified.

Fact 2: A vertex $i$ is identified if and only if there is a monotone path $P$ containing $i$ such that its end vertices $i_H$ and $i_L$ are identified and $\tau(i_H) - \tau(i_L) = \ell(P)$, where $\ell(P)$ denotes the length of $P$.

By Fact 1, the necessity of $G_\tau$ containing a monotone path of length $K - 1$ follows, and Fact 2 completes the proof of the necessity part of the theorem.

Now let us prove Fact 1. Suppose that $G_\tau$ does not contain a monotone path of length $K - 1$. Let $N_{\max}$ be the set of vertices such that for each vertex $i$ in $N_{\max}$, all of its $G_\tau$-neighbors have lower type than the vertex $i$. Then there is no edge in $G_\tau$ which joins any two vertices from the set $N_{\max}$. Choose a vertex $i^*$ from $N_{\max}$ which is an end vertex of a longest monotone path, whose length is strictly less than $K - 1$. This identifies a lower bound for $K$, but there is no upper bound for $K$ that we can obtain from $G_\tau$. Take any $\tau'$ such that $\tau'(i^*) > \tau(i^*)$ and $\tau'(i) = \tau(i)$ for all $i \in N \setminus \{i^*\}$. Then $\tau'$ is compatible with $G_\tau$ and the given comparison indexes, proving that $\tau$ is not identified from $G_\tau$ and the comparison indexes.

Let us prove Fact 2. Sufficiency is trivial. Let us focus on necessity. First, suppose to the contrary that there is no monotone path with identified end vertices which contains $i$. Then obviously $i$ is not identified. Therefore, if $i$ is identified, there exists a monotone path with identified end vertices which contains $i$. So it suffices to show that if $i$ is identified, such a monotone path must have length equal to $\tau(i_H) - \tau(i_L)$.

Suppose to the contrary that the following condition holds.

Condition A: Every monotone path $P$ that contains $i$ and has identified end vertices $i_H$ and $i_L$ also satisfies $\tau(i_H) - \tau(i_L) > \ell(P)$.

Then we will show that $i$ is not identified.

First, assume that there exists a monotone path which contains $i$ but not as one of its end vertices. Let $i^*_H$ be a lowest type vertex among all the identified vertices each of which is on a monotone path that contains $i$ and is of higher type than $i$. Also, let $i^*_L$ be a highest type vertex among all the identified vertices each of which is on a monotone path that contains $i$ and is of lower type than $i$. Let $P$ be a monotone path between $i^*_H$ and $i^*_L$ that passes through $i$. Then by construction, the type difference $\tau(i^*_H) - \tau(i^*_L)$ between the two end vertices is smallest among all the monotone paths that go through $i$. Furthermore, $i^*_H$ and $i^*_L$ are adjacent to $i$ in $G_\tau$. By Condition A, we have $\tau(i^*_H) - \tau(i^*_L) > 2$. Therefore, we have multiple different ways to assign $\tau(i^*_H) - 1, \tau(i^*_H) - 2, \ldots, \tau(i^*_L) + 1$ to the vertex $i$ on the path $P$. Hence $i$ is not identified.

Second, assume that all the monotone paths that contain $i$ have $i$ as one of their end vertices. Then either all neighbors of $i$ are of higher type than $i$ or all neighbors of $i$ are of lower type than $i$. Suppose that we are in the former case. (The latter case can be dealt with similarly.) Let $i^*_H$ be a lowest type vertex among all the vertices each of which is on a monotone path that contains $i$ and is of higher type than $i$. Then $i^*_H$ is adjacent to $i$ in $G_\tau$, and by Condition A, $\tau(i^*_H) - \tau(i) > 1$. Thus we have multiple different ways to assign $\tau(i^*_H) - 1, \tau(i^*_H) - 2, \ldots, \tau(i) + 1, \tau(i)$ to the vertex $i$. Hence $i$ is not identified. ∎

Proof of Lemma 4.1:

Condition (i) of Assumption 4.1 follows from Condition (i) of Assumption 4.2 with $W^{s}_{ij}$ being a standard normal random variable. As for Conditions (ii)-(iv) of Assumption 4.1, we focus on only (ii), because the proof for the other two statements is similar. Observe that when $\tau(i) > \tau(j)$,

$$\tilde{T}^{+}_{ij}/\sqrt{L} = (\sigma^{+}_{ij,L})^{-1} \left( \int \max\{\hat{r}_{ij}(x) - r_{ij}(x) + r_{ij}(x), 0\}\, dx - L^{-1/2} h^{-d/2} a^{+}_{ij,L} \right) = (\sigma^{+}_{ij})^{-1} \int \max\{r_{ij}(x), 0\}\, dx + o_P(1), \tag{8.1}$$

by Assumption 4.2 (ii)(iii) and Condition (4.6). Hence Condition (ii) of Assumption 4.1 follows with

$$c^{+}_{ij} = (\sigma^{+}_{ij})^{-1} \int \max\{r_{ij}(x), 0\}\, dx,$$

and $\lambda_L = \sqrt{L}$.

As for Condition (vi) of Assumption 4.1, note that $F^{s}_{ij,\infty} = \Phi$, the standard normal CDF. Hence there exists $C > 0$ such that for all $t > C$,

$$1 - \Phi(t) \le C \exp\left( -\frac{C t^2}{2} \right).$$

Therefore, for any constants $c_1, c_2 > 0$, taking $\lambda_L = \sqrt{L}$ and $\rho_L = L^{-\alpha}$, (from some large $L$ on)

$$r_L^{-1} \log\left( 1 - \Phi(c_1 \sqrt{L}) + c_2 L^{-\alpha} \right) \le r_L^{-1} \log\left( C \exp(-C c_1^2 L / 2) + c_2 L^{-\alpha} \right) \to -\infty,$$

as $L \to \infty$, by the condition that $r_L / \log L \to 0$. ∎

References

Abowd, J., F. Kramarz, and D. Margolis (1999): "High Wage Workers and High Wage Firms," Econometrica, 67(2), 251–334.

Andrews, D. W., and P. Guggenberger (2009): "Incorrect Asymptotic Size of Subsampling Procedures Based on Post-Consistent Model Selection Estimator," Journal of Econometrics, 152(6), 19–27.

Andrews, D. W., and X. Shi (2013): "Inference Based on Conditional Moment Inequalities," Econometrica, 81, 609–666.

Aradillas-Lopez, A., A. Gandhi, and D. Quint (2013): "Identification and Inference in Ascending Auctions with Correlated Private Values," Econometrica, 81(2), 489–534.

Asker, J. (2010): "A Study of the Internal Organisation of a Bidding Cartel," American Economic Review, 100, 724–762.

Athey, S., and P. Haile (2007): "Nonparametric Approaches to Auctions," in Handbook of Econometrics, Vol. 6, ed. by J. J. Heckman, and E. E. Leamer, chap. 60, pp. 3847–3959. Elsevier, Amsterdam.

Athey, S., J. Levin, and E. Seira (2011): "Comparing Open and Sealed Bid Auctions: Theory and Evidence from Timber Auctions," Quarterly Journal of Economics, 126(1), 207–257.

Bonhomme, S., K. Jochmans, and J.-M. Robin (2016): "Non-parametric estimation of finite mixture from repeated measurements," Journal of the Royal Statistical Society, B, 78, 211–229.

Bonhomme, S., and E. Manresa (2014): "Group Patterns of Heterogeneity in Panel Data," Working paper, Chicago University.

Bugni, F. A. (2010): "Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Identified Set," Econometrica, 78, 735–753.

Chernozhukov, V., S. Lee, and A. M. Rosen (2013): "Intersection Bounds: Estimation and Inference," Econometrica, 81, 667–737.

Conley, T., and F. Decarolis (2016): "Detecting Bidder Groups in Collusive Auctions," American Economic Journal: Microeconomics, 8(2), 1–38.

Dube, J., J. Fox, and C. Su (2012): "Improving the Numerical Performance of Static and Dynamic Aggregate Discrete Choice Random Coefficient Demand Estimation," Econometrica, 80, 2231–2267.

Eeckhout, J., and P. Kircher (2011): "Identifying Sorting - In Theory," Review of Economic Studies, 78(3), 872–906.

Gentry, M., T. Komarova, and P. Shiraldi (2016): "Preferences and Performance in Simultaneous First Price Auctions: Structural Analysis," Discussion paper.

Hagedorn, M., T. Law, and I. Manovskii (2016): "Identifying Equilibrium Models of Labor Market Sorting," Econometrica.

Hastie, T., R. Tibshirani, and J. Friedman (2009): The Elements of Statistical Learning, Second Edition. Springer.

Henry, M., Y. Kitamura, and B. Salanie (2014): "Partial Identification of Finite Mixtures in Econometric Models," Quantitative Economics, 5, 123–144.

Hu, Y. (2008): "Identification and estimation of nonlinear models with misclassification error using instrumental variables: A general solution," Journal of Econometrics, 144, 27–61.

Hu, Y., D. McAdams, and M. Shum (2013): "Identification of First-Price Auctions with Non-Separable Unobserved Heterogeneity," Journal of Econometrics, 174, 186–193.

Hu, Y., and S. Schennach (2008): "Instrumental Variable Treatment of Nonclassical Measurement Error Models," Econometrica, 76, 195–216.

Hu, Y., and J. Shiu (2013): "Nonparametric identification of dynamic models with unobserved covariates," Journal of Econometrics, 175(2), 116–131.

Hu, Y., and M. Shum (2012): "Nonparametric identification of dynamic models with unobserved state variables," Journal of Econometrics, 171(1), 32–44.

Jofre-Bonet, M., and M. Pesendorfer (2003): "Estimation of a Dynamic Auction Game," Econometrica, 71(5), 1443–1489.

Kasahara, H., and K. Shimotsu (2009): "Nonparametric Identification of Mixture Models of Dynamic Discrete Choices," Econometrica, 77, 135–175.

Kasahara, H., and K. Shimotsu (2014): "Non-parametric Identification and Estimation of the Number of Components in Multivariate Mixtures," Journal of the Royal Statistical Society, B, 76, 97–111.

Kasahara, H., and K. Shimotsu (2015): "Testing the Number of Components in Normal Mixture Regression Models," Journal of the American Statistical Association, 110, 1632–1645.

Krasnokutskaya, E., and K. Seim (2011): "Bid Preference Programs and Participation in Highway Procurement," American Economic Review, 101.

Krasnokutskaya, E., K. Song, and X. Tang (2020): "The Role of Quality in Internet Service Markets," Journal of Political Economy, 128, 75–117.

Lebrun, B. (1999): "First Price Auctions in the Asymmetric N Bidder Case," International Economic Review, 40(1), 125–142.

Lee, S., K. Song, and Y.-J. Whang (2013): "Testing for Functional Inequalities," Journal of Econometrics, 172, 14–32.

Lee, S., K. Song, and Y.-J. Whang (2018): "Testing for a General Class of Functional Inequalities," Econometric Theory, 34, 1018–1064.

Leeb, H., and B. M. Pötscher (2005): "Model Selection and Inference: Facts and Fiction," Econometric Theory, 11, 537–549.

Lentz, R., and D. Mortensen (2010): "Labor Market Model of Worker and Firm Heterogeneity," Annual Review of Economics, 2(1), 577–603.

Lin, C., and S. Ng (2012): "Estimation of Panel Data Models with Parameter Heterogeneity When Group Membership is Unknown," Journal of Econometric Method, 1, 42–55.

Lise, J., C. Meghir, and J.-M. Robin (2011): "Matching, Sorting and Wages," Discussion paper.

Marshall, R. C., M. J. Meurer, J.-F. Richard, and W. Stromquist (1994): "Numerical Analysis of Asymmetric First Price Auctions," Games and Economic Behavior, 7(2), 193–220.

Newey, W. K., and D. McFadden (1994): "Large sample estimation and hypothesis testing," in Handbook of Econometrics, vol. 4, pp. 2111–2245. North-Holland, Amsterdam.

Pesendorfer, M. (2000): "A Study of Collusion in First-price Auctions," Review of Economic Studies, 67, 381–411.

Pötscher, B. M. (1991): "Effects of Model Selection on Inference," Econometric Theory, 7, 163–185.

Roberts, J., and A. Sweeting (2013): "When Should Sellers Use Auctions," American Economic Review.

Song, K. (2005): "Semiparametric Specification Testing in Econometrics and Heterogeneous Panel Modeling," Ph.D. thesis, Yale University.

Su, L., Z. Shi, and P. Phillips (2016): "Identifying Latent Structures in Panel Data," Econometrica, 84(6), 2215–2264.

Sun, Y. (2005): "Estimation and Inference in Panel Structure Models," Unpublished manuscript, University of San Diego.

Supplemental Note for "Estimating Unobserved Agent Heterogeneity Using Pairwise Comparisons"

Elena Krasnokutskaya, Kyungchul Song, and Xun Tang

Johns Hopkins University, University of British Columbia, Rice University

Appendix A. Introduction

This note is a supplemental note to Krasnokutskaya, Song, and Tang (2020a). It consists of five parts. Appendix B gives details about how to derive the pairwise comparison indexes in examples of first-price and English auctions where bidders have asymmetric private values, or collusive behavior. Appendix C gives the proof of the consistency result of the group structure. Appendix D discusses a bootstrap method to construct a confidence set for the group structure. Appendix E presents further simulation results regarding the performance of our classification algorithm proposed in Krasnokutskaya, Song, and Tang (2020a), and Appendix F reports summary statistics for the data used in the empirical application in the paper and some further results.

Appendix B. Bidders with Asymmetric Values or Collusive Behavior

B.1. First-Price Auctions with Asymmetric Bidders

Let the population of bidders, N , be partitioned into K groups, each of which ischaracterized by a distinct distribution of private values F k ( · ) . To ﬁx ideas, assume that F k ( · ) has the same shape of distribution, but differs only in their location (means) ¯ q < ¯ q < ... < ¯ q K , with ¯ q k being the mean of F k . (Our method applies in a more generalsetting when F k ’s are stochastically ordered.) In this case, for a bidder i with τ ( i ) = k the mean of the private value distribution is given by q i = ¯ q k . Let S (cid:96) denote the set ofparticipants in auction (cid:96) and B (cid:96) = { B i,(cid:96) } a vector of bids. In a type-symmetric equilibrium B i,(cid:96) = β k ( V i,(cid:96) ) if τ ( i ) = k , with β k ( · ) being a bidding strategy and V i,(cid:96) the private value for i in auction (cid:96) . Deﬁne G ij ( b ) = P { B i,(cid:96) ≤ b | i, j ∈ S (cid:96) } , where S (cid:96) denotes the set of biddersin auction (cid:96) . We assume bids are rationalized by a single equilibrium.Corollary 3 of Lebrun (1999) showed that G ij ( b ) ≥ G ji ( b ) for all b in the common sup-port of B i,(cid:96) and B j,(cid:96) whenever τ ( i ) ≤ τ ( j ) . The inequality holds strictly at least over someinterval with positive Lebesgue measure. This inequality holds unconditionally when ag-gregated over bidder identities and auction characteristics. Thus pairwise comparison A similar property holds in the settings where allocations are implemented through ﬁrst-price procure-ment auctions. The only difference is that in these settings G ij ( b ) should be deﬁned as G ij ( b ) = P { B i,(cid:96) ≥ b | i, j ∈ S (cid:96) } . indexes can be constructed as follows: δ ij ≡ (cid:90) max { G ji ( b ) − G ij ( b ) , } db. (B.1)Likewise, deﬁne δ ij by replacing the integrand in δ ij by the absolute value of G ij ( b ) − G ji ( b ) . These indexes do not condition on the speciﬁc identities of bidders participatingin each auction (cid:96) . 
Thus it allows us to utilize observations from a large number of auctionswhen constructing a comparison index for any generic pair i and j .For the rest of this subsection, we derive the pairwise comparison inequalities in ageneral model of asymmetric ﬁrst-price auctions where independent private values aredrawn from distributions that are stochastically ordered. That is, F k (cid:48) ﬁrst-order stochas-tically dominates F k whenever k (cid:48) > k . Also assume that the ordering of the distributionsis strict ( F ( v ) > F ( v ) > ... > F K ( v ) ) at least for v within some non-degenerate intervalon the support. Let N ( k ) denote the set of all agents in group k .For simplicity, suppose that a bidder from group k becomes active with a ﬁxed probabil-ity that is exogenously given. Let A denote the set of entrants in a given auction and λ ( A ) denote the structure, or the proﬁle, of entrants. That is, λ ( A ) is a K -vector of integers ( | A (1) | , ..., | A ( K ) | ) , with A ( k ) being the set of entrants from group k . An entrant i submitsbid B i according to his private value v i , taking into account the competitive structure ofan auction λ ( A ) which he observes at the time of bidding. Across auctions in the data, A and { v i } i ∈ A are independent draws from the same population distribution.Let G k ( · ; λ ) be the distribution of B i when i ∈ N ( k ) . The private values are independentof λ A under exogenous entry. Part (i) of Corollary 3 in Lebrun (1999) showed that, givenany realization of λ ( A ) , the supremum of the support of bids is the same for all biddertypes. That is, for any λ , β ( v | λ ) = β ( v | λ ) = ... = β K ( v | λ ) ≡ η ( λ ) < ∞ for some η ( λ ) ∈ ( v, v ) , where β k denotes the equilibrium bidding strategy for a bidder from group k . 
Furthermore, the corollary also showed that for any λ ( A ) , F k (cid:48) ( β − k (cid:48) ( b | λ ( A ))) ≤ F k ( β − k ( b | λ ( A ))) , for all b ∈ [ v, η ( λ ( A ))] and k < k (cid:48) , and the inequality holds strictly at least over someinterval on [ v, η ( λ ( A ))] . Consider i ∈ N ( k (cid:48) ) and j ∈ N ( k ) with k (cid:48) > k . It then follows that P { B i ≤ b | i, j ∈ A } = (cid:88) λ ( A ) F k (cid:48) ( β − k (cid:48) ( b | λ ( A ))) P { λ ( A ) | i, j ∈ A }≤ (cid:88) λ ( A ) F k ( β − k ( b | λ ( A ))) P { λ ( A ) | i, j ∈ A } = P { B j ≤ b | i, j ∈ A } , with the inequality holding strictly over some non-degenerate interval in the shared bidsupport. The inequality does not condition on the identities of the entrants other than i and j .Finally, note that by a symmetric argument, a similar inequality holds in ﬁrst-priceprocurement auctions with P { B i ≥ b | i, j ∈ A } ≤ P { B j ≥ b | i, j ∈ A } (with inequalitybeing strict over some non-degenerate interval in the shared bid support), whenever theprivate cost distribution for i is stochastically lower than that of j . B.2.

English Auctions with Asymmetric Bidders
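
Before the formal derivation, the price comparison at the heart of this subsection can be checked numerically. The sketch below is a Monte Carlo illustration under assumptions that are ours and not the paper's: private values follow the CDF $F(v) = v^a$ on $[0, 1]$ (a larger $a$ gives a first-order stochastically dominant distribution), every bidder enters independently with the same probability, and auctions with fewer than two entrants are discarded because no second-highest value exists. The function name `simulate_price_gap` and all parameter values are hypothetical. The function compares the average transaction price when the strong bidder $i$ is present and the weak bidder $j$ is absent with the average price when the roles are reversed, and returns sample analogues of the two comparison indexes defined later in this subsection.

```python
import numpy as np


def simulate_price_gap(n_auctions=200_000, n_others=4, entry_prob=0.6,
                       shape_strong=2.0, shape_weak=1.0, seed=0):
    """Monte Carlo sketch of an English auction with two bidder types.

    Bidder 0 plays the role of i (strong type), bidder 1 the role of j
    (weak type); the remaining bidders alternate between the two types.
    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    other_shapes = [shape_strong if s % 2 == 0 else shape_weak
                    for s in range(n_others)]
    shapes = np.array([shape_strong, shape_weak] + other_shapes)
    n_bidders = shapes.size

    # Exogenous entry: each bidder enters independently of values.
    enters = rng.random((n_auctions, n_bidders)) < entry_prob

    # Inverse-CDF sampling from F(v) = v**shape on [0, 1].
    values = rng.random((n_auctions, n_bidders)) ** (1.0 / shapes)
    values = np.where(enters, values, -np.inf)

    # Transaction price W = second-highest value among entrants.
    price = np.sort(values, axis=1)[:, -2]
    valid = enters.sum(axis=1) >= 2

    i_without_j = valid & enters[:, 0] & ~enters[:, 1]
    j_without_i = valid & enters[:, 1] & ~enters[:, 0]

    mean_i = float(price[i_without_j].mean())
    mean_j = float(price[j_without_i].mean())
    diff = mean_i - mean_j
    # Sample analogues of max{difference, 0} and |difference|.
    return mean_i, mean_j, max(diff, 0.0), abs(diff)


if __name__ == "__main__":
    mean_i, mean_j, d_plus, d_zero = simulate_price_gap()
    print(f"mean price, i in / j out: {mean_i:.4f}")
    print(f"mean price, j in / i out: {mean_j:.4f}")
    print(f"one-sided index: {d_plus:.4f}, absolute index: {d_zero:.4f}")
```

Under these assumptions the first average should exceed the second, which is the direction stated in (B.3). The simulation conditions only on which of $i$ and $j$ entered, not on the identities of the other entrants, mirroring the pairwise nature of the comparison.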

Consider the setting in Section B.1, except that the auction format is English (ascending). The data report the identities of entrants in $A$ and the transaction price $W$ in each auction. In a dominant strategy equilibrium, the price in an auction equals the second highest private value among all entrants.

With independent private values, we show below that

$$P\{W \le w \mid i \in A, j \notin A\} \le P\{W \le w \mid j \in A, i \notin A\}, \tag{B.2}$$

for all $w$ over the intersection of support, whenever $\tau(i) > \tau(j)$. Furthermore, the inequality holds strictly for some $w$ over a set of positive measure in common support. This implies

$$E[W \mid i \in A, j \notin A] > E[W \mid j \in A, i \notin A]. \tag{B.3}$$

The intuition behind (B.2) is as follows. Given any structure of entrants who compete with $i$ or $j$ (but not both), the distribution of the transaction price is stochastically higher when $i$ is present but $j$ is not than when $j$ is present but $i$ is not. Loosely speaking, when $j$ is replaced by the stronger type $i$ in the set of entrants, the overall profile of value distributions becomes "stochastically higher". Then the law of iterated expectations implies (B.2) and (B.3).

To infer the group structure, define the following indexes:

$$\delta^{+}_{ij} = \max\{E[W \mid i \in A, j \notin A] - E[W \mid j \in A, i \notin A],\, 0\},$$

and

$$\delta^{0}_{ij} = \big|E[W \mid i \in A, j \notin A] - E[W \mid j \in A, i \notin A]\big|.$$

One can then use the procedure proposed in the main text to classify the bidders based on pairwise comparison.

We now derive (B.2) formally. Let $V_i$ denote the private value for bidder $i$. Consider the case where $i \in N(k')$ and $j \in N(k)$ with $k' > k$. Let $\lambda(A)$ denote the $K$-vector of integers that summarizes the group structure of the set of entrants $A$. Let $1_k$ denote the unit vector with the $k$-th component being 1. Then define

$$H_{j,i}(w; \lambda^*) \equiv P\{W \le w \mid j \in A, i \notin A, \lambda(A \setminus \{j\}) = \lambda^*\}$$
$$= P\Big\{\max_{s \in A} V_s \le w \,\Big|\, \lambda(A) = \lambda^* + 1_k\Big\} + P\Big\{\max_{s \in A} V_s > w,\ W \le w \,\Big|\, \lambda(A) = \lambda^* + 1_k\Big\},$$

where the first term on the right-hand side equals $F_k(w) \prod_{\ell=1}^{K} F_\ell(w)^{\lambda^*_\ell}$, and the second term on the right-hand side is

$$P\Big\{\max_{s \in A \setminus \{j\}} V_s \le w,\ V_j > w \,\Big|\, \lambda(A \setminus \{j\}) = \lambda^*\Big\} + P\Big\{V_j \le w,\ \max_{s \in A \setminus \{j\}} V_s > w,\ W \le w \,\Big|\, \lambda(A \setminus \{j\}) = \lambda^*\Big\}$$
$$= [1 - F_k(w)] \prod_{\ell=1}^{K} F_\ell(w)^{\lambda^*_\ell} + F_k(w)\, \phi(w; \lambda^*),$$

where $\phi(w; \lambda^*)$ denotes the probability that the maximum value in $A \setminus \{j\}$ is strictly greater than $w$ while the second highest value in $A \setminus \{j\}$ is less than or equal to $w$, conditional on the classification $\lambda(A \setminus \{j\}) = \lambda^*$. Therefore

$$H_{j,i}(w; \lambda^*) = \prod_{\ell=1}^{K} F_\ell(w)^{\lambda^*_\ell} + F_k(w)\, \phi(w; \lambda^*).$$

By the same argument,

$$H_{i,j}(w; \lambda^*) \equiv P\{W \le w \mid i \in A, j \notin A, \lambda(A \setminus \{i\}) = \lambda^*\} = \prod_{\ell=1}^{K} F_\ell(w)^{\lambda^*_\ell} + F_{k'}(w)\, \phi(w; \lambda^*).$$

It is then straightforward to show that, for any $\lambda^*$, $F_{k'} \succeq_{F.S.D.} F_k$ implies $H_{i,j}(w; \lambda^*) \le H_{j,i}(w; \lambda^*)$ over the union of the $K$ supports of $\{F_\ell : 1 \le \ell \le K\}$, and the inequality holds strictly at least for some $w$ in an interval on the intersection of the $K$ supports of $\{F_\ell : 1 \le \ell \le K\}$. Under exogenous entry, we get

$$P\{W \le w \mid i \in A, j \notin A\} \le P\{W \le w \mid j \in A, i \notin A\},$$

after integrating out $\lambda^*$. The inequality holds strictly for some $w$ over common support.

One may wonder whether we can recover the classification of bidders in the English auction example through a "global" approach when the identity of the winner is reported in the data, that is, by comparing the distribution of transaction prices when $i$ is the winner with that when $j$ is the winner, as opposed to the pairwise comparison approach proposed above. Let us explain why this is not feasible.

For any $i \in N(k')$ and $j \in N(k)$ with $F_{k'} \succeq_{F.S.D.} F_k$, let $A \setminus \{i, j\}$ denote the set of entrants out of $N \setminus \{i, j\}$ and let $M(A \setminus \{i, j\}) \equiv \max\{V_s : s \in A \setminus \{i, j\}\}$. Let $\varphi(w; \lambda^*)$ denote the distribution of $M(A \setminus \{i, j\})$ conditional on $\lambda(A \setminus \{i, j\}) = \lambda^*$. Let $D$ denote the identity of the winner in the auction, and let $S_k$ denote the survival function for the private value of a type-$k$ bidder. Then,

$$P\{W \le w, D = i \mid i \in A\} = p_j P\{W \le w, D = i \mid i, j \in A\} + (1 - p_j) P\{W \le w, D = i \mid i \in A, j \notin A\},$$

where $p_j$ is shorthand for $j$'s entry probability. Also note that, by construction, once conditioned on the realized set of entrants from $A \setminus \{i, j\}$, we have

$$P\{W \le w, D = i \mid i, j \in A, \lambda(A \setminus \{i, j\}) = \lambda^*\} = \int_{-\infty}^{w} F_k(t)\, \varphi(t; \lambda^*)\, dF_{k'}(t) + S_{k'}(w) F_k(w)\, \varphi(w; \lambda^*),$$

and

$$P\{W \le w, D = i \mid i \in A, j \notin A, \lambda(A \setminus \{i, j\}) = \lambda^*\} = \int_{-\infty}^{w} \varphi(t; \lambda^*)\, dF_{k'}(t) + S_{k'}(w)\, \varphi(w; \lambda^*).$$

Likewise, $P\{W \le w, D = j \mid j \in A\}$ can be written by swapping the roles of $i$ and $j$ and swapping the roles of $k$ and $k'$ respectively. Then it can be shown that

$$P\{W \le w, D = i \mid i \in A, j \in A\} > P\{W \le w, D = j \mid i \in A, j \in A\}. \tag{B.4}$$

To see why the inequality in (B.4) holds, note that for any $\lambda^*$, the difference

$$P\{W \le w, D = i \mid i, j \in A, \lambda(A \setminus \{i, j\}) = \lambda^*\} - P\{W \le w, D = j \mid i, j \in A, \lambda(A \setminus \{i, j\}) = \lambda^*\}$$

can be written as

$$\left[\int_{-\infty}^{w} F_k(t)\, \varphi(t; \lambda^*)\, dF_{k'}(t) - \int_{-\infty}^{w} F_{k'}(t)\, \varphi(t; \lambda^*)\, dF_k(t)\right] + \varphi(w; \lambda^*) \left[S_{k'}(w) F_k(w) - S_k(w) F_{k'}(w)\right].$$

The first square bracket in the display above is positive because

$$\int_{-\infty}^{w} F_k(t)\, \varphi(t; \lambda^*)\, dF_{k'}(t) > \int_{-\infty}^{w} F_{k'}(t)\, \varphi(t; \lambda^*)\, dF_{k'}(t) > \int_{-\infty}^{w} F_{k'}(t)\, \varphi(t; \lambda^*)\, dF_k(t).$$

Furthermore, the second square bracket in the display is also positive because "$F_{k'} \succeq_{F.S.D.} F_k$" implies $S_{k'}(w) \ge S_k(w)$ and $F_k(w) \ge F_{k'}(w)$ for all $w$, and these inequalities hold strictly for some set of $w$ with positive measure. Integrating out $\lambda^*$ on both sides of the inequality

$$P\{W \le w, D = i \mid i, j \in A, \lambda(A \setminus \{i, j\}) = \lambda^*\} > P\{W \le w, D = j \mid i, j \in A, \lambda(A \setminus \{i, j\}) = \lambda^*\}$$

yields the first inequality in (B.4).

Similarly, the difference between $P\{W \le w, D = i \mid i \in A, j \notin A, \lambda(A \setminus \{i, j\}) = \lambda^*\}$ and $P\{W \le w, D = j \mid j \in A, i \notin A, \lambda(A \setminus \{i, j\}) = \lambda^*\}$ equals

$$\left[\int_{-\infty}^{w} \varphi(t; \lambda^*)\, dF_{k'}(t) - \int_{-\infty}^{w} \varphi(t; \lambda^*)\, dF_k(t)\right] + \varphi(w; \lambda^*) \left[S_{k'}(w) - S_k(w)\right],$$

which must be positive because the two terms in the square brackets are positive. Now we write

$$P\{W \le w, D = i \mid i \in A\} = p_j P\{W \le w, D = i \mid i, j \in A\} + (1 - p_j) P\{W \le w, D = i \mid i \in A, j \notin A\}, \tag{B.5}$$

where $p_j \equiv P(j \in A)$. A similar expression exists for $P\{W \le w, D = j \mid j \in A\}$ by swapping the roles of $i$ and $j$ in (B.5). Therefore the difference between $P\{W \le w, D = i \mid i \in A\}$ and $P\{W \le w, D = j \mid j \in A\}$ is also indeterminate in the absence of knowledge about $p_i$, $p_j$.

B.3. Bidding Cartel in First-Price Procurement Auctions

Our method can be used to detect the identities of cartel members in a model of first-price procurement auctions in which a bidding cartel competes with competitive non-colluding bidders (Pesendorfer (2000)). Let the population of bidders/firms $N$ be partitioned into a set of colluding firms $N(c)$ and non-colluding firms $N(nc)$. In each auction, the set of potential bidders (who are interested in bidding for the contract) $A$ is partitioned into $A(c)$ and $A(nc)$. The cardinality of $A(c)$ is common knowledge among the bidders. The potential bidders in $A(c)$ collude by refraining from participation except for one bidder $i^*$ who is chosen among them to submit a bid.

In an efficient truth-revealing mechanism considered in Pesendorfer (2000), the cartel member that has the lowest cost is selected to be the sole bidder from the cartel. That is, $i^*(A(c)) = \arg\min_{j \in A(c)} C_j$, where $C_j$ is the private cost of bidder $j$. Thus, the set of final entrants who are observed to submit bids in the data is $A^* \equiv \{i^*(A(c))\} \cup A(nc)$. (The set of colluding potential bidders is not reported in the data available to the researcher.)

We maintain that across the auctions in the data bidders' private costs are independent draws from the same distribution. Bidders are ex ante symmetric in that each bidder's private cost is drawn independently from the same distribution. Entrants know that a representative of the cartel is participating in bidding, and all follow Bayesian Nash equilibrium bidding strategies.

We are interested in detecting the identities of the set of colluding firms in $N(c)$ from the reported bidding and participation decisions. Let $N'(c) \subset N$ denote the set of bidders such that no two bidders in $N'(c)$ are ever observed to compete with each other in the bidding stage. By construction, $N(c) \subseteq N'(c)$, so the latter should be interpreted as a set of suspects for collusion. However, the set $N'(c)$ could also contain innocent non-colluding bidders who are never observed to compete with each other in the data because of random entry in a finite sample. Our goal is to use bidding data to separate $N(c)$ from $N'(c) \setminus N(c) \equiv N(nc) \cap N'(c)$.

Pesendorfer (2000) (Remark 3) shows that in any given auction with participants $A(c) \cup A(nc)$, the distribution of bids from a non-colluding bidder $j$ first-order stochastically dominates the distribution of the bids from the sole bidder representing the cartel, $i^*$. [Footnote: The cartel is sustained through side payments among its members.] Specifically, for any such $i^*$ and $j$,

$$P\{B_{i^*} \le b \mid i^* \in A^*, |A^*|\} > P\{B_j \le b \mid j \in A^*, |A^*|\} \tag{B.6}$$

for all $b$ on the common support of the two distributions.

The intuition, as is noted in Pesendorfer (2000), is that the sole bidder representing a cartel has a higher hazard rate than a non-colluding bidder. That is, relative to a competitive bidder, the cartel representative has a higher probability of having a low cost conditional on the costs being above any fixed threshold. Besides, ex ante symmetry among bidders implies that

$$P\{B_i \le b \mid i \in A^*, |A^*|\} = P\{B_j \le b \mid j \in A^*, |A^*|\}$$

whenever $i, j \in N(c)$ or $i, j \in N(nc) \cap N'(c)$.

We can then construct pairwise comparison indexes $\delta^{+}_{ij}$, $\delta^{0}_{ij}$ by replacing $G_{ij}$ and $G_{ji}$ in (B.1) with the left- and right-hand side of (B.6).

Appendix C. Mathematical Proofs

C.1. Proof of Consistency of Classification

We provide a proof of consistency of the classiﬁcation allowing both n and L to divergeto inﬁnity jointly. This requires a more elaborated set of assumptions than Assumption4.1. It is not hard to use the same proof to prove Theorem 4.1. Assumption C.1.

There exist sequences ρ L , κ L → and λ L → ∞ and constants c sij > , s ∈ { + , , −} such that along each sequence of probabilities P L ∈ P ,ε , the followingconditions hold for all s ∈ { + , , −} , s (cid:48) ∈ { + , −} , as n, L → ∞ .(i) max i,j ∈ N : i (cid:54) = j,τ ( i )= τ ( j ) (cid:12)(cid:12)(cid:12) P (cid:110) ˜ T sij ≤ t (cid:111) − F sij, ∞ ( t ) (cid:12)(cid:12)(cid:12) → , where F sij, ∞ is a continuous CDF.(ii)(a) max i,j ∈ N : i (cid:54) = j,τ ( i ) >τ ( j ) P (cid:110) | ˜ T + ij /λ L − c + ij | + | ˜ T ij /λ L − c ij | ≥ κ L (cid:111) = O ( ρ L ) . (ii)(b) For each pair i, j ∈ N such that i (cid:54) = j , if τ ( i ) ≥ τ ( j ) , there exists a CDF F − ij suchthat for all t ∈ R , P { ˜ T − ij ≤ t } ≥ F − ij ( t ) . Pesendorfer (2000) proved this result using the implicit assumption that the distribution of costs for non-colluding bidders and that for the sole cartel is common knowledge among all participants in an auction.(See proof of Remark 3 in Pesendorfer (2000).) This assumption is consistent with the informationalenvironment that the partition of N into N ( c ) and N ( nc ) is common knowledge among all bidders. Note that the statement is conditional since the bidding strategies depend on the cardinality of the ﬁnalset of bidders | A ∗ | . (iii)(a) max i,j ∈ N : i (cid:54) = j,τ ( i ) <τ ( j ) P (cid:110) | ˜ T − ij /λ L − c − ij | + | ˜ T ij /λ L − c ij | ≥ κ L (cid:111) = O ( ρ L ) . (iii)(b) For each pair i, j ∈ N such that i (cid:54) = j , if τ ( i ) ≤ τ ( j ) , there exists a CDF F + ij suchthat for all t ∈ R , P { ˜ T + ij ≤ t } ≥ F + ij ( t ) . Assumption C.2.

Suppose that for each pair $i, j \in N$ such that $i \ne j$, $F^{s'}_{ij}$ and $F^{s}_{ij,\infty}$, $s, s' \in \{+, 0, -\}$, are CDFs and $\rho_L$, $\kappa_L$, and $\lambda_L$ are the sequences in Assumption C.1. Then the following holds as $n, L \to \infty$.

(i) $\max_{i,j \in N : i \ne j} P\big\{\|\tilde{F}^{s}_{ij} - F^{s}_{ij,\infty}\|_\infty > \varepsilon_L\big\} = O(\rho_L)$ and $\max_{i,j \in N : i \ne j} \big\|F^{s'}_{ij} - F^{s'}_{ij,\infty}\big\|_\infty = O(\varepsilon_L)$, where $\varepsilon_L \to 0$.

(ii) $r_L^{-1} \log\big(1 - F^{s}_{ij,\infty}((c - \kappa_L)\lambda_L) + \varepsilon_L\big) \to -\infty$, for any constant $c > 0$.

(iii) $n\,(\exp(-r_L/2) + \varepsilon_L + \rho_L) \to 0$.

Assumptions C.1 and C.2 are variants of Assumption 4.1 that state the rates of convergence explicitly. Assumption C.2(iii) is new here because we now allow both $n$ and $L$ to increase to infinity. This assumption says that, for the consistency of the estimated group structure, $n$ must increase sufficiently slowly relative to $L$. This condition is trivially satisfied when $n$ is fixed and $L$ increases to infinity. Note that the conditions are high level conditions suited to the generic set-up here. With a more detailed specification of the pairwise comparison indexes $\delta^{+}_{ij}$ and $\delta^{0}_{ij}$, test statistics and $p$-values, one may improve the conditions. Using high level conditions also enables us to focus only on the more novel aspects of the mathematical proofs.

While the existence of such sequences as $\kappa_L$, $\varepsilon_L$, and $\rho_L$ is to be expected, obtaining their precise forms using lower level conditions requires substantial yet tedious arguments that depend on the way the test statistics and the $p$-values are constructed. Essentially, what one needs to obtain for lower level conditions is the rate of convergence of the test statistics to the limiting distribution both under the null hypothesis and under the local alternatives. For example, if one follows the approach of Lee, Song, and Whang (2018), the rate of convergence is ultimately delivered by a Berry-Esseen bound (for a sum of independent random variables) used to obtain the asymptotic normality of test statistics (under appropriate mean shifts). However, the final result requires, in combination with this bound, carefully derived rate results on asymptotically negligible terms that arise, among other things, from the Poissonization technique used in the paper. While these additional developments are feasible, they do not add insights to the main idea of the paper. Hence we do not pursue details in this direction here.

Let us first state the theorem.

Theorem C.1.

Suppose that Assumptions C.1 - C.2 hold, and that g ( L ) → ∞ and g ( L ) /r L → as L → ∞ . Then, for any ε > , along a sequence of probabilities P n,L from P ,ε , P n,L { ˆ K = K } → , as n, L → ∞ , and the estimated group structure ˆ T ˆ K satisﬁes that as n, L → ∞ ,P n,L { ˆ T ˆ K = T } → . The proof of this theorem is long. We ﬁrst prepare some auxiliary lemmas. Throughoutthis section, we assume that Assumptions C.1 and C.2 hold. First, for any subset N (cid:48) ⊂ N ,we deﬁne N (cid:48) ( i ) = N (cid:48) \ { i } , and let N (cid:48) ( i ) = { j ∈ N (cid:48) ( i ) : τ ( i ) > τ ( j ) } , (C.1) N (cid:48) ( i ) = { j ∈ N (cid:48) ( i ) : τ ( i ) < τ ( j ) } ,N (cid:48) ( i ) = { j ∈ N (cid:48) ( i ) : τ ( i ) ≥ τ ( j ) } , and N (cid:48) ( i ) = { j ∈ N (cid:48) ( i ) : τ ( i ) ≤ τ ( j ) } . We also deﬁne ˆ N (cid:48) ( i ) = { j ∈ N (cid:48) ( i ) : log ˆ p + ij ≤ log ˆ p − ij − r L } and ˆ N (cid:48) ( i ) = { j ∈ N (cid:48) ( i ) : log ˆ p − ij ≤ log ˆ p + ij − r L } . Following the convention, given a CDF G , we deﬁne its generalized inverse G − as G − ( t ) = inf { s ∈ R : G ( s ) ≥ t } , t ∈ (0 , . Lemma C.1. (i) max i ∈ N P (cid:40) min j ∈ N (cid:48) ( i ) log ˆ p − ij < − r L / (cid:41) = O ( nω n,L )max i ∈ N P (cid:26) max j ∈ N (cid:48) ( i ) log ˆ p + ij ≥ − r L / (cid:27) = O ( nρ L )max i ∈ N P (cid:40) min j ∈ N (cid:48) ( i ) log ˆ p + ij < − r L / (cid:41) = O ( nω n,L ) , and max i ∈ N P (cid:26) max j ∈ N (cid:48) ( i ) log ˆ p − ij ≥ − r L / (cid:27) = O ( nρ L ) , where ω n,L = exp (cid:16) − r L (cid:17) + ε L + ρ L . (ii) max i,j ∈ N : τ ( i ) (cid:54) = τ ( j ) P (cid:8) log ˆ p ij ≥ − r L (cid:9) = O ( ρ L ) , and max i,j ∈ N : i (cid:54) = j,τ ( i )= τ ( j ) P (cid:8) log ˆ p ij ≤ − r L (cid:9) = O ( ω n,L ) . Proof: (i) We will show the ﬁrst and the second statements only. The third and thefourth statements can be proved similarly. Let us prove the ﬁrst statement ﬁrst. 
For all i ∈ N , P (cid:40) min j ∈ N (cid:48) ( i ) log ˆ p − ij < − r L (cid:41) ≤ (cid:88) j ∈ N ( i ) P (cid:110) log ˆ p − ij ≤ − r L (cid:111) . Note that for each j ∈ N ( i ) , P (cid:110) log ˆ p − ij < − r L (cid:111) = P (cid:110) − exp (cid:16) − r L (cid:17) < ˜ F − ij ( ˜ T − ij ) (cid:111) ≤ P (cid:110) − exp (cid:16) − r L (cid:17) < F − ij, ∞ ( ˜ T − ij ) + ε L (cid:111) + P {(cid:107) ˜ F − ij − F − ij, ∞ (cid:107) ∞ ≥ ε L }≤ P (cid:110) − exp (cid:16) − r L (cid:17) < F − ij, ∞ ( ˜ T − ij ) + ε L (cid:111) + O ( ρ L ) , where the last O ( ρ L ) term is uniform over i, j ∈ N such that τ ( i ) (cid:54) = τ ( j ) and is due toAssumption C.2(i) and Markov’s inequality. As for the leading probability on the rightend side, we use the fact that for any CDF G and any t ∈ (0 , and x ∈ R , G − ( t ) ≥ x ifand only if t ≥ G ( x ) , (e.g. Lemma A.1.1. of Reiss (1989)), p.318, and bound P (cid:110) − exp (cid:16) − r L (cid:17) < F − ij, ∞ ( ˜ T − ij ) + ε L (cid:111) = 1 − P (cid:110) ˜ T − ij ≤ ( F − ij, ∞ ) − (cid:16) − exp (cid:16) − r L (cid:17) − ε L (cid:17)(cid:111) ≤ − F − ij (cid:16) ( F − ij, ∞ ) − (cid:16) − exp (cid:16) − r L (cid:17) − ε L (cid:17)(cid:17) , by Assumption C.1(ii)(b) and because j ∈ N (cid:48) ( i ) . By Assumption C.2(i), the last term isequal to − F − ij, ∞ (cid:16) ( F − ij, ∞ ) − (cid:16) − exp (cid:16) − r L (cid:17) − ε L (cid:17)(cid:17) + O ( ε L ) ≤ exp (cid:16) − r L (cid:17) + ε L + O ( ε L ) = O ( ω n,L ) , uniformly over i, j ∈ N such that τ ( i ) (cid:54) = τ ( j ) . The inequality above is due to the deﬁnitionof ( F − ij, ∞ ) − . Thus we obtain the ﬁrst statement.Let us consider the second statement. Suppose that τ ( i ) > τ ( j ) . We let A ,L = (cid:110) (cid:107) ˜ F + ij − F + ij, ∞ (cid:107) ∞ ≤ ε L (cid:111) and A ,L = (cid:110) | ˜ T + ij /λ L − c + ij | ≤ κ L (cid:111) , and let A L = A ,L ∩ A ,L . 
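
The bound for the first statement relies on a property that follows directly from the definition of the generalized inverse: for any CDF $G$ and any $t \in (0, 1)$, $G(G^{-}(t)) \ge t$. The sketch below illustrates this for a step CDF. The helper names `step_cdf` and `generalized_inverse`, as well as the three-point example distribution, are hypothetical and chosen only for illustration.

```python
import numpy as np


def step_cdf(support, cdf_values, x):
    """Evaluate a right-continuous step CDF G at x.

    `support` holds the sorted jump points and `cdf_values` the value of
    G at each jump point (non-decreasing, ending at 1).
    """
    idx = np.searchsorted(support, x, side="right") - 1
    return 0.0 if idx < 0 else float(cdf_values[idx])


def generalized_inverse(support, cdf_values, t):
    """Return G^-(t) = inf{s : G(s) >= t} for t in (0, 1)."""
    # First jump point at which the CDF reaches level t.
    idx = np.searchsorted(cdf_values, t, side="left")
    return float(support[idx])


# Illustrative distribution: mass 0.2, 0.5, 0.3 on the points 1, 2, 3.
support = np.array([1.0, 2.0, 3.0])
cdf_values = np.array([0.2, 0.7, 1.0])

for t in (0.1, 0.2, 0.5, 0.7, 0.9):
    s = generalized_inverse(support, cdf_values, t)
    # G(G^-(t)) >= t, equivalently 1 - G(G^-(t)) <= 1 - t.
    assert step_cdf(support, cdf_values, s) >= t
```

Because the example CDF has jumps, $G(G^{-}(t))$ can be strictly larger than $t$ (for instance at $t = 0.5$), which is why the property is an inequality rather than an equality.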
Note that P { log ˆ p + ij ≥ − r L / } ≤ P { log ˆ p + ij ≥ − r L / } ∩ A L + P A cL ≤ P { log ˆ p + ij ≥ − r L / } ∩ A L + O ( ρ L ) , where the last O ( ρ L ) term is due to Assumption C.1(ii)(a) and Assumption C.2(i), and isuniform over i, j ∈ N sucht hat i (cid:54) = j . For the leading probability on the right hand side,note that P { log ˆ p + ij ≥ − r L / } ∩ A L = P { log(1 − ˜ F + ij ( ˜ T + ij )) ≥ − r L / } ∩ A L ≤ P { log(1 − F + ij, ∞ ( ˜ T + ij ) + ε L ) ≥ − r L / } ∩ A L ≤ P A L · { − F + ij, ∞ (( c + ij − κ L ) λ L ) + ε L ≥ exp( − r L / } . The last indicator becomes zero from some large L on, with this large L chosen to beindependent of i, j , by Condition (ii) of Assumption C.2. Thus, we obtain the secondstatement.(ii) The proof of the ﬁrst statement is the same as that of the second statement of (i),and the proof of the second statement is the same as that of the ﬁrst statement of (i). (Asfor the proof of the second statement, we have the weak inequality instead of the strictinequality, but this does not make difference to the arguments, because F ij, ∞ is assumedto be continuous.) Details are omitted. (cid:4) Lemma C.2.

Suppose that N (cid:48) ⊂ N contains some i, j ∈ N such that τ ( i ) (cid:54) = τ ( j ) . Then, max i ∈ N P { N (cid:48) ( i ) = ˆ N (cid:48) ( i ) } = 1 + O ( nω n,L );max i ∈ N P { N (cid:48) ( i ) = ˆ N (cid:48) ( i ) } = 1 + O ( nω n,L );max i ∈ N P { N (cid:48) ( i ) = N (cid:48) ( i ) \ ˆ N (cid:48) ( i ) } = 1 + O ( nω n,L );max i ∈ N P { N (cid:48) ( i ) = N (cid:48) ( i ) \ ˆ N (cid:48) ( i ) } = 1 + O ( nω n,L ) . Proof:

We show only the ﬁrst and the third statements. The remaining statements canbe proved similarly. Deﬁne A L = (cid:40) min j ∈ N (cid:48) ( i ) log ˆ p − ij ≥ − r L (cid:41) . Note that P (cid:110) N (cid:48) ( i ) ⊂ ˆ N (cid:48) ( i ) (cid:111) = P (cid:26) max j ∈ N (cid:48) ( i ) log ˆ p + ij − log ˆ p − ij ≤ − r L (cid:27) ≥ P (cid:40) max j ∈ N (cid:48) ( i ) log ˆ p + ij − min j ∈ N (cid:48) ( i ) log ˆ p − ij ≤ − r L (cid:41) ≥ P (cid:26) max j ∈ N (cid:48) ( i ) log ˆ p + ij ≤ − r L (cid:27) ∩ A L ≥ P (cid:26) max j ∈ N (cid:48) ( i ) log ˆ p + ij ≤ − r L (cid:27) − P A cL = 1 − P (cid:26) max j ∈ N (cid:48) ( i ) log ˆ p + ij > − r L (cid:27) − P A cL = 1 + O ( nω n,L ) , where the last inequality follows by the ﬁrst and the second statements of Lemma C.1(i).Thus we have P (cid:110) N (cid:48) ( i ) ⊂ ˆ N (cid:48) ( i ) (cid:111) = 1 + O ( nω n,L ) , (C.2)as n, L → ∞ . On the other hand, P (cid:110) N (cid:48) ( i ) ⊂ N (cid:48) ( i ) \ ˆ N (cid:48) ( i ) (cid:111) ≥ P (cid:40) min j ∈ N (cid:48) ( i ) log ˆ p + ij − log ˆ p − ij > − r L (cid:41) (C.3) ≥ P (cid:40) min j ∈ N (cid:48) ( i ) log ˆ p + ij > − r L (cid:41) = 1 + O ( nω n,L ) . (C.4)The second inequality follows because log ˆ p − ij ≤ and the last equality follows by thethird statement of Lemma C.1(i). Note that the term O ( nω n,L ) is uniform over i ∈ N .Since N (cid:48) ( i ) and N (cid:48) ( i ) partition N (cid:48) ( i ) , and ˆ N (cid:48) ( i ) and N (cid:48) ( i ) \ ˆ N (cid:48) ( i ) also partition N (cid:48) ( i ) , itfollows that max i ∈ N P { N (cid:48) ( i ) = ˆ N (cid:48) ( i ) } = 1 + O ( nω n,L ) , and max i ∈ N P { N (cid:48) ( i ) = N (cid:48) ( i ) \ ˆ N (cid:48) ( i ) } = 1 + O ( nω n,L ) , as n, L → ∞ . The remaining statements can be shown similarly. (cid:4) Deﬁnition C.1. 
(i) An ordered partition ( N (cid:48) , ..., N (cid:48) s ) of a subset N (cid:48) ⊂ N is said to be a τ -ordered partition , if for any r < r , r , r = 1 , , ..., s , we have τ ( i ) < τ ( j ) whenever i ∈ N (cid:48) r and j ∈ N (cid:48) r .(ii) Let N τ ( N (cid:48) ) denote the set of τ -ordered partitions of N (cid:48) .When an ordered partition ( N (cid:48) , ..., N (cid:48) s ) is a τ -ordered partition, and τ partitions N (cid:48) into K groups (i.e., any two agents, say i, j , from two different groups from the K groupshave τ ( i ) (cid:54) = τ ( j ) ), we must have s ≤ K by the deﬁnition of τ -ordered partition. Hencesome group in the ordered partition ( N (cid:48) , ..., N (cid:48) s ) can have agents with different types. Deﬁnition C.2.

An estimated ordered partition ( ˆ N (cid:48) , ..., ˆ N (cid:48) s ) of a subset N (cid:48) ⊂ N is said tobe asymptotically τ -compatible at rate u n,L , if P { ( ˆ N (cid:48) , ..., ˆ N (cid:48) s ) ∈ N τ ( N (cid:48) ) } = 1 + O ( u n,L ) , as n, L → ∞ . Lemma C.3.

For each π = ( N (cid:48) , ..., N (cid:48) s ) ∈ N τ ( N ) , let R ( π ) ⊂ { , , ..., s } be such that for all r ∈ R ( π ) , N (cid:48) r has some i, j ∈ N (cid:48) r satisfying τ ( i ) (cid:54) = τ ( j ) , and let R ( π ) = { , , ..., s } \ R ( π ) so that for all r ∈ R ( π ) , and for all i, j ∈ N (cid:48) r , we have τ ( i ) = τ ( j ) . Let B ( π ) , π ∈ N τ ( N (cid:48) ) ,be disjoint events. Then, (cid:88) π ∈ N τ ( N (cid:48) ) P (cid:26) ∃ r ∈ R ( π ) , min i,j ∈ N (cid:48) r : i (cid:54) = j log ˆ p ij > − r L (cid:27) ∩ B ( π ) = O ( n ρ L ) , and (cid:88) π ∈ N τ ( N (cid:48) ) P (cid:26) ∃ r ∈ R ( π ) , min i,j ∈ N (cid:48) r : i (cid:54) = j log ˆ p ij ≤ − r L (cid:27) ∩ B ( π ) = O ( n ω n,L ) . Proof:

For each π ∈ N τ ( N (cid:48) ) , let K ( π ) denote the total number of the groups in π .Then we can write K ( π ) = | R ( π ) | + | R ( π ) | . If π = ( N (cid:48) , ..., N (cid:48) K ( π ) ) ∈ N τ ( N (cid:48) ) , we write ( N (cid:48) , ..., N (cid:48) K ( π ) ) = ( N (cid:48) ( π ) , ..., N (cid:48) K ( π ) ( π )) to make explicit the dependence of each group on π ∈ N τ ( N (cid:48) ) . Let B τ ( N (cid:48) ) = (cid:91) π ∈ N τ ( N (cid:48) ) B ( π ) . The ﬁrst statement of the lemma follows because (cid:88) π ∈ N τ ( N (cid:48) ) P (cid:40) ∃ r ∈ R ( π ) , min i,j ∈ N (cid:48) r ( π ): i (cid:54) = j log ˆ p ij > − r L (cid:41) ∩ B ( π ) ≤ (cid:88) π ∈ N τ ( N (cid:48) ) (cid:88) i,j ∈ N : τ ( i ) (cid:54) = τ ( j ) P (cid:8) log ˆ p ij > − r L (cid:9) ∩ B ( π )= (cid:88) i,j ∈ N : τ ( i ) (cid:54) = τ ( j ) P (cid:8) log ˆ p ij > − r L (cid:9) ∩ B τ ( N (cid:48) ) ≤ (cid:88) i,j ∈ N : τ ( i ) (cid:54) = τ ( j ) P (cid:8) log ˆ p ij > − r L (cid:9) = O ( n ρ L ) , by the assumption that B ( π ) ’s are disjoint. The last equality follows by Lemma C.1(ii).As for the second statement of the lemma, similarly, (cid:88) π ∈ N τ ( N (cid:48) ) P (cid:40) ∃ r ∈ R ( π ) , min i,j ∈ N (cid:48) r ( π ): i (cid:54) = j log ˆ p ij ≤ − r L (cid:41) ∩ B ( π ) ≤ (cid:88) i,j ∈ N : τ ( i )= τ ( j ) P (cid:8) log ˆ p ij ≤ − r L (cid:9) . Again, the last sum is O ( n ω n,L ) by Lemma C.1(ii). (cid:4) Lemma C.4.

Suppose that an estimated ordered partition ( ˆ N , ..., ˆ N s ) of N is asymptotically τ -compatible at rate u n,L . Then the Selection Step in the Selection-Split Algorithm appliedto this ordered partition selects a set, say, ˆ N ˆ r ⊂ N , with ˆ r = 1 , ..., s , such that P (cid:110) ∃ i, j ∈ ˆ N ˆ r , τ ( i ) (cid:54) = τ ( j ) (cid:111) = 1 + O ( n ω n,L + u n,L ) , as n, L → ∞ . Proof:

Let us consider the event that ( ˆ N , ..., ˆ N s ) coincides with a τ -ordered partition,say, π = ( N ( π ) , ..., N s ( π )) ∈ N τ ( N ) , and denote the event by A ( π ) . Note that A ( π ) ’s aredisjoint across π ∈ N τ ( N ) , and (cid:88) π ∈ N τ ( N ) P A ( π ) = 1 + O ( u n,L ) , (C.5)by the assumption that ( ˆ N , ..., ˆ N s ) of N is asymptotically τ -compatible at rate u n,L . Given π ∈ N τ ( π ) , let R ( π ) and R ( π ) be as deﬁned in Lemma C.3. Then, (cid:88) π ∈ N τ ( N ) P { ˆ r ∈ R ( π ) } ∩ A ( π ) (C.6) ≥ (cid:88) π ∈ N τ ( N ) P (cid:40) ∀ r ∈ R ( π ) , ∀ r ∈ R ( π ) , min i,j ∈ ˆ N r log ˆ p ij < min i,j ∈ ˆ N r log ˆ p ij (cid:41) ∩ A ( π ) , by the way the Selection Step is deﬁned. Note that (cid:88) π ∈ N τ ( N ) P (cid:40) ∀ r ∈ R ( π ) , ∀ r ∈ R ( π ) , min i,j ∈ ˆ N r log ˆ p ij < min i,j ∈ ˆ N r log ˆ p ij (cid:41) ∩ A ( π ) (C.7) ≥ (cid:88) π ∈ N τ ( N ) P (cid:40) ∀ r ∈ R ( π ) , ∀ r ∈ R ( π ) , min i,j ∈ ˆ N r log ˆ p ij ≤ − r L , min i,j ∈ ˆ N r log ˆ p ij > − r L (cid:41) ∩ A ( π ) ≥ (cid:88) π ∈ N τ ( N ) P (cid:40) ∀ r ∈ R ( π ) , min i,j ∈ ˆ N r log ˆ p ij > − r L (cid:41) ∩ A ( π ) − (cid:88) π ∈ N τ ( N ) P (cid:40) ∃ r ∈ R ( π ) , min i,j ∈ ˆ N r log ˆ p ij > − r L (cid:41) ∩ A ( π ) . By the deﬁnition of A ( π ) , the second to the last sum in (C.7) is written as (cid:88) π ∈ N τ ( N ) P (cid:26) ∀ r ∈ R ( π ) , min i,j ∈ N r ( π ) log ˆ p ij > − r L (cid:27) ∩ A ( π )= (cid:88) π ∈ N τ ( N ) P A ( π ) − (cid:88) π ∈ N τ ( N ) P (cid:26) ∃ r ∈ R ( π ) , min i,j ∈ N r ( π ) log ˆ p ij ≤ − r L (cid:27) ∩ A ( π ) . The difference on the right hand side is O ( n ω n,L + u n,L ) by the second statement ofLemma C.3 and (C.5). On the other hand, the last sum in (C.7) is equal to (cid:88) π ∈ N τ ( N ) P (cid:26) ∃ r ∈ R ( π ) , min i,j ∈ N r ( π ) log ˆ p ij > − r L (cid:27) ∩ A ( π ) = O ( n ρ L ) , by the ﬁrst statement of Lemma C.3. 
Since ρ L = O ( ω n,L ) , we ﬁnd that (cid:88) π ∈ N τ ( N ) P { ˆ r ∈ R ( π ) } ∩ A ( π ) = 1 + O ( n ω n,L + u n,L ) , (C.8)as n, L → ∞ .Thus, we have (cid:88) π ∈ N τ ( N ) P (cid:110) ∃ i, j ∈ ˆ N ˆ r , τ ( i ) (cid:54) = τ ( j ) (cid:111) (C.9) ≥ (cid:88) π ∈ N τ ( N ) P (cid:110) ∃ i, j ∈ ˆ N ˆ r , τ ( i ) (cid:54) = τ ( j ) (cid:111) ∩ A ( π ) ≥ (cid:88) π ∈ N τ ( N ) P {∃ i, j ∈ N (cid:48) ˆ r , τ ( i ) (cid:54) = τ ( j ) } ∩ A ( π )= (cid:88) π ∈ N τ ( N ) P { ˆ r ∈ R ( π ) } ∩ A ( π ) = 1 + O ( n ω n,L + u n,L ) , by (C.8). Thus we obtain the desired result. (cid:4) Lemma C.5.

For any set N (cid:48) ⊂ N which contains i, j ∈ N such that τ ( i ) (cid:54) = τ ( j ) , the orderedpartition ( ˆ N (cid:48) , ˆ N (cid:48) ) of N (cid:48) obtained by the Split Algorithm is asymptotically τ -compatible atrate n ω n,L . Proof:

We use the deﬁnitions of N (cid:48) ( i ) , N (cid:48) ( i ) , N (cid:48) ( i ) , and N (cid:48) ( i ) in (C.1). By LemmaC.2, for each i ∈ N (cid:48) , the ordered partitions ˆ T ( i ) = ( ˆ N (cid:48) ( i ) , N (cid:48) ( i ) \ ˆ N (cid:48) ( i )) and ˆ T ( i ) =( N (cid:48) ( i ) \ ˆ N (cid:48) ( i ) , ˆ N (cid:48) ( i )) are such that for T ( i ) = ( N (cid:48) ( i ) , N (cid:48) ( i )) and T ( i ) = ( N (cid:48) ( i ) , N (cid:48) ( i )) ,we have max i ∈ N P { ˆ T ( i ) = T ( i ) } = 1 + O ( nω n,L ) , and max i ∈ N P { ˆ T ( i ) = T ( i ) } = 1 + O ( nω n,L ) . Therefore, max i ∈ N P { ˆ T ( i ) (cid:54) = T ( i ) } + P { ˆ T ( i ) (cid:54) = T ( i ) } = O ( nω n,L ) , (C.10)as n, L → ∞ . Deﬁne ˆ T = ( ˆ N (cid:48) , ˆ N (cid:48) ) and note that ˆ T = ( ˆ N (cid:48) ( i ∗ ) , ( N (cid:48) ( i ∗ ) \ ˆ N (cid:48) ( i ∗ )) ∪ { i ∗ } ) , or ˆ T = (( N (cid:48) ( i ∗ ) \ ˆ N (cid:48) ( i ∗ )) ∪ { i ∗ } , ˆ N (cid:48) ( i ∗ )) , depending on whether s ( i ∗ ) ≤ s ( i ∗ ) or s ( i ∗ ) > s ( i ∗ ) . Hence P { ˆ T ∈ N τ ( N (cid:48) ) } = (cid:88) i ∈ N (cid:48) P { ˆ T ( i ) = T ( i ) , i = i ∗ , s ( i ) ≤ s ( i ) } (C.11) + (cid:88) i ∈ N (cid:48) P { ˆ T ( i ) = T ( i ) , i = i ∗ , s ( i ) > s ( i ) } + O ( n ω n,L ) , by (C.10). Now, the leading sum on the right hand side is bounded from below by (cid:88) i ∈ N (cid:48) ( P { i ∗ = i, s ( i ) ≤ s ( i ) } − P { ˆ T ( i ) (cid:54) = T ( i ) } ) . Applying the same argument to the last sum in (C.11), we obtain that P { ˆ T ∈ N τ ( N (cid:48) ) } ≥ − (cid:88) i ∈ N (cid:48) (cid:16) P { ˆ T ( i ) (cid:54) = T ( i ) } + P { ˆ T ( i ) (cid:54) = T ( i ) } (cid:17) + O ( n ω n,L ) , as n, L → ∞ . Hence by (C.10), we conclude that P { ˆ T ∈ N τ ( N (cid:48) ) } = 1 + O ( n ω n,L ) , as n, L → ∞ . (cid:4) Lemma C.6.

Suppose that for some s < K , an estimated ordered partition ( ˆ N , ..., ˆ N s ) of N is asymptotically τ -compatible at rate u n,L .Then the new ordered partition ( ˆ N (cid:48) , ..., ˆ N (cid:48) s +1 ) of N obtained by applying the Selection-Split Algorithm to ( ˆ N , ..., ˆ N s ) is asymptotically τ -compatible at rate n ω n,L + u n,L . Proof:

For each π ∈ N τ ( N ) , let us consider the event that ( ˆ N , ..., ˆ N s ) coincides with the τ -ordered partition, say, π = ( N (cid:48) ( π ) , ..., N (cid:48) s ( π )) , and denote the event by A ( π ) . Let R ( π ) be as in the proof of Lemma C.4. For each r ∈ R ( π ) , let B s ( r ; π ) be the event that thesplit of N (cid:48) r into ˆ N (cid:48) r, ∪ ˆ N (cid:48) r, (according to the Split Algorithm) coincides with N (cid:48) r, ∪ N (cid:48) r, such that ( N (cid:48) ( π ) , ..., N (cid:48) r − ( π ) , N (cid:48) r, , N (cid:48) r, , N (cid:48) r +1 ( π ) , ..., N (cid:48) s ( π )) ∈ N τ ( N ) . Let ˆ r be the chosen group index among , ..., s by the Selection Step. From the proof ofLemma C.4 (see (C.8)), we have (cid:88) π ∈ N τ ( N ) P { ˆ r ∈ R ( π ) } ∩ A ( π ) = 1 + O ( n ω n,L + u n,L ) . (C.12) The probability that the new ordered partition ( ˆ N (cid:48) , ..., ˆ N (cid:48) s +1 ) of N belongs to N τ ( N ) isbounded from below by (cid:88) π ∈ N τ ( N ) P ( B s (ˆ r ; π ) ∩ A ( π )) ≥ (cid:88) π ∈ N τ ( N ) P { ˆ r ∈ R ( π ) } ∩ A ( π ) ∩ B s (ˆ r ; π )= (cid:88) π ∈ N τ ( N ) (cid:88) r ∈ R ( π ) P { ˆ r = r } ∩ A ( π ) ∩ B s ( r ; π )= (cid:88) π ∈ N τ ( N ) (cid:88) r ∈ R ( π ) P { ˆ r = r } ∩ A ( π ) + O ( n ω n,L ) , by Lemma C.5. The last double sum is O ( n ω n,L + u n,L ) by (C.12). Thus, we concludethat P { ( ˆ N (cid:48) , ..., ˆ N (cid:48) s +1 ) ∈ N τ ( N ) } = 1 + O ( n ω n,L + u n,L ) , as n, L → ∞ . (cid:4) For each K ≥ , let ˆ T K = ( ˆ N , ..., ˆ N K ) be the estimated ordered group structure ob-tained through the Selection-Split algorithm (until the number of the groups reach K )and let T = ( N , ..., N K ) be the true ordered group structure. For the remainder of theproof, we assume that the conditions of Theorem C.1 hold. Lemma C.7.

Along any sequence of probabilities P n,L ∈ P ,ε , P n,L { ˆ T K = T } = 1 + O ( n ω n,L ) , as n, L → ∞ . Proof:

Note that $\hat T_{K_0} = (\hat N_1, \dots, \hat N_{K_0})$ is asymptotically $\tau$-compatible at rate $n\omega_{n,L}$ by Lemmas C.5 and C.6. Since $T$ has $K_0$ groups, this gives the desired result. ∎

Lemma C.8. (i) If $K \ge K_0$, then $\hat V(K) = O_P(1)$, as $n, L \to \infty$. (ii) If $K < K_0$, then for any $M > 0$, as $n, L \to \infty$,
$$P\{\hat V(K) > g(L) M\} \to 1.$$

Proof: (i) Let $(\hat N_1, \dots, \hat N_{K_0})$ be the ordered partition obtained by the Selection-Split Algorithm. By Lemma C.7, the event that $\tau(i) = \tau(j)$ for all $i, j \in \hat N_k$ has probability approaching one for all $k = 1, \dots, K_0$. Let $(\hat N'_1, \dots, \hat N'_K)$ be the ordered partition obtained by the Selection-Split Algorithm with $K \ge K_0$. Since $K \ge K_0$, due to the sequential split nature of the algorithm, each of the resulting groups, say, $\hat N'_k$, $k = 1, \dots, K$, is a subset of a group, say, $\hat N_r$, obtained at step $K = K_0$. Therefore, the event that $\tau(i) = \tau(j)$ for all $i, j \in \hat N'_k$ has probability approaching one for each $k = 1, \dots, K$. By Assumption C.1(i), we have
$$\hat V(K) = \frac{1}{K} \sum_{k=1}^{K} \left| \min_{i,j \in \hat N'_k} \log \hat p_{ij} \right| = O_P(1),$$
as $n, L \to \infty$. Thus (i) follows.

(ii) Suppose that $K < K_0$, and fix any $M > 0$. Let $(\hat N_1, \dots, \hat N_K)$ be the ordered partition obtained by the Selection-Split Algorithm. Let the event that $\pi \in N_\tau(N)$ coincides with $(\hat N_1, \dots, \hat N_K)$ be denoted by $A_K(\pi)$. The event is disjoint across $\pi \in N_\tau(N)$. By Lemmas C.5 and C.6, the ordered partition $(\hat N_1, \dots, \hat N_K)$ is asymptotically $\tau$-compatible at rate $n\omega_{n,L}$, that is,
$$\sum_{\pi \in N_\tau(N)} P A_K(\pi) = P\left(\bigcup_{\pi \in N_\tau(N)} A_K(\pi)\right) = 1 + O(n\omega_{n,L}). \quad \text{(C.13)}$$
Since $K < K_0$, for any $\pi \in N_\tau(N)$ such that $\pi = (\hat N_1, \dots, \hat N_K)$, there exists $\hat k(\pi) \in \{1, \dots, K\}$ such that for some $i, j \in \hat N_{\hat k(\pi)}$, $\tau(i) \ne \tau(j)$. Then we have
$$P\{\hat V(K) > g(L) M\} = P\left\{\frac{1}{K} \sum_{k=1}^{K} \min_{i,j \in \hat N_k} \log \hat p_{ij} < -g(L) M\right\} \quad \text{(C.14)}$$
$$\ge \sum_{\pi \in N_\tau(N)} P\left(\left\{\frac{1}{K} \sum_{k=1}^{K} \min_{i,j \in \hat N_k} \log \hat p_{ij} < -g(L) M\right\} \cap A_K(\pi)\right)$$
$$\ge \sum_{\pi \in N_\tau(N)} P\left(\left\{\min_{i,j \in \hat N_{\hat k(\pi)}} \log \hat p_{ij} < -g(L) K M\right\} \cap A_K(\pi)\right),$$
because the logs of the p-values are non-positive. The last sum in (C.14) is bounded from below by
$$\sum_{\pi \in N_\tau(N)} P\left(\left\{\forall i, j \in N \text{ s.t. } \tau(i) \ne \tau(j),\ \log \hat p_{ij} < -g(L) K M\right\} \cap A_K(\pi)\right)$$
$$\ge P\left\{\forall i, j \in N \text{ s.t. } \tau(i) \ne \tau(j),\ \log \hat p_{ij} < -g(L) K M\right\} + O(n\omega_{n,L}),$$
by (C.13). By the condition that $g(L)/r_L \to 0$ as $n, L \to \infty$, the last probability is bounded from below by (from some large $L$ on)
$$P\left\{\forall i, j \in N \text{ s.t. } \tau(i) \ne \tau(j),\ \log \hat p_{ij} < -r_L\right\} = 1 - P\left\{\exists i, j \in N \text{ s.t. } \tau(i) \ne \tau(j),\ \log \hat p_{ij} \ge -r_L\right\} = 1 + O(n\rho_L),$$
by Lemma C.1(ii). ∎

Lemma C.9. $P\{\hat K = K_0\} \to 1$ as $n, L \to \infty$.

Proof:

Choose $K$ such that $K_0 < K$ and write
$$\hat Q(K) - \hat Q(K_0) = \hat V(K) - \hat V(K_0) + (K - K_0) g(L).$$
As for the leading term on the right-hand side, we have $\hat V(K) - \hat V(K_0) = O_P(1)$, by Lemma C.8(i). Since $g(L) \to \infty$, we find that whenever $K > K_0$, we have
$$P\left\{\hat Q(K_0) < \hat Q(K)\right\} \to 1.$$
And for all $K < K_0$, we have by Lemma C.8(ii), for any $M > 0$,
$$P\left\{\hat V(K) > g(L) M\right\} \to 1, \quad \text{(C.15)}$$
whereas $\hat V(K_0) = O_P(1)$. Therefore, choose $\varepsilon > 0$ and take any $M'_\varepsilon > 0$ such that $P\{\hat V(K_0) \ge M'_\varepsilon\} \le \varepsilon$. We take large $M > K_0 - K$ such that $0 < M'_\varepsilon \le (K - K_0 + M) g(L)$. We find that
$$P\left\{\hat Q(K_0) < \hat Q(K)\right\} = P\left\{\hat V(K_0) < \hat V(K) + (K - K_0) g(L)\right\}$$
$$\ge P\left\{\hat V(K_0) < (K - K_0 + M) g(L)\right\} + o(1) \quad \text{(by (C.15))}$$
$$\ge P\left\{\hat V(K_0) < M'_\varepsilon\right\} + o(1) \ge 1 - \varepsilon + o(1),$$
as $n, L \to \infty$. By sending $\varepsilon$ down to zero, we conclude that $P\{\hat K = K_0\} \to 1$, as $n, L \to \infty$. ∎

Proof of Theorem C.1:

The desired result follows from Lemmas C.7 and C.9. ∎
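As a rough illustration of the selection rule analyzed in Lemmas C.8 and C.9, the following Python sketch computes $\hat V(K)$ and the penalized criterion $\hat Q(K) = \hat V(K) + K g(L)$ for a set of candidate partitions and returns the minimizer. It assumes that $\hat K$ is chosen as the minimizer of $\hat Q(K)$, which is what the comparison of $\hat Q$ across candidate $K$ in the proof of Lemma C.9 suggests. The function names, the toy log p-values, the treatment of singleton groups, and the penalty value `g_L = 2.0` are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, List, Sequence, Tuple

Partition = List[List[int]]  # a list of groups, each a list of agent indices


def v_hat(groups: Partition, log_p: Sequence[Sequence[float]]) -> float:
    """V(K) = (1/K) * sum_k | min_{i,j in group k} log p_ij |.

    Assumption: the minimum runs over pairs i != j, and a singleton
    group (which has no pairs) contributes zero.
    """
    total = 0.0
    for group in groups:
        pair_logs = [log_p[i][j] for i in group for j in group if i != j]
        if pair_logs:
            total += abs(min(pair_logs))
    return total / len(groups)


def select_k(partitions: Dict[int, Partition],
             log_p: Sequence[Sequence[float]],
             g_L: float) -> Tuple[int, Dict[int, float]]:
    """Return the K minimizing Q(K) = V(K) + K * g(L), plus all Q values."""
    q = {k: v_hat(groups, log_p) + k * g_L for k, groups in partitions.items()}
    k_hat = min(q, key=q.get)
    return k_hat, q


# Toy input (made up): agents 0 and 1 share a type, agents 2 and 3 share another.
ACROSS, WITHIN = -20.0, -0.1  # log p-values across types / within a type
toy_log_p = [
    [0.0, WITHIN, ACROSS, ACROSS],
    [WITHIN, 0.0, ACROSS, ACROSS],
    [ACROSS, ACROSS, 0.0, WITHIN],
    [ACROSS, ACROSS, WITHIN, 0.0],
]
# Candidate partitions, as a sequential split procedure might produce them.
toy_partitions = {
    1: [[0, 1, 2, 3]],
    2: [[0, 1], [2, 3]],
    3: [[0], [1], [2, 3]],
}
k_hat, q_values = select_k(toy_partitions, toy_log_p, g_L=2.0)
print(k_hat, q_values)
```

In this toy input the under-fitted partition ($K = 1$) mixes types and is penalized through a large $\hat V$, while the over-fitted one ($K = 3$) gains almost nothing in $\hat V$ but pays the extra $g(L)$, so the criterion selects $K = 2$.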

Appendix D. Confidence Sets for the Group Structure

The web appendix of Krasnokutskaya, Song, and Tang (2020b) proposes a method to construct a confidence set for each group of agents having the same type. For the reader's convenience, we reproduce the procedure here using the notation of this paper. Let us consider a set-up where we have $K$ groups and the set $N$ of agents. Let $\hat K$ be the consistent estimator of $K$ as proposed in Krasnokutskaya, Song, and Tang (2020a).

We construct a confidence set for each group of agents who have the same type. First, we fix $k = 1, \dots, \hat K$ and construct a confidence set for the $k$-th type group $N_k$. In other words, we construct a random set $\hat C_k \subset N$ such that
$$\liminf_{L \to \infty} P\{N_k \subset \hat C_k\} \ge 1 - \alpha.$$
For this, we need to devise a way to approximate finite sample probabilities like $P\{N_k \subset \hat C_k\}$. Since we do not know the cross-sectional dependence structure among the agents, we use a bootstrap procedure that preserves the dependence structure from the original sample. The remaining issue is to determine the space in which the random set $\hat C_k \subset N$ can take values. It is computationally infeasible to consider all possible such sets. Instead, we proceed as follows. First we estimate $\hat N_k$ as prescribed in the paper and also obtain $\hat \delta_{ij}$, the test statistic defined in the main text. Given the estimate $\hat N_k$, we construct a sequence of sets as follows:

Step 1: Find $i_1 \in N \setminus \hat N_k$ that minimizes $\min_{j \in \hat N_k} \hat \delta_{i_1 j}$, and construct $\hat C_k(1) = \hat N_k \cup \{i_1\}$.

Step 2: Find $i_2 \in N \setminus \hat C_k(1)$ that minimizes $\min_{j \in \hat C_k(1)} \hat \delta_{i_2 j}$, and construct $\hat C_k(2) = \hat C_k(1) \cup \{i_2\}$.

Step m: Find $i_m \in N \setminus \hat C_k(m-1)$ that minimizes $\min_{j \in \hat C_k(m-1)} \hat \delta_{i_m j}$ and construct $\hat C_k(m) = \hat C_k(m-1) \cup \{i_m\}$. Repeat Step $m$ up to $n = |N|$.

Now, for each bootstrap iteration $s = 1, \dots, B$, we construct the sets $\hat N^*_{k,s}$ and $\{\hat C^*_{k,s}(m)\}$ following the steps described above but using the bootstrap sample. (Note that this bootstrap sample should be drawn independently of the bootstrap sample used to construct the bootstrap p-values $\hat p_{ij}$ in the classification.) Then, we compute the following:
$$\hat \pi_k(m) \equiv \frac{1}{B} \sum_{s=1}^{B} 1\left\{\hat N_k \subset \hat C^*_{k,s}(m)\right\}.$$
Note that the sequence of sets $\hat C^*_{k,s}(m)$ increases in $m$. Hence the number $\hat \pi_k(m)$ should also increase in $m$. A $100(1-\alpha)$%-level confidence set is given by $\hat C^*_k(m)$ with $1 \le m \le n$ such that
$$\hat \pi_k(m-1) < 1 - \alpha \le \hat \pi_k(m).$$
Note that such $m$ always exists, because $\hat C^*_{k,s}(n) = N$.

Appendix E. Further Simulation Results

Tables E.1 and E.2 summarize the distribution of estimation errors in our group classification algorithm from 500 simulated data sets, when the number of groups is $K = 2$ and assumed known to the econometrician. The column $D_\mu$ shows the difference between the group means chosen in the simulation.

When $K = 2$, the results show that the estimation error, as measured by the expected average discrepancy (EAD), decreases with the distance between group means. Such a reduction in EAD is most substantial when the number of agents is larger ($n = 40$) and the size of the data is small ($L = 100$). Given group difference, EAD decreases as sample size increases moderately from L = 100 to . This pattern is most obvious when D µ = 0 .

The other measure of estimation errors, HAD(p), also shows encouraging results. HAD(p) is zero for most of the cells in both panels (a) and (b), which shows that the empirical distribution of the proportion of mis-classified bidders is reasonably skewed to the right. Besides, the reduction in HAD(p) as the sample size increases is most pronounced with closer group means, regardless of the number of bidders in the population.

When $K = 4$, the results demonstrate very similar patterns. Most remarkably, both measures of mis-classification errors only increase very marginally relative to the case with $K = 2$.

Tables E.3 and E.4 report results from the full, feasible classification procedure when the number of groups is estimated through the penalization scheme proposed in the text. For most of the specifications used in these two tables, the estimates for the number of groups $\hat K$ are tightly clustered around the correct $K$. Compared with the results for infeasible classification under known $K$, EAD and HAD(p) increase in most cases. Nonetheless such an increase is quite moderate, suggesting that our feasible classification algorithm performs reasonably well relative to its infeasible counterpart.

In Tables E.3 - E.4, we report the analysis of computation time for the classification procedure. In Table E.3, we give a decomposition of the time that the classification procedure took. The table clearly shows that most of the computation time is spent constructing the bootstrap p-values. Once the p-values are constructed, the classification algorithm itself runs fairly fast.

In Table E.4, the computation time is shown to vary depending on the number of the agents ($n$), the number of the true groups ($K$), and the number of the markets ($L$). The results show that the largest increase in computation time arises when the number of the bidders increases rather than when the number of the markets or the number of the groups increases. Our simulation studies are based on our MATLAB code. The program was executed using a computer with the following specifications: Intel(R) Xeon (R) CPU X5690 @3.47 GHz 3.46 GHz.

Table E.1: Performance of the Classification Estimator with Two Groups ($K = 2$ and known)

n    L    D_µ   EAD    HAD(.10)  HAD(.25)  HAD(.50)  HAD(.75)  HAD(.90)
12   400  0.6   0.012  0.012     0         0         0         0
12   400  0.4   0.014  0.014     0         0         0         0
12   400  0.2   0.004  0.004     0         0         0         0
12   200  0.6   0.004  0.004     0         0         0         0
12   200  0.4   0.006  0.006     0         0         0         0
12   200  0.2   1.118  0.560     0.252     0.158     0         0
12   100  0.6   0.006  0.006     0         0         0         0
12   100  0.4   0.084  0.078     0.006     0         0         0
12   100  0.2   1.794  0.682     0.478     0.284     0         0
40   400  0.6   0.018  0         0         0         0         0
40   400  0.4   0.022  0         0         0         0         0
40   400  0.2   1.170  0.178     0.014     0         0         0
40   200  0.6   0.018  0         0         0         0         0
40   200  0.4   0.020  0         0         0         0         0
40   200  0.2   2.726  0.404     0.210     0.122     0.021     0.001
40   100  0.6   0.020  0         0         0         0         0
40   100  0.4   0.452  0.010     0         0         0         0
40   100  0.2   3.720  0.902     0.578     0.234     0.132     0.043

Note: This table summarizes the distribution of estimation errors in our classification algorithm from 500 Monte Carlo replications when $K = 2$ and known. Here $n$ represents the number of the individual agents, $L$ the number of the observed games in the data, $D_\mu$ the distance between population means, EAD the expected average discrepancy, and HAD(p) the hazard rate of average discrepancies at $p$.

Table E.2: Performance of the Classification Estimator with Four Groups ($K = 4$ and known)

n    L    D_µ   EAD    HAD(.10)  HAD(.25)  HAD(.50)  HAD(.75)  HAD(.90)
12   400  0.6   0.011  0.014     0.004     0         0         0
12   400  0.4   0.018  0.016     0.010     0         0         0
12   400  0.2   0.017  0.022     0.006     0         0         0
12   200  0.6   0.013  0.018     0.004     0         0         0
12   200  0.4   0.004  0.008     0         0         0         0
12   200  0.2   1.112  0.764     0.188     0.024     0.008     0
12   100  0.6   0.003  0.006     0         0         0         0
12   100  0.4   0.044  0.040     0.024     0.002     0         0
12   100  0.2   1.504  0.868     0.342     0.106     0.04      0
40   400  0.6   0.115  0.020     0.020     0         0         0
40   400  0.4   0.121  0.020     0.020     0         0         0
40   400  0.2   2.450  0.680     0.368     0.018     0.018     0
40   200  0.6   0.109  0.018     0.018     0         0         0
40   200  0.4   0.140  0.026     0.026     0         0         0
40   200  0.2   3.172  0.810     0.366     0.246     0.026     0
40   100  0.6   0.141  0.024     0.024     0         0         0
40   100  0.4   1.003  0.176     0.176     0.006     0         0
40   100  0.2   4.557  0.904     0.652     0.526     0.202     0.053

Note: This table summarizes the distribution of estimation errors in our classification algorithm from 500 Monte Carlo replications when $K = 4$ and known. Here $n$ represents the number of the individual agents, $L$ the number of the observed games in the data, $D_\mu$ the distance between population means, EAD the expected average discrepancy, and HAD(p) the hazard rate of average discrepancies at $p$.

Table E.3: Computational Time for Various Steps of the Procedure ($n = 60$, $K = 2$, unknown, time measured in seconds)

Step  Description                                  L=100    L=200    L=400
1     generating pairwise indexes from the data    0.2987   0.3543   0.4607
2     constructing bootstrap pairwise indexes      81.2178  81.4871  82.0807
3     computing bootstrap p-values                 0.0012   0.0014   0.0014
4     division of a group into two                 0.0008   0.0008   0.0008
n+4   number of groups selection                   0.0002   0.0002   0.0002
      Total Time                                   81.528   81.852   82.552

Note: The table shows a decomposition of the total time taken by the classification procedure. The table shows that the major portion of the time comes from constructing the bootstrap pairwise indexes. Once the bootstrap p-values are constructed, the classification algorithm runs quite fast.

Table E.4: Total Computational Time across $n$, $L$, and $K$ ($K$ unknown, time measured in seconds)

        L=100    L=200    L=400    L=200    L=200
        K=2      K=2      K=2      K=4      K=6
n=12    3.246    3.224    3.239    3.216    3.219
n=24    13.057   13.177   13.259   13.185   13.189
n=48    51.987   52.272   52.700   52.281   52.291
n=60    81.528   81.852   82.552   81.862   82.874
n=72    116.949  117.213  117.577  116.912  117.328
n=96    209.426  209.971  209.834  209.884  210.058

Note: The table shows the change in the computation time as one changes the number of the groups ($K$), the number of the markets ($L$), and the number of the agents (i.e., bidders) ($n$). The major increase in the computation time arises when the number of the bidders increases rather than when the number of the markets or the groups increases.

Appendix F. Additional Materials for the Empirical Application

Table F.1 reports summary statistics for this set of projects. The table indicates that the projects are worth $523,000 and last for around three months on average; 38% of these projects are partially supported through federal funds. There are 25 firms that participate regularly in this market. All other firms in the data are treated as fringe bidders. An average auction attracts six regular potential bidders and eight fringe bidders. Since only a fraction of potential bidders submits bids, the entry decision plays an important role in this market. Finally, the distance to the company location varies quite a bit and is around 28 miles on average for regular potential bidders.

Table F.1: Summary Statistics for California Procurement Market

Variable                             Mean    Std. Dev
Engineer's estimate (mln)            0.523   0.261
Duration, large projects (months)    3.01    1.56
Federal Aid                          0.384
Number of Potential Bidders:         14.1    8.4
  Fringe Bidders                     8.2     4.8
  Regular Bidders                    5.5     3.3
Number of Entrants:                  5.4     2.8
  Fringe Bidders                     3.5     2.7
  Regular Bidders                    1.9     1.8
Distance (miles):                    18.72   6.33
  Fringe Bidders                     11.21   5.42
  Regular Bidders                    28.34   11.73

Note: This table reports summary statistics for the set of medium-size bridge work and paving projects auctioned in the California highway procurement market between 2002 and 2012. It consists of 1,054 projects. The distance variable is measured in miles. It reflects the driving time between the project site and the nearest company plant. The "Federal Aid" variable is equal to one if the project receives federal aid and zero otherwise.

In Table F.2, we present an extended version of Table 6 in Section 6 of the main paper. This table includes the estimates of the group-specific fixed effects.

Table F.2: Parameter Estimates (Extended Version of Table 6)

Columns: Estimate, Std. Error, Estimate, Std. Error, P-value

The Distribution of Project Costs
Constant ( ¯ q )   0.127***  (0.0129)   0.113***  (0.0119)   0.216
Eng. Estimate   -0.0004***  (0.0002)   -0.0005***  (0.0002)   0.392
Duration   0.00026*  (0.00036)   0.00022*  (0.00027)   0.212
Distance   0.0012***  (0.00022)   0.00086***  (0.00019)   0.041
Bridge   -0.0092***  (0.0018)   -0.012***  (0.0011)   0.074
Federal Aid   -0.043***  (0.0103)   -0.078***  (0.009)   0.012
Regular Bidder   -0.035***  (0.003)
¯ q − ¯ q   -0.051***  (0.008)
¯ q − ¯ q   -0.012***  (0.005)
¯ q − ¯ q   -0.032***  (0.009)
¯ q − ¯ q   -0.058***  (0.008)
¯ q − ¯ q   -0.014***  (0.007)
¯ q − ¯ q   -0.008***  (0.006)
¯ q − ¯ q   -0.009***  (0.007)
¯ q − ¯ q   -0.050***  (0.006)
σ C   ***  (0.032)   0.112***  (0.022)   0.087
σ U   ***  (0.009)   0.0207***  (0.008)   0.452

The Distribution of Entry Costs
Constant ( ˜ q )   -0.0114*  (0.0078)   -0.0161*  (0.0091)   0.212
Eng. Estimate   0.0055***  (0.0016)   0.0051***  (0.0012)   0.333
Number of Items   0.0018*  (0.0011)   0.0011***  (0.0005)   0.082
Regular Bidder   -0.022***  (0.004)
˜ q − ˜ q   -0.019***  (0.005)
˜ q − ˜ q   -0.018***  (0.007)
˜ q − ˜ q   -0.016***  (0.007)
˜ q − ˜ q   -0.024***  (0.006)
˜ q − ˜ q   -0.022***  (0.008)
˜ q − ˜ q   -0.018***  (0.006)
˜ q − ˜ q   -0.017***  (0.008)
˜ q − ˜ q   -0.019***  (0.008)

Note: In the results above the distance is measured in miles. The fringe bidders are the reference group. The first two columns correspond to the specification which allows for the unobserved bidder heterogeneity; the next two columns correspond to the specification without unobserved bidder heterogeneity. The last column reports the p-value of the bootstrap-based test of the equality of coefficients estimated under the specifications with and without unobserved bidder heterogeneity. Details of the test are explained below.
The results are based on the data for 1,054 medium-sized projects that involve paving and bridge work.

The bootstrap p-values in the last column of Table 6 are obtained by the following procedure. First, let $\hat\theta_R$ and $\hat\theta_U$ be estimators of a scalar parameter $\theta$, where $\hat\theta_R$ is obtained under the bidder homogeneity restriction and $\hat\theta_U$ is obtained under the group heterogeneity of bidder types. We define
$$T = \sqrt{L}\,|\hat\theta_R - \hat\theta_U|.$$
For critical values, for each bootstrap sample (indexed by $b = 1, \dots, B$), we construct both $\hat\theta^*_{R,b}$ and $\hat\theta^*_{U,b}$ using the same bootstrap sample. Then we construct a bootstrap version of $T$ as follows:
$$T^*_b = \sqrt{L}\,\left|\hat\theta^*_{R,b} - \hat\theta^*_{U,b} - (\hat\theta_R - \hat\theta_U)\right|.$$
The bootstrap p-value for the equality of the two parameters, one identified under homogeneity and the other identified under group heterogeneity, is given by
$$\frac{1}{B} \sum_{b=1}^{B} 1\{T^*_b > T\}.$$
We compute this p-value for each parameter in Table 6 that is defined for both specifications and report it in the last column of the table.

References

Hansen, B. E. (2008): "Uniform Convergence Rates for Kernel Estimation with Dependent Data," Econometric Theory, 24, 726–748.

Krasnokutskaya, E., K. Song, and X. Tang (2020a): "Estimating Unobserved Agent Heterogeneity Using Pairwise Comparisons," Working paper.

Krasnokutskaya, E., K. Song, and X. Tang (2020b): "The Role of Quality in Internet Service Markets," Journal of Political Economy, 128, 75–117.

Lebrun, B. (1999): "First Price Auctions in the Asymmetric N Bidder Case," International Economic Review, 40(1), 125–142.

Lee, S., K. Song, and Y.-J. Whang (2018): "Testing for a General Class of Functional Inequalities," Econometric Theory, 34, 1018–1064.

Pesendorfer, M. (2000): "A Study of Collusion in First-price Auctions," Review of Economic Studies, 67, 381–411.

Reiss, R.-D. (1989):