Teams: Heterogeneity, Sorting, and Complementarity

Abstract

How much do individuals contribute to team output? I propose an econometric framework to quantify individual contributions when only the output of their teams is observed. The identification strategy relies on following individuals who work in different teams over time. I consider two production technologies. For a production function that is additive in worker inputs, I propose a regression estimator and show how to obtain unbiased estimates of variance components that measure the contributions of heterogeneity and sorting. To estimate nonlinear models with complementarity, I propose a mixture approach under the assumption that individual types are discrete, and rely on a mean-field variational approximation for estimation. To illustrate the methods, I estimate the impact of economists on their research output, and the contributions of inventors to the quality of their patents.


Stéphane Bonhomme∗, University of Chicago, February 4, 2021


JEL code:

C13, J31.

Keywords:

Team production, networks, unobserved heterogeneity, sorting, complementarity.

∗ Presented at the World Congress of the Econometric Society, invited session “Frontiers in Modern Econometrics”, August 17, 2020. I thank Manuel Arellano, Eric Auerbach, Philipp Kircher, Thibaut Lamadon, Jack Light, Elena Manresa, Charles Manski, Eleonora Patacchini, Alessandra Voena, Martin Weidner, and audiences at various venues for comments. I acknowledge support from NSF grant number SES-1658920.

Introduction

How can one infer individual contributions when only team output is observed? This question is central to a number of areas, ranging from labor economics and the economics of innovation to sports and music. My goal in this chapter is twofold: I propose an econometric framework to estimate how team performance depends on its workers, and I apply it to study the production of economic research and the impact of individual inventors on the quality of their patents.

I focus on three aspects of the relationship between workers and teams. The first one is worker heterogeneity, which shapes team performance. The approach I propose relies on a simple intuition: when individuals work for different teams over time, the variation in team composition will be informative about which workers contribute the most to their teams and which ones contribute the least. Hence, the network structure of the data will be crucial in order to infer worker heterogeneity.

The second aspect is the sorting of workers into teams. Both heterogeneity and sorting contribute to the variation in output. I show that, in a model where production is additive in worker inputs, the two can be separately identified. The approach I rely on generalizes the “plus-minus” metric in US team sports (see Hvattum, 2019): intuitively, when a worker i′ joins a team and a worker i leaves it, the difference in output identifies the difference between the types of i′ and i. More generally, I show how, in collaboration networks, worker types can be identified by using their participation in different teams, including when they work on their own.

The third aspect is the complementarity between workers. The presence of complementarity is important for policies that influence the reallocation of workers across teams. Complementarities are also central to theories of sorting (e.g., Becker, 1973, Garicano and Rossi-Hansberg, 2006, Eeckhout and Kircher, 2018).
To identify complementarities empirically, I build and estimate nonlinear models of team production, which rely on the assumption that worker types are discrete. This framework allows me to jointly document the role of heterogeneity, sorting, and complementarity.

In the analysis I will draw connections with methods for matched employer-employee data (e.g., Abowd and Kramarz, 1999). In that literature, both firm types and worker types contribute to the wage outcome, which is observed at the worker level, and workers do not interact with one another inside the firm. The worker-firm structure is thus a bipartite graph (Bonhomme, 2017). In contrast, in the collaboration networks I study here, workers interact with one another in the team. In addition, while I assume away team effects, so that the team in my framework is simply a collection of workers, the outcome is only observed at the team level. Collaboration networks follow a hypergraph structure where teams vary in size (e.g., Turnbull et al., 2019, Lerner et al., 2019). Despite these differences, my framework will leverage insights from the matched employer-employee data literature. As an example, when production is additive in worker inputs, I will estimate worker types using linear regression, in the spirit of Abowd et al. (1999).

There is a fourth important aspect in the relationship between workers and teams: the presence of other factors, beyond worker types, that affect output. These other factors are lumped into the error term of the model. Their presence prevents one from recovering all worker types precisely, especially in data sets where the number of productions per worker is small and collaborations are sparse. As a result, regression-based estimates of sorting and heterogeneity may be biased. To correct the bias in additive models, I implement the method of Andrews et al. (2008) developed for matched employer-employee data. Bonhomme et al. (2020) find, using samples from various countries, that the bias can be substantial.
Similarly, when estimating additive models in networks of collaborations, I find that bias correction is important.

A main goal of this chapter is to propose methods that account for complementarity between workers. Bonhomme et al. (2019, BLM) introduce a framework to allow for complementarity between workers and firms in matched employer-employee data. Here I build on this work to allow for complementarity between workers within a team. Like in BLM, I assume that heterogeneity takes a finite number of values. However, while BLM model firm heterogeneity as discrete and estimate firm groups using the kmeans algorithm, in my setting estimates of worker types are not sufficiently precise to follow a similar grouped fixed-effects approach. For this reason, I model the distribution of the discrete worker types using a random-effects approach.

The presence of unobserved worker heterogeneity in a network of teams makes estimation of nonlinear random-effects models challenging. Indeed, the likelihood function is non- [...]

Footnote: With some exceptions, such as the recent work by Herkenhoff et al. (2018).

Footnote: Kline et al. (2020) generalize this approach to allow for unspecified heteroskedasticity. The nature of the bias is formally studied in Jochmans and Weidner (2020).

[...] et al., 2017). In stochastic blockmodels, which are closely related to the model I study, Bickel et al. (2013) provide conditions for consistency. In my model I perform Monte Carlo simulations to probe the accuracy of the variational method in finite samples.

The framework can be extended in different ways. In particular, I consider adding a model of team formation on top of the model of team production. To do so, I implement a joint random-effects approach using a stochastic blockmodel as a simple statistical model of team formation. An interesting avenue for future work will be to combine the framework with economic models of team formation.
It will also be important to understand under which conditions the methods I propose are robust to the team formation model being misspecified.

I apply these methods to two empirical settings: economic research, and patents and innovation. Collaboration in scientific research is an extensively studied topic (e.g., Furman and Gaule, 2013). Aspects of the network of collaborations between economists have been studied by Goyal et al. (2006), Fafchamps et al. (2010), Hsieh et al. (2018), and Anderson and Richards-Shubik (2019), among others. Here I focus on the impact of individual researchers and the shape of the production function. In the literature on patents and innovation, there is increasing interest in the role of individual inventors in innovation as an engine of endogenous growth (e.g., Akcigit et al., 2017, Bell et al., 2019, Pearce, 2019). Relative to this literature, I propose different measures of the type of an inventor, that is, a measure of her “quality”, which I infer from the network of patent collaborations using econometric techniques.

I estimate the models using a sample of economists constructed by Ductor et al. (2014), and a sample of inventors and their patents constructed by Akcigit et al. (2016). In both samples, I find that an additive model of team production is useful to highlight the presence of heterogeneity and sorting. However, the additive model is only an imperfect tool to analyze these data. Indeed, for both economists and inventors, I find evidence of complementarity between high-type workers who collaborate in a team. These findings suggest that future research will need to account for complementarity and sorting when measuring individual [...]

Footnote: Codes to implement the methods are available on my webpage.

[...] et al., 2012, and Arcidiacono et al., 2017), a key difference in my context is that I only observe team output.
In recent work, Devereux (2018) uses fixed-effects methods to estimate the link between individual productivity and team performance in tennis games. Most closely related to this chapter, Ahmadpoor and Jones (2019) use fixed-effects methods in nonlinear models to document heterogeneity and complementarity patterns with a focus on research, innovation, and patenting. While I build on this work, a specificity of my approach is that it explicitly accounts for the presence of other unobserved inputs in addition to the worker types, which motivates the use of new econometric methods. Moreover, estimates of nonlinear models with discrete types uncover novel forms of complementarity between workers. Another recent related article is Weidmann and Deming (2020), who design and analyze experiments to identify individual contributions to teamwork.

The outline of the chapter is as follows. I describe the framework in Section 2, and study additive and nonlinear models in Sections 3 and 4, respectively. I apply the methods to study economic researchers in Section 5, and inventors and their patents in Section 6. Finally, I conclude in Section 7.

Consider a set of nodes $1, \ldots, N$, which represent workers who collaborate and produce output in teams. Nodes are linked to each other by hyperedges that represent the collaborations between workers. In a standard graph, an edge links two nodes $i$ and $i'$. In a hypergraph of collaborations, hyperedges can link a worker $i$ to herself when $i$ produces on her own, two workers $i$ and $i'$ who produce together, three workers $i, i', i''$ who produce together, and so on. I will refer to hyperedges as “teams”, and denote the workers in a team $j$ of size $n_j = n$ as $(i_1(j), \ldots, i_n(j))$. I denote the number of teams as $J$.

As an example, Figure 1 represents a hypergraph of collaborations with five teams and five workers. Workers 3 and 5 produce on their own. In addition, worker 3 produces jointly with worker 4, and workers 4 and 5 produce in a team together with worker 2. Lastly, worker 1 produces with worker 2 in a team.

Footnote: Related methods have been used to estimate manager fixed-effects (e.g., Bertrand and Schoar, 2003).

[Figure 1: hypergraph of collaborations between five workers in five teams, labeled team A to team E.]

Worker $i$ contributes to the team a quantity $\alpha_i$, which is unobserved to the econometrician. I assume $\alpha_i$ is constant across collaborations, and I refer to $\alpha_i$ as the type of worker $i$. Constancy of worker types over time is a key feature of my framework. However, assuming that individual types remain constant may be restrictive. For this reason, in the empirical analysis I will use observations from at most five consecutive years. Note that, while worker types are constant, type composition varies across teams.

The output of a team $j$ with $n$ workers is denoted as $Y_{nj}$, and it is given by

$$Y_{nj} = \phi_n\left(\alpha_{i_1(j)}, \ldots, \alpha_{i_n(j)}, \varepsilon_{nj}\right), \qquad (1)$$

where $\phi_n$ is the production function of an $n$-worker team, and $\varepsilon_{nj}$ represents other factors, or shocks, unobserved to the econometrician, that affect team output beyond workers' inputs $\alpha_i$.
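To make the notation concrete, the hypergraph of Figure 1 can be stored as a list of teams, each a tuple of worker indices. The Python sketch below is illustrative only: the variable names are hypothetical, the teams are listed in the order the text describes them (which may not match the figure's labels A to E), and `phi_example` is a made-up production function, not a specification from this chapter, chosen because it is symmetric in the members' types.

```python
import math

# Hypergraph of Figure 1: each team (hyperedge) is a tuple of worker ids.
# Teams are listed in the order the text describes them (an assumption;
# the figure's labels A to E may follow a different order).
teams = [(3,), (5,), (3, 4), (2, 4, 5), (1, 2)]

J = len(teams)                         # number of teams
team_size = [len(t) for t in teams]    # n_j for each team j
workers = sorted({i for t in teams for i in t})

# For each worker i, the indices j of the teams she belongs to.
memberships = {i: [j for j, t in enumerate(teams) if i in t] for i in workers}


def phi_example(alphas, eps):
    """Hypothetical production function, for illustration only.

    It depends on the member types only through their mean and product,
    so reordering team members leaves output unchanged (symmetry).
    """
    return sum(alphas) / len(alphas) + 0.5 * math.prod(alphas) + eps


print(J, team_size, memberships[4])    # 5 [1, 1, 2, 3, 2] [2, 3]
```

The product term is one simple way to make the contribution of one worker depend on the type of her co-workers, which is the kind of complementarity the additive model below rules out.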
I assume that the positions of workers in the team do not matter, so $\phi_n$ is symmetric with respect to its first $n$ arguments. Hence, the framework abstracts from within-team hierarchy and organization.

I now state the first key assumption. Throughout, $\{A_k : k\}$ denotes the set of $A_k$ for all $k$.

Footnote: To relax this assumption, one could rely on hidden Markov models, which allow types to evolve as Markov processes, or one could alternatively rely on mixed membership models (e.g., Airoldi et al., 2008), which allow workers' types to depend on the team they participate in. I do not study these possibilities here.

Footnote: Here and in the following, $\alpha_{i_1(j)} = \sum_{i=1}^{N} \mathbf{1}\{i_1(j) = i\}\, \alpha_i$, and similarly for the other workers in team $j$. Since $n = n_j$ is the size of team $j$, an alternative notation for (1) could be $\tilde Y_j = \phi_{n_j}(\alpha_{i_1(j)}, \ldots, \alpha_{i_{n_j}(j)}, \tilde\varepsilon_j)$, where $\tilde Y_j = Y_{n_j, j}$ and $\tilde\varepsilon_j = \varepsilon_{n_j, j}$.

Assumption 1 (Network exogeneity). $\{\varepsilon_{nj} : (n, j)\}$ are independent of $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$ conditional on $\{\alpha_i : i\}$.

Assumption 1 restricts the team formation process. It states that team formation is independent of the team-specific shocks $\varepsilon_{nj}$, conditional on the worker types $\alpha_1, \ldots, \alpha_N$. Hence, while the probability of joining a team may depend unrestrictedly on worker types and on other factors that are independent of the $\varepsilon_{nj}$'s, it cannot depend on the $\varepsilon_{nj}$'s themselves. Assumption 1 will thus fail if, before forming or joining a team, workers have advance information about team-specific shocks. This assumption is important for the tractability of the approach. Yet it is restrictive, and in future work it will be important to relax it. I now state the second key assumption.

Assumption 2 (Independent shocks). $\{\varepsilon_{nj} : (n, j)\}$ are mutually independent, and independent of $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$ and $\{\alpha_i : i\}$.

Assumption 2 rules out dependence among team-specific shocks. As an implication, shocks to a team of workers who collaborate repeatedly over time are assumed serially independent. An alternative approach, which I do not study in this chapter, would be to model the dependence, for example using a parametric process for $\varepsilon_{nj}$. More generally, the framework does not allow for dynamics in team production, such as state dependence effects.

Another implication of Assumption 2 is the absence of “team effects” in the model. In the framework, a team $j$ is only a collection of workers, and there is no effect of $j$ per se except through the independent shocks $\varepsilon_{nj}$. In particular, the model does not allow for team-specific capital (Jaravel et al., 2018). In applications where firms (e.g., in R&D) or team identities (e.g., in sports) play an important role, it will be important to extend the framework to allow for team types and other team-specific factors, in addition to worker types.

Lastly, another assumption implicit in (1) is the absence of covariates, except for team size $n$. Covariates can be incorporated as additional inputs to production. In the empirical applications, I will account for time effects, and for age effects in robustness checks.

Footnote: A possibility to relax Assumption 1 is a joint random-effects approach, where production and team formation are estimated together (see Section 4).

To specify and estimate the model, I will use two approaches. In the first one (“fixed-effects”, FE), I will treat the $\alpha_i$'s as parameters and estimate them. The FE approach has the advantage of not requiring one to specify a model of team formation. Moreover, when team output is additive in worker types and Assumption 1 holds, the FE approach is tractable. However, in the empirical collaboration networks that I will analyze, $\alpha_i$ estimates will often be very noisy. The prevalence of sample noise will require using methods for bias reduction.

In the second approach (“random-effects”, RE), I will model the joint distribution of the $\alpha_i$'s. I will use this approach to estimate nonlinear models with complementarity between worker types. The RE approach accounts naturally for sample noise. However, it requires specifying how the worker types depend on the collaborations in the network. In other words, the RE approach requires making assumptions about team formation beyond Assumption 1. I will compare several approaches based on simple statistical models.

In this section and the next, I present methods for additive production and nonlinear production, in turn.

Suppose that the production function in (1) takes the following additive form:

$$Y_{nj} = \lambda_n\left(\alpha_{i_1(j)} + \ldots + \alpha_{i_n(j)}\right) + \varepsilon_{nj}, \qquad (2)$$

where $\lambda_n$ is a team-size scaling factor. I adopt the normalization $\lambda_1 = 1$. In (2), output depends on the sum of worker types or, equivalently (given the presence of $\lambda_n$), on the mean worker type in the team.

It is useful to write (2) in vector form as $Y_n = \lambda_n A_n \alpha + \varepsilon_n$; that is, equivalently, stacking all teams and team sizes together,

$$Y = D_\lambda A \alpha + \varepsilon, \qquad (3)$$

where $D_\lambda$ is a diagonal matrix whose $j$-th diagonal element is $\lambda_{n_j}$, and $A$ is a matrix of zeros and ones where a one indicates that a worker (i.e., a column) participates in a team (i.e., a row).
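The stacked form (3) can be sketched in a few lines of NumPy for the network of Figure 1. This is a minimal simulation under stated assumptions, not the chapter's code: the team order, the values $\lambda_2 = 1/2$ and $\lambda_3 = 1/3$ (so that output depends on the mean type), the Gaussian draws for $\alpha$ and $\varepsilon$, and $\mu_n = 0$ are all illustrative choices.

```python
import numpy as np

teams = [(3,), (5,), (3, 4), (2, 4, 5), (1, 2)]   # Figure 1, workers 1..5
N, J = 5, len(teams)

# A: J x N matrix of zeros and ones; A[j, i-1] = 1 if worker i is in team j.
A = np.zeros((J, N))
for j, members in enumerate(teams):
    A[j, [i - 1 for i in members]] = 1.0

# Illustrative team-size factors: lambda_1 = 1 and lambda_n = 1/n, so that
# output depends on the mean type in the team.
lam = {1: 1.0, 2: 0.5, 3: 1.0 / 3.0}
D_lam = np.diag([lam[len(t)] for t in teams])      # diagonal D_lambda

rng = np.random.default_rng(0)
alpha = rng.normal(size=N)              # hypothetical worker types
eps = rng.normal(scale=0.1, size=J)     # mean-zero shocks (mu_n = 0)
Y = D_lam @ A @ alpha + eps             # stacked model (3)
```

Each row of `A` sums to the size of the corresponding team, and `D_lam` simply rescales each team's row by $\lambda_{n_j}$.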
Network exogeneity then takes the form

$$E(\varepsilon_{nj} \mid A, \alpha) = \mu_n, \qquad (4)$$

where $\mu_n$ is a team-size-specific constant. In the empirical analysis, I will estimate (2) under the assumption that $\mu_n = 0$. In addition, in a robustness check, I will report results from the following alternative specification in logarithms:

$$\ln Y_{nj} = \mu_n + \alpha_{i_1(j)} + \ldots + \alpha_{i_n(j)} + \varepsilon_{nj}. \qquad (5)$$

Footnote: In model (2), types $\alpha_i$ are scalar, and their effects on output in teams of varying sizes are proportional to each other. The nonlinear model in Section 4 will allow for different patterns, possibly non-monotonic across teams of different sizes with varying worker composition.

To analyze identification in this setting, I treat $\alpha$, $\lambda$, $\mu$, and $A$ as non-stochastic quantities. Suppose, to start with, that the $\lambda_n$'s are known, and that $\mu_n = 0$ in (4). Since model (3)-(4) is a linear regression, the identification of the worker effects $\alpha_1, \ldots, \alpha_N$ is determined by the rank properties of $A$.

As an example, consider the matrix $A$ corresponding to Figure 1, with rows for teams (ordered as in the description of Figure 1 above) and columns for workers 1 to 5:

$$A = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}.$$

Since this matrix is non-singular, all of $\alpha_1, \ldots, \alpha_5$ are identified. An intuitive algorithm to show identification in this case is as follows. Let $S$ denote the set of workers $i$ for which $\alpha_i$ is identified. First, focus on workers who produce on their own: since workers 3 and 5 work on their own, include them in $S$. Then, focus on teams where the workers in $S$ collaborate with others: since workers 3 and 4 work together, include worker 4 in $S$. Repeat this operation: since workers 2, 4, and 5 work together, include worker 2 in $S$, and since workers 1 and 2 work together, include worker 1 in $S$. Hence $S = \{1, 2, 3, 4, 5\}$.

To generalize the approach beyond this simple example, let $B = D_\lambda A$, continuing with the case where $D_\lambda$ is known. A worker type $\alpha_i$, for $i \in \{1, \ldots, N\}$, is identified if and only if $\exists\, v_i : B' v_i = e_i$, where $e_i$ denotes the $i$-th canonical vector in $\mathbb{R}^N$.
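Both routes to identification in the example can be checked numerically. In the hedged sketch below, `identified_by_peeling` follows the intuitive algorithm just described, while `identified_by_rank` checks whether $B' v_i = e_i$ has an exact solution through a least-squares residual; both function names are hypothetical. The sketch uses $B = A$, since rescaling rows by nonzero $\lambda_n$'s does not change which $e_i$ lie in the row space.

```python
import numpy as np

teams = [(3,), (5,), (3, 4), (2, 4, 5), (1, 2)]   # Figure 1
A = np.zeros((len(teams), 5))
for j, members in enumerate(teams):
    A[j, [i - 1 for i in members]] = 1.0


def identified_by_peeling(teams):
    """Start from solo workers, then add any worker whose team-mates in
    some team are all already identified (the intuitive algorithm)."""
    S = {t[0] for t in teams if len(t) == 1}
    changed = True
    while changed:
        changed = False
        for t in teams:
            unknown = [i for i in t if i not in S]
            if len(unknown) == 1:
                S.add(unknown[0])
                changed = True
    return S


def identified_by_rank(B, tol=1e-10):
    """alpha_i is identified iff B' v = e_i has an exact solution v.
    Solve by least squares and check that the residual is zero."""
    N = B.shape[1]
    result = []
    for i in range(N):
        e = np.zeros(N)
        e[i] = 1.0
        v, *_ = np.linalg.lstsq(B.T, e, rcond=None)
        result.append(bool(np.linalg.norm(B.T @ v - e) < tol))
    return np.array(result)


print(np.linalg.matrix_rank(A))               # 5: A is non-singular
print(sorted(identified_by_peeling(teams)))   # [1, 2, 3, 4, 5]
print(identified_by_rank(A))                  # all True
```

The peeling rule is sufficient but not necessary: three workers who only work in the pairs $\{1, 2\}$, $\{2, 3\}$, $\{1, 3\}$ pass the rank check, since the $3 \times 3$ incidence matrix is non-singular, even though none of them produces alone.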
Letting $(B')^\dagger$ denote the Moore-Penrose generalized inverse of $B'$, and $I$ be a conformable identity matrix, this is equivalent to

$$\left(I - B'(B')^\dagger\right) e_i = 0. \qquad (6)$$

Footnote: Hence the conditioning on $(A, \alpha)$ in (4) could be removed from the notation.

The $\alpha_i$'s for which (6) holds are identified. In general, however, not all $\alpha_i$'s are identified. For example, when two workers always work together, with or without other co-workers, their respective $\alpha_i$'s are not separately identified.

In practice, it is convenient to further restrict the sets of workers and teams such that the types of all workers in those teams are identified. To do this, I implement the following simple iterative algorithm.

Algorithm 1

Iterate until convergence:

1. Select all the workers such that (6) holds.
2. Select all the teams that only comprise those workers.
3. Let $B_{\text{sub}}$ be the resulting selection of rows and columns of $B$. Rename $B_{\text{sub}}$ as $B$.

To compute Moore-Penrose generalized inverses, I rely on sparse LU matrix decompositions. With some abuse of notation, I will still denote the resulting restricted vectors and matrices as $Y$, $D_\lambda$, $A$, and $\varepsilon$.

Consider now the team-size parameters $\lambda_n$, under the maintained assumption that $\mu_n = 0$. Since $D_\lambda^{-1} Y = A\alpha + D_\lambda^{-1}\varepsilon$, it follows that $(I - AA^\dagger) D_\lambda^{-1} Y = (I - AA^\dagger) D_\lambda^{-1} \varepsilon$. Hence, (4) implies

$$E\left[Z_n' (I - AA^\dagger) D_\lambda^{-1} Y\right] = 0, \quad \text{for all } n, \qquad (7)$$

where $Z_n$ is a $J \times 1$ vector with ones for the teams $j$ such that $n_j = n$ and zeros elsewhere. System (7) is linear in the parameters $\lambda_n^{-1}$, so the conditions for identification are standard. In particular, the $\lambda_n$'s are not identified in the stylized example of Figure 1: there $A$ is non-singular, so $I - AA^\dagger = 0$ and (7) holds trivially.

Footnote: Here my goal is to point-identify the $\alpha_i$ values of a set of workers, and the corresponding contributions of these workers to team output. For this reason, in the identification strategy I rely on workers who produce on their own. In other settings (e.g., team sports) this information may not be available. In such cases, it may still be possible to identify differences between $\alpha_i$'s. As an example, consider a team of five players, where player $i$ gets replaced by player $i'$. In this case the difference in the team's output identifies $\alpha_i - \alpha_{i'}$, though not $\alpha_i$ and $\alpha_{i'}$ separately. Another example is doubles tennis, as studied in Devereux (2018). In the empirical analysis, in robustness checks I will report estimates of nonlinear models that are solely based on 2-worker teams.

Footnote: This quasi-differencing approach is related to Holtz-Eakin et al. (1988) and Chamberlain (1992).

When $\mu_n \neq 0$ in (4), one can use a similar strategy. To see this, let $\mu$ be the vector of $\mu_n$'s.
We have

$$E\left[Z' (I - AA^\dagger) D_\lambda^{-1} (Y - \mu)\right] = 0, \qquad (8)$$

where the instruments $Z$ can be constructed as functions of collaborations in the network. As an example, the $j$-th row of $Z$ may contain some functions of the team sizes of the co-workers in $j$, in addition to the size $n_j$ of team $j$. Moreover, under assumptions on the covariance matrix of $\varepsilon$, one can add covariance restrictions on $\lambda$ and $\mu$ to the mean restrictions in (8). I will not implement these approaches empirically in Sections 5 and 6, and proceed under the assumption that $\mu_n = 0$ for all $n$.

I will focus on the following decomposition of output variance, for every team size $n$:

$$\underbrace{\mathrm{Var}_n(Y_{nj})}_{\text{Total variance}} = \underbrace{\lambda_n^2 \sum_{m=1}^{n} \mathrm{Var}_n\left(\alpha_{i_m(j)}\right)}_{\text{Heterogeneity}} + \underbrace{2 \lambda_n^2 \sum_{m=1}^{n} \sum_{m'=m+1}^{n} \mathrm{Cov}_n\left(\alpha_{i_m(j)}, \alpha_{i_{m'}(j)}\right)}_{\text{Sorting}} + \underbrace{\mathrm{Var}_n(\varepsilon_{nj})}_{\text{Other factors}}. \qquad (9)$$

In this decomposition, the component labeled “heterogeneity” reflects the variation in worker effects on output, keeping team composition constant. Equivalently, it represents the effect of worker heterogeneity if the allocation of workers to teams were random, in which case covariances would be zero. In turn, the component labeled “sorting” reflects the variance contribution due to team composition not being random.

To estimate the variance components in (9), I first estimate the team-size effects $\lambda_n$ using an empirical counterpart to the just-identified system (7). Next, I estimate $\alpha$ using OLS as $\hat\alpha = (B'B)^{-1} B' Y$, for $B = D_\lambda A$. Hence, for any variance component $V_Q = \alpha' Q \alpha$, where $Q$ is an $N \times N$ matrix, I can construct the fixed-effects (FE) estimator

$$\hat V_Q = \hat\alpha' Q \hat\alpha. \qquad (10)$$

Footnote: It is common in the literature on matched employer-employee data to apply a related decomposition to the variance of log-wages, as opposed to (log-) output. This raises issues for the interpretation of variance components estimates in terms of heterogeneity and sorting (Eeckhout and Kircher, 2011).
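The steps from Algorithm 1 through the bias correction can be sketched with dense NumPy linear algebra on a small simulated network. This is a hedged illustration, not the chapter's implementation: it assumes the $\lambda_n$'s are known, uses dense pseudo-inverses and solves where the text relies on sparse LU decompositions, takes $Q = (I_N - \iota\iota'/N)/N$ (the cross-worker variance of types) as the example component, and the helper names `algorithm1` and `fe_variance_component` are hypothetical. The bias term is $\mathrm{Trace}(B(B'B)^{-1}Q(B'B)^{-1}B'\Omega)$ with diagonal $\Omega$ built from the team-size variances in (12), computed either exactly or with Hutchinson's estimator using Rademacher draws.

```python
import numpy as np


def algorithm1(B, tol=1e-8):
    """Apply Algorithm 1; return the kept team (row) and worker (column) indices."""
    rows, cols = np.arange(B.shape[0]), np.arange(B.shape[1])
    while True:
        Bt = B[np.ix_(rows, cols)].T
        resid = np.eye(len(cols)) - Bt @ np.linalg.pinv(Bt)   # I - B'(B')^+
        ok = np.linalg.norm(resid, axis=0) < tol               # condition (6)
        # Keep the teams none of whose members failed condition (6).
        failed = B[np.ix_(rows, cols[~ok])]
        new_rows = rows[np.all(failed == 0, axis=1)]
        new_cols = cols[ok]
        if len(new_rows) == len(rows) and len(new_cols) == len(cols):
            return rows, cols
        rows, cols = new_rows, new_cols


def fe_variance_component(B, Y, sizes, Q, n_draws=None, seed=0):
    """Return (FE estimate, bias estimate, bias-corrected estimate) of alpha'Q alpha."""
    H = np.linalg.solve(B.T @ B, B.T)      # (B'B)^{-1} B', so alpha_hat = H Y
    alpha_hat = H @ Y
    V_hat = alpha_hat @ Q @ alpha_hat      # FE estimator (10)

    # Variance of shocks by team size, as in (12); omega is the diagonal of Omega.
    A = (B != 0).astype(float)
    omega = np.empty(len(Y))
    for n in np.unique(sizes):
        sel = sizes == n
        M = np.eye(sel.sum()) - A[sel] @ np.linalg.pinv(A[sel])
        omega[sel] = Y[sel] @ M @ Y[sel] / np.trace(M)

    C = H.T @ Q @ H                        # Bias = Trace(C Omega)
    if n_draws is None:                    # exact trace (Omega is diagonal)
        bias = np.sum(np.diag(C) * omega)
    else:                                  # Hutchinson: E[r' C Omega r] = Trace(C Omega)
        rng = np.random.default_rng(seed)
        R = rng.choice([-1.0, 1.0], size=(n_draws, len(Y)))
        bias = np.mean([r @ C @ (omega * r) for r in R])
    return V_hat, bias, V_hat - bias


# Simulated network: each of 40 workers produces alone 3 times; 200 random pairs.
rng = np.random.default_rng(1)
N = 40
teams = [(i,) for i in range(N) for _ in range(3)]
teams += [tuple(rng.choice(N, size=2, replace=False)) for _ in range(200)]
sizes = np.array([len(t) for t in teams])
A = np.zeros((len(teams), N))
for j, t in enumerate(teams):
    A[j, list(t)] = 1.0
lam = np.where(sizes == 1, 1.0, 0.5)       # lambda_n treated as known
B = lam[:, None] * A                       # D_lambda A
alpha = rng.normal(size=N)
Y = B @ alpha + rng.normal(size=len(teams))

rows, cols = algorithm1(B)                 # keeps every team and worker here
Q = (np.eye(N) - np.ones((N, N)) / N) / N  # cross-worker variance of types
Bs, Ys, ns = B[np.ix_(rows, cols)], Y[rows], sizes[rows]
V_hat, bias_exact, V_corr = fe_variance_component(Bs, Ys, ns, Q)
_, bias_hutch, _ = fe_variance_component(Bs, Ys, ns, Q, n_draws=1000)
print(alpha @ Q @ alpha, V_hat, V_corr)    # true, FE, and bias-corrected values
```

Here every worker produces alone three times, so Algorithm 1 keeps all teams and workers; if some workers were dropped, $Q$ would have to be restricted to the kept columns. Forming `C` explicitly defeats the purpose of Hutchinson's estimator in large networks, where one would instead apply $(B'B)^{-1}$ through a sparse factorization.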
Footnote (continued): In contrast, in the additive model (2), heterogeneity and sorting can be directly interpreted as contributions to production.

Footnote: The additive specification (5) in logs can be estimated similarly to the one in levels, the only difference being a slightly different approach to estimate $\mu_n$.

The FE estimator $\hat V_Q$ is biased. To see this, note that

$$E\left[\hat V_Q \mid A, \alpha\right] = V_Q + \underbrace{E\left[\varepsilon' B (B'B)^{-1} Q (B'B)^{-1} B' \varepsilon \mid A, \alpha\right]}_{\text{Bias}}. \qquad (11)$$

Following Andrews et al. (2008), I subtract an unbiased estimate of the bias from $\hat V_Q$. To construct such an estimate, note that

$$\text{Bias} = \mathrm{Trace}\left(B (B'B)^{-1} Q (B'B)^{-1} B' \Omega\right),$$

where $\Omega$ is the covariance matrix of $\varepsilon$. To obtain an empirical counterpart of $\Omega$, I assume that all the $\varepsilon_{nj}$'s are independent (see Assumption 2), and in addition that $\mathrm{Var}(\varepsilon_{nj} \mid A, \alpha) = \sigma_n^2$ varies only with team size. I estimate $\sigma_n^2$ as

$$\hat\sigma_n^2 = \frac{Y_n' (I_n - A_n A_n^\dagger) Y_n}{\mathrm{Trace}(I_n - A_n A_n^\dagger)}. \qquad (12)$$

The bias correction relies on the $\varepsilon_{nj}$'s being independent, and on their variance being constant within team size. While homoskedasticity could be relaxed by following Kline et al. (2020) and suitably restricting the sets of workers and teams, independence may be empirically strong in the applications. Under independence, Kline et al. (2020) derive consistency results and limiting distributions for bias-corrected variance components estimates in environments where the numbers of rows (teams) and columns (workers) in $A$ both tend to infinity.

To implement the bias correction in large data sets, I rely on a sparse LU decomposition to compute the denominator in (12), and I approximate the trace in the bias formula using Hutchinson's approximation, as in Gaure (2014) and Kline et al. (2020). I use 1000 draws in the approximation.

I now describe a nonlinear model that allows for complementarity between worker types. To keep tractability, I model types $\alpha_i$ as discrete, with $K$ points of support. I will vary $K$ in robustness checks.

I adopt a random-effects (RE) strategy that consists in modeling the distribution of types. Specifically, I suppose that $\alpha_1, \ldots, \alpha_N$ are drawn from the joint distribution $\prod_{i=1}^{N} \pi(\alpha_i)$, where $\pi(\alpha_i)$ are type probabilities. I will first describe an independent RE method where $\{\alpha_i : i\}$ are independent of collaborations $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$. I will interpret $\pi$ as a prior on individual types, as in Arellano and Bonhomme (2009). While, by construction, the prior imposes that team formation is independent of types, and thus rules out sorting, the posterior estimates that I will report may still feature sorting (and empirically they will).

However, a natural concern is that the prior $\pi$ may be empirically informative. For example, a prior that is independent of collaborations will tend to shrink sorting patterns towards zero. Unlike the FE methods of Section 3, RE methods rely implicitly or explicitly on a model of team formation, and can be sensitive to that model being misspecified. For this reason, I will also present estimates where I model the dependence between the types $\alpha_i$ and the collaborations $(i_1(j), \ldots, i_n(j))$.

With discrete types, a different approach would be to treat type membership indicators as discrete parameters, and use kmeans clustering methods for estimation (Bonhomme and Manresa, 2015, Bonhomme et al., 2021). Bonhomme et al. (2019) use this grouped fixed-effects approach to incorporate discrete firm heterogeneity in a nonlinear model of wages. I will use a related approach to provide preliminary evidence of sorting and complementarity. However, accurately estimating worker types using grouped fixed-effects requires a relatively large number of observations per worker. In the data sets that I analyze in this chapter, many workers produce only a handful of times. For this reason, I instead rely on an RE approach.
Nonparametric identification of finite mixture models under independence assumptions akin to Assumption 2 has been extensively studied. When all teams consist of a single worker, the model has independent measures and finite types, and its identification has been studied in Hall and Zhou (2003), Hu (2008), and Allman et al. (2009), among others. When teams have multiple workers, the model has a network structure that is related to the settings studied in Allman et al. (2011).

Here I outline an identification argument in the case where $\{\alpha_i : i\}$ are independent of collaborations $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$, output is i.i.d. across teams, and team size is $n \in \{1, 2\}$. Formally, I treat worker types $\alpha_i$ as random, and collaborations $(i_1(j), \ldots, i_n(j))$ as non-stochastic, and assume that the type distribution $\pi$ is common across workers, inde- [...] $\pi$, as well as the type-specific conditional distributions of output.

Footnote: In a similar spirit, in matched employer-employee data, Woodcock (2008) points out that posterior RE estimates can still indicate the presence of sorting even if the prior on worker and firm types is independent of mobility patterns.

Focusing on workers who produce at least three times on their own gives, for triplets $\{j_1, j_2, j_3\}$ such that $i_1(j_1) = i_1(j_2) = i_1(j_3)$,

$$\Pr\left[Y_{j_1} \le y_1, Y_{j_2} \le y_2, Y_{j_3} \le y_3\right] = \sum_{\alpha} \pi(\alpha) \Pr\left[Y_{j_1} \le y_1 \mid \alpha_{i_1(j_1)} = \alpha\right] \Pr\left[Y_{j_2} \le y_2 \mid \alpha_{i_1(j_2)} = \alpha\right] \Pr\left[Y_{j_3} \le y_3 \mid \alpha_{i_1(j_3)} = \alpha\right], \qquad (13)$$

where I have used Assumptions 1 and 2, and the fact that $\pi$ does not depend on $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$. System (13) identifies $\pi(\alpha)$ and $\Pr[Y_j \le y \mid \alpha_{i_1(j)} = \alpha]$ up to labeling of the latent types, under suitable rank conditions (Allman et al., 2009).

Next, focusing on pairs of workers who produce once on their own and once together, we similarly have, for triplets $\{j_1, j_2, j_3\}$ such that $\{i_1(j_1), i_1(j_2)\} = \{i_1(j_3), i_2(j_3)\}$ (labeled so that $i_1(j_3) = i_1(j_1)$ and $i_2(j_3) = i_1(j_2)$),

$$\Pr\left[Y_{j_1} \le y_1, Y_{j_2} \le y_2, Y_{j_3} \le y_3\right] = \sum_{\alpha, \alpha'} \pi(\alpha) \pi(\alpha') \Pr\left[Y_{j_1} \le y_1 \mid \alpha_{i_1(j_1)} = \alpha\right] \Pr\left[Y_{j_2} \le y_2 \mid \alpha_{i_1(j_2)} = \alpha'\right] \times \Pr\left[Y_{j_3} \le y_3 \mid \alpha_{i_1(j_3)} = \alpha, \alpha_{i_2(j_3)} = \alpha'\right]. \qquad (14)$$

Given that $\pi(\alpha)$ and $\Pr[Y_j \le y \mid \alpha_{i_1(j)} = \alpha]$ are identified up to labeling, (14) identifies $\Pr[Y_j \le y \mid \alpha_{i_1(j)} = \alpha, \alpha_{i_2(j)} = \alpha']$ up to the same labeling, under a suitable rank condition. The argument requires no monotonicity assumptions about how output depends on worker types. In addition, identification can be shown using a related argument when $\pi$ depends on collaborations and worker types are not independent of each other within teams, provided none of the joint probabilities $\pi(\alpha, \alpha')$, for pairs of workers who produce once on their own and once together, are zero.

While the previous argument relies on workers producing on their own, identification can also be shown in settings without 1-worker teams; see Allman et al. (2011) for identification results in certain random graph models. As robustness checks, I will report empirical results based on 2-worker teams only.

Footnote: Specifically, it suffices that there exists a set $\{y_r\}$ of values in the support of output such that the matrix with $((y_r, y_{r'}), (\alpha, \alpha'))$-element $\Pr[Y_{j_1} \le y_r \mid \alpha_{i_1(j_1)} = \alpha] \Pr[Y_{j_2} \le y_{r'} \mid \alpha_{i_1(j_2)} = \alpha']$ has full column rank.

Footnote: As an example, for two workers who collaborate together at least three times, we have, for $\{j_1, j_2, j_3\}$

4.2 Estimation using a variational approach

Estimating the random-effects model of team production is challenging.
To see this, considera setup where the density f nα i j ) ,...,α in ( j ) ( y ) of Y nj conditional on α i ( j ) , ..., α i n ( j ) is parametric,indexed by a ﬁnite-dimensional vector θ . The RE likelihood L ( θ, π ) = X α ... X α N Y i π ( α i ) Y n Y j f nα i j ) ,...,α in ( j ) ( Y nj ; θ )involves an intractable N -dimensional sum over all possible worker type realizations. Thelikelihood does not factor in simple ways, except in the special case where all teams consistof one worker. To reduce computational complexity, I follow a mean-ﬁeld variational approach that isincreasingly popular in related settings in machine learning and statistics. The idea is tointroduce an auxiliary distribution Q Ni =1 q i ( α i ), and set it to be as close as possible to theposterior density of α , ..., α N . Unlike the posterior density, its variational approximationfactors across i . This makes estimation feasible even in large data sets.Formally, the variational objective function isELBO( θ, π ) = max q ,...,q N ln ( L ( θ, π )) − E q ...q N ln Q i q i ( α i ) p ( α , ..., α N ; θ, π ) | {z } KL divergence , (15)where p ( α , ..., α N ; θ, π ) is the (computationally intractable) posterior density of worker typesgiven θ and π values. The second term on the right-hand side of (15) is the Kullback-Leibler(KL) divergence between the posterior density and its variational approximation. Hence, thevariational objective and the RE likelihood are closer when the variational approximation ismore accurate. The variational objective is a lower bound on the RE likelihood, and it isoften referred to as the “evidence lower bound” (ELBO). 
triplets such that { i ( j ) , i ( j ) } = { i ( j ) , i ( j ) } = { i ( j ) , i ( j ) } ,Pr [ Y j ≤ y , Y j ≤ y , Y j ≤ y ] = X α,α ′ π ( α ) π ( α ′ ) Pr (cid:2) Y j ≤ y | α i ( j ) = α, α i ( j ) = α ′ (cid:3) × Pr (cid:2) Y j ≤ y | α i ( j ) = α, α i ( j ) = α ′ (cid:3) Pr (cid:2) Y j ≤ y | α i ( j ) = α, α i ( j ) = α ′ (cid:3) , which identiﬁes π ( α ) and Pr (cid:2) Y j ≤ y | α i ( j ) = α, α i ( j ) = α ′ (cid:3) up to labeling of the components, under suit-able rank conditions. In 1-worker teams, the identiﬁcation strategy leads to practical nonparametric estimators; see Bonhomme et al. (2016) for example. This could be used to nonparametrically estimate models of team production,using (14) and its extensions for n >
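To make (15) and (16) concrete, the sketch below evaluates the RE log-likelihood by brute force on a toy network of three workers and compares it with the objective inside the max in (16) for arbitrary factorized distributions $q_i$. Everything in it is a hypothetical illustration rather than the specification used in the applications: Gaussian output with type-specific means, $K = 2$ types, and made-up data and parameter values. The brute-force sum has $K^N$ terms, which is why it is infeasible at realistic $N$.

```python
import itertools
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Hypothetical toy model: K = 2 types, Gaussian output, common standard deviation.
PI = np.array([0.4, 0.6])
M1 = np.array([0.0, 2.0])                 # mean output of 1-worker teams by type
M2 = np.array([[0.0, 1.0], [1.0, 4.0]])   # mean output of 2-worker teams by type pair
SD = 1.0

def log_f(y, types):
    """ln f^n_{types}(y): log density of one team's output given member types."""
    mean = M1[types[0]] if len(types) == 1 else M2[types[0], types[1]]
    return norm.logpdf(y, loc=mean, scale=SD)

def exact_loglik(teams, N):
    """ln L(theta, pi) by brute force over all K**N type configurations."""
    terms = []
    for alpha in itertools.product(range(len(PI)), repeat=N):
        lp = np.log(PI[list(alpha)]).sum()
        for members, y in teams:
            lp += log_f(y, [alpha[i] for i in members])
        terms.append(lp)
    return logsumexp(terms)

def elbo_at(q, teams):
    """Objective inside the max in (16) for a factorized q (N x K, rows sum to 1)."""
    val = 0.0
    for members, y in teams:
        for types in itertools.product(range(len(PI)), repeat=len(members)):
            weight = np.prod([q[i, a] for i, a in zip(members, types)])
            val += weight * log_f(y, types)
    val += np.sum(q * np.log(PI))   # sum_i E_{q_i} ln pi(alpha_i)
    val -= np.sum(q * np.log(q))    # minus sum_i E_{q_i} ln q_i(alpha_i)
    return val

def posterior_solo(teams, N):
    """Exact posterior of each worker's type when all teams have one member."""
    logq = np.tile(np.log(PI), (N, 1))
    for (i,), y in teams:
        logq[i] += norm.logpdf(y, loc=M1, scale=SD)
    return np.exp(logq - logsumexp(logq, axis=1, keepdims=True))

# Teams are (members, output) pairs; workers are 0, 1, 2.
solo = [((0,), 0.3), ((0,), -0.5), ((1,), 2.2), ((2,), 1.0)]
teams = solo + [((0, 1), 1.5), ((1, 2), 3.8)]
```

With 1-worker teams only, the posterior factorizes across workers, so setting each $q_i$ to worker $i$'s posterior (`posterior_solo`) makes the bound exact. Once 2-worker teams link workers, factorized $q$'s evaluated with `elbo_at(q, teams)` stay below `exact_loglik(teams, 3)`.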

To see why adding the KL term can improve tractability, note that we equivalently have

$$\begin{aligned}\mathrm{ELBO}(\theta,\pi) &= \max_{q_1,\ldots,q_N}\; \sum_n\sum_j \mathbb{E}_{q_{i_1(j)}\cdots q_{i_n(j)}}\ln f^n_{\alpha_{i_1(j)},\ldots,\alpha_{i_n(j)}}(Y_{nj};\theta) + \sum_i \mathbb{E}_{q_i}\ln\pi(\alpha_i) - \sum_i\mathbb{E}_{q_i}\ln q_i(\alpha_i)\\ &= \max_{q_1,\ldots,q_N}\; \sum_n\sum_j\sum_{\alpha_1}\cdots\sum_{\alpha_n} q_{i_1(j)}(\alpha_1)\cdots q_{i_n(j)}(\alpha_n)\ln f^n_{\alpha_1,\ldots,\alpha_n}(Y_{nj};\theta) + \sum_i\sum_\alpha q_i(\alpha)\ln\pi(\alpha) - \sum_i\sum_\alpha q_i(\alpha)\ln q_i(\alpha). \end{aligned}\qquad (16)$$

Since this expression no longer involves an $N$-dimensional sum, it can be evaluated easily, at least when team size $n$ is relatively small. To estimate the nonlinear models, I will maximize the evidence lower bound using a variational version of the EM algorithm; see Bishop (2006) and Mariadassou et al. (2010). Specifically, the algorithm alternates between updates of the $q_i(\alpha)$'s given $(\theta, \pi)$ via Newton steps, and updates of $(\theta, \pi)$ given the $q_i(\alpha)$'s via weighted maximum likelihood estimation.

Computation of the evidence lower bound in (16) becomes more complex as team size increases. In the applications, I will restrict the analysis to 1-worker and 2-worker teams. It will be important to devise computational strategies to efficiently handle larger teams.

Statistical properties of variational estimates.
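A compact sketch of the alternating scheme, under simplifying assumptions that are not those of the applications: Gaussian output with a common variance, type-pair means $\mu_2(a, b)$ restricted to be symmetric, and closed-form coordinate-ascent updates of each $q_i$ (the standard mean-field fixed point) in place of the Newton steps described above. The function name and data layout are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def variational_em(y1, i1, y2, pairs, N, K, n_iter=100, seed=0):
    """Mean-field variational EM for a toy Gaussian model of 1- and 2-worker teams.

    y1[j]: output of 1-worker team j, produced by worker i1[j].
    y2[j]: output of 2-worker team j, produced by workers pairs[j] = (i, i').
    Assumed model: Y ~ N(mu1[a], s2) or N(mu2[a, b], s2), with mu2 symmetric.
    """
    rng = np.random.default_rng(seed)
    q = rng.dirichlet(np.ones(K), size=N)                 # q[i, a] = q_i(a)
    pi = np.full(K, 1.0 / K)
    mu1 = np.quantile(y1, np.linspace(0.2, 0.8, K))       # spread-out starting values
    mu2 = np.add.outer(mu1, mu1)
    s2 = np.var(np.concatenate([y1, y2]))
    a, b = pairs[:, 0], pairs[:, 1]
    for _ in range(n_iter):
        # E-step: update each q_i in turn, holding the other q's fixed.
        # Gaussian normalizing constants do not depend on types and are dropped.
        ll1 = -0.5 * (y1[:, None] - mu1) ** 2 / s2          # (J1, K)
        ll2 = -0.5 * (y2[:, None, None] - mu2) ** 2 / s2    # (J2, K, K)
        for i in range(N):
            logq = np.log(pi) + ll1[i1 == i].sum(axis=0)
            m = a == i                                       # teams where i is listed first
            logq += np.einsum("jab,jb->a", ll2[m], q[b[m]])
            m = b == i                                       # teams where i is listed second
            logq += np.einsum("jab,ja->b", ll2[m], q[a[m]])
            q[i] = np.exp(logq - logsumexp(logq))
        # M-step: weighted maximum likelihood given the q's.
        pi = q.mean(axis=0)
        w1 = q[i1]                                           # (J1, K)
        mu1 = (w1 * y1[:, None]).sum(axis=0) / w1.sum(axis=0)
        w2 = q[a][:, :, None] * q[b][:, None, :]             # (J2, K, K)
        w2 = w2 + w2.transpose(0, 2, 1)                      # pool (a, b) and (b, a)
        mu2 = (w2 * y2[:, None, None]).sum(axis=0) / w2.sum(axis=0)
        ssr = (w1 * (y1[:, None] - mu1) ** 2).sum() \
            + 0.5 * (w2 * (y2[:, None, None] - mu2) ** 2).sum()
        s2 = ssr / (len(y1) + len(y2))
    return q, pi, mu1, mu2, s2
```

Each E-step update is the exponentiated expected complete-data log density, averaged over the current $q$ of co-workers, which is what makes the cost grow with team size. In the Gaussian case, the M-step reduces to weighted least squares, the counterpart of the weighted maximum likelihood step described above.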

An obvious issue with replacing the RE likelihood by the evidence lower bound is that this leads to optimizing a different objective function. Indeed, in the present setting as well as in most network applications, the posterior density $p$ does not factor across workers, whereas the variational density $q$ does. This discrepancy can make the variational estimates inconsistent even when the assumptions of the RE model hold.

The analysis in Bickel et al. (2013) suggests that variational estimates can be theoretically justified in network models with discrete types (see also Celisse et al., 2012). Bickel et al. (2013) focus on a standard stochastic blockmodel (e.g., Snijders and Nowicki, 1997) with binary outcomes, where choice probabilities are functions of both agents' types. In their asymptotic environment, expected degree in the network grows at a rate faster than $\ln N$. They first show that the RE maximum likelihood estimator (MLE) is asymptotically equivalent to an infeasible MLE where the agents' types are observed. As a result, the RE estimator is consistent and asymptotically normal. They then show that the mean-field variational estimator (Daudin et al., 2008) is asymptotically equivalent to both the infeasible MLE and the RE MLE, hence that it is consistent and has the same asymptotic normal distribution.

Stochastic blockmodels with binary outcomes are closely related to the model I focus on. However, there are also important differences, since in my model workers both form teams and produce, and output is continuous. In addition, the conditions in Bickel et al. (2013) rely on the network becoming sufficiently dense in large samples. It is unclear whether conditions akin to the ones they assume provide a good approximation to the settings I study. For this reason, in the appendix I probe the accuracy of the variational estimates using Monte Carlo simulations. In simulated samples that mimic the data on economic researchers, I find that variational estimates recover the true parameter values well when the numbers of productions and collaborations are sufficiently large.

[Footnote: Moreover, to estimate models with larger teams, one will need to impose additional structure on how production depends on types. A possibility, in the spirit of Ahmadpoor and Jones (2019), would be to model type-specific mean output using a CES function.]

[Footnote: Related perfect classification results have been derived in panel data models under discrete heterogeneity (e.g., Hahn and Moon, 2010; Bonhomme and Manresa, 2015).]

The assumption that the distribution of worker types $\alpha_i$ does not depend on collaborations $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$ may be restrictive. To give intuition, consider a simple assignment model of team formation, along the lines of the roommate matching problem studied by Chiappori et al. (2019). In the model, workers are assigned to teams of size $n \in \{1, 2\}$ in a way that maximizes expected output. Let $\mu(\alpha) = E(Y_{1j} \mid \alpha_{i_1(j)} = \alpha)$ and $\mu(\alpha, \alpha') = E(Y_{2j} \mid \alpha_{i_1(j)} = \alpha, \alpha_{i_2(j)} = \alpha')$. Let $T^1_\alpha, T^2_\alpha \in \mathbb{N}$ be exogenous "budgets" for type-$\alpha$ workers, indicating the maximum numbers of teams of size 1 and 2, respectively, in which they can participate. Consider an allocation solving

$$\begin{aligned} \max\; & \sum_\alpha \tau_\alpha\,\mu(\alpha) + \sum_\alpha \tau_{\alpha\alpha}\,\mu(\alpha,\alpha) + \frac{1}{2}\sum_\alpha\sum_{\alpha'\neq\alpha}\tau_{\alpha\alpha'}\,\mu(\alpha,\alpha') \\ \text{s.t.}\; & \tau_\alpha,\tau_{\alpha\alpha'}\in\mathbb{N},\quad \tau_{\alpha\alpha'}=\tau_{\alpha'\alpha},\quad \tau_\alpha\le T^1_\alpha,\quad 2\tau_{\alpha\alpha}+\sum_{\alpha'\neq\alpha}\tau_{\alpha\alpha'}\le T^2_\alpha, \end{aligned}\qquad (17)$$

where $\tau_\alpha$ denotes the number of teams with one worker of type $\alpha$, and $\tau_{\alpha\alpha'}$ denotes the number of teams with one worker of type $\alpha$ and another one of type $\alpha'$.

The optimization in (17) is related to the planner's objective in the Becker (1973) two-sided marriage model. As in the Becker model, in (17) the assignment of individuals to […] $\mu(\alpha, \alpha')$. Intuitively, the higher the degree of complementarity between workers, the higher the assortativeness of the optimal allocation. I will illustrate the link between complementarity and assortativeness by computing optimal allocations of economists and inventors in Sections 5 and 6. In particular, this simple team assignment model implies that types will generally not be uniformly distributed across teams.

To relax the independence assumption between types and collaborations, I will rely on two approaches. In the first approach, I will maintain prior independence between the $\alpha_i$'s, but I will let $\pi_i(\alpha_i)$ depend on worker $i$'s characteristics. This correlated random-effects approach (Chamberlain, 1984) is widely used in traditional panel data applications. However, in a network of teams, it is not a priori obvious which characteristics $\alpha_i$ should depend on. I will experiment with features of the degree distributions in the collaboration hypergraph. In addition, note that this approach imposes that worker types be conditionally independent of one another given collaborations, which is unlikely when the assignment of workers to teams is given by a team formation model such as (17).

In the second approach, I will jointly model output production and team formation. This adds another estimation step, since one needs to estimate the team formation process as well. Identification can be analyzed using related techniques, and the mean-field approximation can be implemented similarly. A specific feature of the joint model of team formation and production is that worker types influence both the quality and the quantity of output. For example, in the case of researchers in economics, the type of a worker will influence both how many articles she publishes (i.e., quantity) and in which journals they appear (i.e., quality). To implement this approach, I will specify a simple stochastic blockmodel of team formation. Though of great interest, using an economic team formation model instead would raise several challenges, and I do not consider this possibility in this chapter.

[Footnote: In the Becker model, the optimal allocation can be represented as a stable equilibrium in an economy with transfers. In one-sided problems, existence of a decentralized solution is not guaranteed in general (e.g., Talman and Yang, 2011). Chiappori et al. (2019) show existence when types are discrete and budget sizes are even. Unlike in their setup, here there are two budgets, for 1-worker and 2-worker teams, respectively. Determining optimal allocations of workers across teams of different sizes would require measuring the costs of teamwork (e.g., effort costs), which I do not consider here.]
[Footnote: In this case there are two kinds of dependent variables: the output variables and the team membership indicators. See Bonhomme (2017) for an example.]

Academic production in economics

In this section and the next I present two applications. I start by applying the method tostudy economists and their academic production.

Table 1: Descriptive statistics on economic researchers
Columns: panel (a) Sample: All, n = 1, n = 2, n = 3; panel (b) Subsample: All, n = 1, n = 2.
Notes: Data from Ductor et al. (2014), 1995-1999. Output is a measure of journal quality, net of multiplicative year effects (reference year is 1999). Percentiles of the distribution of number of articles per author are indicated in the bottom five rows. In panel (a) the sample is restricted to at least 5 publications per author during the period. In panel (b) the subsample is restricted to collaborations between at most two authors, where all authors have at least 5 publications during 1995-1999. I use the subsample in panel (b) to estimate nonlinear models.

I use data from Ductor et al. (2014). These data, initially constructed by Goyal et al. (2006), are drawn from the EconLit database, a bibliography of journals compiled by the Journal of Economic Literature. I restrict the sample to articles published between 1995 and 1999, with at most three co-authors.

I only include authors who produced at least five articles during the period. While this is a natural selection when aiming to infer individual contributions, it substantially reduces the sample: starting from 62615 authors and 89834 articles in the original 1995-1999 data, the selection leaves 6509 authors and 41150 articles. To assess how representative the results on this sample are, I will run checks using larger samples.

As a measure of academic output, I follow Ductor et al. (2014) and use the journal quality variable from Kodrzycki and Yu (2006), as extended by Ductor and co-authors. From the output measure I net out multiplicative year fixed effects, using 1999 as reference year. I show descriptive statistics for the main sample in panel (a) of Table 1. Research output, as measured by the journal quality variable, is right-skewed. The number of collaborations per author is also skewed, with 50% of authors producing at most 8 articles, and 1% producing at least 39. The subsample in panel (b), which I will use to estimate nonlinear models of 1-author and 2-author teams, shows similar patterns.

I start by estimating the additive model of academic production (2). I first restrict the set of authors and articles in order to ensure identification. This gives 6479 authors and 41049 articles. Hence, relative to the sample in panel (a) of Table 1, only 30 authors and 101 articles drop out.

Using this sample, I first estimate the team-size effects $\lambda_n$. I find $\lambda_2 = 0.67$ and $\lambda_3 = 0.[\ldots]$. […] In Table 2, I show the estimates of the variance components in (9), for $n = 1, 2, 3$. The results suggest that author heterogeneity matters substantially, while playing a smaller role in larger teams. Indeed, the variance share explained by author heterogeneity is 33% for sole-authored articles, 19% in articles with two co-authors, and 14% in articles with three co-authors. Sorting also contributes a substantial share: 17% in articles with two co-authors, and 20% in articles with three co-authors. Hence, according to this analysis, both heterogeneity and sorting contribute to the variation in academic output.

Another finding in Table 2 is the magnitude of the component due to other factors, $\sigma_n$. This component explains 67% of the output variance in articles with one author, 57% in articles with two co-authors, and 62% in articles with three co-authors. The large residual variance causes the fixed-effects estimates of variance components to be substantially biased. For example, the fixed-effects estimate of the variance contribution of heterogeneity is biased upward by 33% with one author, 69% with two co-authors, and 91% with three co-authors. This suggests that, as has been documented in matched employer-employee settings (e.g., Bonhomme et al., 2020), bias correction is needed in order to obtain reliable estimates of variance components in collaboration networks.

Table 2: Additive production, economic researchers

                               n = 1     n = 2     n = 3
Total variance                 95.97    163.21    161.38
Heterogeneity                  32.53     31.71     22.58
Heterogeneity (uncorrected)    42.80     53.26     42.70
Sorting                          -       28.41     32.73
Sorting (uncorrected)            -       20.67     23.34
Other factors                  63.86     82.82     86.81
Other factors (uncorrected)    53.17     92.26     99.41
Team scale λ_n                 […]       […]       […]

Notes: Estimates of variance components in equation (9), for different team sizes n. Descriptive statistics on the sample are given in panel (a) of Table 1.

Robustness analysis.
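The upward bias of the uncorrected heterogeneity component can be reproduced in the simplest case of 1-worker teams. The sketch below is a simplified illustration, not the bias-corrected estimator behind Table 2: it treats each worker's mean output as the fixed-effect estimate, whose cross-worker variance adds the average sampling variance $\sigma^2 / T_i$ to the true variance of worker effects, and it subtracts an unbiased estimate of that term. The function name is hypothetical.

```python
import numpy as np

def fe_variance_components(y, worker):
    """Plug-in and bias-corrected variance of worker effects (1-worker teams only).

    y[k]: output of observation k; worker[k]: id of the worker who produced it.
    Each worker needs at least two observations for the within-worker variance.
    """
    ids, inv, T = np.unique(worker, return_inverse=True, return_counts=True)
    means = np.bincount(inv, weights=y) / T             # fixed-effect estimates
    resid = y - means[inv]
    s2 = np.bincount(inv, weights=resid ** 2) / (T - 1) # within-worker noise variance
    plug_in = means.var()                               # biased upward by E[s2 / T]
    corrected = plug_in - np.mean(s2 / T)               # subtract estimated noise share
    return plug_in, corrected
```

In a simulation with unit variance of worker effects, five outputs per worker, and noise standard deviation 2, the noise adds about $4/5$ to the plug-in variance, so the plug-in estimate is close to 1.8 while the corrected one is close to 1. The fewer outputs per worker, the larger the gap, which mirrors the larger biases reported for less restrictive samples.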

The adequacy of the additive model (2) depends on how one measures the output variable. To explore the sensitivity to this choice, I use two alternative measures of output. First, I estimate the model in logs, accounting for additive team-size effects as in (5). The results are shown in Appendix Table B2, panel (a). Second, I use year-specific ranks of journal quality as dependent variables, instead of journal quality itself. The results are shown in Appendix Table B2, panel (b). While it is reassuring that the variance shares in the two specifications are broadly comparable to the baseline, this exercise does not fully address the sensitivity to the choice of output measure. The nonlinear model that I estimate in the next section will be less sensitive to this issue.

The estimation sample represents a relatively small share of authors and articles during the period. To probe the robustness of the findings to sample definition, I consider two less restrictive rules for inclusion in the sample, requiring every author to produce at least one or two articles, respectively, as opposed to at least five in the baseline sample. The results […]

[Footnote: In this application, it would be interesting to link the journal quality variable to economic payoffs and career outcomes.]

Before presenting the results of the nonlinear model, I provide some preliminary evidence of complementarity, as well as sorting and heterogeneity, in the data. To do so, I select authors who produce at least five articles on their own. This gives a set of 3263 authors. Then, for every author I compute the average output (i.e., average journal quality) among their sole-authored articles, and divide this measure into four quartiles. I call the resulting quartile the "type proxy" of the author. The reason for focusing on individuals with at least five sole-authored articles is to reduce noise in the proxy. At the same time, this does not fully remove the noise, and it requires a selected sample.

Given this proxy of type, I focus on 2-author teams and report two sets of results, which I interpret as reflecting sorting and complementarity, respectively. In panel (a) of Figure 2, I show the proportions of each pair of type proxies in the sample. The percentages suggest that higher-type authors, who produce higher-quality articles on their own, tend to work together, implying the presence of sorting between authors of similar productivity. In panel (b) of Figure 2, I compute the average output for each pair of type proxies. Authors who produce higher-quality articles on their own tend to produce higher-quality output as well when they work in teams, which is consistent with the presence of author heterogeneity. In addition, joint output is highest when authors with the highest type proxies (i.e., in the top quartile of sole-authored production) work together, and the figure is suggestive of the presence of complementarity. Since the additive model (2) rules out complementarity, this motivates estimating a nonlinear model of academic production.

Figure 2: Preliminary evidence based on type proxies, economic researchers. Panel (a): Sorting (type proxy of author 1 by type proxy of author 2). Panel (b): Heterogeneity & complementarity (average output by type proxies of authors 1 and 2).
Notes: Subsample of authors who produce at least 5 articles on their own. I compute type proxies as quartiles of average sole-authored output. In panel (a) I show the proportions of type proxies for authors producing together in a 2-author team. In panel (b) I show average output (i.e., average journal quality) for different combinations of the type proxies.
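The construction of type proxies and of the two panels of Figure 2 can be sketched as follows. This is a hypothetical helper, assuming sole-authored outputs and 2-author teams are available as flat arrays of author ids and outputs; to obtain symmetric tables, it splits each 2-author team half-and-half between the $(a, b)$ and $(b, a)$ cells.

```python
import numpy as np

def type_proxy_tables(solo_worker, solo_y, pair_workers, pair_y, min_solo=5):
    """Quartile type proxies from sole-authored output, tabulated over 2-author teams.

    Returns (share, avg): share[a, b] is the share of 2-author teams whose authors
    have proxies (a, b), counting each unordered pair symmetrically; avg[a, b] is
    the average joint output in those teams (NaN for empty cells).
    """
    ids, inv, n_solo = np.unique(solo_worker, return_inverse=True, return_counts=True)
    mean_solo = np.bincount(inv, weights=solo_y) / n_solo
    keep = n_solo >= min_solo                                # reduce noise in the proxy
    cuts = np.quantile(mean_solo[keep], [0.25, 0.5, 0.75])
    quartile = np.searchsorted(cuts, mean_solo[keep], side="right")  # values 0..3
    proxy = dict(zip(ids[keep].tolist(), quartile.tolist()))
    count = np.zeros((4, 4))
    total = np.zeros((4, 4))
    for (i, k), y in zip(pair_workers, pair_y):
        if i in proxy and k in proxy:                        # both authors need a proxy
            a, b = proxy[i], proxy[k]
            for r, c in ((a, b), (b, a)):                    # half weight to each order
                count[r, c] += 0.5
                total[r, c] += 0.5 * y
    share = count / count.sum()
    avg = np.divide(total, count, out=np.full((4, 4), np.nan), where=count > 0)
    return share, avg
```

Because only authors with enough sole-authored output receive a proxy, 2-author teams involving other authors are dropped from both tables, which is the sample selection the text warns about.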

Estimates of the nonlinear model, baseline speciﬁcation.

I now report estimates of a finite mixture model with $K$ types, varying $K$ between 2 and 6. In this analysis I only consider 1-author and 2-author productions; see panel (b) of Table 1 for summary statistics on the sample. I model the output distribution as a log-normal, with mean and variance that depend on the types of authors in the team: for 1-author teams, mean and variance are functions of the type, and for 2-author teams, mean and variance are symmetric functions of the two types. The log-normal specification is restrictive, and in future work it will be important to check how robust the results are to removing this functional form assumption. I use mean-field variational EM for estimation.

[Footnote: I declare convergence when the increment in the evidence lower bound is less than $10^{-[\ldots]}$. To alleviate local maxima issues, I started the variational EM algorithm from different sets of parameters. The values that I obtained upon reaching the tolerance threshold differed somewhat; however, they had similar implications for heterogeneity, sorting and complementarity.]

For conciseness, I only comment in detail on the results for $K = 4$. In this case, the type proportions are 15%, 20%, 36%, and 29%. The means of the log-normal output distribution are, in 1-author teams, $([\ldots], [\ldots], [\ldots], [\ldots])$, and in 2-author teams:

$$\begin{pmatrix} [\ldots].59 & 0.84 & 2.44 & 11.27 \\ 0.84 & 0.59 & 1.61 & 4.07 \\ 2.44 & 1.61 & 3.81 & 10.12 \\ 11.27 & 4.07 & 10.12 & 23.[\ldots] \end{pmatrix}.$$

This suggests that authors have heterogeneous productivity levels, and that these differences affect both 1-author and 2-author productions.

To visualize the implications of the estimated model for heterogeneity, sorting and complementarity, in Figure 3 I report similar quantities as in Figure 2, except that I now use the types estimated under the model instead of the type proxies. Panel (a) of Figure 3 shows that high-type authors have a stronger propensity to work together, although this evidence of sorting is less pronounced than in Figure 2.

Panel (b) of Figure 3 suggests the presence of complementarity, since the return of two high types working together is 1.5 standard deviations above the return of two low or middle types working together. This suggests gains from complementarity in economic research, which are reflected in stronger sorting at the top. In addition, the results show that sole-authored productivity and productivity in teams are not one-to-one: while the lowest type produces slightly lower-quality output than the second-lowest type on her own, she benefits substantially more from working with others, in particular with a high-type co-author.

Figure 3: Nonlinear model estimates, economic researchers (K = 4). Panel (a): Sorting (type of author 1 by type of author 2). Panel (b): Heterogeneity & complementarity (average output by types of authors 1 and 2).
Notes: Random-effects estimates of a finite mixture model with K = 4 types. In panel (a) I show the proportions of types for authors producing together in a 2-author team. In panel (b) I show average output (i.e., average journal quality) for different combinations of the types. Descriptive statistics on the sample are given in panel (b) of Table 1.

Table 3: Nonlinear production, economic researchers

                       K = 2             K = 4             K = 5             K = 6
                    n = 1   n = 2     n = 1   n = 2     n = 1   n = 2     n = 1   n = 2
Total variance      97.81  162.64     97.81  162.64     97.81  162.64     97.81  162.64
Heterogeneity        2.18   13.84     35.71   46.85     42.77   51.80     35.52   49.86
Sorting               -      1.65       -     16.87       -     19.66       -     17.09
Nonlinearities        -      1.36       -      3.92       -      2.80       -      3.48
Other factors       95.63  145.79     62.11   90.00     55.05   88.39     62.29   92.22

Notes: Estimates of variance components for different team sizes n in the nonlinear model with K types. Descriptive statistics on the sample are given in panel (b) of Table 1.

In addition to documenting the patterns of heterogeneity, sorting and complementarity, the nonlinear model can be used to refine the variance decomposition in (9).
Indeed, the nonlinear model accounts for interaction effects between co-authors, over and above the additive effects of author types. This gives a fourth variance component, which I will refer to as "nonlinearities", to report in the decomposition, in addition to the contributions of heterogeneity, sorting, and other factors. This component can be computed as the difference between the variance of the mean of $Y_{nj}$ given the worker types (which in general depends on both additive and interactive terms) and the variance of its best linear approximation. I report the results of the variance decomposition in Table 3, for $K = 2, 4, 5$, and 6. While $K = 2$ seems to allow for too little heterogeneity, the results for $K = 4, 5$, and 6 […] $K = 4$. In this case, when budget sizes are even, an optimal integer-valued allocation can be computed by solving a linear program that does not impose the integrality constraints. The resulting allocation in Figure 4 shows sorting at the top of the type distribution, but not at the bottom.

Figure 4: Optimal allocation, economic researchers (K = 4). Axes: type of author 1 by type of author 2.
Notes: Proportions of types for authors producing together in a 2-author team, in the allocation that solves (17). Descriptive statistics on the sample are given in panel (b) of Table 1.
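A minimal sketch of the linear-programming relaxation of (17), using SciPy's `linprog` with the HiGHS solver. It assumes that the 2-worker budget counts worker slots, so that a team of two type-$\alpha$ workers uses two units of $T^2_\alpha$, and it exploits the symmetry $\tau_{\alpha\alpha'} = \tau_{\alpha'\alpha}$ to keep one variable per unordered type pair, which turns the $\tfrac{1}{2}\sum_{\alpha'\neq\alpha}$ term into a sum over $\alpha < \alpha'$. The function name and the toy inputs are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_allocation(mu1, mu2, T1, T2):
    """LP relaxation of the team assignment problem (17) (integrality dropped).

    mu1: (K,) mean output of 1-worker teams by type.
    mu2: (K, K) symmetric mean output of 2-worker teams by type pair.
    T1, T2: (K,) budgets for 1-worker and 2-worker teams.
    Returns (tau1, tau2), with tau2 a symmetric (K, K) matrix of team counts.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    K = len(mu1)
    pairs = [(a, b) for a in range(K) for b in range(a, K)]  # unordered pairs a <= b
    n_var = K + len(pairs)
    # linprog minimizes, so negate total expected output.
    c = -np.concatenate([mu1, [mu2[a, b] for a, b in pairs]])
    # 2-worker budgets: 2 * tau_aa + sum_{b != a} tau_ab <= T2_a.
    A_ub = np.zeros((K, n_var))
    for p, (a, b) in enumerate(pairs):
        if a == b:
            A_ub[a, K + p] = 2.0
        else:
            A_ub[a, K + p] = 1.0
            A_ub[b, K + p] = 1.0
    bounds = [(0, t) for t in T1] + [(0, None)] * len(pairs)  # tau_a <= T1_a
    res = linprog(c, A_ub=A_ub, b_ub=np.asarray(T2, float),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    tau1 = res.x[:K]
    tau2 = np.zeros((K, K))
    for p, (a, b) in enumerate(pairs):
        tau2[a, b] = tau2[b, a] = res.x[K + p]
    return tau1, tau2
```

With $K = 2$, $T^2 = (2, 2)$, and a supermodular $\mu(\alpha, \alpha')$ such as [[1, 2], [2, 5]], the solution pairs like with like (one team of each homogeneous pair), whereas a submodular [[1, 4], [4, 5]] yields two mixed teams. Normalizing `tau2` by its sum gives type-pair proportions of the kind plotted in Figure 4.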

Other speciﬁcations.

I now present the results of three exercises based on other specifications of the nonlinear model. In the baseline specification, the distribution of author types does not depend on collaborations. To assess the impact of this assumption, I estimate two alternative specifications. In the first specification, I model the type distribution as a function of author characteristics, using a multinomial logit specification. As characteristics, I use the numbers of teams in which author $i$ participates, separately for 1-author and 2-author teams, as well as indicators that these numbers are zero. In panels (1a) and (1b) of Figure 5, I show type-pair proportions and mean output by pairs of types, and in panel (1c) I show the optimal allocation obtained by solving (17). In Appendix Table B3 I show the variance decomposition estimates. The results are similar to the ones from the baseline specification.

In the second specification, I augment the team production model with a model of team formation, and I jointly estimate the parameters using a variational EM algorithm. I specify team formation using a stochastic blockmodel for hypergraphs, where 1-author and 2-author collaborations follow independent Poisson distributions, with a parameter that is type-specific in the first case and specific to the pair of types in the second case. In this model, as in most models of team formation, the conditional distribution of $\{\alpha_i : i\}$ given $\{(i_1(j), \ldots, i_n(j)) : (n, j)\}$ does not factor across $i$. In addition, in such a model, author types influence not only research quality but also research quantity. The re[…] In Appendix Table B3 I report variance decomposition estimates. In 2-author teams, heterogeneity accounts for a lower share of variance compared to the baseline (22% versus 29%), sorting accounts for a larger share (19% versus 10%), and the variance share accounted for by nonlinearities is smaller than in the baseline.

In a last exercise, I report estimates based on 2-author teams only. Hence, sole-authored productions are discarded. Panel (3a) of Figure 5 shows a higher degree of sorting in the middle of the type distribution, rather than at the top. Panel (3b) again shows evidence of complementarity, although the patterns differ somewhat from the baseline, as shown by the implied optimal allocation in panel (3c). Lastly, the variance decomposition results in Appendix Table B3 show that, in this case, heterogeneity accounts for 32% of variance, and sorting accounts for 5% (as opposed to 29% and 10% in the baseline).

Figure 5: Nonlinear model estimates and optimal allocation, economic researchers, other specifications (K = 4). Rows: 1. Correlated RE; 2. Joint RE; 3. 2-author only. Columns: (a) Sorting; (b) Heterogeneity & complementarity; (c) Optimal allocation.
Notes: Random-effects estimates of a finite mixture model with K = 4 types. In panels (a) I show the proportions of types for authors producing together in a 2-author team. In panels (b) I show average output (i.e., average journal quality) for different combinations of the types. In panels (c) I show the proportions of types for authors producing together in a 2-author team, in the allocation that solves (17). "Correlated RE" comes from a model where the types follow a multinomial logit distribution given the numbers of 1-author and 2-author collaborations at the extensive and intensive margins. "Joint RE" comes from a model where collaborations follow independent type-specific Poisson distributions. "2-author only" only uses information from 2-author teams. Descriptive statistics on the sample are given in panel (b) of Table 1.
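The "nonlinearities" component reported in these decompositions can be illustrated for 2-worker teams. The sketch below is one way to implement the definition given earlier, under the following assumptions: the best linear approximation of the type-pair mean $M(a, b)$ is the $P$-weighted least-squares fit of the additive form $\beta(a) + \beta(b)$; heterogeneity and sorting are the variance and covariance terms of that additive part; and nonlinearities is what remains of $\mathrm{Var}(E[Y \mid \text{types}])$. The function name is hypothetical, and "other factors" would be the total output variance minus the returned `var_mean`.

```python
import numpy as np

def pair_variance_decomposition(P, M):
    """Split Var(E[Y | types]) in 2-worker teams into heterogeneity, sorting, nonlinearities.

    P[a, b]: share of 2-worker teams whose first and second members have types (a, b).
    M[a, b]: mean output of a 2-worker team with types (a, b).
    The additive part beta[a] + beta[b] is the P-weighted least-squares fit of M.
    """
    P = np.asarray(P, float) / np.sum(P)
    M = np.asarray(M, float)
    K = P.shape[0]
    rows, cols = np.indices((K, K))
    X = np.zeros((K * K, K))
    X[np.arange(K * K), rows.ravel()] += 1.0
    X[np.arange(K * K), cols.ravel()] += 1.0               # diagonal cells get a 2
    w = np.sqrt(P.ravel())
    beta = np.linalg.lstsq(X * w[:, None], M.ravel() * w, rcond=None)[0]
    p1, p2 = P.sum(axis=1), P.sum(axis=0)                  # type marginals of each slot
    d1, d2 = beta - p1 @ beta, beta - p2 @ beta
    heterogeneity = p1 @ d1 ** 2 + p2 @ d2 ** 2            # Var(beta(a)) + Var(beta(b))
    sorting = 2.0 * np.sum(P * np.outer(d1, d2))           # 2 Cov(beta(a), beta(b))
    var_mean = np.sum(P * (M - np.sum(P * M)) ** 2)        # Var(E[Y | types])
    nonlinearities = var_mean - heterogeneity - sorting     # residual variance, >= 0
    return heterogeneity, sorting, nonlinearities, var_mean
```

Because a constant lies in the span of the additive regressors, the weighted residual has mean zero and is orthogonal to the fitted values, so the nonlinearities term is a variance and cannot be negative. For an additive $M$ with independently drawn types, both sorting and nonlinearities are zero, while a multiplicative $M(a, b) = g(a)g(b)$ produces a positive nonlinearities term.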

In this section I apply the method to study patents and inventors.

I use data from Akcigit et al. (2016). Their main source is the disambiguated inventor data of Li et al. (2014), which identifies unique inventors in the USPTO data. I restrict the analysis to US patents granted between 1995 and 1999. Throughout, I focus on patents in the technology class "Computers and Communications". For the output measure, I follow Akcigit et al. (2016) and use Hall et al.'s (2001) measure of patent quality, which is a truncation-adjusted measure of forward citations of the patent. From this measure I net out multiplicative year fixed effects, using 1999 as the reference year. In this illustration, I do not use information about the firms where inventors work, although it would be interesting to incorporate this information in future research.

[Footnote: However, note that the high degree of sorting prevents one from assessing with confidence the gains from collaboration between the highest and lowest types (i.e., between type-1 and type-4 authors): while the point estimates in panel (2b) are large, panel (2a) shows that such collaborations are virtually non-existent. This is likely to affect the optimal allocation numbers shown in panel (2c) of Figure 5.]

[Footnote: Among all inventors who patent in this class, 80% patent only in this class during the period.]

Table 4: Descriptive statistics on patents and inventors
Columns: panel (a) Sample: All, n = 1, n = 2, n ≥ […]; panel (b) Subsample: n = 1, n = 2.
Notes: Sample from Akcigit et al. (2016), where the patent is from the US and granted between 1995 and 1999, and belongs to the class "Computers and Communications". Output is the truncation-adjusted measure of forward citations of Hall et al. (2001), net of multiplicative year effects (reference year is 1999). Percentiles of the distribution of number of patents per inventor are indicated in the bottom five rows. In panel (a) the sample is restricted to at least 5 patents per inventor during the period. In panel (b) the subsample is restricted to collaborations between at most two inventors, where all inventors have at least 5 patents during 1995-1999. I use the subsample in panel (b) to estimate nonlinear models.

In panel (a) of Table 4, I provide summary statistics about the sample, where I restrict inventors to participate in at least five patents during the period. In this setting as well, output and the number of collaborations are right-skewed. The original sample of US patents in the class “Computers and Communications” that were granted between 1995 and 1999 contains 65848 inventors and 62927 patents. Hence, imposing that all inventors be on at least 5 patents restricts the sample size substantially. In the sample, team size exhibits a range of variation but teams tend to be small: 57% of teams are 1-inventor teams, 26% of teams have 2 members, and less than 1% of teams have more than 6 members. In panel (b) of Table 4, I show summary statistics for the subsample of 1-inventor and 2-inventor teams that I will use to estimate the nonlinear models.

Table 5: Additive production, patents and inventors

                                 n = 1    n = 2    n = 3
Total variance                  1519.3   1450.0   1388.5
Heterogeneity                    578.2    467.6    261.3
Heterogeneity (uncorrected)      782.1    710.7    559.3
Sorting                              -     90.2    169.6
Sorting (uncorrected)                -    -25.9    -12.0
Other factors                    941.1    733.1    712.8
Other factors (uncorrected)      737.2    727.3    792.0
Team scale $\lambda_n$               …        …        …

Notes: Estimates of variance components in equation (9), for different team sizes n. Descriptive statistics on the sample are given in panel (a) of Table 4.

I first estimate the additive model (2). Restricting the set of inventors and patents to ensure identification gives 5547 inventors and 29101 patents. I estimate the model by allowing for four team sizes n in $\lambda_n$, where n = 4 corresponds to all teams with at least four inventors. I find $\lambda_2 = 0.\ldots$, $\lambda_3 = 0.39$, and $\lambda_4 = 0.29$. Hence, keeping inventor types constant, patents with two inventors are cited 8% more than patents with one inventor, 3-inventor patents are cited 17% more, and patents with at least 4 inventors are cited 16% more.

In Table 5, I show the estimates of the variance components in (9), for n = 1, 2, 3. The variance share explained by inventor heterogeneity is 38% in patents with one inventor, 32% in 2-inventor patents, and 19% in 3-inventor patents. Sorting contributes 6% of the variance in 2-inventor patents, and 12% in 3-inventor patents. Here also, the other factors $\varepsilon_{nj}$ account for the main share of the output variance. Overall, the variance decomposition estimates in the patent sample are thus not very different from those in the sample of economists, with a somewhat larger contribution of worker heterogeneity and a smaller contribution of sorting.

In Appendix Table B4, I augment the sample and require every inventor to produce at least one or two patents, as opposed to at least five in the baseline sample. In addition, as in all the other robustness checks in this application, I use a Poisson regression to net out from the output the effects of year and “inventor age”, measured as the difference between the year of observation and the first year in which the inventor produced a patent. Compared to the baseline estimates, the results in Appendix Table B4 show that, in these larger samples, inventor fixed effects are estimated with more noise, and uncorrected variance components are very large in magnitude. The bias-corrected estimates indicate a larger role of inventor heterogeneity than in the baseline, and a negative sorting contribution, although, given the amount of noise, these findings should be interpreted with caution.

In Appendix Figure B1, I construct type proxies as in Figure 2. I group inventors according to the quartiles of the citations of their sole-authored patents, and restrict the sample to inventors with at least 5 patents on their own, so the sample is small, with 1554 inventors. The figure suggests the presence of heterogeneity and sorting, although complementarity is less salient than for economists, and sorting seems more evenly spread out along the diagonal (compare with Figure 2).

Figure 6: Nonlinear model estimates, patents and inventors (K = 4). Panels: (a) Sorting; (b) Heterogeneity & complementarity. Axes: type inventor 1, type inventor 2; average output in panel (b).
Notes: Random-effects estimates of a finite mixture model with K = 4 types. In panel (a) I show the proportions of types for inventors producing together in a 2-inventor team. In panel (b) I show average output (i.e., average truncation-adjusted forward citations) for different combinations of the types. Descriptive statistics on the sample are given in panel (b) of Table 4.

To specify the nonlinear model, I model the output distribution as a negative binomial with mean and variance that depend on the types of inventors in the team. This parametric form allows for a convenient treatment of the zeros in the dependent variable. I now comment in detail on the results for K = 4. In this case, the type proportions are 6%, 25%, 52%, and 17%. The means of the negative binomial output distribution in 1-inventor teams are (…).
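A minimal Python sketch of the negative binomial specification may help fix ideas. It is restricted to an inventor observed only in 1-inventor patents, and it assumes that type k has its own mean `means[k]` and variance `variances[k]` (with the variance above the mean, as overdispersion requires), maps them into the standard (r, p) parametrization, and integrates the inventor's type out with proportions `pi[k]`. The function names and this particular mean-variance mapping are illustrative assumptions, not the paper's code; 2-inventor patents couple the types of co-inventors, which is where the mean-field variational approximation comes in instead.

```python
import math
from typing import Sequence


def nb_logpmf(y: int, mean: float, var: float) -> float:
    """Log-probability of count y under a negative binomial with given mean and variance.

    Assumes var > mean (overdispersion) and uses the (r, p) parametrization in
    which mean = r * (1 - p) / p and variance = r * (1 - p) / p**2.
    """
    if var <= mean:
        raise ValueError("negative binomial requires var > mean")
    p = mean / var
    r = mean * mean / (var - mean)
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))


def inventor_loglik(citations: Sequence[int], pi: Sequence[float],
                    means: Sequence[float], variances: Sequence[float]) -> float:
    """Random-effects log-likelihood of one inventor's sole-authored patents.

    The type is shared across all of the inventor's patents, so each type's
    likelihood is a product over patents; types are then averaged out with
    population proportions pi (log-sum-exp for numerical stability).
    """
    per_type = [math.log(w) + sum(nb_logpmf(y, m, v) for y in citations)
                for w, m, v in zip(pi, means, variances)]
    top = max(per_type)
    return top + math.log(sum(math.exp(t - top) for t in per_type))
```

A count of zero receives probability p**r under this parametrization, so patents with no citations need no special handling.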
In 2-inventor teams, the means are

$$\begin{pmatrix} \ldots.76 & 6.65 & 23.36 & 13.97 \\ 6.65 & 10.71 & 18.56 & 29.70 \\ 23.36 & 18.56 & 27.30 & 53.32 \\ 13.97 & 29.70 & 53.32 & 112.\ldots \end{pmatrix},$$

where the entry in row k and column k' corresponds to a team with one inventor of type k and one of type k'.

In Figure 6, I report similar quantities as in Figure 3 to illustrate sorting, heterogeneity, and complementarity in the sample of inventors. Panel (a) shows some evidence of sorting towards the top; however, the pattern is less concentrated than in the sample of economists (compare with panel (a) of Figure 3). Panel (b) of Figure 6 suggests the presence of complementarity, since the return to two high-type inventors working together is 2 standard deviations above the return to two low or middle types working together. This suggests gains from complementarity in patent production, with somewhat less sorting than in the case of economists.

In Table 6, I show variance decomposition results based on the nonlinear model, for K = 2, 4, 5, 6. Focusing on the results for K = 4, 5, 6, which are broadly comparable, inventor heterogeneity explains approximately 30% of output variance, and sorting explains between 6% and 9%. Nonlinearities explain a larger share than in the economists' sample, between 5% and 8% of the variance.

Table 6

                        K = 2             K = 4             K = 5             K = 6
                    n = 1    n = 2    n = 1    n = 2    n = 1    n = 2    n = 1    n = 2
Total variance     1263.5   1231.7   1263.5   1231.7   1263.5   1231.7   1263.5   1231.7
Heterogeneity       237.2    302.9    324.6    366.9    375.2    412.8    390.1    384.4
Sorting                 -     46.6        -     80.1        -     72.7        -    109.9
Nonlinearities          -     42.2        -     60.9        -     96.7        -     70.2
Other factors      1026.4    840.0    938.9    723.7    888.3    649.5    873.4    667.1

Notes: Estimates of variance components for different team sizes n in the nonlinear model with K types. Descriptive statistics on the sample are given in panel (b) of Table 4.

In addition, in Figure 7 I report the allocation that maximizes total output according to (17). As in the case of economists, the allocation has perfect assortative matching for the top type but not for the bottom types. This optimal allocation differs quite substantially from the estimated allocation shown in panel (a) of Figure 6.

Figure 7: Optimal allocation, patents and inventors (K = 4)

Axes: type inventor 1, type inventor 2. Proportions by type pair: 0.06 for (1,3) and for (3,1), 0.25 for (2,2), 0.47 for (3,3), 0.16 for (4,4), and 0 for all other pairs.
Notes: Proportions of types for inventors producing together in a 2-inventor team, in the allocation that solves (17). Descriptive statistics on the sample are given in panel (b) of Table 4.
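The allocation in Figure 7 solves problem (17), which is not reproduced in this section. Assuming that (17) maximizes the total expected output of 2-inventor teams given the type composition of inventors, a small discrete analogue can be solved exactly. The hypothetical helper `optimal_pairing` below takes integer worker counts by type and a symmetric matrix of mean team output, and searches over all pairings with memoized recursion. It is a sketch for intuition rather than the paper's solver; with continuous type shares one would solve the corresponding linear program instead.

```python
from functools import lru_cache
from typing import Dict, Sequence, Tuple


def optimal_pairing(counts: Sequence[int],
                    mean_output: Sequence[Sequence[float]]
                    ) -> Tuple[float, Dict[Tuple[int, int], int]]:
    """Split workers into 2-person teams to maximize total expected output.

    counts[k] is the number of workers of type k (the total must be even);
    mean_output[k][l] is the expected output of a (k, l) team, assumed
    symmetric. Returns the maximal total and the number of teams of each
    type pair (k, l) with k <= l. Types are 0-indexed.
    """
    K = len(counts)
    if sum(counts) % 2:
        raise ValueError("the total number of workers must be even")

    @lru_cache(maxsize=None)
    def best(state: Tuple[int, ...]) -> Tuple[float, Tuple[Tuple[int, int], ...]]:
        # Match one worker of the lowest type still unmatched; trying every
        # feasible partner type keeps the search exhaustive.
        k = next((i for i, c in enumerate(state) if c > 0), None)
        if k is None:
            return 0.0, ()
        best_val, best_pairs = float("-inf"), ()
        for l in range(k, K):
            if state[l] < (2 if l == k else 1):
                continue
            nxt = list(state)
            nxt[k] -= 1
            nxt[l] -= 1
            val, pairs = best(tuple(nxt))
            val += mean_output[k][l]
            if val > best_val:
                best_val, best_pairs = val, pairs + ((k, l),)
        return best_val, best_pairs

    total, pairs = best(tuple(counts))
    team_counts: Dict[Tuple[int, int], int] = {}
    for pair in pairs:
        team_counts[pair] = team_counts.get(pair, 0) + 1
    return total, team_counts
```

For example, with two workers of each of two types, `optimal_pairing([2, 2], [[1, 2], [2, 5]])` returns a total of 6 from one (0, 0) and one (1, 1) team, an assortative allocation, whereas with `[[1, 3], [3, 4]]` the two mixed teams are optimal (6 versus 5). The number of states grows with the product of (count + 1) across types, so this exact search only suits small pools.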

Other specifications.

In Figure 8, I show estimates based on three alternative specifications: a correlated random-effects estimator, a joint random-effects estimator where 1-inventor and 2-inventor collaborations follow independent Poisson distributions with type-specific parameters, and an estimator that is solely based on 2-inventor collaborations. In these specifications I net out multiplicative inventor-age effects from the output, in addition to year effects. I show the variance decomposition results in Appendix Table B5. As in the case of economists, I find that the correlated RE estimates do not substantially differ from the baseline. The optimal allocation in panel (1c) of Figure 8 does differ somewhat from the baseline (compare with Figure 7), yet both allocations exhibit perfect assortative matching at the top.

The estimates based only on 2-inventor teams are also broadly similar to the ones based on both 1-inventor and 2-inventor teams in this sample. One difference is that the group shares are more unbalanced when using only 2-inventor collaborations. Another difference is that the implied optimal allocation in panel (3c) of Figure 8 exhibits perfect assortative matching along the entire distribution, and not only at the top.

However, the joint RE estimates show important differences compared to the baseline. Note that, as in the corresponding specification in the economists' sample, here inventor types reflect heterogeneity not only in the quality of innovation, as measured by patent citations, but also in the quantity of patents produced by an inventor. In 2-inventor teams, heterogeneity, sorting, and nonlinearity account for 9%, 8%, and less than 1% of the variance of output, respectively, compared to 30%, 7%, and 5% in the baseline (see Appendix Table B5). While the estimates in panels (2a) and (2b) of Figure 8 show a high degree of sorting, they exhibit less evidence of complementarity than the other specifications. The implied optimal allocation in panel (2c) is also quite different in this case. This suggests that how inventor types, and their effect on team formation, are modeled matters for accurately assessing the contributions of heterogeneity, sorting, and complementarity in this sample.

Figure 8: Nonlinear model estimates and optimal allocation, patents and inventors, other specifications (K = 4). Panels: 1. Correlated RE: (1a) Sorting, (1b) Heterogeneity & complementarity, (1c) Optimal allocation; 2. Joint RE: (2a) Sorting, (2b) Heterogeneity & complementarity, (2c) Optimal allocation; 3. 2-inventor only: (3a) Sorting, (3b) Heterogeneity & complementarity, (3c) Optimal allocation. Axes: type inventor 1, type inventor 2; average output in panels (b).
Notes: Random-effects estimates of a finite mixture model with K = 4 types. In panels (a) I show the proportions of types for inventors producing together in a 2-inventor team. In panels (b) I show average output (i.e., average truncation-adjusted forward citations) for different combinations of the types. In panels (c) I show the proportions of types for inventors producing together in a 2-inventor team, in the allocation that solves (17). “Correlated RE” comes from a model where the types follow a multinomial logit distribution given the numbers of 1-inventor and 2-inventor collaborations at the extensive and intensive margins. “Joint RE” comes from a model where collaborations follow independent type-specific Poisson distributions. “2-inventor only” only uses information from 2-inventor teams. Descriptive statistics on the sample are given in panel (b) of Table 4. For these results I net out multiplicative year and inventor-age effects from the output.

Figure 9: … (K = 4). Panels: (a) Sorting; (b) Heterogeneity & complementarity. Axes: type inventor 1, type inventor 2; average output in panel (b).
Notes: The random-effects model with K = 4 inventor types is estimated on the 1995-1999 period. In panel (a) I show the proportions of types, and in panel (b) I show average output (i.e., average truncation-adjusted forward citations) for different combinations of the types, for inventors producing together in 2-worker teams between 2000 and 2005.

How do inventor types perform out of sample?

In a last exercise, I study the predictive performance of inventor types out of sample. To do so, I use the parameters of the baseline random-effects model with K = 4 types, estimated on the 1995-1999 period. Using the estimated variational posterior type probabilities, I then calculate the average patent output produced by inventors of particular types between 2000 and 2005, and the type proportions in those collaborations. Figure 9 shows that sorting patterns and productivity differences between types persist out of sample, and that the higher returns to collaborations between high inventor types relative to other type combinations persist as well. However, the out-of-sample estimates show less inventor heterogeneity than in the baseline (compare with panel (b) of Figure 6). There may be several explanations for this: posterior type probabilities are only based on five years of observations, truncation issues with patent citations may make the outcome less informative towards the end of the sample (Akcigit et al., 2016), and inventor quality may change over a 10-year period.

In this chapter I have outlined a measurement framework to assess the contributions of individuals to team output. While an additive production specification leads to a tractable estimator, it rules out complementarity, which appears to be a feature of the samples of economists and inventors that I study. In the applications, a natural next step will be to relate the latent types to characteristics of authors and inventors, such as their national origin or educational background.

The discrete-type approach that I have implemented is promising, yet it needs to be studied further. In particular, it will be important to provide formal consistency arguments and to derive asymptotic distributions for variational estimators in this setting. Spectral clustering methods (e.g., Lei and Rinaldo, 2015) could be possible alternatives to the nonlinear random-effects methods that I have described.
It will also be important to augment the framework to account for the productive effects of team-specific factors, both observed and latent.

The literature on matching and sorting has made substantial progress in one-to-one settings (Chiappori and Salanié, 2016; Chade et al., 2017). However, less is known about many-to-one and many-to-many sorting environments. Eeckhout and Kircher (2018) model sorting in large firms, while abstracting from complementarities between workers. Chade and Eeckhout (2018) propose a model of information aggregation in teams that has tight implications for sorting. Moreover, it is well known that models with general complementarities may not have equilibria (Kelso and Crawford, 1982). Addressing these challenges, and combining the quantitative framework of team production that I have introduced with economic models of team formation and effort allocation (such as those recently proposed by Hsieh et al., 2018, and Anderson and Richards-Shubik, 2019), is an interesting avenue for future work.

A related strand of the literature proposes models of co-authorship networks; see, among others, Goyal et al. (2004), Gans and Murray (2014), and the co-author model in Jackson and Wolinsky (1996).

References

[1] Abowd, J. M., and F. Kramarz (1999): “The Analysis of Labor Markets Using Matched Employer-Employee Data,” Handbook of Labor Economics, 3, 2629–2710.
[2] Abowd, J., F. Kramarz, and D. Margolis (1999): “High Wage Workers and High Wage Firms,” Econometrica, 67(2), 251–333.
[3] Airoldi, E. M., D. M. Blei, S. E. Fienberg, and E. P. Xing (2008): “Mixed Membership Stochastic Blockmodels,” Journal of Machine Learning Research, 9(Sep), 1981–2014.
[4] Akcigit, U., S. Baslandze, and S. Stantcheva (2016): “Taxation and the International Mobility of Inventors,” American Economic Review, 106(10), 2930–81.
[5] Akcigit, U., J. Grigsby, and T. Nicholas (2017): “The Rise of American Ingenuity: Innovation and Inventors of the Golden Age,” (No. w23047). National Bureau of Economic Research.
[6] Ahmadpoor, M., and B. F. Jones (2019): “Decoding Team and Individual Impact in Science and Invention,” Proceedings of the National Academy of Sciences, 116(28), 13885–13890.
[7] Allman, E. S., C. Matias, and J. A. Rhodes (2009): “Identifiability of Parameters in Latent Structure Models with Many Observed Variables,” Annals of Statistics, 3099–3132.
[8] Allman, E. S., C. Matias, and J. A. Rhodes (2011): “Parameter Identifiability in a Class of Random Graph Mixture Models,” Journal of Statistical Planning and Inference, 141(5), 1719–1736.
[9] Anderson, K. A., and S. Richards-Shubik (2019): “Collaborative Production in Science: An Empirical Analysis of Coauthorships in Economics,” CMU and Lehigh University Working Paper.
[10] Andrews, M. J., L. Gill, T. Schank, and R. Upward (2008): “High Wage Workers and Low Wage Firms: Negative Assortative Matching or Limited Mobility Bias?” Journal of the Royal Statistical Society: Series A, 171(3), 673–697.
[11] Arcidiacono, P., G. Foster, N. Goodpaster, and J. Kinsler (2012): “Estimating Spillovers Using Panel Data, with an Application to the Classroom,” Quantitative Economics, 3(3), 421–470.
[12] Arcidiacono, P., J. Kinsler, and J. Price (2017): “Productivity Spillovers in Team Production: Evidence from Professional Basketball,” Journal of Labor Economics, 35(1), 191–225.
[13] Arellano, M., and S. Bonhomme (2009): “Robust Priors in Nonlinear Panel Data Models,” Econometrica, 77, 489–536.
[14] Becker, G. S. (1973): “A Theory of Marriage: Part I,” Journal of Political Economy, 81(4), 813–846.
[15] Bell, A., R. Chetty, X. Jaravel, N. Petkova, and J. Van Reenen (2019): “Who Becomes an Inventor in America? The Importance of Exposure to Innovation,” The Quarterly Journal of Economics, 134(2), 647–713.
[16] Bertrand, M., and A. Schoar (2003): “Managing with Style: The Effect of Managers on Firm Policies,” The Quarterly Journal of Economics, 118(4), 1169–1208.

[17] Bickel, P., D. Choi, X. Chang, and H. Zhang (2013): “Asymptotic Normality of Maximum Likelihood and its Variational Approximation for Stochastic Blockmodels,” Annals of Statistics, 41(4), 1922–1943.
[18] Bishop, C. M. (2006): Pattern Recognition and Machine Learning. Springer.
[19] Blei, D. M., A. Kucukelbir, and J. D. McAuliffe (2017): “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, 112(518), 859–877.
[20] Bonhomme, S. (2017): “Econometric Analysis of Bipartite Networks,” to appear in The Analysis of Network Data, B. Graham and A. de Paula (eds).
[21] Bonhomme, S., K. Holzheu, T. Lamadon, E. Manresa, M. Mogstad, and B. Setzler (2020): “How Much Should we Trust Estimates of Firm Effects and Worker Sorting?” (No. w27368). National Bureau of Economic Research.
[22] Bonhomme, S., K. Jochmans, and J. M. Robin (2016): “Nonparametric Estimation of Finite Mixtures from Repeated Measurements,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1), 211–229.
[23] Bonhomme, S., T. Lamadon, and E. Manresa (2019): “A Distributional Framework for Matched Employer-Employee Data,” Econometrica, 87(3), 699–739.
[24] Bonhomme, S., T. Lamadon, and E. Manresa (2021): “Discretizing Unobserved Heterogeneity,” unpublished manuscript.
[25] Bonhomme, S., and E. Manresa (2015): “Grouped Patterns of Heterogeneity in Panel Data,” Econometrica, 83(3), 1147–1184.
[26] Celisse, A., J. J. Daudin, and L. Pierre (2012): “Consistency of Maximum-Likelihood and Variational Estimators in the Stochastic Block Model,” Electronic Journal of Statistics, 6, 1847–1899.
[27] Chade, H., and J. Eeckhout (2018): “Matching Information,” Theoretical Economics, 13(1), 377–414.
[28] Chade, H., J. Eeckhout, and L. Smith (2017): “Sorting Through Search and Matching Models in Economics,” Journal of Economic Literature, 55(2), 493–544.
[29] Chamberlain, G. (1984): “Panel Data,” in Handbook of Econometrics, 2, 1247–1318.
[30] Chamberlain, G. (1992): “Efficiency Bounds for Semiparametric Regression,” Econometrica, 60, 567–596.
[31] Chiappori, P. A., and B. Salanié (2016): “The Econometrics of Matching Models,” Journal of Economic Literature, 54(3), 832–61.
[32] Chiappori, P. A., B. Salanié, and A. Galichon (2019): “On Human Capital and Team Stability,” Journal of Human Capital, 13(2), 236–259.
[33] Daudin, J. J., F. Picard, and S. Robin (2008): “A Mixture Model for Random Graphs,” Statistics and Computing, 18(2), 173–183.
[34] Devereux, K. (2018): “Identifying the Value of Teamwork: Application to Professional Tennis,” No. 14, Working Paper Series.

[35] Ductor, L., M. Fafchamps, S. Goyal, and M. J. van der Leij (2014): “Social Networks and Research Output,” Review of Economics and Statistics, 96(5), 936–948.
[36] Eeckhout, J., and P. Kircher (2011): “Identifying Sorting – In Theory,” The Review of Economic Studies, 78(3), 872–906.
[37] Eeckhout, J., and P. Kircher (2018): “Assortative Matching with Large Firms,” Econometrica, 86(1), 85–132.
[38] Fafchamps, M., M. J. Van der Leij, and S. Goyal (2010): “Matching and Network Effects,” Journal of the European Economic Association, 8(1), 203–231.
[39] Furman, J. L., and P. Gaule (2013): “A Review of Economic Perspectives on Collaboration in Science,” in Workshop on Institutional & Organizational Supports for Team Science.
[40] Gans, J., and F. Murray (2014): “Markets for Scientific Attribution,” (No. w20677). National Bureau of Economic Research.
[41] Garicano, L., and E. Rossi-Hansberg (2006): “Organization and Inequality in a Knowledge Economy,” The Quarterly Journal of Economics, 121(4), 1383–1435.
[42] Gaure, S. (2014): “Correlation Bias Correction in Two-Way Fixed-Effects Linear Regression,” Stat, 3, 379–390.
[43] Goyal, S., M. J. Van Der Leij, and J. L. Moraga-González (2004): “Economics: An Emerging Small World,” Tinbergen Institute Discussion Paper, No. 04-001/1.
[44] Goyal, S., M. J. Van Der Leij, and J. L. Moraga-González (2006): “Economics: An Emerging Small World,” Journal of Political Economy, 114(2), 403–412.
[45] Hahn, J., and H. Moon (2010): “Panel Data Models with Finite Number of Multiple Equilibria,” Econometric Theory, 26(3), 863–881.
[46] Hall, B. H., A. B. Jaffe, and M. Trajtenberg (2001): “The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools,” (No. w8498). National Bureau of Economic Research.
[47] Hall, P., and X. H. Zhou (2003): “Nonparametric Estimation of Component Distributions in a Multivariate Mixture,” Annals of Statistics, 201–224.
[48] Herkenhoff, K., J. Lise, G. Menzio, and G. M. Phillips (2018): “Production and Learning in Teams,” (No. w25179). National Bureau of Economic Research.
[49] Holtz-Eakin, D., W. Newey, and H. S. Rosen (1988): “Estimating Vector Autoregressions with Panel Data,” Econometrica, 1371–1395.
[50] Hsieh, C. S., M. D. Konig, X. Liu, and C. Zimmermann (2018): “Superstar Economists: Coauthorship Networks and Research Output,” unpublished manuscript.
[51] Hu, Y. (2008): “Identification and Estimation of Nonlinear Models with Misclassification Error Using Instrumental Variables: A General Solution,” Journal of Econometrics, 144(1), 27–61.
[52] Hvattum, L. M. (2019): “A Comprehensive Review of Plus-Minus Ratings for Evaluating Individual Players in Team Sports,” International Journal of Computer Science in Sport, 18(1), 1–23.

[53] Jackson, M. O., and A. Wolinsky (1996): “A Strategic Model of Social and Economic Networks,” Journal of Economic Theory, 71(1), 44–74.
[54] Jaravel, X., N. Petkova, and A. Bell (2018): “Team-Specific Capital and Innovation,” American Economic Review, 108(4-5), 1034–73.
[55] Jochmans, K., and M. Weidner (2020): “Fixed-Effect Regressions on Network Data,” to appear in Econometrica.
[56] Kelso Jr, A. S., and V. P. Crawford (1982): “Job Matching, Coalition Formation, and Gross Substitutes,” Econometrica, 1483–1504.
[57] Kline, P., R. Saggio, and M. Solvsten (2020): “Leave-out Estimation of Variance Components,” to appear in Econometrica.
[58] Kodrzycki, Y. K., and P. Yu (2006): “New Approaches to Ranking Economics Journals,” The BE Journal of Economic Analysis & Policy, 5(1).
[59] Lei, J., and A. Rinaldo (2015): “Consistency of Spectral Clustering in Stochastic Block Models,” The Annals of Statistics, 43(1), 215–237.
[60] Lerner, J., M. Tranmer, J. Mowbray, and M. G. Hancean (2019): “REM Beyond Dyads: Relational Hyperevent Models for Multi-Actor Interaction Networks,” arXiv preprint arXiv:1912.07403.
[61] Li, G. C., R. Lai, A. D’Amour, D. M. Doolin, Y. Sun, V. I. Torvik, A. Z. Yu, and L. Fleming (2014): “Disambiguation and Co-Authorship Networks of the US Patent Inventor Database (1975-2010),” Research Policy, 43(6), 941–955.
[62] Mariadassou, M., S. Robin, and C. Vacher (2010): “Uncovering Latent Structure in Valued Graphs: a Variational Approach,” The Annals of Applied Statistics, 4(2), 715–742.
[63] Pearce, J. (2019): “Firm-Inventor Links and the Composition of Innovation,” unpublished manuscript.
[64] Snijders, T. A., and K. Nowicki (1997): “Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure,” Journal of Classification, 14(1), 75–100.
[65] Talman, D., and Z. Yang (2011): “A Model of Partnership Formation,” Journal of Mathematical Economics, 47(2), 206–212.
[66] Turnbull, K., S. Lunagómez, C. Nemeth, and E. Airoldi (2019): “Latent Space Representations of Hypergraphs,” arXiv preprint arXiv:1909.00472.
[67] Weidmann, B., and D. J. Deming (2020): “Team Players: How Social Skills Improve Group Performance,” (No. w27071). National Bureau of Economic Research.
[68] Woodcock, S. D. (2008): “Wage Differentials in the Presence of Unobserved Worker, Firm, and Match Heterogeneity,” Labour Economics, 15(4), 771–793.

APPENDIX

A Monte Carlo simulations

To probe the accuracy of the variational estimator in finite samples, I run Monte Carlo simulations. I generate data based on two sets of collaborations, both drawn from the data on economic researchers who work either on their own or with one co-author (see Section 5). For the smaller sample I select articles that were published in 1999. Since I impose that all authors have at least 5 articles, this gives a small number of authors and teams (about 150 and 900, respectively). For the larger sample I select articles that were published between 1998 and 1999. This gives a sample of about 900 authors and 5000 articles.

I first consider a log-normal model with K = 2 types, and use K = 2 types in estimation. The true parameters, as well as the means and the 2.5% and 97.5% quantiles of the variational estimator across 500 simulations, are given in panel (a) of Table A1. The results suggest that both the parameters of sole-authored production and the type proportions are well recovered in both samples. However, the parameters of 2-author teams are not as well recovered in the smaller sample. In particular, while the true mean output for two high-type workers is 4, the mean Monte Carlo estimate is 3.71. This suggests that the variational approximation, which only matters for 2-author collaborations, is not fully accurate in this case. In contrast, estimates in the larger sample are close to unbiased. For example, the mean output for two high-type workers is indistinguishable from the truth. In addition, Monte Carlo estimates are quite tightly concentrated around the true values. This suggests that the variational approximation is accurate in the larger sample.

I next consider a log-normal model with K = 4 types, and use K = 4 types in estimation. In this case, I set the true parameter values to the parameters of the model estimated on the 1995-2000 sample; see the column labeled “True” in panel (b) of Table A1.
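The K = 2 design can be sketched in a few lines of Python. The sketch assumes that the means and variances in panel (a) of Table A1 refer to log output (consistent with a log-normal model), that worker types are drawn independently with type-1 probability 0.60, and that the collaboration structure is held fixed across replications; the `teams` input stands in for the collaborations drawn from the economists' data and is hypothetical, as are all names below. The helper `mc_summary` computes the statistics reported in Table A1 (mean, 2.5% and 97.5% quantiles across replications), using linear interpolation between order statistics, which is one of several common quantile conventions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding of the K = 2 design in panel (a) of Table A1,
# read as moments of log output.
PROP_TYPE_1 = 0.60
SOLE_MEAN = {1: 0.0, 2: 2.0}
SOLE_VAR = {1: 0.5, 2: 0.5}
PAIR_MEAN = {(1, 1): 0.0, (1, 2): 1.0, (2, 2): 4.0}
PAIR_VAR = {(1, 1): 0.5, (1, 2): 0.5, (2, 2): 0.5}


def simulate_log_output(teams: Sequence[Tuple[int, ...]], n_workers: int,
                        rng: random.Random) -> Tuple[List[int], List[float]]:
    """Draw worker types, then one log-output value per team.

    teams holds tuples of worker ids (1 or 2 ids each); the network is fixed
    and only types and outputs are redrawn in each replication.
    """
    types = [1 if rng.random() < PROP_TYPE_1 else 2 for _ in range(n_workers)]
    log_outputs: List[float] = []
    for team in teams:
        if len(team) == 1:
            k = types[team[0]]
            mean, var = SOLE_MEAN[k], SOLE_VAR[k]
        elif len(team) == 2:
            # Pair parameters are symmetric in the two types, so sort them.
            key = tuple(sorted(types[i] for i in team))
            mean, var = PAIR_MEAN[key], PAIR_VAR[key]
        else:
            raise ValueError("this design only has 1- and 2-worker teams")
        log_outputs.append(rng.gauss(mean, var ** 0.5))
    return types, log_outputs


def mc_summary(estimates: Sequence[float]) -> Tuple[float, float, float]:
    """Mean and 2.5% / 97.5% quantiles of an estimate across replications."""
    xs = sorted(estimates)

    def quantile(alpha: float) -> float:
        # Linear interpolation between order statistics.
        h = (len(xs) - 1) * alpha
        lo = int(h)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

    return sum(xs) / len(xs), quantile(0.025), quantile(0.975)
```

In a full Monte Carlo loop, each replication would redraw types and outputs, re-estimate the model, and store the estimates, and `mc_summary` would then be applied parameter by parameter.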
While this second simulation design is closer to the data, it is also more challenging for the variational estimator. Panel (b) of Table A1 shows that, in the smaller sample, both the parameters of sole-authored production and those of 2-author teams are biased and imprecise. In the larger sample, biases are generally quite small, and estimates are more precise. In the empirical analysis in Section 5, I focus on a much larger sample of articles produced between 1995 and 1999. Since I also impose that every author produces at least 5 articles, I expect the variational method to be accurate in this case. Developing methods to assess parameter uncertainty in the variational approach is an important avenue for future work.

B Additional tables and figures

Table A1

                            Smaller sample        Larger sample
                     True   Mean  p2.5% p97.5%   Mean  p2.5% p97.5%
(a) K = 2 (true), K = 2 (estimation)
Mean type 1          0.00   0.00  -0.05   0.06   0.00  -0.03   0.02
Mean type 2          2.00   2.00   1.92   2.07   2.00   1.97   2.03
Var. type 1          0.50   0.50   0.43   0.56   0.50   0.47   0.53
Var. type 2          0.50   0.50   0.42   0.57   0.50   0.47   0.53
Mean type (1,1)      0.00   0.05  -0.49   0.74   0.01  -0.10   0.12
Mean type (1,2)      1.00   1.01   0.59   1.42   1.01   0.92   1.10
Mean type (2,2)      4.00   3.71   0.78   4.72   4.00   3.85   4.16
Var. type (1,1)      0.50   0.46   0.11   0.93   0.50   0.40   0.62
Var. type (1,2)      0.50   0.46   0.16   0.84   0.50   0.42   0.59
Var. type (2,2)      0.50   0.38   0.00   1.24   0.49   0.35   0.69
Prop. type 1         0.60   0.60   0.53   0.67   0.60   0.57   0.63
(b) K = 4 (true), K = 4 (estimation)
Mean type 1         -0.52  -0.52  -0.54  -0.50  -0.52  -0.53  -0.52
Mean type 2         -0.47  -0.42  -0.56  -0.22  -0.47  -0.52  -0.41
Mean type 3          0.07   0.21  -0.09   0.71   0.07  -0.03   0.18
Mean type 4          1.62   1.68   1.39   2.01   1.64   1.53   1.75
Var. type 1          0.01   0.02   0.01   0.08   0.01   0.01   0.01
Var. type 2          0.48   0.64   0.38   1.15   0.48   0.41   0.55
Var. type 3          1.74   1.88   1.42   2.44   1.74   1.59   1.89
Var. type 4          2.16   2.10   1.65   2.59   2.15   1.96   2.31
Mean type (1,1)     -0.53  -0.03  -1.48   2.90  -0.52  -0.60  -0.34
Mean type (1,2)     -0.39  -0.28  -1.83   1.58  -0.38  -0.73   0.06
Mean type (2,2)     -0.53  -0.01  -1.73   2.17  -0.50  -0.63  -0.08
Mean type (1,3)      0.01   0.26  -2.13   2.94   0.02  -0.43   0.52
Mean type (2,3)     -0.11   0.25  -1.50   2.25  -0.07  -0.42   0.26
Mean type (3,3)      0.35   0.83  -1.28   3.23   0.42  -0.13   0.95
Mean type (1,4)      1.38   1.07  -0.68   3.34   1.37   0.75   1.95
Mean type (2,4)      0.66   0.94  -1.01   3.01   0.67   0.12   1.17
Mean type (3,4)      1.36   1.46  -0.59   3.53   1.35   0.96   1.71
Mean type (4,4)      2.35   2.02  -0.47   4.19   2.24   1.71   2.75
Var. type (1,1)      0.01   0.58   0.00   2.99   0.02   0.00   0.24
Var. type (1,2)      0.43   0.55   0.00   2.89   0.47   0.11   1.18
Var. type (2,2)      0.01   0.61   0.00   2.89   0.13   0.00   0.98
Var. type (1,3)      1.76   0.79   0.00   4.03   1.68   0.89   2.80
Var. type (2,3)      1.18   0.83   0.00   3.42   1.25   0.76   1.85
Var. type (3,3)      1.98   1.06   0.00   4.84   1.92   1.02   2.94
Var. type (1,4)      2.09   1.02   0.00   4.31   1.98   1.13   3.11
Var. type (2,4)      1.48   0.88   0.00   3.27   1.43   0.70   2.49
Var. type (3,4)      1.91   1.02   0.00   3.60   1.87   1.24   2.60
Var. type (4,4)      1.61   0.94   0.00   3.80   1.65   0.77   2.58
Prop. type 1         0.15   0.15   0.10   0.22   0.14   0.12   0.17
Prop. type 2         0.20   0.26   0.15   0.42   0.20   0.17   0.24
Prop. type 3         0.36   0.31   0.13   0.45   0.36   0.31   0.41

Notes: Estimates of the random-effects model with $\hat{K}$ groups, in data generated according to that model with $K$ groups. 500 simulations. The smaller sample has 156 workers and 896 teams, the larger sample has 921 workers and 5447 teams. In panel (a) $K = \hat{K} = 2$, in panel (b) $K = \hat{K} = 4$.

                                 n = 1    n = 2    n = 3
(a) Log-journal quality as dependent variable
Total variance                    2.00     2.41     2.30
Heterogeneity                     0.64     0.85     1.02
Heterogeneity (uncorrected)       0.80     1.33     1.76
Sorting                              -     0.59     0.93
Sorting (uncorrected)                -     0.38     0.49
Other factors                     1.15     1.29     1.19
Other factors (uncorrected)       1.20     1.12     1.20
Team shift $\mu_n$                   …        …        …

Notes: Estimates of variance components for different team sizes n. In panels (a) and (b) I use log-journal quality and year-specific ranks of journal quality, respectively, as dependent variables in the baseline sample. In panels (c) and (d) I enlarge the sample to include authors with at least 2 and at least 1 article, respectively, compared to at least 5 in the baseline sample, using journal quality as the dependent variable.

… (K = 4)

                  Correlated RE        Joint RE       2-author only
                    n = 1    n = 2    n = 1    n = 2          n = 2
Total variance      97.81   162.64    97.81   162.64         162.64
Heterogeneity       36.49    46.71    32.86    35.03          51.97
Sorting                 -    17.04        -    30.90           7.84
Nonlinearities          -     4.29        -     1.15           2.82
Other factors       61.32    94.61    64.96    95.56         100.01

Notes: Estimates of variance components for different team sizes n in the nonlinear model with K types, for three specifications. “Correlated RE” comes from a model where the types follow a multinomial logit distribution given the numbers of 1-author and 2-author collaborations at the extensive and intensive margins. “Joint RE” comes from a model where collaborations follow independent type-specific Poisson distributions. “2-author only” only uses information from 2-author teams. Descriptive statistics on the sample are given in panel (b) of Table 1.

Table B4: Additive production, patents and inventors, robustness

Notes: Estimates of variance components for different team sizes n. In panels (a) and (b) I enlarge the sample to include inventors with at least 2 and at least 1 patent, respectively, compared to at least 5 in the baseline sample. For these results I net out multiplicative year and inventor-age effects from the output.

Figure B1 (axes: type proxy inventor 1, type proxy inventor 2; average output in panel (b)).
Notes: Subsample of inventors who produce at least 5 patents on their own. I compute type proxies as quartiles of average sole-authored output. In panel (a) I show the proportions of type proxies for inventors producing together in a 2-inventor team. In panel (b) I show average output (i.e., average truncation-adjusted forward citations) for different combinations of the type proxies.

Table B5: Nonlinear production, patents and inventors, other specifications (K = 4)

                  Correlated RE        Joint RE       2-inventor only
                    n = 1    n = 2    n = 1    n = 2            n = 2
Total variance     1270.4   1202.0   1270.4   1202.0           1202.0
Heterogeneity       355.9    394.8    197.0    103.0            401.3
Sorting                 -     79.7        -     92.5             86.2
Nonlinearities          -     52.9        -      6.9             47.1
Other factors       914.5    674.5   1073.4    999.6            667.3

Notes: Estimates of variance components for different team sizes n in the nonlinear model with K types, for three specifications. “Correlated RE” comes from a model where the types follow a multinomial logit distribution given the numbers of 1-inventor and 2-inventor collaborations at the extensive and intensive margins. “Joint RE” comes from a model where collaborations follow independent type-specific Poisson distributions. “2-inventor only” only uses information from 2-inventor teams. Descriptive statistics on the sample are given in panel (b) of Table 4. For these results I net out multiplicative year and inventor-age effects from the output.