The relationship between acquaintanceship and coauthorship in scientific collaboration networks

Abstract

This article examines the relationship between acquaintanceship and coauthorship patterns in a multi-disciplinary, multi-institutional, geographically distributed research center. Two social networks are constructed and compared: a network of coauthorship, representing how researchers write articles with one another, and a network of acquaintanceship, representing how those researchers know each other on a personal level, based on their responses to an online survey. Statistical analyses of the topology and community structure of these networks point to the importance of small-scale, local, personal networks predicated upon acquaintanceship for accomplishing collaborative work in scientific communities.


Center for Astrophysics & Institute for Quantitative Social Science, Harvard University


Keywords:

Scientific collaboration networks, scientific communities, coauthorship, acquaintanceship, community structure, survey research.

1. Introduction

Numerous ways to study scientific collaboration exist. Scholarly coauthorship, the joint publication of scholarly articles, is a widely used indicator of collaboration in scientific circles. In the specialized literature, coauthorship networks have been employed to study collaboration patterns within entire scientific domains, e.g., neuroscience (Braun et al., 2001), biomedicine (Newman, 2004), and nanoscience (Schummer, 2004), as well as within more specialized scientific circles, e.g., the communities of researchers involved in information visualization (Börner et al., 2005), genetic programming (Tomassini et al., 2007), and environmental wireless sensing (Pepe and Rodriguez, 2010).

This is a preprint. The published version can be found at DOI: 10.1002/asi.21629.

Preprint submitted to JASIST, November 3, 2018.

Besides coauthorship, studies of scientific collaboration have also covered the importance of acquaintanceship for scientific communication, labor management and coordination. Studies of this kind employ ethnographic methods such as direct interviews, participant observations of meetings, and sociometric surveys to explore the role of personal networks in modeling scientific advancement. Early studies of acquaintanceship in science have analyzed the impact of task-based workflows (Kraut et al., 1987), face-to-face communication (Lievrouw et al., 1987), and scientific meetings (Liberman and Wolf, 1997) on the structure of interpersonal ties and knowledge flows among scientists in various scientific domains. More recently, scientific acquaintanceship studies have touched upon emerging cyber-infrastructure initiatives, by exploring the role of formal and informal personal knowledge exchange in the context of “virtual science” laboratories (Chin et al., 2002) and multi-institutional interdisciplinary research centers (Hara et al., 2003).

While the literature covers many aspects of both coauthorship and acquaintanceship in the context of scientific communities, rarely has their relationship been examined in detail. This is because collecting data about acquaintanceship and related social traces is a time-consuming, elaborate and error-prone procedure, especially for large scientific communities. Outside of science, with the growing availability of online corpora containing social indicators, it is possible to employ large datasets harvested from the web to infer acquaintanceship ties. Personal connections as variegated as business, friendship and romantic connections are increasingly becoming available online and are being used for large-scale social network analyses. Examples include: a comparison of online and offline friendship ties (Tong et al., 2008), an analysis of acquaintanceship ties and geographic coincidence (Crandall et al., 2010), and the structural and temporal evolution of a dating website community (Holme, 2004). But a similar online footprint of acquaintanceship ties is not available for most scientific communities.

For this reason, coauthorship networks are sometimes used as proxies for acquaintanceship networks. For example, in Who is the best connected scientist? (Newman, 2004), Newman constructs a coauthorship network and employs it as an acquaintanceship network, assuming that “it is probably fair to say that most people who have written a paper together are genuinely acquainted with one another” (Newman, 2004, p. 339). Is this a reasonable assumption? Understanding the relationship that exists between coauthors and acquaintances is important not only to elucidate the social component that exists at the basis of modern scientific collaborations, but also to measure the extent to which scholarly coauthorship networks can be used as surrogates for social networks. In traditional research settings, it was very likely that coauthors of a scholarly article were acquainted with each other at a personal level. But in recent times, scientific research is increasingly being conducted in centers which can be extremely large, variegated in their disciplinary component, and geographically distributed over different cities, countries and continents. For example, many ongoing collaborations in physics and astrophysics are so large that some publications include hundreds of authors. For these articles, it is virtually impossible to discern the nature and extent of individual contributions to research, let alone derive acquaintanceship patterns (Solomon, 2009). Similarly, many modern cyber-infrastructure enterprises, in the form of collaboratories and e-science centers, are so variegated in their disciplinary and geographical configuration that their researchers rely predominantly upon computer-supported technologies for their communication, collaboration and organization (Olson and Olson, 2000; Bos et al., 2007; Finholt, 2002). How are coauthorship and acquaintanceship patterns related to each other in such emerging scientific settings? This article addresses this question by a comparative network analysis of collaboration in a distributed, multi-disciplinary research environment.

2. Study, data and instruments

The scientific community analyzed here is centered on the Center for Embedded Networked Sensing (CENS), a National Science Foundation Science and Technology Center established in 2002, involved in the development and application of sensor network systems to critical scientific and societal pursuits. CENS features many of the characteristics of modern “collaboratories”: it is multi-institutional, multi-disciplinary, and geographically distributed. It includes five member universities in California: University of California, Los Angeles; University of Southern California; University of California, Riverside; California Institute of Technology; and University of California, Merced. It includes 300 faculty, students, and staff specialized in disparate academic disciplines, ranging from computer science to biology and environmental science, with additional partners in arts, architecture, and public health. The type of research conducted at CENS spans a wide spectrum of disciplines and applications requiring continuous cooperation among individuals who, otherwise, would probably not interact beyond the walls of traditional university departments and faculties. CENS is headquartered at UCLA, yet CENS-related work is conducted in departments, labs, and remote field locations situated near all five member institutions as well as at partner institutions in the U.S. and abroad. These institutions (and sometimes even departments) are sufficiently distant from one another to prevent continuous physical interactions among scientists: computer-supported communication is at the basis of their collaborative work.

This variegated institutional, disciplinary, and geographical arrangement makes CENS a convenient environment to investigate scientific collaboration in modern research settings. Two manifestations of collaboration among CENS researchers are investigated. The first is coauthorship, defined as the joint authoring of scholarly artifacts. In a coauthorship network, two individuals are linked to one another if they are coauthors. The second manifestation is acquaintanceship, defined as a relationship of personal knowledge characterized by mutual face and name recognition. In an acquaintanceship network, two individuals are linked if they indicate that they recognize and know each other at a personal level. These two networks are analyzed and compared. To begin with, it is interesting to explore the overarching statistical properties of the networks of acquaintanceship and coauthorship to understand whether they differ in a systematic way. Do they have a similar configuration? The first portion of this research is aimed at exploring the mechanisms by which CENS researchers write papers and make acquaintances with one another. It can be summarized as follows:

RQ1.

What is the topology of the coauthorship and acquaintanceship networks in distributed, multi-disciplinary research environments? What is the procedure by which researchers form ties with their coauthors and acquaintances?

The second portion of this study presents a comparative analysis of the coauthorship and acquaintanceship networks. It is clear that in collaboratories like CENS, computer-supported technologies are enabling new modes of science, scientific advancement and collaboration. Yet we still do not fully understand how these emerging forms of remote collaboration function in the absence of interpersonal knowledge and contact, or similarly, when such contact is predominantly mediated via digital networks. It has recently been noted, for example, that physical distance between researchers who work across organizational boundaries is the most likely factor to hinder scientific collaboration and coordination (Cummings and Kiesler, 2005). What is the proportion of CENS researchers who produce joint scholarly work but have never met each other in person? In this part of the study, a comparative network analysis is conducted to understand how well communities of coauthors and acquaintances are representative of one another. With these notions in mind, the second guiding question can be summarized as follows:

RQ2.

Is coauthorship a suitable proxy for acquaintanceship in multi-institutional, distributed, scientific communities? How do communities of coauthors and acquaintances overlap?

Findings from this research are presented in the next sections. In the remainder of this section, I discuss the data collection procedures and the instruments employed in this study.

The coauthorship network of this scientific community was constructed from its bibliographic record, gathered from all scholarly items listed in the seven available CENS Annual Reports (2003–2009). A publication database was assembled, consisting of 608 papers published by a total of 391 unique individuals over a period of ten years (2000–2009). Table 1 summarizes the distribution of papers analyzed by publication type and number of authors. Roughly two thirds of the publications analyzed are papers in conference proceedings, while journal articles account for nearly all of the remaining third. The distribution of items per number of authors reveals that about half of all publications are authored by two or three individuals. Author lists rarely exceed six authors.

Table 1: Basic statistics for the collected bibliographic data: paper distribution by publication type, publication year, and number of authors.

Paper type (n = 608)
  Conference proceedings    400
  Journal article           189
  Book chapter               18
  Book                        1
Number of authors (n = 608)
  …

This bibliographic record was employed to construct the coauthorship network, a network in which nodes represent authors and edges represent the extent of coauthorship activity. The network is weighted and the edge weights are established by partitioning a set value for every publication. In order to determine the weights between nodes, i.e., the strength of collaboration among coauthors, I use a weighting mechanism by which the weight of the edge between nodes i and j is:

$$ w_{ij} = \sum_k \frac{\delta_k^i \, \delta_k^j}{n_k - 1} \qquad (1) $$

where $\delta_k^i$ is 1 if author i collaborated on paper k (and zero otherwise) and $n_k$ is the number of coauthors of paper k. As such, this weighting mechanism confers more weight to small and frequent collaborations, based on the assumptions that: i) publications authored by a small number of individuals involve stronger interpersonal collaboration than multi-authored publications, and ii) authors who have authored multiple papers together know each other better on average and thus collaborate more strongly than occasional coauthors. Using this weighting mechanism, a network of coauthorship is constructed. It is depicted and analyzed in the next section.

A survey instrument was designed and implemented to ask CENS coauthors to indicate their acquaintances. This study is specifically aimed at investigating a form of personal acquaintanceship that involves name and face recognition. For this reason, surveyed subjects are asked to indicate as acquaintances individuals that a) they have previously met in person, and b) they would say “hi” to if they bumped into them. This description is aimed at discriminating between forms of genuine personal acquaintanceship and other forms of acquaintanceship that do not necessarily involve a personal component, e.g., name cognizance based solely on email correspondence.

The survey roster consisted of 388 individuals, i.e., all the authors extracted from the bibliographic record (391) minus the individuals for whom no contact information could be found (3). Individuals in the roster were invited to take part in the survey via a recruitment letter, sent via electronic mail. The survey instrument was structured in two parts.

In the first part, respondents are asked to select their acquaintances from a list of all of the individuals in the survey roster. Individuals are identified by name and a thumbnail picture. In order to aid recognition, individuals are grouped together by department and institutional affiliation.
Respondents select their acquaintances on this page and, after submitting the data, are taken to the second part of the survey. In the second part, they are asked to indicate the nature and length of the relationship with their acquaintances. Respondents are presented with the list of people that they selected as acquaintances on the previous page. For each individual indicated as an acquaintance, two questions are asked: When did you first meet? (2001 or earlier, 2002, 2003, 2004, 2005, 2006, 2007, 2008, This year), and How often are you in touch? (At least once/week, At least once/month, Occasionally, Rarely or never). Screenshots of the first and second parts of the questionnaire are included in Appendix A, Figures A.5 and A.6.

A total of 191 responses were collected. Some basic statistics on the data collected via the survey instrument are summarized in Table 2. Nearly half of the individuals invited to fill in the survey (49%) participated in the study. The rest were either not reachable (4%) or did not respond to the survey by the end of data collection (47%). About two fifths of respondents (39%) completed only the first part of the survey. About two thirds of respondents indicated between 5 and 40 acquaintances.

Table 2: Basic statistics for the collected social survey data.

Survey response (n = 388)
  Respondents                 191 (49%)
  Non-respondents             182 (47%)
  Unreachable                  15 (4%)
Portion of survey completed (n = 191)
  Full survey                 116 (61%)
  Only part one                75 (39%)
Number of acquaintances (n = 191)
  1-5                          12
  5-10                         21
  10-20                        47
  20-30                        38
  30-40                        19
  40-50                        12
  50-60                        16
  60-70                         9
  70-100                       10
  100+ (maximum is 197)         7

Responses to the survey were obtained independently from one another and resulted in either reciprocal or non-reciprocal ties, as well as complete or incomplete data. Four different scenarios based on the responses obtained are summarized in Figure 1: (1) complete data, reciprocal tie, (2) complete data, non-reciprocal tie, (3) incomplete data, non-reciprocal tie, and (4) missing data.

Figure 1: Four possible classes of acquaintanceship ties between two individuals A and B: (1) complete data, reciprocal tie, (2) complete data, non-reciprocal tie, (3) incomplete data, non-reciprocal tie, and (4) missing data (marked “?”). Filled (black) nodes represent individuals who took the survey, while blank (white) nodes depict individuals who did not respond.

The typology presented in Figure 1 includes both reciprocal ties (represented in a directed network by bidirectional edges) and non-reciprocal ties (unidirectional edges). For the purpose of this study, network directionality is not of fundamental importance, mostly because the coauthorship network is natively undirected and maintaining information about acquaintanceship reciprocity would not enable additional analyses and comparisons. For this reason, directed acquaintanceship ties are converted to undirected edges using an available-case analysis (Little and Rubin, 1990), i.e., by including all reciprocal and non-reciprocal ties with both complete and partial descriptions. Thus, the acquaintanceship network includes ties (1), (2) and (3) from Figure 1, broken down as presented in Table 3.

Table 3: A summary of the data collected in the online survey of acquaintanceship.

Tie case   Data         Type of tie       Share
(1)        Complete     Reciprocal        (45%)
(2)        Complete     Non-reciprocal    (16%)
(3)        Incomplete   Non-reciprocal    (39%)
Totals                                    (100%)

In the second part of the survey, respondents were asked to indicate how long they have known their acquaintances and how frequently they are in touch with them. Particularly relevant to this discussion are the data regarding the frequency of communication, which were used to compute edge weights. Table 4 summarizes the data collected in this portion of the study.

As shown in Table 4, respondents provided data on the frequency of communication for about three quarters of the total number of acquaintances indicated. Moreover, a quick analysis of the distribution of responses reveals that nearly half of all acquaintances (2245 out of 4621) communicate rarely or never. About a third of all ties (1539 out of 4621) are based on occasional communication.
Only about a fifth of all ties rely on frequent communication (at least once a month or at least once a week). This information is used to assign a weight to the edges connecting acquaintances. Frequent communication is given a higher weight, based on the assumption that frequent communication involves a higher degree of cognizance among individuals: acquaintances communicating at least once a week are assigned a weight of 1.0; acquaintances communicating at least once a month are assigned a weight of 0.75. Occasional communication is weighted 0.50, and rare or no communication 0.25.

[Footnote: In order to justify the reconstruction of case (2) and (3) ties, two tests were performed. The first test involved checking that the population of respondents was not systematically different from the population at large, i.e., that the distribution of respondents in terms of departmental and institutional affiliation matched that of the entire population. The second test involved manually inspecting incomplete data.]

Table 4: A summary of the data collected in the second part of the online survey of acquaintanceship.

How often do you communicate with [name]?
  Did not respond (no data available)    1668 (26%)
  Responded                              4621 (74%)
    Rarely or never        (weight 0.25)
    Occasionally           (weight 0.50)
    At least once/month    (weight 0.75)
    At least once/week     (weight 1.0)

All collected data (including partial and non-reciprocal data) were documented and used to construct weighted edges. The resulting weighted network of acquaintanceship is presented and depicted in the next section.
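The two edge-weighting schemes described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the function names and input formats are hypothetical, the per-paper share is assumed to be 1/(n_k − 1) as in Equation (1), and two choices that the text does not specify are made explicit in the comments (a reciprocal tie with two different frequency answers keeps the larger weight, and a tie with no frequency answer is kept with weight `None`).

```python
from collections import defaultdict
from itertools import combinations

# Frequency scale and weights as listed in Table 4.
FREQUENCY_WEIGHT = {
    "rarely or never": 0.25,
    "occasionally": 0.50,
    "at least once/month": 0.75,
    "at least once/week": 1.0,
}


def coauthorship_weights(papers):
    """Each paper k with n_k authors adds 1 / (n_k - 1) to every author pair."""
    weights = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        if len(unique) < 2:
            continue  # single-author papers create no edges
        share = 1.0 / (len(unique) - 1)
        for i, j in combinations(unique, 2):
            weights[(i, j)] += share
    return dict(weights)


def acquaintanceship_weights(responses):
    """Collapse directed survey answers into undirected weighted edges.

    responses: iterable of (respondent, acquaintance, frequency), where
    frequency is a key of FREQUENCY_WEIGHT or None (part two not completed).
    Assumption: when both directions are reported, the larger weight wins.
    """
    weights = {}
    for src, dst, freq in responses:
        edge = tuple(sorted((src, dst)))
        new = FREQUENCY_WEIGHT.get(freq)  # None when no frequency was given
        old = weights.get(edge)
        if edge not in weights or old is None or (new is not None and new > old):
            weights[edge] = new if new is not None else old
    return weights
```

For instance, two authors who share one two-author paper and one three-author paper end up with a coauthorship weight of 1 + 1/2 = 1.5, which reflects the intent that small and repeated collaborations count more.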

3. Analysis and Results

Some statistical properties of the constructed coauthorship and acquaintanceship networks are presented in Table 5. An analysis of these statistics provides insights into the collaborative configuration of this scientific community and its underlying mechanisms.
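As a rough illustration of the three topological properties discussed in this section (average path length, clustering coefficient, and degree assortativity), the sketch below computes them for an unweighted, undirected edge list using only the Python standard library. The function names are hypothetical, and several choices are assumptions rather than the study's actual procedure: edge weights are ignored, path lengths are averaged over reachable pairs only, and clustering is computed as global transitivity (closed triplets over all connected triplets), which may differ from the definition used for Table 5.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_path_length(adj):
    """Mean shortest-path length over all reachable node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Global transitivity: closed triplets / all connected triplets."""
    closed = triplets = 0
    for nbrs in adj.values():
        for v, w in combinations(nbrs, 2):
            triplets += 1
            if w in adj[v]:
                closed += 1
    return closed / triplets if triplets else 0.0


def degree_assortativity(edges, adj):
    """Pearson correlation of the degrees found at the two ends of each edge."""
    xs, ys = [], []
    for u, v in edges:
        # Count every undirected edge once in each direction.
        xs += [len(adj[u]), len(adj[v])]
        ys += [len(adj[v]), len(adj[u])]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0
```

A triangle with one pendant node is a convenient sanity check: it has a short average path length, a fairly high clustering value, and a negative assortativity because the best-connected node is attached to the least-connected one.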

Table 5: Statistical properties of the studied coauthorship and acquaintanceship networks.

Property                      Coauthorship    Acquaintanceship
Nodes, n                      391             385
Edges, m                      …               …
Average path length, ℓ        …               …
Clustering coefficient, C     …               …
Degree assortativity, r       …               …

The average path length is the average number of steps needed to connect any two nodes in a network. Its value is relatively small across the two networks: it takes on average two to three steps to connect any two individuals in both the coauthorship and acquaintanceship networks, connoting well-connected networks in which information can transfer easily between nodes. The clustering coefficient, i.e., the density of cliques in a network, is also similar in both networks and is relatively low, indicating that neither network is particularly dense. The fact that both networks exhibit similar clustering coefficients suggests that the modalities by which researchers connect do not change significantly based on the platform of collaboration: whether it is scholarly papers or interpersonal knowledge relationships, researchers form communities that result in similar topologies. Many real networks, including scientific coauthorship and acquaintanceship networks, have been observed to be small-world networks (Watts and Strogatz, 1998), i.e., they have low average path length and high clustering coefficient. The networks presented here deviate slightly from this small-world model.

[Footnote: For a comparison, consider the coauthorship networks in biology (ℓ = 4.… and C = 0.…) … (ℓ = 5.… and C = 0.76) (Barabási et al., 2002), and acquaintanceship networks of movie actors (ℓ = 3.48 and C = 0.78) (Watts and Strogatz, 1998) and corporate company directors (ℓ = 4.60 and C = 0.88) (Davis et al., 2003).]

A final statistical property calculated on the CENS networks of coauthorship and acquaintanceship is assortativity, a measure of homophily in a network, i.e., it measures the tendency for nodes with similar characteristics to attach to one another. Assortativity can be calculated for a number of discrete or categorical features of a network. For example, a measure of assortativity by ethnicity in a social network would reveal how individuals of the same ethnicity preferentially attach to one another. A widely studied measure of assortativity in scientific networks is degree assortativity, i.e., the tendency for scientists to preferentially attach to others with similar network degree, where degree is simply the number of edges attached to a node. Thus, a high assortativity coefficient in a coauthorship network (r → 1) means that prolific authors collaborate preferentially with other prolific authors. Conversely, a low assortativity coefficient (r → 0) indicates that prolific authors collaborate with both prolific and non-prolific authors, without preference. Network studies of scientific coauthorship and acquaintanceship have shown moderate degree assortativity coefficients. The near-zero assortativity coefficients of the coauthorship (r = 0.…) and acquaintanceship (r = −0.…) networks …

[Footnote: Examples are the networks of coauthorship in physics (r = 0.…), … (r = 0.…), … (r = 0.…), … (r = 0.208 and r = 0.…) …]

Table 6: Acquainted and non-acquainted coauthors.

Quantity                    Nodes        Edges
Acquainted coauthors        305 (78%)    1,134 (64%)
Non-acquainted coauthors     86 (22%)      613 (36%)
Total coauthors             391          1,747

The social networks studied in this article portray two different relationships (coauthorship and acquaintanceship) among the same set of individuals (CENS researchers). A simple way to determine how these networks are related to one another is by calculating the portion of coauthorship ties that are also acquaintanceship ties, and, thus, the number of individuals who are both coauthors and acquaintances. This breakdown is provided in Table 6. The results in the table show that the vast majority of coauthors (over three quarters) also know each other on a personal level. Also, most coauthoring events in this scientific community (about two thirds) were performed with acquaintances.

A more precise statistical measure of inter-network comparison can be obtained by performing a Quadratic Assignment Procedure (QAP) correlation (Krackhardt, 1988). Given two networks that represent different relationships (edges) among the same individuals (nodes), QAP methods calculate Pearson’s correlation coefficient between corresponding cells of the two adjacency matrices. In the case considered here, a QAP correlation indicates whether the presence of a coauthorship tie between two individuals is associated with the presence of an acquaintanceship tie between the same individuals. In other words, are coauthors also likely to be acquaintances? The observed QAP correlation between the coauthorship and acquaintanceship networks was found to be r = 0.372 (p < 0.…). These findings indicate that in the context of the medium-sized, inter-institutional, distributed collaboration analyzed here, coauthors are likely to be acquainted with one another (RQ2). This result is further discussed in the next section.

While the comparative analysis above provides an understanding of how these networks overlap at a broad level, further analysis is required to understand in more detail the actual arrangement of collaboration circles within these networks. A study of community structure enables a network to be partitioned into clusters, so that different configurations of collaboration patterns can be compared for overlap. Community structures, also called structural communities, are “cliquish” groupings of nodes that are densely connected among themselves, but poorly connected to other nodes (Girvan and Newman, 2002). In the coauthorship and acquaintanceship networks, the clusters detected via a structural analysis correspond to communities of collaborating researchers that write papers together and know each other, respectively. How do these clusters compare with each other?

The community detection method used here is the spinglass algorithm (Reichardt and Bornholdt, 2006). This method relies on an analogy between the statistical mechanics of networks and physical spin glass models to deconstruct a network into communities. In doing so, it assigns a community membership value to each node. Thus, individuals who are in the same structural community are given the same membership value. The structural communities found in the collaboration networks via the spinglass algorithm are diagrammed in Figures 2 and 3. Each figure presents a network with nodes colored according to the structural community that they belong to. Each community is represented using a different color (or shade). Node diameter represents the betweenness centrality score of nodes, where more central nodes have larger diameters.
The histogram associated with each figure describes the frequency distribution of each community, i.e., the number of scholars in each identified structural community.

A total of 14 structural communities were found in the coauthorship network (Figure 2). The population distribution histogram shows that the three largest communities (with populations 76, 66, and 58) cover about half of the entire coauthorship population. The remaining coauthorship communities are smaller in size (with an average of 15 members). The acquaintanceship network was partitioned into 8 communities (Figure 3). The population distribution shows that there are three acquaintanceship communities that are highly populated (with populations 101, 79, and 62), with the remainder of the nodes more or less evenly distributed among the remaining five communities.

[Footnote: QAP correlation tests between the coauthorship network and a randomly generated network of the same size returned r = −0.002 (p = 0.…).]

Figure 2: Structural communities in the coauthorship network detected according to the spinglass algorithm. Node color represents structural community membership. Node diameter represents betweenness centrality. The associated histogram describes the frequency distribution of each community.

Figure 3: Structural communities in the acquaintanceship network detected according to the spinglass algorithm. Node color represents structural community membership. Node diameter represents betweenness centrality. The associated histogram describes the frequency distribution of each community.

The network diagrams of Figures 2 and 3 show the partition of the networks into clusters of collaboration. More analysis is needed to reveal how these arrangements of collaboration differ from each other, i.e., how researchers group into communities of coauthorship and acquaintanceship. A statistical comparison to measure community overlap in the networks is performed by use of tests of independence that determine whether community membership in one network is dependent on or independent of membership in the other network. A contingency table portraying a numerical representation of the overlap between community membership in these two networks is presented in Table 7. This table displays the association between the coauthorship and acquaintanceship networks, i.e., between the circles of collaboration depicted in Figures 2 and 3. Columns in this table (the x-axis) list community membership values in the coauthorship network and rows (y-axis) in the acquaintanceship network.
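Table 7: Contingency table presenting the community membership association between the coauthorship (“C”, columns) and acquaintanceship (“A”, rows) networks with total membership counts (“T”). Blank cells did not survive extraction, so each row lists its non-empty cell counts in column order, without their column positions.

Columns: C = 1 2 3 4 5 6 7 8 9 10 11 12 13 14, followed by T

A = 1:  50, 7, 37, 1, 2, 2, 2                 T = 101
A = 2:  1, 23, 10, 8, 8, 1, 18, 3, 1, 1       T = 74
A = 3:  34, 5, 1, 14, 1, 1                    T = 56
A = 4:  6, 2, 2, 1, 1, 12, 15, 6              T = 45
A = 5:  1, 16, 13                             T = 30
A = 6:  18, 9                                 T = 27
A = 7:  4, 1, 20, 1                           T = 26
A = 8:  1, 1, 9                               T = 11

T (column totals, C = 1 to 14): 76 66 58 25 23 22 21 19 16 16 11 9 6 2; overall total 370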
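A contingency table of this kind can be built directly from two community assignments. The sketch below is a minimal illustration, assuming each partition is available as a dictionary mapping a researcher identifier to a community label; the function and argument names are hypothetical. Only researchers present in both partitions are counted, which is one plausible reason why the table total can be smaller than the node count of either network.

```python
from collections import Counter


def contingency_table(row_membership, col_membership):
    """Cross-tabulate community labels for nodes present in both partitions.

    row_membership / col_membership: dict mapping node -> community label.
    Returns (row_labels, col_labels, table), where table[i][j] counts the
    nodes in row community i and column community j.
    """
    shared = set(row_membership) & set(col_membership)
    counts = Counter((row_membership[n], col_membership[n]) for n in shared)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    table = [[counts.get((r, c), 0) for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table
```

With acquaintanceship labels as rows and coauthorship labels as columns, a row whose counts concentrate in a single column corresponds to a pair of communities that overlap almost completely, while a row spread across many columns indicates a highly partitioned community.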

Analyzing the composition of Table 7, it can be seen that some communities overlap nearly perfectly, while others are highly partitioned. For example, community … (… D), who collaborates separately with different groups (e.g., with B and C, with I, J, and K, etc.). It is interesting to note that this community is very marginalized from the rest of the coauthorship network: only nodes A, B, and D have connections with nodes outside this community (dashed edges). The tightly overlapping acquaintanceship community is composed of roughly the same nodes, but the relationships among them are more frequent and dense: nearly everyone is connected to everyone else in this acquaintanceship community. This indicates that even though not all members of a coauthoring community collaborate directly with each other, they are likely to know each other. Also, while the members of this community are separated from the rest of the coauthorship network (i.e., they only write papers with each other), they are considerably more integrated with the social network (dashed lines represent social relationships with the outside).

Figure 4: Anecdotal example: a case of overlap of coauthorship and acquaintanceship communities. The inset image in the top right corner shows community … of the coauthorship network (n = 11, m = 16). The inset image at the bottom right corner shows community … of the acquaintanceship network (n = 11, m = 29). The main image shows the overlap between these two communities.

While this anecdotal example provides a precise understanding of the scholarly and social organization of a given community, a significance test is needed to determine statistically the level of independence between community membership at a broader level. A reliable test of independence when dealing with samples of sufficiently large size that do not suffer from data sparseness is Pearson’s χ² (Reiser and Lin, 1999). To overcome data sparseness, I directly manipulate the data in Table 7 by removing rows and columns with low frequency counts until fewer than one fifth of its cell expectations are below 5, as suggested by Cochran (1954). The reduced contingency table, subjected to the independence test, returned a high Pearson’s χ² score of 301.… (p < 0.…). These results suggest that communities of coauthors and acquaintances overlap very well, i.e., coauthors of scholarly papers tend to form communities of collaboration that coincide with communities of personal acquaintances (RQ2).
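The independence test described above can be sketched as follows. This is a minimal illustration with hypothetical function names, not the study's code: it computes the expected cell counts under independence, the Pearson χ² statistic with its degrees of freedom, and a check of the sparseness rule quoted in the text (fewer than one fifth of expected counts below 5). It assumes every row and column of the table has a non-zero total, and it does not compute a p-value, which would require a χ² distribution function from a statistics library.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom.

    Assumes no row or column total is zero (otherwise an expected
    count of zero would cause a division by zero).
    """
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def satisfies_sparseness_rule(table, min_expected=5.0, max_share=0.2):
    """True when fewer than max_share of expected counts fall below min_expected."""
    expected = [e for row in expected_counts(table) for e in row]
    low = sum(1 for e in expected if e < min_expected)
    return low / len(expected) < max_share
```

In a workflow like the one described, rows and columns with low counts would be dropped until `satisfies_sparseness_rule` returns true, and only then would the statistic from `pearson_chi2` be compared against the χ² distribution with the returned degrees of freedom.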

4. Discussion

As scientific research endeavors become larger, more variegated in their disciplinary component, and more distributed geographically, the practices of scientific collaboration and communication are subject to radical changes. The disciplinary, institutional, and geographical diversity that characterizes emerging research endeavors such as collaboratories and cyberinfrastructure initiatives is such that researchers necessarily rely on computer-supported technologies for their scientific collaboration, communication, and organization. Understanding the role that physical interaction and interpersonal communication play in distributed scientific workflows of this kind can prove useful in guiding, promoting, and sustaining scientific collaboration (Bos et al., 2007).

The research presented here elucidates the relationship between coauthorship and acquaintanceship patterns in a medium-sized, distributed, multi-disciplinary collaboratory. The first portion of this research aims at unveiling the topological properties of these networks and the mechanisms by which they are generated. Results show that acquaintanceship relations are far more numerous than coauthorship ties: researchers know each other well

A Fisher’s exact test on the same table also returned a p-value below 0 .

a) similar in size and b) similar in their multi-disciplinary component. The size of the research center is an obvious factor to consider when extending these findings to other environments. Larger, as well as smaller, research centers will naturally function on different collaboration paradigms. But the combination of social and scholarly assemblages found in a medium-sized center of a few hundred researchers is likely to resemble that of CENS. A further factor to consider is the multi-disciplinary constitution of CENS. Naturally, different scientific communities function according to different authorship norms. But, being intrinsically multi-disciplinary, CENS is a hybrid milieu, where discipline-specific norms converge and blend with one another. Thus, findings of this research can be extended to medium-sized cyberinfrastructure initiatives with a multi-disciplinary orientation.

Framing this research in the context of cyberinfrastructure studies, an important result that emerges is that social cohesion is at the basis of scientific collaboration. That is, even though the collaboratory is by definition predicated upon notions of remote collaboration and computer-supported communication, this study shows that social acquaintanceship relations are at the core of collaboration activity. Given the crucial role of social relationships for the advancement of scholarly collaboration, one wonders whether the cyberinfrastructure vision of a fluid, distributed, multi-sited science, agnostic to geographical and physical constraints, can ever be attained. In this context, this work reinforces previous recommendations to consider the spatial, social, and human arrangements that drive scientific advancement and collaboration, and how they differ across different disciplines and organizational settings (Olson and Olson, 2000; Cummings and Kiesler, 2005). Bringing this recommendation to the attention of policy makers and funding agencies has the potential to shape the direction and form of future investments and efforts in cyberinfrastructure.

5. Acknowledgement

This article is an abridged version of a chapter of my doctoral dissertation titled “Structure and evolution of scientific collaboration networks in a modern research collaboratory”, defended and filed at the University of California, Los Angeles in 2010. The full text of the dissertation is openly available at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1616935. Supplemental material and replication data for the dissertation and this article, including source code and collected data, are also openly available at http://hdl.handle.net/1902.1/15254. For the preparation of this article, I would like to thank my doctoral chair Christine Borgman, as well as Mark Hansen, Marko Rodriguez, and all my former colleagues at CENS.

References

Barabási, A. L., Jeong, H., Neda, Z., Ravasz, E., Schubert, A., Vicsek, T., 2002. Evolution of the social network of scientific collaborations. Physica A 311 (3-4).
Börner, K., Dall’Asta, L., Ke, W., Vespignani, A., 2005. Studying the emerging global brain: Analyzing and visualizing the impact of co-authorship teams: Research articles. Complexity 10 (4), 57–67.
Bos, N., Zimmerman, A., Olson, J., Yew, J., Yerkie, J., Dahl, E., Olson, G., 2007. From shared databases to communities of practice: A taxonomy of collaboratories. Journal of Computer-Mediated Communication 12 (2), 652–672.
Braun, T., Glanzel, W., Schubert, A., 2001. Publication and cooperation patterns of the authors of neuroscience journals. Scientometrics 51 (3), 499–510.
Castro, R. D., Grossman, J. W., 1999. Famous Trails to Paul Erdös. The Mathematical Intelligencer 21.
Chin, G., Myers, J., Hoyt, D., 2002. Social networks in the virtual science laboratory. Communications of the ACM 45 (8), 87–92.
Cochran, W. G., 1954. Some methods for strengthening the common chi-square tests. Biometrics 10 (4), 417–451.
Crandall, D. J., Backstrom, L., Cosley, D., Suri, S., Huttenlocher, D., Kleinberg, J., Dec. 2010. Inferring social ties from geographic coincidences. Proceedings of the National Academy of Sciences 107 (52), 22436–22441.
Cummings, J. N., Kiesler, S., 2005. Collaborative research across disciplinary and organizational boundaries. Social Studies of Science 35 (5), 703–722.
Davis, G. F., Yoo, M., Baker, W. E., 2003. The small world of the American corporate elite, 1982-2001. Strategic Organization 1 (3), 301–326.
Engeström, Y., Engeström, R., Vähäaho, T., 1999. When the center doesn’t hold: The importance of knotworking. In: Chaiklin, S., Hedegaard, M., Jensen, U. (Eds.), Activity Theory and Social Practice. Aarhus University Press.
Finholt, T., 2002. Collaboratories. In: Cronin, B. (Ed.), Annual Review of Information Science & Technology. Vol. 36. Information Today, pp. 73–107.
Girvan, M., Newman, M. E. J., 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 7821.
Grossman, J. W., Ion, P. D. F., 1995. On a portion of the well-known collaboration graph. Congressus Numerantium 108, 129–131.
Hara, N., Solomon, P., Kim, S.-L., Sonnenwald, D. H., 2003. An emerging view of scientific collaboration: Scientists’ perspectives on collaboration and factors that impact collaboration. Journal of the American Society for Information Science & Technology 54 (10), 952–965.
Holme, P., 2004. Structure and time evolution of an internet dating community. Social Networks 26 (2), 155–174.
Krackhardt, D., Dec. 1988. Predicting with networks: Nonparametric multiple regression analysis of dyadic data. Social Networks 10 (4), 359–381.
Kraut, R. E., Galegher, J., Egido, C., 1987. Relationships and tasks in scientific research collaboration. Human-Computer Interaction 3 (1).
Liberman, S., Wolf, K. B., 1997. The flow of knowledge: Scientific contacts in formal meetings. Social Networks 19 (3), 271–283.
Lievrouw, L. A., Rogers, E. M., Lowe, C. U., Nadel, E., 1987. Triangulation as a research strategy for identifying invisible colleges among biomedical scientists. Social Networks 9, 217–248.
Little, R., Rubin, D., 1990. The analysis of social science data with missing values. Sociological Methods and Research 18, 292–326.
Nardi, B. A., Whittaker, S., Schwarz, H., 2002. Networkers and their activity in intensional networks. Journal of Computer Supported Cooperative Work 11 (1-2), 205–242.
Newman, M. E. J., 2001. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America 98 (2), 404–409.
Newman, M. E. J., 2002. Assortative mixing in networks. Physical Review Letters 89 (20).
Newman, M. E. J., 2004. Who is the best connected scientist? A study of scientific coauthorship networks. In: Ben-Naim, E., Frauenfelder, H., Toroczkai, Z. (Eds.), Complex Networks. Springer, pp. 337–370.
Olson, G. M., Olson, J. S., 2000. Distance matters. Human-Computer Interaction 15 (2), 139–178.
Pepe, A., Rodriguez, M. A., 2010. Collaboration in sensor network research: an in-depth longitudinal analysis of assortative mixing patterns. Scientometrics 84 (3).
Reichardt, J., Bornholdt, S., 2006. Statistical mechanics of community detection. Physical Review E 74 (016110).
Reiser, M., Lin, Y., 1999. A goodness-of-fit test for the latent class model when expected frequencies are small. Sociological Methodology 29, 81–111.
Schummer, J., 2004. Multidisciplinarity, interdisciplinarity, and patterns of research collaboration in nanoscience and nanotechnology. Scientometrics 59 (3), 425–465.
Solomon, J., Dec. 2009. Programmers, professors, and parasites: Credit and co-authorship in computer science. Science and Engineering Ethics 15 (4), 467–489.
Tomassini, M., Luthi, L., Giacobini, M., Langdon, W. B., 2007. The structure of the genetic programming collaboration network. Genetic Programming and Evolvable Machines 8 (1), 97–103.
Tong, S. T., Van Der Heide, B., Langwell, L., Walther, J. B., 2008. Too much of a good thing? The relationship between number of friends and interpersonal impressions on Facebook. Journal of Computer-Mediated Communication 13 (3), 531–549.
Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of small-world networks. Nature 393 (6684), 440–442.

Appendix A. Acquaintanceship survey instrument