Berlin: A Quantitative View of the Structure of Institutional Scientific Collaborations
Aliakbar Akbaritabar ∗

Abstract

This paper examines the structure of scientific collaborations in a large European metropolitan area. It aims to identify strategic coalitions among organizations in Berlin as a specific case with high institutional and sectoral diversity. By adopting a global, regional and organization based approach we provide a quantitative, exploratory and macro view of this diversity. We use publications data with at least one organization located in Berlin from 1996-2017. We further investigate four members of the Berlin University Alliance (BUA) through their self-represented research profiles comparing it with empirical results of OECD disciplines. Using a bipartite network modeling framework, we are able to move beyond the uncontested trend towards team science and increasing internationalization. Our results show that BUA members shape the structure of scientific collaborations in the region. However, they are not collaborating cohesively in all disciplines. Larger divides exist in some disciplines e.g., Agricultural Sciences and Humanities. Only Medical and Health Sciences have cohesive intraregional collaborations which signals the success of regional cooperation established in 2003. We explain possible underlying factors shaping the observed trends and sectoral and intra-regional groupings. A major methodological contribution of this paper is evaluating coverage and accuracy of different organization name disambiguation techniques.

Keywords: Berlin, Internationalization, Co-authorship Network Analysis, Bipartite Community Detection, Disambiguation, Berlin University Alliance
∗ German Centre for Higher Education Research and Science Studies (DZHW), Berlin, Germany; ORCID = 0000-0003-3828-1533 (Corresponding Author). arXiv [cs.DL], Aug

Researchers work for academic and non-academic organizations and firms and use the resources from these organizations to carry out scientific work and form scientific collaborations. Coalitions and strategic ties between scientific organizations can be a cause and/or an effect of the way scientists affiliated to them communicate with each other. An example of the former is the top-down regional, national or organizational policies that support specific types of collaborations (e.g., COST initiative to foster scientific networking in Europe). The latter is driven more by the individual motivations of scientists to start bottom-up research projects and obtain funding through inter-organizational collaborations with researchers of other (inter)national organizations (e.g., ERC starting, consolidator or advanced grants; https://erc.europa.eu/). We aim to look at the outcome of scientific collaborations, in the form of scientific publications, which is produced through the former process or the latter. By understanding the structure of scientific collaborations between these organizations, we aim to find a proxy to identify possible strategic coalitions among them that in turn could have been inspired by individual researchers.

These strategic coalitions could take different forms and lead to differing sets of outputs (Katz & Martin, 1997; Laudel, 2002). Here we are focused on co-authorship as one of the main forms and scientific publications as the expected output of it. We are aware that co-authorship offers only a reductionist view, but nevertheless it is one of the highly used measures of scientific collaborations.

Moreover, these strategic coalitions can be affected by linguistic (Avdeev, 2019), geographical (Katz, 1994) and regional proximities (Luukkonen, Persson, & Sivertsen, 1992). In an in-depth review, Small & Adler (2019) presented a diverse array of literature that emphasizes the effect of space in the formation of social ties. Scientific organizations are populated by scientists and science is a social enterprise (Fox, 1983). Thus, it is not counterintuitive to consider scientific collaborations as a form of social tie. Formation of these ties is facilitated or hindered by contextual (Sonnenwald, 2007; Small, 2017, p. 154; Akbaritabar, Casnici, & Squazzoni, 2018), social (Smith-Doerr, Alegria, & Sacco, 2017; Akbaritabar & Squazzoni, 2020) and epistemic preferences of researchers and they can result in denser or instead sparser scientific communities (Akbaritabar et al., 2020).

In addition, the increasing trend towards more collaborative work and team science is well known. It is claimed that scientific disciplines, even social sciences, are moving towards more intense collaborations and more internationalization (Wuchty, Jones, & Uzzi, 2007; Araújo, Araújo, Moreira, Herrmann, & Andrade, 2017). However, studies have highlighted the differences in national or disciplinary contexts in the rate of internationalization (Moed, De Bruin, Nederhof, & Tijssen, 1991; Babchuk, Keith, & Peters, 1999) or differing rates of benefits, in terms of impact, obtained from internationalized collaborations (Glänzel, Schubert, & Czerwon, 1999).

It is argued in the literature that scientific and complex economic activities are concentrated in urban and metropolitan areas. In a large-scale study of the USA, Balland et al. (2020) investigated scientific papers, patents, employment rates and gross domestic product of 353 US metropolitan areas. Their main hypothesis was that disproportionate spatial concentration increases with complexity of productive activities, which was confirmed.
They used the average number of authors in scientific publications as a proxy of the complexity (due to higher coordination cost of larger scientific teams) and they found that scientific fields with higher complexities tend to have more urban concentration.

In the case of Europe, policies and initiatives are developed aiming at building an “integrated European Research Area”. Hoekman, Frenken, & Tijssen (2010) tested whether this objective is achieved. They concluded that Europe leans towards more integration of previously dominated national contexts. Nevertheless, they reported prevalence of geographically localized co-authorship with high degrees of difference among disciplines in the regional, national or Europe oriented tendencies. They found that some fields, e.g., physical sciences and life sciences are in a more advanced stage of “Europeanization” while other fields, e.g., medicine, engineering, social science and humanities present a more nationally oriented scientific collaboration. Furthermore, they concluded that although regional collaborations are still composing high shares of overall collaborations, the effect of territorial and national borders seems to be decreasing over time and Europe is moving towards a more level field of scientific collaboration.

Specific national contexts can present a higher or lower degree of scientific production and internationalized co-authorship. Therefore it is important to take the national context into account along with the continental and regional views. Stahlschmidt, Stephen, & Hinze (2019) presented a view of the science system of Germany and provided an updated version of some of their findings in Stephen, Stahlschmidt, & Hinze (2020). They found that Germany has a stable rate of scientific production similar to that of OECD countries with more established science systems (e.g., the USA, the UK and France), but the growth rate of scientific production of Germany is decreasing.
Some countries (e.g., China and India) have higher growth rates of scientific production in recent years. Thus the overall share of Germany in the world’s scientific publications has been decreasing in recent decades (from 6.3% in 1995 to 4.3% in 2018 based on Web of Science (WOS)) (Stephen et al., 2020). As presented in Stahlschmidt et al. (2019), Germany is moving towards higher rates of international collaborations in most scientific disciplines (from 46% internationalized co-authorships in 2007 to 55% in 2017 in Scopus and from 47% in 2007 to 59% in 2017 in WOS). The USA, the UK, France, Switzerland, Italy, the Netherlands, China, Spain, Austria and Australia are the ten countries with the highest shares of co-authorship with Germany in Scopus. In addition, Aman (2016) presented evidence of increasing internationalization and also higher rates of citations for inter-organizational and international co-authorship for the German science system in WOS from 2007 to 2012.

There are also studies specifically focused on the Berlin metropolitan region. Rammer, Kinne, & Blind (2020) provide a fine-grained view of the case of Berlin and how a form of selective spatial proximity exists between knowledge producing institutions (e.g., universities) and knowledge demanding institutions (e.g., innovative companies and firms). They used the first wave of panel data curated through Berlin Innovation Panel which surveys enterprises with five or more employees in manufacturing and knowledge-intensive services located in Berlin. They reported a micro-geographic scope where innovative firms are surrounded by same-sector firms and they were located closer to universities and research institutes that can signal a selective process. They also found that change in innovative activities of a given firm affects other firms in their vicinity.
It is necessary to note that the sample analyzed in their study was composed of 80% small firms and 70% service sector which can affect their conclusions.

Abbasiharofteh & Broekel (2020) explored the biotechnology field in Berlin metropolitan region. They investigated co-patenting, co-authorship and joint R&D collaborations and provided an exhaustive view of the temporal changes in the scientific landscape of the region in the specific case of biotechnology. They concluded that eastern and western organizations within Berlin are still not cohesively collaborating with each other. The “shadow” of the Berlin wall still influences the scientific collaborations of the region.

Organizational, regional and continental agreements and strategic coalitions are being developed to support higher rates of scientific collaboration among the actors in these contexts. A specific example is Berlin University Alliance (BUA). BUA was founded in February 2018 between the three main universities and one university hospital located in the Berlin metropolitan region i.e., Freie Universität Berlin (FU), Humboldt-Universität zu Berlin (HU), Technische Universität Berlin (TU) and Charité – Universitätsmedizin Berlin (CH) (Berlin University Alliance, 2018, 2019). BUA claims to be established based on a long lasting record of intra-regional collaborations between these institutions. The interaction between these institutions started from an era of institutional isolation after the fall of Berlin wall during which these institutes needed to define and empower their unique identities. Afterwards, first forms of cooperation between these institutes emerged which led in 2003 to establishment of a shared medical faculty between HU and FU to be located in Charité and allow higher collaborations in medical and natural sciences. There are examples of competition, mutual definition of exclusive research areas and graduate programs versus close cooperation among BUA members in the past three decades. These are highlighted in the BUA proposal as strengths of the region. Although the four major institutions of the region have their own unique identity and research profiles (further detail will follow in Data and Methods section), they aim at fostering previous collaboration experiences in a new organizational form, i.e., BUA. We aim to examine whether these four institutions have a distinctive position in the structure of scientific collaborations formed in the Berlin metropolitan region. Thus, we add specific measures to examine whether BUA members form cohesive collaboration ties among themselves. As the research profiles presented later in the text indicate, these institutions have overlapping disciplinary focuses. We aim to investigate whether they have prevailing roles in these overlapping disciplines or whether we can find signs of strategic coalitions among them i.e., cohesive co-authorship communities.

Network analysis can be used to identify the presence of communities in co-authorship networks (Palla, Barabási, & Vicsek, 2007; Leone Sciabolazza, Vacca, Kennelly Okraku, & McCarty, 2017; Akbaritabar et al., 2020).
Quantitative models are used to examine if collaboration patterns persist between or within denser areas of the network and in form of specific communities. Looking at the composition of these communities and identifying potential factors contributing to their cohesion helps to explain groupings in scientific collaborations.

In lower levels than continental, national or regional frameworks, scientific organizations themselves could have strategic plans to define their overarching identities and main research focus. This might inspire researchers in a certain organization to prioritize research in specific disciplines and areas (Blume, Bunders, Leydesdorff, & Whitley, 1987) to signal allegiance with the organization’s designed identity which in turn could penalize selection of innovative research themes (Rijcke, Wouters, Rushforth, Franssen, & Hammarfelt, 2016). Collaboration can be affected by the goals set out by funding agencies (Nederhof, 2006; Wagner, Park, & Leydesdorff, 2015). Furthermore, it can be affected by the type of organization (i.e., sector) which partially determines the type of research of an organization or expected outcomes of it.

In addition to the themes discussed above, the type of data employed to answer the research questions

• RQ1: How collaborative and internationalized is the scientific landscape of the Berlin metropolitan region?
• RQ2: Are there disciplinary differences in the rate of collaborative and internationalized scientific work?
• RQ3: How regionally or continentally oriented is scientific collaboration in the Berlin metropolitan region?
• RQ4: How sector oriented is scientific collaboration in the Berlin metropolitan region?
• RQ5: How does the diversity and level of development of national or regional science systems worldwide affect the collaborations with the Berlin metropolitan region?
• RQ6: Is there evidence of strategic coalitions, disciplinary, regional or organizational agreements in the structure of scientific collaborations in the Berlin metropolitan region?
• RQ7: Are there specific disciplinary, sectoral, national or continental cohesive subgroups driving the scientific collaborations in the Berlin metropolitan region?
• RQ8: How influential is the disambiguation of organization names on the structure of scientific collaborations?

Contributions of the current paper are fourfold: 1) we focus on scientific output of the Berlin metropolitan region and trace the share of collaborative works and identify the share of international collaborations. We separate Berlin, Germany, Europe and continental regions worldwide to investigate possible groupings and we intend to move beyond the descriptive and macro view, which advocates for increasing internationalization, by investigating the structure of scientific collaborations in a multi-level framework. 2) we cover six major OECD scientific disciplines and provide a comparative view of the specificities of these disciplines and we include a sectoral view based on the type of organizations. 3) we develop and use multiple organization name disambiguation techniques and compare their efficiency, coverage and accuracy and 4) we employ a bipartite network modeling and community detection approach and present how it can be useful in co-authorship network analysis and identification of denser groups collaborating preferably among themselves. Since investigation on the level of entire organization can cause a high rate of interconnectivity in the network (due to aggregation to organization level and overlooking individual researcher or research group borders), as described in Data and Methods Section, our bipartite modeling approach takes specificities in single
We use Scopus 2018 data from the German Bibliometrics Competence Center (Kompetenzzentrum Bibliometrie, KB; http://bibliometrie.info). We extract article, review and conference proceedings documents published from the beginning of the database in 1996 till the end of 2017. To delineate Berlin metropolitan region and to identify the scientific collaborations that occurred in the region, we select only publications that have at least one authoring organization located in Germany and Berlin. Thus, co-authorship here includes Berlin organizations and their collaborators worldwide. Our level of analysis is scientific organizations (i.e., each affiliation address mentioned in a publication that can be academic or non-academic organizations or firms where researchers affiliate) and we do not investigate lower levels e.g., authors, since our goal is to identify structure of scientific collaborations among organizations.

Our data include different meta-data for each publication such as publication year, title, affiliation addresses, scientific discipline, journal name and document type. We include conference proceedings in addition to articles and reviews since there are technical universities in the sample for which this type of document is considered influential. We use a mapping of publications to OECD scientific disciplines based on Scopus ASJC (All Science Journal Classification). We compare the aggregate data with trends of different OECD scientific disciplines i.e., Agricultural Sciences (AS), Engineering Technology (ET), Natural Sciences (NS), Medical and Health Sciences (MHS), Humanities (H) and Social Sciences (SS). Please note that some publications might be assigned to multiple disciplines. In the aggregate analysis, we use the first assignment of each publication, but in a single discipline view, we take publications with any assignment in the given discipline, thus, interdisciplinary publications are covered separately in all their assigned disciplines.

As described earlier, scientific organizations set goals and define strategic paths to ensure a unique research profile and identity. In order to have a better understanding of how BUA members introduce their own research goals and main areas, we use their self-representations in Berlin University Alliance (2019) and Berlin University Alliance (2018). We expect to observe prevailing roles of these institutions in structure of scientific collaborations of the disciplines closer to their areas of focus.

• FU: “Biomedical Foundations”, “Complex Systems”, “Cultural Dynamics”, “Educational Processes and Results”, “Health and Quality of Life”, “Human-Environmental Interactions”, “In-Security and Security Research”, “Materials Research” and “Transregional Relations”.
• HU: “Application-Oriented Mathematics”, “Image Sciences”, “Integrative Life Sciences”, “Integrative Natural Sciences”, “Research on Law and Society”, “Study of Ancient Civilizations” and “Sustainability Research”.
• CH: “Cardiovascular Research & Metabolism”, “Infection, Immunology & Inflammation”, “Neuroscience”, “Oncology”, “Rare Disease & Genetics” and “Regenerative Therapies”.
• TU: “Materials, Design and Manufacturing”, “Digital Transformation”, “Energy Systems, Mobility and Sustainable Resources”, “Urban and Environmental Systems”, “Optic and Photonic Systems” and “Education and Human Health”.

Except CH which has a focus on MHS and NS, the other three institutions are active in areas close to major OECD disciplines.

The data delivered by Scopus is not perfect. It is prone to error and there is a strong need for disambiguation of organization names (Donner et al., 2019). Without disambiguation, co-authorship networks constructed will have multiple representations of the same actor and an artificially higher level of (dis)connectivity.
We developed two disambiguation techniques (i.e., PyString and Fuzzy matching) and compared their results with a previously established technique (i.e., Research Organization Registry (ROR), https://ror.org/about), as depicted in figure 1. ROR uses data from Global Research Identifier Database (GRID) prepared by Digital Science, ISNI (International Standard Name Identifier, https://isni.org/), Crossref and Wikidata.

In PyString matching (gray shaded area on the left of figure 1), we standardize organization names and perform a match with GRID (snapshot of February 17, 2019). We use only the largest entity that Kompetenzzentrum Bibliometrie extracts from Scopus using the first part of the affiliation string before the first comma. To match, we used string comparison methods in Python (for simplicity we call it PyString) which match whole and subsets of the text strings but do not account for change in the order of words in organization names. To remove the effect of order of organization name parts, we split the names based on space (i.e., words) and reorder them alphabetically for both Scopus and GRID entities. We then add country names to the end of strings to allow higher precision of matching and reduce the effect of organizational homonyms. As an example, from the address string delivered by Scopus, “Freie Universität Berlin, Department of education and psychology, DEU”, KB extracts “Freie Universität Berlin” as the first part which we use for PyString and Fuzzy matching processes and after removing non-alphanumeric symbols, lowercasing and reordering alphabetically, we add the 3-letter ISO country code to the end (e.g., “berlin freie universität DEU”). For those still non-matched, we perform another PyString match with scientific organizations in Wikidata and for the still missing ones, with Wikidata entities which have geographical coordinates. We limit the results to the most promising ones using Jaro-Winkler distance of more than 0.85 between the two matched names. We do this after the PyString match is done, as a control of reliability. We chose this threshold based on manual evaluation of match results to have the highest accuracy. At the end and to complement the results of PyString procedure, we search organization names in an in-house database which was previously developed by comparing organization names to Wikidata entities by Rimmert (2018). This in-house data is only accessible through KB infrastructure.

In parallel, we compare organization names with GRID by Fuzzy matching the names. This method takes differing word order and subsets of the name into account (we standardize the names as before and add country). For Fuzzy text matching, we use the FuzzyWuzzy library in Python (https://github.com/seatgeek/fuzzywuzzy). Using fuzz.ratio as scorer, we set a threshold of 80 percent which was chosen based on empirical evaluation on some exemplar cases and proved to give a reliable accuracy (gray shaded area in center of figure 1).

In a third attempt, instead of the main organization names used in previous procedures, we used the complete string of affiliation addresses delivered by Scopus to disambiguate it with Research Organization Registry (ROR) API (see the address string example above). We obtain further information (i.e., country, geographical coordinates (longitude and latitude) of the main address and type of organization as education, non-profit, company, government, health-care, facility, archive and other). We used the ROR snapshot from November 7th

Three of these organization types are defined as follows:

• non-profit: Organization that uses its surplus revenue to achieve its goals. Includes charities and other non-government research funding bodies. Example: the Max Planck Society (grid.4372.2).
• facility: A building or facility dedicated to research of a specific area, usually contains specialized equipment. Includes telescopes, observatories and particle accelerators. Example: member institutes of the Max Planck Society (e.g., Max Planck Institute for Demographic Research, grid.419511.9).
• archive: Repository of documents, artifacts, or specimens. Includes libraries and museums that are not part of a university. Example: New York Public Library (grid.429888.7).
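The name standardization and thresholded matching described above can be illustrated with a minimal sketch. It is not the pipeline used in the paper: the registry entries and their identifiers are hypothetical placeholders, and Python's standard `difflib.SequenceMatcher` stands in for the fuzzy scorer so that the example has no external dependencies. Only the normalization steps (first part before the comma, symbol removal, lowercasing, alphabetical reordering, country code appended) and the 80 percent threshold are taken from the text.

```python
import re
from difflib import SequenceMatcher


def normalize_org(address, country_iso3):
    """Standardize an affiliation string following the steps described in the text."""
    main_name = address.split(",")[0]              # keep the part before the first comma
    cleaned = re.sub(r"[^\w\s]", " ", main_name)   # drop non-alphanumeric symbols
    words = sorted(cleaned.lower().split())        # lowercase, reorder words alphabetically
    return " ".join(words + [country_iso3])        # append the country code


def similarity(a, b):
    """Score in [0, 100]; difflib is an assumed stand-in for a fuzzy scorer."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_match(name, country_iso3, registry, threshold=80.0):
    """Return (registry_id, score) for the best candidate above the threshold, else None."""
    query = normalize_org(name, country_iso3)
    scored = [
        (similarity(query, normalize_org(ref_name, ref_country)), ref_id)
        for ref_id, (ref_name, ref_country) in registry.items()
    ]
    if not scored:
        return None
    score, ref_id = max(scored)
    return (ref_id, score) if score > threshold else None


# Hypothetical mini registry; the identifiers are placeholders, not real GRID or ROR IDs.
registry = {
    "org-fu": ("Freie Universität Berlin", "DEU"),
    "org-hu": ("Humboldt-Universität zu Berlin", "DEU"),
}

print(normalize_org("Freie Universität Berlin, Department of education and psychology, DEU", "DEU"))
print(best_match("Berlin Freie Universitat", "DEU", registry))
print(best_match("Max Planck Society", "DEU", registry))
```

Sorting the words before comparison is what makes a plain string scorer tolerant of word order, while the appended country code keeps homonymous organizations in different countries apart. A name with no sufficiently similar registry entry yields `None`, which corresponds to the "exclude in further analysis" branch of the matching procedure.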
We construct bipartite co-authorship networks (Breiger, 1974) using ties between publications and organizations (Katz & Martin, 1997). We treat each single publication as an event where organizations interact to produce an academic text (Biancani & McFarland, 2013). Studies on co-authorship networks usually use a one-mode projection of these bipartite networks (Newman, 2001a, 2001b). The problem with this projection is twofold. First, different structures in two-mode networks are projected to the same one-mode structure which causes an information loss about the underlying structure. Second, the one-mode projection can present an artificially higher density and connectivity due to publications with high number of authors which project to maximally connected cliques. By adopting methods specifically developed for bipartite networks we are able to resolve the shortcomings.

To identify possible geographical, disciplinary and/or sector based coalitions between scientific organizations, we extract the largest connected component of the network, i.e., giant component, and investigate it further. Our aim is to see if there are cohesive subgroups of organizations preferably collaborating among themselves. We investigate the potential underlying factors behind these groupings.

To identify communities of co-authorship, we use bipartite community detection by Constant Potts model (CPM). CPM is a specific version of Potts model (Reichardt & Bornholdt, 2004) proposed by Traag, Van Dooren, & Nesterov (2011) as a resolution-limit-free method. It resolves the resolution limit problem in modularity (Newman, 2004) which can obstruct detection of small communities in large networks (Traag, Waltman, & van Eck, 2019). We use the implementation in the Leidenalg library in Python (https://github.com/vtraag/leidenalg). Community detection emphasizes the importance of links within communities rather than those between them. CPM uses a resolution parameter γ (i.e., “constant” in the name), leading to communities such that the link density between the communities (external density) is lower than γ and the link density within communities (internal density) is more than γ. We set different resolution parameters in case of aggregate data (ROR = 3 × − ) and scientific disciplines (AS = 7 × − , ET = 6 × − , NS = 6 × − , MHS = 5 × − , H = 4 × − and SS = 6 × − ). We chose these parameters after exploration of the number of communities detected in contrast to the number of organizations and publications included in each bipartite community to arrive at a rather consistent distribution.

Figure 1 presents different disambiguation techniques used and the coverage of item (publication)-author-organization links (PyString, Fuzzy and ROR).

Diagram: Berlin region scientific organization name matching and disambiguation. Input: 356,918 normalized organization names, 256,909 publications and 4,474,099 item-author-organization links (Has KB name? Yes 49.87%, No 50.13%).

                                        PyString                Fuzzy                   ROR API
Run time (parallelized)                 24 hours                43 hours                45 hours
Names disambiguated / geocoded          115,749 (32.43%)        194,054 (54.37%)        227,213 (63.66%)
  Unique organizations (ratio)          11,743 (1 to 9.8)       17,144 (1 to 11)        14,787 (1 to 15)
  Publications                          239,390 (93.18%)        184,990 (72%)           233,039 (90.71%)
  Item-author-organization links        2,961,204 (66.18%)      2,522,350 (56.38%)      3,465,948 (77.47%)
  Disambiguated only by this method     2,310 (1,206 unique)    31,311 (8,198 unique)   71,107 (8,449 unique)
Names not matched (excluded)            241,169 (67.57%)        162,864 (45.63%)        129,705 (36.34%)
  Publications                          127,096 (49.47%)        203,340 (79.15%)        161,003 (50.90%)
  Item-author-organization links        1,512,895 (33.81%)      1,951,749 (43.62%)      1,008,151 (22.53%)
  German organizations                  51,015                  45,495                  41,841
  With KB IDs                           427                     1,033                   797
  Publications involved                 22,814                  170,793                 78,224

PyString matches by step: GRID match with length difference = 0 and JW > 0.85 (**): 99,422 (27.85%); Wikidata: 7,062 (1.98%); Wikidata entities with geographical coordinates: 8,285 (2.32%); Rimmert (2018) Wikidata: 980 KB IDs, 961 Wikidata IDs, 210,965 publications. (** These were results meeting both conditions.)
Figure 1: Organization name disambiguation techniques and comparison of coverage and accuracy

Table 1: Berlin organizations co-authorship networks using non-disambiguated and disambiguated data (G = giant component)

Metrics                       Non disambiguated   PyString        Fuzzy          ROR
N. of connected components    10,269              66              159            100
N. of bipartite nodes         613,827             135,057         58,547         133,387
N. of bipartite edges         1,083,775           246,704         89,199         246,472
% of bipartite nodes in G     95                  100             99             100
% of bipartite edges in G     98                  100             100            100
N. of organizations           356,918             5,244           4,978          7,257
N. of organizations in G      337,755             5,176           4,809          7,153
N. of publications (%)        256,909             129,813 (51%)   53,569 (21%)   126,130 (49%)
N. of publications in G       245,203             129,657         53,248         125,949
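The two shortcomings of the one-mode projection and the CPM density criterion described in the network modeling paragraphs can be made concrete on a toy example. The sketch below is an illustration under stated assumptions, not the analysis code of the paper: the publication and organization labels are hypothetical, and `cpm_quality` is a simplified bipartite form of the CPM objective, Q = Σ_c (e_c − γ · p_c · o_c), where e_c is the number of publication-organization ties inside community c and p_c and o_c are its numbers of publications and organizations. It does not call the Leidenalg library.

```python
from itertools import combinations

# Hypothetical toy data: publication -> set of authoring organizations.
publications = {
    "p1": {"FU", "HU"},
    "p2": {"FU", "HU", "CH"},
    "p3": {"TU", "X"},
    "p4": {"TU", "X", "Y"},
    "p5": {"CH", "TU"},
}


def bipartite_edges(pubs):
    """Two-mode ties: one edge per (publication, organization) pair."""
    return {(p, o) for p, orgs in pubs.items() for o in orgs}


def one_mode_projection(pubs):
    """Organization-organization ties: every publication becomes a clique."""
    edges = set()
    for orgs in pubs.values():
        edges.update(combinations(sorted(orgs), 2))
    return edges


def cpm_quality(pubs, membership, gamma):
    """Simplified bipartite CPM quality (an assumed form, see the lead-in)."""
    edges = bipartite_edges(pubs)
    quality = 0.0
    for community in set(membership.values()):
        members = {n for n, c in membership.items() if c == community}
        p_c = sum(1 for n in members if n in pubs)   # publications in the community
        o_c = len(members) - p_c                     # organizations in the community
        e_c = sum(1 for p, o in edges if p in members and o in members)
        quality += e_c - gamma * p_c * o_c
    return quality


# Information loss: two different two-mode structures, one identical projection.
one_joint_paper = {"p1": {"A", "B", "C"}}
three_pair_papers = {"p1": {"A", "B"}, "p2": {"B", "C"}, "p3": {"A", "C"}}
same_projection = one_mode_projection(one_joint_paper) == one_mode_projection(three_pair_papers)

# Two candidate partitions of the toy network.
split = {"p1": 0, "p2": 0, "FU": 0, "HU": 0, "CH": 0,
         "p3": 1, "p4": 1, "p5": 1, "TU": 1, "X": 1, "Y": 1}
single = {node: 0 for node in split}

print(same_projection)
print(len(bipartite_edges(publications)), len(one_mode_projection(publications)))
print(cpm_quality(publications, split, 0.5), cpm_quality(publications, single, 0.5))
```

With γ = 0.5 the two-community partition scores higher than the single community, because each group has an internal tie density above γ while the only tie between them (CH on p5) leaves the external density below γ. Raising or lowering γ changes which partition wins, which is why the resolution parameter has to be explored per network, as described in the text.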
Table 2: Berlin organizations co-authorship networks in different OECD scientific disciplines (G = giant component, ROR organization name disambiguation)

Metrics                       AS       ET       H       MHS      NS        SS
N. of connected components    13       56       34      48       55        68
N. of bipartite nodes         8,528    33,930   4,835   46,842   89,032    16,045
N. of bipartite edges         13,822   57,671   6,195   84,945   170,334   25,336
% of bipartite nodes in G     100      100      98      100      100       99
% of bipartite edges in G     100      100      99      100      100       100
N. of organizations           1,687    2,991    798     3,843    5,970     2,091
N. of organizations in G      1,668    2,933    763     3,792    5,910     2,012
N. of publications            6,841    30,939   4,037   42,999   83,062    13,954
N. of publications in G       6,828    30,851   3,988   42,913   82,989    13,849

We observed a high rate of dis-connectivity in the non-disambiguated network (10,269 components) while this was extremely reduced through disambiguation techniques (i.e., to 66 in PyString, 159 in Fuzzy and 100 in ROR). The share of nodes in the giant component which was initially high in the non-disambiguated network (95%) further increased and covered close to 99% in all cases (numbers in the table are rounded up). In all these cases, disambiguation shows that many unique organization names delivered by Scopus need to be merged due to spelling error and name order changes which can affect the networks constructed to a high degree (see De Stefano, Fuccella, Vitale, & Zaccarin (2013) for a discussion of possible effects). In PyString, the ratio of disambiguated unique organizations to non-disambiguated ones was 1 to 9.8 (in Fuzzy 1 to 11 and in ROR 1 to 15). This shows the high influence disambiguation could have on the results (RQ8). Table 2 presents the networks in different OECD scientific disciplines using ROR results. Note that the results which follow are based on the 49% of publications that were successfully disambiguated by ROR technique. Each of the disciplines presented in Table 2 covers a different share of connected components observed in the aggregate data ranging from 13 components in Agricultural Sciences (AS) to 68 in Social Sciences (SS). Natural Sciences (NS) has both the highest number of publications and organizations while Humanities (H) has the smallest number of publications and organizations.
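The merge ratios quoted above can be re-derived from the counts given in Figure 1. The short check below assumes that each ratio is the number of matched organization name variants divided by the number of unique organizations they collapse into; that reading is an interpretation of the diagram, not something the text states explicitly.

```python
# Counts copied from the disambiguation diagram (Figure 1).
matched_names = {"PyString": 115_749, "Fuzzy": 194_054, "ROR": 227_213}
unique_orgs = {"PyString": 11_743, "Fuzzy": 17_144, "ROR": 14_787}


def variants_per_org(method):
    """Average number of name variants merged into one unique organization."""
    return matched_names[method] / unique_orgs[method]


for method in matched_names:
    print(f"{method}: 1 to {variants_per_org(method):.2f}")
```

The quotients are about 9.86, 11.32 and 15.37, which agree with the reported 1 to 9.8, 1 to 11 and 1 to 15 if those figures were truncated rather than rounded.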
Figure 2 presents the raw and fractional count of publications among different OECD disciplines. Note that it is based on publications which have at least one collaborator from Berlin metropolitan region and for organizations that were successfully disambiguated with ROR technique. Nevertheless, the trends are in line with what Stephen et al. (2020) and Stahlschmidt et al. (2019) reported for Germany. Some disciplines show higher rates of collaborative work (e.g., see case of NS, blue lines, first and second from top) which is evident in the gap between the lines presenting their raw and fractional counts. In contrast, some disciplines that are traditionally known to be less collaborative (Leahey, 2016) present a smaller gap on the plot (e.g., Humanities). Since the Y axis is on log10 scale, the figure shows growth rates in raw and fractional count of publications which is tripled in case of Humanities and Social Sciences. But it can be due to higher coverage of Scopus in recent years and not merely increase of publications (see Stahlschmidt et al. (2019) for a discussion). To investigate internationalization of publications, figure 3 presents the single (intra-DEU) versus multiple country publications. Although it presents only the 49% of publications which were disambiguated, the trends observed are in line with the case of German science system reported in Stahlschmidt et al. (2019). It is clear that some disciplines have already reached close to 50% share of internationalization (i.e., NS) driving the aggregate trend of increasing internationalization observed on top panel of the figure. However, there are other disciplines with still lower than 25% share of international collaborations (i.e., H and SS).
In all these disciplines an increasing trend towards further internationalization is evident (see the increasing length of black bars in the figure), except in H and SS, which do not present a clear increasing trend and where the rate of internationalization decreases in some years. This answers our RQ1 and RQ2, signaling a high disciplinary difference in the rate of collaborative work and internationalization.

Figure 4 presents the internationalization of collaborations across OECD disciplines divided over continental and geographical regions worldwide. It also shows the countries where the top five percent of collaborators in terms of number of organizations and publications are located (see country labels). Note that the scales of the X and Y axes differ among the panels of the figure and that they are on a log10 scale. Collaborations inside Germany dominate the aggregate image in all disciplines. After Germany, the USA, the UK, France, China and Russia dominate the aggregate view with the highest number of publications and organizations. However, disciplines have noticeable differences. In AS, H and SS, Germany, the USA and the UK are the countries where most of the collaborators and the prolific ones are located. In MHS, France is the only country which joins the previously mentioned top five percent group. In NS and ET, China and Russia join this group as well and the image becomes closest to the aggregate level. This further supports our findings for RQ2 and RQ3.

To investigate our RQ4 and RQ5, we focus on organization sectors. In total, out of 7,257 unique organizations in the Berlin sample based on ROR, there were 2,844 from the education sector, 1,667 facility, 860 healthcare, 587 company, 436 nonprofit, 429 government, 282 other, 124 archive and 28 not available. Table 3 presents the distribution of organizations in different sectors in the five countries with the highest number of organizations (i.e., China, Germany, France, the UK and the USA, in alphabetical order of country code). While education is the sector with the highest number of organizations in China, the UK and the USA, facility has the highest number of organizations in Germany and France (see Table 3), which can be an artifact of the disambiguation and of the exclusion of publications with non-disambiguated organizations. Figure 5 presents the geographical distribution of organizations worldwide separated by sector and aggregated by country. To make the image clearer, we remove countries where no organization from a given sector is present, and brighter colors show a higher number of organizations in a given country. It is clear that most countries have organizations in the education sector. Another evident pattern is that more developed countries (e.g., in western Europe, North America and Oceania) have representatives in all sectors, which signals the higher sectoral diversity of the science systems of these countries. However, China and India are two specific cases outside of the previously mentioned regions with representation in many sectors. The distribution of companies is another interesting observation: in contrast to the education sector, many countries do not have any representatives. Figure 6 focuses on the Berlin metropolitan region, providing a more fine-grained view of the sectoral distribution of organizations and their dense geographical proximity. Note that on this figure, the names of organizations with more than 10,000 publications are printed; these are the four BUA members.
Berlin presents a science hub with a high density of academic and non-academic scientific organizations which belong to multiple sectors and, based on the previous map in figure 5, collaborate with organizations from many sectors worldwide.
[Figure 2 plot: X axis = Year, Y axis = Number of Publications (log scale); one line per count type (raw, fractional) and discipline (AS, ET, H, MHS, NS, SS).]
Figure 2: Raw and fractional count of Berlin publications by OECD disciplines (1996-2017, Scopus, fractional count based on organizations, Y on log scale)
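The difference between the raw and fractional counts shown in Figure 2 can be illustrated with a minimal sketch. The records and organization names below are hypothetical, and the counting rule is an assumption: each of the n distinct organizations on a publication receives a share of 1/n, and the fractional count sums the shares held by Berlin organizations. The paper's exact weighting may differ.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical records: (year, discipline, organizations on the paper, Berlin-based subset).
publications = [
    (2016, "NS", ["org_a", "org_b", "org_c", "org_d"], {"org_a"}),
    (2016, "NS", ["org_a", "org_e"], {"org_a", "org_e"}),
    (2016, "H", ["org_f"], {"org_f"}),
]


def raw_and_fractional(records):
    """Return {(year, discipline): (raw count, fractional count)}.

    Assumption: every distinct organization on a paper gets a 1/n share and
    the fractional count adds up the shares of the Berlin organizations.
    """
    raw = defaultdict(int)
    frac = defaultdict(Fraction)
    for year, discipline, orgs, berlin_orgs in records:
        distinct = set(orgs)
        share = Fraction(1, len(distinct))
        raw[(year, discipline)] += 1
        frac[(year, discipline)] += share * len(distinct & berlin_orgs)
    return {key: (raw[key], frac[key]) for key in raw}


counts = raw_and_fractional(publications)
for (year, discipline), (r, f) in sorted(counts.items()):
    print(year, discipline, "raw:", r, "fractional:", float(f))
```

A wide gap between the two numbers for a discipline indicates that its publications are typically shared among many organizations, which is how the gap between the paired lines in the figure is read.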
[Figure 3 plot: X axis = Year, Y axis = Percent; panels: Aggregate, AS, ET, NS, MHS, H, SS; bars by country status (multiple countries pubs, single country (DEU)).]
Figure 3: Share of intra-Germany versus multiple country co-authorship, (Top) aggregate (Bottom) different disciplines (1996-2017, Scopus)
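The share plotted in Figure 3 can be sketched as the yearly percentage of publications whose organizations are located in more than one country. The records below are hypothetical, and treating a publication as international whenever more than one country code appears is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical records: (year, ISO country codes of the organizations on the paper).
papers = [
    (1996, {"DEU"}),
    (1996, {"DEU"}),
    (1996, {"DEU", "USA"}),
    (2017, {"DEU", "GBR", "USA"}),
    (2017, {"DEU"}),
]


def international_share(records):
    """Percentage of papers per year whose organizations span more than one country."""
    total = Counter()
    multi = Counter()
    for year, countries in records:
        total[year] += 1
        if len(countries) > 1:
            multi[year] += 1
    return {year: 100.0 * multi[year] / total[year] for year in total}


shares = international_share(papers)
for year in sorted(shares):
    print(year, round(shares[year], 1))
```

Running the same function on the records of a single discipline gives the per-discipline panels.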
[Figure 4 plot: X axis = Number of Organizations, Y axis = Number of Publications (both log scale); panels: Aggregate, AS, ET, NS, MHS, H, SS; points colored by geo_region (East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, North America, South Asia, Sub-Saharan Africa); country labels: CHN, DEU, FRA, GBR, RUS, USA in Aggregate, ET and NS; DEU, FRA, GBR, USA in MHS; DEU, GBR, USA in AS, H and SS.]
Figure 4: Number of organizations and co-authorship with Berlin by countries worldwide, aggregated (top) and by discipline (bottom, 1996-2017, Scopus, X and Y on log scale, label: top 5 percent)

Table 3: Five countries with highest number of organizations by sector (GRID data based on Berlin sample 1996-2017)

Country code | Organization sector | Count
CHN | Education | 193
CHN | Facility | 77
CHN | Healthcare | 39
CHN | Government | 18
CHN | Company | 7
CHN | Nonprofit | 5
CHN | Other | 3
CHN | Archive | 1
DEU | Facility | 319
DEU | Education | 215
DEU | Company | 205
DEU | Healthcare | 115
DEU | Nonprofit | 113
DEU | Other | 113
DEU | Government | 66
DEU | Archive | 35
FRA | Facility | 261
FRA | Education | 114
FRA | Healthcare | 45
FRA | Government | 35
FRA | Company | 34
FRA | Other | 15
FRA | Nonprofit | 8
FRA | Archive | 4
GBR | Education | 114
GBR | Healthcare | 91
GBR | Facility | 46
GBR | Company | 34
GBR | Nonprofit | 26
GBR | Government | 25
GBR | Other | 16
GBR | Archive | 7
USA | Education | 423
USA | Healthcare | 157
USA | Company | 129
USA | Facility | 110
USA | Nonprofit | 95
USA | Government | 44
USA | Other | 34
USA | Archive | 23
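The comparison of sector sizes across the five countries can be checked directly from the counts printed in Table 3. The sketch below only re-tabulates those counts; the dictionary layout and variable names are illustrative choices, not part of the original analysis.

```python
# Counts copied from Table 3 (organizations per sector in the Berlin sample).
table3 = {
    "CHN": {"Education": 193, "Facility": 77, "Healthcare": 39, "Government": 18,
            "Company": 7, "Nonprofit": 5, "Other": 3, "Archive": 1},
    "DEU": {"Facility": 319, "Education": 215, "Company": 205, "Healthcare": 115,
            "Nonprofit": 113, "Other": 113, "Government": 66, "Archive": 35},
    "FRA": {"Facility": 261, "Education": 114, "Healthcare": 45, "Government": 35,
            "Company": 34, "Other": 15, "Nonprofit": 8, "Archive": 4},
    "GBR": {"Education": 114, "Healthcare": 91, "Facility": 46, "Company": 34,
            "Nonprofit": 26, "Government": 25, "Other": 16, "Archive": 7},
    "USA": {"Education": 423, "Healthcare": 157, "Company": 129, "Facility": 110,
            "Nonprofit": 95, "Government": 44, "Other": 34, "Archive": 23},
}

# Total number of organizations per country and the largest sector in each.
totals = {country: sum(sectors.values()) for country, sectors in table3.items()}
top_sector = {country: max(sectors, key=sectors.get) for country, sectors in table3.items()}

for country in sorted(table3):
    print(country, totals[country], top_sector[country])
```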
[Figure 5 maps: one world map per sector (Archive, Company, Education, Facility, Government, Healthcare, Nonprofit, Other); color scale = Organization count.]
Figure 5: Countries worldwide collaborating with Berlin region by sector (color: N. of organizations. If a country does not have presence in a sector, it is removed)

Figure 6: Organizations in Berlin (Colors: Education = red, Nonprofit = yellow, Government = blue, Facility = orange, Healthcare = green, Company = brown, Other = pink, Archive = gray, NA = white, Labels: name for > 10,000 publications, count for 1,000-10,000 publications)

.3 Structure of institutional scientific collaborations in Berlin

We focus now on communities of co-authorship identified from the giant component using bipartite community detection (RQ6 and RQ7). This enables us to go beyond the macro descriptive view presented thus far and investigate the structure of collaborations at the level of single publications. Note that these communities are detected from the giant component, which is itself connected; the communities signal the denser areas of the collaboration network. We are interested to know what the underlying factors behind these higher densities, which construct these cohesive sub-groups, could be. Note also that we tested a diverse array of resolution parameters, as discussed in the Data and methods section. We finally set the resolution parameters that gave the most consistent number of communities.

Figure 7 presents the distribution of communities based on the number of organizations in each community and the aggregate number of publications of all organizations in a given community (each community is represented by one dot). The clearest observation in this figure is the discipline-based collaboration patterns among BUA members, indicated by the shape and color of the dots, where green triangles show the presence of one or more BUA member(s) in a given community. In the aggregate view on top, ET, NS and SS, we see BUA members populating the two most prolific communities. MHS is the only case where all four members are present in a single community, which is due to the closer cooperation among BUA members that was formed through a faculty shared by HU and FU and located in Charité in 2003; it seems they have been successful in integrating TU into the collaboration structure. In the case of AS and H, they populate three distinct communities, which signals a larger divide in the collaboration structure despite the smaller size and lower number of players in these two disciplines. This is in line with the self-represented research profiles of the BUA members, since they aim to have a strong research focus in these areas.
However, it shows a divide between them that can be bridged by shared research projects or new organizational forms, as was the case in MHS. In the aggregate view (ROR), TU is a member of community 1 while FU, CH and HU are members of community 0. In AS, HU and FU are members of community 0 while TU is in community 1 and CH is in community 2. This could be due to the fact that TU, being a technical university, pursues more technical and application-oriented research. But the divide between CH on one side and HU and FU on the other needs further probing for underlying causes. In ET, TU is in community 0 while FU, HU and CH are in community 1. In NS and SS, CH, FU and HU are in community 0 and TU is in community 1. In H, HU and CH are in community 0, FU is in community 1 and TU is in community 2. Overall, it seems that the joint cooperation between HU and FU in the form of the shared faculty located in CH is paying off in most cases in the form of a more cohesive structure of collaborations. However, in Humanities, where HU, FU and TU have strong research focuses, they have formed distinct and separate collaboration structures. Of course, this can be affected by the disciplinary properties of Humanities, which is closer to the ideal type of “sole investigators” (Leahey, 2016). BUA needs to integrate TU further into the structure of scientific collaborations in the region through shared projects or organizational forms. It is clear from the aggregate and disciplinary views that not all communities are populated with the most prolific organizations (in terms of number of publications). There are communities of different sizes consisting of organizations with different levels of productivity (e.g., see the difference between communities 0, 1 and 47 on the aggregate view or communities 0, 1 and 2 in AS).
Another clear finding, which is in line with the previously observed number of connected components, is that some disciplines present more groupings (e.g., see the distribution of dots in MHS and NS) while some present fewer groupings (e.g., see H and AS). We now exclude the communities with the lowest number of organizations, which were not prolific, and focus instead on the most interesting communities (those with labels in figure 7). We present a view of the potential underlying factors based on geographical regions, communities of co-authorship, sectors and productivity which could have led to the emergence of the observed groupings.

Figure 8 presents the communities in the aggregate view (i.e., ROR) consisting of more than 30 organizations or with more than 25,000 publications (aggregate of all community members), and Table 4 provides details on the regional and sectoral composition of these selected communities. We chose these parameters to present the most interesting communities, with the highest productivity in terms of number of publications and the most distinctive structure of collaborations. Note that, although the most interesting communities are chosen based on the number of organizations and the raw count of publications, the total publications presented on the bottom panel of figure 8 are fractional counts, to better present the share of collaborative works. We separate Berlin and Germany (DEU) to provide a better comparison of intra- and inter-regional collaborations.
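The two-mode structure underlying this section, and the extraction of the giant component from it, can be sketched in a few lines. This is a minimal illustration with hypothetical publication and organization identifiers, using only the Python standard library. It covers the construction of the bipartite network and its connected components, not the bipartite community detection step or its resolution parameters, which are described in the Data and methods section.

```python
from collections import defaultdict, deque

# Hypothetical publications and the organizations that co-authored them.
pub_orgs = {
    "p1": ["FU", "HU", "CH"],
    "p2": ["HU", "CH"],
    "p3": ["TU", "org_x"],
    "p4": ["org_x", "FU"],
    "p5": ["org_y", "org_z"],  # not connected to the rest
}


def build_bipartite(pubs):
    """Adjacency of a two-mode graph: publications on one side, organizations on the other."""
    adj = defaultdict(set)
    for pub, orgs in pubs.items():
        for org in orgs:
            adj[("pub", pub)].add(("org", org))
            adj[("org", org)].add(("pub", pub))
    return adj


def connected_components(adj):
    """Breadth-first search over the bipartite adjacency, one component per unseen start node."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        queue, comp = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(comp)
    return components


adj = build_bipartite(pub_orgs)
components = connected_components(adj)
giant = max(components, key=len)  # largest connected component
giant_orgs = sorted(name for kind, name in giant if kind == "org")
print(len(components), giant_orgs)
```

Keeping publications as nodes means that two organizations are linked only through the specific papers they share, so a paper with many organizations does not by itself create a dense clique among them.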
[Figure 7 plot: X axis = Number of Organizations, Y axis = Number of Publications (both log scale); panels: Aggregate, AS, ET, NS, MHS, H, SS; one dot per community, with labels on the largest and most prolific communities.]
Figure 7: Organizations in communities of the giant component vs publication (label: largest and most prolific communities, color and shape: green triangle includes BUA member(s), X and Y on log scale)

Table 4: Composition of the largest and most prolific communities of the giant component by region and sector in ROR (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, No region, Oceania
Sector columns: Archive, Company, Education, Facility, Government, Healthcare, No sector, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 43 | 104,247 | 6 34 3 | 1 28 5 1 8
1 | 15 | 36,918 | 6 6 3 | 8 6 1
2 | 38 | 9,862 | 8 6 1 6 17 | 24 11 2 1
3 | 41 | 6,074 | 5 6 1 7 22 | 4 20 11 2 2 2
6 | 31 | 3,193 | 3 2 1 24 1 | 1 19 11
14 | 47 | 1,447 | 1 6 26 3 3 6 2 | 2 30 8 1 1 4 1
25 | 34 | 686 | 1 13 2 1 5 12 | 1 2 20 7 2 1 1
30 | 43 | 497 | 4 7 16 1 5 9 1 | 1 19 13 3 1 3 3
47 | 49 | 624 | 2 43 4 | 7 30 9 1 1 1
55 | 34 | 424 | 23 3 8 | 5 5 12 2 5 5
57 | 38 | 398 | 5 15 1 17 | 1 3 16 12 3 2 1
58 | 34 | 519 | 4 1 3 26 | 1 4 8 8 3 9 1
74 | 37 | 356 | 3 8 1 1 7 17 | 3 10 7 4 7 5 1
89 | 33 | 322 | 18 3 1 10 1 | 2 9 3 4 11 2 2
103 | 46 | 317 | 3 1 42 | 1 11 15 2 12 2 3
177 | 36 | 170 | 5 6 2 1 19 1 2 | 1 3 4 12 4 1 5 6
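The selection rule behind Table 4 (more than 30 organizations or more than 25,000 publications) can be written as a simple filter. In the sketch below the first 16 tuples are the N and P values of Table 4; the last two communities (ids 900 and 901) are hypothetical and are added only to show what the rule excludes.

```python
# (community id, N organizations, P publications).
# The first 16 rows are the N and P columns of Table 4; ids 900 and 901 are hypothetical.
communities = [
    (0, 43, 104_247), (1, 15, 36_918), (2, 38, 9_862), (3, 41, 6_074),
    (6, 31, 3_193), (14, 47, 1_447), (25, 34, 686), (30, 43, 497),
    (47, 49, 624), (55, 34, 424), (57, 38, 398), (58, 34, 519),
    (74, 37, 356), (89, 33, 322), (103, 46, 317), (177, 36, 170),
    (900, 12, 240), (901, 5, 1_800),
]


def select(rows, min_orgs=30, min_pubs=25_000):
    """Keep communities that are large (N > min_orgs) or prolific (P > min_pubs)."""
    return [cid for cid, n_orgs, n_pubs in rows if n_orgs > min_orgs or n_pubs > min_pubs]


selected = select(communities)
print(len(selected), selected)
```

Community 1 illustrates why the rule uses "or": with 15 organizations it is below the size threshold but is retained because of its publication count.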
Selected communities in ROR are 16 in total (see the labels on the central stacked bar, i.e., 0, 1, 2, 3, 6, 14, 25, 30, 47, 55, 57, 58, 74, 89, 103 and 177). On the one hand, the top and bottom panels of figure 8 present the regional composition of the communities (see left stacked bars with region names) and the sector of the organizations in these communities (see right stacked bars with sector names). On the other hand, the top panel shows the size of these communities in terms of number of organizations (see the width of ribbons outgoing from each stacked bar) and the bottom panel shows the level of productivity in terms of the total number of publications by organizations in each community (see the width of ribbons outgoing from each stacked bar).

At the aggregate level in ROR, Europe (excluding Berlin and Germany) has the highest share of organizations (see length of left stacked bar on top panel), followed by Americas, Asia and Germany (excluding Berlin). However, in productivity, Berlin and Germany overtake the other regions. It is clear that communities 0 and 1 are more prolific than other communities (see the length of the central stacked bar in the bottom panel of figure 8), despite their smaller sizes. Although these 16 communities are composed of an international mixture of organizations, the most prolific communities (i.e., 0 and 1) are dominated by Berlin, German and other European organizations (compare the multiple ribbons going into each community in the central stacked bar on the top panel with how the most prolific ones in the bottom panel are dominated by few ribbons and the highest length of stacked bars). Furthermore, these two communities include all four BUA members (FU, CH and HU are in community 0 and TU is in community 1). Organizations from other countries (even those with a comparable number of organizations) populate the smaller, less prolific communities. While education and facility have the largest share of organizations and productivity, these communities present a diverse sectoral composition and are not dominated by any specific sector. Nevertheless, it is interesting that government organizations have representatives in all these selected communities except 6 (see Table 4).

Table 5 presents the communities in Agricultural Sciences consisting of more than 50 organizations, which are 7 communities in total (i.e., 0, 1, 2, 3, 4, 5 and 6; these communities can be noted on the sub-panel of figure 7 dedicated to AS). Similar to the composition observed at the aggregate level (i.e., ROR), these communities are composed of an internationalized mixture of organizations. But, different from what we observed at the aggregate level (i.e., domination of Berlin, Germany and Europe in the most prolific communities), here we observe that highly prolific communities are composed of a mixture of multiple regions. Considering these internationalized collaborations and the fact that BUA members are located in communities 0 (HU and FU), 1 (TU) and 2 (CH), we can conclude that each of these highly prolific organizations has its own specific group of international collaborators in AS. This is in line with the self-representations of BUA members of their research profiles. Community 0 is the largest and most prolific one in AS. While communities 1 and 2 have comparable productivity rates, the size of community 2 is about twice that of community 1. We cannot observe a dominating pattern of sectors in the composition or productivity of the communities in AS; education, facility and government present relatively similar productivity levels, which could be an attribute of AS.
It is interesting to see the high density of health-care organizations in community 2, where Charité is located.

[Figure 8 plot: two panels of stacked bars linked by ribbons, connecting Region (Unknown, Oceania, Europe, DEU, Berlin, Asia, Americas, Africa), Community (177, 103, 89, 74, 58, 57, 55, 47, 30, 25, 14, 6, 3, 2, 1, 0) and Sector (Other, Nonprofit, Healthcare, Government, Facility, Education, Company, Archive, Unknown); top: organization count in region vs communities vs sector (Number of Organizations); bottom: fractional publication count in region vs communities vs sector (Number of Publications).]
Figure 8: Organizations by regions vs membership in communities of the giant component aggregated in ROR (Top: size of communities, Bottom: productivity)

Table 5: Composition of the largest and most prolific communities of the giant component by region and sector in AS (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, No region, Oceania
Sector columns: Archive, Company, Education, Facility, Government, Healthcare, No sector, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 296 | 8,386 | 32 68 22 12 78 76 8 | 20 3 187 49 18 7 1 10 1
1 | 127 | 1,464 | 9 18 28 11 27 30 2 2 | 5 81 26 6 2 2 5
2 | 210 | 1,498 | 6 54 35 14 31 65 5 | 10 99 41 13 39 1 7
3 | 114 | 538 | 3 22 12 5 11 56 5 | 2 5 51 32 6 5 9 4
4 | 71 | 238 | 6 9 4 3 42 7 | 2 2 37 13 7 4 3 3
5 | 53 | 237 | 13 13 3 6 18 | 5 29 13 2 1 2 1
6 | 53 | 168 | 4 8 2 2 5 32 | 2 23 12 9 3 2 2

Table 6: Composition of the largest and most prolific communities of the giant component by region and sector in ET (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, No region, Oceania
Sector columns: Company, Education, Facility, Government, Healthcare, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 7 | 15,116 | 5 2 | 2 5
1 | 11 | 12,457 | 4 6 1 | 5 2 1 3
2 | 23 | 4,092 | 2 6 1 4 9 1 | 9 9 3 1 1
4 | 30 | 2,059 | 1 7 1 2 19 | 1 18 6 2 1 2
15 | 25 | 446 | 2 3 1 19 | 1 10 11 1 1 1
20 | 31 | 294 | 2 2 3 23 1 | 2 13 7 1 2 4 2
24 | 19 | 163 | 1 5 1 1 11 | 2 10 7
40 | 19 | 158 | 11 1 7 | 5 8 6
41 | 23 | 147 | 13 1 2 2 5 | 2 10 5 1 2 3
50 | 20 | 154 | 16 4 | 2 15 2 1
56 | 19 | 158 | 2 3 13 1 | 1 8 6 1 1 2
70 | 24 | 129 | 4 5 14 1 | 3 14 4 3
94 | 19 | 106 | 12 2 1 4 | 15 2 1 1

Table 6 presents the communities in Engineering Technology consisting of more than 18 organizations or those with more than 10,000 publications. These are 13 communities in total (i.e., 0, 1, 2, 4, 15, 20, 24, 40, 41, 50, 56, 70 and 94). On the one hand, Europe, Americas, Asia and Germany seem to have the highest share of organizations populating these communities. On the other hand, education, facility and companies dominate the sectoral composition. The most prolific communities (i.e., 0 and 1, both with more than 12,000 publications) are dominated by Berlin and German organizations, and community 1 has only one member from Europe. Community 0 includes TU from the BUA members. It can signal the specialization of TU in the Engineering Technology discipline and a locally oriented structure of collaborations, which is in line with what Hoekman et al. (2010) reported about ET in other European countries. Community 1, composed of 11 organizations from Berlin, Germany and Europe, is highly prolific (more than 12,000 publications) and includes FU, HU and CH from the BUA members, which signals the closer collaboration ties between these three members. However, community 2 is an interesting case here. It is rather small in size (23 organizations), has no BUA members, is composed of an international group of organizations and is relatively prolific (the third most prolific, with slightly more than 4,000 publications).

Table 7 presents the communities in Natural Sciences consisting of more than 25 organizations or those with more than 20,000 publications.
These criteria are satisfied only by five communities (i.e., 0, 1, 4, 18 and 71). Community 0 is the most prolific one (58,586 publications) and includes only 32 members, which are all from Berlin and Germany except one member from Europe. The second most prolific community, with more than 24,000 publications, is community 1, with seven members which are all from Berlin and Germany except one organization from Europe. Community 71 is the largest one, with 45 members from Europe (education, facility, government, health-care, non-profit and other), Americas (education, facility and health-care sectors), Asia (education and facility), Africa (facility) and Oceania (facility), without any members from Berlin or Germany. Note that these communities are detected based on denser areas of the giant component and they do not necessarily need to include Berlin organizations. However, while being the largest community, it is not prolific compared to other NS communities (553 publications). Community 18 is another interesting case, which is highly international with 26 members and 968 publications. Africa is only present in community 71 and Oceania is only present in communities 18 and 71, i.e., the communities with the lowest productivity levels. NS shows a highly prolific community (i.e., 0, 32 members, 3 BUA members, i.e., HU, FU and CH) and a small but still prolific community (i.e., 1, 7 members, includes TU from the BUA members) with BUA members. This can signal a divide between these two groups of BUA members, which are collaborating within Berlin, Germany and Europe but with less overlapping collaboration ties; this divide can be bridged and collaboration fostered through future cooperation.

Table 8 presents the communities in Medical and Health Sciences with more than 25 organizations. These are six communities (i.e., 0, 8, 29, 37, 38 and 51).
MHS presents a highly interesting case, where community 0 is the most prolific (55,480 publications) and is populated by only 37 organizations from Berlin and Germany (education, facility and health-care in both Germany and Berlin, plus one government organization from Berlin and one company from Germany, i.e., excluding Berlin). Community 0 includes all four BUA members and signals a high level of intraregional collaboration in MHS. This shows that the strategic cooperation between FU and HU to establish a shared MHS faculty in Charité has paid off, and they have been successful in integrating TU, three other organizations from Berlin and 30 organizations from Germany. All other communities, whether international or regional, have much lower productivity levels compared to community 0.

Table 7: Composition of the largest and most prolific communities of the giant component by region and sector in NS (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, Oceania
Sector columns: Company, Education, Facility, Government, Healthcare, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 32 | 58,586 | 6 25 1 | 1 20 4 1 6
1 | 7 | 24,045 | 4 2 1 | 2 4 1
4 | 27 | 4,869 | 2 6 1 2 16 | 2 17 6 1 1
18 | 26 | 968 | 2 16 3 1 3 1 | 20 1 1 2 2
71 | 45 | 553 | 1 3 2 38 1 | 9 19 3 6 4 4

Table 8: Composition of the largest and most prolific communities of the giant component by region and sector in MHS (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, No region
Sector columns: Archive, Company, Education, Facility, Government, Healthcare, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 37 | 55,480 | 7 30 | 1 23 5 1 7
8 | 26 | 529 | 2 18 2 4 | 2 7 4 12 1
29 | 34 | 251 | 1 1 32 | 12 9 2 9 2
37 | 28 | 234 | 4 1 1 21 1 | 12 1 2 11 1 1
38 | 35 | 180 | 5 28 2 | 3 24 5 1 2
51 | 29 | 192 | 16 4 9 | 1 2 10 2 1 8 2 3

Table 9 presents the communities in Humanities consisting of more than 30 organizations. These are six communities (i.e., 0, 1, 2, 3, 4 and 5). Community 0 is composed of a highly international group of organizations from all regions. Other communities in Humanities are comparably international; however, they are less prolific than community 0. In line with our expectation, Humanities scholars are mainly affiliated with organizations in the education sector, but a highly inter-sectoral composition is observed as well. BUA members are located in the three most prolific communities, i.e., 0 (two BUA members, HU and CH), 1 (FU) and 2 (TU), which signals a rather non-overlapping structure of collaboration consisting of a highly internationalized mixture of organizations. It is in line with the high focus of BUA members on distinctive areas in H mentioned in their self-representations in the BUA proposal (Berlin University Alliance, 2018, 2019) and can be a remnant of the era when BUA members needed to have mutually exclusive areas of focus and identities. Note that, as presented in figure 3, Humanities is not highly internationalized and more than 75% of all publications are single country ones (i.e., intra-Germany co-authorship).
Nevertheless, Humanities so far presents the closest image to the one observed in Agricultural Sciences, where Berlin, German and European organizations do not dominate the prolific communities and we observe an internationalized mixture of organizations from multiple sectors.

Table 9: Composition of the largest and most prolific communities of the giant component by region and sector in H (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, No region, Oceania
Sector columns: Archive, Company, Education, Facility, Government, Healthcare, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 343 | 3,319 | 11 92 23 20 60 120 3 14 | 1 4 260 43 13 14 4 4
1 | 130 | 1,294 | 2 28 20 5 24 46 1 4 | 1 2 85 22 12 4 3 1
2 | 74 | 549 | 1 31 5 4 12 20 1 | 2 1 52 9 1 2 5 2
3 | 42 | 411 | 1 17 1 6 1 16 | 2 1 25 5 1 6 2
4 | 53 | 206 | 3 11 3 4 32 | 3 26 20 4
5 | 43 | 169 | 4 2 4 10 22 1 | 5 1 19 10 1 2 4 1

Table 10: Composition of the largest and most prolific communities of the giant component by region and sector in SS (N = community size, P = aggregate publications)

Region columns: Africa, Americas, Asia, Berlin, DEU, Europe, No region
Sector columns: Company, Education, Facility, Government, Healthcare, Nonprofit, Other
(Counts are listed in column order; empty cells are not shown.)

cluster | N | P | Region counts | Sector counts
0 | 22 | 10,895 | 3 17 2 | 1 14 2 5
1 | 8 | 2,046 | 4 3 1 | 3 4 1
2 | 19 | 1,733 | 6 1 4 7 1 | 17 2
6 | 29 | 291 | 1 4 4 1 1 18 | 1 16 9 1 1 1
8 | 31 | 353 | 18 2 11 | 2 24 2 2 1
47 | 24 | 150 | 1 3 2 2 16 | 1 16 2 4 1

Table 10 presents the communities in Social Sciences consisting of more than 20 organizations or those with more than 1,500 publications. These are six communities in total (i.e., 0, 1, 2, 6, 8 and 47). Social Sciences present an image far from the one observed in the case of Humanities and closer to natural and hard sciences. The two most prolific communities (i.e., 0 and 1) are completely dominated by Berlin, German and European organizations. Three of the BUA members are located in community 0 (FU, HU and CH) and one in community 1 (TU). Since FU and HU have formed a closer collaboration with CH since 2003, it would be interesting to look further into the topics of focus in their research to investigate whether these separate communities, which are all active in SS, study different subjects. Both these communities collaborate preferably within Berlin and Germany and they have only three members from Europe. Community 0 with 22 members is the most prolific (10,895 publications). Community 2 is an interesting case, with one organization from Berlin (which is not a BUA member), six organizations from Americas, four from Germany and seven from Europe; in relative terms, it has a high productivity. It can signal an even higher disciplinary divide in Humanities in the Berlin region. However, there are smaller, more international communities which are not highly prolific, as was the case in ET, NS and MHS. Americas have the largest share of organizations after Europe and are highly represented in community 8, which is composed only of organizations from the Americas, two organizations from Berlin and 11 organizations from Europe (excluding Germany).
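The regional composition reported in Tables 4 to 10 separates Berlin and the rest of Germany from the other European countries. A minimal sketch of such a tally is given below. The organization records are hypothetical, and identifying Berlin organizations by a city field is an assumption made for illustration; the paper does not state how metropolitan membership was encoded.

```python
from collections import Counter, defaultdict

# Hypothetical organization records: (community id, country code, city, continent).
orgs = [
    (0, "DEU", "Berlin", "Europe"),
    (0, "DEU", "Potsdam", "Europe"),
    (0, "FRA", "Paris", "Europe"),
    (1, "DEU", "Berlin", "Europe"),
    (1, "USA", "Boston", "Americas"),
]


def region_label(country, city, continent):
    """Split Berlin and the rest of Germany (DEU) off from Europe; keep other continents as is."""
    if country == "DEU":
        return "Berlin" if city == "Berlin" else "DEU"
    return continent


# Number of organizations per region label within each community.
composition = defaultdict(Counter)
for community, country, city, continent in orgs:
    composition[community][region_label(country, city, continent)] += 1

for community in sorted(composition):
    print(community, dict(composition[community]))
```

With this labelling the region counts of a community always add up to its size N, which is the property the composition tables rely on.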
In this paper, we provide a quantitative, exploratory and macro view of the structure of scientific collaborations in the Berlin metropolitan region. Our main level of analysis was scientific organizations (which can be academic or non-academic organizations or firms) and we investigated the share of collaborative work and of internationalized work versus single country collaborations. We covered all OECD scientific disciplines and presented a comparative view of the similarities and differences of collaborations in these disciplines.

In methodological terms, we developed two organization name disambiguation techniques (i.e., PyString and Fuzzy matching) and compared their performance and coverage with an established technique (i.e., Research Organization Registry (ROR)). We presented the high impact that organization name disambiguation could have on the constructed collaboration networks and how it can bias measures and trends. We had to exclude the 51% of publications which had one or more non-disambiguated organizations, to limit our analysis to successfully disambiguated cases.

At first view and only based on descriptive analysis, we observed a highly collaborative scientific landscape. Some disciplines present a high degree of difference between the raw and fractional count of publications. Despite the prevalence of collaborative works and the increasing trend towards internationalization in the aggregate view, we observed that some disciplines (e.g., Natural Sciences and Agricultural Sciences) have more internationalized collaborations while other disciplines (e.g., Medical and Health Sciences, Humanities and Social Sciences) are less internationalized or did not present a steady upward trend, which is in line with the observations of Moed et al. (1991) and Babchuk et al. (1999).
But our further investigation showed that these disciplines have a more complex structure of collaboration, which in some cases is highly dominated by local (e.g., Berlin and Germany) organizations and in some cases is already Europeanized. In rare cases (e.g., Agricultural Sciences and Humanities) we observed a higher representation of international organizations among the most prolific communities, despite the smaller size of these disciplines and, in the case of Humanities, a less than 25% share of internationalization. Overall, we observed a high degree of collaboration between Berlin organizations and other European countries, which is in line with the trend observed by Hoekman et al. (2010); however, we observed that the image is more complicated once we look at cohesive subgroups and groupings.

In geographical terms, different disciplines present various collaboration trends; however, all of them have high degrees of regional and European collaboration. Berlin presents a specific case. It is similar to a science hub with a diverse sectoral composition of organizations, which is in line with the observation of Balland et al. (2020) in metropolitan regions in the USA and that of Rammer et al. (2020) for the Berlin metropolitan region. However, it can be due to our data gathering strategy, where only publications with at least one organization located in Berlin are included. Thus there could be other collaborations between the partners excluding Berlin organizations that we do not cover here. North America is the preferred collaboration partner outside of Europe. But this image changes in some disciplines, where the shares of collaborators from East Asia and Pacific and from North America are closer; their shares are closest in Engineering Technology. Intra-Germany (i.e., single country) collaborations dominate the co-authorships. The USA and the UK are central members of the collaboration landscape in most disciplines based on the aggregate number of publications.
But France, China and Russia join the top five percent of most prolific collaborators in Engineering Technology and Natural Sciences.

We provided a sectoral view of the geographical distribution of organizations worldwide and in Berlin. Some countries present a highly diverse science system consisting of a wide range of sectors among those collaborating with Berlin organizations. However, in most countries, education is the prevailing sector where scientific publications are produced, which is not counterintuitive. China and India have representatives in many sectors.

We modeled the scientific collaborations through bipartite co-authorship networks, which treat each scientific publication as an event where organizations interact in producing scientific texts. Our bipartite community detection configuration was helpful in detecting the diverse composition of organizational teams contributing to scientific publications, which could be overlooked if the network were projected to one mode, because projection produces artificially high cliquishness. We extracted the largest connected components at the aggregate level and for each scientific discipline. We then looked at denser collaborating groups. We observed that in most disciplines, with the exception of Humanities and Agricultural Sciences, the most prolific communities were composed of organizations located in the Berlin metropolitan region and they were collaborating either within Berlin or with other German organizations or exclusively with European organizations. There were of course internationalized communities in all disciplines, but they were not highly prolific.

Disciplines present interesting differences in the composition of communities (i.e., denser collaborating groups). Not all communities were composed of highly prolific organizations. However, in some cases we observed that all highly prolific organizations were members of one or two communities. We observed a highly regionally oriented Medical and Health Sciences, nationally and Europe-oriented Natural Sciences, Engineering Technology and Social Sciences, and internationally oriented Agricultural Sciences and Humanities. This can be due to the restrictive criteria we set in the further investigation of the highly prolific or the largest communities, or it can signal strategic regional coalitions.

Looking at the members of the Berlin University Alliance (BUA) and their position in these cohesive sub-groups presented interesting findings. In Agricultural Sciences and Humanities, which were the disciplines with lower aggregate internationalization (less than 25%), we observed higher rates of internationalized collaboration once the structure of communities was analyzed, and there was a larger divide between BUA members. They formed denser collaboration ties within three different communities. This is in line with the self-representations of the research profiles of the BUA members (Berlin University Alliance, 2018, 2019). Only in Medical and Health Sciences, which was dominated by the high productivity of Berlin and Germany, were the four BUA members located in one community and collaborating densely.
This is highly affected by the fact that in 2003 HU and FU jointly established an MHS faculty located in Charité, and our findings show that this strategic coalition has been successful in integrating other organizations from Berlin and TU. Furthermore, some of the observed disciplinary divide between BUA members could be a remnant of the east-west division in Germany and of the reorganization of research profiles and the mutually exclusive definition of areas of focus, intended to reduce parallel work and competition, that happened after the reunification. This divide was recently observed in the biotechnology field by Abbasiharofteh & Broekel (2020). TU presents a specific case. While the other three BUA members formed different collaboration structures based on disciplinary specialization, TU is in most cases a member of a separate community of its own. In ET, TU’s collaboration network is dominated by other Berlin and German organizations. This shows that BUA needs to develop further strategic cooperation among its members to ensure a higher integration, similar to the case of MHS. However, this might be due to the fact that we included conference proceedings, a publication type preferred more by technical universities. Since TU is the main technical university in our sample, the collaboration structure reflected in this document type might have affected the observed results and overinflated the divide between TU and other BUA members. Charité, on the other hand, seems to have built a large collaboration network worldwide, which is evident in the health-care sector and in the high representation of national and international organizations in the communities where Charité was located. BUA can plan to benchmark this as a case of successful internationalization of collaborations.

We conclude that mixing a macro and global view while keeping regional, national and continental granularity can help in describing observed quantitative trends.
It is necessary to move beyond the macro descriptive view based on yearly counts of publications or increasing trends of team science. In addition, investigating the self-representation of research profiles of scientific organizations was helpful in interpreting the observed trends. As our investigation showed, not all members of the community are moving towards internationalization, and some parts of the community, normally the highly prolific ones, prevail and distort the aggregate picture. This methodological approach is close to what Stadtfeld (2018) suggests in moving between micro, meso and macro levels to better explain the observed trends.

Our paper suffers from certain limitations. When we construct the co-authorship network at the organization level, we naturally overlook the changes that happen in the composition of researchers affiliated to those organizations. The same organization could have a highly different composition of members over time, which affects the type of research carried out and the collaboration ties formed.

We only use Scopus as the main database, and although it covers German-language publications, it is dominated by English-language records. Bibliometric databases, including Scopus, are regularly updated, which can affect the temporal trends we observe here. In addition, each bibliometric database covers a specific set of scientific publications (see Stahlschmidt et al. (2019) for a comparison between WOS and Scopus); despite the similarities, there are differences in philosophies and approaches to what should be indexed. Furthermore, we were unable to disambiguate all the organizations in our sample, which led to excluding 51% of the publications, namely those with one or more non-disambiguated organizations. Thus, while our results seem to follow the general trends observed in the German scientific system (see Stahlschmidt et al. (2019), Stephen et al. (2020) and Aman (2016)), the specificities observed in the structure of scientific collaborations among BUA members and in the international collaborations could be highly affected if we were able to improve the coverage of the disambiguation techniques.

Another limitation of our data and research at the organization level is the superstar researchers with multiple affiliations. We assume that these researchers have received resources from each of these multiple organizations. Thus, we consider these researchers as bridges between these organizations. However, in the networks constructed, these cases might appear as international collaborations when a single author is affiliated to multiple countries. High-quality data with disambiguated publication records at the author level would allow a more complete investigation. Another limitation of our study could be that our disambiguation techniques penalize non-English-speaking countries or less known organizations, which are usually less prolific. They can be more represented among the organizations that we excluded from our analysis, since the disambiguation did not give reliable results for them. In addition, different disambiguation techniques are effective in identifying differing sets of organizations, and any choice would have implications for a subset of the organizations while penalizing another subset.

We do not have any insight into the background of individual researchers affiliated to these scientific organizations. We do not know about the motivations that drive the scientific collaboration and observed trends (Subramanyam, 1983; Katz & Martin, 1997). As an example, in all disciplines we observed small communities which were leaning more towards internationalized collaborations. These might be groups mainly consisting of migrant scientists who collaborate with their former scientific organizations, or who play a “boundary spanning role” among regional, national and continental contexts.
We cannot investigate these types of questions at the organization level.

Furthermore, our definition of the Berlin metropolitan region was based on the affiliation addresses, while the literature on science geography presents a diverse array of definitions (e.g., Cottineau, Finance, Hatna, Arcaute, & Batty, 2019; Abbasiharofteh & Broekel, 2020), from NUTS level to areas covering multiple cities, which are overlooked in our data gathering strategy.
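As an illustration of the modeling choice discussed above, the following minimal Python sketch contrasts a two-mode (publication-organization) network with its one-mode projection on a toy example. The organization names, publication identifiers and helper functions are hypothetical and are not taken from the data or the software used in this study; community detection itself is not reproduced here. The sketch only shows why a single publication with k organizations becomes a clique of k(k-1)/2 ties once projected, and how a largest connected component can be extracted from the bipartite structure.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy data: publication id -> organizations in its affiliations.
publications = {
    "p1": {"Uni_A", "Uni_B", "Hospital_C"},
    "p2": {"Uni_B", "Uni_D"},
    "p3": {"Hospital_C", "Inst_E", "Inst_F", "Inst_G", "Inst_H"},
    "p4": {"Firm_X", "Firm_Y"},  # not connected to the other publications
}


def build_bipartite(pubs):
    """Adjacency of the two-mode graph: publication nodes <-> organization nodes."""
    adj = defaultdict(set)
    for pub, orgs in pubs.items():
        for org in orgs:
            adj[("pub", pub)].add(("org", org))
            adj[("org", org)].add(("pub", pub))
    return adj


def project_to_organizations(pubs):
    """One-mode projection: tie two organizations if they share a publication."""
    edges = set()
    for orgs in pubs.values():
        for a, b in combinations(sorted(orgs), 2):
            edges.add((a, b))
    return edges


def largest_connected_component(adj):
    """Breadth-first search over the bipartite graph; returns the largest node set."""
    seen, best = set(), set()
    for start in list(adj):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


adj = build_bipartite(publications)
# Each affiliation is exactly one publication-organization tie.
bipartite_edges = sum(len(orgs) for orgs in publications.values())
projected_edges = project_to_organizations(publications)
# Ties created in the projection by the five-organization publication alone.
p3_clique = project_to_organizations({"p3": publications["p3"]})
lcc = largest_connected_component(adj)
lcc_orgs = sorted(name for kind, name in lcc if kind == "org")

print("bipartite ties:", bipartite_edges)
print("projected ties:", len(projected_edges))
print("ties from p3 alone:", len(p3_clique))
print("organizations in largest component:", lcc_orgs)
```

In this toy example the five-organization publication alone accounts for 10 of the 15 projected ties, which is the kind of artificial cliquishness that a bipartite representation avoids.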
Acknowledgements
We would like to thank Sybille Hinze, Martin Reinhart and Paul Donner for comments and suggestions on earlier versions of this paper. Caveats are our responsibility.
This research was done in the DEKiF project supported by the Federal Ministry for Education and Research (BMBF), Germany, with grant number M527600. Data were obtained from the Kompetenzzentrum Bibliometrie (Competence Center for Bibliometrics), Germany, which is funded by BMBF with grant number 01PQ17001.
Data cannot be made publicly available due to the licensing and contract terms of the original data.
References
Abbasiharofteh, M., & Broekel, T. (2020). Still in the shadow of the wall? The case of the Berlin biotechnology cluster. Environment and Planning A: Economy and Space. https://doi.org/10.1177/0308518X20933904
Akbaritabar, A., Casnici, N., & Squazzoni, F. (2018). The conundrum of research productivity: A study on sociologists in Italy. Scientometrics, (3), 859–882. https://doi.org/10.1007/s11192-017-2606-5
Akbaritabar, A., & Squazzoni, F. (2020). Gender Patterns of Publication in Top Sociological Journals. Science, Technology, & Human Values. https://doi.org/10.1177/0162243920941588
Akbaritabar, A., Traag, V. A., Caimo, A., & Squazzoni, F. (2020). Italian sociologists: A community of disconnected groups. Scientometrics. https://doi.org/10.1007/s11192-020-03555-w
Aman, V. (2016). How collaboration impacts citation flows within the German science system. Scientometrics, (3), 2195–2216. https://doi.org/10.1007/s11192-016-2092-1
Aman, V. (2018). Does the Scopus author ID suffice to track scientific international mobility? A case study based on Leibniz laureates. Scientometrics, (2), 705–720. https://doi.org/10.1007/s11192-018-2895-3
Araújo, E. B., Araújo, N. A. M., Moreira, A. A., Herrmann, H. J., & Andrade, J. S. (2017). Gender differences in scientific collaborations: Women are more egalitarian than men. PLOS ONE, (5), e0176791. https://doi.org/10.1371/journal.pone.0176791
Avdeev, S. (2019). International Collaboration In Higher Education Research: A Gravity Model Approach. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3505886
Babchuk, N., Keith, B., & Peters, G. (1999). Collaboration in sociology and other scientific disciplines: A comparative trend analysis of scholarship in the social, physical, and mathematical sciences. The American Sociologist, (3), 5–21. https://doi.org/10.1007/s12108-999-1007-5
Balland, P.-A., Jara-Figueroa, C., Petralia, S. G., Steijn, M. P. A., Rigby, D. L., & Hidalgo, C. A. (2020). Complex economic activities concentrate in large cities. Nature Human Behaviour, (3), 248–254. https://doi.org/10.1038/s41562-019-0803-3
Berlin University Alliance. (2018, February). Gemeinsam im Verbund (Together as a group).
Berlin University Alliance. (2019). Berlin University Alliance Proposal Crossing Boundaries toward an Integrated Research Environment. Berlin: Berlin University Alliance.
Biancani, S., & McFarland, D. A. (2013). Social networks research in higher education. In Higher education: Handbook of theory and research (pp. 151–215). Springer.
Blume, S., Bunders, J., Leydesdorff, L., & Whitley, R. (Eds.). (1987). The Social Direction of the Public Sciences. Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-009-3755-0
Breiger, R. L. (1974). The Duality of Persons and Groups. Social Forces, (2), 181–190. https://doi.org/10.1093/sf/53.2.181
Cottineau, C., Finance, O., Hatna, E., Arcaute, E., & Batty, M. (2019). Defining urban agglomerations to detect agglomeration economies. Environment and Planning B: Urban Analytics and City Science, (9), 1611–1626. https://doi.org/10.1177/2399808318755146
D’Angelo, C. A., & van Eck, N. J. (2020). Collecting large-scale publication data at the level of individual researchers: A practical proposal for author name disambiguation. Scientometrics, (2), 883–907. https://doi.org/10.1007/s11192-020-03410-y
De Stefano, D., Fuccella, V., Vitale, M. P., & Zaccarin, S. (2013). The use of different data sources in the analysis of co-authorship networks and scientific performance. Social Networks, (3), 370–381. https://doi.org/10.1016/j.socnet.2013.04.004
Donner, P., Rimmert, C., & van Eck, N. J. (2019). Comparing institutional-level bibliometric research performance indicator values based on different affiliation disambiguation systems. Quantitative Science Studies, (1), 150–170. https://doi.org/10.1162/qss_a_00013
Fox, M. F. (1983). Publication Productivity among Scientists: A Critical Review. Social Studies of Science, (2), 285–305. https://doi.org/10.1177/%2F030631283013002005
Glänzel, W., Schubert, A., & Czerwon, H. J. (1999). A bibliometric analysis of international scientific cooperation of the European Union (1985–1995). Scientometrics, (2), 185–202. https://doi.org/10.1007/BF02458432
Hoekman, J., Frenken, K., & Tijssen, R. J. W. (2010). Research collaboration at a distance: Changing spatial patterns of scientific collaboration within Europe. Research Policy, (5), 662–673. https://doi.org/10.1016/j.respol.2010.01.012
Katz, J., & Martin, B. R. (1997). What is research collaboration? Research Policy, (1), 1–18. https://doi.org/10.1016/S0048-7333(96)00917-1
Katz, J. S. (1994). Geographical proximity and scientific collaboration. Scientometrics, (1), 31–43. https://doi.org/10.1007/BF02018100
Laudel, G. (2002). What do we measure by co-authorships? Research Evaluation, (1), 3–15. https://doi.org/10.3152/147154402781776961
Leahey, E. (2016). From Sole Investigator to Team Scientist: Trends in the Practice and Study of Research Collaboration. Annual Review of Sociology, (1), 81–100. https://doi.org/10.1146/annurev-soc-081715-074219
Leone Sciabolazza, V., Vacca, R., Kennelly Okraku, T., & McCarty, C. (2017). Detecting and analyzing research communities in longitudinal scientific networks. PLOS ONE, (8), e0182516. https://doi.org/10.1371/journal.pone.0182516
Luukkonen, T., Persson, O., & Sivertsen, G. (1992). Understanding Patterns of International Scientific Collaboration. Science, Technology, & Human Values, (1), 101–126. https://doi.org/10.1177/%2F016224399201700106
Moed, H. F., De Bruin, R. E., Nederhof, A. J., & Tijssen, R. J. W. (1991). International scientific co-operation and awareness within the European community: Problems and perspectives. Scientometrics, (3), 291–311. https://doi.org/10.1007/BF02093972
Nederhof, A. J. (2006). Bibliometric monitoring of research performance in the Social Sciences and the Humanities: A Review. Scientometrics, (1), 81–100. https://doi.org/10.1007/s11192-006-0007-2
Newman, M. E. J. (2001a). Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, (1), 016132. https://doi.org/10.1103/PhysRevE.64.016132
Newman, M. E. J. (2001b). Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, (1), 016131. https://doi.org/10.1103/PhysRevE.64.016131
Newman, M. E. J. (2004). Detecting community structure in networks. The European Physical Journal B - Condensed Matter, (2), 321–330. https://doi.org/10.1140/epjb/e2004-00124-y
Palla, G., Barabási, A.-L., & Vicsek, T. (2007). Quantifying social group evolution. Nature, (7136), 664–667. https://doi.org/10.1038/nature05670
Rammer, C., Kinne, J., & Blind, K. (2020). Knowledge proximity and firm innovation: A microgeographic analysis for Berlin. Urban Studies, (5), 996–1014. https://doi.org/10.1177/0042098018820241
Reichardt, J., & Bornholdt, S. (2004). Detecting fuzzy community structures in complex networks with a Potts model. Physical Review Letters, (21), 218701. https://doi.org/10.1103/PhysRevLett.93.218701
Rijcke, S. de, Wouters, P. F., Rushforth, A. D., Franssen, T. P., & Hammarfelt, B. (2016). Evaluation practices and effects of indicator use - a literature review. Research Evaluation, (2), 161–169. https://doi.org/10.1093/reseval/rvv038
Rimmert, C. (2018). Institutional disambiguation for further countries - an exploration with extensive use of wikidata (project report). (Report). Bielefeld: Bielefeld University, Institute for Interdisciplinary Studies of Science (I²SoS).
Small, M. L. (2017). Someone to Talk to. Oxford University Press.
Small, M. L., & Adler, L. (2019). The Role of Space in the Formation of Social Ties. Annual Review of Sociology, (1), 111–132. https://doi.org/10.1146/annurev-soc-073018-022707
Smith-Doerr, L., Alegria, S. N., & Sacco, T. (2017). How Diversity Matters in the US Science and Engineering Workforce: A Critical Review Considering Integration in Teams, Fields, and Organizational Contexts. Engaging Science, Technology, and Society, , 139. https://doi.org/10.17351/ests2017.142
Sonnenwald, D. H. (2007). Scientific collaboration. Annual Review of Information Science and Technology, (1), 643–681. https://doi.org/10.1002/aris.2007.1440410121
Stadtfeld, C. (2018). The Micro-Macro Link in Social Networks (SSRN Scholarly Paper No. ID 3211795). Rochester, NY: Social Science Research Network.
Stahlschmidt, S., Stephen, D., & Hinze, S. (2019). Performance and Structures of the German Science System (p. 91). Studien zum deutschen Innovationssystem.
Stephen, D., Stahlschmidt, S., & Hinze, S. (2020). Performance and Structures of the German Science System 2020. Studien zum deutschen Innovationssystem.
Subramanyam, K. (1983). Bibliometric studies of research collaboration: A review. Journal of Information Science, (1), 33–38. https://doi.org/10.1177/016555158300600105
Traag, V. A., Van Dooren, P., & Nesterov, Y. (2011). Narrow scope for resolution-limit-free community detection. Physical Review E, (1), 016114. https://doi.org/10.1103/PhysRevE.84.016114
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: Guaranteeing well-connected communities. Scientific Reports, (1), 5233. https://doi.org/10.1038/s41598-019-41695-z
Wagner, C. S., Park, H. W., & Leydesdorff, L. (2015). The Continuing Growth of Global Cooperation Networks in Research: A Conundrum for National Governments. PLOS ONE, (7), e0131816. https://doi.org/10.1371/journal.pone.0131816
Wuchty, S., Jones, B. F., & Uzzi, B. (2007). The Increasing Dominance of Teams in Production of Knowledge. Science, 316