Measuring national capability over big science's multidisciplinarity: A case study of nuclear fusion research

Abstract

In the era of big science, countries allocate big research and development budgets to large scientific facilities that boost collaboration and research capability. A nuclear fusion device called the "tokamak" is a source of great interest for many countries because it ideally generates sustainable energy expected to solve the energy crisis in the future. Here, to explore the scientific effects of tokamaks, we map a country's research capability in nuclear fusion research with normalized revealed comparative advantage on five topical clusters -- material, plasma, device, diagnostics, and simulation -- detected through a dynamic topic model. Our approach captures not only the growth of China, India, and the Republic of Korea but also the decline of Canada, Japan, Sweden, and the Netherlands. Time points of their rise and fall are related to tokamak operation, highlighting the importance of large facilities in big science. The gravity model points out that two countries collaborate less in device, diagnostics, and plasma research if they have comparative advantages in different topics. This relation is a unique feature of nuclear fusion compared to other science fields. Our results can be used and extended when building national policies for big science.


Hyunuk Kim (1, 2, 3), Inho Hong, and Woo-Sung Jung (1, 4, 5, ∗)

(1) Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang 37673, Republic of Korea
(2) Kellogg School of Management, Northwestern University, Evanston, IL 60208, United States of America
(3) Northwestern Institute on Complex Systems, Evanston, IL 60208, United States of America
(4) Department of Physics, Pohang University of Science and Technology, Pohang 37673, Republic of Korea
(5) Asia Pacific Center for Theoretical Physics, Pohang 37673, Republic of Korea

∗ [email protected]

arXiv [cs.DL]

I. INTRODUCTION

Big science is characterized by its big budgets, manpower, and machines. It includes a number of multidisciplinary fields such as nuclear fusion, particle accelerators, and space science [1]. Most of them originated for military reasons in World War II and were mainly led by superpowers. In recent decades, as these fields have become more demanding, countries have actively collaborated to utilize the resources of others and build shared infrastructure [2–4]. In this sense, compared to little science, big science requires more international collaboration and resource accessibility [5].

A large facility is considered the core resource of big science. From construction to operation, it requires the participation of various stakeholders under the leadership of the national government, resulting in economic spillovers to society [6–8]. A large facility also stimulates scientific advancements by supporting research activities that are hard to conduct in a laboratory. It attracts researchers of diverse disciplines and enhances scientific collaborations. Despite its scientific importance, little attention has been paid to examining how large facilities raise national research capacities because of difficulties in unraveling the multidisciplinarity of big science [9–11]. Moreover, national research capacity is difficult to quantify as it is built on the complex interactions between private and public domains [12, 13]. Depending on science and technology policies, countries have different goals, such as training experts, publishing papers, or granting patents, that constitute the national research capacity [14, 15].

Among many aspects of the national research capacity, this study focuses on academic publishing to estimate the capacity quantitatively [16–22]. We term this estimate “research capability” and obtain it by applying topic modeling and revealed comparative advantage to the bibliographic information of research papers. The dynamic topic model [23, 24] first detects subject fields from paper abstracts and distributes publication counts over the detected fields in real values. Normalized revealed comparative advantage (NRCA) [25] is applied to fractional publication counts for projecting a country's research capability as well as its changes by facility construction. Based on NRCA, we measure how similar two countries' research capabilities are and include the distance in a gravity model to show its impact on international collaboration.

For a case study, we investigate nuclear fusion, in which the construction of large facilities and international collaborations are crucial. Nuclear fusion is a field that countries have interest in as it produces clean, affordable, and sustainable energy [26, 27]. The history of nuclear fusion consists of the footprints of major successes in tokamaks [28]. After the nuclear fusion reaction of hydrogen was identified as the source of solar energy in the 1920s [29], scientists began to study controlled thermonuclear fusion for sustainable energy production in the 1950s [30]. The tokamak is a device that magnetically confines high-temperature plasmas essential for steady thermonuclear reactions [31], and it is now the most dominant and actively studied device for nuclear fusion research [32]. Tokamaks are composed of strong magnets for confining plasmas, several wall components in a vacuum vessel for protection, heating devices, and diagnostic devices, which require knowledge across diverse fields: plasma physics, numerical simulations, diagnostics, material science, and engineering [31]. The performance of tokamaks scales positively with size, so tokamaks have become larger, better, and more expensive [33–36]. The large budgets for tokamaks have increased international collaborations since the 1990s, as seen in the cases of JET (Joint European Torus) [37] and ITER (International Thermonuclear Experimental Reactor) construction [34].

Our approach successfully captures various aspects of nuclear fusion from a bibliographic database over 40 years, 1976–2016. The dynamic topic model disentangles multidisciplinarity and classifies 41 topics grouped into five topical clusters: material, plasma, device, diagnostics, and simulation. Furthermore, the revealed comparative advantage identifies leading countries that participate in international projects or have their own tokamak. The rise and fall of these countries match well with tokamak operation. With the gravity model of scientific collaboration, we additionally address whether complementarity leads to collaboration in nuclear fusion research. The regression results show that countries collaborate less if they have research capability in different topics. This is a unique characteristic of nuclear fusion compared to other sciences in which complementarity enhances collaborations [38–42]. This paper provides quantitative evidence for establishing strategic policies that initiate and evaluate big science projects [43].

II. DATA AND METHODS

A. Bibliographic data

We analyzed 25,085 nuclear fusion research papers published during 1976-2016. They were collected from the Scopus database (document type: article) and contain the term “tokamak” in the title, abstract, or keyword fields. Missing affiliation information was filled in manually by checking the original documents. When an author had multiple affiliations, we took the first one as her/his nationality. We used the fractional counting method to obtain the number of papers for each country. For example, if a paper was written by three American and two Korean researchers, 0.6 and 0.4 were added to the paper counts of the United States and the Republic of Korea, respectively.

The fractional counting method gives more weight to the countries that lead a paper, and thus reflects their academic leadership. Nevertheless, the fractional counting method gives less biased results than the full counting method, which assigns an equal weight to all countries in a paper. The full counting method could overrepresent some countries (e.g., the United States) that participate in many international projects. Systematic comparisons of the two methods recommend the fractional counting method in co-authorship analysis [44, 45], especially for scientific fields conducting large-scale international experiments. For this reason, we chose the fractional counting method to estimate research capability as well as the degree of collaboration.

Among 75 countries in our dataset, we focused on the top 14 countries that published more than 250 papers in our time scope. The distribution of paper counts was highly skewed: these 14 countries published more than 90% of the research articles. The top 14 countries were the United States, Japan, China, Germany, the United Kingdom, Russia, France, Italy, the Republic of Korea, Switzerland, India, Sweden, Canada, and the Netherlands. The basic statistics of these countries are listed in Table I. A paper written by more than two authors in different countries is classified as a collaborative paper.
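The fractional counting rule described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the country codes are hypothetical, and each author is represented by a single country (the first affiliation, as in the text). The full counting variant is included for contrast.

```python
from collections import Counter


def fractional_counts(author_countries):
    """Split one paper's unit credit among countries by their share of authors."""
    tally = Counter(author_countries)
    n_authors = len(author_countries)
    return {country: n / n_authors for country, n in tally.items()}


def full_counts(author_countries):
    """Full counting, for contrast: every participating country gets credit 1."""
    return {country: 1.0 for country in set(author_countries)}


# The example from the text: three American and two Korean authors.
paper = ["US", "US", "US", "KR", "KR"]
fractional = fractional_counts(paper)
full = full_counts(paper)
print(fractional)  # {'US': 0.6, 'KR': 0.4}
print(full)
```

Under fractional counting the credits of a paper always sum to one, whereas full counting hands out one credit per participating country, which is why countries present in many international papers can be overrepresented.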

B. Topic modeling and clustering

TABLE I. Summary statistics of 14 leading countries in nuclear fusion research. All values are real numbers because papers are counted by the fractional counting method. Ratio is the proportion of collaborative papers to total papers.

Country              Collaborative Papers   Total Papers   Ratio
United States        978.4                  7646.4         0.13
Japan                411.7                  3025.7         0.14
China                335.7                  2777.7         0.12
Germany              738.1                  2147.1         0.34
United Kingdom       522.5                  1775.5         0.29
Russia               299.5                  1392.5         0.22
France               403.6                  1135.6         0.36
Italy                325.1                  964.1          0.34
Republic of Korea    115.8                  424.8          0.27
Switzerland          153.5                  409.5          0.37
India                49.8                   400.8          0.12
Sweden               135.0                  326.0          0.41
Canada               73.6                   292.6          0.25
Netherlands          102.4                  276.4          0.37

The dynamic topic model (DTM) conceptualizes the knowledge in nuclear fusion research [23, 24]. The DTM specifies topics in a set of documents based on latent Dirichlet allocation (LDA) [46], and it also describes the temporal evolution of detected topics by updating the input hyperparameters $\alpha_t$ and $\beta_t$ each year. $\alpha_t$ affects the topic distribution of a document, and $\beta_t$ indicates the word distribution in a topic. The DTM infers both parameters to reproduce the empirical word distribution under the assumption that a document in year $t$ is generated by two processes: choosing a topic for the document by $\alpha_t$ and sampling words in that topic by $\beta_t$. $\alpha_t$ and $\beta_t$ are used as references to estimate $\alpha_{t+1}$ and $\beta_{t+1}$.

In our DTM implementation, insignificant words were filtered out if their term frequency-inverse document frequency (tf-idf) values were less than 0.01. Then, we used the words that appeared more than 10 times in the whole document set. As a result, our dictionary contained 7,851 unique words, and the documents contained 1,619,233 words in total. The number of topics $K$ needed to be determined before running the DTM. Following the recent approach [43], we specified the number of topics $K = 41$ (see S1 Appendix). Open source codes written by the authors of the DTM paper are available at https://github.com/blei-lab/dtm. We manually labelled 41 topics from their word frequencies (see S2 Table).

The DTM provides an article's topic distribution based on the learned parameters. As we set the number of topics to 41, the topic distribution of an article was given as a vector of length 41. Topic distribution was allocated to countries in proportion to their contributions on each article.
For instance, if an article was written by American authors only, the topic distribution of the article was fully given to the United States. For another article written by three American and two Korean researchers, 60% of the topic distribution would be added to the United States. In this way, a country's research capability over 41 topics was estimated for each year from 1976 to 2016.

C. Fractional publication and collaboration counts by topics

The fractional counting method was used for calculating a country's publication and collaboration counts (Fig 1). For year $t$, when $n_t$ papers are published, we have two matrices: the fractional publication counts by countries ($A_t$: $n_t$ papers × 75 countries) and the topic distributions of papers ($B_t$: $n_t$ papers × 41 topics). $A_t^T B_t$ represents the fractional publication counts of 75 countries by 41 topics at year $t$. Based on the five topical clusters that we found (Fig 2), the fractional counts were summed into five columns to obtain the discriminant power for further analysis. We will explain these topical clusters in the results section. We hereafter call this summarized matrix the national research capability over five topical clusters at year $t$, $R_t$ (75 countries × 5 topical clusters).

[FIG. 1 image: schematic of the counting procedure. From the document set, author lists (country profiles) give the fractional publication counts by countries, $A_t$ ($n_t$ papers × 75 countries), and abstracts give the topic distributions of papers, $B_t$ ($n_t$ papers × 41 topics). (1) Fractional publication counts by topics, $A_t^T B_t$ (41 topics → 5 topical clusters: material, plasma, device, diagnostics, simulation; see Fig 2). (2) Fractional collaboration counts by topical clusters: for document 1 with country vector $v_1$ = (USA 0.3, China 0.2, Korea 0.5), the collaboration matrix is $W_1 = v_1^T v_1$; with topical cluster weights (material 0.05, plasma 0.1, device 0.4, diagnostics 0.3, simulation 0.15), the cluster-level matrices are $W_{1,\mathrm{material}} = 0.05 \times W_1$, $W_{1,\mathrm{plasma}} = 0.1 \times W_1$, $W_{1,\mathrm{device}} = 0.4 \times W_1$, $W_{1,\mathrm{diagnostics}} = 0.3 \times W_1$, and so on.]

D. Normalized revealed comparative advantage (NRCA)

Normalized revealed comparative advantage (NRCA) [25], one of the revealed comparative advantage (RCA) indices, represents how much an entity's value exceeds expectations. When comparing longitudinal RCA values, NRCA outperforms the Balassa index (BRCA) [47], the most popular RCA index, which defines comparative advantage as a ratio of observations to expectations. Let $R_{ij,t}$ be country $i$'s research capability on topical cluster $j$ at year $t$. $\mathrm{NRCA}_{ij,t}$, the NRCA of country $i$ on topical cluster $j$ at year $t$, is calculated as

$$\mathrm{NRCA}_{ij,t} = \frac{\Delta R_{ij,t}}{R_t} = \frac{R_{ij,t} - R_{i,t} R_{j,t} / R_t}{R_t} = \frac{R_{ij,t}}{R_t} - \frac{R_{i,t} R_{j,t}}{R_t^2}. \tag{1}$$
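As a rough illustration of the two steps above, the following sketch builds the capability matrix by summing $A_t^T B_t$ over topical clusters and then applies Eq. (1). It assumes that $R_{i,t}$, $R_{j,t}$, and $R_t$ are the row total, column total, and grand total of $R_t$, as in the usual NRCA definition. The function names, the `cluster_of_topic` mapping, and the toy numbers are hypothetical and far smaller than the 75 × 41 setting of the paper.

```python
def capability_matrix(A, B, cluster_of_topic, n_clusters):
    """Sum A^T B over topical clusters.

    A: papers x countries (fractional author shares, each row sums to 1)
    B: papers x topics    (topic distributions, each row sums to 1)
    cluster_of_topic[k]:   index of the topical cluster that topic k belongs to
    """
    n_countries = len(A[0])
    R = [[0.0] * n_clusters for _ in range(n_countries)]
    for shares, topics in zip(A, B):  # one paper at a time
        for i, share in enumerate(shares):
            for k, weight in enumerate(topics):
                R[i][cluster_of_topic[k]] += share * weight
    return R


def nrca(R):
    """Eq. (1): R_ij / R - R_i * R_j / R^2 for every country i and cluster j."""
    total = sum(sum(row) for row in R)
    row_totals = [sum(row) for row in R]
    col_totals = [sum(col) for col in zip(*R)]
    return [
        [R[i][j] / total - row_totals[i] * col_totals[j] / total ** 2
         for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]


# Toy data: 2 papers, 2 countries, 3 topics grouped into 2 clusters.
A = [[0.6, 0.4], [1.0, 0.0]]
B = [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]]
cluster_of_topic = [0, 0, 1]

R = capability_matrix(A, B, cluster_of_topic, n_clusters=2)
N = nrca(R)
print(R)
print(N)
```

Because every row of A and B sums to one, the entries of R add up to the number of papers, and each row and column of the NRCA matrix sums to zero. A positive entry therefore marks a cluster in which a country publishes more than its overall size would predict.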

Scientific collaboration between countries $m$ and $n$ in topical cluster $j$ at year $t$, $w_{mn,j,t}$, is related to the number of publications of the two ($P_{m,j,t}$ and $P_{n,j,t}$) and their geographical distance ($d_{mn}$). The gravity model explains their relationships in many scientific fields [48, 49]: $P_{m,j,t}$ and $P_{n,j,t}$ positively and $d_{mn}$ negatively affect $w_{mn,j,t}$. We added the capability distance to the gravity model to check whether complementarity increases collaboration. Our basic model is written as

$$\ln(w_{mn,j,t}) \sim \alpha \ln(P_{m,j,t}) + \beta \ln(P_{n,j,t}) + \gamma \ln(d_{mn}) + \lambda c_{mn,j,t}, \tag{3}$$

where $d_{mn}$ is the Haversine great-circle distance (km) between capitals. For two countries $m$ and $n$, we counted $w_{mn,j,t}$, $P_{m,j,t}$, and $P_{n,j,t}$ in real values, and calculated $c_{mn,j,t}$ from the binary-transformed NRCA vectors. A positive $\lambda$ indicates that complementarity stimulates collaboration.

[FIG. 2 image: dendrogram with branches labelled Material, Plasma, Device, Diagnostics, and Simulation. Leaf labels in plotted order: Kinetic theory, Drift; Model, Numerical calculation; Feedback control; Application to society (general); Scaling law; Transport simulation; Spectroscopy; Neutron detector, Spectrum; Soft X-ray, Imaging; Diagnostics; ITER, Design, DEMO; Power plant, Blanket; Vacuum vessel, Dust; Disruption; Cooling, Magnet; Realtime acquisition, EAST; Power supply; Superconducting coil, KSTAR; Power, Gyrotron, Tore Supra; Operation scenario; Discharge; Edge-localized mode; Turbulent transport, Gyrokinetic; Geodesic acoustic mode; Internal transport barrier, Steady-state; Magnetohydrodynamics; Neoclassical tearing mode; Plasma flow; Resonant magnetic perturbation; Alfvén eigenmode, NSTX; Electron cyclotron resonance heating; Lower hybrid current drive; Energetic particle loss; Sawtooth crash, Stellarator; Surface material, Carbon tile; Wall material, Liquid lithium; Impurity; Neutral beam injection; Divertor; Flux surface, Separatrix; Probe measurement, Scrape-off layer.]

FIG. 2. Hierarchical tree of 41 topics detected from the dynamic topic model. Topics were agglomerated by the ward.D method [50]. The distance between topics was measured by the Jensen-Shannon distance [51], the square root of the Jensen-Shannon divergence. Five topical clusters (material, plasma, device, diagnostics, and simulation) are revealed. The branches are colored by the corresponding topical clusters.
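The Jensen-Shannon distance used for Fig 2 can be sketched as follows. This is a minimal standard-library version, not the authors' code. The base-2 logarithm is an assumption (the text does not state the base), chosen so that the distance lies between 0 and 1. The function names and the toy word distributions are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the JS divergence (base-2 logs)."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def distance_matrix(word_distributions):
    """Pairwise distances between topics, given one word distribution per topic."""
    return [[js_distance(p, q) for q in word_distributions]
            for p in word_distributions]


# Three toy topics over a four-word vocabulary.
topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
]
D = distance_matrix(topics)
print(D)
```

A matrix like `D` is what a hierarchical clustering routine would take as input. The Ward agglomeration step itself is left out here because implementations differ in how they treat non-Euclidean distances.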

III. RESULTS

A. Knowledge structure of nuclear fusion research

The DTM detected 41 topics in the dataset. Each topic had its own word distribution indicating the extent of word assignments to the topic. We assumed that two topics were close if their word distributions were similar. The distance between topics $k$ and $k'$ was obtained by the Jensen-Shannon distance [51], the square root of the Jensen-Shannon divergence. For simplicity, we used the word distributions of the last year, $\beta_{2016,k}$ and $\beta_{2016,k'}$.

A knowledge structure of nuclear fusion research was drawn by agglomerating 41 topics with the ward.D method [50]. The hierarchical tree consists of five distinguishable topical clusters: material, plasma, device, diagnostics, and simulation (Fig 2).

Each cluster is clearly characterized by its topics. We describe each branch from the top of the tree. The “material” cluster is described by tokamak edge plasmas and components, as plasmas interact with wall materials at the edge. The “plasma” cluster contains general plasma-related topics (e.g., plasma flow, magnetohydrodynamics, and discharge), major instabilities in tokamak configurations (e.g., Alfvén eigenmode, neoclassical tearing mode, and edge-localized mode), and heating methods (e.g., lower hybrid current drive and electron cyclotron resonance heating). The “device” cluster includes mechanical components in tokamaks (e.g., coil, power supply, vessel, magnet, and blanket) and several tokamaks (e.g., Tore Supra, KSTAR, and EAST). The “diagnostics” cluster is composed of plasma diagnostics methods such as soft X-ray, neutron detector, and spectroscopy. Finally, the “simulation” cluster focuses on analytic calculations and computations.

B. National research capability and its overall trends

Normalized revealed comparative advantage (NRCA) on the fractional publication counts extracted national research capability over 40 years (Fig 3). In all countries, NRCA changes are in good agreement with tokamak construction and operation, representing the scientific effects of large facilities across multiple domains. The United States and Japan have led nuclear fusion research, while Japan's influence has been decreasing since the 2000s. This decline may be due to the upgrade of its major tokamak JT-60, which was disassembled in 2009-2012 and is being upgraded to JT-60SA for first plasma in 2020. China has rapidly developed research capability in every cluster except material. Even considering the rise of China in all science and technology fields, its pace in nuclear fusion research is surprisingly fast. China's tokamaks, HT-7 and HL-2A, raised research capability in device, diagnostics, and simulation. With the start of EAST (Experimental Advanced Superconducting Tokamak) operation in 2006, China also began to build plasma capability. The other countries operating their own tokamaks, Germany, the United Kingdom, Russia, France, Italy, and Switzerland, actively engage in nuclear fusion research. However, the countries without their own tokamak operation, Sweden and the Netherlands, are losing their research capabilities. Canada's fall seems plausible as it left tokamak projects in the early 2000s [52]. Two countries, the Republic of Korea and India, stand out for gaining research capability in all fields. Their rises coincide with the ITER project and the construction of their tokamaks, KSTAR (first plasma in 2008) and SST-1 (first plasma in 2013).

[FIG. 3 image: one panel per country (United States, Japan, China, Germany, United Kingdom, Russia, France, Italy, Republic of Korea, Switzerland, India, Sweden, Canada, Netherlands); x-axis: Year; y-axis: Rank; line colors: Material, Plasma, Device, Diagnostics, Simulation.]

FIG. 3. Ranks of normalized revealed comparative advantages for the top 14 countries. Rank series of the countries are smoothed with LOESS (locally estimated scatterplot smoothing) and colored by the topical clusters.

C. Negative relation between complementarity and collaboration

Complementarity positively affects collaboration in many science fields [38–42]. Researchers and countries find collaborators that exchange knowledge as well as resources they do not have. We assume complementarity boosts collaboration even in big science because countries have limited budgets and manpower. To observe whether our assumption holds, we implemented the gravity model of collaboration with the capability distance, a Jaccard distance of the binary NRCA vectors in five topical clusters (Eq 3). The OLS regression results with fixed time effects are given in Table II. The coefficients of publication counts of two countries are the same because they are symmetric in the collaboration matrix.
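The capability distance mentioned above can be illustrated with a short sketch. Two choices here are assumptions, not statements of the authors' procedure: an NRCA value is mapped to 1 when it is strictly positive and to 0 otherwise, and two all-zero vectors are given distance 0. The function names and the NRCA values are hypothetical.

```python
def binarize(nrca_vector):
    """1 where a country has a comparative advantage (NRCA > 0), else 0."""
    return [1 if value > 0 else 0 for value in nrca_vector]


def jaccard_distance(u, v):
    """1 - |intersection| / |union| of the positions set to 1 in u and v."""
    union = sum(1 for a, b in zip(u, v) if a or b)
    if union == 0:
        return 0.0  # assumed convention: two empty profiles are identical
    intersection = sum(1 for a, b in zip(u, v) if a and b)
    return 1.0 - intersection / union


# Hypothetical NRCA values over (material, plasma, device, diagnostics, simulation).
country_m = [0.02, -0.01, 0.03, 0.00, -0.02]
country_n = [0.01, 0.04, -0.03, -0.01, 0.00]

c_mn = jaccard_distance(binarize(country_m), binarize(country_n))
print(c_mn)
```

In this toy case the two countries share one advantaged cluster out of three advantaged in total, so the distance is 1 - 1/3. A distance of 0 means identical advantage profiles and a distance of 1 means fully complementary ones.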

TABLE II. Gravity model OLS regression results.

Variables       Material             Plasma               Device               Diagnostics          Simulation
ln(P_{m,j})     0.497*** (0.033)     0.508*** (0.032)     0.411*** (0.030)     0.438*** (0.033)     0.488*** (0.033)
ln(P_{n,j})     0.497*** (0.033)     0.508*** (0.032)     0.411*** (0.030)     0.438*** (0.033)     0.488*** (0.033)
ln(d_{mn})      -0.495*** (0.044)    -0.451*** (0.040)    -0.464*** (0.042)    -0.546*** (0.049)    -0.485*** (0.043)
c_{mn,j}        -0.133 (0.222)       -0.911*** (0.284)    -0.949*** (0.232)    -0.690*** (0.175)    -0.027 (0.194)
Observations    3518                 3518                 3518                 3518                 3518

[R^2 row and significance-level legend not recoverable from the source.]

In all topical clusters, as expected, the number of publications had a positive coefficient, and the geographical distance had a negative coefficient. This means that collaborations occur frequently when two countries have high research capability and are located close to each other. In contrast to our assumption, the capability distance negatively affects collaboration, indicating that countries collaborate less if they have research capabilities in different topics. This tendency is found in three clusters, plasma, device, and diagnostics, which relate to the fusion reaction in tokamak facilities. Collaborations on material and simulation are not significantly related to the capability distance. The regression results suggest that complementarity affects collaboration differently by topic in big science. International collaborations in core knowledge fields happen when two countries mutually benefit based on similar research capability.
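To give a feel for the size of these coefficients, the sketch below converts two point estimates from Table II (device cluster) into multiplicative effects on expected collaboration. It relies only on the log-linear form of Eq. (3) and holds the other regressors and the year effects fixed. The function and constant names are hypothetical, and the calculation ignores the standard errors.

```python
import math

# Point estimates copied from Table II, device cluster.
GAMMA_DISTANCE = -0.464     # coefficient of ln(d_mn)
LAMBDA_CAPABILITY = -0.949  # coefficient of the capability distance c_mn,j


def distance_multiplier(ratio, gamma=GAMMA_DISTANCE):
    """Factor on expected collaboration when geographic distance is scaled by `ratio`.

    ln(w) changes by gamma * ln(ratio), so w is multiplied by ratio ** gamma.
    """
    return ratio ** gamma


def capability_multiplier(delta_c, lam=LAMBDA_CAPABILITY):
    """Factor on expected collaboration when capability distance rises by `delta_c`.

    c enters Eq. (3) linearly, so w is multiplied by exp(lam * delta_c).
    """
    return math.exp(lam * delta_c)


doubling_distance = distance_multiplier(2.0)
fully_complementary = capability_multiplier(1.0)
print(round(doubling_distance, 3), round(fully_complementary, 3))
```

Read this way, doubling the distance between capitals is associated with roughly a quarter less device collaboration, while moving from identical to fully complementary advantage profiles (capability distance 0 to 1) is associated with a drop of more than half.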

IV. DISCUSSION AND CONCLUSION

Large facilities and international collaboration, two core components of big science, were investigated with bibliographic data, the dynamic topic model, and revealed comparative advantage. In this study, we chose nuclear fusion for a case study. Word similarity between topics unfolded the knowledge structure of nuclear fusion, comprising five multidisciplinary topical clusters: material, plasma, device, diagnostics, and simulation. Different countries have different comparative advantages over these clusters. The time points at which the comparative advantage trend changes match well with tokamak operation. Catching-up countries that have built their own tokamaks have developed their research capability, while countries that do not operate a tokamak have lost productivity.

Revealed comparative advantage can be used as a new indicator of big science project evaluation. Through time series analysis [53], we can examine the connections between facility construction and revealed comparative advantages in different topical clusters. The time series analysis addresses whether knowledge spillover occurs at various scales from facilities to countries [54–56]. In addition, with external information such as the amount of funding, the number of employees, and instrument specifications, we can investigate the impact of facility construction and international collaboration in detail. The publishing policy of large facilities also needs to be considered when interpreting the comparative advantage. Large facilities that restrict the publication of academic papers for the purpose of secrecy [57] have low research capability in our study, relative to others that promote academic publishing. These qualitative factors of facilities require further evaluation to estimate their scientific impacts accurately as a measure for policy making, investment, and education [58].

The international collaboration in nuclear fusion was estimated by the gravity model with the capability distance, which represents how similar two countries' research capabilities are. The regression results show that a high capability distance discourages international collaboration in the clusters related to the fusion reaction: plasma, device, and diagnostics. This tendency contrasts with that of other science fields, which favor collaborators that have complementary comparative advantages [38–42]. Real collaborations in nuclear fusion governed by this pattern are worth studying. Countries may have distinct motivations to collaborate with other countries and to participate in international projects. Political and societal factors would also be involved in the policy making process. Understanding the history of nuclear fusion research gives us insights into what science policy a country has to take depending on its development stage.

Our approach can be applied to other fields of big science. Particle physics and Antarctic science are potential targets. They depend on large facilities: particle accelerators and research stations in Antarctica, respectively. In particle physics, we expect that the dynamic topic model differentiates various types of particle accelerators [59]. A country's strategic decisions for particle accelerators can be traced with comparative advantages on topical clusters. In Antarctic science, research stations may increase research capabilities on geography-dependent topics [60, 61] because their location expands the range of research activities. An increasing comparative advantage on spatial topics would support this idea. Antarctic science, especially, has interesting aspects that affect the gravity model of collaboration. Collaboration in Antarctica would occur frequently between close research stations, not between close capitals, so the geographical distance of the model should be defined in a different way. The Antarctic Treaty System, which enforces the peaceful usage of Antarctica and freedom of scientific investigation [62], can encourage countries to collaborate with others having complementary comparative advantages. It is necessary to determine in particle physics and Antarctic science whether collaboration in big science decreases with complementarity as in the case of nuclear fusion. More studies are needed to understand the nature of big science.

ACKNOWLEDGMENTS

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03932590). H.K. acknowledges the NRF Grant funded by the Korean Government (NRF-2017H1A2A1044205, Global Ph.D. Fellowship Program).

[1] A. M. Weinberg, Science, 161 (1961).
[2] J. H. Capshew and K. A. Rader, Osiris, 2 (1992).
[3] H. Xin and G. Yidong, Science, 1548 (2006).
[4] J.-M. Fortin and D. J. Currie, PLoS ONE, e65263 (2013).
[5] D. H. Sonnenwald, Annual Review of Information Science and Technology, 643 (2007).
[6] E. Autio, A.-P. Hameri, and O. Vuola, Research Policy, 107 (2004).

[7] W. Choi, H. Tho, Y. Kim, S. Hwang, and D. Kang, Fusion Engineering and Design, 1263 (2017).
[8] P. Castelnovo, M. Florio, S. Forte, L. Rossi, and E. Sirtori, Research Policy, 1853 (2018).
[9] R. Heidler and O. Hallonsten, Scientometrics, 295 (2015).
[10] O. Hallonsten, Research Evaluation, 486 (2016).
[11] L. Qiao, R. Mu, and K. Chen, Technological Forecasting and Social Change, 102 (2016).
[12] C. Freeman, Cambridge Journal of Economics, 5 (1995).
[13] H. Etzkowitz and L. Leydesdorff, Research Policy, 109 (2000).
[14] I. Feller, Economic Development Quarterly, 283 (1997).
[15] P. Larédo and P. Mustar, Research and innovation policies in the new global economy: An international comparative analysis (Edward Elgar Publishing, 2001).
[16] D. E. Chubin, Knowledge, 254 (1987).
[17] K. Börner, R. Klavans, M. Patek, A. M. Zoss, J. R. Biberstine, R. P. Light, V. Larivière, and K. W. Boyack, PLoS ONE, e39464 (2012).
[18] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. P. Hsu, and K. Wang, in Proceedings of the 24th international conference on world wide web (ACM, 2015) pp. 243–246.
[19] Q. Wang and L. Waltman, Journal of Informetrics, 347 (2016).
[20] G. Chen, L. Xiao, C.-p. Hu, and X.-q. Zhao, Scientometrics, 707 (2015).
[21] M. R. Guevara, D. Hartmann, M. Aristarán, M. Mendoza, and C. A. Hidalgo, Scientometrics, 1695 (2016).
[22] N. Li, Scientometrics, 493 (2017).
[23] D. M. Blei and J. D. Lafferty, in Proceedings of the 23rd international conference on Machine learning (ACM, 2006) pp. 113–120.
[24] S. Gerrish and D. M. Blei, in ICML, Vol. 10 (Citeseer, 2010) pp. 375–382.
[25] R. Yu, J. Cai, and P. Leung, The Annals of Regional Science, 267 (2009).
[26] F. Chen, An indispensable truth: how fusion power can save the planet (Springer Science & Business Media, 2011).
[27] D. Clery, A piece of the sun (Gerald Duckworth & Co, 2013).
[28] C. M. Braams and P. E. Stott, Nuclear fusion: half a century of magnetic confinement fusion research (CRC Press, 2002).

[29] A. S. Eddington, The internal constitution of the stars (Cambridge University Press, 1926) pp. viii, 407.
[30] V. Smirnov, Nuclear Fusion, 014003 (2009).
[31] J. Wesson and D. J. Campbell, Tokamaks, Vol. 149 (Oxford University Press, 2011).
[32] M. Kikuchi, Energies, 1741 (2010).
[33] J. D. Lawson, Proceedings of the Physical Society. Section B, 6 (1957).
[34] R. Aymar, P. Barabaschi, and Y. Shimomura, Plasma Physics and Controlled Fusion, 519 (2002).
[35] K. Ikeda, Nuclear Fusion, 014002 (2009).
[36] D. Grandoni, Huffington Post (2015).
[37] P. Rebut, R. Bickerton, and B. E. Keen, Nuclear Fusion, 1011 (1985).
[38] W. Oh, J. N. Choi, and K. Kim, Journal of Management Information Systems, 266 (2005).
[39] F. Barjak and S. Robinson, Social Geography, 23 (2008).
[40] T. Heinze and S. Kuhlmann, Research Policy, 888 (2008).
[41] M. Acosta, D. Coronado, E. Ferrándiz, and M. D. León, Scientometrics, 63 (2011).
[42] C. Zhang and J. Guo, Scientometrics, 1129 (2017).
[43] A. Gerow, Y. Hu, J. Boyd-Graber, D. M. Blei, and J. A. Evans, Proceedings of the National Academy of Sciences, 201719792 (2018).
[44] A. Perianes-Rodriguez, L. Waltman, and N. J. van Eck, Journal of Informetrics, 1178 (2016).
[45] H. W. Park, J. Yoon, and L. Leydesdorff, Scientometrics, 1017 (2016).
[46] D. M. Blei, A. Y. Ng, and M. I. Jordan, Journal of Machine Learning Research, 993 (2003).
[47] B. Balassa, The Manchester School, 99 (1965).
[48] R. Ponds, F. Van Oort, and K. Frenken, Papers in Regional Science, 423 (2007).
[49] J. Hoekman, K. Frenken, and R. J. Tijssen, Research Policy, 662 (2010).
[50] F. Murtagh and P. Legendre, Journal of Classification, 274 (2014).
[51] D. M. Endres and J. E. Schindelin, IEEE Transactions on Information Theory (2003).
[52] G. Brumfiel, Nature (2003).
[53] C. W. Granger, Econometrica: Journal of the Econometric Society, 424 (1969).
[54] A.-P. Hameri, The Journal of Technology Transfer, 27 (1997).

55] E. Horlings,

The societal footprint of big science: A literature review in support of evidence-based decision making (Rathenau Intituut, 2012).[56] R. Wylie, S. Markowski, and P. Hall, Defence and Peace Economics , 257 (2006).[57] D. B. Resnik, Episteme , 135 (2006).[58] O. Hallonsten, Scientometrics , 497 (2013).[59] H. Wiedemann, Particle accelerator physics (Springer, 2015).[60] G. E. Fogg,

A history of Antarctic science (Cambridge University Press, 1992).
[61] H. Kim and W.-S. Jung, Industrial Engineering & Management Systems, 92 (2016).
[62] P. A. Berkman, M. A. Lang, D. W. Walton, and O. R. Young, Antarctica, Science and the Governance of International Spaces (2011).

Recent work using the regression-based document influence model (rDIM) introduces a method to determine the number of topics $K$ used as an input to topic modeling [1]. In general, it runs a static LDA for a large $K$ and then counts the significant topics, that is, topics whose corresponding documents have a sufficient number of words, larger than a threshold $w_{\mathrm{th}}$. In addition, we found the minimal $K$ by varying the threshold $w_{\mathrm{th}}$. The details are as follows.

First, we ran a static LDA for $K = 500$, following the reference. For each topic $t$ with $n_d(t)$ corresponding documents, we found the number of documents $n_x(t)$ that contained more than $w_{\mathrm{th}}$ words (tokens). Then, we determined the significance of the topic from the proportion $p(t) = n_x(t) / n_d(t)$. In the range near the average number of tokens per document, $p(t)$ follows a Gaussian distribution. From the kernel density estimation (KDE) of this distribution, we counted the significant topics: those whose proportion of documents $p(t)$ is larger than the cutoff proportion, which is located where the derivative of the KDE is minimal. Considering the number of tokens in a document, we set the threshold $w_{\mathrm{th}} = 50$. As a result, the number of topics was determined as $K = 41$ (S1 FIG).

[Figure: x-axis, proportion of documents $p$; y-axis, count of topics; legend: KDE, cutoff, histogram.]
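The cutoff procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the input is assumed to be a mapping from each topic to the token counts of its corresponding documents, the KDE uses a Gaussian kernel with Silverman's rule-of-thumb bandwidth (the source does not state a bandwidth), and "where the derivative of the KDE is minimal" is read as the point of most negative slope.

```python
import math
import random


def proportion_above_threshold(doc_lengths_by_topic, w_th):
    """Return p(t) = n_x(t) / n_d(t) for every topic t that has documents."""
    proportions = {}
    for topic, lengths in doc_lengths_by_topic.items():
        n_d = len(lengths)
        if n_d == 0:
            continue  # p(t) is undefined for a topic without documents
        n_x = sum(1 for n_tokens in lengths if n_tokens > w_th)
        proportions[topic] = n_x / n_d
    return proportions


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (an assumption; not specified in the source)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    if std == 0.0:
        raise ValueError("all proportions are identical; no cutoff exists")
    return 1.06 * std * n ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian-kernel density estimate."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def cutoff_at_min_derivative(samples, grid_size=512):
    """Locate the grid point where the KDE slope is most negative."""
    density = gaussian_kde(samples, silverman_bandwidth(samples))
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    values = [density(x) for x in grid]
    # Central finite differences on the interior grid points.
    slopes = [
        (values[i + 1] - values[i - 1]) / (2.0 * step)
        for i in range(1, grid_size - 1)
    ]
    i_min = min(range(len(slopes)), key=slopes.__getitem__)
    return grid[i_min + 1]  # slopes[j] belongs to grid[j + 1]


def count_significant_topics(doc_lengths_by_topic, w_th=50):
    """Count topics whose p(t) exceeds the KDE-derived cutoff."""
    p = proportion_above_threshold(doc_lengths_by_topic, w_th)
    cutoff = cutoff_at_min_derivative(list(p.values()))
    n_significant = sum(1 for value in p.values() if value > cutoff)
    return n_significant, cutoff


if __name__ == "__main__":
    # Synthetic stand-in for the document assignments of a K = 500 LDA run.
    random.seed(0)
    synthetic = {
        t: [random.randint(5, 120) for _ in range(random.randint(20, 80))]
        for t in range(500)
    }
    k, cutoff = count_significant_topics(synthetic, w_th=50)
    print(f"significant topics: {k}, cutoff proportion: {cutoff:.3f}")
```

The synthetic data only exercises the code path; it is not expected to reproduce $K = 41$, which depends on the actual corpus and LDA output.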

S1 FIG: Topic usage distribution for the static LDA model. We used the topic usage distribution of the static $K = 500$ model to calculate the cutoff that specifies sufficiently used topics. The minimum of the derivative of the KDE (blue line) determines the cutoff (red dashed line), and the number of topics above this point, $K = 41$, is used for the DTM.

The top 10 words for each topic were obtained from the dynamic topic model on 25,085 abstracts between 1976 and 2016. We manually named the 41 topics using the list of the top 15 words of each topic, and the top 10 words among them are listed here.

Topic Top 10 words