Growth rates of modern science: A latent piecewise growth curve approach to model publication numbers from established and new literature databases

Abstract

Growth of science is a prevalent issue in science of science studies. In recent years, two new bibliographic databases have been introduced which can be used to study growth processes in science from centuries back: Dimensions from Digital Science and Microsoft Academic. In this study, we used publication data from these new databases and added publication data from two established databases (Web of Science from Clarivate Analytics and Scopus from Elsevier) to investigate scientific growth processes from the beginning of the modern science system until today. We estimated regression models that simultaneously included the publication counts from the four databases. The results of the unrestricted growth of science calculations show that the overall growth rate amounts to 4.02% with a doubling time of 16.8 years. As the comparison of various segmented regression models in the current study revealed, the model with five segments fits the publication data best. We demonstrated that these segments with different growth rates can be interpreted very well, since they are related to phases of economic (e.g., industrialization) and/or political developments (e.g., Second World War). In this study, we additionally analyzed scientific growth in two broad fields and the relationship of scientific and economic growth in the UK. We focused on this country, since long time series for publication counts and economic growth indices were available.



Lutz Bornmann*, Rüdiger Mutz** & Robin Haunschild***

* Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstr. 8, 80539 Munich, Germany. Email: [email protected]
** Center for Higher Education and Science Studies (CHESS), University of Zurich, Andreasstrasse 15, 8050 Zurich, Switzerland. Email: [email protected]
*** Max Planck Institute for Solid State Research, Heisenbergstraße 1, 70569 Stuttgart, Germany. Email: [email protected]


Key words: scientometrics, bibliometrics, growth of science, Microsoft Academic, Web of Science, Scopus, Dimensions

Introduction

Growth of science is an ongoing topic in studies on science of science. In a recent overview of science of science studies, Fortunato et al. (2018) stated that “early studies discovered an exponential growth in the volume of scientific literature … a trend that continues with an average doubling period of 15 years”. Popular early studies have been published by Derek John de Solla Price (1965; 1951, 1961) who can be seen as the pioneer in investigating growth of science processes (see de Bellis, 2009). In most of the studies on this topic published hitherto, bibliometric data have been used to measure growth of science (an alternative measure is the number of researchers, for instance). It is an advantage of using bibliometric data (compared to other data) that large-scale, multi-disciplinary databases are available based on worldwide publication productions. Another advantage is the characteristic of most scientific disciplines that publications are the main outcome: “science would not exist, if scientific results are not communicated. Communication is the driving force of science. That is why scientists have to publish their research results in the open, international scientific literature. Thus, publications are essential” (van Raan, 1999, p. 417). According to Merton (1988), “what we mean by the expression ‘scientific contribution’: an offering that is accepted, however provisionally, into the common fund of knowledge” (p. 620). In a previous study (Bornmann & Mutz, 2015), two authors of the current study investigated the growth of science based on data from the Web of Science database (Clarivate Analytics; Birkle, Pendlebury, Schnell, & Adams, 2020). Bornmann and Mutz (2015) not only used annual publication numbers but also cited references data (see Marx & Bornmann, 2016, for an overview of the use of cited references data in scientometrics). 
They argued that Web of Science data (publication counts) are scarcely suitable to investigate early periods of modern science, since early publications are not sufficiently covered. Cited references may have the advantage of covering these early periods and a wider range of document types, including journal articles, books, book contributions or proceedings, which are still not fully included in the databases. However, cited references data can only serve as a less-than-ideal proxy of publication numbers, because non-cited publications are not considered.

In recent years, new bibliographic databases have been introduced which can be used to study growth processes in science from centuries back: Dimensions (Herzog, Hook, & Konkiel, 2020; Hook, Porter, & Herzog, 2018) from Digital Science and Microsoft Academic (Wang et al., 2020). Thus, it is the intention of the current study to use both databases for investigating these processes and to compare the results with those from Web of Science and Scopus (Elsevier; Baas, Schotten, Plume, Côté, & Karimi, 2020). With Dimensions, Microsoft Academic, Web of Science, and Scopus, we considered in this study (the most) important multi-disciplinary literature databases currently available. The comparison of the empirical results based on the four databases may point to an assessment of growth processes in science that might be interpreted as valid, since the assessments can be made independently of the use of single data sources. We investigated the growth processes not only for all annual publications in the databases, but also for two broad fields: (1) Physical and Technical Sciences and (2) Life Sciences (including Health Sciences). Since scientometric research revealed that growth of science is related to economic development (Fernald & Jones, 2014; Salter & Martin, 2001), we additionally undertook a comparative analysis of economic and scientific growth processes.
This comparative analysis could not be done based on worldwide data, since long time series for publication counts and economic growth indices are not available at this level. Following seminal research by May (1997) and King (2004a, 2004b) on the relationship of science and economy, we focus instead on the UK, for which time series of economic development are available that reach back to the 17th century. Such historical data are not available for other countries (to the best of our knowledge). Using statistical methods similar to those applied to the publication data, we investigated annual growth rates in gross domestic product (GDP) as a measure of the economic wealth of a nation, similar to the approach by King (2004a, 2004b).

Methods

Dataset used

We used bibliometric and economic data in this study. The five different databases and datasets are as follows:

Web of Science: The core citation indices of Web of Science (SCI-E, SSCI, and A&HCI) date back to the 1960s, when they were founded by Eugene Garfield. The other citation indices were started later on (e.g., CPCI-S and CPCI-SSH). In total, the publications indexed in the Web of Science are divided into 44 different document types (e.g., review, news item or note). The coverage of the scientific literature dates back to 1900. The Web of Science is more selective with respect to the choice of indexed sources than the other databases in this study (Visser, van Eck, & Waltman, 2020). We used the advanced search of the Web of Science online interface (see http://apps.webofknowledge.com) with the query “py=1900-2018” in the indices SCI-E, SSCI, A&HCI, CPCI-S, CPCI-SSH, BKCI-S, BKCI-SSH, ESCI, CCR-EXPANDED, and IC (Index Chemicus) (date of search: 30 August 2019). No restriction on document types was imposed. Via the “analyze results” function applied to publication years, we were able to download the number of indexed papers per year. Broad subject categories were defined via the Web of Science subject categories:

- Physical and Technical Sciences: “Astronomy & Astrophysics”, “Chemistry”, “Crystallography”, “Electrochemistry”, “Geochemistry & Geophysics”, “Geology”, “Mathematics”, “Meteorology & Atmospheric Sciences”, “Mineralogy”, “Mining & Mineral Processing”, “Oceanography”, “Optics”, “Physical Geography”, “Physics”, “Polymer Science”, “Thermodynamics”, “Water Resources”, “Acoustics”, [...]

- Life Sciences (including Health Sciences): “Agriculture”, “Allergy”, “Anatomy & Morphology”, “Anesthesiology”, “Anthropology”, “Audiology & Speech-Language Pathology”, “Behavioral Sciences”, “Biochemistry & Molecular Biology”, “Biodiversity & Conservation”, “Biophysics”, “Biotechnology & Applied Microbiology”, “Cardiovascular System & Cardiology”, “Cell Biology”, “Critical Care Medicine”, “Dentistry, Oral Surgery & Medicine”, “Dermatology”, “Developmental Biology”, “Emergency Medicine”, “Endocrinology & Metabolism”, “Entomology”, “Environmental Sciences & Ecology”, “Evolutionary Biology”, “Fisheries”, “Food Science & Technology”, “Forestry”, “Gastroenterology & Hepatology”, “General & Internal Medicine”, “Genetics & Heredity”, “Geriatrics & Gerontology”, “Health Care Sciences & Services”, “Hematology”, “Immunology”, “Infectious Diseases”, “Integrative & Complementary Medicine”, “Legal Medicine”, “Life Sciences Biomedicine Other Topics”, “Marine & Freshwater Biology”, “Mathematical & Computational Biology”, “Medical Ethics”, “Medical Informatics”, “Medical Laboratory Technology”, “Microbiology”, “Mycology”, “Neurosciences & Neurology”, “Nursing”, “Nutrition & Dietetics”, “Obstetrics & Gynecology”, “Oncology”, “Ophthalmology”, “Orthopedics”, “Otorhinolaryngology”, “Paleontology”, “Parasitology”, “Pathology”, “Pediatrics”, “Pharmacology & Pharmacy”, “Physiology”, “Plant Sciences”, “Psychiatry”, “Public, Environmental & Occupational Health”, “Radiology, Nuclear Medicine & Medical Imaging”, “Rehabilitation”, “Reproductive Biology”, “Research & Experimental Medicine”, “Respiratory System”, “Rheumatology”, “Sport Sciences”, “Substance Abuse”, “Surgery”, “Toxicology”, “Transplantation”, “Tropical Medicine”, “Urology & Nephrology”, “Veterinary Sciences”, “Virology”, and “Zoology”.

Scopus: Scopus was launched in 2004 by the publisher Elsevier. Coverage of the scientific literature dates back to 1861. The publications indexed in Scopus are divided into 16 different document types. Scopus has a broader coverage than Web of Science, especially in the social sciences and humanities (Visser et al., 2020). We used the advanced search of the Scopus online interface with the query “PUBYEAR AFT 1800” for this study (date of search: 30 August 2019). No restriction on document types was imposed. Via the “analyze search results” function applied to publication years, we were able to conveniently download the number of indexed papers per year. Broad subject categories were defined via the Scopus subject areas:

- Physical and Technical Sciences: “Chemical Engineering”, “Chemistry”, “Computer Science”, “Earth and Planetary Sciences”, “Energy”, “Engineering”, “Environmental Science”, “Materials Science”, “Mathematics”, and “Physics and Astronomy”.

- Life Sciences (including Health Sciences): “Medicine”, “Nursing”, “Veterinary”, “Dentistry”, “Health Professions”, “Multidisciplinary”, “Agricultural and Biological Sciences”, “Biochemistry, Genetics and Molecular Biology”, “Immunology and Microbiology”, “Neuroscience”, and “Pharmacology, Toxicology and Pharmaceutics”. We included “Multidisciplinary” in Life Sciences, following the suggestions by Elsevier, since most of the papers in this category are also assigned to Life Sciences or Health Sciences categories.

Microsoft Academic: Microsoft Academic was first released in 2016. It can be considered an unconventional bibliographic database because its content is not delivered by the publishers but found by the search engine Bing on the publishers’ websites. Microsoft [...] and bulk data access via the Azure platform. Microsoft Academic has a broader coverage than Web of Science and Scopus (Visser et al., 2020). We downloaded a snapshot of the Microsoft Academic data from the Azure platform (last update: 11 January 2019). The raw Microsoft Academic data were imported and processed in a locally maintained PostgreSQL database at the Max Planck Institute for Solid State Research. Our current snapshot of the Microsoft Academic database contains bibliographic data of 212,209,775 publications, such as title, publication year, and document type. Content coverage dates back to 1800. The publications indexed in Microsoft Academic are divided into five different document types (“Journal”, “Patent”, “Conference”, “BookChapter”, and “Book”). Unfortunately, 77,227,143 indexed items are not assigned to any document type. Via SQL commands, we produced items per publication year statistics in the Microsoft Academic database, excluding the document type patent but including the items without a document type. Microsoft Academic offers a subject classification on different hierarchical levels. There are 19 different fields on the highest level. Broad subject categories were defined via that highest level:

- Physical and Technical Sciences: “Geology”, “Chemistry”, “Materials science”, “Mathematics”, “Engineering”, “Environmental science”, “Physics”, “Geography”, and “Computer science”.

- Life Sciences (including Health Sciences): “Biology” and “Medicine”.

For Microsoft Academic, see https://academic.microsoft.com and https://azure.microsoft.com/.

Dimensions: Dimensions is the most recent database used in this study. It was launched in 2018 by Digital Science and contains meta-information about grants, publications, clinical trials, and patents. Dimensions is accessible via an online search interface (see https://app.dimensions.ai/), an API, and, additionally, Digital Science shares the raw data without cost for [...] (see https://ds.digital-science.com/NoCostAgreement-Collaborators). The raw Dimensions data (last update: 26 September 2019) were downloaded, imported and processed in a locally maintained PostgreSQL database at the Max Planck Institute for Solid State Research. The raw data of the Dimensions database are provided as separate sub-databases: “Grants”, “Publications”, “Clinical trials”, and “Patents”. In the following, by using the term “Dimensions” in the text, we refer only to the Dimensions sub-database “Publications”. The indexed publications therein are divided into six different publication types (“article”, “chapter”, “proceeding”, “preprint”, “monograph”, and “book”). Dimensions offers the second largest coverage of the literature in this study (Visser et al., 2020). Dimensions offers a much larger coverage of books and book chapters than Web of Science or Scopus (Clarivate, 2020; Elsevier, 2020; Taylor, 2020). Via simple SQL queries, we produced publications per publication year statistics without restrictions on publication types in the Dimensions database. Dimensions offers many different classification schemes, some of which focus on specific disciplines or topics such as the Sustainable Development Goals (SDGs). For the purposes of our study, we have made use of the Dimensions implementation of the Australian and New Zealand Standard Research Classification (ANZSRC) Fields of Research (FOR) codes, as per the 2008 field definitions. The ANZSRC codes are delivered at three levels, the two least granular levels of which have been implemented in Dimensions. There are 22 fields at the highest level. Broad subject categories were defined in this study via that highest level:

- Physical and Technical Sciences: “Mathematical Sciences”, “Physical Sciences”, “Chemical Sciences”, “Earth Sciences”, “Environmental Sciences”, “Information and Computing Sciences”, “Engineering”, “Technology”, and “Built Environment and Design”.

- Life Sciences (including Health Sciences): “Biological Sciences”, “Agricultural and Veterinary Sciences”, and “Medical and Health Sciences”.

FRED: The economic research department of the Federal Reserve Bank of St. Louis (FRED) offers a series of datasets for economic analyses and for analyses of the historical development of economic indicators. A time series from 1770 to 2016 of the annual “Nominal Gross Domestic Product at Market Prices in the UK, Millions of British Pounds, Annual, Not Seasonally Adjusted” (NGDPMPUKA) for the UK was downloaded as an Excel table. In the following, we use the term “gross domestic product” or GDP instead of NGDPMPUKA (to facilitate the reading of the results). Since the values are nominal values, GDP is not adjusted for inflation. Publication counts for the UK were retrieved from Dimensions for the years 1788 until 2016.

The data retrieved from the various databases are the numbers of publications published per year. For the growth analysis, however, the cumulative number of publications is used. If, for example, 1,000 publications were published up to the year x-1 and 100 publications in the year x, the cumulative number of publications in the year x is 1,100. The difference to the year x-1 is exactly the absolute growth in the year x, i.e., 100 publications, the number of publications published in the year x. For simplification, “number of publications” is used below instead of “cumulative number of publications”.
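The conversion from annual to cumulative counts can be sketched in a few lines of Python. The first two numbers follow the worked example in the text (1,000 earlier publications, then 100 in the year x); the two later years and all variable names are illustrative assumptions, not study data.

```python
from itertools import accumulate

# Illustrative annual counts: 1,000 publications up to the year x-1 (lumped
# into one entry), 100 in the year x, plus two hypothetical later years.
annual = [1000, 100, 120, 150]

# Cumulative number of publications at the end of each year.
cumulative = list(accumulate(annual))

# First differences give back the absolute growth per year, i.e. the number
# of publications published in that year.
absolute_growth = [b - a for a, b in zip(cumulative, cumulative[1:])]

print(cumulative)
print(absolute_growth)
```

The growth models are fitted to series like `cumulative`, while `absolute_growth` corresponds to what the databases report directly.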

Statistical analyses

Scientific growth processes do not necessarily run homogeneously over time, especially when a long time horizon is chosen, for example, from the beginning of modern science in the 16th/17th century until today. Therefore, modern growth analysis has to simultaneously address three different problems: (1) Science can grow according to different growth functions. (2) It can be assumed that science grows at different rates in different time periods or segments, i.e., growth rates vary over time. (3) Growth functions might vary across different databases such as Scopus or Web of Science covering different time horizons. In the following sections, solutions to the three problems are presented which refer to growth functions (unrestricted and restricted exponential growth), segmented regression, and latent growth curve models.

For the GDP series, see https://fred.stlouisfed.org/series/NGDPMPUKA.

Growth functions

The simplest growth function is that of unrestricted growth in the form of an exponential function, where the growth of science in each year is proportional to the volume of publications available in the previous year. The volume grows by the same percentage every year. For example, if we assume an annual growth rate of 10% and 100 publications in a certain year, then there are 100+0.10*100=110 publications in the following year. One year later, there are 110+0.10*110=121 publications (and so on). Another growth function assumed by Price (1963) is that of restricted growth: Science would grow exponentially at the beginning, but with time the growth process approaches an upper capacity limit with constantly decreasing growth rates (s-shaped course). In view of the limited capacities of human and investment capital for research (and other sections of society), the latter thesis by Price (1963) seems to be more plausible than the simplest growth function: Since resources (human resources, capital) are limited, growth cannot be limitless either. These considerations make it necessary to choose a statistical analysis approach that starts from different time segments, in which different growth rates apply and different growth functions are possible as well. The time segments themselves are not known in advance and have to be estimated. Such an opportunity is offered by “segmented regression” or “piecewise regression” analyses, which start from different intervals of an independent variable (in this case: time). These regression analyses apply different functional relationships and simultaneously make it possible to estimate time segments and parameters of the growth functions (Gallant & Fuller, 1973; McGee & Carleton, 1970; Schwarz, 2015; Toms & Lesperance, 2003; Valsamis, Ricketts, Husband, & Rogers, 2019; Wagner, Soumerai, Zhang, & Ross-Degnan, 2002).
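The 10% example can be replayed in Python as a rough illustration of unrestricted growth. The sketch also shows that the logarithm of such a series is linear in time, and it computes the implied doubling time ln(2)/ln(1+g). The function names, the 20-year horizon, and the back-transformation of the fitted slope via exp(slope) - 1 are assumptions made for this illustration, not the authors' estimation code.

```python
import math

def compound(start, rate, years):
    """Yearly volumes under unrestricted growth at a constant annual rate."""
    values = [start]
    for _ in range(years):
        values.append(values[-1] * (1 + rate))
    return values

def ols_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

series = compound(100, 0.10, 20)          # 100, 110, 121, ...
t = list(range(len(series)))

# On the log scale the series is a straight line in t.
slope, intercept = ols_line(t, [math.log(v) for v in series])

rate = math.exp(slope) - 1                # recovers the 10% used above
doubling_time = math.log(2) / math.log(1 + rate)

print([round(v, 2) for v in series[:3]], round(rate, 4), round(doubling_time, 2))
```

Because the series is built with a constant rate, the log-scale slope equals ln(1.10), so the back-transformed rate matches the input and the doubling time follows directly from it.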
In this study, we assume a time series in which the total number of publications $y_t$ is available per year, where $t$ denotes the index of the time series and $t = 0$ the starting year of the time series (e.g., for the starting year 1665: $t$ = year - 1665). We assume two growth functions (see above):

Unrestricted exponential growth

The functional relationship for exponential growth assumes that the derivative of the function is proportional to the function itself: $f'(t) \sim b_1 f(t)$ (Tsoularis & Wallace, 2002, p. 27). Solving this differential equation leads to a functional relationship, which can be represented in the following statistical model:

$$y_t = f(t)\, e^{\varepsilon_t} = e^{b_0}\, e^{b_1 t}\, e^{\varepsilon_t}, \qquad \varepsilon_t \sim N(0, \sigma^2), \tag{1}$$

where $e^{b_0}$ represents the initial volume of publications at the starting point of the time series ($t = 0$), $b_1$ the annual growth rate, and $\varepsilon_t$ the residual with the variance $\sigma^2$ as well as the correlation matrix of the residuals $\mathrm{CORR}_{\varepsilon_t \varepsilon_{t-1}}$. The latter is equated here with the identity matrix $I$, which means that the residuals do not (auto-)correlate. After the model estimation, we checked whether the residuals of the estimated model are actually auto-correlated or not. In the simplest case of an autoregressive process of first order (AR(1)), the residuals at time $t$ are (auto-)correlated with the residuals at time $t-1$. If equation 1 is logarithmically transformed, a simple linear regression function is obtained:

$$\ln(y_t) = b_0 + b_1 t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2) \tag{2}$$

The doubling time $k$, i.e., the time the growth process needs to double the population size at a given time point, is:

$$k = \frac{\ln(2)}{\ln(1 + g)}, \tag{3}$$

where $k$ is the doubling time and $g$ is the growth rate.

Restricted exponential growth (Verhulst-Pearl)
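As a minimal sketch of the restricted (logistic) growth function treated in this subsection, the following Python code evaluates the mean function of equation 4 and its logarithmic form (equation 5), both without the error term. The parameter values (100 publications at the start, a rate of 0.10, a capacity limit of 10,000) and the function names are hypothetical and chosen only for illustration.

```python
import math

def logistic_volume(t, b0, b1, K):
    """Mean function of the logistic model (equation 4, no error term)."""
    start = math.exp(b0)       # initial volume e^{b0} at t = 0
    capacity = math.exp(K)     # capacity limit C = e^{K}
    return capacity * start / (start + (capacity - start) * math.exp(-b1 * t))

def log_logistic_volume(t, b0, b1, K):
    """Logarithmic form of the same function (equation 5, no error term)."""
    start, capacity = math.exp(b0), math.exp(K)
    return K + b0 - math.log(start + (capacity - start) * math.exp(-b1 * t))

# Hypothetical parameters, not estimates from the study.
b0, b1, K = math.log(100), 0.10, math.log(10_000)

for year_index in (0, 25, 50, 100, 200):
    print(year_index, round(logistic_volume(year_index, b0, b1, K)))
```

At `t = 0` the function returns the initial volume, and for large `t` it levels off at the capacity limit, which is the s-shaped course described for restricted growth.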

For restricted exponential growth as a special case of a logistic growth model with a capacity limit $C$, the derivative of the function is proportional to the following function: $f'(t) = b_1 f(t)\,(1 - f(t)/C)$. Solving this differential equation leads to a functional relationship, which can be represented in the following statistical model (Tsoularis & Wallace, 2002, p. 28f.):

$$y_t = f(t)\, e^{\varepsilon_t} = \frac{e^{K} e^{b_0}}{e^{b_0} + (e^{K} - e^{b_0})\, e^{-b_1 t}}\, e^{\varepsilon_t}, \qquad \varepsilon_t \sim N(0, \sigma^2) \tag{4}$$

It can be seen from equation 4 that if $t \to \infty$, the exponential expression in the denominator, $e^{-b_1 t}$, goes towards zero and the function approaches the capacity limit $C = e^{K}$. At time $t = 0$, the start of the time series, the exponential expression in the denominator, $e^{-b_1 t}$, is equal to 1 and the function corresponds to the initial volume $e^{b_0}$ multiplied by the error term $e^{\varepsilon_t}$. A limited growth is assumed only for the first segment. The combination of s-shaped segments over time seems to be implausible in light of the empirical results on the growth of science by Bornmann and Mutz (2015). If equation 4 is logarithmically transformed, the following regression function results:

$$\ln(y_t) = K + b_0 - \ln\!\left(e^{b_0} + (e^{K} - e^{b_0})\, e^{-b_1 t}\right) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2) \tag{5}$$

In the following, we call the “restricted exponential growth model (Verhulst-Pearl)” the “logistic growth model”.

Segmented regression
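The nested IF-THEN structure of the segmented model in this subsection (equation 6) can be illustrated with a small Python function that evaluates the piecewise linear mean of the log publication count for given parameters. It does not estimate the breakpoints, which the study does as part of the model fit. All parameter values and names below are hypothetical.

```python
import math

def segmented_log_volume(t, t0, b0, slopes, breakpoints):
    """Piecewise linear mean of log(y_t) as in equation 6, without error term.

    slopes      -- [b_1, ..., b_J], one growth parameter per segment
    breakpoints -- [a_1, ..., a_{J-1}], the years at which segments end
    """
    if len(slopes) != len(breakpoints) + 1:
        raise ValueError("need exactly one slope per segment")
    value = b0       # log volume at the start of the current segment
    start = t0       # a_0 = t_0, the starting year of the time series
    # The last segment has no upper bound, hence the infinite end point.
    for slope, end in zip(slopes, list(breakpoints) + [math.inf]):
        if t <= end:
            return value + slope * (t - start)
        value += slope * (end - start)   # full contribution of a closed segment
        start = end

# Hypothetical three-segment example (not estimates from the study).
t0, b0 = 1670, math.log(50)
slopes, breakpoints = [0.02, 0.05, 0.03], [1800, 1950]

for year in (1670, 1800, 1900, 2000):
    print(year, round(segmented_log_volume(year, t0, b0, slopes, breakpoints), 3))
```

Because each closed segment contributes its full length times its slope, the function is continuous at the breakpoints, which is what ties the segments of equation 6 together.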

In addition to the functional model, a statistical framework model is required. We used segmented regression, which defines the regression models for different time segments and can be represented in the form of nested IF-THEN clauses for each segment $j$. In the case of unrestricted growth in all segments $j$, the following overall model applies with year $t_0$ as the starting year of the time series (e.g., 1665):

$$
\begin{aligned}
&\text{IF } t \le a_1 \text{ THEN } \log(y_t) = b_0 + b_1 (t - t_0) + \varepsilon_t \\
&\text{ELSE IF } t \le a_2 \text{ THEN } \log(y_t) = b_0 + b_1 (a_1 - t_0) + b_2 (t - a_1) + \varepsilon_t \\
&\text{ELSE IF } t \le a_3 \text{ THEN } \log(y_t) = b_0 + b_1 (a_1 - t_0) + b_2 (a_2 - a_1) + b_3 (t - a_2) + \varepsilon_t \\
&\;\;\vdots \\
&\text{ELSE IF } t \le a_j \text{ THEN } \log(y_t) = b_0 + (j > 1) \Big( \sum_{k=1}^{j-1} b_k (a_k - a_{k-1}) \Big) + b_j (t - a_{j-1}) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2),
\end{aligned}
\tag{6}
$$

where $a_j$ denotes the year at which the $j$th time segment ends, and where $a_0 = t_0$, the starting year of the time series. In addition to the parameters of the growth model, the year parameters $a_1$ to $a_{j-1}$ are estimated. The same distribution of residuals is assumed for each segment.

The publication count is a count variable: it takes non-negative integer values. This implies that the values are Poisson distributed (Hilbe, 2014, p. 2). In this study, however, a logarithmic transformation (base e) of the publication data was favored over a Poisson model for the following reasons: (1) With regard to growth rates of science, unrestricted growth can be assumed, in which the logarithmic transformation leads to a simple linear regression function. The parameters of the function can be interpreted in terms of the original non-transformed growth function (Panik, 2014, p. 33). (2) If it can be demonstrated that the observed values are well explained by the function (because of low residual variance), then neither the distribution function nor the transformation plays a major role. (3) Due to the smaller scale of the values resulting from the log-transformation, there is a greater chance that complex statistical models converge in the estimation process.

Piecewise latent growth curve model with missing imputation
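The procedure described in this subsection ends by combining the results from five imputed datasets into one overall result with adjusted standard errors. A minimal sketch of such a pooling step is given below. It uses Rubin's rules, the standard combining rule for multiple imputation; that the authors' software applied exactly this rule is an assumption, and the five estimates and standard errors are made-up numbers.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine per-imputation estimates of one parameter with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])   # average sampling variance
    between = variance(estimates)                   # variance across imputations
    total = within + (1 + 1 / m) * between          # total variance
    return pooled, math.sqrt(total)

# Hypothetical growth parameter estimated on five imputed datasets.
estimates = [0.040, 0.041, 0.039, 0.042, 0.040]
std_errors = [0.002, 0.002, 0.002, 0.002, 0.002]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(round(pooled_estimate, 4), round(pooled_se, 5))
```

The pooled standard error is larger than each single-dataset standard error because the spread between the imputed datasets is added to the sampling variance, which mirrors the extra uncertainty introduced by the imputed values.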

In this study, we used data from several bibliographic databases. We therefore needed to find an answer to the question of how the various datasets reflecting the same information (scientific output) should be analyzed statistically. One option was to conduct the analyses for each database separately. This approach would accord with the analyses by Bornmann and Mutz (2015). Analyses for each database separately, however, run the risk of obtaining four different results that might reflect specific aspects of a database. Another option was to analyze the data from the different databases within one statistical model. This option would still require solutions to the following problems: (1) The time intervals at which publication data are available vary from database to database. The largest time interval (from 1665 to 2018) is available from Dimensions. Analyzing only the time interval for which all databases provide complete data would significantly limit the period of investigation of the development of science. (2) The publication data vary greatly in volume between the databases. Dimensions, for example, has the highest volume of publications when the entire time series is considered, whereas Web of Science has the comparatively lowest volume. Here, the question arises whether some form of data weighting according to volume is necessary.

The solution for these problems that we favored in this study was the application of the so-called “Latent Piecewise Growth Curve Model”. This model can be run in conjunction with an approach based on completed time series, i.e., incomplete time series are treated as statistical missing value problems (Capie & Wood, 1997; SAS Institute Inc., 2015). In the first step, based on the complete information across all four time series / databases, the missing values of a time series are imputed with estimated values.
To take into account the inaccuracy of values in the estimation (when imputed values are used), five imputed values are estimated for each missing value. In a second step, a segmented regression model is estimated for each of the five complete datasets with imputed values; the five results are then synthesized to an overall result that considers the inaccuracy of the missing imputation in the calculation of standard errors. We assume “Missing at random” (MAR) as a prerequisite for missing imputation in this study, where the missingness is not completely at random, but can possibly be explained by the other time series. Web of Science, for example, covers the range from the year 1900 until 2018, and Dimensions the range from 1670 to 2016. MAR requires that the missing values for Web of Science between 1670 and 1899 are not the results of intended actions by the database provider, Clarivate Analytics. In the case of intended actions, for example, the company would systematically (completely) leave out publication years with low publication counts. With such intended actions, MNAR (“Missing not at random”) would exist and imputation would lead to distortions. The assumed inaccuracy of the model estimation by missing imputation reflects the uncertainty of the historical perspective: The further the empirical analysis goes back in history, the more uncertain the results become. The statistical analyses in this study were done with the statistical software package SAS and the procedures PROC NLMIXED, PROC NLIN and PROC MI (SAS Institute Inc., 2015).

Results

In this section, the results of the model estimations are presented. The first five years of each time series were discarded for the estimations because they seemed to reflect only a pseudo segment or artifact without any empirical meaning. Therefore, the actual starting years were 1670 for Dimensions, 1805 for Microsoft Academic, 1905 for Web of Science, and 1866 for Scopus. Each time series ran until the year 2018.
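The starting years above follow from the first indexed year of each database plus the five discarded years. The short Python sketch below reproduces this arithmetic; the coverage years are taken from the database descriptions in the Methods section, while the helper `trim` and its toy input are hypothetical.

```python
# First indexed year per database, as reported in the Methods section.
coverage_start = {
    "Dimensions": 1665,
    "Microsoft Academic": 1800,
    "Web of Science": 1900,
    "Scopus": 1861,
}
DISCARDED_YEARS = 5

# Actual starting year of each time series used for the estimations.
analysis_start = {
    name: year + DISCARDED_YEARS for name, year in coverage_start.items()
}

def trim(series, first_year):
    """Keep only observations from first_year onwards (series: year -> count)."""
    return {year: count for year, count in series.items() if year >= first_year}

print(analysis_start)
print(trim({1861: 1, 1865: 2, 1866: 3}, analysis_start["Scopus"]))
```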

Model comparison

Statistical model comparisons make it possible to rule out unrealistic models with poor model fit in order to get the model with the relatively best fit to the data. The model formulation is associated with certain assumptions about scientific growth (see Table 1): (1) A model with unconstrained exponential growth can be distinguished from a model with logistic growth. (2) One can distinguish whether the models based on different bibliographic databases come to similar or different results (are there mixed effects or not?). (3) If there are significant differences between the results based on the databases, the following question would arise: Do the databases with a comparably high (low) volume of publications at the beginning of the time series show a high (low) increase in the later publication count? If so, the covariance or correlation between starting volume of publications and slope across the databases would be high (is there covariance or not?). (4) The models can provide different answers to the question of how many segments exist in the growth of science (how many segments can be distinguished?). Model comparisons can be undertaken based on Schwarz’s Bayesian information criterion (BIC). The smaller the BIC, the better the model fits the data (see Table 1). Comparing model 1 and model 2, it becomes clear that a simple fixed-effects model (M1) does not fit the data well. The differences between the growth curves based on the various databases are too large, so that a mixed-effects model (M2) can be assumed, which results in a significantly smaller BIC. The hypothesis of logistic growth can be rejected as well since the exponential model fits better. Among the models in Table 1, model M9 with five segments and a covariance of intercept and slope in the first segment fits best.
This result applies to all datasets (databases) considered in this study, which refer to (1) all publications, (2) Physical and Technical Sciences publications, and (3) Life Sciences publications. Since the explained variance, measured in terms of the coefficient of determination (R²), exceeds .99, any autocorrelation among the residuals can be neglected (equations 2 to 5). The covariance matrix of the residuals, CORR(ε_t, ε_{t-1}), is assumed to be an identity matrix I. The model comparison in Table 1 demonstrates that the assumption of constant scientific growth over time is not realistic; hence, we can start from the premise that periods with different growth rates exist. This premise seems reasonable since, for example, the history of the 20th century is characterized by two World Wars with drastic consequences for the science system worldwide. As the results by Bornmann and Mutz (2015) based on cited references data have shown, the negative effects of the World Wars on scientific activities are clearly visible. The estimated parameters of the model are reported in Table S1 in the Appendix.

Table 1. Model comparison using Schwarz's Bayesian information criterion (BIC) for publication data from different bibliographic databases including all publications, Life Sciences publications, and Physical and Technical Sciences publications

| M nr | Model description | Mixed effects? | Covariance components? | Number of segments | Number of parameters | BIC: All publications | BIC: Life Sciences | BIC: Physical and Technical Sciences |
|---|---|---|---|---|---|---|---|---|
| M1 | Exponential growth | No | No | 1 | 3 | 5,752.7 | 3,651.2 | 4,819.2 |
| M2 | Exponential growth | Yes | No | 1 | 5 | 3,224.1 | 2,786.7 | 3,325.0 |
| M3 | Exponential growth | Yes | Yes | 1 | 7 | 3,208.4 | * | 3,304.3 |
| M4 | Logistic growth | Yes | Yes | 1 | 7 | 3,285.1 | 2,798.4 | 3,290.9 |
| M5 | Segmented regression to M | Yes | No | 2 | 8 | 1,212.5 | 2,373.7 | 3,219.8 |
| M6 | Segmented regression to M | Yes | Yes | 2 | 10 | 1,209.1 | 2,364.5 | 3,209.7 |
| M7 | Segmented regression to M | Yes | Yes | 3 | 13 | 495.0 | -103.5 | 356.9 |
| M8 | Segmented regression to M | Yes | Yes | 4 | 16 | 193.9 | -233.8 | 56.6 |
| M9 | Segmented regression to M | Yes | Yes | 5 | 18 | 193.1 | -284.5 | -32.0 |

Note. * Iterations in the estimation process did not converge.

With respect to the single time series for the UK, a model with seven segments fits the GDP data best, and a model with eight segments shows the best fit for the publication counts (see Table 2). We additionally compared the models using the mean square error (MSE) and the BIC derived from the MSE to select certain models (Kim & Kim, 2016) (see Table S2 in the Appendix for the estimated parameters of the model).

Table 2. Model comparison using Schwarz's Bayesian information criterion (BIC) for publication data and gross domestic product (GDP) data of UK

| M nr | Model description | Number of segments | Number of parameters | Publication count: MSE | Publication count: BIC | GDP: MSE | GDP: BIC |
|---|---|---|---|---|---|---|---|
| M1 | Exponential growth | 1 | 2 | 0.152 | -419.82 | 1.056 | 27.62 |
| M2 | Logistic growth | 1 | 2 | 0.195 | -365.83 | 2.055 | 237.81 |
| M3 | Segmented regression to M | 2 | 4 | 0.150 | -414.51 | 0.058 | -881.90 |
| M4 | Segmented regression to M | 3 | 6 | 0.006 | -1,157.17 | 0.051 | -912.71 |
| M5 | Segmented regression to M | 4 | 8 | 0.005 | -1,174.78 | 0.038 | -995.51* |
| M6 | Segmented regression to M | 5 | 10 | 0.004 | -1,197.21* | 0.038 | -983.99* |
| M7 | Segmented regression to M | 6 | 12 | 0.002 | -1,354.62* | 0.024 | -1,121.85 |
| M8 | Segmented regression to M | 7 | 14 | 0.002 | -1,350.37 | 0.012 | -1,343.92 |
| M9 | Segmented regression to M | 8 | 16 | 0.002 | -1,356.72 | 0.029 | -1,042.64* |
| M10 | Segmented regression to M | 9 | 18 | 0.002 | -1,337.75* | 0.016 | -1,225.21 |

Notes. BIC = Schwarz's Bayesian information criterion, MSE = mean square error. The optimal models are the model with eight segments for publication counts (BIC = -1,356.72) and the model with seven segments for GDP (BIC = -1,343.92). * No convergence of the iterations in the estimation process.
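To illustrate the logic of the model comparison, the following Python sketch fits a one-segment and a two-segment log-linear growth model to synthetic data and compares them by BIC. It is a simplified illustration and not the estimation procedure used in this study: the data are simulated, the breakpoint is treated as known rather than estimated, ordinary least squares replaces the mixed-effects estimation, and the BIC is computed with the common Gaussian form n·ln(MSE) + k·ln(n), which may differ from the formula of Kim and Kim (2016). The function names and the parameter count k (regression coefficients only) are assumptions made for this example.

```python
import numpy as np


def design_matrix(t, breakpoints):
    """Intercept, linear time trend, and one hinge term max(0, t - c) per breakpoint c."""
    columns = [np.ones_like(t), t]
    columns += [np.maximum(0.0, t - c) for c in breakpoints]
    return np.column_stack(columns)


def fit_and_bic(t, y, breakpoints):
    """Least-squares fit of a continuous piecewise-linear model; returns (coefficients, BIC)."""
    X = design_matrix(t, breakpoints)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    n, k = X.shape  # k counts regression coefficients only (assumption of this sketch)
    mse = float(residuals @ residuals) / n
    bic = n * np.log(mse) + k * np.log(n)
    return beta, bic


# Synthetic log publication counts: the slope doubles from 0.03 to 0.06 at t = 100.
rng = np.random.default_rng(0)
t = np.arange(0.0, 200.0)
true_log_counts = 2.0 + 0.03 * t + 0.03 * np.maximum(0.0, t - 100.0)
y = true_log_counts + rng.normal(scale=0.05, size=t.size)

_, bic_one = fit_and_bic(t, y, [])               # one segment (unrestricted growth)
beta_two, bic_two = fit_and_bic(t, y, [100.0])   # two segments, breakpoint assumed known

# Slope of each segment on the log scale, converted to an annual growth rate.
slopes = [beta_two[1], beta_two[1] + beta_two[2]]
rates = [np.exp(s) - 1.0 for s in slopes]

print(f"BIC, one segment:  {bic_one:.1f}")
print(f"BIC, two segments: {bic_two:.1f}")
print("Annual growth rates per segment:", [f"{100 * r:.2f}%" for r in rates])
```

As in Tables 1 and 2, the model with the smaller BIC is preferred. In this sketch the two-segment model wins because the simulated series contains a genuine change in slope.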

Growth rates of science (all publications)

In our analyses of growth processes in science using publication data, we follow typical assumptions such as those formulated by Long and Fox (1995): “while research productivity is not strictly equivalent to publication productivity, publication is generally taken as an indication of research” (p. 51). Figure 1 shows the results of the unrestricted growth (M ) and segmented unrestricted growth (M ) models based on the data from Dimensions, Microsoft Academic, Scopus, and Web of Science. The graphs in the figure present the annual logarithmized number of publications cumulated across time. The grey dots represent the imputed values for missing data (for one imputation), the colored symbols the observed values (the raw data from the databases), and the black solid line (with the two black dashed lines) the predicted values from the regression analyses (with 95% prediction intervals).

As the results of the unrestricted growth model (M ) in Figure 1a show, the overall growth rate amounts to 4.02% with a doubling time of 16.8 years. As the model comparison in section 3.1 revealed, a model with five segments fits the data best. The results of this model are presented in Figure 1b. The colored dashed lines show the individual regression lines based on the data from the various databases, and the black solid line shows the overall regression for the whole data (across all databases). The symbols represent single values, either observed (colored symbols) or imputed (grey dots). The results in the figure show that, with the exception of the results based on the Scopus data for the first segment, the predicted values from the regression (dashed lines) cover the observed values (points) very well.

Figure 1. Plots for a) unrestricted growth (M ) and b) segmented unrestricted growth (M ) based on the number of publications from four bibliographic databases

The five segments in Figure 1b seem to represent separate historical epochs in the modern history of science. These segments with different growth rates are related to phases of economic (e.g., industrialization) and/or political developments (e.g., World Wars):

1. Phase: Emergence of modern physics and pre-industrialization (1675-1814). The phase up to the end of the Napoleonic wars is characterized by a moderate annual growth of 2.9% and a doubling time of 24.2 years, i.e., the volume of publications doubles within 24.2 years. This early phase of science is characterized by major discoveries in physics by Isaac Newton (1643-1727) and the development of the steam engine (James Watt, from 1769).
2. Phase: Industrial Revolution (1815-1880). In this phase of the Industrial Revolution, science grew very strongly with an annual growth rate of 6.15% and a doubling time of 11.6 years.
3. Phase: Economic crises and periods of World Wars (1881-1943). The development of science flattened out with an annual growth rate of 3.85% and a doubling time of 18.4 years. In this period, two economic depressions and two World Wars took place. The “long depression” is a period that started in 1873 and ended in 1896. The period is mainly characterized by a deflation in the USA and Europe (Capie & Wood, 1997). The “long depression” can be distinguished from the “Great Depression” that ranged from 1929 until the beginning of the Second World War.
4. Phase: End of Second World War (1943-1946). Overall (across the data from all databases), a decrease in scientific production of -1.5% can be observed.
5. Phase: Post-war period (after 1946 until today). Since 1946, science has grown exponentially without restrictions with an annual growth rate of 5.1% and a doubling time of 13.8 years.
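The growth rates and doubling times reported for the phases are linked by the standard compound-growth relation T = ln 2 / ln(1 + r), where r is the annual growth rate. The following Python sketch illustrates this conversion under the assumption that the reported rates are annual compound rates; the function names are illustrative and not taken from the study. Up to rounding, it reproduces, for example, the pairs 2.9% and 24.2 years and 6.15% and 11.6 years.

```python
import math


def doubling_time(rate):
    """Years needed for a quantity to double at a constant compound annual growth rate."""
    if rate <= 0:
        raise ValueError("doubling time is only defined for positive growth rates")
    return math.log(2) / math.log1p(rate)


def rate_from_doubling_time(years):
    """Inverse conversion: annual growth rate implied by a given doubling time."""
    return 2 ** (1 / years) - 1


# Growth rates taken from phases 1 and 2 above.
for label, rate in [("1675-1814", 0.029), ("1815-1880", 0.0615)]:
    print(f"{label}: {100 * rate:.2f}% per year, doubling time {doubling_time(rate):.1f} years")
```

For small rates, ln(1 + r) is close to r, so the doubling time is roughly 0.69 / r; the exact form is used here because the rates in question are several percent per year.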

Growth rates of science for Life Sciences and Physical and Technical Sciences

In addition to the analyses including all publications, we conducted analyses for two broad fields: Life Sciences and Physical and Technical Sciences. The estimated parameters of the models are reported in Table S1 in the Appendix. The results are visualized in Figure 2 and Figure 3. With the comparison of the two broad fields, we wanted to find out whether different fields are characterized by similar or different growth rates in their historical developments. As the results in Figure 2a show, the overall annual average growth rate for Life Sciences amounts to 4.94% with a doubling time of 14.4 years. The results for the Physical and Technical Sciences are similar, with a growth rate of 5.34% and a doubling time of 13.3 years. In agreement with the results for all publications in Figure 1b, the predicted values of the segmented regression model (dashed lines) cover the observed values (points) very well (high amount of explained variance) in both broad fields (see Figure 2b and Figure 3b). In both figures, we can observe trends that, although not completely congruent with the trends based on all publications, roughly illustrate the four central stages in the development of science and society: pre-industrialization (until 1810), Industrial Revolution (from 1810 onwards), Second World War (1939-1945) with a decline in the volume of publications, and the post-World War period after 1945. In the segment reflecting the period after 1945, the growth in the Physical and Technical Sciences (annual growth rate of 5.8%, doubling time of 12.3 years) is higher than the growth in the Life Sciences (annual growth rate of 4.9%, doubling time of 14.5 years). The growth rate in the Physical and Technical Sciences is also (slightly) higher than the growth rate that we calculated based on all publications: 5.1%.

Figure 2. Plots for a) unrestricted growth (M ) and b) segmented unrestricted growth (M ) based on the number of publications in Life Sciences

Figure 3. Plots for a) unrestricted growth (M ) and b) segmented unrestricted growth (M ) based on the number of publications in Physical and Technical Sciences

Comparative analysis of growth rates of science and of gross domestic product in UK

For a comparative analysis of economic and scientific growth, we used data from the UK as explained in section 1. The publication counts were obtained from the Dimensions database. The average growth rate of science in the UK since 1780 is 4.85% (see Figure 4a). This corresponds to a doubling time of 14.64 years. This growth rate is slightly higher than the average worldwide growth rate of 4.02% (see Figure 1a). The statistical analysis revealed eight segments with different growth rates (see Figure 4b). The growth is, therefore, more differentiated than the overall growth with five segments (see Figure 1b). Between 1780 and 1805 (pre-industrialization) as well as between 1805 and 1844 (early industrialization), a strong growth of 7.45% and 5.76%, respectively, can be observed. The growth weakens to 3.63% in the phase of industrialization from 1848 through the First World War and the 1920s. Comparable to the worldwide results (see Figure 1b), a significant slowdown in scientific growth with a growth rate of 2.45% is apparent around the Second World War (between 1940 and 1948). While the overall analysis shows an unrestricted exponential growth after 1945 (see Figure 1b), the growth of science in the UK took place in three stages: a strong growth of 6.58% until 1959, which intensified between 1959 and 1983 (8.3%) and slowed down to 6.22% in the years after 1983. The growth rates in these three segments are even higher than the worldwide growth rate of 5.15% in the corresponding time segment (between 1945 and 2018). At the beginning of the 1980s, Margaret Thatcher was Prime Minister of the UK; her party, the Conservative Party, won the majority in the House of Commons for the second time in 1983.

Figure 4. Plots for a) unrestricted growth (M ) and b) segmented unrestricted growth (M ) based on the number of publications from UK (using Dimensions data)

Figure 5. Plots for a) unrestricted growth (M ) and b) segmented unrestricted growth (M ) based on gross domestic product (GDP) data from UK (source: FRED Economic Research)

Figure 5 shows the annual GDP growth rate between 1700 and 2018 for comparison with the publication numbers. The figure is based on the logarithmized annual GDP, presented as raw data and predicted values from the regression model. Previous studies investigating the relationship between economic and scientific growth have demonstrated positive relationships (e.g., Halpenny, Burke, McNeill, Snow, & Torreggiani, 2010; Hart & Sommerfeld, 1998; Ntuli, Inglesi-Lotz, Chang, & Pouris, 2015). The results in Figure 5a reveal an annual GDP growth rate of 3% and a doubling time of 23.5 years, which is lower than the growth rate based on publication counts of 4.85% (see Figure 4a). At first glance, economic growth and scientific growth do not seem to be necessarily linked. A more detailed view shows, however, that the two growth processes are related at certain points in time (see Figure 4b and Figure 5b). For example, science and economy grew at a comparable rate from 1780 to the beginning of the 19th century (1810, 1805), i.e., in the phase of pre-industrialization: whereas the economy grew by 4.2%, science grew by 5.76%. Furthermore, there is a coupling of economic and scientific development at the beginning of industrialization in the 1840s (1843, 1844) with a moderate annual growth rate of 2.34% in the economy and 3.63% in science. A last temporal coupling can be observed in the years after the Second World War with a strong economic growth, especially from 1969 to 1987 (13.5%). Three years later, in 1990, Margaret Thatcher resigned as Prime Minister. While the slowdown in the economy did not begin until after 1987, science began to grow at a rate of only 6.22% as early as 1983.

Discussion

Modern science is based on knowledge-producing institutions and processes (Gieryn, 1982). Current research is a method of “systematically exploring the unknown to acquire knowledge and understanding. Efficient research requires awareness of all prior research and technology that could impact the research topic of interest, and builds upon these past advances to create discovery and new advances” (Kostoff & Shlesinger, 2005, p. 199). Society expects a steady increase in scientific growth, since only considerable growth processes would reflect more knowledge about and understanding of the world. Measurements of scientific growth processes are usually based on publication numbers, since the results of research mostly appear in publications: “in academic institutions, publications constitute in all scientific-scholarly subject fields an important form of academic output” (Moed, 2017, p. 63). The results of Digital Science (2016) show that the journal article in particular is becoming increasingly popular as a medium for presenting scientific results. The popularity of journal articles could also be the consequence of higher than average growth in disciplines using journal articles. Researchers' motivation for publishing their results (in journal articles) is fostered especially by the specificity of the scientific reward system: “Publications have another function as well [besides the open availability of research results]: The principal way for a scholar to be rewarded for his contribution to the advancement of knowledge is through recognition by peers. In order to receive such an award, scholars publish their findings openly, so that these can be used and acknowledged by their colleagues” (Moed, 2017, p. 62). Although the publication of findings is so basic in science, researchers also present their findings in other forms of output (e.g., patents or presentations).
An overview of indicators for measuring productivity based on these other forms can be found in Godin (2009). The problem with most of these indicators for measuring productivity or scientific growth, however, is that annual and historical data without missing values are scarcely available. In this study, we used publication data from four literature databases to investigate scientific growth processes from the beginning of the modern science system until today. The results of the unrestricted growth model show that the overall growth rate amounts to 4.02% with a doubling time of 16.8 years. This annual growth rate (over the various databases) is different from the Web of Science growth rate of 2.96% reported in Bornmann and Mutz (2015), since we considered in the current study a significantly longer time period than Bornmann and Mutz (2015): from 1900 until 2018 in this study (119 years) versus from 1980 until 2012 (33 years) in Bornmann and Mutz (2015). As the comparison of various segmented regression models in the current study revealed, the model with five segments fits the data best. We demonstrated that these segments with different growth rates can be interpreted very well, since they are related to phases of economic (e.g., industrialization) and/or political developments (e.g., World Wars). The war efforts (allocation of funds) evidently led to a visible decline in research as measured by publication output, but research went on nevertheless, possibly with even more vigor. However, that research was not made openly available for security reasons, and researchers from physics to languages and from material science to mathematics and emerging computer science were pulled in for the war effort. Arguably, the results of war-time research triggered post-war discoveries, too.
We additionally undertook two further analyses focusing on (1) growth in two broad fields (Life Sciences and Physical and Technical Sciences) as well as (2) the relationship between scientific and economic growth. (1) The comparison between the two broad fields revealed that although slight differences are observable, these differences are not so great that they can be denoted as fundamental. For example, whereas the overall annual average growth rate for Life Sciences is 4.94% with a doubling time of 14.4 years, the overall growth rate for Physical and Technical Sciences is 5.34% with a doubling time of 13.3 years. (2) In the investigation of the relationship between scientific and economic growth, we focused on the UK, one of the few countries with corresponding available (historical) data. The results showed that the growth rate of the UK's number of publications (4.85%) is slightly higher than the average worldwide growth rate (4.02%). Furthermore, the results demonstrated that the growth of the UK's number of publications is more differentiated (with eight segments) than the worldwide growth (with five segments). The comparison of the British economic and scientific growth rates revealed that the GDP growth rate is lower than the scientific growth rate (3% versus 4.85%). This lower rate might be expected, since not all scientific results are relevant for the economy. The scientific growth rates, which mostly increased over the historical development, can be interpreted in two ways: Either researchers were able to publish more publications in the same time, or the increased publication counts can be traced back to an increase in the number of researchers. The study by Fanelli and Larivière (2016) targeted this question. Their results pointed to the second interpretation being more plausible.
Fanelli and Larivière (2016) analyzed “individual publication profiles of over 40,000 scientists whose first recorded paper appeared in the Web of Science database between the years 1900 and 1998, and who published two or more papers within the first fifteen years of activity – an ‘early-career’ phase in which pressures to publish are believed to be high. As expected, the total number of papers published by scientists has increased, particularly in recent decades. However, the average number of collaborators has also increased, and this factor should be taken into account when estimating publication rates. Adjusted for co-authorship, the publication rate of scientists in all disciplines has not increased overall, and has actually mostly declined” (Fanelli & Larivière, 2016). Two limitations mentioned by Bornmann and Mutz (2015) are still valid for the current study and should be considered in the interpretation of the results. The first limitation refers to the use of publication counts to measure growth processes. According to Tabah (1999), there are advantages and disadvantages in using these numbers: “although counting publications is simple and relatively straightforward, interpretation of the data can create difficulties that have in the past led to severe criticisms of bibliometric methodology … The main problems concern the least publishable unit (LPU), disciplinary variance, variance in quality of work, and variance in journal quality” (p. 264). The second limitation concerns the interpretation of “growth” as an “increase in numbers”. According to Bornmann and Mutz (2015), “it is not clear whether an ‘increase in numbers’ is directly related to an ‘increase of actionable knowledge’, for example for reducing needs, extending our knowledge about nature in some lasting way or some other ‘higher purposes’” (p. 2221).

Acknowledgements

The Microsoft Academic and Dimensions data used in this paper are from a locally maintained database at the Max Planck Institute for Solid State Research derived from the snapshots provided by Microsoft and Digital Science, respectively. Web of Science and Scopus data were retrieved using the corresponding web interfaces. We thank members of the Digital Science team for providing us with feedback on an earlier version of our manuscript.

References

Baas, J., Schotten, M., Plume, A., Côté, G., & Karimi, R. (2020). Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quantitative Science Studies, 1(1), 377-386. doi: 10.1162/qss_a_00019.

Birkle, C., Pendlebury, D. A., Schnell, J., & Adams, J. (2020). Web of Science as a data source for research on scientific and scholarly activity. Quantitative Science Studies, 1(1), 363-376. doi: 10.1162/qss_a_00018.

Bornmann, L., & Mutz, R. (2015). Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. Journal of the Association for Information Science and Technology, 66(11), 2215-2222. doi: 10.1002/asi.23329.

Capie, F., & Wood, G. (1997). Great depression of 1873–1896. In D. Glasner & T. F. Cooley (Eds.), Business cycles and depressions: An encyclopedia (pp. 148-149). New York: Garland Publishing.

Clarivate. (2020). Book Citation Index - Clarivate Analytics. Retrieved 24 September 2020, from http://wokinfo.com/products_tools/multidisciplinary/bookcitationindex/

de Bellis, N. (2009). Bibliometrics and citation analysis: From the Science Citation Index to cybermetrics. Lanham, MD, USA: Scarecrow Press.

Digital Science. (2016). Publication patterns in research underpinning impact in REF2014. London, UK: Digital Science.

Elsevier. (2020). Books | Elsevier Scopus Blog. Retrieved 24 September 2020, from https://blog.scopus.com/topics/books

Fanelli, D., & Larivière, V. (2016). Researchers' individual publication rate has not increased in a century. PLOS ONE, 11(3), e0149504. doi: 10.1371/journal.pone.0149504.

Fernald, J. G., & Jones, C. I. (2014). The future of U.S. economic growth (January 2014). NBER Working Paper No. w19830. Retrieved January 22, 2014, from http://ssrn.com/abstract=2384289

Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S., . . . Barabási, A.-L. (2018). Science of science. Science, 359(6379). doi: 10.1126/science.aao0185.

Gallant, A. R., & Fuller, W. A. (1973). Fitting segmented polynomial regression models whose join points have to be estimated. Journal of the American Statistical Association, 68(341), 144-147.

Gieryn, T. F. (1982). Relativist/constructivist programs in the sociology of science - redundance and retreat. Social Studies of Science, 12(2), 279-297.

Godin, B. (2009). The value of science: Changing conceptions of scientific productivity, 1869 to circa 1970. Social Science Information Sur Les Sciences Sociales, 48(4), 547-586. doi: 10.1177/0539018409344475.

Halpenny, D., Burke, J., McNeill, G., Snow, A., & Torreggiani, W. C. (2010). Geographic origin of publications in radiological journals as a function of GDP and percentage of GDP spent on research. Academic Radiology, 17(6), 768-771. doi: 10.1016/j.acra.2010.01.020.

Hart, P. W., & Sommerfeld, J. T. (1998). Relationship between growth in gross domestic product (GDP) and growth in the chemical engineering literature in five different countries. Scientometrics, 42(3), 299-311. doi: 10.1007/Bf02458373.

Herzog, C., Hook, D., & Konkiel, S. (2020). Dimensions: Bringing down barriers between scientometricians and data. Quantitative Science Studies, 1(1), 387-395. doi: 10.1162/qss_a_00020.

Hilbe, J. M. (2014). Modeling count data. Cambridge: Cambridge University Press.

Hook, D. W., Porter, S. J., & Herzog, C. (2018). Dimensions: Building context for search and evaluation. Frontiers in Research Metrics and Analytics, 3(23). doi: 10.3389/frma.2018.00023.

Kim, J., & Kim, H. J. (2016). Consistent model selection in segmented line regression. Journal of Statistical Planning and Inference, 170, 106-116. doi: 10.1016/j.jspi.2015.09.008.

King, D. A. (2004a). Correction. Nature, 432(7013), 8-8. doi: 10.1038/432008b.

King, D. A. (2004b). The scientific impact of nations. Nature, 430(6997), 311-316. doi: 10.1038/430311a.

Kostoff, R. N., & Shlesinger, M. F. (2005). CAB: Citation-assisted background. Scientometrics, 62(2), 199-212. doi: 10.1007/s11192-005-0014-8.

Long, J. S., & Fox, M. F. (1995). Scientific careers - universalism and particularism. Annual Review of Sociology, 21, 45-71.

Marx, W., & Bornmann, L. (2016). Change of perspective: Bibliometrics from the point of view of cited references-a literature overview on approaches to the evaluation of cited references in bibliometrics. Scientometrics, 109(2), 1397-1415. doi: 10.1007/s11192-016-2111-2.

May, R. M. (1997). The scientific wealth of nations. Science, 275(5301), 793-796.

McGee, V. E., & Carleton, W. T. (1970). Piecewise regression. Journal of the American Statistical Association, 65(331), 1109-1124. doi: 10.1080/01621459.1970.10481147.

Merton, R. K. (1988). The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property. ISIS, 79(4), 606-623.

Moed, H. F. (2017). Applied evaluative informetrics. Heidelberg, Germany: Springer.

Ntuli, H., Inglesi-Lotz, R., Chang, T., & Pouris, A. (2015). Does research output cause economic growth or vice versa? Evidence from 34 OECD countries. Journal of the Association for Information Science and Technology, 66(8), 1709-1716. doi: 10.1002/asi.23285.

Panik, M. J. (2014). Growth curve modelling. New York: Wiley.

Price, D. J. D. (1965). Networks of scientific papers. Science, 149(3683), 510-515.

Price, D. J. d. S. (1951). Quantitative measures of the development of science. Archives Internationales d'Histoire des Sciences, 14, 85-93.

Price, D. J. d. S. (1961). Science since Babylon. New Haven, CT, USA: Yale University Press.

Price, D. J. d. S. (1963). Little science, big science. New York, NY, USA: Columbia University Press.

Salter, A. J., & Martin, B. R. (2001). The economic benefits of publicly funded basic research: A critical review. Research Policy, 30(3), 509-532.

SAS Institute Inc. (2015). SAS/STAT 14.1 user's guide. Cary, NC: SAS Institute Inc.

Schwarz, C. J. (2015). Regression - hockey sticks, broken sticks, piecewise, change points. In Course Notes for Beginning and Intermediate Statistics.

Tabah, A. N. (1999). Literature dynamics: Studies on growth, diffusion, and epidemics. Annual Review of Information Science and Technology, 34, 249-286.

Taylor, M. (2020). Open Access Books in the Humanities and Social Sciences: An Open Access Altmetric Advantage. arXiv e-prints, arXiv:2009.10442.

Toms, J. D., & Lesperance, M. L. (2003). Piecewise regression: A tool for identifying ecological thresholds. Ecology, 84(8), 2034-2041. doi: 10.1890/02-0472.

Tsoularis, A., & Wallace, J. (2002). Analysis of logistic growth models. Mathematical Biosciences, 179(1), 21-55. doi: 10.1016/S0025-5564(02)00096-2.

Valsamis, E. M., Ricketts, D., Husband, H., & Rogers, B. A. (2019). Segmented linear regression models for assessing change in retrospective studies in healthcare. Computational and Mathematical Methods in Medicine, 2019. doi: 10.1155/2019/9810675.

van Raan, A. F. J. (1999). Advanced bibliometric methods for the evaluation of universities. Scientometrics, 45(3), 417-423.

Visser, M., van Eck, N. J., & Waltman, L. (2020). Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. arXiv:2005.10732. Retrieved from https://ui.adsabs.harvard.edu/abs/2020arXiv200510732V

Wagner, A. K., Soumerai, S. B., Zhang, F., & Ross-Degnan, D. (2002). Segmented regression analysis of interrupted time series studies in medication use research. Journal of Clinical Pharmacy and Therapeutics, 27(4), 299-309. doi: 10.1046/j.1365-2710.2002.00430.x.

Wang, K., Shen, Z., Huang, C., Wu, C.-H., Dong, Y., & Kanakia, A. (2020). Microsoft Academic Graph: When experts are not enough. Quantitative Science Studies, 1(1), 396-413. doi: 10.1162/qss_a_00021.

Appendix

Table S1. Results of the mixed effects segmented regression model (M ) with five segments and missing imputation

Columns: Model part | Parameter | All publications (Value, Standard error, Doubling time [Year]) | Life Sciences (Value, Standard error, Doubling time [Year]) | Physical and Technical Sciences (Value, Standard error, Doubling time [Year])

Fixed effects: Intercept b -0.015 0.054 -45.5 0.007 0.020 94.0 -0.050 0.098 -13.4 Segment 5 b

Random effects: u σ u0 σ u1 u1u0 -.846 0.798 -.961* 0.043 -.885* 0.109 u σ u2 σ u3 σ u5 j σ

Table S2. Results of the segmented regression models with seven segments for gross domestic product (M ) and eight segments for publication counts (M ) for the UK data

Columns: Model part | Parameter | Gross domestic product (Value, Standard error, Doubling time) | Publication counts (Value, Standard error, Doubling time)

Intercept b jj