Day of the week submission effect for accepted papers in Physica A, PLOS ONE, Nature and Cell
Catalin Emilian Boja, Claudiu Herteliu, Marian Dardala, Bogdan Vasile Ileanu
Catalin Emilian Boja a, Claudiu Herțeliu b,*, Marian Dârdală a, Bogdan Vasile Ileanu b
a Department of Economic Informatics and Cybernetics, Bucharest University of Economic Studies, Bucharest, Romania
b Department of Statistics and Econometrics, Bucharest University of Economic Studies, Bucharest, Romania
* Correspondence address. E-mail: [email protected], [email protected], Phone: +4 0722 455 586
th August 2018
Abstract:
The particular day of the week when an event occurs seems to have unexpected consequences. For example, the day of the week when a paper is submitted to a peer-reviewed journal correlates with whether that paper is accepted. Using an econometric analysis (a mix of log-log and semi-log models based on undated and panel-structured data), we find that more papers are submitted to certain peer-reviewed journals on particular weekdays than on others, with fewer papers being submitted on weekends. Seasonal effects, geographical information and potential changes over time are also examined. The finding rests on a large (178 000 papers) and reliable sample; the journals examined are broadly recognized (Nature, Cell, PLOS ONE and Physica A). The day of the week effect in the submission of accepted papers should be of interest to many researchers, editors and publishers, and perhaps also to managers and psychologists.
Keywords: day of the week effect, DWE, peer reviewed journals, seasonal effects, GIS
1. Introduction
In most of the world, weekends are set apart from weekdays. This delineation is a social convention and at first glance nothing seems to distinguish one day from another. However, Rossi and Rossi (1977) found that positive moods are more likely to occur at weekends. French (1980) found that Monday stock returns tend to be negative while returns on Tuesday, Wednesday, Thursday and Friday tend to be positive. Other authors (Chang et al., 1993; Berument & Kiymaz, 2001; Bhattacharya et al., 2003; Fidrmuc & Tena, 2015; Dhesi et al., 2016) have also studied the day of the week effect (DWE) in different financial settings. More recently, DWEs have been identified in other areas such as ecology (Marr & Harley, 2002), medicine (Bell & Redelmeier, 2001), demography (Herteliu et al., 2015), meteorology (Dessens et al., 2001), risk calculations (Doherty et al., 1998) and scientometrics (Cabanac & Hartley, 2013; Campos-Arceiz et al., 2013; Hartley & Cabanac, 2017; Ausloos et al., 2016; Ausloos et al., 2017).

Cabanac & Hartley (2013) studied the acceptance of technical manuscripts by journals and found large differences between week-day and week-end submissions. They found that submissions are at a maximum on Mondays. Hartley & Cabanac (2017) confirmed this result and also noticed that submissions are at a minimum at weekends. Ausloos et al. (2016) found that papers submitted on either Tuesdays or Wednesdays were more likely to be accepted for publication than papers submitted on weekends. Other reports (Shalvi et al., 2010; Bornmann & Daniel, 2011; Schreiber, 2012) observed a seasonal effect, the so-called “seasonal overloading of editorial desks”, for decisions made during the peer review process. Here, using a large data set of papers published in academic journals, we confirm a DWE for the submission day of accepted papers; seasonal and geographical signals, however, are less noticeable.

Among accepted papers, Tuesday is the most frequent submission day for Physica A and Nature, whereas Wednesday is the most frequent submission day for PLOS ONE. For Cell, Mondays and Tuesdays are the most frequent submission days of accepted papers. Relative to previous research, we introduce methodological improvements (a meta-data scraper). Even where we confirm conclusions from previous studies, we rely on different econometric models (a mix of log-log and semi-log models based on undated and panel-structured data) and visual elements (localization quotients).
2. Materials and Methods
The core data upon which the current paper relies are wholly public. Our approach is similar to that of Cabanac & Hartley (2013). Even journals that hide papers behind a paywall freely offer (i) abstracts, (ii) authors’ information and (iii) a brief history of the peer review process on their web sites. Using a tailor-made web scraper we extracted information from the following journals: Nature, Cell, PLOS ONE and Physica A: Statistical Mechanics and its Applications. The date when a manuscript is received by the journal’s system is provided in the metadata and is labelled in a similar manner: Nature uses “date received” while Cell, PLOS ONE and Physica A use “received”. If a first parsing of a paper’s information failed to retrieve the required data, the algorithm tried to parse it again. In this way, we generated a large number of requests, and sometimes a publisher’s server banned our web scraper for several days. For this reason, the retrieved information is not exhaustive. Nevertheless, we recovered a large share of the available data in our target journals (Physica A: 83.4%; PLOS ONE: 99.9%; Nature: 76.4% and Cell: 68.7%; see Table 1) and believe it is representative of the total. Moreover, since only electronic submissions offer precise information, we collected data only for submissions after 2000. Papers that apparently did not pass a peer review process (editors’ lists, editorials, corrigenda, errata, etc.), signaled by the absence of a reception date for either an initial or a revised submission, were removed from the data set.
Table 1.
Parsed papers sample by journal
No. | Journal | Time span (publication) | Papers retrieved | Items in Web of Science (WoS) | Citable items in WoS | Share of retrieved papers (%)
(0) | (1) | (2) | (3) | (4) | (5) | (6) = (3)/(5)*100
1. | Nature | 2010-2016 | 4 653 | 18 462 | 6 092 | 76.4
2. | Cell | 2002-2016 | 3 777 | 8 408 | 5 497 | 68.7
3. | PLOS ONE | 2006-2016 | 160 172 | 168 289 | 160 306 | 99.9
4. | Physica A | 2001-2016 | 9 825 | 11 998 | 11 779 | 83.4
 | Total | | 178 427 | 207 107 | 183 674 | 97.1
Note:
We distinguish between the number of published papers/items according to a simple WoS query (WWW1) and the citable items as referred to in the Journal Citation Reports, JCR (WWW2). The Items per Citable Items Ratio (ICIR) varies: 1.02 (Physica A), 1.53 (Cell), 1.05 (PLOS ONE), 3.03 (Nature). Since JCR 2016 has not been released yet, historical ICIR values were used to estimate the number of citable items in 2016. The indexing process in WoS is not exactly synchronized with the calendar. E.g., on the 17th of May 2018, the most recent issue of Physica A (506) is (will be!) published on the 15th of September 2018, while for Nature the most recent issue (7705) was published on the 17th of May 2018. Therefore, the share of retrieved items in our dataset can sometimes exceed 100% (which is actually the case for PLOS ONE in 2009: 103.3%).

We gathered the submission dates of accepted papers using a Web scraper (Kobayashi & Takeda, 2000) developed to automatically search for and retrieve the needed information on the journals’ Web sites. The software application was developed by the authors on a Java platform. For parsing Web page content and extracting the searched information we used the open-source Jsoup library (WWW3). Retrieved data were stored in a MySQL database (WWW4). The Web scraper recursively executed the following sequence of steps for each analyzed journal:
1. The process starts with the journal Web page that contains the web archive information for each issue. Starting from this page the scraper was able to extract the web link for each issue. Each link was recorded, to avoid reaching the same link in future searches, and it was accessed, to process each issue’s Web page;
2. From each issue’s Web page, recorded in the previous step, we extracted information for each article: title, authors, pages (if available) and the links to their Web pages for further processing. The search sequence for articles was implemented after a prior analysis of the Web page structure. For example, each online issue of the PLOS ONE journal defined the articles using a list of div HTML tags, each having the “article-block” class.
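The second step can be illustrated with a minimal sketch. The authors' scraper was written in Java with Jsoup; the version below is only a stand-in using Python's standard `html.parser`, and the sample markup, the `extract_article_links` name and the rule of taking the first link in each block are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ArticleBlockParser(HTMLParser):
    """Collect the first link found inside each <div class="article-block">."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # div nesting depth inside the current article block
        self.links = []   # one entry (href or None) per article block

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth > 0:
                self.depth += 1              # nested div inside a block
            elif "article-block" in classes:
                self.depth = 1               # a new article block starts
                self.links.append(None)
        elif tag == "a" and self.depth > 0 and self.links[-1] is None:
            self.links[-1] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def extract_article_links(html):
    """Return the article links of one issue page (hypothetical helper)."""
    parser = ArticleBlockParser()
    parser.feed(html)
    return [href for href in parser.links if href]


# Hypothetical issue page, not taken from any journal site.
SAMPLE = """
<div class="issue">
  <div class="article-block"><a href="/article/1">First paper</a></div>
  <div class="article-block"><div class="meta"><a href="/article/2">Second</a></div></div>
  <div class="other"><a href="/about">About</a></div>
</div>
"""
print(extract_article_links(SAMPLE))
```

Links outside an “article-block” div are ignored, which mirrors the idea of selecting articles by their container class rather than by position on the page.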
For each paper which passed a peer review process several variables were recorded: (i) journal; (ii) title; (iii) volume; (iv) author(s); (v) initial reception date; (vi) revised version reception date, if any; (vii) acceptance date, if any; (viii) online availability date, if any and (ix) number of pages, if available. Numerical values were assigned as follows: (x) number of pages, if available; (xi) number of authors; (xii) week-day of initial submission (1 for Monday, 2 for Tuesday, 3 for Wednesday, 4 for Thursday, 5 for Friday, 6 for Saturday and 7 for Sunday); (xiii) week-day of revised version (same coding as for variable xii); (xiv) week of initial submission (a number between 1 and 53); (xv) year of initial submission; (xvi) year of publication/acceptance. In a second round of data collection, information regarding (xvii) the corresponding authors’ affiliation country was recorded. For a few thousand papers, the affiliation country was chosen manually. A few hundred papers for which the affiliation country was not clear enough (e.g. multiple corresponding authors from multiple countries, international organizations with no clear affiliation country, not enough data etc.) were withdrawn from the database.

For each journal, the distribution of papers by day of the week was tested (via Chi Square) against uniformity. The critical value for a significance level of 0.05 and 6 degrees of freedom is 12.59. The same approach was applied to the consolidated dataset. In addition, the same procedure was applied considering only week-days, the critical value being in this case 9.49 (for a significance level of 0.05 and 4 degrees of freedom). Furthermore, in order to test the sensitivity of deviations from uniformity, a regression model was designed. The time span for rolling over the calendar is the week.
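The uniformity test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the daily counts are hypothetical, and only the two critical values (12.59 and 9.49) come from the text.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


CRITICAL_7_DAYS = 12.59  # significance level 0.05, 6 degrees of freedom (from the text)
CRITICAL_5_DAYS = 9.49   # significance level 0.05, 4 degrees of freedom (from the text)

# Hypothetical counts of papers received Monday..Sunday.
week_counts = [180, 185, 175, 170, 160, 70, 60]

stat_all = chi_square_uniform(week_counts)           # all seven days
stat_weekdays = chi_square_uniform(week_counts[:5])  # Monday to Friday only

print(stat_all > CRITICAL_7_DAYS)       # uniformity over seven days rejected?
print(stat_weekdays > CRITICAL_5_DAYS)  # uniformity over week-days rejected?
```

With these invented counts, the seven-day distribution departs strongly from uniformity while the week-day-only distribution does not, which is the kind of contrast the two tests are designed to expose.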
For each week and for each journal the following sequence was computed:

$$UD_{ij} = \frac{N_{ij}}{7} = \frac{1}{7}\sum_{k=1}^{7} n_{kij} = \frac{1}{7}\sum_{k=1}^{7}\sum_{a=1}^{p} a_{kij} \qquad (1)$$

where $UD_{ij}$ denotes the uniform distribution of papers per day submitted during week i to journal j, $N_{ij}$ is the total number of papers received during week i by journal j and $n_{kij}$ is the number of papers submitted on the k-th day of the week, from Monday (1) to Sunday (7), in week i of the year, for a particular journal j. The lowest unit recorded is the article a, which is referred to by the day of the week of submission (k), the week of the year (i) and the journal to which it was submitted (j). On a given day k of week i, journal j receives from 0 to a maximum of p articles. In this context, the ratio to uniform distribution (RUD) for a specific day k of week i for journal j is defined as follows:

$$RUD_{kij} = \frac{n_{kij}}{UD_{ij}} \qquad (2.1)$$

N.B. All “a” articles submitted on the same day of the week (k), in the same week (i) of the year, have the same RUD. For the consolidated data set of the journals, we have:

$$RUD_{ki} = \frac{\sum_{j=1}^{p} n_{kij}}{UD_{i}} \qquad (2.2)$$

where p = 4 represents the number of journals taken into consideration.

To make clearer how the dependent variable is computed, an extract from the database of 47 PLOS ONE papers received in 14 weeks is presented in table 2. The step by step computation details are also available in table 2. The reception flow is quite diverse: there are weeks with one, two, three or four received papers, but there is one with 26. The daily flows are diverse too; most of the days show only one incoming paper but there is an exceptional one (7th August 2006) with 16.
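As a worked illustration of the computation, the sketch below derives the uniform daily count and the ratio to uniform distribution for one week of one journal. It assumes that the ratio for a day is that day's count divided by the uniform daily count; the function name and the weekly counts are hypothetical.

```python
def ratio_to_uniform(daily_counts):
    """Return (UD, [RUD for Monday..Sunday]) for one week of one journal.

    daily_counts holds the number of papers received on each of the 7 days.
    """
    if len(daily_counts) != 7:
        raise ValueError("expected one count per day, Monday to Sunday")
    total = sum(daily_counts)
    if total == 0:
        raise ValueError("a week without papers has no defined ratio")
    ud = total / 7                       # uniform number of papers per day
    return ud, [n / ud for n in daily_counts]


# Hypothetical week with 26 papers, 16 of them received on Monday.
counts = [16, 4, 3, 2, 1, 0, 0]
ud, rud = ratio_to_uniform(counts)
print(round(ud, 3))       # uniform daily count for this week
print(round(rud[0], 3))   # Monday received several times the uniform share
```

Every article inherits the ratio of its submission day, so in this invented week the 16 Monday articles all carry the same value, and days without submissions contribute no article-level observations.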
Table 2.
Database extract and step by step computation for the dependent variable RUD kij
ID article (a) | Received date | Weekday (k; 1 = Monday to 7 = Sunday) | Received week (i) | UD_ij | RUD_kij
Furthermore, a look at the dependent variable is important. Before this variable is inserted as the dependent variable within the regression models, a preliminary inspection of its distribution (within each individual journal and also within the consolidated dataset) is necessary. In order to use “Ordinary Least Squares” (OLS) regression parameter estimations, at least a normal distribution is expected for the dependent variable. A classical test for normality, Jarque-Bera (JB), is used (table 3.a.). For each dataset, there are two hypotheses: $H_0$: normal (Gaussian) distribution and $H_1$: the distribution is not a normal one. The test statistic is computed by:

$$JB = \frac{N}{6}\left(Skewness^2 + \frac{Kurtosis^2}{4}\right)$$

where N is the number of cases, and skewness and kurtosis (taken in excess of 3) are measured by Pearson’s moment coefficients. This test follows a Chi square distribution with two degrees of freedom (the most common critical value is $\chi^2$ = 5.99). A detailed presentation of the formula used for kurtosis and its components is available within the supplementary information (SI_ Kurtosis.docx).

Table 3.a. Jarque-Bera normality test for RUD
Dataset (sample) | Skewness | Kurtosis | Cases (N) | Jarque-Bera test
Physica A | 1.291 | 3.417 | 9 825 | 7 509.01
PLOS ONE | 0.649 | 14.069 | 160 172 | 1 332 241.06
Nature | 1.865 | 8.340 | 4 653 | 16 182.46
Cell | 1.348 | 2.429 | 3 777 | 2 072.39
Consolidated | 2.606 | 20.306 | 178 427 | 3 267 433.88
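The reported test values can be checked from the tabulated moments. The sketch below assumes the usual form of the statistic, JB = N/6 (S² + K²/4) with K the excess kurtosis; under that assumption it reproduces the Table 3.a values up to the rounding of the published skewness and kurtosis.

```python
def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic from the sample size and the two moment coefficients."""
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


# (cases, skewness, kurtosis, reported JB) copied from Table 3.a.
TABLE_3A = {
    "Physica A": (9825, 1.291, 3.417, 7509.01),
    "Nature": (4653, 1.865, 8.340, 16182.46),
    "Cell": (3777, 1.348, 2.429, 2072.39),
}

for name, (n, skew, kurt, reported) in TABLE_3A.items():
    computed = jarque_bera(n, skew, kurt)
    print(name, round(computed, 2), reported)
```

All three computed values lie far above the critical value of 5.99, in line with the rejection of normality reported in the text.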
The potential dependent variable, RUD, does not follow a normal distribution, no matter which dataset is considered. The most common way to adapt a non-Gaussian distribution for potential use within regression models is the log transformation (table 3.b.).
Table 3.b.
Jarque-Bera normality test for log RUD
Dataset (sample) | Skewness | Kurtosis | Cases (N) | Jarque-Bera test
Physica A | -0.395 | 0.056 | 9 825 | 256.77
PLOS ONE | -2.056 | 4.855 | 160 172 | 270 153.90
Nature | -0.441 | 0.374 | 4 653 | 177.94
Cell | -0.032 | -0.397 | 3 777 | 25.45
Consolidated | -1.414 | 4.014 | 178 427 | 179 243.15
The log transformation helps: the computed JB values fall to three digits for Physica A and Nature, and even to two digits for Cell, while for the consolidated dataset and PLOS ONE they remain six-digit figures. A classical two-step transformation (Linnet, 1987) is therefore used. The first step aims to remove skewness while the second one aims to remove kurtosis. Various parameters are tested for each dataset and those which minimize the JB test are kept (table 3.c.).
Table 3.c.
Two steps transformations and their subsequent Jarque-Bera normality test
Dataset (sample) | First step transformation | Second step transformation | Skewness | Kurtosis | Cases (N) | Jarque-Bera test
Physica A | $y^{*}_{PA} = \log RUD - 0.06$ | $y^{**}_{PA} = [(y^{*}_{PA} + 1)^{\ldots} - 1] / 1.82$ | | | |
PLOS ONE | $y^{*}_{PO} = [(RUD)^{\ldots} - 1] / 6$ | $y^{**}_{PO} = [(y^{*}_{PO} + 1)^{-0.8} - 1] / (-0.8)$ | | | |
Nature | $y^{*}_{NT} = \log RUD + 0.05$ | $y^{**}_{NT} = [(y^{*}_{NT} + 1)^{\ldots} - 1] / 0.8$ | -0.065 | 0.224 | 4 653 | 13.00
Cell | $y^{*}_{CL} = \log RUD - 0.05$ | $y^{**}_{CL} = [(y^{*}_{CL} + 1)^{\ldots} - 1] / 1.15$ | 0 | -0.386 | 3 777 | 23.45
Consolidated | $y^{*}_{CT} = [(RUD)^{\ldots} - 1] / 4$ | $y^{**}_{CT} = [(y^{*}_{CT} + 1)^{-0.954} - 1] / (-0.954)$ | -0.067 | 0.302 | 178 427 | 811.55
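The selection rule behind Table 3.c (keep the parameter that minimizes the JB statistic) can be sketched as a simple grid search. This is only an illustration under stated assumptions: it uses a single power-family step of the form (x^λ − 1)/λ, with the logarithm at λ = 0, rather than the authors' two-step procedure, and the sample values and the grid are hypothetical.

```python
import math


def skew_and_excess_kurtosis(values):
    """Pearson moment coefficients: skewness and kurtosis in excess of 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def jarque_bera(values):
    skew, kurt = skew_and_excess_kurtosis(values)
    return len(values) / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)


def power_transform(x, lam):
    """Power-family transform of a positive value; the log is the limit at lam = 0."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def best_lambda(values, grid):
    """Grid value whose transformed sample has the smallest JB statistic."""
    return min(grid, key=lambda lam: jarque_bera([power_transform(v, lam) for v in values]))


# Hypothetical positive, right-skewed RUD values.
sample = [0.3, 0.5, 0.7, 0.9, 1.0, 1.1, 1.4, 1.8, 2.6, 4.3]
grid = [i / 10 for i in range(-10, 11)]   # candidate exponents from -1.0 to 1.0
lam = best_lambda(sample, grid)
print(lam, round(jarque_bera([power_transform(v, lam) for v in sample]), 3))
```

Because the grid contains λ = 1, which only shifts the data, the selected transformation can never have a larger JB value than the untransformed sample.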
After the transformations, the computed JB test for each sample looks much better, even if the null hypothesis (Gaussian distribution) is still rejected. For Physica A, Nature and Cell the outcomes are small (two digits, not far from the critical value) while for PLOS ONE and the consolidated dataset the values are at the three-digit level. A visual inspection (figure 1) of these distributions shows that there are plenty of outliers.
[Figure 1 panels: Physica A, PLOS ONE, Nature, Cell, Consolidated dataset; vertical axis: relative frequency]
Figure 1. Distributions of dependent variable by intervals

a) Article unit model with week-day component
This model is a linear one:

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{a1ij} + \beta_2 TUE_{a2ij} + \beta_3 WED_{a3ij} + \beta_4 THU_{a4ij} + \beta_5 \sum_{k=6}^{7} WEEKEND_{akij} + \varepsilon_{akij} \qquad (3)$$

where $\beta_0$ to $\beta_5$ are regression parameters to be estimated. In this kind of model a is the index for a certain article received on day k of week i by journal j, as defined above at (1). It is worth underlining that the day of the week dummies from (3) vary only by index k. MON, TUE, WED, and THU are dummy variables (1 if the paper was submitted on a Monday, Tuesday, Wednesday or Thursday, respectively, and 0 otherwise).
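The day-of-week coding of model (3) can be sketched as below, with Friday as the all-zero reference category. The function name and the `weekend_days` parameter are illustrative; in particular, passing Friday and Saturday for a country with a different weekend, and leaving that country's Sunday in the reference category, is an assumption of this sketch and not a rule stated in the paper.

```python
from datetime import date


def weekday_dummies(received, weekend_days=(6, 7)):
    """Dummy variables of model (3) for one reception date.

    Days are coded 1 (Monday) to 7 (Sunday); weekend_days lists the days
    treated as weekend. Friday is the reference category (all zeros).
    """
    k = received.isoweekday()
    row = {"MON": 0, "TUE": 0, "WED": 0, "THU": 0, "WEEKEND": 0}
    if k in weekend_days:
        row["WEEKEND"] = 1
    else:
        name = {1: "MON", 2: "TUE", 3: "WED", 4: "THU"}.get(k)
        if name is not None:
            row[name] = 1
    return row


print(weekday_dummies(date(2006, 8, 7)))                        # a Monday
print(weekday_dummies(date(2006, 8, 11)))                       # a Friday: reference
print(weekday_dummies(date(2006, 8, 11), weekend_days=(5, 6)))  # Friday-Saturday weekend
```

Keeping the weekend definition as a parameter lets the same function serve both the common Saturday-Sunday weekend and other weekend conventions.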
WEEKEND is also a dummy variable (generally, 1 if the paper was submitted on a Saturday or a Sunday and 0 otherwise). However, a more subtle approach, introduced recently (Campos-Arceiz et al., 2013), is used for manuscripts originating from almost two dozen Countries having Special Weekends (SWC). These are: Afghanistan, Algeria, Bahrain, Bangladesh, Brunei, Egypt, Hong-Kong, Iraq, Israel, Jordan, Kuwait, Libya, Mauritania, Nepal, Oman, Qatar, Saudi Arabia, Sudan, Syria, United Arab Emirates, and Yemen. The most common weekend type for the SWC is, currently, Friday and Saturday. The changes in weekend rules over time (2006-2009), as detailed by Campos-Arceiz et al. (2013), were also taken into account. FRIDAY is considered to be the reference point. The term ε, denoting the perturbation (here as well as in the next models) due to other factors, is assumed to be white noise, as usual in such regression model schemes.

b) Article unit model with seasonal component
In order to test other reported seasonal effects (Shalvi et al., 2010; Bornmann & Daniel, 2011; Schreiber, 2012), three dummy variables are used (1 if the paper was submitted within a season and 0 otherwise). The classic definition of seasons, as Hartley (2011) suggests, is used: SPRING (March to May), SUMMER (June to August), FALL (September to November) and WINTER (December to February).

$$y^{**}_{akij} = \beta_0 + \beta_1 SPRING_{akij} + \beta_2 SUMMER_{akij} + \beta_3 FALL_{akij} + \varepsilon_{akij} \qquad (4)$$

WINTER is considered to be the reference point and the indexes a, k, i, j have the same meaning as in equation (3). Here we note that the season dummies from (4) vary only by index i, according to the position of the week i.

c) Article unit model with geographic component
Geographical data (corresponding authors’ affiliation country) is included in this model using four dummy variables (1 if the paper originates from a specific continent and 0 otherwise):
AFRICA, AMERICA, ASIA, and OCEANIA.

$$y^{**}_{akij} = \beta_0 + \beta_1 AMERICA_{akij} + \beta_2 AFRICA_{akij} + \beta_3 ASIA_{akij} + \beta_4 OCEANIA_{akij} + \varepsilon_{akij} \qquad (5)$$

EUROPE is considered to be the reference point, and the indexes a, k, i, j have the same meaning as in equation (3). Here the geographic component from (5) varies only by index a, since this variable is associated with the author of a certain article a.

d) Control variables and subsequent models
We are aware that there are plenty of factors which can influence our dependent variable. Previous research demonstrates that scientific production, in general, can be affected by various factors: country size (Lippi & Mattiuzzi, 2017); economic level (Bernardes & Albuquerque, 2003; Lippi & Mattiuzzi, 2017); level of funding and/or science policy (Henriques & Larédo, 2013; Crespi & Geuna, 2008; Ebadi & Schiffauerova, 2016); team size (Ebadi & Schiffauerova, 2016); non-economic factors (Inönü, 2003). We select from this list two factors which can affect scientific papers’ production (and flows):
HDI (the numeric figure for each country’s Human Development Index) as a proxy for the (lack of) economic development, and AUTHORS, which denotes the number of authors of paper a. In the case of AUTHORS, it is expected that an increase in team size also increases the chance that the team includes a co-author who does not work (no matter what kind of activity: manuscript design, preparation for submission, agreement on the final form etc.) outside of the regular program (e.g. during the weekends). Subsequently, this control factor should be positively correlated with the dependent variable. Furthermore, since the current paper is to a great extent related to timing, we identify an important and very easy to measure period of time which may interfere with leisure/working time: CHRISTMAS is a dummy variable (1 if the paper was submitted between the 20th of December and the 10th of January and 0 otherwise). Regarding the leisure/working time balance, an interesting idea was coined by Hofstede et al. (2010:251), who explain the differences between Short Term Oriented (STO) and Long Term Oriented (LTO) countries. They state that for LTO countries “leisure time is not important” while for STO countries “leisure time is important”. Therefore, in our case it is expected that countries with a higher LTO index work harder during the week-ends. Hence, we take exact figures about LTO for 93 countries from Hofstede et al.

Data for HDI2016 are available at: http://hdr.undp.org/en/data . HDI is a geometric mean of three normalized indices (health, education, and income).

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{akij} + \beta_2 TUE_{akij} + \beta_3 WED_{akij} + \beta_4 THU_{akij} + \beta_5 WEEKEND_{akij} + \beta_6 SPRING_{akij} + \beta_7 SUMMER_{akij} + \beta_8 FALL_{akij} + \varepsilon_{akij} \qquad (6)$$

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{akij} + \beta_2 TUE_{akij} + \beta_3 WED_{akij} + \beta_4 THU_{akij} + \beta_5 WEEKEND_{akij} + \beta_6 SPRING_{akij} + \beta_7 SUMMER_{akij} + \beta_8 FALL_{akij} + \beta_9 AMERICA_{akij} + \beta_{10} AFRICA_{akij} + \beta_{11} ASIA_{akij} + \beta_{12} OCEANIA_{akij} + \varepsilon_{akij} \qquad (7)$$

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{akij} + \beta_2 TUE_{akij} + \beta_3 WED_{akij} + \beta_4 THU_{akij} + \beta_5 WEEKEND_{akij} + \beta_6 SPRING_{akij} + \beta_7 SUMMER_{akij} + \beta_8 FALL_{akij} + \beta_9 AMERICA_{akij} + \beta_{10} AFRICA_{akij} + \beta_{11} ASIA_{akij} + \beta_{12} OCEANIA_{akij} + \beta_{13} CHRISTMAS_{akij} + \varepsilon_{akij} \qquad (8)$$

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{akij} + \beta_2 TUE_{akij} + \beta_3 WED_{akij} + \beta_4 THU_{akij} + \beta_5 WEEKEND_{akij} + \beta_6 SPRING_{akij} + \beta_7 SUMMER_{akij} + \beta_8 FALL_{akij} + \beta_9 AMERICA_{akij} + \beta_{10} AFRICA_{akij} + \beta_{11} ASIA_{akij} + \beta_{12} OCEANIA_{akij} + \beta_{13} CHRISTMAS_{akij} + \beta_{14} \log AUTHORS_{akij} + \varepsilon_{akij} \qquad (9)$$

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{akij} + \beta_2 TUE_{akij} + \beta_3 WED_{akij} + \beta_4 THU_{akij} + \beta_5 WEEKEND_{akij} + \beta_6 SPRING_{akij} + \beta_7 SUMMER_{akij} + \beta_8 FALL_{akij} + \beta_9 AMERICA_{akij} + \beta_{10} AFRICA_{akij} + \beta_{11} ASIA_{akij} + \beta_{12} OCEANIA_{akij} + \beta_{13} CHRISTMAS_{akij} + \beta_{14} \log AUTHORS_{akij} + \beta_{15} \log HDI_{akij} + \varepsilon_{akij} \qquad (10)$$

$$y^{**}_{akij} = \beta_0 + \beta_1 MON_{akij} + \beta_2 TUE_{akij} + \beta_3 WED_{akij} + \beta_4 THU_{akij} + \beta_5 WEEKEND_{akij} + \beta_6 SPRING_{akij} + \beta_7 SUMMER_{akij} + \beta_8 FALL_{akij} + \beta_9 AMERICA_{akij} + \beta_{10} AFRICA_{akij} + \beta_{11} ASIA_{akij} + \beta_{12} OCEANIA_{akij} + \beta_{13} CHRISTMAS_{akij} + \beta_{14} \log AUTHORS_{akij} + \beta_{15} \log HDI_{akij} + \beta_{16} \log LTO_{akij} + \varepsilon_{akij} \qquad (11)$$

The meaning of the indexes a, k, i, j is the same as in (3); note that AUTHORS, HDI and LTO vary only by index a, and CHRISTMAS by particular cases of index i. When the consolidated dataset is considered, the models are the same; the only thing which differs from models 6-11 is the computation of the dependent variable, y.
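The calendar-based regressors of models (6) to (11) can be derived from the reception date alone. The sketch below is illustrative: the function names are hypothetical, and treating both the 20th of December and the 10th of January as part of the CHRISTMAS period is an assumption about the interval's endpoints.

```python
import math
from datetime import date


def season_dummies(received):
    """SPRING, SUMMER and FALL dummies; WINTER (December to February) is the reference."""
    month = received.month
    return {
        "SPRING": int(3 <= month <= 5),
        "SUMMER": int(6 <= month <= 8),
        "FALL": int(9 <= month <= 11),
    }


def christmas_dummy(received):
    """1 between 20 December and 10 January (endpoints included by assumption)."""
    month_day = (received.month, received.day)
    return int(month_day >= (12, 20) or month_day <= (1, 10))


def calendar_regressors(received, n_authors):
    """Calendar dummies plus log AUTHORS for one article (illustrative helper)."""
    row = season_dummies(received)
    row["CHRISTMAS"] = christmas_dummy(received)
    row["LOG_AUTHORS"] = math.log(n_authors)
    return row


print(calendar_regressors(date(2015, 12, 25), 4))
print(calendar_regressors(date(2016, 7, 1), 1))
```

A December or January date yields zeros for all three season dummies, which is how the winter reference category shows up in the data.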
For the consolidated dataset, based on relation (2.2) and the applied transformation, the dependent variable is $y^{**}_{aki}$. The model design proceeds from simple to complex, adding specific factors step by step. Hence, at the first step (6) the days of the week are combined with seasons, while at the second step (7) continents are added. Furthermore, the control factors are inserted one by one: CHRISTMAS (8), AUTHORS (9), HDI (10) and LTO (11). In order to check potential behavior changes over time, the dataset is also analyzed in a longitudinal way. The covered timespan is split into five rolling windows as follows: [2000-2004], [2005-2007], [2008-2010], [2011-2013], and [2014-2016]. The models (3-11) are run on every rolling window. The same procedure is followed when the data set relies only on the individual journals (Nature, Cell, Physica A and PLOS ONE). The regression models were validated via classical tests for: (i) the model itself, (ii) estimated parameters, (iii) residuals and (iv) multicollinearity. For all tests, the maximum significance level was set to 0.1.

e) The Panel regression
We define a particular panel structure as follows: the time component is given by T = 2 435 consecutive days from 01.01.2010 to 31.08.2016 and the cross-section is represented by the top N = 11 countries. The characteristics now refer to a cell of type (c, t), c being the index for countries and t the index for time (day). The $RUD_{kij}$ defined in (2.1) and (2.2) becomes $RUD_{ct}$; the variation of the day of the week k in week i is now contained in the index t. With the experience of the previous models, and in line with Orazbayev (2017), a test and a control for other factors are performed. The following model is used:

$$\log RUD_{ct} = \beta_0 + \beta_1 MON_{ct} + \beta_2 TUE_{ct} + \beta_3 WED_{ct} + \beta_4 THU_{ct} + \beta_5 WEEKEND_{ct} + \beta_6 \log AUTHORS_{ct} + \beta_7 CHRISTMAS_{ct} + \beta_8 FALL_{ct} + \beta_9 SUMMER_{ct} + \beta_{10} SPRING_{ct} + \beta_{11} \log HDI_{c} + \beta_{12} \log LTO_{c} + \beta_{13} \log t + u_{c} + \lambda_{ct} \qquad (12)$$

In order to keep a relevant daily granularity, we remove some parts of the dataset. First, from the available set of countries, only the top 11 countries were chosen, which together account for more than 77% of the total papers. The remaining 23% of the dataset was formed by countries which have a very small number of articles. Second, at the beginning of the period (2000-2009) the number of submitted papers was zero on most days. Therefore a minimal daily granularity was not possible for this specific timespan. Hence, after the timespan reduction as well as the removal of countries, the dataset used for panel regression records 120 258 papers, representing more than two thirds of the initial 178 427 papers, accumulated in N x T = 26 785 panel cells as defined above in this sub-section. In (12) the dependent variable is transformed into a cell mean:

$$\log RUD_{ct} = \frac{1}{p}\sum_{a=1}^{p} \log RUD_{act} \qquad (13)$$

where p is the number of articles from country c on day t. HDI and LTO are covariates specific to a country c. They were considered as fixed in time. The number of authors is also calculated as a mean, since on a specific day t there may be several articles with different numbers of authors. Hence

$$\log AUTHORS_{ct} = \frac{1}{p}\sum_{a=1}^{p} \log AUTHORS_{act} \qquad (14)$$

In (12), $u_c$ is the cross-sectional random component which measures the impact of other unknown factors associated with the results, and

$$\varepsilon_{ct} = u_{c} + \lambda_{ct} \qquad (15)$$

denotes the total perturbation component. Moreover, in particular cases, there are multiple articles from country c on a specific day t.
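The aggregation of articles into (country, day) cells can be sketched as below. It assumes that the cell values of the dependent variable and of the number of authors are means of the article-level logarithms; the tuple layout, the function name and the sample records are hypothetical.

```python
import math
from collections import defaultdict


def build_panel_cells(articles):
    """Aggregate article records into (country, day) cells.

    articles: iterable of (country, day, rud, n_authors) tuples.
    Returns {(country, day): {"log_rud", "log_authors", "n"}} where the two
    log fields are means over the articles in the cell and n is their count.
    """
    groups = defaultdict(list)
    for country, day, rud, n_authors in articles:
        groups[(country, day)].append((math.log(rud), math.log(n_authors)))
    cells = {}
    for key, rows in groups.items():
        p = len(rows)
        cells[key] = {
            "log_rud": sum(r[0] for r in rows) / p,
            "log_authors": sum(r[1] for r in rows) / p,
            "n": p,  # number of articles in the cell
        }
    return cells


# Hypothetical article-level records.
records = [
    ("US", "2010-01-04", 2.0, 4),
    ("US", "2010-01-04", 2.0, 1),
    ("CN", "2010-01-04", 0.5, 3),
]
for key, cell in build_panel_cells(records).items():
    print(key, cell)
```

The count kept in each cell is the quantity that can later serve as a weight when several articles fall into the same cell.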
The number of articles in a cell (c, t) is denoted $n_{ct}$ and is treated as a weighting component. When it is used, each value of the continuous variables $Y_{ct}$ or $X_{ct}$ is multiplied by the $n_{ct}$ value. Before running the estimation, the dependent variable was tested for the presence of a unit root using specific panel data tests (LLC, IPS-WStat, ADF-Statistic). The null hypothesis of unit root presence was rejected each time.

To obtain more insight into the geographical distribution of the manuscripts in the dataset, GIS techniques and tools were used. Spatial data were obtained from WWW5 (2017). These contain the country shape at level 0. The POP2000 data, which store each country’s population, refer to the year 2000. In this way we built a spatial database with all data (spatial and non-spatial) and so could display all the data on a map. Graduated colors allow us to emphasize the intensity of publication activity at the country level. The ArcGIS software package then implemented the spatial database and enabled us to build customized geoprocessing tools for the spatial analysis. Thus, we have three scenarios for spatial analysis:
- Compute and display the total number of papers published by corresponding authors for each country;
- Compute and display the total number of papers published by corresponding authors per population for each country (as shown in Figure SI.1., Supplementary Information);
- Compute and display the Localization Quotient (LQ) in order to measure the intensity of publication activity from a country, taking into consideration the moment of paper submission.

We adapted a model (Furtună et al., 2013) for the Localization Quotient indicator to determine the intensity of publication activity in a country relative to the whole world. The time period can be the day, week, month or year of submission, or any combination of these, together with journal filtering. Our geoprocessing tool computed the Localization Quotient indicator and displayed it on the map. The model uses as input parameters the selection expression and the number of classes. A complex expression is based on logical operators that allow a nonstandard time period to be established. For instance, to select only papers submitted on TUE, WED and THU week days, the expression is:
[received_week_day] = 2 OR [received_week_day] = 3 OR [received_week_day] = 4
In order to select papers submitted on SAT and SUN week days and only from the Cell journal, the expression is:
([received_week_day] = 6 OR [received_week_day] = 7) AND [idj] = 3
where idj is the field for journal identification and the value 3 identifies the Cell journal. The number of classes is used to divide the countries into groups; every country in a group is filled with the same color. The statistical method for data classification is Natural breaks (WWW6, 2017). This method forms the groups so that the variance of the values within a class is minimal and the variation of the values among classes is maximal. The geoprocessing model is illustrated in figure SI.2. (in the supplementary information). It uses predefined operations from the ArcToolbox component and customized operations defined to compute the Localization Quotient indicator (LQ) and to apply the graduated colors symbology on maps (Simb_GC). The custom operators were written in Python; the source code is available within the Supplementary Information section.
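The text does not spell out the formula of the adapted Localization Quotient, so the sketch below uses the standard location quotient as a stand-in: a country's share of papers matching the selection, divided by the same share for the whole world. This form, the `country` field and the sample records are assumptions; only the `received_week_day` and `idj` field names come from the text.

```python
def localization_quotient(papers, selected):
    """Standard location quotient per country (assumed form, see lead-in).

    papers: list of dicts with at least a "country" key.
    selected: predicate marking the papers in the time period of interest.
    """
    total_selected = sum(1 for p in papers if selected(p))
    if total_selected == 0:
        raise ValueError("no paper matches the selection")
    world_share = total_selected / len(papers)
    per_country = {}
    for p in papers:
        n_all, n_sel = per_country.get(p["country"], (0, 0))
        per_country[p["country"]] = (n_all + 1, n_sel + int(selected(p)))
    return {c: (n_sel / n_all) / world_share for c, (n_all, n_sel) in per_country.items()}


def midweek(paper):
    """Python equivalent of the TUE, WED, THU selection expression."""
    return paper["received_week_day"] in (2, 3, 4)


# Hypothetical records; idj = 3 is the Cell journal according to the text.
papers = [
    {"country": "RO", "received_week_day": d, "idj": 3} for d in (2, 3, 6, 7)
] + [
    {"country": "DE", "received_week_day": d, "idj": 3} for d in (2, 3, 4, 1)
]
print(localization_quotient(papers, midweek))
```

Under this definition a value above 1 means the country submits relatively more of its papers in the selected period than the world as a whole, and a value below 1 means relatively fewer.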
3. Data analysis
Thanks to electronically available information we were able to study large data sets. However, the data sets only include information about papers that were accepted; there is no information about rejected or withdrawn papers in our set. This is in contrast to the smaller data sets used by Hartley & Cabanac (2017) and Ausloos et al. (2016), which also included information about rejected and withdrawn papers. We find that, of the accepted papers, more were submitted on week days than on weekends (Figure 2).
[Figure 2 panels: Physica A, PLOS ONE, Nature, Cell; each panel compares the observed shares with the uniform distribution; horizontal axis: Mon, Tue, Wed, Thu, Fri, Sat, Sun]
Figure 2.
The share of papers submitted on each day of the week for the examined journals
Note:
Percentages were computed for each journal. Chi Square values were computed based on absolute figures for each journal.

The grouping factor (day of the week) is found to be statistically significant for every journal (Chi Square test: p value less than 0.01). The computed values for the Chi Square test were: 994.3 (Physica A), 25 321.55 (PLOS ONE), 648.00 (Nature) and 657.82 (Cell). However, if weekends are excluded, there are statistically significant differences among the days only for PLOS ONE. The computed values for the Chi Square test are then: 7.47 (Physica A), 162.15 (PLOS ONE), 6.85 (Nature) and 1.34 (Cell), while the critical value is $\chi^2$ = 9.49. Most of the accepted manuscripts are submitted on Tuesdays for Physica A (17.7%) and Nature (18.4%), and on Wednesdays for PLOS ONE (18.5%), while for Cell, Mondays and Tuesdays register 18.4% each. Weekend submissions are rarer: under 10% for Cell, around 11% for PLOS ONE, 12% for Nature and more than 14% in the case of Physica A. No matter which journal, there are 2-3 times more published papers submitted on any week-day compared to those submitted on a Saturday or Sunday.

The geographic location of the papers’ corresponding authors (PCA) is another important dimension to be analyzed. Still, this is far from the current research topic. To put our data into context, and as an indirect validation that our consolidated sample is consistent, a brief description from the geographical point of view is available in the Supplementary Information section.
We performed multiple pooled or panel regression analyses to see clearly how variation in the data depends on the day of the week on which a paper was submitted. Descriptive (one-to-one) analysis can also provide interesting information, as shown in figures 2, SI.3 and SI.4. However, we believe that a regression model gives a better picture, since more than one factor can be entered into the model simultaneously. We are aware that many other confounding variables (authors’ gender, age, ethnic and religious affiliation, academic or non-academic position, the manuscript’s number of references, time of submission in hours and minutes, etc.) are not included because the data are unavailable. Still, even with these limitations, the analysis includes the regression tool in addition to the descriptive ones.
[Figure panels: distribution by day of the week (Mon to Sun) for Nature and for Cell, each plotted against a uniform distribution.]
The initial estimation method was classic OLS for the article-unit models (models 6-11). Several tests were performed: Jarque-Bera for the normality of the regression residuals, White or Breusch-Pagan-Godfrey for heteroskedasticity of the residuals, and the Variance Inflation Factor (VIF) for linear dependence among the regressors. The classical OLS assumptions were rejected in the majority of cases: the residuals do not come from a normal distribution and are usually heteroskedastic. Moreover, atypical observations and outright outliers are omnipresent, which affects both the assumptions and the quality of the OLS results. To mitigate this problem, the dependent variable was transformed as described above and OLS was replaced by Robust Least Squares (RLS). For the RLS method we selected a bisquare objective function, the median-centered absolute deviation (MAD) for the scale estimate, and M-estimation as the estimation method. The tuning parameters were kept as proposed by Holland & Welsch (1977). Our choice of objective function, scale estimate, M-estimation method and tuning constants takes into account the spread and number of the outliers (above and below the mean) and the fact that, on large samples, the choice of objective function seems not to change the power of the model (Ozlem, 2011). There is one exception: the robust regression method fails to produce estimates for the sample of manuscripts published by PLOS ONE, so in this case we provide only OLS estimates. When all results are analyzed, we find solid evidence that the different regression approaches are consistent; thus the presence of heteroskedasticity does not bias the main conclusions.
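To make the robust estimation step more concrete, the sketch below fits a single-regressor line by iteratively reweighted least squares with bisquare weights and a MAD scale estimate. It is a simplified illustration and not the software routine used for the results in this paper: the tuning constant 4.685 is the value commonly associated with the bisquare function and is assumed here, and the function names and the data are hypothetical.

```python
from statistics import median

BISQUARE_C = 4.685  # assumed tuning constant for the bisquare function


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ym - slope * xm, slope


def robust_line_fit(x, y, iterations=50, tol=1e-8):
    """M-estimation by IRLS: bisquare weights, median-centered MAD scale."""
    w = [1.0] * len(x)
    intercept, slope = weighted_line_fit(x, y, w)  # start from plain OLS
    for _ in range(iterations):
        res = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        center = median(res)
        scale = median([abs(r - center) for r in res]) / 0.6745
        if scale == 0:
            break
        cutoff = BISQUARE_C * scale
        # Residuals beyond the cutoff receive zero weight.
        new_w = [(1 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0 for r in res]
        if sum(new_w) == 0:
            break
        w = new_w
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope, w


# Hypothetical data: a line with bounded noise and one gross outlier at x = 10.
noise = [-1.0, -0.5, 0.0, 0.5, 1.0]
x = list(range(20))
y = [2.0 + 0.5 * xi + noise[xi % 5] for xi in x]
y[10] += 15.0
intercept, slope, weights = robust_line_fit(x, y)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, outlier weight = {weights[10]}")
```

The outlier ends up with zero weight, so the fitted line is determined by the remaining observations; this is the mechanism by which a robust fit limits the influence of the outliers mentioned above.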
The Panel-EGLS (Wooldridge, 2002) with cross-section random effects was used for model 12. All regressions (individual journals and consolidated data: Tables 4 and 5, by time spans or for the whole period) are valid according to the ANOVA/F test, with very good significance (p value less than 0.01), while the values of R² vary over a wide range. A low proportion of explained variance for the dependent variable is expected, since only a limited number of factors were available for modelling. In addition, in such large datasets the information can be very noisy.
Table 4.
Regression estimates for the dependent variables for the consolidated data set and each journal, for each model
A. Consolidated data set
Variables/characteristics  M1 (equation 3)  M2 (equation 4)  M3 (equation 5)  M4 (equation 6)  M5 (equation 7)  M6 (equation 8)  M7 (equation 9)  M8 (equation 10)  M9 (equation 11)
0  1  2  3  4  5  6  7  8  9
Intercept
Monday (1 yes, 0 no)  0.01***  0.01***  0.019***  0.011***  0.011***  0.0144***  0.014***
Tuesday (1 yes, 0 no)  0.087***  0.087***  0.088***  0.087***  0.087***  0.09***  0.089***
Wednesday (1 yes, 0 no)  0.098***  0.098***  0.098***  0.098***  0.098***  0.1***  0.1***
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no)  -0.539***  -0.539***  -0.538***  -0.541***  -0.541***  -0.538***  -0.538***
Adjusted R² (weighted)  0.609***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)  -0.0004  -0.0033***  -0.0035***  0.0089***  0.008***  0.0091***  0.009***
Fall (1 yes, 0 no)  -0.0034***  -0.0029***  -0.0029***  0.0095***  0.009***  0.009***  0.009***
Adjusted R² (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no)  -0.0341***  -0.014***  -0.018***
Asia (1 yes, 0 no)  -0.0461***  -0.006***  -0.005***  -0.005***  -0.005***  -0.003***
Oceania (1 yes, 0 no)  -0.0391***  -0.01***  -0.009***  -0.009***  -0.0105***  -0.016***
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS  -0.0047***  -0.005***  -0.0046***
Adjusted R² (weighted)
log HDI  -0.001***  0.0013***
Adjusted R² (weighted)
log LTO  -0.013***
Adjusted R²

B. Physica A: Statistical Mechanics and its Applications
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no)  -0.215***  -0.215***  -0.212***  -0.213***  -0.213***  -0.214***  -0.219***
Adjusted R² (weighted)  0.104***
Spring (1 yes, 0 no)  -0.007  -0.006  -0.005  0.008  0.008  0.009  0.008
Summer (1 yes, 0 no)  -0.003  0.002  0.002  0.016**  0.016**  0.017**  0.017**
Fall (1 yes, 0 no)
Adjusted R² (weighted)
America (1 yes, 0 no)  -0.0092  -0.006  -0.006  -0.005  -0.003  0.009
Africa (1 yes, 0 no)  -0.0041  0.02  0.019  0.02  0.029  0.06**
Asia (1 yes, 0 no)  -0.0437***  -0.025***  -0.025***  -0.022***  -0.017**  -0.019**
Oceania (1 yes, 0 no)  -0.0594***  -0.068***  -0.066***  -0.066***  -0.066***  -0.069***
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS  -0.0146***  -0.0146***  -0.012**
Adjusted R² (weighted)
log HDI
Adjusted R² (weighted)
log LTO
Adjusted R²

C. PLOS ONE
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no)  -0.481***  -0.481***  -0.478***  -0.48***  -0.48***  -0.48***  -0.481***
Adjusted R²
Spring (1 yes, 0 no)  -0.015***  -0.02***  -0.02***  0.017***  0.017***  0.017***  0.017***
Summer (1 yes, 0 no)  -0.018***  -0.024***  -0.025***  0.013***  0.013***  0.013***  0.013***
Fall (1 yes, 0 no)  -0.021***  -0.025***  -0.025***  0.013***  0.013***  0.013***  0.013***
Adjusted R²
America (1 yes, 0 no)
Africa (1 yes, 0 no)  -0.04***  -0.014***  -0.016***  -0.016***  -0.006  -0.014*
Asia (1 yes, 0 no)  -0.058***  -0.013***  -0.015***  -0.015***  -0.011***  -0.008***
Oceania (1 yes, 0 no)  -0.043***  -0.013***  -0.012***  -0.012***  0.013***  -0.02***
Adjusted R²
Christmas (1 yes, 0 no)
Adjusted R²
log AUTHORS  -0.0009  -0.001  -0.0003
Adjusted R²
log HDI
Adjusted R²
log LTO  -0.018***
Adjusted R²

D. Nature
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)  -0.001*  -0.0009  -0.001  -0.0008  -0.0008  -0.0009  -0.0009
Wednesday (1 yes, 0 no)  -0.016***  -0.017**  -0.0169**  -0.016**  -0.016**  -0.016**  -0.0159**
Thursday (1 yes, 0 no)  -0.033***  -0.032***  -0.032***  -0.030***  -0.030***  -0.030***  -0.030***
Weekend (1 yes, 0 no)  -0.188***  -0.188***  -0.188***  -0.187***  -0.187***  -0.187***  -0.187***
Adjusted R² (weighted)  0.19***
Spring (1 yes, 0 no)  -0.021***  -0.017***  -0.018***  -0.008  -0.008  -0.008  -0.008
Summer (1 yes, 0 no)  -0.005  -0.004  -0.004  0.004  0.004  0.004  0.004
Fall (1 yes, 0 no)  -0.009  -0.005  -0.005  0.003  0.003  0.003  0.003
Adjusted R² (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no)
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no)  -0.0046  0.01  0.01  0.01  0.01  -0.003
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS
Adjusted R² (weighted)
log HDI
Adjusted R² (weighted)
log LTO  -0.029
Adjusted R²

E. Cell
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no)  -0.1416***  -0.1428***  -0.143***  -0.1431***  -0.1431***  -0.143***  -0.1431***
Adjusted R² (weighted)  0.0468***
Spring (1 yes, 0 no)  -0.0257**  -0.0275**  -0.0269**  -0.0136  -0.0136  -0.0135  -0.0135
Summer (1 yes, 0 no)  -0.0286**  -0.0274**  -0.027**  -0.0136  -0.0136  -0.0136  -0.0135
Fall (1 yes, 0 no)
Adjusted R² (weighted)
America (1 yes, 0 no)  -0.0002  0.0059  0.0054  0.0053  0.0059  -0.0212
Africa (1 yes, 0 no)
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no)
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS  -0.001  -0.001  -0.0007
Adjusted R² (weighted)
log HDI  -0.0934  -0.0713
Adjusted R² (weighted)
log LTO  -0.0697
Adjusted R²
Note: * Statistically significant at the 10% level; ** statistically significant at the 5% level; *** statistically significant at the 1% level.
The first point we notice is that once the variables for the day of the week are introduced, the other three groups (seasons, continents and controls) add very little new information. After each step (a variable or a group of variables entered in the model), the further increase in the explained variance of the dependent variable (measured by adjusted R²) is always very small. Second, whether we consider the consolidated data set, a specific journal or a roll window, Weekend has a relevant impact, both statistically and practically: the regression parameter for this factor is negative every time and statistically significant at the 1% level. The other variables testing the day of the week effect (Monday, Tuesday, Wednesday or Thursday) are more volatile. The negative influence of weekend submissions seems to have a rational explanation: the quality of papers may be improved by the supplementary checks that research teams or their peers perform before a weekday submission. The “negative” effect of weekend submission on accepted papers might also be related to geography. An important share of the world's population (e.g. in Muslim countries) rests on Friday and works on Sunday. Of course, one might say that a large proportion of people do nothing professional on weekends: neither writing, researching nor submitting. Third, the estimated regression parameters for Christmas are always positive (for the consolidated data set as well as for the journal-specific samples). This outcome indicates that scholars tend to submit papers during the Christmas period more often than in other periods. Apart from the negative effect of weekend submissions, a closer look at the day of the week group of variables is necessary since, as mentioned above, this is the most important group.
Owing to its size, the PLOS ONE sample seems to drive the outcomes of the consolidated dataset. Indeed, for these two samples the regression coefficients for all day factors (Monday, Tuesday, Wednesday and Thursday) are aligned: all are positive, and their values are 4-5 times larger for Tuesday, Wednesday and Thursday than for Monday. Since Friday was the reference category for this group, this means that for PLOS ONE accepted submissions are most likely to occur on Tuesday, Wednesday and Thursday and, to a lesser extent, on Monday. This is an analytical confirmation of the visual information presented for PLOS ONE earlier (figure 2). For the Nature sample, two factors (Wednesday and Thursday) are statistically significant but negative. The absolute values of these regression coefficients are 4-10 times smaller than that for Weekend. This means that, apart from the weekend, accepted submissions to Nature are less likely to occur on Wednesday or Thursday. The Cell and Physica A samples show a clear positive Monday effect, while for some simpler models (with fewer control factors) a positive Tuesday effect is visible for Physica A. As mentioned in the material and methods section, seasonality is checked systematically. As noted above, this group of variables adds only a tiny fraction of new information in all models. We split our comments again: first we look at the consolidated data set and its main driver (the PLOS ONE sample) together, and then at the other three samples (Physica A, Nature and Cell). Owing to the very large number of cases, for the consolidated and PLOS ONE data sets every seasonal factor inserted in the model proves to be statistically significant. Still, the signs are volatile (models M1-M5 versus M6-M9). This is a classical sign of multicollinearity.
Indeed, the correlation matrix (Table_SI.2 from the Supplementary Information) for these samples shows negative Pearson correlation coefficients between seasons, with absolute values between 0.33 and 0.35, and also a negative relationship with Christmas (correlation coefficients around 0.13-0.14 in absolute value). In this context, the sign volatility between models M1-M5 and M6-M9 appears natural, since the control factor (Christmas) is present in models M6-M9. The outcomes for the other three samples (Physica A, Nature and Cell) show the same sign volatility. In addition, statistical significance is sometimes volatile as well (M1-M5 versus M6-M9). The main reason is the same: a persistent correlation between the seasonal and Christmas factors. The volatility in statistical significance could also arise from the smaller samples (size effect) combined with weak correlations between the factors and the dependent variable. The geographical factors in the regression models (America, Africa, Asia and Oceania, with Europe as the reference) are often statistically insignificant. This geographic volatility (in terms of how often a factor is statistically significant and how often the sign of the estimated parameter changes) needs a deeper analysis. The most probable explanation is similar to that for the seasonal factors. There is a sample size effect: the larger the sample (consolidated data set, PLOS ONE and, to a lesser extent, Physica A), the lower the volatility in signs and statistical significance. Because of the weaker correlation with the dependent variable and the persistent negative correlations between geographical factors, the outcomes tend to be volatile when smaller samples are analyzed. Because of the large number of models and samples obtained once the predefined roll windows are applied, the outcomes (tens of pages of information) exceed the limited space available within a regular paper.
Therefore, all detailed outcomes of the regression models for each roll window are available within Table SI.3. from the Supplementary Information while a brief visual presentation about regression coefficients’ signs and their significance is available within figure 3.
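The negative correlations between season dummies discussed above (absolute values of 0.33 to 0.35) are close to what mutually exclusive dummy variables produce by construction. The sketch below, under the simplifying assumption of four equally sized seasons, computes the Pearson correlation between two season dummies; for categories with share p each, the value is -p/(1-p), that is -1/3 when p = 0.25. The helper function and the data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical sample: 25 observations in each of four seasons.
seasons = ["winter", "spring", "summer", "fall"] * 25
spring = [1 if s == "spring" else 0 for s in seasons]
summer = [1 if s == "summer" else 0 for s in seasons]

r_spring_summer = pearson(spring, summer)
print(f"corr(Spring, Summer) = {r_spring_summer:.4f}")
```

Because such dummies are negatively correlated regardless of the data, adding a further control that overlaps with the reference season can plausibly shift their estimated signs.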
[Figure 3 panels: Physica A, PLOS ONE, Nature, Cell and the consolidated dataset. Each panel shows, for the regression terms (Intercept, days of the week, seasons, continents, Christmas, Authors, HDI and LTO), the number of coefficients by sign and significance: + *, +, + ?, - *, -, - ?.]
Figure 3.
Distributions of regression coefficients’ signs
Note: “+ *” denotes a statistically significant positive coefficient, “+” denotes a positive coefficient, “+ ?” denotes a coefficient for which the standard error could not be computed and, therefore, the t test for testing statistical significance is unavailable. The notations for negative coefficients are similar.
Since the sample size for each roll window becomes smaller and smaller, the volatility of the regression coefficients or of their statistical significance sometimes increases, as expected. Even so, the main conclusions still stand: the most important factor is Weekend, the most important group of factors is the days of the week, and the most important control factor is Christmas. The results of the panel analysis agree with those obtained from the unstructured sample, which is further evidence of a significant impact of the weekday on the dependent variable. Other factors, such as LTO or HDI, are largely non-significant. The unobserved cross-sectional factors are not relevant in the panel structure, as their share in the total variation of the unobserved factors is almost zero (Table 5).
Table 5.
Panel regression estimates for the dependent variables for consolidated data set and PLOS ONE
Covariates                 Consolidated dataset               PLOS ONE dataset
                           Not weighted   Weighted by n_ct    Not weighted   Weighted by n_ct
Intercept
Monday (1 yes, 0 no)       -              -                   -              -
Tuesday (1 yes, 0 no)      0.069***       0.374***            0.094***       0.547***
Wednesday (1 yes, 0 no)    0.078***       0.432***            0.113***       0.657***
Weekend (1 yes, 0 no)      -0.374***      -0.860***           -0.357***      -0.792***
log AUTHORS                -0.001         0.028***            0.002          0.049***
Christmas (1 yes, 0 no)    -              -                   -              -
Spring (1 yes, 0 no)       -0.011***      -0.013              -0.014***      -0.017
Summer (1 yes, 0 no)       -0.012***      0.006               -0.020***      -0.011
Fall (1 yes, 0 no)         -              -                   -              -
Log10HDI
Log10LTO
Ln(t)=(TREND)              -0.012***      -0.099***           -0.004***      -0.083***
Adjusted R²
$\rho = \sigma_{u_c}^2 / \sigma_{\varepsilon_{ct}}^2$
Cases
Note: * Statistically significant at the 10% level; ** statistically significant at the 5% level; *** statistically significant at the 1% level. Log10 denotes the base-10 logarithm and ln the natural logarithm.
The results in Table 5 are based on model 12, with the components mentioned in 13-15.
The regression methods indicated (on the consolidated dataset) a pattern in the propensity to submit papers on Tuesday and Wednesday (with Friday as the reference day). On the other hand, submissions of accepted manuscripts tend to be lower (negative regression coefficients), especially on weekends but also slightly so on Monday. Therefore, we create two daily intervals: Tuesday-Thursday and Saturday-Monday. Following the methodology presented in the Materials and Methods section, an indicator, the Localization Quotient (LQ), is computed with GIS tools for both intervals. The outcomes are shown in Figure 4 and Figure 5, which are complementary. In Figure 4, countries that tend to submit papers during Tuesday-Thursday (TUE-THU) with a greater intensity than the world average submission rate for the same interval are highlighted in dark brown. This group comprises countries such as Belarus, Belize, Benin, Bosnia and Herzegovina, Greenland, Haiti, Iraq, Mauritania, Namibia, Nicaragua, and Syria. Every continent except Oceania is represented in this list. As expected, most of the large countries (which, by size, weigh heavily on the world average) register an LQ between 93.33% and 136.5% of the world average TUE-THU submission rate.
Figure 4.
Distribution of the LQ for TUE-THU by PCA’s country of origin within the consolidated dataset (2001-2016)
Note:
The measurement unit for intervals is the percentage (%). The consolidated dataset relies on papers from: Physica A, PLOS ONE, Nature and Cell. LQ=Localization Quotient (the share of the papers submitted in day k in a country, divided by the share of the papers submitted in the same day k worldwide); TUE-THU= Tuesday to Thursday interval;
PCA= Papers’ Corresponding Authors; MAC=Macao (China);
HKG=Hong Kong (China); SGP=Singapore; CYP=Cyprus; ISR=Israel; TWN=Taiwan and EU=Europe. Countries marked with shading lines record no information in our dataset.
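The Localization Quotient defined in the note above can be sketched as follows. This is a minimal illustration of the ratio of shares only, not the GIS workflow used for the maps; the function name, the interval labels used as dictionary keys and the counts are hypothetical.

```python
def localization_quotients(counts, interval):
    """LQ in percent: a country's share of papers submitted in `interval`
    divided by the worldwide share for the same interval.

    `counts` maps country -> {interval label -> number of papers}.
    """
    world_interval = sum(c.get(interval, 0) for c in counts.values())
    world_total = sum(sum(c.values()) for c in counts.values())
    world_share = world_interval / world_total
    lq = {}
    for country, by_interval in counts.items():
        country_share = by_interval.get(interval, 0) / sum(by_interval.values())
        lq[country] = 100.0 * country_share / world_share
    return lq


# Hypothetical counts for two countries, for illustration only.
counts = {
    "Country A": {"TUE-THU": 60, "SAT-MON": 25, "FRI": 15},
    "Country B": {"TUE-THU": 40, "SAT-MON": 45, "FRI": 15},
}
lq_tue_thu = localization_quotients(counts, "TUE-THU")
lq_sat_mon = localization_quotients(counts, "SAT-MON")
print(lq_tue_thu)
print(lq_sat_mon)
```

An LQ above 100% means that the country submits relatively more of its accepted papers in that interval than the world as a whole does.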
Figure 5.
Distribution of the LQ for SAT-MON by PCA’s country of origin within the consolidated dataset (2001-2016)
Note:
The measurement unit for intervals is the percentage (%). The consolidated dataset relies on papers from: Physica A, PLOS ONE, Nature and Cell. LQ=Localization Quotient (the share of the papers submitted in day k in a country, divided by share of the papers submitted in same – k – day worldwide); SAT-MON= Saturday to Monday interval;
PCA= Papers’ Corresponding Authors; MAC=Macao (China);
HKG=Hong Kong (China); SGP=Singapore; CYP=Cyprus; ISR=Israel; TWN=Taiwan and EU=Europe. Countries marked with shading lines record no information in our dataset.
A greater heterogeneity in the distribution of countries by LQ for the Saturday-Monday (SAT-MON) interval is easy to see in Figure 5. Countries that register low levels on the previous map are now concentrated in two leading groups. The first consists of two countries, Nigeria and Yemen, whose propensity to submit papers during the SAT-MON interval is more than twice the world average. The second group registers a propensity to submit papers during the SAT-MON interval greater than the world average by a multiplier between 1.35 and 2.39. This group includes countries such as Angola, Armenia, Burkina Faso, Central African Republic, Georgia, Iran, Ivory Coast, Libya, Mali, Macedonia, Moldova, Mongolia, Myanmar, Oman, Saudi Arabia, and Sri Lanka. Many of the countries in the leading groups are known for their large Muslim populations. Further research should be done on this topic.
4. Conclusions and discussion
In line with other reports (Cabanac & Hartley, 2013; Campos-Arceiz et al., 2013; Hartley & Cabanac, 2017; Ausloos et al., 2016; Ausloos et al., 2017), our results confirm a DWE for published papers. Papers accepted for publication in the four journals included in our analysis are more likely to have been submitted on a weekday than on a weekend day. There are 2-3 times more published papers submitted on any given weekday than on a Saturday or a Sunday. The most likely weekday for historical accepted submissions differs from journal to journal. For PLOS ONE there is a group of three days: Tuesday, Wednesday and Thursday; for Physica A there are two consecutive days: Monday and Tuesday; for Nature there are two days: Friday and Monday; and for Cell only Monday. Most of the seasonal factors proved to be statistically insignificant. However, the Christmas period (20th of December to 10th of January) has a statistically significant positive impact: more papers tend to be submitted in that time interval. The geographical dimension (the affiliation country of the papers’ corresponding authors) brings new information concerning how current research is done, both descriptive (GIS) and analytical (a mix of log-log and semi-log regression models based on undated and panel structured data). Further work on this aspect would be interesting, even if most geographically driven factors in the regression models have weak explanatory power. In addition to the day of the week, time is examined here from a different perspective, namely the time trends in the data. The most important factors for the main topic of the manuscript (weekdays and weekend) tend to be stable. Of course, some fluctuations occur for other factors. The yearly time span of our data set is not long enough to use other specific time series techniques. This is a topic for further study.
5. References
1. Ausloos, M., Nedic, O., & Dekanski, A. (2016). Day of the week effect in paper submission/acceptance/rejection to/in/by peer review journals. Physica A: Statistical Mechanics and its Applications, 197-203.
2. Ausloos, M., Nedic, O., Dekanski, A., Mrowinski, M., Fronczak, P., & Fronczak, A. (2017). Day of the week effect in paper submission/acceptance/rejection to/in/by peer review journals. II. An ARCH econometric-like modeling. Physica A: Statistical Mechanics and its Applications, 468, 462-474. http://dx.doi.org/10.1016/j.physa.2016.10.078
3. Bell, C. M., & Redelmeier, D. A. (2001). Mortality among patients admitted to hospitals on weekends as compared with weekdays. New England Journal of Medicine, (9), 663-668.
4. Bernardes, A. T., & e Albuquerque, E. D. M. (2003). Cross-over, thresholds, and interactions between science and technology: lessons for less-developed countries. Research Policy, (5), 865-885.
5. Berument, H., & Kiymaz, H. (2001). The day of the week effect on stock market volatility. Journal of Economics and Finance, (2), 181-193.
6. Bhattacharya, K., Sarkar, N., & Mukhopadhyay, D. (2003). Stability of the day of the week effect in return and in volatility at the Indian capital market: a GARCH approach with proper mean specification. Applied Financial Economics, (8), 553-563.
7. Bornmann, L., & Daniel, H. D. (2011). Seasonal bias in editorial decisions? A study using data from chemistry. Learned Publishing, (4), 325-328.
8. Cabanac, G., & Hartley, J. (2013). Issues of work–life balance among JASIST authors and editors. Journal of the American Society for Information Science and Technology, (10), 2182-2186.
9. Campos-Arceiz, A., Koh, L. P., & Primack, R. B. (2013). Are conservation biologists working too hard? Biological Conservation,
10. Chang, E. C., Pinegar, J. M., & Ravichandran, R. (1993). International evidence on the robustness of the day-of-the-week effect. Journal of Financial and Quantitative Analysis, (04), 497-513.
11. Crespi, G. A., & Geuna, A. (2008). An empirical study of scientific production: A cross country analysis, 1981–2002. Research Policy, (4), 565-579.
12. Dessens, J., Fraile, R., Pont, V., & Sanchez, J. L. (2001). Day-of-the-week variability of hail in southwestern France. Atmospheric Research, 63-76.
13. Dhesi, G., Shakeel, M. B., & Xiao, L. (2016). Modified Brownian Motion Approach to Modeling Returns Distribution. Wilmott, (82), 74-77.
14. Doherty, S. T., Andrey, J. C., & MacGregor, C. (1998). The situational risks of young drivers: The influence of passengers, time of day and day of week on accident rates. Accident Analysis & Prevention, (1), 45-52.
15. Ebadi, A., & Schiffauerova, A. (2016). How to boost scientific production? A statistical analysis of research funding and other influencing factors. Scientometrics, (3), 1093-1116.
16. Fidrmuc, J., & Tena, J. D. D. (2015). Friday the 13th: The Empirics of Bad Luck. Kyklos, (3), 317-334.
17. French, K. R. (1980). Stock returns and the weekend effect. Journal of Financial Economics, (1), 55-69.
18. Furtună, T. F., Reveiu, A., Dârdală, M., & Kanala, R. (2013). Analysing the spatial concentration of economic activities: a case study of energy industry in Romania. Economic Computation & Economic Cybernetics Studies & Research, (4), 35-52.
19. Gupta, D. (2000). Has India escaped the Asian economic contagion? South Asia: Journal of South Asian Studies, (s1), 179-192.
20. Hartley, J. (2011). Write when you can and submit when you are ready! Learned Publishing, (1), 29-32.
21. Hartley, J., & Cabanac, G. (2017). What can new technology tell us about the reviewing process for journal submissions in BJET? British Journal of Educational Technology, 48 (1), 212–220.
22. Henriques, L., & Larédo, P. (2013). Policy-making in science policy: The ‘OECD model’ unveiled. Research Policy, (3), 801-816.
23. Herteliu, C., Ileanu, B. V., Ausloos, M., & Rotundo, G. (2015). Effect of religious rules on time of conception in Romania from 1905 to 2001. Human Reproduction, (9), 2202-2214.
24. Hofstede, G., Hofstede, G. J., & Minkov, M. (2010). Cultures and Organizations, Software of the Mind, Intercultural Cooperation and Its Importance for Survival. McGraw-Hill: New York, Chicago, San Francisco, Lisbon, London, Madrid, Mexico City, Milan, New Delhi, San Juan, Seoul, Singapore, Sydney, Toronto.
25. Holland, P. W., & Welsch, R. E. (1977). Robust Regression using iteratively reweighted least-squares. Communications in Statistics-Theory and Methods, 6(9),
26. Inönü, E. (2003). The influence of cultural factors on scientific production. Scientometrics, (1), 137-146.
27. Kobayashi, M., & Takeda, K. (2000). Information retrieval on the web. ACM Computing Surveys (CSUR), (2), 144-173.
28. Linnet, K. (1987). Two-stage transformation systems for normalization of reference distributions evaluated. Clinical Chemistry, (3), 381-386.
29. Lippi, G., & Mattiuzzi, C. (2017). Scientific publishing in different countries: what simple numbers do not tell. Annals of Research Hospitals, (4).
30. Marr, L. C., & Harley, R. A. (2002). Modeling the effect of weekday-weekend differences in motor vehicle emissions on photochemical air pollution in central California. Environmental Science & Technology, (19), 4099-4106.
31. Orazbayev, S. (2017). Sequential order as an extraneous factor in editorial decision. Scientometrics, (3), 1573-1592.
32. Ozlem, G. A. (2011). Comparison of Robust Regression Methods in Linear regression. International Journal of Contemporary Mathematical Sciences,
33. Rossi, A. S., & Rossi, P. E. (1977). Body time and social time: Mood patterns by menstrual cycle phase and day of the week. Social Science Research, (4), 273-308.
34. Schreiber, M. (2012). Seasonal bias in editorial decisions for a physics journal: you should write when you like, but submit in July. Learned Publishing, (2), 145-151.
35. Shalvi, S., Baas, M., Handgraaf, M. J., & De Dreu, C. K. (2010). Write when hot–Submit when not: Seasonal bias in peer review or acceptance? Learned Publishing, (2), 117-123.
36. Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. The MIT Press: Cambridge, Massachusetts; London, England.
37. WWW1, webofknowledge.com retrieved on multiple occasions in May and November 2016
38. WWW2, https://jcr.incites.thomsonreuters.com/JCRJournalHomeAction.action retrieved on multiple occasions in May and November 2016
39. WWW3, Jsoup, Java HTML Parser, https://jsoup.org/ retrieved on multiple occasions in April and May 2016
41. WWW5, http://gadm.org/version2 retrieved on multiple occasions in May 2017
42. WWW6, http://pro.arcgis.com/en/pro-app/help/mapping/symbols-and-styles/data-classification-methods.htm retrieved on multiple occasions in May 2017
Supplementary information
All information about each annual dataset on which the current paper relies is available in the Excel file Table_SI.1.xlsx. The customized scraper script for all four journals analyzed in the current paper is available in the Word file SI_Scraper.docx. The Python scripts are available in the Word file SI_Python_Scripts.docx. The correlation matrices for all samples are available in the Word file Table_SI.2.docx. The regression analysis outcomes for roll windows are available in Table_SI.3.docx. The detailed formula of Kurtosis is available in SI_Kurtosis.docx. The geographical presentation of our consolidated dataset is available in SI_Geographical.docx. Four supplementary figures (SI.1., SI.2., SI.3. and SI.4.) are available in the Word files Figure_SI_1.docx, Figure_SI_2.docx, Figure_SI_3.docx and Figure_SI_4.docx.
Acknowledgments
We acknowledge support from COST Action TD1210 “Analyzing the dynamics of information and knowledge landscapes (KNOWeSCAPE)”. We are grateful to Alexandru Agapie, David Berman, Alexandru Isaic-Maniu, Tudorel Andrei, Gurjeet Dhesi, Sebastian Buhai and Babar Syed for comments on an earlier draft. A preliminary version of the paper was presented at: (i) the Cluj Economics and Business Seminar Series (CEBSS), Fall 2017 Session, Babes-Bolyai University and (ii) the Annual Conference of the Romanian Academic Economists from Abroad (ERMAS), 5th edition, 25-27 July 2018, "A. I. Cuza" University, Iasi. We thank Cristian Litan, Alexandru Todea, Dorina Lazar, Marcel Voia and Cristian Dragos for their consistent feedback during the presentations. David Berman, Peter Richmond and Roxana Herteliu-Iftode were our proof-readers. Finally, we thank Teresa Dudley (Nature), Jared Graves (Cell), Kenneth A. Dawson (Physica A), and Nick Simon (PLOS ONE) for help with the metadata.
Author contribution
CEB obtained data using a self-designed scraper. CEB, CH, MD, and BVI performed data analysis and manuscript design in its initial and revised versions. CH coordinated the team work.
Competing financial interests
CEB, CH, MD, and BVI have nothing to declare regarding competing financial interests.
Table SI.1.
Parsed papers’ sample by journal and year of publication
Year    Journal (columns)    Items within WoS    Citable items within JCR    Items in dataset    Share of sample
Total   Physica A (1-4)      11998               11779                       9825                83.4%
Total   PLOS ONE (5-8)       168289              160306                      160172              99.9%
Total   Nature (9-12)        18462               6092                        4653                76.4%
Total   Cell (13-16)         8408                5497                        3777                68.7%
Share of sample = 100 * items in dataset / citable items within JCR (column 0 is the year; columns 4 = 100*3/2, 8 = 100*7/6, 12 = 100*11/10, 16 = 100*15/14).
Note: Green shaded cells are authors' estimates.
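The share-of-sample figures in the Total row can be re-derived from the two counts they depend on (items in dataset and citable items within JCR). The following is only a minimal sketch of that arithmetic; the class and method names (ShareOfSample, sharePercent) are hypothetical and are not part of the authors' scraper.

```java
import java.util.Locale;

// Hypothetical helper, not part of the authors' code: recomputes the
// "share of sample" column of Table SI.1 from the Total row.
class ShareOfSample {

    // share = 100 * (items in dataset) / (citable items within JCR),
    // formatted with one decimal place
    static String sharePercent(int itemsInDataset, int citableItemsInJcr) {
        double share = 100.0 * itemsInDataset / citableItemsInJcr;
        return String.format(Locale.US, "%.1f%%", share);
    }

    public static void main(String[] args) {
        // Counts taken from the Total row of Table SI.1
        System.out.println("Physica A: " + sharePercent(9825, 11779));
        System.out.println("PLOS ONE:  " + sharePercent(160172, 160306));
        System.out.println("Nature:    " + sharePercent(4653, 6092));
        System.out.println("Cell:      " + sharePercent(3777, 5497));
    }
}
```

Running the sketch on the Total counts reproduces the shares reported in the table (83.4%, 99.9%, 76.4% and 68.7%).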
I. Scraper description and source code
Article.java
Java class that defines the article model. It is used to manage collected article data like title, URL, pages, journal, authors, received date, revised date, published date, start page and end page. The class acts like an entity model between the database and the processing routines. package ro.ase.crowler.model; import java.sql.PreparedStatement; import java.sql.SQLException; import java.sql.Statement; import org.jsoup.nodes.Element; import org.jsoup.select.Elements; import ro.ase.crowler.DB; import ro.ase.crowler.Parser; public class Article {
String title; String URL; String pages; int journalID; String authors; String history; boolean isParsed = false; String receivedDate; String revisedDate; String onlineDate; int startPage; int endPage; public String getReceivedDate() { return receivedDate; } public void setReceivedDate(String receivedDate) { this.receivedDate = receivedDate; } public String getRevisedDate() { return revisedDate; } public void setRevisedDate(String revisedDate) { this.revisedDate = revisedDate; } public String getOnlineDate() { return onlineDate; } public void setOnlineDate(String onlineDate) { this.onlineDate = onlineDate; } public int getStartPage() { return startPage; } public void setStartPage(int startPage) { this.startPage = startPage; } public int getEndPage() { return endPage; } public void setEndPage(int endPage) { this.endPage = endPage; } public String getHistory() { return history; } public void setHistory(String history) { this.history = history; // set dates for PLOS if (this.journalID == 1) { String[] dates = Parser.getPlosDates(history); this.setReceivedDate(dates[0]); this.setRevisedDate(dates[1]); this.setOnlineDate(dates[2]); } } public String getAuthors() { return authors; } public void setAuthors(String authors) { this.authors = authors; } public Article() { } public Article(String title, String uRL, String pages, int journalID) { super(); this.title = title; URL = uRL; this.pages = pages; this.journalID = journalID; } @Override public String toString() { return String.format("Title: %s \n Authors: %s \n Pages: %s \n URL: %s \n History: %s \n Journal ID: %d",this.title, this.authors, this.pages, this.URL, this.getHistory(), this.getJournalID()); } public void getElementInfo(Element article) { for (Element info : article.children()) { // Debug.log("Article info:" + info.attr("class") + " - " + // info.text()); if (info.hasClass("title")) { // Debug.log("Article URL:" + // info.child(0).child(0).attr("href")); this.setTitle(info.child(0).child(0).text()); 
this.setURL(info.child(0).child(0).attr("href")); } if (info.hasClass("authors")) { // Debug.log("Article authors:" + info.text()); this.setAuthors(info.text()); } if (info.hasClass("source")) { // Debug.log("Article URL:" + info.text()); this.setPages(info.text()); } } } public void getPLOSElementInfo(Element article) { final String journalURL = "http://journals.plos.org"; Elements info = article.getElementsByClass("list-title"); if (info.size() > 0) { this.setTitle(info.get(0).attr("title")); this.setURL(journalURL + info.get(0).attr("href")); } info = article.getElementsByClass("authors"); if (info.size() > 0) { String authors = ""; for (Element author : info.get(0).select("span.author")) authors += author.text(); this.setAuthors(authors); } /* * info = article.getElementsByClass("date"); if(info.size()>0) { * this.setHistory(info.get(0).text()); } */ /* * for (Element info : article.children()) { // Debug.log( * "Article info:" + info.attr("class") + " - " + // info.text()); * * if (info.hasClass("list-title")) { //Debug.log( * "Article URL:" + info.child(0).child(0).attr("href")); * this.setTitle(info.attr("title")); this.setURL(info.attr("href")); } * * if (info.hasClass("authors")) { //Debug.log( * "Article authors:" + info.text()); String authors = ""; for(Element * author : info.select("span.author")) authors+=author.text(); * this.setAuthors(authors); } if (info.hasClass("date")) { * //Debug.log("Article URL:" + info.text()); * this.setHistory(info.text()); //this.setPages(info.text()); } * * } */ } public void getNATURElementInfo(Element article) { // final String journalURL = "http://journals.plos.org"; Elements info = article.select("h1"); if (info.size() > 0) { this.setTitle(info.select("a").text()); } info = article.select("ul.links"); if (info.size() > 0) { Element firstURL = info.select("li").get(0); this.setURL(firstURL.child(0).attr("href")); } info = article.getElementsByClass("authors"); if (info.size() > 0) { String authors = ""; for (Element 
author : info.get(0).select("li")) authors += author.text() + ","; this.setAuthors(authors); } } public void save2DB(DB db, int journalID) throws SQLException {
        String sql = "INSERT INTO `crowler`.`article` (`idjournal`, `title`, `url`, `pages`, `parsed`, `authors`, `history`, `received`, `revised`, `online`, `startPage`, `endPage`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?);";
        // Statement (capital S) is the imported java.sql.Statement interface
        PreparedStatement stmt = db.conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
        stmt.setInt(1, journalID);
        // truncate very long titles before storing them
        if (this.getTitle() != null && this.getTitle().length() > 255)
            stmt.setString(2, this.getTitle().substring(0, 254));
        else
            stmt.setString(2, this.getTitle());
        stmt.setString(3, this.getURL());
        stmt.setString(4, this.getPages());
        stmt.setInt(5, 0);
        stmt.setString(6, this.getAuthors());
        stmt.setString(7, this.getHistory());
        stmt.setString(8, this.getReceivedDate());
        stmt.setString(9, this.getRevisedDate());
        stmt.setString(10, this.getOnlineDate());
        stmt.setInt(11, this.getStartPage());
        stmt.setInt(12, this.getEndPage());
        stmt.execute();
    }

    public String getTitle() { return title; }

    public void setTitle(String title) { this.title = title; }

    public String getURL() { return URL; }

    public void setURL(String uRL) {
        URL = uRL;
    }

    public String getPages() { return pages; }

    public void setPages(String pages) {
        this.pages = pages;
        // set pages; Parser.getPages returns null when its input is null
        int[] noPages = Parser.getPages(pages);
        if (noPages != null) {
            this.setStartPage(noPages[0]);
            this.setEndPage(noPages[1]);
        }
    }

    public int getJournalID() { return journalID; }

    public void setJournalID(int journalID) { this.journalID = journalID; }

    public boolean isParsed() { return isParsed; }

    public void setParsed(boolean isParsed) { this.isParsed = isParsed; }

    public void getCellElementInfo(Element article) {
Affiliation.java
Another entity model class, used to manage an author's affiliation.
package ro.ase.crowler.model;

public class Affiliation {
    public final String id;
    private String address;
    private String country;
    private String order;

    public Affiliation(String id, String address, String country, String order) {
        super();
        this.id = id;
        this.address = address;
        this.country = country;
        this.order = order;
    }

    public String getAddress() { return address; }

    public void setAddress(String address) { this.address = address; }

    public String getCountry() { return country; }

    public void setCountry(String country) { this.country = country; }

    public String getOrder() { return order; }

    public void setOrder(String order) { this.order = order; }

    public String getId() { return id; }
}
ArticleAffiliation.java
An entity model class used to manage the authors affiliation data package ro.ase.crowler.model; import java.sql.PreparedStatement; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; import java.text.DateFormatSymbols; import java.util.ArrayList; import ro.ase.crowler.DB; public class ArticleAffiliation { //public static final int JOURNAL_ID = 122016; public static final int JOURNAL_ID = 182016; public String title; ArrayList
URL = uRL; } public String getPages() { return pages; } public void setPages(String pages) { this.pages = pages; } public ArticleAffiliation(String title) { super(); this.title = title; this.authors = new ArrayList<>(); } public void addAuthor(Author author) { this.authors.add(author); } @Override public String toString() {
StringBuilder sb = new StringBuilder(); sb.append("*******************************"); sb.append("\nTitle: " + title); sb.append("\nReceived:"+receivedDate); sb.append("\nAccepted:"+revisedDate); sb.append("\nPublished:"+onlineDate); sb.append("\nURL:"+URL); sb.append("\nNo pages:"+pages); sb.append("\nHistory:"+getHistory()); sb.append("\nAuthors:"+getAuthors()); for (Author author : authors) sb.append(author.toString()); return sb.toString(); } public String getHistory(){
        StringBuilder sb = new StringBuilder();
        // a complete date has exactly three parts (day/month/year);
        // an empty or partial date is skipped instead of raising an index error
        String[] dates = this.receivedDate.split("/");
        if (dates.length == 3) {
            String monthString = new DateFormatSymbols().getMonths()[Integer.parseInt(dates[1]) - 1];
            sb.append("Received: " + monthString);
            sb.append(" " + dates[0] + ", " + dates[2] + ";");
        }
        dates = this.revisedDate.split("/");
        if (dates.length == 3) {
            String monthString = new DateFormatSymbols().getMonths()[Integer.parseInt(dates[1]) - 1];
            sb.append(" Accepted: " + monthString);
            sb.append(" " + dates[0] + "," + dates[2] + ";");
        }
        dates = this.onlineDate.split("/");
        if (dates.length == 3) {
            String monthString = new DateFormatSymbols().getMonths()[Integer.parseInt(dates[1]) - 1];
            sb.append(" Published: " + monthString);
            sb.append(" " + dates[0] + "," + dates[2]);
        }
        return sb.toString();
    }

    public String getAuthors(){
StringBuilder sb = new StringBuilder(); for(int i=0;i
Author.java
An entity model class for managing author data.
package ro.ase.crowler.model;
import java.awt.List;
import java.util.ArrayList;
public class Author {
String surname; String givenNames; String email; ArrayList
StringBuilder sb = new StringBuilder(); sb.append("\n\nName: "+surname+" "+givenNames); for(Affiliation aff : affiliation){ sb.append("\n"+aff.getOrder()+" - "+aff.getId()); sb.append("\nAddress:"+aff.getAddress()); sb.append("\nCountry:"+aff.getCountry()); } if(!email.isEmpty()) sb.append("\nEmail:"+email); return sb.toString(); } }
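The entity classes above keep the received, accepted and published dates as plain day/month/year strings. Because the analysis in this paper concerns the weekday of submission, the following minimal sketch shows how such a stored string could be mapped to a day of the week with the standard java.time API. The class and method names (SubmissionWeekday, weekdayOf, isWeekend) are hypothetical and are not part of the authors' scraper, and the sketch assumes that the stored format is day/month/year.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the authors' code: derives the weekday
// of a date stored as a day/month/year string.
class SubmissionWeekday {

    // Accepts both "3/8/2018" and "03/08/2018" (assumed day/month/year order)
    private static final DateTimeFormatter STORED =
            DateTimeFormatter.ofPattern("d/M/uuuu");

    // Parses the stored string and returns its day of the week
    static DayOfWeek weekdayOf(String storedDate) {
        return LocalDate.parse(storedDate.trim(), STORED).getDayOfWeek();
    }

    // True when the date falls on a Saturday or Sunday
    static boolean isWeekend(String storedDate) {
        DayOfWeek day = weekdayOf(storedDate);
        return day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY;
    }

    public static void main(String[] args) {
        String received = "13/8/2018";
        System.out.println(received + " -> " + weekdayOf(received)
                + ", weekend: " + isWeekend(received));
    }
}
```

A malformed or empty string makes LocalDate.parse throw a DateTimeParseException, so a caller working on scraped data would need to catch it or filter out incomplete records first.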
DB.java
Java class that manages the connection with the MySQL database used to store retrieved data package ro.ase.crowler; import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.SQLException; import java.sql.Statement; public class DB { public Connection conn = null; public DB() { try { Class.forName("com.mysql.jdbc.Driver"); String url = "jdbc:mysql://localhost:3306/crowler"; conn = DriverManager.getConnection(url, "root", "root"); System.out.println("conn built"); } catch (SQLException e) { e.printStackTrace(); } catch (ClassNotFoundException e) { e.printStackTrace(); } } public ResultSet runSql(String sql) throws SQLException {
Statement sta = conn.createStatement(); return sta.executeQuery(sql); } public boolean runSql2(String sql) throws SQLException {
        Statement sta = conn.createStatement();
        return sta.execute(sql);
    }

    @Override
    protected void finalize() throws Throwable {
        // close the connection only if it exists and is still open
        if (conn != null && !conn.isClosed()) {
            conn.close();
        }
    }
}
Parser.java
Class that retrieves the history dates (received, revised and published dates) of the peer review process. The class processes the information gathered from different sources and extracts the needed dates in a standard format that can be used for further processing. package ro.ase.crowler; import java.text.ParseException; import java.text.SimpleDateFormat; import java.util.Locale; import ro.ase.crowler.util.Debug; public class Parser { public static String[] getPhysicaDates(String datesString){
String receivedDate = null; String revisedDate = null; String onlineDate = null; if(datesString!=null && !datesString.equals("")){ String[] dates = datesString.split(","); for(int i=0;i String receivedDate = null; String revisedDate = null; String onlineDate = null; if(datesString!=null && !datesString.equals("")){ String[] dates = datesString.split(";"); for(int i=0;i String strOutput = null; SimpleDateFormat sdfmt1 = new SimpleDateFormat("MMMM dd, yyyy", Locale. US ); SimpleDateFormat sdfmt2= new SimpleDateFormat("dd/MM/yyyy"); java.util.Date dDate = null; try { dDate = sdfmt1.parse(month+" "+day+", "+year); } catch (ParseException e) { e.printStackTrace(); } if(dDate!=null) { strOutput = sdfmt2.format(dDate); //Debug.log("Input date was "+strOutput); } return strOutput; } public static String getPLOSDate(String dateString){ String strOutput = null; SimpleDateFormat sdfmt1 = new SimpleDateFormat("MMMM dd, yyyy", Locale. US ); SimpleDateFormat sdfmt2= new SimpleDateFormat("dd/MM/yyyy"); java.util.Date dDate = null; try { dDate = sdfmt1.parse(dateString); } catch (ParseException e) { e.printStackTrace(); } if(dDate!=null) { strOutput = sdfmt2.format(dDate); //Debug.log("Input date was "+strOutput); } return strOutput; } public static String getNATUREDate(String dateString){ String strOutput = null; SimpleDateFormat sdfmt1 = new SimpleDateFormat("yyyy-MM-dd"); SimpleDateFormat sdfmt2= new SimpleDateFormat("dd/MM/yyyy"); java.util.Date dDate = null; try { dDate = sdfmt1.parse(dateString); } catch (ParseException e) { e.printStackTrace(); } if(dDate!=null) { strOutput = sdfmt2.format(dDate); //Debug.log("Input date was "+strOutput); } return strOutput; } public static int[] getPages(String pages){ if(pages==null) return null; int startPage = 0; int endPage = 0; if(pages.toLowerCase().contains("pages")){ pages = pages.toLowerCase().replace("pages", "").trim(); String pagesNo[] = pages.split("-"); if(pagesNo.length==2){ try{ startPage = Integer. 
parseInt(pagesNo[0]);
                    endPage = Integer.parseInt(pagesNo[1]);
                } catch (Exception ex) {
                    // ex.printStackTrace();
                    Debug.log("Unable to get pages for " + ex.getMessage());
                }
            }
        }
        return new int[]{startPage, endPage};
    }
}
XMLParser.java
Java class that implements methods for parsing XML-formatted files and extracting the needed data. The class was used to extract data from the PLOS archive, which contains the metadata of published articles structured in XML files.
package ro.ase.crowler;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXParseException;
import com.mysql.jdbc.Util;
import ro.ase.crowler.model.Affiliation;
import ro.ase.crowler.model.ArticleAffiliation;
import ro.ase.crowler.model.Author;
import ro.ase.crowler.util.Debug;
import ro.ase.crowler.util.Utility;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FilenameFilter;
import java.util.HashMap;

public class XMLParser {
    private String folderPath;
    private static DB db = new DB();

    public XMLParser(String folderPath) {
        this.folderPath = folderPath;
    }

    public void listFiles(final String filterName) {
        int contor = 0;
        long startTime = System.currentTimeMillis();
        File root = new File(this.folderPath);
        // create new filename filter
        FilenameFilter fileNameFilter = null;
        if (filterName != null) {
            fileNameFilter = new FilenameFilter() {
                @Override
                public boolean accept(File dir, String name) {
                    if (name.contains(filterName)) {
                        return true;
                    }
                    return false;
                }
            };
        } else
            fileNameFilter = new FilenameFilter() {
                @Override
                public boolean accept(File dir, String name) {
                    return true;
                }
            };
        for (File file : root.listFiles(fileNameFilter)) {
            Debug.log("File " + file.getName());
            contor++;
            System.out.println("Processing file " + contor);
            this.parseDoc(file);
        }
        long endTime = System.currentTimeMillis();
        long time = (endTime -
startTime) / (1000 * 60); System.out.println("Proceesing time " + time + " minutes"); } public void listErrorFiles(File errorFile, final String filterName) { try { int contor = 0; long startTime = System.currentTimeMillis(); try (BufferedReader br = new BufferedReader(new FileReader( errorFile))) { String line; while ((line = br.readLine()) != null) { if (line.startsWith("E")) { System.out.println("File " + line); if (line.contains(filterName)) { contor++; System.out.println("Processing file " + contor); this.parseDoc(new File(line)); } } } br.close(); } long endTime = System.currentTimeMillis(); long time = (endTime - startTime) / (1000 * 60); System.out.println("Proceesing time " + time + " minutes"); } catch (Exception e) { e.printStackTrace(); } } public void parseDoc(File fXmlFile) { try { Debug.log("----------------------------"); String fileName = fXmlFile.getName(); System.out.println("Parsing " + fileName); DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance(); dbFactory.setFeature( "http://apache.org/xml/features/nonvalidating/load-external-dtd", false); DocumentBuilder dBuilder = dbFactory.newDocumentBuilder(); Document doc = null; try { doc = dBuilder.parse(fXmlFile); } catch (SAXParseException e) { System.out.println("Trying to fix it ..."); Utility.repairFile(fXmlFile); fXmlFile = new File(fXmlFile.getAbsolutePath()); doc = dBuilder.parse(fXmlFile); } doc.getDocumentElement().normalize(); Debug.log("Root element :" + doc.getDocumentElement().getNodeName()); // check if research article /* * NodeList articleSubject = doc.getElementsByTagName("subject"); if * (articleSubject != null && articleSubject.getLength() > 0) { * String subject = articleSubject.item(0).getTextContent(); * Debug.log("Subject: "+subject); if(!subject.toLowerCase().equals( * "research article")) return; } */ if (!this.isResearchArticle(doc, "subject")) { return; } // get title /* * String title = ""; NodeList articleTitle = * 
doc.getElementsByTagName("article-title"); if (articleTitle != * null && articleTitle.getLength() > 0) { title = * articleTitle.item(0).getTextContent(); Debug.log("Title: " + * title); } */ String title = this.getTitle(doc, "article-title"); String receivedDate = this.getReceivedDate(doc, "date"); String acceptedDate = this.getAcceptedDate(doc, "date"); String publishedDate = this.getPublishedDate(doc, "pub-date"); String doi = this.getDOI(doc, "article-id"); String noPages = this.getNoPages(doc, "page-count"); ArticleAffiliation article = new ArticleAffiliation(title); article.setOnlineDate(publishedDate); article.setPages(noPages); article.setURL(doi); article.setRevisedDate(acceptedDate); article.setReceivedDate(receivedDate); // get affiliation // build a map of unique affiliations HashMap String getTitle(Document doc, String tagName) { String title = ""; NodeList articleTitle = doc.getElementsByTagName(tagName); if (articleTitle != null && articleTitle.getLength() > 0) { title = articleTitle.item(0).getTextContent(); Debug.log("Title: " + title); } return title; } /* * * Internal method for getting the received date */ String getReceivedDate(Document doc, String tagName) { // date String receivedDate = ""; NodeList dateList = doc.getElementsByTagName(tagName); if (dateList != null && dateList.getLength() > 0) { for (int i = 0; i < dateList.getLength(); i++) { Element eElement = (Element) dateList.item(i); String type = eElement.getAttribute("date-type"); if (type.equals("received")) { String day = eElement.getElementsByTagName("day").item(0) .getTextContent(); String month = eElement.getElementsByTagName("month") .item(0).getTextContent(); String year = eElement.getElementsByTagName("year").item(0) .getTextContent(); receivedDate = day + "/" + month + "/" + year; } } } Debug.log("Received day: " + receivedDate); return receivedDate; } /* * * Internal method for getting the accepted date */ String getAcceptedDate(Document doc, String tagName) { // date String 
acceptedDate = ""; NodeList dateList = doc.getElementsByTagName(tagName); if (dateList != null && dateList.getLength() > 0) { for (int i = 0; i < dateList.getLength(); i++) { Element eElement = (Element) dateList.item(i); String type = eElement.getAttribute("date-type"); if (type.equals("accepted")) { String day = eElement.getElementsByTagName("day").item(0) .getTextContent(); String month = eElement.getElementsByTagName("month") .item(0).getTextContent(); String year = eElement.getElementsByTagName("year").item(0) .getTextContent(); acceptedDate = day + "/" + month + "/" + year; } } } Debug.log("Accepted day: " + acceptedDate); return acceptedDate; } /* * * Internal method for getting the accepted date */ String getPublishedDate(Document doc, String tagName) { // pub-date String onlineDate = ""; NodeList dateList = doc.getElementsByTagName(tagName); if (dateList != null && dateList.getLength() > 0) { for (int i = 0; i < dateList.getLength(); i++) { Element eElement = (Element) dateList.item(i); String type = eElement.getAttribute("pub-type"); if (type.equals("epub")) { String day = eElement.getElementsByTagName( "day").item(0).getTextContent(); String month = eElement.getElementsByTagName( "month") .item(0).getTextContent(); String year = eElement.getElementsByTagName( "year").item(0).getTextContent(); onlineDate = day + "/" + month + "/" + year; } } } Debug.log("Online day: " + onlineDate); return onlineDate; } /* * * Internal method for getting the article DOI */ String getDOI(Document doc, String tagName) { // article-id String doi = ""; NodeList articleId = doc.getElementsByTagName(tagName); if (articleId != null && articleId.getLength() > 0) { for (int i = 0; i < articleId.getLength(); i++) { Element eElement = (Element) articleId.item(i); String attValue = eElement.getAttribute("pub-id-type"); if (attValue.equals("doi")) doi = eElement.getTextContent(); } } Debug.log("DOI: " + doi); return doi; } /* * * Internal method for getting the article no pages */ 
    String getNoPages(Document doc, String tagName) {
        // page-count
        String noPages = "";
        NodeList pageCount = doc.getElementsByTagName(tagName);
        if (pageCount != null && pageCount.getLength() > 0) {
            Element eElement = (Element) pageCount.item(0);
            noPages = eElement.getAttribute("count");
        }
        Debug.log("Page count: " + noPages);
        return noPages;
    }

    /*
     * Internal method for getting affiliation
     */
    HashMap HashMap
Crowler.java
Java class that implements the crawler logic. The main objective of this class is to provide processing functions that are used to extract the data from different HTML or XML structures.
package ro.ase.crowler;
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import ro.ase.crowler.model.Article;
import ro.ase.crowler.util.Debug;

public class Crowler {
    public static DB db = new DB();

    // process each article page for Physica A
    public static void processJournalArticle(String articleURL, Article articol) throws SQLException, IOException {
        // check if the given URL is already in database
        String sql = "select * from article where URL = '" + articleURL + "'";
        ResultSet rs = db.runSql(sql);
        if (rs.next()) {
            articol.setParsed(true);
        } else {
            // get useful information
            boolean retry = true;
            Document doc = null;
            while (retry) {
                try {
                    doc = Jsoup.connect(articleURL).timeout(10 * 1000).get();
                    if (doc.text().contains("Physica A")) {
                        Debug.log(articleURL);
                        retry = false;
                    }
                } catch (SocketTimeoutException ex) {
                    Debug.log("SocketTimeoutException............retrying");
                }
            }
            Debug.log("Getting article content");
            Elements articleHistory = doc.select("dl.articleDates");
            if (articleHistory != null) {
                String history = articleHistory.text();
                if (history.isEmpty() ||
history.equals("")) Debug.log("HISTORY: " + articleHistory.html()); articol.setHistory(articleHistory.text()); } else Debug.log("History not found"); // get all links and recursively call the processPage method Elements articleContent = doc.select("div.article-content"); for (Element item : articleContent) { // Debug.log("Processing "+link); for (Element info : item.children()) { // Debug.log("Article info:"+info.attr("class")+ " // - "+ info.text()); if (info.hasClass("article-author-list")) { Elements authors = doc.select("span.author-name"); for (Element author : authors) Debug.log("XXX Author:" + author.text()); } if (info.hasClass("article-title")) Debug.log("XXX Article Title:" + info.text()); if (info.hasClass("article-history") || info.hasClass("articleDates")) { Debug.log("XXX Article History:" + info.text()); articol.setHistory(info.text()); } } } } } // process the article page for PLOS public static void processPLOSJournalArticle(String articleURL, Article articol) throws SQLException, IOException { // check if the given URL is already in database String sql = "select * from article where URL = '" + articleURL + "'"; ResultSet rs = db.runSql(sql); if (rs.next()) { } else { // get useful information boolean retry = true; Document doc = null; while (retry) { try { doc = Jsoup.connect(articleURL).timeout(10 * 1000).get(); if (doc.text().contains("PLOS")) { Debug.log(articleURL); retry = false; } } catch (SocketTimeoutException ex) { Debug.log("SocketTimeoutException............retrying"); } } Debug.log("Getting article content"); Elements articleInfo = doc.select("div.articleinfo"); if (articleInfo.size() > 0) { for (Element para : articleInfo.get(0).select("p:contains(Received)")) { if (para.text().contains("Received")) { String text = para.text(); text = text.replace("", ""); text = text.replace("", ""); Debug.log("HISTORY: " + text); articol.setHistory(text); break; } } } } } // process the article page for Nature public static void 
processNATUREJournalArticle(String articleURL, Article articol) throws SQLException, IOException { // check if the given URL is already in database String sql = "select * from article where URL = '" + articleURL + "'"; ResultSet rs = db.runSql(sql); if (rs.next()) { } else { // get useful information boolean retry = true; Document doc = null; while (retry) { try { doc = Jsoup.connect(articleURL).timeout(10 * 1000).get(); if (doc.text().contains("nature")) { Debug.log(articleURL); retry = false; } } catch (SocketTimeoutException ex) { Debug.log("SocketTimeoutException............retrying"); } } Debug.log("Getting article content"); Elements articleInfo = doc.select("dl.dates"); if (articleInfo.size() > 0) { Elements dates = articleInfo.select("time"); if (dates.size() >= 3) { String received = dates.get(0).text(); String accepted = dates.get(1).text(); String online = dates.get(2).text(); Debug.log(String.format("HISTORY: Received %s, Accepted %s, Published %s", received,accepted, online)); articol.setHistory(String.format("HISTORY: Received %s, Accepted %s, Published %s", received, accepted, online)); articol.setReceivedDate(Parser.getNATUREDate( dates.get(0).attr("datetime").trim())); articol.setRevisedDate(Parser.getNATUREDate( dates.get(1).attr("datetime").trim())); articol.setOnlineDate(Parser.getNATUREDate( dates.get(2).attr("datetime").trim())); } } } } // process the Nature issue page public static void processNATUREJournalPage(String journalURL, int journalID) throws SQLException, IOException { // check if the given URL is already in database String sql = "select * from Record where URL = '" + journalURL + "'"; ResultSet rs = db.runSql(sql); if (rs.next()) { } else { // store the URL to database to avoid parsing again sql = "INSERT INTO `record` " + "(`URL`,`JournalID`) VALUES " + "(?,?);"; PreparedStatement stmt = db.conn.prepareStatement( sql, Statement.RETURN_GENERATED_KEYS); stmt.setString(1, journalURL); stmt.setInt(2, journalID); stmt.execute(); // 
get useful information boolean retry = true; Document doc = null; while (retry) { try { Thread.sleep(500); doc = Jsoup.connect(journalURL).timeout(10 * 1000).get(); if (doc.text().contains("nature")) { Debug.log(journalURL); } retry = false; } catch (Exception ex) { retry = true; } } // get all links and recursively call the processPage method Elements articles = doc.select("ul.article-list>li"); for (Element article : articles) { // Debug.log("Processing "+link); Article articol = new Article(); articol.getNATURElementInfo(article); if (articol.getURL() != null && articol.getURL().contains("http") && !articol.getURL().contains("full")) processNATUREJournalArticle(articol.getURL(), articol); articol.setJournalID(journalID); Debug.log("Date articol \n " + articol.toString()); if (articol.getHistory() != null) { articol.save2DB(db, journalID); } } } } // process the PLOS issue page public static void processPLOSJournalPage(String journalURL, int journalID) throws SQLException, IOException { // check if the given URL is already in database String sql = "select * from Record where URL = '" + journalURL + "'"; ResultSet rs = db.runSql(sql); if (rs.next()) { } else { // store the URL to database to avoid parsing again sql = "INSERT INTO `record` " + "(`URL`,`JournalID`) VALUES " + "(?,?);"; PreparedStatement stmt = db.conn.prepareStatement( sql, Statement.RETURN_GENERATED_KEYS); stmt.setString(1, journalURL); stmt.setInt(2, journalID); stmt.execute(); // get useful information boolean retry = true; Document doc = null; while (retry) { try { Thread.sleep(500); doc = Jsoup.connect(journalURL).timeout(10 * 1000).get(); if (doc.text().contains("PLOS")) { Debug.log(journalURL); } retry = false; } catch (Exception ex) { retry = true; } } // get all links and recursively call the processPage method Elements articles = doc.select("ul public static void processJournalPage(String journalURL, int journalID) throws SQLException, IOException { // check if the given URL is already in 
database
        String sql = "select * from Record where URL = '" + journalURL + "'";
        ResultSet rs = db.runSql(sql);
        if (rs.next()) {
        } else {
            // store the URL to database to avoid parsing again
            sql = "INSERT INTO `record` " + "(`URL`,`JournalID`) VALUES " + "(?,?);";
            PreparedStatement stmt = db.conn.prepareStatement(
                    sql, Statement.RETURN_GENERATED_KEYS);
            stmt.setString(1, journalURL);
            stmt.setInt(2, journalID);
            stmt.execute();

            // get useful information
            boolean retry = true;
            Document doc = null;
            while (retry) {
                try {
                    doc = Jsoup.connect(journalURL).timeout(10 * 1000).get();
                    if (doc.text().contains("Physica A")) {
                        Debug.log(journalURL);
                    }
                    retry = false;
                } catch (Exception ex) {
                    retry = true;
                }
            }

            // get all links and recursively call the processPage method
            Elements articles = doc.select("ul.article");
            for (Element article : articles) {
                // Debug.log("Processing "+link);
                Article articol = new Article();
                articol.getElementInfo(article);
                if (articol.getURL() != null)
                    processJournalArticle(articol.getURL(), articol);
                articol.setJournalID(journalID);
                Debug.log("Date articol \n " + articol.toString());
                if (!articol.isParsed())
                    articol.save2DB(db, journalID);
            }
        }
    }

    // process the Nature issue page
    public static void reprocessJournalArticle() throws SQLException, IOException {
        // check if the given URL is already in database
        String sql = "select * from article where history = ''";
        ResultSet rs = db.runSql(sql);
        while (rs.next()) {
            String url = rs.getString("url");
            int id = rs.getInt("idarticle");

            // get useful information
            boolean retry = true;
            Document doc = null;
            while (retry) {
                try {
                    doc = Jsoup.connect(url).timeout(10 * 1000).get();
                    if (doc.text().contains("Physica A")) {
                        Debug.log(url);
                        retry = false;
                    }
                } catch (SocketTimeoutException ex) {
                    Debug.log("SocketTimeoutException............retrying");
                } catch (HttpStatusException ex) {
                    Debug.log(ex.getMessage() + " - " + url);
                    retry = false;
                    continue;
                }
            }

            Debug.log("Getting article content");
            Elements articleHistory = null;
            if (doc != null)
                articleHistory = doc.select("dl.articleDates");
            if (articleHistory != null) {
                String history = articleHistory.text();
                if (history.isEmpty() || history.equals(""))
                    Debug.log("HISTORY: " + articleHistory.html());
                else {
                    sql = "update article SET history = ? where idarticle=?";
                    PreparedStatement stmt = db.conn.prepareStatement(sql);
                    stmt.setString(1, history);
                    stmt.setInt(2, id);
                    stmt.execute();
                    Debug.log("Article with id = " + id + " updated with history " + history);
                }
            } else
                Debug.log("History not found");
        }
    }

    // reprocess pages for which the history data was missing
    public static void reprocessJournalArticleHP() throws SQLException, IOException {
        // check if the given URL is already in database
        String sql = "select * from article where history <>''";
        ResultSet rs = db.runSql(sql);
        while (rs.next()) {
            String history = rs.getString("history");
            String pages = rs.getString("pages");
            int idJournal = rs.getInt("idjournal");
            int id = rs.getInt("idarticle");
            String[] dates = null;
            int[] page = null;
            if (idJournal == 1) {
                dates = Parser.getPlosDates(history);
                // page = Parser.getPages(pages);
            } else if (idJournal >= 300) {
                dates = Parser.getPhysicaDates(history);
                page = Parser.getPages(pages);
            }
            if (dates != null) {
                sql = "update article SET received = ?, revised = ?, online = ?, startPage = ?, endPage = ? where idarticle=?";
                PreparedStatement stmt = db.conn.prepareStatement(sql);
                stmt.setString(1, dates[0]);
                stmt.setString(2, dates[1]);
                stmt.setString(3, dates[2]);
                if (page != null && page.length == 2) {
                    stmt.setInt(4, page[0]);
                    stmt.setInt(5, page[1]);
                } else {
                    stmt.setInt(4, 0);
                    stmt.setInt(5, 0);
                }
                stmt.setInt(6, id);
                stmt.execute();
                Debug.log("Article with id = " + id + " updated with history " + history);
            }
        }
    }

    // process the Cell issue page
    public static void processCellJournalPage(String journalIssueURL, int journalID)
            throws SQLException, IOException {
        // check if the given URL is already in database
        String sql = "select * from Record where URL = '" + journalIssueURL + "'";
        ResultSet rs = db.runSql(sql);
        if (rs.next()) {
        } else {
            // store the URL to database to avoid parsing again
            sql = "INSERT INTO `record` " + "(`URL`,`JournalID`) VALUES " + "(?,?);";
            PreparedStatement stmt = db.conn.prepareStatement(
                    sql, Statement.RETURN_GENERATED_KEYS);
            stmt.setString(1, journalIssueURL);
            stmt.setInt(2, journalID);
            stmt.execute();

            // get useful information
            boolean retry = true;
            Document doc = null;
            while (retry) {
                try {
                    Thread.sleep(500);
                    doc = Jsoup.connect(journalIssueURL).timeout(10 * 1000).get();
                    if (doc.text().contains("PLOS")) {
                        Debug.log(journalIssueURL);
                    }
                    retry = false;
                } catch (Exception ex) {
                    retry = true;
                }
            }

            // get all links and recursively call the processPage method
            Elements articles = doc.select("div.articleCitation");
            for (Element article : articles) {
                // Debug.log("Processing "+link);
                Article articol = new Article();
                articol.getCellElementInfo(article);
                if (articol.getURL() != null)
                    processCellJournalArticle(articol.getURL(), articol);
                articol.setJournalID(journalID);
                Debug.log("Date articol \n " + articol.toString());
                if (articol.getHistory() != null) {
                    articol.save2DB(db, journalID);
                }
            }
        }
    }

    // process the Cell article page
    public static void processCellJournalArticle(String articleURL, Article articol)
            throws SQLException, IOException {
        // check if the given URL is already in database
        String sql = "select * from article where URL = '" + articleURL + "'";
        ResultSet rs = db.runSql(sql);
        if (rs.next()) {
        } else {
            // get useful information
            boolean retry = true;
            Document doc = null;
            int contor = 0;
            while (retry && contor < 3) {
                try {
                    doc = Jsoup.connect(articleURL).timeout(10 * 1000).get();
                    if (doc.text().contains("cell")) {
                        Debug.log(articleURL);
                        retry = false;
                        contor = 0;
                    }
                } catch (SocketTimeoutException ex) {
                    contor++;
                    Debug.log("SocketTimeoutException............retrying");
                }
            }
            if (doc != null) {
                Debug.log("Getting article content");
                Elements articleInfo = doc.select("div.articleDates");
                String history = "";
                if (articleInfo.size() > 0) {
                    for (Element para : articleInfo.get(0).select("span.pubDatesRow")) {
                        if (para.hasText()) {
                            if (!para.text().contains(";"))
                                history += para.text() + ";";
                            else
                                history += para.text();
                        }
                    }
                    Debug.log("HISTORY: " + history);
                    articol.setHistory(history);
                }
            }
        }
    }

    // process each issue page for Physica A
    public static ArrayList

Utility class that contains different processing methods for extracting week days, country names and for correcting data formatting errors (some XML files were not well formatted).
package ro.ase.crowler.util;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;

import ro.ase.crowler.DB;

import com.mysql.fabric.xmlrpc.base.Array;

public class Utility {

    public static void repairFile(File file) throws FileNotFoundException, IOException {
        String oldName = file.getAbsolutePath();
        copyFile(file, oldName + ".old");
        File existingFile = new File(oldName + ".old");
        Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(oldName), "UTF-8"));
        try (BufferedReader br = new BufferedReader(
                new FileReader(existingFile))) {
            String line;
            boolean isLineOk = false;
            while ((line = br.readLine()) != null) {
                if (!isLineOk && line.startsWith("
    getCountryNames() {
        String[] locales = Locale.getISOCountries();
        ArrayList
                        RETURN_GENERATED_KEYS);
                stmt.setString(1, country);
                stmt.execute();
            } catch (SQLException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    public static void correctUSStateNames(DB db) {
        for (State country : State.values()) {
            System.out.println("Update for state "+country.getState());
            String sql = "UPDATE `crowler`.`affiliation` SET `country` = 'United States of America' where country like '%"+country.getState()+"%'";
            PreparedStatement stmt;
            try {
                stmt = db.conn.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
                //stmt.setString(1, country.getState());
                stmt.execute();
            } catch (SQLException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    public static void insertCountryNames(DB db, ArrayList
                        RETURN_GENERATED_KEYS);
                stmt.setString(1, country);
                stmt.execute();
            } catch (SQLException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    public static void correctEmailAddressCell(DB db, String filter) {
        System.out.println("Update for filter" + filter);
        String sql = "SELECT authorId,idarticle,email, affiliation.id, country, address, affiliation.order as affiliationOrder FROM crowler.authors, crowler.affiliation where authors.id=affiliation.authorId and country like '%"+filter+"%';";
        PreparedStatement stmt;
        try {
            stmt = db.conn.prepareStatement(sql);
            ResultSet rs = stmt.executeQuery();
            int contor = 0;
            while (rs.next()) {
                contor++;
                System.out.println("----------------- Processing article " + contor);
                int authorId = rs.getInt("authorId");
                int idArticle = rs.getInt("idarticle");
                int affId = rs.getInt("id");
                String country = rs.getString("country");
                int order = rs.getInt("affiliationOrder");
                if (order != 1) {
                    String sqlUpdate = "UPDATE `crowler`.`affiliation` SET `country` = '' where id = ?";
                    PreparedStatement stmt2 = db.conn.prepareStatement(sqlUpdate);
                    stmt2.setInt(1, affId);
                    stmt2.execute();
                    String sqlUpdateAuthor = "UPDATE `crowler`.`authors` SET `email` = ? where id = ?";
                    PreparedStatement stmt3 = db.conn.prepareStatement(sqlUpdateAuthor);
                    stmt3.setString(1, country);
                    stmt3.setInt(2, authorId);
                    stmt3.execute();
                }
            }
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static void extractDates(DB db) throws ParseException {
        System.out.println("Extract dates");
        String sql = "SELECT authorId,idarticle FROM crowler.aut_iso_detailed where online_year is null;";
        PreparedStatement stmt;
        try {
            stmt = db.conn.prepareStatement(sql);
            ResultSet rs = stmt.executeQuery();
            int contor = 0;
            while (rs.next()) {
                contor++;
                System.out.println("----------------- Processing article " + contor);
                int authorId = rs.getInt("authorId");
                int idArticle = rs.getInt("idarticle");
                System.out.println("Article ID "+idArticle);
                String sql_article = "SELECT idjournal,authors,received,revised,online,startPage, endPage FROM crowler.article where idarticle = ?;";
                PreparedStatement stmt_article = db.conn.prepareStatement(sql_article);
                stmt_article.setInt(1, idArticle);
                ResultSet rs_article = stmt_article.executeQuery();
                if (rs_article.next()) {
                    String authors = rs_article.getString("authors");
                    String receivedDate = rs_article.getString("received");
                    String revisedDate = rs_article.getString("revised");
                    String onlineDate = rs_article.getString("online");
                    int startpage = rs_article.getInt("startPage");
                    int endPage = rs_article.getInt("endPage");
                    int journalid = rs_article.getInt("idjournal");
                    int receivedValues[] = Utility.getDataInfo(receivedDate);
                    int revisedValues[] = Utility.getDataInfo(revisedDate);
                    int online[] = Utility.getDataInfo(onlineDate);
                    int authorsNo = 0;
                    String[] authorsList = authors.split(",");
                    if (authorsList.length != 0) {
                        String lastAuthor = authorsList[authorsList.length-1];
                        if (lastAuthor.isEmpty() || lastAuthor.equals("") || lastAuthor.equals(" "))
                            authorsNo = authorsList.length-1;
                        else
                            authorsNo = authorsList.length;
                    }
                    //System.out.println("Authors:" + authors);
                    //System.out.println("Authors number: "+authorsNo);
                    int pagesNo = endPage-startpage+1;
                    //System.out.println("Pages number:"+pagesNo);
                    String sqlUpdate = "UPDATE crowler.aut_iso_detailed SET received_week_day = ?, received_day = ?, "
                            + "received_month = ?, received_year = ?,"
                            + "revised_week_day = ?, revised_day = ?,revised_month = ?,revised_year = ?,"
                            + "online_week_day = ?, online_day = ?,online_month = ?,online_year = ?,"
                            + "authors_number = ? , pages_number = ? , journalid = ? "
                            + "WHERE idarticle = ? AND authorid = ?;";
                    PreparedStatement stmt_update = db.conn.prepareStatement(sqlUpdate);
                    stmt_update.setInt(1, receivedValues[0]);
                    stmt_update.setInt(2, receivedValues[1]);
                    stmt_update.setInt(3, receivedValues[2]);
                    stmt_update.setInt(4, receivedValues[3]);
                    stmt_update.setInt(5, revisedValues[0]);
                    stmt_update.setInt(6, revisedValues[1]);
                    stmt_update.setInt(7, revisedValues[2]);
                    stmt_update.setInt(8, revisedValues[3]);
                    stmt_update.setInt(9, online[0]);
                    stmt_update.setInt(10, online[1]);
                    stmt_update.setInt(11, online[2]);
                    stmt_update.setInt(12, online[3]);
                    stmt_update.setInt(13, authorsNo);
                    stmt_update.setInt(14, pagesNo);
                    stmt_update.setInt(15, journalid);
                    stmt_update.setInt(16, idArticle);
                    stmt_update.setInt(17, authorId);
                    stmt_update.execute();
                }
            }
        } catch (SQLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public static int[] getDataInfo(String processedDate) throws ParseException {
        int[] values = new int[4];
        Date received = null;
        DateFormat df = new SimpleDateFormat("dd/MM/yyyy");
        try {
            received = df.parse(processedDate);
        } catch (Exception e) {
            values[0] = 0;
            values[1] = 0;
            values[2] = 0;
            values[3] = 0;
            return values;
        }
        // System.out.println(received.toLocaleString());
        String dayOfWeek = new SimpleDateFormat("EEEE", Locale.ENGLISH).format(received);
        // System.out.println(dayOfWeek); // Friday
        // System.out.println("Day number:"+Utility.getWeekDay(dayOfWeek));
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(received);
        int receivedWeekDay = calendar.get(Calendar.DAY_OF_WEEK) - 1;
        if (receivedWeekDay == 0)
            receivedWeekDay = 7;
        int receivedDay = calendar.get(Calendar.DAY_OF_MONTH);
        int receivedMonth = calendar.get(Calendar.MONTH) + 1;
        int receivedYear = calendar.get(Calendar.YEAR);
        values[0] = receivedWeekDay;
        values[1] = receivedDay;
        values[2] = receivedMonth;
        values[3] = receivedYear;
        // System.out.println("Week Day:"+receivedWeekDay);
        // System.out.println("Day:"+receivedDay);
        // System.out.println("Month:"+receivedMonth);
        // System.out.println("Year:"+receivedYear);
        return values;
    }

    public static int getWeekDay(String weekday) {
        switch (weekday) {
            case "Monday":
                return 1;
            case "Tuesday":
                return 2;
            case "Wednesday":
                return 3;
            case "Thursday":
                return 4;
            case "Friday":
                return 5;
            case "Saturday":
                return 6;
            case "Sunday":
                return 7;
            default:
                throw new IllegalArgumentException();
        }
    }
}

I. Python script for the LQ tool that computes the Localization Quotient:

import arcpy
import arcpy.da as da

# tab_tari: countries table to update; tab_date: table with the per-country frequencies
tab_tari = arcpy.GetParameterAsText(0)
tab_date = arcpy.GetParameterAsText(1)
lts = [rnds for rnds in da.SearchCursor(tab_date, ("iso2", "FREQUENCY", "FREQUENCY_1") ) ]
# st: total of FREQUENCY; ssel: total of FREQUENCY_1
st = 0
ssel = 0
for rd in lts:
    if rd[1] is not None:
        st += rd[1]
    if rd[2] is not None:
        ssel += rd[2]
# overall share of FREQUENCY_1 in FREQUENCY, in percent
rap_jos = float(ssel) / st * 100
for rd in lts:
    # share for this country, in percent
    if rd[2] is None:
        rap_sus = 0.
    else:
        rap_sus = float(rd[2]) / rd[1] * 100
    # country share relative to the overall share, times 100
    coef_loc = rap_sus / rap_jos * 100
    with da.UpdateCursor(tab_tari, ("lq"), "ISO2 = '"+rd[0]+"'") as crs:
        for rdu in crs:
            rdu[0] = coef_loc
            crs.updateRow(rdu)

Python script for the Simb_GC tool that applies a graduated colors symbology to the Countries layer:

import arcpy

tari = arcpy.GetParameterAsText(0)
nrcl = arcpy.GetParameter(1)
mxd = arcpy.mapping.MapDocument("CURRENT")
df = arcpy.mapping.ListDataFrames(mxd, "World")[0]
fsl = "d:/stat_art/simbol_GC.lyr"
sl = arcpy.mapping.Layer(fsl)
dl = arcpy.mapping.ListLayers(mxd, tari, df)[0]
# apply the symbology stored in the template layer file to the countries layer
arcpy.mapping.UpdateLayer(df, dl, sl, "TRUE")
if dl.symbologyType == "GRADUATED_COLORS":
    dl.symbology.numClasses = nrcl
    dl.symbology.valueField = "lq"
    arcpy.SetParameter(2,True)
else:
    arcpy.SetParameter(2,False)
arcpy.RefreshActiveView()
arcpy.RefreshTOC()

Table SI.2. Correlation matrix for the independent variables for consolidated data set and for each journal

A. Consolidated data set
Variables Mon Tue Wed Thu Weekend Spring Summer Fall America Africa Asia Oceania Christmas log AUTH log HDI log LTO
Monday
Tuesday -0.216** 1
Wednesday -0.217** -0.224** 1
Thursday -0.215** -0.223** -0.223** 1
Weekend -0.160** -0.165** -0.166** -0.162** 1
Spring -0.012** 0.005* 0.003 0.001 -0.002 1
Summer
Fall 0 -0.001 0.002 0.002 0 -0.332** -0.332** 1
America
Africa -0.005* 0.001 -0.003 0.002 0.010** 0 -0.001 0 -0.083** 1
Asia -0.021** -0.011** -0.015** -0.008** 0.111** -0.002 -0.011** 0.004 -0.437** -0.073** 1
Oceania
Christmas
log AUTH -0.002 -0.001 0.004 0.003 -0.004 0.001 0 -0.005* -0.050** -0.007** 0.041** -0.035** 0.003 1
log HDI
log LTO -0.012** -0.009** -0.010** -0.002 0.050** -0.001 -0.006* -0.001 -0.735** -0.126** 0.601** -0.276** 0.009** 0.091** -0.335** 1

B. Physica A: Statistical mechanics and its applications
Variables Mon Tue Wed Thu Weekend Spring Summer Fall America Africa Asia Oceania Christmas log AUTH log HDI log LTO
Monday
Tuesday -0.214** 1
Wednesday -0.211** -0.213** 1
Thursday -0.207** -0.208** -0.205** 1
Weekend -0.189** -0.190** -0.188** -0.175** 1
Spring -0.01 -0.007 0.037** 0.01 -0.011 1
Summer -0.011 0 -0.027** -0.006 0.013 -0.329** 1
Fall
America
Africa
Asia -0.046** -0.002 -0.008 -0.012 0.101** 0.008 0.003 -0.007 -0.487** -0.122** 1
Oceania
Christmas
log AUTH -0.015 -0.014 0 -0.007 0.016 0.004 -0.012 0.008 0.019 -0.005 0.112** -0.027** 0 1
log HDI
log LTO -0.025* -0.003 0.01 -0.017 0.028** 0.024* -0.004 -0.004 -0.569** -0.343** 0.515** -0.175** -0.008 0.082** -0.111** 1

C. PLOS ONE
Variables Mon Tue Wed Thu Weekend Spring Summer Fall America Africa Asia Oceania Christmas log AUTH log HDI log LTO
Monday
Tuesday -0.216** 1
Wednesday -0.217** -0.225** 1
Thursday -0.216** -0.224** -0.225** 1
Weekend -0.157** -0.163** -0.164** -0.162** 1
Spring -0.013** 0.006* 0.002 0 0 1
Summer
Fall
America
Africa -0.006* 0.001 -0.003 0.003 0.008** 0.001 -0.002 0 -0.082** 1
Asia -0.020** -0.012** -0.016** -0.007** 0.114** -0.002 -0.011** 0.004 -0.428** -0.074** 1
Oceania
Christmas
log AUTH -0.001 0 0.004 0 0.004 0 -0.005 -0.002 -0.108** 0.002 0.096** -0.046** 0.005* 1
log HDI
log LTO -0.011** -0.009** -0.012** -0.002 0.055** -0.002 -0.006* 0 -0.732** -0.115** 0.605** -0.291** 0.010** 0.149** -0.339** 1

D. Nature
Variables Mon Tue Wed Thu Weekend Spring Summer Fall America Africa Asia Oceania Christmas log AUTH log HDI log LTO
Monday
Tuesday -0.224** 1
Wednesday -0.212** -0.212** 1
Thursday -0.212** -0.213** -0.201** 1
Weekend -0.174** -0.174** -0.164** -0.165** 1
Spring -0.011 0.015 -0.013 0.013 0.006 1
Summer
Fall -0.022 -0.008 0.014 0.004 0.008 -0.343** -0.341** 1
America
Africa -0.016 0.001 0.003 -0.015 0.008 -0.005 -0.005 -0.003 -0.038** 1
Asia -0.009 0.007 -0.009 -0.016 0.022 0.002 -0.004 0.009 -0.346** -0.01 1
Oceania
Christmas
log AUTH -0.006 -0.039** 0.009 0.01 0.025 -0.017 -0.007 -0.005 -0.072** 0.026 0.026 0.025 0.029 1
log HDI
log LTO -0.004 0.001 -0.004 0.017 -0.004 -0.001 0.014 -0.029 -0.863** -0.006 0.453** -0.166** -0.01 0.058** -0.336** 1

E. Cell
Variables Mon Tue Wed Thu Weekend Spring Summer Fall America Africa Asia Oceania Christmas log AUTH log HDI log LTO
Monday
Tuesday -0.226** 1
Wednesday -0.219** -0.219** 1
Thursday -0.224** -0.224** -0.217** 1
Weekend -0.157** -0.157** -0.152** -0.155** 1
Spring
Summer -0.006 -0.009 -0.012 0.018 0.023 -0.368** 1
Fall -0.041* 0.007 0.031 0.015 -0.02 -0.338** -0.337** 1
America -0.004 0.011 -0.003 -0.014 0.056** -0.002 -0.012 0.019 1
Africa -0.008 -0.008 -0.008 -0.008 -0.005 0.027 -0.01 -0.009 -0.027 1
Asia
Oceania
Christmas
log AUTH -0.015 -0.029 -0.016 0.022 0.013 0.038* -0.03 -0.001 -0.084** -0.013 0.059** 0.067** 0.016 1
log HDI
log LTO

Note:
* Statistically significant at level 10%
** Statistically significant at level 5%
*** Statistically significant at level 1%

Table SI.3. Regression estimates for the dependent variables for consolidated data set and each journal for each model by time

A.
Consolidated data set
Models
Variables/characteristics M1 (equation 3) M2 (equation 4) M3 (equation 5) M4 (equation 6) M5 (equation 7) M6 (equation 8) M7 (equation 9) M8 (equation 10) M9 (equation 11)
0 1 2 3 4 5 6 7 8 9
Intercept
Monday (1 yes, 0 no) 0.084*** 0.085*** 0.088*** 0.088*** 0.099*** 0.099*** 0.099***
Tuesday (1 yes, 0 no) 0.005 0.004 0.009 0.007 0.017 0.017 0.016
Wednesday (1 yes, 0 no) -0.0117 -0.01 -0.005 -0.007 0.0001 -0.0009 -0.0018
Thursday (1 yes, 0 no) -0.0196 -0.0189 -0.019 -0.021 -0.012 -0.011 -0.015
Weekend (1 yes, 0 no) -0.467*** -0.472*** -0.469*** -0.466*** -0.441*** -0.44*** -0.44***
Adjusted R (weighted) 0.116***
Spring (1 yes, 0 no) -0.026 -0.02 -0.02 0.002 -0.0003 -0.0004 -0.006
Summer (1 yes, 0 no) -0.019 -0.005 -0.007 0.015 0.015 0.015 0.014
Fall (1 yes, 0 no) -0.026 -0.021 -0.024 -0.001 -0.0031 0.0008 -0.005
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no)
Asia (1 yes, 0 no) -0.089*** -0.081*** -0.080*** -0.073*** -0.045* -0.058**
Oceania (1 yes, 0 no) -0.159* -0.168** -0.1674* -0.143* 0.148* -0.067
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO
Adjusted R

B. Physica A: Statistical mechanics and its applications
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no) -0.00039 -0.0011 -0.0018 -0.002 -0.002 -0.003 -0.003
Thursday (1 yes, 0 no) -0.00029 -0.0001 -0.00069 -0.002 -0.003 -0.003 -0.006
Weekend (1 yes, 0 no) -0.227*** -0.226*** -0.224*** -0.223*** -0.223*** -0.224*** 0.0225***
Adjusted R (weighted) 0.08***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)
Fall (1 yes, 0 no) -0.005 -0.009 -0.0126 0.006 0.004 0.005 0.0003
Adjusted R (weighted)
America (1 yes, 0 no) -0.011 -0.0159 -0.017 -0.016 -0.0145 0.014
Africa (1 yes, 0 no)
Asia (1 yes, 0 no) -0.052*** -0.047*** -0.047*** -0.047*** -0.045** -0.054***
Oceania (1 yes, 0 no) -0.097 -0.097* -0.096* -0.098* -0.098* -0.045
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.029 -0.027 -0.019
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO
Adjusted R

E. Cell
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no) -0.003 -0.0118 -0.0132 -0.0152 -0.0163 -0.0163 -0.017
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no) -0.049 -0.0482 -0.049 -0.0512 -0.0525 -0.0522 -0.0531
Weekend (1 yes, 0 no) -0.088 -0.0542 -0.0512 -0.0572 -0.0572 -0.0556 -0.0557
Adjusted R (weighted) 0.04***
Spring (1 yes, 0 no) -0.1281*** -0.1296*** -0.13*** -0.1155*** -0.1148*** -0.1142*** -0.1149***
Summer (1 yes, 0 no) -0.0791** -0.08** -0.0789** -0.0637* -0.0637* -0.0641* -0.065*
Fall (1 yes, 0 no) -0.0312 -0.0351 -0.0347 -0.0206 -0.0201 -0.0213 -0.0205
Adjusted R (weighted)
America (1 yes, 0 no) -0.0466* -0.0396 -0.0394 -0.0404 -0.0416 0.001
Africa (1 yes, 0 no) N.A. N.A. N.A. N.A. N.A. N.A.
Asia (1 yes, 0 no) -0.0073 -0.0032 -0.0047 -0.0048 0.0074 -0.0044
Oceania (1 yes, 0 no)
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.0282 -0.0273 -0.0293
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO
Adjusted R

Note:
* Statistically significant at level 10%
** Statistically significant at level 5%
*** Statistically significant at level 1%
N.A.: Due to the lack of variability (no papers having PCA, Papers’ Corresponding Authors, from Africa) this factor was automatically removed from the model.

A. Consolidated data set
Models
Variables/characteristics M1 (equation 3) M2 (equation 4) M3 (equation 5) M4 (equation 6) M5 (equation 7) M6 (equation 8) M7 (equation 9) M8 (equation 10) M9 (equation 11)
0 1 2 3 4 5 6 7 8 9
Intercept
Monday (1 yes, 0 no) 0.0528** 0.053*** 0.053*** 0.055*** 0.056*** 0.053*** 0.052**
Tuesday (1 yes, 0 no) 0.074*** 0.075*** 0.076*** 0.077*** 0.078*** 0.078*** 0.075***
Wednesday (1 yes, 0 no) 0.043** 0.48** 0.049** 0.05** 0.05** 0.047** 0.046**
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.460*** -0.459*** -0.459*** -0.457*** -0.455*** -0.456*** -0.463***
Adjusted R (weighted) 0.121***
Spring (1 yes, 0 no) -0.037* -0.046** -0.046** -0.035* -0.035* -0.033 -0.028
Summer (1 yes, 0 no) -0.002 -0.003 -0.003 0.007 0.006 0.008 0.01
Fall (1 yes, 0 no) -0.039** -0.045*** -0.04** -0.032* -0.034* -0.031* -0.027
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) -0.033 0.025 0.025 0.031 0.072 0.129*
Asia (1 yes, 0 no) -0.029* 0.012 0.012 0.019 0.035* 0.036*
Oceania (1 yes, 0 no)
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO
Adjusted R

B. Physica A: Statistical mechanics and its applications
Intercept
Monday (1 yes, 0 no) -0.0004 0.0009 0.00073 0.0002 0.0005 -0.003 -0.002
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.190*** -0.192*** -0.191*** -0.191*** -0.191*** -0.193*** -0.200***
Adjusted R (weighted) 0.109***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)
Fall (1 yes, 0 no)
Adjusted R (weighted)
America (1 yes, 0 no) -0.032** -0.0233 -0.023 -0.019 -0.020 0.009
Africa (1 yes, 0 no) -0.015 0.016 0.017 0.0068 -0.002 0.077
Asia (1 yes, 0 no) -0.031** -0.0179 -0.0169 -0.011 -0.013 -0.013
Oceania (1 yes, 0 no) -0.037 -0.023 -0.021 -0.027 -0.026 -0.0398
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.086*** -0.091*** -0.088***
Adjusted R (weighted)
log HDI -0.058 0.061
Adjusted R (weighted)
log LTO
Adjusted R

C. PLOS ONE
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no) -0.107*** -0.0936** -0.0972** -0.0865** -0.0865** -0.0897** -0.0878**
Weekend (1 yes, 0 no) -0.756*** -0.7513*** -0.7583*** -0.750*** -0.750*** -0.751*** -0.7498***
Adjusted R
Spring (1 yes, 0 no) -0.149*** -0.1437*** -0.1444*** -0.1128** -0.1127*** -0.1114*** -0.1084***
Summer (1 yes, 0 no) -0.042 -0.0526 -0.0550* -0.0235 -0.0235 -0.0232 -0.0239
Fall (1 yes, 0 no) -0.140*** -0.1257*** -0.1289*** -0.0975*** -0.0975*** -0.0946*** -0.0914***
Adjusted R
America (1 yes, 0 no) -0.0126 -0.0111 -0.0105 -0.0103 -0.0110 -0.024
Africa (1 yes, 0 no) -0.0246 0.0228 0.0239 0.0235 0.0939 0.0336
Asia (1 yes, 0 no) -0.0219 0.0250 0.0257 0.0256 0.0423 0.0448
Oceania (1 yes, 0 no)
Adjusted R
Christmas (1 yes, 0 no)
Adjusted R
log AUTHORS
Adjusted R
log HDI
Adjusted R
log LTO -0.0345
Adjusted R

E. Cell
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.1329*** -0.1368*** -0.1396*** -0.1376*** -0.1385*** -0.1386*** -0.1393***
Adjusted R (weighted) 0.045***
Spring (1 yes, 0 no) -0.0228 -0.025 -0.0236 -0.0088 -0.0074 -0.0073 -0.0087
Summer (1 yes, 0 no) -0.0475* -0.044* -0.0435* -0.0284 -0.029 -0.0292 -0.0296
Fall (1 yes, 0 no) -0.0295 -0.030 -0.0299 -0.0151 -0.015 -0.0151 -0.0172
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) N.A. N.A. N.A. N.A. N.A. N.A.
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no)
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.0312 -0.0306 -0.0287
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO -0.1766
Adjusted R

Note:
* Statistically significant at level 10%
** Statistically significant at level 5%
*** Statistically significant at level 1%
N.A.: Due to the lack of variability (no papers having PCA, Papers’ Corresponding Authors, from Africa) this factor was automatically removed from the model.

A.
Consolidated data set
Models
Variables/characteristics M1 (equation 3) M2 (equation 4) M3 (equation 5) M4 (equation 6) M5 (equation 7) M6 (equation 8) M7 (equation 9) M8 (equation 10) M9 (equation 11)
0 1 2 3 4 5 6 7 8 9
Intercept
Monday (1 yes, 0 no) 0.026*** 0.028*** 0.028*** 0.023*** 0.023*** 0.024*** 0.023***
Tuesday (1 yes, 0 no) 0.062*** 0.064*** 0.065*** 0.061*** 0.061*** 0.061*** 0.060***
Wednesday (1 yes, 0 no) 0.078*** 0.079*** 0.080*** 0.076*** 0.076*** 0.077*** 0.077***
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.584*** -0.583*** -0.583*** -0.586*** -0.586*** -0.587*** -0.586***
Adjusted R (weighted) 0.316***
Spring (1 yes, 0 no) -0.006 0.0004 0.000003 0.026*** 0.026*** 0.027*** 0.028***
Summer (1 yes, 0 no) -0.058*** -0.053*** -0.053*** -0.027*** -0.027*** -0.026*** -0.026***
Fall (1 yes, 0 no) -0.067*** -0.058*** -0.058*** -0.032*** -0.032*** -0.031*** -0.031***
Adjusted R (weighted)
America (1 yes, 0 no) -0.003 0.006 0.005 0.006 0.0061 0.0093
Africa (1 yes, 0 no) -0.067*** -0.019 -0.019 -0.019 -0.079*** -0.068**
Asia (1 yes, 0 no) -0.042*** 0.004 0.003 0.004 -0.01 -0.011
Oceania (1 yes, 0 no) -0.053*** -0.029** -0.03** -0.03** -0.025* -0.022
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS
Adjusted R (weighted)
log HDI -0.122*** -0.124***
Adjusted R (weighted)
log LTO
Adjusted R

B. Physica A: Statistical mechanics and its applications
Intercept
Monday (1 yes, 0 no) -0.019 -0.211 -0.020 -0.025 -0.024 -0.022 -0.024
Tuesday (1 yes, 0 no) -0.0104 -0.008 -0.007 -0.004 -0.004 -0.005 -0.004
Wednesday (1 yes, 0 no) -0.015 -0.018 -0.016 -0.019 -0.018 -0.013 -0.013
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.237*** -0.236*** -0.231*** -0.234*** -0.234*** -0.237*** -0.238***
Adjusted R (weighted) 0.131***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no) -0.041** -0.029* -0.029* -0.006 -0.006 -0.003 -0.002
Fall (1 yes, 0 no) -0.022 -0.018 -0.018 0.005 0.003 0.004 0.008
Adjusted R (weighted)
America (1 yes, 0 no) -0.015 -0.014 -0.015 -0.018 -0.017 -0.007
Africa (1 yes, 0 no) -0.032 0.008 0.007 0.0006 0.0049 0.006
Asia (1 yes, 0 no) -0.057*** -0.039*** -0.038*** -0.042*** -0.039** -0.042**
Oceania (1 yes, 0 no) -0.116** -0.118*** -0.124*** -0.128*** -0.128*** -0.130**
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO
Adjusted R

C. PLOS ONE
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.568*** -0.5654*** -0.563*** -0.5663*** -0.5662*** -0.5666*** -0.5661***
Adjusted R
Spring (1 yes, 0 no) -0.0189** -0.0102 -0.0105 0.0244*** 0.0244*** 0.0254*** 0.0257***
Summer (1 yes, 0 no) -0.078*** -0.0718*** -0.0718*** -0.03693*** -0.0369*** -0.0363*** -0.0361***
Fall (1 yes, 0 no) -0.084*** -0.0763*** -0.0761*** -0.0411*** -0.0411*** -0.0413*** -0.0409***
Adjusted R
America (1 yes, 0 no) -0.0182*** -0.0049 -0.0053 -0.0049 -0.0046 -0.0076
Africa (1 yes, 0 no) -0.0628*** -0.0212 -0.0117 -0.0124 -0.0439 -0.0549
Asia (1 yes, 0 no) -0.0677*** -0.0197 -0.0212*** -0.0213*** -0.0301*** -0.0269**
Oceania (1 yes, 0 no) -0.0427** -0.0154 -0.0166 -0.0162 -0.0134 -0.0180
Adjusted R
Christmas (1 yes, 0 no)
Adjusted R
log AUTHORS
Adjusted R
log HDI -0.1479 -0.1104
Adjusted R
log LTO -0.0079
Adjusted R

D. Nature
Intercept
Monday (1 yes, 0 no) -0.008 -0.0068 -0.0072 -0.003 -0.0012 -0.0015 -0.0017
Tuesday (1 yes, 0 no) -0.0168 -0.0158 -0.016 -0.0152 -0.013 -0.012 -0.0137
Wednesday (1 yes, 0 no) -0.039** -0.042*** -0.042*** -0.039** -0.037** -0.036** -0.036**
Thursday (1 yes, 0 no) -0.078*** -0.0766*** -0.0763*** -0.073*** -0.071*** -0.0708*** -0.0721***
Weekend (1 yes, 0 no) -0.166*** -0.167*** -0.167*** -0.165*** -0.164*** -0.164*** -0.163***
Adjusted R (weighted) 0.147***
Spring (1 yes, 0 no) -0.030** -0.043*** -0.042*** -0.029* 0.028* -0.027* -0.0269*
Summer (1 yes, 0 no) -0.008 -0.0168 -0.0166 -0.003 -0.002 -0.002 -0.0006
Fall (1 yes, 0 no) -0.034** -0.0298 -0.0299** -0.017 -0.017 -0.016 -0.015
Adjusted R (weighted)
America (1 yes, 0 no) -0.016 -0.013 -0.015 -0.014 -0.016 -0.068**
Africa (1 yes, 0 no) N.A. N.A. N.A. N.A. N.A. N.A.
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no)
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO -0.141*
Adjusted R

E. Cell
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.0604* -0.0592* -0.0616* -0.0615* -0.0622* -0.0623* -0.0626*
Adjusted R (weighted) 0.061***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)
Fall (1 yes, 0 no)
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) N.A. N.A. N.A. N.A. N.A. N.A.
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no) -0.0381 -0.0322 -0.0333 -0.0357 -0.036 -0.0063
Adjusted R (weighted)
Christmas (1 yes, 0 no) -0.014 -0.0145 -0.013 -0.0142
Adjusted R (weighted)
log AUTHORS
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO
Adjusted R

Note:
* Statistically significant at level 10%
** Statistically significant at level 5%
*** Statistically significant at level 1%
N.A.: Due to the lack of variability (no papers having PCA, Papers’ Corresponding Authors, from Africa) this factor was automatically removed from the model.

A.
Consolidated data set †
Models
Variables/characteristics M1 (equation 3) M2 (equation 4) M3 (equation 5) M4 (equation 6) M5 (equation 7) M6 (equation 8) M7 (equation 9) M8 (equation 10) M9 (equation 11)
0 1 2 3 4 5 6 7 8 9
Intercept
Monday (1 yes, 0 no) -0.049 -0.048 -0.048 -0.047 -0.047 -0.0472 -0.047
Tuesday (1 yes, 0 no) 0.044 0.045 0.0453 0.0454 0.0454 0.0459 0.046
Wednesday (1 yes, 0 no) 0.068 0.069 0.069 0.07 0.07 0.07 0.07
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.586 -0.585 -0.584 -0.586 -0.586 -0.586 -0.586
Adjusted R (weighted) 0.673***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no) -0.0066 -0.009 -0.0092 -0.0006 -0.0006 -0.0007 -0.0009
Fall (1 yes, 0 no) -0.0091 -0.004 -0.004 0.0039 0.0039 0.0038 0.0036
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) -0.021 -0.0123 -0.0126 -0.0126 -0.012 -0.016
Asia (1 yes, 0 no) -0.03 -0.003 -0.0033 -0.0033 -0.0019 0.0009
Oceania (1 yes, 0 no) -0.024 -0.005 -0.0052 -0.0053 -0.0054 -0.013
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.0003 -0.0004 -0.00002
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO -0.0187
Adjusted R

B. Physica A: Statistical mechanics and its applications
Intercept
Monday (1 yes, 0 no) -0.032* -0.033* -0.032* -0.034* -0.034* -0.035* -0.040**
Tuesday (1 yes, 0 no) -0.059*** -0.058*** -0.058*** -0.060*** -0.061*** -0.060*** -0.063***
Wednesday (1 yes, 0 no) -0.010 -0.008 -0.008 -0.008 -0.008 -0.009 -0.014
Thursday (1 yes, 0 no) -0.063*** -0.061*** -0.061*** -0.062*** -0.063*** -0.064*** -0.070***
Weekend (1 yes, 0 no) -0.232*** -0.230*** -0.230*** -0.233*** -0.234*** -0.235*** -0.241***
Adjusted R (weighted) 0.117***
Spring (1 yes, 0 no) -0.023 -0.021 -0.021 0.000 0.0001 0.0006 -0.0044
Summer (1 yes, 0 no) -0.010 -0.0047 -0.0046 0.017 0.018 0.019 0.016
Fall (1 yes, 0 no) -0.036** -0.0296 -0.0303* -0.007 -0.007 -0.007 -0.008
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no)
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no)
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.023 -0.020 -0.009
Adjusted R (weighted)
log HDI
Adjusted R (weighted)
log LTO -0.026
Adjusted R

C. PLOS ONE
Intercept
Monday (1 yes, 0 no) -0.065*** -0.065*** -0.065*** -0.064*** -0.064*** -0.064*** -0.064***
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.532*** -0.531*** -0.530*** -0.532*** -0.532*** -0.531*** -0.532***
Adjusted R
Spring (1 yes, 0 no)
Summer (1 yes, 0 no) -0.0045*** -0.0069*** -0.0068*** 0.0013 0.0113 0.0011 0.00099
Fall (1 yes, 0 no) -0.0080*** -0.0028** -0.0026** 0.0056*** 0.005*** 0.0056*** 0.0053***
Adjusted R
America (1 yes, 0 no)
Africa (1 yes, 0 no) -0.0324*** -0.0186*** -0.0189*** -0.0189*** -0.0155*** -0.0207***
Asia (1 yes, 0 no) -0.0486*** -0.0077*** -0.0073*** -0.0073*** -0.0058*** -0.0041***
Oceania (1 yes, 0 no) -0.039*** -0.0069*** -0.0065*** -0.006*** -0.00666*** -0.011***
Adjusted R
Christmas (1 yes, 0 no)
Adjusted R
log AUTHORS
Adjusted R
log HDI
Adjusted R
log LTO -0.0109***
Adjusted R

D. Nature
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no) -0.0049 -0.0051 -0.0049 -0.0052 -0.0052 -0.0045 -0.0041
Wednesday (1 yes, 0 no) -0.0237** -0.0238** -0.023** -0.0231** -0.023** -0.0229** -0.023**
Thursday (1 yes, 0 no) -0.0177* -0.0176* -0.0171* -0.0156* -0.0156* -0.0153* -0.0148
Weekend (1 yes, 0 no) -0.217*** -0.2174*** -0.2177*** -0.2165*** -0.2166*** -0.2172*** -0.216***
Adjusted R (weighted) 0.247***
Spring (1 yes, 0 no) -0.015* -0.0107 -0.0109 -0.0016 -0.0016 -0.0017 -0.0019
Summer (1 yes, 0 no) -0.006 0.0019 0.0008 0.0101 0.0101 0.0104 0.0102
Fall (1 yes, 0 no) -0.009 -0.0057 -0.0066 0.0028 0.0028 0.0024 0.0023
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) -0.0864 -0.0032 0.0014 0.0019 -0.030 -0.042
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no) -0.0143 -0.0003 -0.0008 -0.0008 0.002 -0.0210
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.0007 -0.0017 -0.0017
Adjusted R (weighted)
log HDI -0.245 -0.234
Adjusted R (weighted)
log LTO -0.052
Adjusted R

E. Cell
Intercept
Monday (1 yes, 0 no) -0.0089 -0.0146 -0.0126 -0.0128 -0.0131 -0.0108 -0.117
Tuesday (1 yes, 0 no) -0.0169 -0.0213 -0.0203 -0.0207 -0.0213 -0.0193 -0.0198
Wednesday (1 yes, 0 no) -0.0322 -0.0383 -0.0442* -0.0443* -0.0447* -0.0444* -0.0442*
Thursday (1 yes, 0 no) -0.0457* -0.0454* -0.0436 -0.0447* -0.045* -0.0426 -0.044
Weekend (1 yes, 0 no) -0.1971*** -0.2033*** -0.2041*** -0.2022*** -0.2021*** -0.2006*** -0.1996***
Adjusted R (weighted) 0.07***
Spring (1 yes, 0 no) -0.0517** -0.0602*** -0.0585*** -0.0514** -0.0511** -0.0513** -0.045**
Summer (1 yes, 0 no) -0.0374 -0.0462** -0.0464** -0.0393* -0.0395* -0.0389* -0.0369
Fall (1 yes, 0 no)
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) N.A. N.A. N.A. N.A. N.A. N.A.
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no)
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.0147 -0.0171 -0.0171
Adjusted R (weighted)
log HDI -0.8308 -0.6873
Adjusted R (weighted)
log LTO -0.2034
Adjusted R

Note:
* Statistically significant at level 10%
** Statistically significant at level 5%
*** Statistically significant at level 1%
† For the consolidated dataset for this timespan the regression coefficients’ standard errors could not be computed. Consequently, there is no information about their statistical significance.
N.A.: Due to the lack of variability (no papers having PCA, Papers’ Corresponding Authors, from Africa) this factor was automatically removed from the model.

A. Consolidated data set †
Models
Variables/characteristics M1 (equation 3) M2 (equation 4) M3 (equation 5) M4 (equation 6) M5 (equation 7) M6 (equation 8) M7 (equation 9) M8 (equation 10) M9 (equation 11)
0 1 2 3 4 5 6 7 8 9
Intercept
Monday (1 yes, 0 no) 0.087 0.087 0.088 0.09 0.09 0.091 0.090
Tuesday (1 yes, 0 no) 0.143 0.143 0.143 0.145 0.145 0.145 0.145
Wednesday (1 yes, 0 no) 0.143 0.143 0.143 0.143 0.143 0.143 0.142
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no) -0.467 -0.467 -0.467 -0.468 -0.468 -0.468 -0.469
Adjusted R (weighted) 0.72***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)
Fall (1 yes, 0 no)
Adjusted R (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no) -0.027 -0.005 -0.0046 -0.0047 -0.0088 -0.0103
Asia (1 yes, 0 no) -0.036 -0.0012 -0.0009 -0.0009 -0.0018 -0.00099
Oceania (1 yes, 0 no) -0.028 -0.008 -0.0071 -0.0073 -0.0067 -0.0095
Adjusted R (weighted)
Christmas (1 yes, 0 no)
Adjusted R (weighted)
log AUTHORS -0.001 -0.00102 -0.0024
Adjusted R (weighted)
log HDI -0.0087 -0.0179
Adjusted R (weighted)
log LTO -0.0052
Adjusted R

B.
Physica A: Statistical mechanics and its applications
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no): -0.016 -0.014 -0.013 -0.013 -0.013 -0.010 -0.011
Weekend (1 yes, 0 no): -0.151*** -0.0149*** -0.151*** -0.151*** -0.151*** -0.152*** -0.158***
Adjusted R² (weighted): 0.122***
Spring (1 yes, 0 no): -0.024 -0.0141 -0.0155 -0.0149 -0.014 -0.015 -0.014
Summer (1 yes, 0 no): -0.015 -0.007 -0.007 -0.007 -0.007 -0.006 -0.005
Fall (1 yes, 0 no)
Adjusted R² (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no): -0.031 -0.023 -0.023 -0.023 -0.019 -0.046
Asia (1 yes, 0 no): -0.001 0.012 0.012 0.012 0.016 0.012
Oceania (1 yes, 0 no): -0.027 -0.031 -0.031 -0.031 -0.031 -0.071
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS
Adjusted R² (weighted)
log HDI
Adjusted R² (weighted)
log LTO: -0.004
Adjusted R²

C. PLOS ONE †
Intercept
Monday (1 yes, 0 no)
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no)
Weekend (1 yes, 0 no): -0.382 -0.382 -0.381 -0.383 -0.383 -0.383 -0.384
Adjusted R²
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)
Fall (1 yes, 0 no)
Adjusted R²
America (1 yes, 0 no): -0.0033 -0.0011 -0.0011 -0.0011 -0.0011 -0.0023
Africa (1 yes, 0 no): -0.034 -0.007 -0.0067 -0.0067 -0.011 -0.011
Asia (1 yes, 0 no): -0.047 -0.0041 -0.0038 -0.0038 -0.0048 -0.0041
Oceania (1 yes, 0 no): -0.037 -0.0130 -0.0121 -0.0121 -0.011 -0.013
Adjusted R²
Christmas (1 yes, 0 no)
Adjusted R²
log AUTHORS
Adjusted R²
log HDI: -0.023 -0.0197
Adjusted R²
log LTO: -0.0033
Adjusted R²

D. Nature
Intercept
Monday (1 yes, 0 no): -0.0011 -0.0021 -0.0022 -0.0005 -0.0005 -0.0003 -0.0005
Tuesday (1 yes, 0 no)
Wednesday (1 yes, 0 no)
Thursday (1 yes, 0 no): -0.0372* -0.0369* -0.0364* -0.0361** -0.0361** -0.0364*** -0.0365***
Weekend (1 yes, 0 no): -0.468*** -0.1473*** -0.1484*** -0.1487*** -0.1487*** -0.149*** -0.149***
Adjusted R² (weighted): 0.137***
Spring (1 yes, 0 no): -0.0277** -0.0194* -0.0198* -0.0136 -0.0135 -0.0127 -0.0127
Summer (1 yes, 0 no): -0.0069 -0.0103 -0.01 -0.044 -0.004 -0.0034 -0.0031
Fall (1 yes, 0 no)
Adjusted R² (weighted)
America (1 yes, 0 no)
Africa (1 yes, 0 no)
Asia (1 yes, 0 no)
Oceania (1 yes, 0 no): -0.0006 0.0178 0.0192 0.0193 0.0172 0.0324
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS: -0.0004 -0.0002 -0.0013
Adjusted R² (weighted)
log HDI
Adjusted R² (weighted)
log LTO
Adjusted R²

E. Cell
Intercept
Monday (1 yes, 0 no): -0.0112 -0.0101 -0.012 -0.0146 -0.0147 -0.0137 -0.0144
Tuesday (1 yes, 0 no): -0.0151 -0.0167 -0.0173 -0.0144 -0.0144 -0.0136 -0.0142
Wednesday (1 yes, 0 no): -0.0533* -0.0504 -0.0529 -0.0605* -0.0606* -0.0602* -0.0617*
Thursday (1 yes, 0 no): -0.071 -0.0112 -0.0109 -0.0083 -0.0083 -0.008 -0.0093
Weekend (1 yes, 0 no): -0.1348*** -0.1323*** -0.1313 -0.1395*** -0.1395*** -0.1375*** -0.1399***
Adjusted R² (weighted): 0.039***
Spring (1 yes, 0 no)
Summer (1 yes, 0 no)
Fall (1 yes, 0 no)
Adjusted R² (weighted)
America (1 yes, 0 no): -0.0258 -0.0183 -0.0214 -0.0214 -0.0203 -0.0641
Africa (1 yes, 0 no): N.A. N.A. N.A. N.A. N.A. N.A.
Asia (1 yes, 0 no): -0.0022 0.0092 -0.0018 -0.0018 -0.0253 -0.0208
Oceania (1 yes, 0 no)
Adjusted R² (weighted)
Christmas (1 yes, 0 no)
Adjusted R² (weighted)
log AUTHORS
Adjusted R² (weighted)
log HDI: -0.531 -0.5196
Adjusted R² (weighted)
log LTO: -0.1139
Adjusted R²

Note:
* Statistically significant at the 10% level
** Statistically significant at the 5% level
*** Statistically significant at the 1% level
† For the consolidated and PLOS ONE datasets for this timespan, the standard errors of the regression coefficients could not be computed. Consequently, there is no information about their statistical significance.
N.A. – Due to the lack of variability (no papers having PCA – Papers’ Corresponding Authors – from Africa) this factor was automatically removed from the model.

SI. Kurtosis formula and its components

$$\mathrm{Kurtosis} = \frac{N(N+1)}{(N-1)(N-2)(N-3)} \left[ \frac{S_4}{V(x)^2} \right] - 3\,\frac{(N-1)^2}{(N-2)(N-3)}$$

where N is the number of cases, S_4 is the sum of the deviations from the mean raised to the fourth power, and V(x) is the unbiased estimate of the population variance:

$$S_4 = \sum (x - \bar{x})^4, \qquad V(x) = \frac{S_2}{N-1}$$

where S_2 is the sum of squared deviations from the mean:

$$S_2 = \sum (x - \bar{x})^2$$

For a normal distribution the value of the kurtosis indicator should be 0. A positive value indicates a leptokurtic distribution, while a negative value indicates a platykurtic distribution.

SI. Geographic descriptive information

All maps designed within our study rely on information for the whole timespan covered (2001-2016). Figure SI.3 shows at a glance that the United States of America, with slightly more than 46 thousand papers (25.8% of the total), is at the top, followed by China with more than 26 thousand papers (14.6% of the total), Germany with 10 600 papers (6% of the total), and the United Kingdom with around 10 100 papers (5.7% of the total).
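As a side note to the kurtosis formula in the SI section above, the following is a minimal Python sketch of that indicator. It assumes the formula is the usual bias-corrected sample excess kurtosis, i.e. that the fourth-power sum is divided by the squared variance and that the correction term carries (N-1) squared. The function name `sample_excess_kurtosis` and the example data are illustrative and are not taken from the study.

```python
def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (assumed reading of the SI formula)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values)  # sum of squared deviations
    s4 = sum((v - mean) ** 4 for v in values)  # sum of fourth-power deviations
    variance = s2 / (n - 1)                    # unbiased variance estimate V(x)
    if variance == 0:
        raise ValueError("kurtosis is undefined for constant data")
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * s4 / variance ** 2 - correction


# Hypothetical data: evenly spread values are flatter than a normal curve
# (negative result), while a single extreme value gives heavy tails (positive).
flat = sample_excess_kurtosis([1, 2, 3, 4, 5])
peaked = sample_excess_kurtosis([1, 1, 1, 1, 10])
print(flat, peaked)
```

Under this reading, a negative result corresponds to the platykurtic case and a positive result to the leptokurtic case described in the SI text.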
At the bottom of the ranking are more than a dozen territorially large countries/territories that record no paper in the dataset: Afghanistan, Antarctica (as expected, since it is not a country per se), Democratic Republic of Congo, Chad, Guyana, Kyrgyzstan, Papua New Guinea, Paraguay, Somalia, South Sudan, Suriname, Turkmenistan, Tajikistan, Western Sahara etc. There are two immediate potential explanations. The first is the lack of development of the countries listed in this group. The second is that, in our dataset, Nature and Cell are elite journals, while PLOS ONE and Physica A could be considered to be in the higher segment of the journal rankings. It is not our purpose here to rank the journals; the last statement relies on scientometric indicators such as the Impact Factor (IF) or the Article Influence Score (AIS).

The use of gross figures (number of papers) when comparing countries could be misleading due to the size effect (i.e. countries with large populations tend to rank higher and vice versa). Therefore, the papers population ratio (PPR) is used (figure SI.4) to better show the distribution of papers across countries. The leading group comprises (alphabetically): Australia, Denmark, Iceland, Israel, Netherlands, Norway, Singapore, Switzerland, and Sweden. The second group is mainly formed by two compact territories: North America (Canada and United States of America) and Western Europe (Austria, Belgium, Finland, France, Germany, Italy, Luxembourg, Portugal, Spain, and United Kingdom), apart from the Western European countries that belong to the first group. In addition, it includes Estonia and Slovenia, two Central and Eastern European countries, and two Pacific island ones: New Zealand and Taiwan.

Figure SI.1. Geoprocessing model for computing and displaying the total number of papers published by corresponding authors per population for each country

Figure SI.2. Geoprocessing model for the Localization Quotient indicator (LQ)

Figure SI.3. Distribution of the papers by PCA’s country of origin within the consolidated dataset (2001-2016)
Note: The consolidated dataset relies on papers from: Physica A, PLOS ONE, Nature and Cell. PCA= Papers’ Corresponding Authors; MAC=Macao (China); HKG=Hong Kong (China); SGP=Singapore; CYP=Cyprus; ISR=Israel; TWN=Taiwan and EU=Europe. Countries marked with shading lines record no information in our dataset.

Figure SI.4. Distribution of the PPR by PCA’s country of origin within the consolidated dataset (2001-2016)
Note: The consolidated dataset relies on papers from: Physica A, PLOS ONE, Nature and Cell. PPR=Papers Population Ratio (population in million inhabitants); PCA= Papers’ Corresponding Authors;