Gender disparity in the authorship of biomedical research publications during the COVID-19 pandemic
COVID-19 amplifies gender disparities in research
Goran Murić (a), Kristina Lerman (a, b), and Emilio Ferrara (a, b, c)
(a) USC Information Sciences Institute; (b) USC Department of Computer Science; (c) USC Annenberg School of Communication
This manuscript was compiled on June 12, 2020
Early evidence suggests that women, including female researchers, are disproportionately affected by the COVID-19 pandemic, with negative consequences for their productivity. Here, we test this hypothesis by analyzing the proportion of male and female researchers who publish scientific papers during the pandemic. We use data from biomedical preprint servers and Springer-Nature journals to show that the fraction of women publishing during the pandemic drops significantly across disciplines and research topics, after controlling for temporal trends. The impact is particularly pronounced for biomedical papers related to COVID-19 research. Further, by geocoding authors' affiliations, we show that gender disparities are exacerbated in poorer countries, even though these countries had less of a gender gap in research prior to the pandemic. Our results illustrate how exceptional events like a global pandemic can further amplify gender inequalities in research. Our work could inform fairer scientific evaluation practices, especially for early-career female researchers who may be disproportionately affected by the pandemic.
science of science | gender disparities | research evaluation
As of the date of this writing, the COVID-19 pandemic has claimed over 400,000 lives worldwide and disrupted almost all aspects of human society. Stay-at-home orders, lockdowns and school closures have affected scientists as well, especially those caring for children or elderly family members (1, 2). As a result, the productivity of female scientists appears to have decreased (3–5).
Early evidence suggests that the proportion of publications with female authors is lower during the pandemic (6, 7); furthermore, the proportion of female scientists publishing specifically about COVID-19 is almost 23% lower than expected (8, 9).
To investigate whether COVID-19 exacerbates the gender gap in scientific publishing, we collect data on 80,875 papers from bioRxiv (43,459 papers), medRxiv (5,080 papers) and selected high-impact Springer-Nature journals (32,348 papers). Using state-of-the-art gender inference techniques, we identify the most likely gender of the 445,633 authors. For publications that do not provide the location of the author's affiliated institution, we infer it using an ad hoc localization method. We build a set of statistical models to estimate the expected prevalence of female authors publishing during the COVID-19 pandemic and compare it to the observed data.
Our analysis focuses on 361,234 authors (228,449 male and 132,789 female) whose gender could be inferred with high accuracy. Even after correcting for temporal trends, we observe an average drop of 5% in the proportion of female authors during the COVID-19 pandemic. For research topics related directly to COVID-19, the proportion of female scientists in first author positions drops by 44%. Male authors are more likely to publish papers about COVID-19 as first authors on preprint servers. Overall, during the pandemic, scientists published at an increasing rate on preprint servers. On average, we observe 22% more publications than expected and a 28.7% increase in the number of authors (a 26% increase for female and a 29% increase for male authors).
Although this relative increase in productivity applies to both genders, male scientists experience it at higher rates than female scientists, further widening the gender gap.
When the data are disaggregated by country, gender disparities become even more apparent, reaching 40% in some cases. Furthermore, the widening of the gender gap during the pandemic is inversely correlated with a country's GDP per capita: large developing countries like India and Brazil exhibit the largest widening of the gap.
Results
The gender gap in research during the COVID-19 pandemic.
First, we establish the baselines, that is, the expected proportions of women who appear either as the first author or as an author regardless of authorship order. The expected proportions are calculated using the OLS model and the historical data from the year 2019 (see Materials and Methods). We then calculate the observed proportions of female authors who publish during the COVID-19 pandemic in 2020 and compare them to the baselines.
The aggregate results suggest that the proportion of female authors publishing on all topics as the first author has decreased by 4.[…]% (Fig. 1A; expected arithmetic mean: 0.37; observed arithmetic mean: 0.[…]). […]5% (expected arithmetic mean: 0.37; observed arithmetic mean: 0.[…]). […]4% (expected arithmetic mean: 0.34; observed arithmetic mean: 0.[…]). […] (Fig. 1B) during the pandemic (26% for bioRxiv and 363% for medRxiv), and the drop of publications in Springer-Nature journals (84%). Such a trend suggests that during the pandemic researchers are trying to make their results available as quickly and widely as possible, often circumventing the lengthy peer-review process. Despite the absolute increase in numbers, the fraction of women publishing on preprint servers drops significantly. That is particularly evident for COVID-19 related papers, where the relative drop in the proportion of female authors is 35% and 21% for medRxiv and bioRxiv, respectively. The trend for Springer-Nature journals is slightly different. We still observe a drop in the proportion of women as first authors across the disciplines. However, the fraction of women authors regardless of authorship order increases from 0.33 to 0.38, a relative increase of 14%. Note that only 0.3% of all papers from Springer-Nature journals deal with COVID-19 related topics, much less than the 2.3% in bioRxiv and 48% in medRxiv.
We see evidence that women are becoming underrepresented in COVID-19 research, especially in papers published on preprint servers. This confirms earlier suggestions that female first authors contribute less to COVID-19 studies than to research in other areas (9). Women remain underrepresented even though we observe an increased publishing rate for both genders during the pandemic (Fig. 1B).
[Figure 1. (A) "First author" and "Any author": proportion of female authors for All papers, COVID-19 and Non COVID-19. (B) "Papers" (All, medRxiv, bioRxiv, Springer) and "Authors" (All, Female, Male). (C) Percentage change by country (AU, BR, CA, CH, CN, DE, ES, FR, GB, IN, IT, JP, KR, NL, SE, US) for All papers, COVID-19 and Non COVID-19. (D) Model schematic: % of female authors over time, with f_Exp, f_Obs and the pandemic period.]
Fig. 1. A. Comparison of the expected and observed proportions of female authors who publish during the COVID-19 pandemic. Green bars represent the expected proportion of female authors, estimated by the OLS model from the historical data from 2019. Orange bars are the observed proportion of female authors who publish during the COVID-19 pandemic. The papers are divided by topic into three groups: 1) all papers in the dataset, 2) papers that deal directly with COVID-19 and related topics, 3) papers that are not about COVID-19 or related topics. B. Number of papers and authors during the COVID-19 pandemic. Green bars are the expected numbers and orange bars are the actual numbers. We observe a high influx of papers on preprint servers and a drop in submissions to peer-reviewed journals, which translates into an increased number of authors. C. Percentage drop in the proportion of female authors during the pandemic across countries. Orange points mark a percentage decrease in the fraction of female authors; green points mark an increase. D. Statistical model. Illustration of the OLS model used to calculate the expected numbers and proportions.
All authors conceived and designed the study. GM collected and analyzed the data. All authors wrote and revised the manuscript.
All the code and data necessary to replicate the results are available on GitHub at: https://github.com/gmuric/GenderGapCovid
To whom correspondence should be addressed: Emilio Ferrara.
Country-level Analysis.
The global pandemic has touched almost every nation on the planet. Countries, however, responded differently to contain the spread of the disease. The variability of the measures and their timing, combined with differences in cultural norms and outbreak severity, has had a variable impact on scientific communities across the world. Country-level analysis better reveals global trends, as the aggregate data can be skewed by countries with a disproportionately large number of publications, such as the US, with almost 29% of all authors in the dataset. Additionally, it can reveal regional, political and cultural differences between nations.
We identify the most likely country of each author based on their affiliation (see Materials and Methods) and measure the difference between the expected and observed proportions of female authors during the pandemic. Figure 1C shows the pandemic gender gap across countries. The values represent the percentage difference between the expected and observed fractions of female authors publishing in bioRxiv, medRxiv and selected Springer-Nature journals between February and April 2020. Points to the left of the mid-line (orange) represent countries with a lower than expected fraction of female authors, and points to the right of the mid-line (green) represent an increase in the fraction of female authors. Papers dealing explicitly with COVID-19 (middle panel) show a greater gender gap than papers on other research topics (right panel). In Italy, for example, the relative drop in the proportion of female authors is 40% (expected arithmetic mean: 0.43; observed arithmetic mean: 0.[…]). […] affiliated with Italian institutions are publishing disproportionately more than their female colleagues about COVID-19. A similar result applies to Australia, the United Kingdom, France and Germany. The opposite is true for Switzerland and Japan, where the proportion of women publishing about COVID-19 increases by 5.6% and 15%, respectively. Missing points indicate that there were not enough data points during the pandemic to calculate the observed mean.
The gender gap for non-COVID-19 research (right panel) exists during the pandemic, but it is smaller than for COVID-19 research. Again, we observe stark contrasts between countries, with the proportion of female authors (regardless of author order) publishing during the pandemic decreasing in Australia, Brazil and Sweden, and the gender gap shrinking in South Korea and Japan.
[Figure 2. (Upper) % of female authors vs. GDP per capita (10K to 60K). (Lower) Percentage drop (-10% to 40%) vs. GDP per capita. Labeled countries: Brazil, China, India, Italy, Japan, S. Korea, Sweden, US; groups: Europe and Americas, Asia.]
Fig. 2. Gender disparity in research and the GDP. (Upper) The proportion of women active in research is higher in countries with lower per capita GDP. (Lower) The proportion of female authors of research articles decreased more than expected in countries with lower per capita GDP.
Gender disparities in research are strongly associated with a country's wealth (10). Figure 2 shows that wealthier countries, with higher per capita gross domestic product (GDP), have proportionally fewer women in research, with Asian countries having consistently fewer women researchers. In addition, wealthier countries show a smaller pandemic-related drop in women's participation in research than poorer countries, with wealthier Asian countries experiencing an increase in the proportion of active women researchers. This suggests that women experience bigger life disruptions in poorer countries, which affect their productivity.
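The association between wealth and the pandemic gender gap can be quantified in a few lines. The paper reports an inverse correlation but does not state which statistic it used, so the sketch below assumes a Pearson correlation plus an OLS slope, computed with NumPy over per-country arrays; the function name `gdp_association` is hypothetical.

```python
import numpy as np


def gdp_association(gdp_per_capita, pct_drop):
    """Return (Pearson r, OLS slope) between GDP per capita and the percentage drop.

    A negative r or slope means poorer countries saw larger drops in the
    fraction of female authors. Inputs are per-country values in matching order.
    """
    x = np.asarray(gdp_per_capita, dtype=float)
    y = np.asarray(pct_drop, dtype=float)
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need at least three countries with matching values")
    r = np.corrcoef(x, y)[0, 1]                   # Pearson correlation coefficient
    slope, _intercept = np.polyfit(x, y, deg=1)   # change in drop per unit of GDP
    return float(r), float(slope)
```

With only about sixteen countries, a rank-based correlation would be a reasonable, more outlier-robust alternative; Pearson is used here only because it is the simplest choice.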
Materials and Methods
Data.
The data on published papers are collected from three separate sources: (i) bioRxiv (43K papers and 232K authors), provided by Rxivist, an API for bioRxiv publications (11); (ii) medRxiv (5K papers and 35K authors), scraped directly from medrxiv.org; (iii) Springer-Nature (32K papers and 210K authors), data from 70 journals with an H-index larger than 12, collected using the Springer-Nature OpenAccess API. For each source, we collect the metadata of all papers published between January 1st […] and […] 3rd […]; the data from medRxiv is from June 25th […]. For bioRxiv and medRxiv, we keep the date of publishing. For Springer-Nature journals, we keep the date of manuscript submission. We additionally store the title and the abstract of each paper. For each author, we preserve the name, affiliation and authorship order. Additionally, we use socio-economic data on countries, including their respective GDP per capita, provided by Our World in Data (ourworldindata.org). As a heuristic to identify the locations of authors' institutions, we use data provided by GRID (grid.ac).
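The collection rules above can be captured in a small record type. This is a sketch under our own assumptions: the field names and the `reference_date` and `keep` helpers are hypothetical, the collection window is passed in as parameters rather than hard-coded, and the code does not call Rxivist or the Springer-Nature API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

PREPRINT_SOURCES = {"bioRxiv", "medRxiv"}


@dataclass
class Author:
    name: str
    affiliation: str
    order: int                              # 1 = first author


@dataclass
class Paper:
    source: str                             # "bioRxiv", "medRxiv" or "Springer-Nature"
    title: str
    abstract: str
    published: Optional[date] = None        # posting date on a preprint server
    submitted: Optional[date] = None        # manuscript submission date (journals)
    journal_h_index: Optional[int] = None
    authors: List[Author] = field(default_factory=list)


def reference_date(paper: Paper) -> Optional[date]:
    """Preprints are dated by publication, journal papers by submission."""
    return paper.published if paper.source in PREPRINT_SOURCES else paper.submitted


def keep(paper: Paper, start: date, end: date, min_h_index: int = 12) -> bool:
    """True if the reference date is in [start, end] and, for journals, H-index > min_h_index."""
    d = reference_date(paper)
    if d is None or not (start <= d <= end):
        return False
    if paper.source in PREPRINT_SOURCES:
        return True
    return paper.journal_h_index is not None and paper.journal_h_index > min_h_index
```

Keeping both dates on the record, and choosing between them in one place, makes it explicit that preprints and journal papers are placed on the timeline by different events.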
Model.
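A minimal NumPy sketch of the baseline comparison in this subsection, assuming the weekly fractions of female authors are already computed: it fits $f = bt + c$ by OLS on pre-pandemic weeks, averages the predictions over the pandemic weeks to obtain f_Exp, and averages the observations to obtain f_Obs. Two details are our assumptions rather than the paper's: the prediction error uses the textbook standard error of the fitted mean, and the summed error is rescaled by f_Exp so that it is on the same percent scale as the drop. The function name `expected_vs_observed` is hypothetical.

```python
import numpy as np


def expected_vs_observed(hist_weeks, hist_frac, pand_weeks, pand_frac):
    """Compare the OLS baseline f = b*t + c with observed pandemic fractions.

    hist_*: weekly data before the pandemic (used to fit the baseline, >= 3 weeks).
    pand_*: weekly data during the pandemic (used for the comparison).
    """
    t = np.asarray(hist_weeks, dtype=float)
    f = np.asarray(hist_frac, dtype=float)
    tp = np.asarray(pand_weeks, dtype=float)
    fp = np.asarray(pand_frac, dtype=float)

    # OLS fit: np.polyfit with deg=1 returns (slope b, intercept c).
    b, c = np.polyfit(t, f, deg=1)
    f_hat = b * tp + c                      # predicted values for the pandemic weeks
    f_exp = f_hat.mean()                    # f_Exp = sum(f_hat) / n
    f_obs = fp.mean()                       # f_Obs = sum(f_true) / n

    # Standard error of the fitted mean at each pandemic week, then averaged.
    m = t.size
    resid = f - (b * t + c)
    s = np.sqrt(resid @ resid / (m - 2))    # residual standard deviation
    sxx = ((t - t.mean()) ** 2).sum()
    se_pred = s * np.sqrt(1.0 / m + (tp - t.mean()) ** 2 / sxx)
    err_exp = se_pred.mean()

    # Standard error of the observed mean: SE = sigma / sqrt(n).
    err_obs = fp.std(ddof=1) / np.sqrt(fp.size)

    # Relative drop in percent (positive = fewer women than expected) and
    # the summed error, rescaled by f_Exp (our assumption).
    drop = 100.0 * (f_exp - f_obs) / f_exp
    drop_err = 100.0 * (err_exp + err_obs) / f_exp
    return f_exp, f_obs, drop, drop_err
```

For example, with a perfectly linear baseline rising 0.001 per week from 0.30 over ten weeks, and four pandemic weeks observed at 0.28, 0.29, 0.30 and 0.29, f_Exp is 0.3115, f_Obs is 0.29 and the relative drop is about 6.9%.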
To measure the discrepancy between the expected and observed proportions of female authors, we first establish a baseline model. Using historical data before January 31st […], we fit an OLS model $f = bt + c$, where $f$, the proportion of female authors, serves as the dependent variable, $t$ is time measured in weeks, and $b$ and $c$ are the slope and the intercept. We train a separate model for each level of disaggregation (country, publisher, etc.). The model is illustrated in Fig. 1D. From the model, we derive the expected fraction $f_{Exp} = \sum \hat{f} / n$, that is, the mean of all predicted values for the observed period, and the observed fraction $f_{Obs} = \sum f_{true} / n$. The error of the predicted value is the mean standard error of the prediction. The error of the observed value is calculated as the standard error of the mean, $SE = \sigma / \sqrt{n}$. The errors for the percentage drops in Fig. 1C are calculated as the sum of the errors of the predicted and observed values.
Identifying authors' gender and location.
To infer an author's gender from their name, we use a state-of-the-art tool, the genderize.io API (12). Given an input name, the model returns a gender and a confidence score between 0.5 and 1. The uncertainty is greater for Asian names, which are often not gender-specific (13). We filter out all authors whose confidence score is lower than 0.8. Overall, 19% of names yield scores below this threshold, with Chinese and Korean names topping the ranking at 54% and 41%, respectively. To identify authors' locations, we first look for a toponym in the author's affiliation. If there is no toponym, we query the GRID.ac database, find the institution with the most similar name and assign that institution's location to the author.
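The filtering and localization steps can be sketched with the standard library alone. This is an illustration under stated assumptions: gender inferences arrive as (name, gender, confidence) triples from whatever service produced them (no genderize.io call is made), the toponym and institution dictionaries are hypothetical stand-ins for a gazetteer and for GRID records, and `difflib.SequenceMatcher` is our choice of similarity measure, since the paper does not name one.

```python
import re
from difflib import SequenceMatcher

CONFIDENCE_THRESHOLD = 0.8   # authors scoring below this are dropped


def filter_confident(inferences, threshold=CONFIDENCE_THRESHOLD):
    """Keep (name, gender) for authors whose gender confidence meets the threshold.

    `inferences` is an iterable of (name, gender, confidence) triples, with
    confidence in [0.5, 1]; gender may be None when no guess was returned.
    """
    return [(name, gender) for name, gender, conf in inferences
            if gender is not None and conf >= threshold]


def locate(affiliation, toponyms, institutions):
    """Return a country code for an affiliation string, or None.

    toponyms:     {place name: country code}, matched first as whole words.
    institutions: {institution name: country code}, a hypothetical stand-in for
                  GRID records, used as a fallback via the most similar name.
    """
    text = affiliation.lower()
    for place, country in toponyms.items():
        if re.search(r"\b" + re.escape(place.lower()) + r"\b", text):
            return country
    if not institutions:
        return None
    best = max(institutions,
               key=lambda name: SequenceMatcher(None, text, name.lower()).ratio())
    return institutions[best]
```

Matching toponyms as whole words avoids false hits such as a short place name inside a longer word. A production version would likely also require a minimum similarity before accepting a fallback match; this sketch always returns the closest institution.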
Identifying COVID-19 papers.
The papers that deal specifically with COVID-19 and similar topics are identified by a set of keywords that appear in their title or abstract.
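Keyword matching of this kind reduces to one case-insensitive regular expression applied to the title and the abstract. The keyword list below is hypothetical, because the set used in the paper is not given here; only the matching logic is meant to carry over.

```python
import re

# Hypothetical keyword list: the paper does not publish the exact set it used.
COVID_KEYWORDS = ("covid-19", "covid19", "sars-cov-2", "2019-ncov", "coronavirus")

# One case-insensitive pattern; re.escape keeps "-" and digits literal.
_COVID_RE = re.compile("|".join(re.escape(k) for k in COVID_KEYWORDS), re.IGNORECASE)


def is_covid_paper(title: str, abstract: str) -> bool:
    """True if any keyword appears in the title or the abstract."""
    return _COVID_RE.search(title) is not None or _COVID_RE.search(abstract) is not None
```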
ACKNOWLEDGMENTS.
DARPA support via W911NF1920271.
1. KR Myers, et al., Quantifying the immediate effects of the covid-19 pandemic on scientists. arXiv preprint arXiv:2005.11358 (2020).
2. M Kowal, et al., Who Is the Most Stressed During COVID-19 Isolation? Data From 27 Countries. PsyArXiv (2020).
3. A Minello, The pandemic and the female academic. Nature (2020).
4. G Viglione, Are women publishing less during the pandemic? Nature, 365–366 (2020).
5. N Amano Patino, F Elisa, C Giannitsarou, Z Hasna, The Unequal Effects of Covid-19 on Economists' Research Productivity. Camb. Work. Pap. Econ. (2020).
6. N Fuchs-Schundeln, Gender structure of paper submissions at the Review of Economic Studies during COVID-19: First evidence. (2020).
7. A Fazackerley, Women's research plummets during lockdown - but articles from men increase (2020).
8. JP Andersen, MW Nielsen, NL Simone, RE Lewiss, R Jagsi, Meta-research: Is covid-19 amplifying the authorship gender gap in the medical literature? (2020).
9. P Vincent-Lamarre, CR Sugimoto, V Larivière, The decline of women's research production during the coronavirus pandemic (2020).
10. L Holman, D Stuart-Fox, CE Hauser, The gender gap in science: How long until women are equally represented? PLOS Biol., e2004956 (2018).
11. RJ Abdill, R Blekhman, Tracking the popularity and outcomes of all bioRxiv preprints. eLife (2019).
12. L Santamaría, H Mihaljević, Comparison and benchmark of name-to-gender inference services. PeerJ Comput. Sci., e156 (2018).
13. J Huang, AJ Gates, R Sinatra, AL Barabási, Historical comparison of gender inequality in scientific careers across countries and disciplines. Proc. Natl. Acad. Sci. (2020).