Historical comparison of gender inequality in scientific careers across countries and disciplines
Junming Huang, Alexander J. Gates, Roberta Sinatra, Albert-Laszlo Barabasi
Junming Huang∗, Alexander J. Gates∗, Roberta Sinatra, Albert-László Barabási†

Center for Complex Network Research, Northeastern University, Boston, Massachusetts 02115, USA
CompleX Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Paul and Marcia Wythes Center on Contemporary China, Princeton University, Princeton, New Jersey 08540, USA
Department of Computer Science, IT University of Copenhagen, Copenhagen 2300, Denmark
Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA

∗ These authors contributed equally to this work.
† To whom correspondence should be addressed: [email protected]

arXiv [cs.DL], Jul

There is extensive, yet fragmented, evidence of gender differences in academia suggesting that women are under-represented in most scientific disciplines, publish fewer articles throughout a career, and their work acquires fewer citations. Here, we offer a comprehensive picture of longitudinal gender discrepancies in performance through a bibliometric analysis of academic careers by reconstructing the complete publication history of over 1.5 million gender-identified authors whose publishing career ended between 1955 and 2010, covering 83 countries and 13 disciplines. We find that, paradoxically, the increase of participation of women in science over the past 60 years was accompanied by an increase of gender differences in both productivity and impact. Most surprisingly though, we uncover two gender invariants, finding that men and women publish at a comparable annual rate and have equivalent career-wise impact for the same size body of work.
Finally, we demonstrate that differences in dropout rates and career length explain a large portion of the reported career-wise differences in productivity and impact. This comprehensive picture of gender inequality in academia can help rephrase the conversation around the sustainability of women’s careers in academia, with important consequences for institutions and policy makers.

A ). Yet, these aggregate numbers hide considerable disciplinary differences, as the fraction of women is as low as 15% in math, physics and computer science, and reaches 33% in psychology (Fig. 1B). We also observe significant variations by country, finding that the proportion of female scientists can be as low as 28% in Germany, and reaches parity with 50% in Russia (Fig. 1C).

The low proportion of women actively publishing in STEM captures only one aspect of gender inequality. Equally important are the persistent productivity and impact differences between the genders (Fig. 1D). We find that while on average male scientists publish 13.2 papers during their career, female authors publish only 9.6, resulting in a 27% gender gap in total productivity (Fig. 2A). The difference is particularly pronounced among productive authors, as male authors in the top 20% productivity bracket publish 37% more papers than female authors (Fig. 2A). Interestingly, the gender disparity disappears for median productive authors (middle 20%), and reverses for the authors in the bottom 20%. The gender gap in total productivity persists for all disciplines and all countries, with the exceptions of Cuba and Serbia (Fig. 2B,C). We also observe a large gender gap in total productivity for the highest ranked affiliations (Fig. 2D, determined from the 2019 Times Higher Education World University Rankings, SI S2.4).

We measure the total impact during an academic career by the number of citations accrued 10 years after publication (c10) by each paper published during a career (Fig.
1D), after removing self-citations and re-scaling to account for citation inflation [31–33] (SI S2.6). We find that male scientists receive 30% more citations for their publications than female scientists (Fig. 2F). Once again, the total impact difference is the largest for high impact authors, and reverses for median and low impact authors: male authors in the top 20% in career impact receive 36% more citations than their female counterparts. The disparity in impact persists in all countries and disciplines, Iran and Serbia serving as the only exceptions (Fig. 2G,H), and can be found, to a lesser extent, across all affiliations regardless of affiliation rank (Fig. 2I).

Paradoxically, the gradual increase in the fraction of women in science [5] (Fig. 1A) is accompanied by a steady increase in both the productivity and impact gender gaps (Fig. 2E,J). The gender gap in total productivity rose from near 10% in the 1950s to a strong bias towards male productivity (35% gap) in the 2000s. The gender gap in total impact actually switches from slightly more female impact in the 1950s to a 34% gap favoring male authors in the same time frame. These observations disrupt the conventional wisdom that academia can achieve gender equality simply by increasing the number of participating female authors.

In summary, despite recent attempts to level the playing field, men continue to outnumber women 2 to 1 in the scientific workforce, and, on average, have more productive careers and accumulate more impact. These results confirm, using a unified methodology spanning most of science, previous observations in specific disciplines and countries [2, 9, 11, 12, 16, 34–37], and support in a quantitative manner the perception that gender differences in academia are a universal phenomenon persisting in every STEM discipline and in most geographic regions. Moreover, we find that the gender gaps in productivity and impact have increased significantly over the last 60 years.
The universality of the phenomenon prompts us to ask: What characteristics of academic careers drive the observed gender-based differences in total productivity and impact?

As total productivity and impact over a career represent a convolution of annual productivity and career length, to identify the roots of the gender gap, we must separate these two factors. Traditionally, the difficulty of reconstructing full careers has limited the study of annual productivity to a small subset of authors, or to career patterns observable during a fixed time frame [38–45]. Access to the full career data allows us to decompose each author’s total productivity into his/her annual productivity and career length, defined as the time span between a scientist’s first and last publication (Fig. 1D, and SI, S2.1). We find that the annual productivity differences between men and women are negligible: female authors publish on average 1.33 papers per year, while male authors publish on average 1.32, a difference that, while statistically significant, is considerably smaller than other gender disparities (0.9%, p-value < −, Fig. 2K). This result is observed in all countries and disciplines (Fig. 2L,M) and we replicated it in all three datasets (SI, S6). The gender difference in annual productivity is small even among the most productive authors (4% for the top 20%), and is reversed for authors of median and low productivity.

The average annual productivity of scientists has slightly decreased over time, yet there is consistently no fundamental difference between the genders (Fig. 2O). In other words, when it comes to the number of publications per year, female and male authors are largely indistinguishable, representing the first gender invariant quantity in performance metrics.
As we show next, this invariant, our key result, helps us probe the possible roots of the observed gender gaps.

The comparable annual productivity of male and female scientists suggests that the large gender gap in total career productivity is determined by differences in career length. To test if this is the case, we measured the career length (time between first and last publication, Fig. 1D) of each scientist in the database, finding that, on average, male authors reach an academic age of 11.0 years before ceasing to publish, while the average terminal academic age of female authors is only 9.3 years (Fig. 2P). This gap persists when authors are grouped by either discipline, country, or affiliation (Fig. 2Q,R,S), and has been increasing over the past 60 years (Fig. 2T). Taken together, Fig. 2K,T suggest that a significant fraction of the variation in total productivity is rooted in variations in career lengths. This conclusion is supported by a strong correlation between the career length gap and the career-wise productivity gap when we subdivide scientists by discipline (Fig. 3A, Pearson correlation 0.80) and country (Fig. 3B, Pearson correlation 0.56). For example, the gender gap in career length is smallest in applied physics (2.5%), as is the gender gap in total productivity (7.8%). In contrast, in biology and chemistry, men have 19.2% longer careers on average, resulting in a total productivity gender gap that exceeds 35.1%.

Given the largely indistinguishable annual productivity patterns, we next ask how much of the total productivity and the total impact gender gaps observed above (Fig. 2A,F) could be explained by the variation in career length. For this, we perform a matching experiment designed to eliminate the gender gaps in career length. In the first population, for each female scientist, we select a male scientist from the same country and discipline, and whose primary affiliation is ranked approximately the same (Fig. 3C, and SI S4.2).
In this matched population, the gender gap in total productivity increases significantly, from 27.4% to 47.0%, and the gender gap in total impact increases from 30.5% to 50.7%. This increase in both the total career statistics and gender gaps occurs because access to country and affiliation information is biased towards more recent and senior scientists. We then constructed a second matched population, as a subset of the first, in which each female scientist is matched to a male scientist from the same country, discipline, affiliation rank, and with exactly the same career length. In these career length matched samples the gender gap in total productivity reduces from 47.0% to 12.4% (Fig. 3D). Furthermore, the gender gap in the total impact is also reduced from 50.7% to 13.1% (Fig. 3E). By matching pairs of authors based on observable confounding variables, such as their country, discipline, and affiliation rank, we mitigate the influence of these variables on the gender gaps. While matching cannot rule out that gender differences are influenced by unmatched variables that are unobserved here, the significant decrease in the productivity and impact gender gaps when we control for career length suggests that publication career length is a significant correlate of gender differences in academia.

Thus far, our analysis has correlated the career-wise gender gaps to systematic differences in career lengths, prompting us to ask: Do the persistent gender differences in the cessation of academic publishing drive the career-wise gender gaps?
While a full assessment of causality would require us to conduct a controlled intervention on the academic population (a wholly unfeasible scenario), our matching experiments suggest a counterfactual experiment to identify, at the population level, the average causal effect of shorter careers on total productivity and impact.

To address the factors governing the end of a publishing career, we calculated the dropout rate, defined as the yearly fraction of authors in the population who have just published their last paper [41, 46]. We find that on average 9.0% of active male scientists stop publishing each year, while the yearly dropout rate for women is nearly 10.8% (Fig. 4A). In other words, each year women scientists have a 19.5% higher risk to leave academia than male scientists, giving male authors a major cumulative advantage over time. Moreover, this observation demonstrates that the dropout gap is not limited to junior researchers, but persists at similar rates throughout scientific careers.

The average causal effect of this differential attrition is demonstrated through a counterfactual experiment in which we shorten the careers of male authors to simulate dropout rates matching their female counterparts at the same career stage (Fig. 4C,D, and SI S4.5). We find that under similar dropout rates, the differences in total productivity and total impact reduce by roughly two thirds, namely from 27.4% to 9.0% and from 30.5% to 12.1% respectively. This result, combined with our previous matching experiment (Fig. 3
D,E), suggests that the difference in dropout rates is a key factor in the observed total productivity and impact differences, accounting for about 67% of the productivity and impact gaps. Yet, the differential dropout rates do not account for the whole effect, suggesting that auxiliary disruptive effects, from perception of talent to resource allocation [15, 21], may also play a role.

The reduction of the gender gaps in both total productivity and total impact by similar amounts suggests that total impact, being the summation over individual articles, may be primarily dependent on productivity [15]. To test this hypothesis, we conducted a final matching experiment in which we selected a male author from the same country, discipline, approximately the same affiliation rank, and with exactly the same number of total publications as each female author (SI S4.3). In these matched samples the gender gap in the total impact is completely eliminated, dropping from 50.7% in favor of male authors to 1.9% in favor of female authors (Fig. 4E). This reveals a second gender invariant quantity: there is no discernible difference in impact between male and female scientists for the same size body of work. This second gender invariant reinforces our main finding that career length differences drive the total productivity gap, which in turn drives the impact gender gap in academia. Interestingly, controlling for productivity similarly flips the gender gap in the total number of collaborators throughout a career (SI S4.4).

Our ability to reconstruct the full careers of scientists allowed us to confirm the differences in total productivity and impact between female and male scientists across disciplines and countries since 1955. We showed that the gradual increase in the fraction of women in STEM was accompanied by an increase in the gender disparities in productivity and impact.
It is particularly troubling that the gender gap is the most pronounced among the highly productive authors, those that train the new generations of scientists and serve as role models for them. Yet, we also found two gender invariants, revealing that active female and male scientists have largely indistinguishable yearly performance, and receive a comparable number of citations for the same size body of work. These gender-invariant quantities allowed us to show that a large portion of the observed gender gaps are rooted in gender-specific dropout rates and the subsequent gender gaps in career length and total productivity. This finding suggests that we must rephrase the conversation about gender inequality around the sustainability of women’s careers in academia, with important administrative and policy implications [16, 36, 47–52].

It is often argued that in order to reduce the gender gap, the scientific community must make efforts to nurture junior female researchers. We find, however, that the academic system is losing women at a higher rate at every stage of their careers, suggesting that focusing on junior scientists alone may not be sufficient to reduce the observed career-wise gender imbalance. The cumulative impact of this career-wide effect dramatically increases the gender disparity for senior mentors in academia, perpetuating the cycle of lower retention and advancement of female faculty [10, 52–54].

Our focus on closed careers limited our study to careers that ended by 2010, eliminating currently active careers. Therefore, further work is needed to detect the impact of recent efforts by many institutions and funding agencies to support the participation of women and minorities [40, 55]. Our analysis of all careers and the factors that dominate the gender gap could offer a baseline for such experimental studies in the future.
At the same time, our work suggests the importance of temporal controls for studying academic careers, and in particular, gender inequality in academia.

It is important to emphasize that the end of a publishing career does not always imply an end of an academic career; authors who stopped publishing often retain teaching or administrative duties, or conduct productive research in industry or governmental positions, with less pressure to communicate their findings through research publications. Scientific publications represent only one of the possible academic outputs; in some academic disciplines books and patents are equally important, and all three of our data sources (WoS, MAG and DBLP) tend to over-represent STEM and English language publications [56], thereby possibly biasing our analysis. Furthermore, our bibliometric approach can draw deep insight into the large-scale statistical patterns reflecting gender differences, yet we cannot observe and test potential variation in the organizational context and resources available to individual researchers [13, 57]. However, our results do suggest important consequences for the organizational structures within academic departments. Namely, we find that a key component of the gender gaps in productivity and impact may not be rooted in gender-specific processes through which academics conduct research and contribute publications, but in the gender-specific sustainability of that effort over the course of an entire academic career.

References
[1] T. J. Ley, B. H. Hamilton,
Science, 1472–1474 (2008).
[2] V. Larivière, C. Ni, Y. Gingras, B. Cronin, C. R. Sugimoto, Nature, 211–213 (2013).
[3] H. Shen, Nature News, 22 (2013).
[4] A. E. Lincoln, S. Pincus, J. B. Koster, P. S. Leboy, Social Studies of Science, 307 (2012).
[5] L. Holman, D. Stuart-Fox, C. E. Hauser, PLOS Biology, e2004956 (2018).
[6] K. Zippel, Women in Global Science: Advancing Academic Careers through International Collaboration (Stanford University Press, Stanford, 2017).
[7] Gender in the global research landscape, Tech. rep., Elsevier (2017).
[8] National Academy of Sciences, National Academy of Engineering, Institute of Medicine, Beyond Bias and Barriers: Fulfilling the Potential of Women in Academic Science and Engineering (The National Academies Press, Washington, DC, 2007).
[9] J. R. Cole, H. Zuckerman, Advances in Motivation and Achievements, 217–256 (1984).
[10] J. S. Long, Social Forces, 1297–1316 (1990).
[11] J. S. Long, Social Forces, 159–178 (1992).
[12] Y. Xie, K. A. Shauman, American Sociological Review, 847–870 (1998).
[13] M. F. Fox, K. Whittington, M. Linkova, Handbook of Science and Technology Studies (MIT Press, Cambridge, Mass., 2017).
[14] G. Abramo, C. A. D’Angelo, A. Caprasecca, Scientometrics, 517–539 (2009).
[15] V. Larivière, E. Vignola-Gagné, C. Villeneuve, P. Gélinas, Y. Gingras, Scientometrics, 483–498 (2011).
[16] Y. Xie, K. A. Shauman, Women in Science: Career Processes and Outcomes (Harvard University Press, 2003).
[17] P. L. Carr, Annals of Internal Medicine, 532 (1998).
[18] S. Stack, Research in Higher Education, 891 (2004).
[19] M. F. Fox, Social Studies of Science, 131 (2005).
[20] E. Z. Cameron, A. M. White, M. E. Gray, BioScience, 245–252 (2016).
[21] J. Duch, et al., PLOS ONE (2012).
[22] R. M. Borsuk, et al., BioScience, 985 (2009).
[23] M. Jadidi, F. Karimi, H. Lietz, C. Wagner, Advances in Complex Systems p. 1750011 (2017).
[24] K. Uhly, L. Visser, K. Zippel, Studies in Higher Education p. 123 (2015).
[25] P. van den Besselaar, U. Sandström, PLOS ONE, e0183301 (2017).
[26] E. Leahey, Gender & Society, 754–780 (2006).
[27] P. Bronstein, L. Farnsworth, Research in Higher Education, 29 (1998).
[28] S. Fortunato, et al., Science, eaao0185 (2018).
[29] A. Clauset, C. R. Shalizi, M. E. Newman, SIAM Review, 661 (2009).
[30] A. Sinha, et al., Proceedings of the 24th International Conference on World Wide Web, pp. 243–246.
[31] R. Sinatra, D. Wang, P. Deville, C. Song, A.-L. Barabási, Science, aaf5239 (2016).
[32] D. Wang, C. Song, A.-L. Barabási, Science, 127–132 (2013).
[33] F. Radicchi, S. Fortunato, C. Castellano, Proceedings of the National Academy of Sciences, 17268–17272 (2008).
[34] I. E. Broder, Economic Inquiry, 116–127 (1993).
[35] G. Abramo, C. A. D’Angelo, A. Caprasecca, Scientometrics, 137 (2009).
[36] D. Maliniak, R. Powers, B. F. Walter, International Organization, 889–922 (2013).
[37] J. D. West, J. Jacquet, M. M. King, S. J. Correll, C. T. Bergstrom, PLOS ONE, e66212 (2013).
[38] K. Rorstad, D. W. Aksnes, Journal of Informetrics, 317–333 (2015).
[39] M. R. E. Symonds, N. J. Gemmell, T. L. Braisher, K. L. Gorringe, M. A. Elgar, PLOS ONE, e127 (2006).
[40] P. van Arensbergen, I. van der Weijden, P. van den Besselaar, Scientometrics, 857–868 (2012).
[41] D. Kaminski, C. Geisler, Science, 864–866 (2012).
[42] J. M. Box-Steffensmeier, et al., PLOS ONE, e0143093 (2015).
[43] S. F. Way, D. B. Larremore, A. Clauset, Proceedings of the 25th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, 2016), pp. 1169–1179.
[44] S. F. Way, A. C. Morgan, A. Clauset, D. B. Larremore, Proceedings of the National Academy of Sciences, E9216–E9223 (2017).
[45] L. A. Hechtman, et al., Proceedings of the National Academy of Sciences p. 201800615 (2018).
[46] S. Milojević, F. Radicchi, J. P. Walsh, PNAS p. 201800478 (2018).
[47] H. Etzkowitz, C. Kemelgor, B. Uzzi, Athena Unbound: The Advancement of Women in Science and Technology (Cambridge University Press, 2000).
[48] S. J. Ceci, W. M. Williams, Proceedings of the National Academy of Sciences, 3157–3162 (2011).
[49] J. M. Sheltzer, J. C. Smith, Proceedings of the National Academy of Sciences, 10107–10112 (2014).
[50] W. M. Williams, S. J. Ceci, Proceedings of the National Academy of Sciences, 5360–5365 (2015).
[51] M. W. Nielsen, et al., Proceedings of the National Academy of Sciences, 1740–1742 (2017).
[52] E. A. Cech, M. Blair-Loy, Proceedings of the National Academy of Sciences p. 6.
[53] National Research Council, Gender Differences at Critical Transitions in the Careers of Science, Engineering, and Mathematics Faculty (The National Academies Press, 2010).
[54] L. R. Martinez, K. R. O’Brien, M. R. Hebl, Journal of Women’s Health, 580–586 (2017).
[55] A. J. Stewart, V. Valian, An Inclusive Academy: Achieving Diversity and Excellence (MIT Press, 2018).
[56] P. Mongeon, A. Paul-Hus, Scientometrics, 213 (2016).
[57] C. Wennerås, A. Wold, Nature, 3 (1997).
[58] L. Liu, et al., Nature, 396–399 (2018).
[59] B. K. AlShebli, T. Rahwan, W. L. Woon, Nature Communications, 5163 (2018).
[60] P. Reuther, B. Walter, M. Ley, A. Weber, S. Klink, Managing the Quality of Person Names in DBLP, Lecture Notes in Computer Science (Springer, Berlin, Heidelberg, 2006), pp. 508–511.
[61] S. F. Way, A. C. Morgan, D. B. Larremore, A. Clauset, Proceedings of the National Academy of Sciences p. 201817431 (2019).
[62] F. Karimi, C. Wagner, F. Lemmerich, M. Jadidi, M. Strohmaier, Proceedings of the 25th International Conference Companion on World Wide Web, WWW ’16 Companion (International World Wide Web Conferences Steering Committee, 2016), pp. 53–54.
[63] M. M. King, C. T. Bergstrom, S. J. Correll, J. Jacquet, J. D. West, Socius, 2378023117738903 (2017).

Acknowledgements
Special thanks to Alice Grishchenko for help with the visualizations. Also, thanks to the wonderful research community at the CCNR, and in particular Yasamin Khorramzadeh, for helpful discussions, and to Kathrin Zippel at Northeastern University for valuable suggestions. A.J.G. and A.-L.B. were supported in part by the Templeton Foundation, contract
Data & Code Availability
The DBLP and MAG are publicly available from their source websites (see SI). Other related and relevant data and code are available from the corresponding author upon request.
Author contributions
J.H., A.J.G., R.S., and A.-L.B. collaboratively conceived and designed the study, and drafted, revised, and edited the manuscript. J.H. and A.J.G. analyzed the data and ran all simulations.
Supplementary Materials
Materials and Methods
Table S1 - S8
Fig S1 - S8

[Figure 1 image. Panel C country labels include India, United States, Argentina, South Africa, Australia, Portugal, Russia, Sweden, United Kingdom and Canada. Panel D shows an example career with a career length of 11.53 years, annotated with yearly productivity, average annual productivity, dropout, total productivity and total impact; axis: number of papers.]
Figure 1:
Gender imbalance since 1955. A, The number of active female (orange) and male (blue) authors over time and (inset) the total proportions of authors. B, C, The proportion of female authors in several B, disciplines; and C, countries; for the full list see SI, Tables S3 & S4. D, The academic career of a scientist is characterized by his or her temporal publication record. For each publication we identify the date (gold dot) and number of citations after 10 years c10 (gold line, lower). The aggregation by year provides the yearly productivity (light gold bars), while the aggregation over the entire career yields the total productivity (solid yellow bar, right) and total impact (solid yellow bar, right). Career length is calculated as the time between the first and last publication; the annual productivity (dashed gold line) represents the average yearly productivity. An author drops out from our data when he or she publishes his or her last article.
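The career indicators described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `career_indicators`, the input format (a list of `(year, c10)` pairs per author), and the choice to count both the first and the last calendar year in the career length are assumptions made here for the example; the exact definitions are given in SI S2.1 and S3.1.

```python
from dataclasses import dataclass


@dataclass
class Career:
    total_productivity: int      # number of papers over the whole career
    total_impact: float          # sum of 10-year citation counts (c10)
    career_length: int           # years between first and last publication
    annual_productivity: float   # papers per career year
    dropout_year: int            # year of the last publication


def career_indicators(papers):
    """Summarize one author's career.

    `papers` is a hypothetical input format: a list of
    (publication_year, c10) tuples, one per paper.
    """
    if not papers:
        raise ValueError("an author needs at least one paper")
    years = [year for year, _ in papers]
    first, last = min(years), max(years)
    # Assumption: count both end years, so a single-year career has
    # length 1 and the annual productivity is always well defined.
    career_length = last - first + 1
    total_productivity = len(papers)
    total_impact = sum(c10 for _, c10 in papers)
    return Career(
        total_productivity=total_productivity,
        total_impact=total_impact,
        career_length=career_length,
        annual_productivity=total_productivity / career_length,
        dropout_year=last,
    )
```

Under these assumptions, total productivity factors exactly into annual productivity times career length, which is the decomposition the main text relies on when separating the two contributions to the gender gap.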
[Figure 2 image. Rows: total productivity (A-E, number of papers), total impact (F-J, number of citations), annual productivity (K-O, average number of papers per year), career length (P-T, years); legend: Female, Male. Panel A annotations: Low 20% +7.8%, Middle 20% +0.9%, Top 20% -37.5%, Overall -27.4%.]
Figure 2:
Gender gap in scientific publishing careers.
The gender gap is quantified by the relative difference between the mean for male (blue) and female (orange) authors. In all cases the relative gender differences are statistically significant as established by the two-sided t-test, with p-values less than 10 − unless otherwise stated (see SI S4.1 for test statistics). A - E, Total productivity broken down by: A, percentile; B, discipline; C, country; D, affiliation rank; and E, decade. The gender gap in productivity has been increasing from the 1950s to the 2000s. F - J, Total impact subdivided by: F, percentile; G, discipline; H, country; I, affiliation rank; and J, decade. K - O, Annual productivity is nearly identical for male and female authors when subdivided by: K, percentile; L, discipline; M, country; N, affiliation rank; and O, decade. P - T, Career length broken down by: P, percentile; Q, discipline; R, country; S, affiliation rank; and T, decade.
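As a rough illustration of how such a gap and its significance might be computed, the sketch below uses only the Python standard library. Two choices are assumptions made here, not details taken from the paper: the gap is expressed relative to the male mean (so that a negative value means a lower female mean, matching the sign of the figure annotations), and the test is a Welch-type statistic with a normal approximation to the two-sided p-value, which is reasonable only for large samples. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def gender_gap(female, male):
    """Relative difference of the female mean with respect to the male mean.

    Assumption: the male mean is the reference, so -0.27 reads as
    "the female mean is 27% lower".
    """
    return (mean(female) - mean(male)) / mean(male)


def two_sided_test(female, male):
    """Welch-type t statistic and a large-sample two-sided p-value.

    The p-value uses the standard normal distribution as an approximation
    to Student's t, which is only appropriate for large samples.
    """
    standard_error = sqrt(
        variance(female) / len(female) + variance(male) / len(male)
    )
    t = (mean(female) - mean(male)) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p
```

With the career-wise means reported in the main text (9.6 papers for female and 13.2 for male authors), this definition gives (9.6 - 13.2) / 13.2, about -27%, in line with the overall gap shown in panel A.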
[Figure 3 image. A, B: gender gap in total productivity versus gender gap in career length, by discipline (A) and country (B). C: example matched pairs, such as Maria and Mario (American physicists with careers of 10 years, mostly working at an institute ranked 200th ~ 250th), Angela and Angelo (Italian mathematicians with careers of 15 years, mostly working at a top-20 institute), and Christiana and Christopher (German psychologists with careers of 30 years, mostly working at an institute ranked 50th ~ 90th). D, E: total productivity (D) and total impact (E) of female and male authors in the population (-27.4%, -30.5%), when controlling for discipline, country, and institute rank (-47.0%, -50.7%), and when additionally controlling for career length (-12.4%, -13.1%).]
Figure 3:
Controlling for career length. A, B, The gender gap in career length strongly correlates with the productivity gap across A, disciplines (Pearson correlation 0.80) and B, countries (Pearson correlation 0.56). C, In a matching experiment, equal samples are constructed by matching every female author with a male author having an identical discipline, country, and career length. D, The average productivity provided by the matching experiment for career length compared to the population; the gender gap is reduced from 27.4% in the population to 12.4% in the matched samples. E, The average impact provided by the matching experiment for career length compared to the original unmatched sample. Where visible, error bars denote one std.
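One simple way to build such matched samples is exact one-to-one matching without replacement, sketched below. This is an illustrative reading of the experiment, not the procedure documented in SI S4.2: the record format (dictionaries), the field names, the random choice among eligible male authors, and the decision to drop female authors without an eligible match are all assumptions introduced for the example.

```python
import random
from collections import defaultdict


def match_pairs(females, males, keys, seed=0):
    """Pair each female author with one male author sharing all `keys`.

    `females` and `males` are lists of dicts (hypothetical record format).
    Matching is exact and without replacement; female authors with no
    remaining eligible male author are left out of the matched sample.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for m in males:
        pool[tuple(m[k] for k in keys)].append(m)
    for candidates in pool.values():
        rng.shuffle(candidates)  # random choice among eligible matches

    pairs = []
    for f in females:
        candidates = pool.get(tuple(f[k] for k in keys))
        if candidates:
            pairs.append((f, candidates.pop()))
    return pairs


def matched_gap(pairs, field):
    """Relative gap in `field` between matched female and male means."""
    female_mean = sum(f[field] for f, _ in pairs) / len(pairs)
    male_mean = sum(m[field] for _, m in pairs) / len(pairs)
    return (female_mean - male_mean) / male_mean
```

Calling `match_pairs` with `keys=("country", "discipline", "rank_bin")` would correspond to the first matched population, and adding `"career_length"` to the keys to the second; `rank_bin` stands in here for a coarse affiliation-rank bucket, since the text only requires ranks to be approximately equal.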
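[Figure 4 image. A: dropout rate versus academic age for female and male authors, with overall rates. B: cumulative survival rate versus academic age. C, D: total productivity (C) and total impact (D) in the population (-27.4%, -30.5%) and with the age-dependent dropout controlled (-9.0%, -12.1%). E: total impact in the population (-30.5%), when controlling for discipline, country, and institute rank (-50.7%), and when additionally controlling for total productivity (+1.9%). Legend: Female, Male.]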
Figure 4:
Author’s age-dependent dropout rate. A, Dropout rate for male (blue) and female (orange) authors over their academic ages. B, The cumulative survival rate for male and female authors over their academic ages. C, D, The effect of controlling for the age-dependent dropout rate on the gender gaps in C, total productivity and D, impact. E, The total impact gap is eliminated in the matched sample based on total productivity.

Supplemental Information
Historical comparison of gender inequality in scientific careers across countries and disciplines
Junming Huang∗, Alexander J. Gates∗, Roberta Sinatra, and Albert-László Barabási†

Center for Complex Network Research, Northeastern University, Boston, Massachusetts 02115, USA
CompleX Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
Paul and Marcia Wythes Center on Contemporary China, Princeton University, Princeton, New Jersey 08540, USA
Department of Computer Science, IT University of Copenhagen, Copenhagen 2300, Denmark
Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02115, USA
Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA

∗ These authors contributed equally to this work.
† To whom correspondence should be addressed: [email protected]
Contents
S1 Data sets
S1.1 Web of Science
S1.2 Microsoft Academic Graph
S1.3 DBLP
S2 Data pre-processing
S2.1 Identifying scientific careers
S2.2 Career selection criteria
S2.3 Country label
S2.4 Affiliation rank
S2.5 Gender assignment
S2.5.1 WoS and MAG authorship alignment
S2.5.2 Gender label inference
S2.5.3 Gender label accuracy
S2.6 Citation count and normalization
S2.6.1 Citations within Web of Science
S2.6.2 Removing self-citations
S2.6.3 Citation normalization
S2.7 Discipline hierarchy
S2.8 Data summary
S3 Indicators
S3.1 Characterizing the scientific career
S3.2 Characterizing the scientific population
S4 Methods
S4.1 Statistical significance
S4.2 Career length matching
S4.3 Total productivity matching
S4.4 Relationship between productivity and number of collaborators
S4.5 Controlling for the dropout rate
S5 Detailed results on Web of Science
S5.1 Distributions of measurements
S5.2 Statistics and gender gaps in each discipline, country, and year
S6 Replication in other databases
S6.1 Microsoft Academic Graph
S6.2 DBLP
S7 Tables and Figures

S1.1 Web of Science
The primary source of publication data for this project is the Clarivate Analytics Web of Science Core Collection (WoS) database, covering the Science Citation Index Expanded and the Social Sciences Citation Index. We considered all articles, reviews, and letters published between 1900 and 2016, and we excluded all other types of documents (e.g. editorials, letters to the editor, and book reviews) that are generally not peer-reviewed. In total, we consider the publication history of 7,863,861 authors who contributed a total of 101,961,318 authorships to 53,788,499 publications. Additionally, we extracted the citation history for all publications, resulting in 694,439,758 citation relationships.

The WoS dataset assigns each article to at least one scientific discipline in a three-layer hierarchy of 153 disciplines. For example, a paper is assigned to "Science & Technology" (top layer), "Life Sciences & Biomedicine" (middle layer) and "Biophysics" (leaf layer). The assignment is primarily based on each publication's journal information, but a select few multidisciplinary journals (e.g. Nature and Science) provide article-specific categories. For our purposes, the 153 disciplines in the leaf layer are too fine-grained, while the other two layers do not provide a detailed enough classification. Therefore, we grouped the leaf layer categories into a coarser partition as described in Section S2.7.
S1.2 Microsoft Academic Graph
The Microsoft Academic Graph (MAG) is a comprehensive index of scientific publications in both journals and conferences [30]. In November 2017, we downloaded 77,642,549 publications through the authorized API freely provided by Microsoft Research. These publications were produced by 88,223,538 authors who contributed a total of 211,897,481 authorships.
S1.3 DBLP
The DBLP Computer Science Bibliography contains 4,181,940 publications from computer science journals and conference proceedings (downloaded June 5th, 2018, https://dblp.uni-trier.de). We consider all articles, review articles, proceedings, book chapters, and dissertations published between 1970 and 2010, and exclude all other types of documents (e.g. webpages and notes) that are generally not peer-reviewed. These publications were produced by 2,129,492 authors who contributed a total of 12,090,783 authorships.
S2 Data pre-processing
S2.1 Identifying scientific careers
While the problem of name disambiguation for scientific publications is notoriously difficult, the scientific community has recognized several disambiguation procedures that effectively capture scientific careers. Here, to demonstrate the robustness of our results to database bias and author disambiguation errors, we replicated our analysis in several databases, each with its own strengths and weaknesses. All three of the data sets we used (WoS, MAG, and DBLP) maintain unique author identifiers, each based on a different name disambiguation procedure. The WoS and MAG use their own proprietary algorithms, which have been successfully used to study scientific careers (for example, see WoS [58] and MAG [59]). While the specifics of the algorithms are not available, it is reasonable to assume that both algorithms are on par with, if not far better than, prevailing methods developed by independent academic groups. For instance, the MAG processes online CVs and Wikipedia profiles to associate individual authors with their papers. Additionally, both algorithms incorporate the self-curated career profiles provided by the Open Researcher and Contributor ID (ORCID). On the other hand, the DBLP name disambiguation is based on a unique identifier assigned to authors when manuscripts are submitted to registered Computer Science conferences or journals. Thus, the DBLP database has arguably the most reliable name disambiguation available in a bibliometric database [60], and has also been used in several peer-reviewed studies of scientific careers [23, 44].

While many of the name disambiguation algorithms are able to reconstruct the careers of authors with European names, they often have difficulty disambiguating the careers of authors with Asian names. This, combined with the known issues in inferring the gender of Asian names (see below), motivates us to adopt a conservative approach and exclude all researchers from China (mainland, Hong Kong, Macau, & Taiwan), the Democratic People's Republic of Korea, Japan, Malaysia, the Republic of Korea, and Singapore.

Critically, by replicating our study in three different databases, each with an independent method for name disambiguation, we argue that any possible errors resulting from misattributed or missing publications are negligible.
S2.2 Career selection criteria
In order to study comprehensive scientific careers, we limit our analysis to authors who: (i) have authored at least two papers, (ii) have a publication career spanning more than one year (365 days), (iii) have an average annual publication rate of less than 20 papers per year, and (iv) published their last article on or before Dec 31st, 2010. Our main conclusions do not change if more stringent selection criteria or modified filters are used to select the subset of scientists.
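The four criteria above can be sketched as a simple filter. This is a minimal illustration rather than the code used in the study; the `Career` record and its field names are hypothetical, and the annual rate is computed as papers divided by (career days / 365), following the definition of annual productivity in Section S3.1.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record type: one entry per disambiguated author.
@dataclass
class Career:
    first_pub: date   # date of the first publication
    last_pub: date    # date of the last publication
    n_papers: int     # total number of papers

CUTOFF = date(2010, 12, 31)  # criterion (iv)

def keep_career(c: Career) -> bool:
    """Return True if the career passes selection criteria (i)-(iv)."""
    if c.n_papers < 2:                      # (i) at least two papers
        return False
    span_days = (c.last_pub - c.first_pub).days
    if span_days <= 365:                    # (ii) span of more than one year
        return False
    annual_rate = c.n_papers / (span_days / 365)
    if annual_rate >= 20:                   # (iii) fewer than 20 papers per year
        return False
    return c.last_pub <= CUTOFF             # (iv) career ended by end of 2010
```

For example, `keep_career(Career(date(2000, 1, 1), date(2005, 1, 1), 10))` passes all four checks, while a career whose last paper appeared in 2011 is excluded.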
S2.3 Country label
To facilitate the assignment of author gender (Section S2.5) and analyze national variations in the gender gap, we associate each author with a single country as follows. In the WoS, many authorships are indexed along with an affiliation address, including an institution name, street address, city, zip code and country. For each author, we identify all authorships with a known affiliation address and keep only the country of the affiliation. We then assign a country label to an author based on the most frequently occurring country of affiliation. This frequency-based method results in a country label for a total of 1,876,950 authors.

We also considered an alternative method for country assignment in which the earliest country affiliation was used for each author. This second method disagrees with the frequency-based approach for only 58,576 (roughly 3%) of these authors.

S2.4 Affiliation rank
It has been suggested that the author's primary affiliation contributes significantly towards the overall productivity [61]. We collected the ranking information from the Times Higher Education World University Rankings 2019, a global ranking that indexes more than 1,250 universities. We then associated authors with those universities by examining the affiliations in their publications. Because university names can be spelled in multiple ways (for example, as abbreviations), we queried every affiliation name in the Web of Science publication data, as well as all university names in the Times Higher Education World University Rankings, with Google Maps to disambiguate those variations into unique university names. Each author is then assigned the rank of the highest ranked institute to which she or he is affiliated over the course of the career. Among the 1,876,950 authors with at least one affiliation recorded, 1,296,995 authors have been aligned to an institute rank.

S2.5 Gender assignment
In the absence of gender information for authors in the WoS, MAG, and DBLP, we infer author gender based on author name and country. Specifically, we used the commercially available service Genderize.io (https://genderize.io/), which integrates publicly available census statistics to build a name database mapping a country-specific first name to a binary gender label. Due to the low accuracy of the gender assignment algorithm for Asian names, we excluded all researchers from China (mainland, Hong Kong, Macau, & Taiwan), the Democratic People's Republic of Korea, Japan, Malaysia, the Republic of Korea, and Singapore. We also excluded researchers from Brazil. This gender assignment strategy has been successfully employed in several academic research studies.

S2.5.1 WoS and MAG authorship alignment
A practical challenge lies in the fact that the WoS dataset records the full first name of authors on most papers published after 2006, while on most papers published before 2006 authorships are recorded with initials only. Among a total of 7,817,639 authors in the Web of Science dataset, only 2,171,290 have the full first name recorded for at least one authorship. Therefore, we leveraged our access to multiple datasets to help complete the missing metadata from the papers. Specifically, we aligned papers in the WoS to MAG based on the following criteria: (a) both papers are published in the same year, (b) both papers have identical sets of author last names, (c) the two papers differ in title by no more than 25%, estimated by the Levenshtein distance between the two titles divided by the length of the WoS paper title. Such matches were found for 23,615,112 papers. We aligned authorships in each paper pair by comparing first initial and last name. For example, if a WoS paper records an author "J. Smith" and its matched paper in MAG records "John Smith", we complete the authorship "J. Smith" with "John Smith". We skipped papers with multiple authors sharing the same last name. This procedure allowed us to complete the first name for an additional 1,334,886 authors.

Note that this procedure only filled in missing metadata at the level of individual papers. The alignment between WoS and MAG was not used to infer an author's career.
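The alignment criteria (a)-(c) and the first-name completion step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: the dictionary keys (`year`, `last_names`, `title`), the function names, and the case-insensitive comparison of titles and names are all assumptions introduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def is_same_paper(wos: dict, mag: dict, max_title_diff: float = 0.25) -> bool:
    """Apply criteria (a)-(c) to a candidate WoS/MAG paper pair."""
    if wos["year"] != mag["year"]:                        # (a) same year
        return False
    if ({n.lower() for n in wos["last_names"]}
            != {n.lower() for n in mag["last_names"]}):   # (b) same last names
        return False
    if not wos["title"]:
        return False
    dist = levenshtein(wos["title"].lower(), mag["title"].lower())
    return dist / len(wos["title"]) <= max_title_diff     # (c) titles within 25%

def complete_first_names(wos_authors, mag_authors):
    """wos_authors: list of (initial, last); mag_authors: list of (first, last).
    Returns {(initial, last): full_first_name} for authors that can be completed."""
    lasts = [last.lower() for _, last in wos_authors]
    if len(set(lasts)) != len(lasts):   # skip papers with a repeated last name
        return {}
    mag_by_last = {last.lower(): first for first, last in mag_authors}
    completed = {}
    for initial, last in wos_authors:
        first = mag_by_last.get(last.lower())
        # Accept only a genuine full name whose first letter matches the initial.
        if first and len(first.rstrip(".")) > 1 and first[0].lower() == initial[0].lower():
            completed[(initial, last)] = first
    return completed
```

With this sketch, `complete_first_names([("J.", "Smith")], [("John", "Smith")])` maps the authorship "J. Smith" to "John", mirroring the example in the text.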
S2.5.2 Gender label inference
Out of the 3,427,232 WoS authors with a full first name, we successfully inferred the gender of 3,003,815 authors, including 2,146,926 male authors and 856,889 female authors.

S2.5.3 Gender label accuracy
As reported in Karimi et al. (2016) [62], Genderize.io achieves a minimum accuracy of 80%. To assess the accuracy of the gender assignment process for our data, we compared the inferred gender labels of authors in the WoS with a ground truth benchmark dataset consisting of 2,000 male and female full names manually collected in Larivière et al. (2013) [2]. Among the 1,512 author names that overlap with our dataset, 1,425 have inferred gender labels that agree with the ground truth, resulting in an accuracy of approximately 94%.

S2.6 Citation count and normalization
S2.6.1 Citations within Web of Science
We only count citations in which both the citing paper and the cited paper appear within the WoS database.
S2.6.2 Removing self-citations
It has previously been shown that male scientists are more likely to cite their own papers than female scientists [63]. Therefore, in all measures of impact, we removed all self-citations based on the overlap between authorships in the citing paper and cited papers. We also replicated our analysis while keeping all self-citations and found no qualitative difference in our primary conclusions.
S2.6.3 Citation normalization
Citation-based measures of impact are affected by two major problems: (1) citations follow different dynamics for different papers [32] and (2) the average number of citations changes over time [31]. To overcome the first problem, we focused on the total number of citations each paper received within 10 years after its publication, $c_{10}$, as a measure of its scientific impact. We corrected for the second problem by normalizing the $c_{10}$ for each paper by the average $c_{10}$ of papers published in the same year, and multiplying by 12 (an arbitrary constant that does not quantitatively affect any of our analysis but restores the normalized citation count back to a realistic value). The resulting normalized $c_{10}$ score thus provides a consistent measure of impact across decades.

S2.7 Discipline hierarchy
We used a classification of scientific fields as defined in Wikipedia to re-organize the 153 WoS categories into 75 disciplines (Wikipedia pages: Branches of science, Outline of natural science, Outline of social science, Outline of applied science; last accessed August 2018). See Table S1 for the details of the mapping.

Each paper is assigned one or more disciplines among the 75 Wikipedia disciplines based on its original WoS category label(s). A total of 3,117,710 authors (39.66%) have all papers assigned to a single discipline, while the remaining 4,742,941 authors (60.34%) are associated with at least two disciplines. To each author with multiple disciplines we assign a single discipline label, namely the most frequently occurring one. For 3,728,442 (78.61%) of the 4,742,941 authors with multiple disciplines, the most frequent discipline occurs in more than half of his/her papers.

While some disciplines were associated with many authors (e.g. Health Science has 584,628 authors), many were only associated with a few authors. Therefore, we limit the majority of our analysis to the top 12 disciplines based on total population: Health Science, Biology, Chemistry, Engineering, Physics, Computer Science, Psychology, Agronomy, Mathematics, Environmental science, Political Science, and Applied physics. These 12 disciplines cover 90.3% of the population. The remaining 9.7% of the population are grouped into the 13th category, Others, containing:
- 4 fields in Formal Sciences (Decision theory, Logic, Statistics, Systems theory);
- 9 fields in Natural Sciences (Botany, Earth science, Ecology, Geology, Human biology, Meteorology, Oceanography, Space Science and Astronomy, Zoology);
- 14 fields in Applied Sciences (Applied chemistry, Applied linguistics, Applied mathematics, Architecture, Computing technology, Education, Electronics, Energy storage, Energy technology, Forensic science, Management, Microtechnology, Military science, Spatial science);
- 30 fields in Social Sciences (Anthropology, Business studies, Civics, Cognitive Science, Criminology, Cultural studies, Demography, Development studies, Economics, Education, Environmental studies, Gender and sexuality studies, Geography, Gerontology, Industrial relations, Information science, International studies, Law, Legal management, Library science, Linguistics, Management, Media studies, Paralegal studies, Planning, Public administration, Social work, Sociology, Sustainability studies, Sustainable development);
- 5 fields in Arts and Humanities (Arts, History, Languages and literature, Philosophy, Theology);
- one last field, "Unknown", for categories that we failed to map to any Wikipedia discipline.

S2.8 Data summary
After all data processing steps were completed, we consider 1,523,002 WoS authors (1,110,194 male, 412,808 female), contributing 18,750,502 authorships to 13,081,184 papers, across 13 disciplines and 83 countries.
S3 Indicators
S3.1 Characterizing the scientific career
1. Total productivity of a scientist is defined as the total number of authorships published by a specific author.
2. Career length of a scientist is defined as the difference between the dates of publication of their first and last publications. The career length is naturally found at the resolution of days, while in coarser scenarios we report career length in years by dividing by 365 and rounding to the nearest integer.
3. Annual productivity of a scientist is calculated as the ratio of total productivity to career length, i.e., (the total number of papers) / (the days between the first and last publications / 365).
4. Total impact is defined as the sum of normalized $c_{10}$ scores for each paper published by a specific author.
5. Academic age of a scientist counts the number of years since his/her first publication, with the year of the first publication counted as year 1. For example, a scientist whose first publication was in 1991 will have an academic age of 5 in 1995.
6. Dropout of a scientist occurs when the scientist publishes their final paper recorded in the data.
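As an illustration, the career-level indicators and the citation normalization of Section S2.6.3 could be computed along the following lines. This is a minimal sketch, not the code used in the study: the function names and input layout are hypothetical, and the yearly average $c_{10}$ is taken over whatever list of papers is passed in (in the study it would be the full database).

```python
from collections import defaultdict
from datetime import date

def normalize_c10(papers, scale=12.0):
    """papers: list of dicts with keys 'year' and 'c10' (citations within 10 years).
    Divides each c10 by the mean c10 of papers from the same year, times `scale`."""
    total, count = defaultdict(float), defaultdict(int)
    for p in papers:
        total[p["year"]] += p["c10"]
        count[p["year"]] += 1
    mean = {y: total[y] / count[y] for y in total}
    return [scale * p["c10"] / mean[p["year"]] if mean[p["year"]] > 0 else 0.0
            for p in papers]

def career_indicators(pub_dates, normalized_c10):
    """pub_dates: publication dates of one author's papers;
    normalized_c10: the normalized c10 score of each of those papers."""
    career_days = (max(pub_dates) - min(pub_dates)).days
    if career_days <= 0:
        raise ValueError("career must span at least one day")
    return {
        "total_productivity": len(pub_dates),
        "career_length_days": career_days,
        # Note: Python's round() rounds exact halves to the nearest even integer.
        "career_length_years": round(career_days / 365),
        "annual_productivity": len(pub_dates) / (career_days / 365),
        "total_impact": sum(normalized_c10),
    }

def academic_age(first_pub_year: int, year: int) -> int:
    """Academic age with the first publication year counted as year 1."""
    return year - first_pub_year + 1
```

The inclusive counting in `academic_age` is chosen to reproduce the example in item 5: a first publication in 1991 gives an academic age of 5 in 1995.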
S3.2 Characterizing the scientific population
1. Gender gap is calculated for each indicator as the relative difference, i.e., the difference between the mean male and female values divided by the mean male value.
2. Dropout rate of a group of scientists (e.g., those at the same academic age) is the proportion of scientists who drop out from the group in the next year.
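The two population-level indicators can be written compactly. The sketch below is illustrative only. The sign convention (male minus female, so that positive gaps favor men) is inferred from the collaborator numbers in Section S4.4, and the dropout rate uses one possible reading of the definition: among scientists still active in a given year, the share whose final recorded paper appears in that year. The function names are hypothetical.

```python
def gender_gap(male_mean: float, female_mean: float) -> float:
    """Relative gender gap: (male mean - female mean) / male mean."""
    return (male_mean - female_mean) / male_mean

def dropout_rate(last_pub_years, year: int) -> float:
    """Share of scientists active in `year` who are gone the following year.
    last_pub_years: year of the final recorded paper, one entry per scientist."""
    active = [y for y in last_pub_years if y >= year]
    if not active:
        return 0.0
    return sum(1 for y in active if y == year) / len(active)

# Collaborator counts from Section S4.4: 36.6 (men) vs 23.5 (women).
print(f"{gender_gap(36.6, 23.5):.1%}")
```

Applied to the collaborator counts reported in Section S4.4, the formula recovers the 35.8% gap quoted there.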
S4 Methods
S4.1 Statistical significance
For each measurement of scientific performance, we report the gender gap as the relative difference between the mean values for male and female scientists (Section S3.2). Additionally, we compute the statistical significance of the gap using the unpaired two-tailed Welch's t-test, which tests whether two samples with unequal size and unequal variance deviate from the null hypothesis that the two distributions (female and male) have the same mean. The corresponding p-values, indicating the statistical significance of the test, are reported in Tables S3, S4, S5, and S6.
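For reference, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be computed with the standard library alone. The sketch below is an illustration, not the study's code: the function name is hypothetical, and the two-tailed p-value uses a normal approximation to the t distribution, which is an assumption that is only reasonable for large samples such as those considered here.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t_test(a, b):
    """Unpaired two-tailed Welch's t-test for samples of unequal size and variance.
    Returns (t statistic, Welch-Satterthwaite degrees of freedom, approximate p-value)."""
    na, nb = len(a), len(b)
    va = variance(a) / na   # squared standard error of the mean of a
    vb = variance(b) / nb   # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    # Normal approximation to the t distribution (assumption: large df).
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx
```

When SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test with an exact t-distribution p-value and is the more natural choice for small samples.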
S4.2 Career length matching
In order to assess the relationship between career length and total productivity, we conducted a matching experiment as follows. We first constructed a matched baseline population in which, for each female author, we identified, without replacement, a male author from the same country and discipline, and with approximately the same affiliation rank. If multiple male authors were found, we randomly selected one to match. This process consistently produced 32,782 matched pairs. To account for the inherent randomness in this procedure, the experiment was replicated 50 times, and the reported performance was averaged over all random trials. The standard deviation over the trials is near zero for both the productivity and impact gaps. For matches based on affiliation, we binned the institutions by rank into 15 equal-volume bins and matched within the same bin; no significant difference occurs for other choices of the affiliation binning. We then created our second experimental population, as a subset of the first, in which we matched each female author to a male author from the same country and discipline, with approximately the same affiliation rank, and with exactly the same career length. This process consistently produced 25,033 matched pairs.

We also ran a similar experiment controlling for the annual productivity. Specifically, we constructed another set of matched samples in which we identified, for each female author, a male author from the same country and discipline, with a nearly identical annual productivity based on grouping authors into bins by annual productivity: [0.1 papers/year, 0.2 papers/year), [0.2 papers/year, 0.3 papers/year), etc. The approximation occurs because annual productivity is a real-valued number. As seen in Fig. S2A,B, controlling for annual productivity actually increases the gender gaps in both the total productivity and total impact, although the increase is small (1.6% and 0%, respectively). The lack of a significant change in the total productivity gender gap further emphasizes the importance of career length as the dominating factor.
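The matching procedure described above amounts to exact matching on a tuple of attributes, with random selection without replacement. A minimal sketch follows; it is not the authors' implementation, and the record layout (dictionaries with `country`, `discipline`, `rank_bin`, and `career_years` fields, where `rank_bin` is assumed to be a precomputed affiliation-rank bin) is hypothetical.

```python
import random
from collections import defaultdict

def match_pairs(females, males, keys, seed=0):
    """Match each female author to one male author with identical values on `keys`.
    Males are drawn at random and without replacement; unmatched females are skipped."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for m in males:
        pool[tuple(m[k] for k in keys)].append(m)
    for candidates in pool.values():
        rng.shuffle(candidates)          # random order, so pop() is a random draw
    pairs = []
    for f in females:
        candidates = pool.get(tuple(f[k] for k in keys))
        if candidates:
            pairs.append((f, candidates.pop()))   # remove the male from the pool
    return pairs

# Baseline population: country, discipline, and affiliation-rank bin.
BASELINE_KEYS = ("country", "discipline", "rank_bin")
# Second population: additionally require exactly the same career length.
CAREER_KEYS = BASELINE_KEYS + ("career_years",)
```

Repeating the call with different `seed` values and averaging the resulting gaps corresponds to the replicated random trials described in the text. The same function covers the annual-productivity and total-productivity experiments by swapping in the relevant key.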
S4.3 Total productivity matching
Our third matching experiment controlled for the total productivity and explored the resulting change in impact. Specifically, we constructed another set of matched samples in which we identified, for each female author, a male author from the same country and discipline, and with approximately the same affiliation rank. In this population, the gender gap in career impact was 50.7% in favor of male authors. We then created our second experimental population, as a subset of the first, in which we matched each female author to a male author from the same country and discipline, with approximately the same affiliation rank, and with exactly the same total productivity. With the addition of matching on total productivity, the impact gap actually flips in favor of female scientists, who receive an average of 1.9% more citations. We report the mean impact gap over 100 randomized trials; the standard deviation of the impact gap is nearly zero.
S4.4 Relationship between productivity and number of collaborators
The gender gap in total productivity has an important implication for any reported gender gaps in collaboration and the subsequent structure of collaboration networks. Here, we test for this relationship by using a matching experiment in which, for each female author, we selected a male author from the same country, discipline, and affiliation rank. We then calculate the total number of collaborators who co-authored at least one publication with each author, and find a substantial gender gap (Fig. S1, left): while male authors collaborate with an average of 36.6 co-authors, female authors collaborate with an average of 23.5 co-authors, a gender gap of 35.8%. Next, a subset of this matched population was chosen such that the male and female authors published exactly the same number of articles throughout their careers (Fig. S1, right). We see that in this final matched population, the gender gap in number of collaborators actually switches to 4.1% in favor of female authors.
S4.5 Controlling for the dropout rate
We introduce an experiment that simulates an alternative scientific population in which we manipulate the dropout rate of scientists. While it would be difficult to retroactively identify the potential publications a scientist would have published if their career did not terminate in a given year, we can more easily randomly terminate the careers of scientists earlier than in reality. Here, we use this technique to eliminate the gender gap in dropout rate, and test for the effects on the productivity and impact gender gaps.

As shown in the main text, Fig. 4A, the age-dependent dropout rate for women is always higher than the male dropout rate. To correct for this gender gap, we raise the dropout rate for male scientists to match that of the female scientists. Specifically, for a given year, we find the difference between the male and female dropout rates, and identify how many more men would need to drop out in order to equalize the rate. We then randomly select male scientists who otherwise would not have left the population the following year (we do not consider the remainder of the career length when selecting scientists) and terminate their careers. A selected male scientist keeps all publications until this age, while his authorships on all later publications are discarded (only the authorships are removed from the data; the career termination of a selected scientist does not affect his collaborators or citations). To account for the inherent randomness in this procedure, the experiment has been replicated 100 times and we report the mean gender gaps, while the standard deviation is near zero.

S5 Detailed results on Web of Science
S5.1 Distributions of measurements
Fig. S3A-D reports the rank distributions of the four major indicators for male and female scientists. For each indicator type, we rank scientists from highest to lowest (denoted as the percentile of scientists with higher performance), and report the performance against percentiles. The difference between the rank distributions shows that, on average, male scientists have more publications and citations, and have longer careers compared to the female scientists. The gender inequality is most significant among top scientists (insets in all four panels). In contrast, male and female scientists look very similar when measured by annual productivity and citation rate.
S5.2 Statistics and gender gaps in each discipline, country, and year
The gender gaps in scientific measurements across all countries (Fig. 2B,G,L,Q from the main text) are reproduced and fully labeled in Fig. S4A-D. Tables S3 and S4 report the statistics of male and female scientists broken down by discipline and country. Each row reports the population size and mean performance indicators of male (in blue) and female (in orange) authors. The standard error is reported as one standard deviation. Tables S5 and S6 report the statistics of male and female scientists grouped by the year they start and finish their scientific careers, respectively.

The detailed relationship between the gender gap in career length and total productivity across all countries is shown in Fig. S5 as a fully labeled version of Fig. 3B from the main text.

S6 Replication in other databases
S6.1 Microsoft Academic Graph
Following the procedure for the Web of Science (Section S1), we identified the genders of 5,856,109 male and 2,622,594 female authors who published a total of 77,642,549 articles in the MAG. Fig. S6A-C shows the gender gaps in total productivity, annual productivity and career length in the MAG. Similar to the findings reported for the WoS in the main text, we find large gender gaps in total productivity and career length, while male and female scientists differ only slightly in annual productivity. Likewise, we find that female scientists consistently have a higher dropout rate than male scientists (Fig. S7A), which results in a separation of the survival curves (Fig. S7B).

S6.2 DBLP
To prepare the DBLP data, we followed the procedure for the Web of Science (Section S1), with the following modification. Because affiliation information for the DBLP is largely absent, we could not leverage location information to assist in the gender assignment. Instead, we compiled a list of 107,675 unique Chinese first names from the Chinese Biographical Database Project (https://projects.iq.harvard.edu/cbdb/home) and 564 unique Korean first names from Wikipedia (https://en.wikipedia.org/wiki/List_of_Korean_given_names) and removed any author with a matching name from the dataset. After cleaning, we identified the genders of 301,150 male and 69,473 female authors who published a total of 1,740,482 articles in the DBLP.
S7 Tables and Figures
| Web of Science category | Re-organized field |
| --- | --- |
| Mathematics | a.c Mathematics |
| Computer Science | a.f Theoretical computer science |
| Physics, Thermodynamics, Mechanics, Acoustics, Crystallography | b.a Physical science - Physics |
| Chemistry, Electrochemistry, Geochemistry & Geophysics, Spectroscopy | b.b Physical science - Chemistry |
| Oceanography | b.e Physical science - Oceanography |
| Geology | b.f Physical science - Geology |
| Meteorology & Atmospheric Sciences | b.g Physical science - Meteorology |
| Astronomy & Astrophysics | b.h Physical science - Space Science or Astronomy |
| Biochemistry & Molecular Biology, Cell Biology, Plant Sciences, Microbiology, Developmental Biology, Evolutionary Biology, Biophysics, Mathematical & Computational Biology, Genetics & Heredity, Reproductive Biology, Paleontology, Parasitology, Virology, Mycology | b.i Life science - Biology |
| Zoology, Entomology | b.j Life science - Zoology |
| Agriculture, Food Science & Technology, Forestry, Transplantation | c.a Agronomy |
| Architecture, Construction & Building Technology | c.b Architecture |
| Education & Educational Research | c.e Education |
| Energy & Fuels | c.g Energy technology |
| Materials Science, Engineering, Polymer Science, Automation & Control Systems, Mining & Mineral Processing, Mineralogy, Marine & Freshwater Biology, Robotics, Metallurgy & Metallurgical Engineering, Biotechnology & Applied Microbiology, Instruments & Instrumentation, Telecommunications | c.i Engineering |
| Environmental Sciences & Ecology, Fisheries | c.j Environmental science |
| General & Internal Medicine, Health Care Sciences & Services, Integrative & Complementary Medicine, Legal Medicine, Radiology, Nuclear Medicine & Medical Imaging, Research & Experimental Medicine, Tropical Medicine, Critical Care Medicine, Dentistry, Oral Surgery & Medicine, Emergency Medicine, Toxicology, Surgery, Psychiatry, Physiology, Pharmacology & Pharmacy, Pediatrics, Pathology, Ophthalmology, Obstetrics & Gynecology, Nutrition & Dietetics, Nursing, Neurosciences & Neurology, Immunology, Infectious Diseases, Gastroenterology & Hepatology, Endocrinology & Metabolism, Dermatology, Cardiovascular System & Cardiology, Biodiversity & Conservation, Anatomy & Morphology, Urology & Nephrology, Veterinary Sciences, Oncology, Respiratory System, Hematology, Substance Abuse, Rheumatology, Otorhinolaryngology, Orthopedics, Anesthesiology, Allergy, Audiology & Speech-Language Pathology, Medical Informatics, Medical Laboratory Technology, Sport Sciences | c.l Health science |
| Operations Research & Management Science | c.n Management |
| Mathematical Methods In Social Sciences | c.o Applied mathematics |
| Nuclear Science & Technology, Optics | c.r Applied physics |
| Remote Sensing | c.s Spatial science |
| Anthropology, Archaeology, Religion, Ethnic Studies | d.a Anthropology |
| International Relations, Government & Law, Public, Environmental & Occupational Health | d.ab Political science |
| Psychology, Behavioral Sciences | d.ac Psychology |
| Public Administration | d.ad Public administration |
| Social Work | d.ae Social work |
| Sociology, Urban Studies, Social Issues | d.af Sociology |
| Business & Economics | d.b Business studies |
| Criminology & Penology | d.e Criminology |
| Cultural Studies, Asian Studies | d.f Cultural studies |
| Demography | d.g Demography |
| Women's Studies | d.l Gender and sexuality studies |
| Geography, Physical Geography, Area Studies | d.m Geography |
| Geriatrics & Gerontology | d.n Gerontology |
| Information Science & Library Science | d.q Information science |
| Linguistics | d.w Linguistics |
| Communication, Film, Radio & Television | d.y Media studies |
| Arts & Humanities - Other Topics, Life Sciences & Biomedicine - Other Topics, Rehabilitation, Physical Sciences - Other Topics, Water Resources, Technology - Other Topics, Imaging Science & Photographic Technology, Microscopy, Transportation, Social Sciences - Other Topics, Biomedical Social Sciences, Family Studies | e.a Unfiled |
| Art, Dance, Music, Theater | f.a Arts |
| Classics, History | f.b History |
| Literature | f.c Languages and literature |
| Philosophy, History & Philosophy of Science, Medical Ethics | f.d Philosophy |
Table S1:
The discipline hierarchy ± ± < ± ± < ± ± < ± ± Table S2:
Academic performance.
In each row we report the average measurements ( ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± Table S3:
Academic performance in disciplines.
In each cell we report the average measurements ofmale (blue) and female (orange) scientists, with standard errors. A third row reports the gender gap inpercentage and p-value in parentheses. The p-value is calculated with two-tailed Welch’s t-test to detectwhether two samples with unequal size and unequal variance have identical mean. ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± ± Table S4:
Academic performance in countries.
In each cell we report the average measurements of male (blue) and female (orange) scientists, with standard errors. A third row reports the gender gap in percentage and the p-value in parentheses. The p-value is calculated with a two-tailed Welch’s t-test, which tests whether two samples of unequal size and unequal variance have the same mean.
Table S5: Academic performance given career start decade.
In each cell we report the average measurements of male (blue) and female (orange) scientists, with standard errors. A third row reports the gender gap in percentage and the p-value in parentheses. The p-value is calculated with a two-tailed Welch’s t-test, which tests whether two samples of unequal size and unequal variance have the same mean.
Table S6:
Academic performance given career end decade.
In each cell we report the average measurements of male (blue) and female (orange) scientists, with standard errors. A third row reports the gender gap in percentage and the p-value in parentheses. The p-value is calculated with a two-tailed Welch’s t-test, which tests whether two samples of unequal size and unequal variance have the same mean.
Table S7:
Academic performance given primary affiliation rank.
In each cell we report the average measurements of male (blue) and female (orange) scientists, with standard errors. A third row reports the gender gap in percentage and the p-value in parentheses. The p-value is calculated with a two-tailed Welch’s t-test, which tests whether two samples of unequal size and unequal variance have the same mean.
Table S8:
Academic performance given number of unique collaborators.
In each cell we report the average measurements of male (blue) and female (orange) scientists, with standard errors. A third row reports the gender gap in percentage and the p-value in parentheses. The p-value is calculated with a two-tailed Welch’s t-test, which tests whether two samples of unequal size and unequal variance have the same mean.
[Figure S1: bar chart of the number of collaborators for female and male scientists; x-axis: controlling discipline, country, rank; controlling discipline, country, rank, pub; annotated gender gaps: -35.8%, +4.1%.]
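The captions of Tables S3 to S8 report a gender gap in percentage together with a p-value from a two-tailed Welch’s t-test. The following minimal sketch shows how such quantities could be computed. It assumes the gap is the relative difference of the female mean from the male mean, which is consistent with the negative gaps shown in the figures but is not stated in the captions, and it uses hypothetical values. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; the p-value then follows from the Student t distribution with that many degrees of freedom (SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the same test and returns the p-value directly).

```python
import math
from statistics import mean, variance


def gender_gap_percent(male, female):
    """Relative difference of the female mean from the male mean, in percent.

    Assumption: the sign convention (female minus male, relative to male)
    is inferred from the figures, not taken from the captions.
    """
    return 100.0 * (mean(female) - mean(male)) / mean(male)


def welch_t(male, female):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n_m, n_f = len(male), len(female)
    # Squared standard error of each group mean (sample variance / sample size).
    se2_m = variance(male) / n_m
    se2_f = variance(female) / n_f
    t = (mean(female) - mean(male)) / math.sqrt(se2_m + se2_f)
    df = (se2_m + se2_f) ** 2 / (se2_m ** 2 / (n_m - 1) + se2_f ** 2 / (n_f - 1))
    return t, df


# Hypothetical total-productivity values, for illustration only.
male = [12, 30, 7, 45, 22, 18, 9, 27]
female = [10, 14, 6, 20, 11, 8]

gap = gender_gap_percent(male, female)
t_stat, dof = welch_t(male, female)
print(f"gap = {gap:+.1f}%, t = {t_stat:.3f}, df = {dof:.1f}")
```

Unlike Student's t-test, the statistic above does not pool the two variances, which is why it suits groups of very different size and spread.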
Figure S1:
Matched samples explain the average number of collaborators. The gender gap in the number of collaborators in the matched samples when controlling for the discipline, country and affiliation rank, and when controlling for the discipline, country, affiliation rank, and number of publications.
[Figure S2 panels: a total productivity (x-axis: Population, Annual productivity controlled; annotated gender gaps: -27.4%, -28.1%); b total impact (same x-axis; annotated gender gaps: -30.5%, -30.2%).]
Figure S2:
Matched samples when controlling annual productivity. Gender gaps in a total productivity and b total impact, before and after we control annual productivity between genders. The correction does not reduce gender gaps in performance.
[Figure S3 panels, each plotted against scientist percentile (0% to 100%) for female and male scientists: A total productivity, B total impact, C annual productivity, D career length, E institute rank, F number of collaborators.]
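The matched-sample comparisons in Figures S1 and S2 can be illustrated with a small sketch. The matching procedure itself is not described in these captions, so the code below is only one plausible reading: exact matching on categorical controls, pairing each woman with one randomly chosen, not yet used man who has identical values. The record fields (`gender`, `discipline`, `country`, `rank_bin`, `collaborators`) and all values are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def match_pairs(scientists, keys, seed=0):
    """Pair each woman with one unused man sharing the same values for `keys`."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for s in scientists:
        if s["gender"] == "M":
            pool[tuple(s[k] for k in keys)].append(s)
    pairs = []
    for s in scientists:
        if s["gender"] != "F":
            continue
        candidates = pool[tuple(s[k] for k in keys)]
        if candidates:
            # Remove the chosen man so that he is matched at most once.
            partner = candidates.pop(rng.randrange(len(candidates)))
            pairs.append((s, partner))
    return pairs


def gap_percent(pairs, measure):
    """Gender gap in `measure` within the matched pairs, in percent."""
    female_mean = mean(f[measure] for f, _ in pairs)
    male_mean = mean(m[measure] for _, m in pairs)
    return 100.0 * (female_mean - male_mean) / male_mean


# Hypothetical records, for illustration only.
scientists = [
    {"gender": "F", "discipline": "physics", "country": "US", "rank_bin": 1, "collaborators": 10},
    {"gender": "M", "discipline": "physics", "country": "US", "rank_bin": 1, "collaborators": 20},
    {"gender": "M", "discipline": "physics", "country": "DE", "rank_bin": 1, "collaborators": 50},
    {"gender": "F", "discipline": "biology", "country": "US", "rank_bin": 2, "collaborators": 30},
    {"gender": "M", "discipline": "biology", "country": "US", "rank_bin": 2, "collaborators": 30},
    {"gender": "F", "discipline": "chemistry", "country": "FR", "rank_bin": 3, "collaborators": 5},
]

pairs = match_pairs(scientists, ["discipline", "country", "rank_bin"])
matched_gap = gap_percent(pairs, "collaborators")
print(len(pairs), f"{matched_gap:+.1f}%")
```

A continuous control such as annual productivity or number of publications would have to be binned (or matched within a tolerance) before it could serve as a key here; that choice is an assumption of this sketch, not something the captions specify.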
Figure S3:
Basic distributions.
Distributions of a total productivity, b total impact, c annual productivity, d career length, e primary institute rank, f number of unique collaborators.
[Figure S4 panels, each with countries along the horizontal axis: a total productivity, b total impact, c annual productivity, d career length.]
Figure S4:
The gender gap in scientific performance across countries. The average a total productivity, b total impact, c annual productivity, and d career length among all individuals in each country.
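As a rough illustration of the country-level comparison in Figures S4 and S5, the sketch below computes a per-country gender gap for two measures and the Pearson correlation between the two sets of gaps. The record layout, the country codes and all values are hypothetical, and the gap is assumed to be the relative difference of the female mean from the male mean.

```python
import math
from collections import defaultdict
from statistics import mean


def country_gaps(records, measure):
    """Percent gap (female vs. male mean) of `measure` for each country."""
    values = defaultdict(lambda: {"F": [], "M": []})
    for r in records:
        values[r["country"]][r["gender"]].append(r[measure])
    gaps = {}
    for country, by_gender in values.items():
        if by_gender["F"] and by_gender["M"]:  # both genders are needed
            m = mean(by_gender["M"])
            gaps[country] = 100.0 * (mean(by_gender["F"]) - m) / m
    return gaps


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical records, for illustration only.
records = [
    {"country": "A", "gender": "F", "career_length": 8, "total_productivity": 16},
    {"country": "A", "gender": "M", "career_length": 10, "total_productivity": 20},
    {"country": "B", "gender": "F", "career_length": 9, "total_productivity": 27},
    {"country": "B", "gender": "M", "career_length": 10, "total_productivity": 30},
    {"country": "C", "gender": "F", "career_length": 12, "total_productivity": 33},
    {"country": "C", "gender": "M", "career_length": 10, "total_productivity": 30},
]

length_gaps = country_gaps(records, "career_length")
productivity_gaps = country_gaps(records, "total_productivity")
countries = sorted(length_gaps)
r = pearson([length_gaps[c] for c in countries],
            [productivity_gaps[c] for c in countries])
print(f"Pearson r = {r:.3f}")
```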
[Figure S5: scatter plot of the gender gap in total productivity (vertical axis) against the gender gap in career length (horizontal axis), with one labeled point per country and a legend grouping countries by continent (Africa, Asia, Europe, North America, Oceania, South America).]
Figure S5:
The aligned gender gaps in scientific performance and career length across countries. A full version of Fig. 3B, demonstrating that the gender gap in career length is highly correlated with the productivity gap across countries.
[Figure S6 panels, male versus female, with annotated gender gaps for Overall, Top 20%, Middle 20% and Low 20%: a total productivity (-26.1%, -34.0%, -6.3%, +0.0%); b annual productivity (+1.1%, -5.0%, +6.3%, +21.9%); c career length (-20.4%, -23.9%, -15.1%, -3.0%).]
Figure S6:
Figure S6:
The Gender Gaps in Microsoft Academic Graph. The gender gaps in a, total productivity, b, annual productivity, and c, career length. All three gaps mirror the results for the WoS reported in the main text.
[Figure S7 panels: a dropout rate and b cumulative survival rate, each plotted against academic age (0 to 40) for male and female scientists.]
Figure S7:
Dropout and survival rates in Microsoft Academic Graph. a, the dropout rate of male and female scientists at each academic age. b, the cumulative survival rate of male and female scientists at each academic age.
[Figure S8 panels: a productivity (categories Mean, Top 20, Middle 20, Bottom; annotated values -19.9, -27.0, -8.6, 0.0); b annual productivity; c career length (annotated values -20.9, -22.8, -27.1, -0.8); d mean total productivity; e dropout rate; f categories Mean, Top 20, Middle 20, Bottom.]
Figure S8:
The Gender Gaps in DBLP. a, the productivity puzzle as demonstrated by the difference in total productivity of an author during his/her career. b, the annual productivity is nearly identical for male and female authors. c, the difference in career length for male and female authors. d, the gender gap in productivity has been growing over the last 40 years. e, female authors have a higher dropout rate than male authors at all stages of their careers. f, a matching experiment eliminates the productivity gap. All conclusions qualitatively mirror the results for the WoS reported in the main text.
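The dropout and cumulative survival rates shown in Figures S7 and S8 e can be related to each other with a short sketch. The definitions used here are assumptions, since the captions do not spell them out: the dropout rate at a given academic age is taken as the share of scientists still active at that age whose career ends at that age, and the cumulative survival rate as the running product of one minus the dropout rate. The career lengths are hypothetical.

```python
def dropout_and_survival(career_lengths, max_age):
    """Per-age dropout rate and cumulative survival rate from career lengths.

    career_lengths[i] is the academic age (in years) at which scientist i
    published for the last time (assumed definition of career end).
    """
    dropout, survival = [], []
    surviving = 1.0
    for age in range(1, max_age + 1):
        at_risk = sum(1 for c in career_lengths if c >= age)  # still active
        leaving = sum(1 for c in career_lengths if c == age)  # career ends now
        rate = leaving / at_risk if at_risk else 0.0
        surviving *= 1.0 - rate
        dropout.append(rate)
        survival.append(surviving)
    return dropout, survival


# Hypothetical career lengths in years, for illustration only.
career_lengths = [1, 1, 2, 3, 3, 5]
dropout, survival = dropout_and_survival(career_lengths, max_age=5)
for age, (d, s) in enumerate(zip(dropout, survival), start=1):
    print(f"age {age}: dropout {d:.2f}, survival {s:.2f}")
```

Under these definitions the survival rate at a given age equals the fraction of scientists whose career lasts longer than that age, so computing the two curves separately for each gender would give the kind of comparison the captions describe.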