Female scholars need to achieve more for equal public recognition
Menno H. Schellekens a, Floris Holstege b, and Taha Yasseri a,c,1
a Oxford Internet Institute, University of Oxford; b Leiden University College; c Alan Turing Institute for Data Science and Artificial Intelligence
This manuscript was compiled on April 17, 2019
Different kinds of "gender gap" have been reported in different walks of scientific life, almost always favouring male scientists over females. In this work, for the first time, we present a large-scale empirical analysis that asks whether female scientists with the same level of scientific accomplishment are as likely as males to be recognised. We focus on Wikipedia, the open online encyclopedia whose open nature allows us to use it as a proxy for community recognition. We calculate the probability of appearing on Wikipedia as a scientist for both male and female scholars in three different fields. We find that women in Physics, Economics and Philosophy are considerably less likely than men to be recognised on Wikipedia across all levels of achievement.
Gender gap | Wikipedia | Scientometrics
Female scholars face many more barriers in their professional path than their male colleagues (1, 2). They have been found to be discriminated against in the workplace (3), in grant applications (4, 5), and as students (6–8), and they experience sexism on a daily basis (9). Apart from systematic biases and barriers against female scholars, which might reflect a wider societal issue, the community of academics itself might suffer from prejudice and negative perceptions of female scholars. The question that we ask in this work is whether female scholars are less likely to be recognised by their communities for their accomplishments.

Wikipedia, the largest crowd-based knowledge repository, has been studied from different angles. There are arguments about its accuracy (10), coverage (11), and neutrality (12), in both directions. Previous work has reported that the level of attention given to scholars, measured by the level of activity on and the traffic to the Wikipedia articles dedicated to them, is not in proportion to their scholarly achievements as evaluated by scientometric measures (13). Wikipedia traffic has been used as a proxy for collective attention (14) and collective memory (15). In this work, however, we use Wikipedia as a proxy for community recognition of academics and simply measure the difference in the chances of being featured on Wikipedia for males and females who have similar scientific achievements. We build on previous work that generally reported that Wikipedia suffers from a lack of entries about female scientists (16, 17) and lower quality and information reach of articles about women in general (18–20). In addition to the existing case reports, we take a systematic approach here by analysing data on 15,049 scholars from three different disciplines.
See Table 1 for details.

Table 1. Overview of the scholars in the dataset

Field         Gender    Count     % on Wikipedia
Physics       female      642      8.26
              male       5448     14.12
Economics     female     1586      8.32
              male       5477     17.89
Philosophy    female      467     15.42
              male       1429     28.06
Total         female     2695      9.54
              male      12354     17.4
Grand Total             15049     15.99

The fundamental question that we address here is whether there are fewer Wikipedia entries about female scientists (18) because few women enter the sciences, because they are less likely to contribute groundbreaking research, or because they face additional hurdles in attaining public recognition for their work at the same level of achievement. To answer this question, we analyse a dataset collected from Google Scholar and check it against Wikipedia entries. This dataset allows us to test whether gender influences the chance of having a dedicated Wikipedia entry, controlling for scientific achievement. We find strong evidence of discrimination in public recognition of scientific achievement, gauged by inclusion in Wikipedia, at any level of success.

While barriers for women in science at different stages of their careers have been reported and discussed intensively, there is little empirical work on the recognition of scientific achievement by the general public. This paper contributes the first large-scale empirical analysis of gender bias in recognition of scientific accomplishments.
Results
We employ logistic regression to test the relationship between gender and Wikipedia recognition, controlling for h-index. For details of data collection and gender detection see Materials and Methods. We start with a simple model that has the following structure:

$$p(W) = B_1 g + B_2 h, \tag{1}$$

where $W$ denotes the existence of a Wikipedia page, $g$ denotes the gender of the scientist, $h$ their h-index, and $B_1$ and $B_2$ are the model constants. This model assumes that being male or female changes the chance of recognition irrespective of academic achievement. In the next step, considering that scientific accomplishments by females might be viewed differently, we add a new term to the model with an interaction between h-index and gender:

$$p(W) = B_1 g + B_2 h + B_3 g h. \tag{2}$$

The results from the fit of the model to the data, presented in Table 2, point towards structural discrimination in the recognition of scientific achievement. Regardless of field of study, being male significantly increases the chance of being recognised and featured on Wikipedia.

1 To whom correspondence should be addressed. E-mail: [email protected]
arXiv [cs.DL], Apr

Fig. 1. Probability of male (purple) and female (green) scholars getting a Wikipedia page at different levels of scientific standing for Physics, Economics, and Philosophy, from left to right. Error bars are too small to be visible.

Table 2
Dependent variable: Wikipedia page exists

                        (1)         (2)         (3)
Male                    0.480∗∗∗    ∗∗∗         ∗∗∗
                        (0.071)     (0.073)     (0.313)
Logged h-index          0.581∗∗∗    ∗∗∗         ∗∗∗
                        (0.032)     (0.037)     (0.098)
Field: Economics                    0.807∗∗∗    ∗∗∗
                                    (0.055)     (0.055)
Field: Philosophy                   2.053∗∗∗    ∗∗∗
                                    (0.081)     (0.081)
Male:Logged h-index                             −∗∗∗
                                                (0.102)
Constant                −∗∗∗        −∗∗∗        −∗∗∗
                        (0.112)     (0.147)     (0.310)
Observations            15,049      15,049      15,049
Log Likelihood          −           −           −
Note: ∗ p < ∗∗ p < ∗∗∗ p <

The negative interaction effect between gender and h-index suggests that Wikipedia's bias towards men is strongest amongst scientists with relatively low h-indices. Gender plays a smaller role in the recognition of academics with exceptional academic standing.

Logistic regression produces log-odds as coefficients; using those, we have plotted the probability that an economist of either gender is recognised with a Wikipedia page at different h-index levels (Figure 1). A female economist with an average h-index has a probability of 0.11 of being recognised by Wikipedia, while an average male economist has a probability of 0.18. A male economist has to achieve an h-index of 11 for a similar probability of public recognition as a female economist with an h-index of 19. Similar patterns are observed for Physics and Philosophy. Women are 19%, 37% and 50% less likely to receive recognition than male peers when both have an average h-index in Physics, Economics and Philosophy respectively. We calculate these percentages from the ratio of the predicted probability of a woman with an average h-index having a Wikipedia page to the predicted probability for a man with the same h-index in the same field of research; the reported percentage is one minus this ratio.

To check the robustness, we provide a number of variations of this model to see if the effects hold.
To control for cross-discipline differences, we run separate models per field (see Table S1). To further check the robustness of the results, we use alternative measures of scientific achievement, namely the raw number of citations and the h5-index, to test whether that changes the outcomes (see Table S2). The male advantage holds when controlling for field (see Table 2), when the model is run separately for every field (see Table S1) and when alternative measures are used (see Table S2). It is statistically significant at p < .01 in all analyses.
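The step from log-odds coefficients to the probabilities and percentages quoted above can be sketched as follows. This is a minimal illustration, not the authors' code: the coefficient values are hypothetical placeholders (they are not the fitted values of Table 2), the function names are ours, and we assume a natural logarithm for the logged h-index and a linear predictor of the form constant + male term + log(h) term + interaction term on the log-odds scale.

```python
import math


def sigmoid(log_odds: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def page_probability(coef: dict, male: int, h_index: float) -> float:
    """Predicted probability of a Wikipedia page for an Eq. 2 style model.

    Assumed linear predictor (log-odds scale):
    constant + male*g + log_h*log(h) + interaction*g*log(h), with g = 1 for men.
    """
    log_h = math.log(h_index)
    log_odds = (coef["constant"] + coef["male"] * male
                + coef["log_h"] * log_h
                + coef["interaction"] * male * log_h)
    return sigmoid(log_odds)


def relative_shortfall(p_female: float, p_male: float) -> float:
    """Fraction by which women are less likely to be recognised: 1 - p_f / p_m."""
    return 1.0 - p_female / p_male


def equivalent_male_h_index(coef: dict, female_h: float) -> float:
    """h-index at which a man's predicted probability equals that of a woman with female_h.

    Obtained by equating the two linear predictors and solving for log(h_male).
    """
    slope_male = coef["log_h"] + coef["interaction"]
    log_h_male = (coef["log_h"] * math.log(female_h) - coef["male"]) / slope_male
    return math.exp(log_h_male)


# HYPOTHETICAL coefficients for illustration only; not the values fitted in the paper.
COEF = {"constant": -4.0, "male": 1.0, "log_h": 0.7, "interaction": -0.15}

p_f = page_probability(COEF, male=0, h_index=19)
p_m = page_probability(COEF, male=1, h_index=19)
print(f"female: {p_f:.3f}, male: {p_m:.3f}, "
      f"shortfall: {relative_shortfall(p_f, p_m):.1%}")
print(f"male h-index matching a female h-index of 19: "
      f"{equivalent_male_h_index(COEF, 19):.1f}")
```

With a negative interaction coefficient, the male and female curves converge as the h-index grows, which is the pattern described for scholars of exceptional standing.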
Discussions
We report evidence of a bias against recognising the scientific accomplishments of women on Wikipedia. Men are more likely to be awarded a page in the world's most influential encyclopedia than women with similar scientometric records. This finding is replicated in Physics (natural sciences), Economics (social sciences), and Philosophy (humanities). The magnitude of the male advantage is remarkably similar across the disparate fields.

It is beyond the scope of this paper to establish the causal mechanism behind the gender gap in recognition. Is research from women taken less seriously? Are males more easily given access to public fora to discuss their findings? One should also note that the biases reported in this work come on top of the reported biases in research funding allocations (21), publishing practices and hiring exercises (22, 23).

We must also note that a portion of the reported bias might be due to the known gender gap among Wikipedia editors: there are few female editors amongst the ranks of Wikipedia editors (17, 24). The Wikimedia Foundation might want to consider policy changes to give women equal recognition for equal work as a starting point to battle this societal malfunction in a wider scope.
Materials and Methods
The analysis is conducted with three measures: scientific accomplishment (retrieved from Google Scholar), gender (retrieved from genderize.io) and recognition from Wikipedia (retrieved from the Wikipedia API). We cover each measure in the following sections. The summary statistics are available in Table 1 and in Table S3.
A. Scientific Accomplishment.
While scientists receive many forms of recognition, the most common measure is the citation. Citation metrics have increased in importance in the scientific realm. The h-index is widely preferred over raw citation counts, because it accounts for both the number of publications and citations (25). Many universities set minimum h-index values for new hires, and some universities base promotions on h-index thresholds (26).

The source of our dataset is Google Scholar. We queried a particular field and collected names in the order Google Scholar presents them, which is ordered by citation count. For every scientist, we retrieved citation counts, their name and their institution. Google Scholar has been found to have the largest coverage compared to other databases, with up to 33% more authors than its direct competitors and more diverse publications, such as conference papers and books (27–29). Thus, we are satisfied that Google Scholar gives an accurate and comprehensive overview of active scholars and their citations.

Collecting data from Google Scholar is laborious, so we sampled scholars from three fields in different parts of the academic world: Physics (natural sciences), Economics (social sciences) and Philosophy (humanities) (30). As reported in Table 1, the number of scholars in our sample, the gender balance and the proportion of scholars with Wikipedia pages differ per field.

Our sample of scholars is non-random, because scientists were ordered by h-index. However, we collected the top 10,000 available scholars from a field. The 'bottom' of our sample contains scholars with h-indices as low as 1, so we cover a wide range of achievement. If we missed scholars, they must have very few citations and publications. This does not compromise our analysis, because these scholars are not likely to receive recognition from Wikipedia and are thus not relevant. None of the three citation measures is normally distributed (see Figures S1-S3), so all were transformed for the regression analysis.
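To make concrete why the h-index reflects both publication count and citations, the standard definition (the largest h such that h papers have at least h citations each) can be sketched as below. The function and the example citation lists are ours, for illustration only; in the study the h-index values were taken from Google Scholar rather than computed.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    # Rank papers from most to least cited; the h-index is the last rank
    # at which the paper's citation count still reaches its rank.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Two made-up records with the same total of 100 citations:
one_hit = [100, 0, 0, 0]      # a single highly cited paper
steady = [25, 25, 25, 25]     # four moderately cited papers
print(h_index(one_hit), h_index(steady))
```

The two records have identical raw citation counts but different h-indices, which is the distinction the measure is designed to capture.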
B. Gender.
Google Scholar does not list the gender of a scientist. Therefore, we must detect the gender of a scholar based on their first name. This technique is widely used and accurate (2, 31). We use the genderize.io API, which makes use of a database of 216,286 names from 79 countries and 89 languages to make predictions. Conveniently, genderize.io reports the number of times a given name appears in its database and the proportion of the two sexes. We applied strict filters: only predictions with a confidence greater than 90% based on a minimum sample size of 10 were accepted. This measure cut our sample down to 15,049 from the 23,000 scholars collected via Google Scholar.

Genderize assumes that persons who are women identify as female. However, people of either sex can identify with many genders. The analysis would be superior if we could use the identified gender of every scientist, but this possibility is not available. Given that it is common for women to identify as female and men as male, we use the Genderize categorization as the closest available proxy.
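The filtering rule described above can be sketched as a small function applied to each name prediction. This is an illustration under assumptions, not the authors' pipeline: we assume each prediction is a dictionary with the fields `gender`, `probability` and `count`, and the helper `first_name` (first whitespace-separated token) is a hypothetical simplification of name parsing. No network call is made here.

```python
from typing import Optional

MIN_PROBABILITY = 0.90  # "confidence greater than 90%"
MIN_COUNT = 10          # "minimum sample size of 10"


def first_name(full_name: str) -> str:
    """Hypothetical helper: treat the first whitespace-separated token as the first name."""
    parts = full_name.split()
    return parts[0] if parts else ""


def accepted_gender(prediction: dict) -> Optional[str]:
    """Return 'male' or 'female' if the prediction passes both filters, else None."""
    gender = prediction.get("gender")
    if gender not in ("male", "female"):
        return None  # unknown name or no prediction
    probability = float(prediction.get("probability") or 0.0)
    count = int(prediction.get("count") or 0)
    if probability <= MIN_PROBABILITY or count < MIN_COUNT:
        return None  # too uncertain, or based on too few observations
    return gender


# Made-up predictions, for illustration only.
examples = [
    {"name": "maria", "gender": "female", "probability": 0.98, "count": 5000},
    {"name": "robin", "gender": "male", "probability": 0.62, "count": 800},
    {"name": "xyzzy", "gender": None, "probability": 0.0, "count": 0},
]
kept = [e["name"] for e in examples if accepted_gender(e) is not None]
print(kept)
```

Scholars whose names fail either filter are dropped rather than guessed, which is why the sample shrinks after this step.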
C. Recognition by Wikipedia.
We queried the Wikipedia API with the names of scholars to check for Wikipedia pages under their name. When the Wikipedia page is listed under a slightly different name or a known alias, the Wikipedia API automatically refers us to the correct page. We checked a sample of 30 codings manually and found no miscodings.
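A page-existence lookup of this kind could look roughly like the sketch below. The text does not state which endpoint, language edition or parameters were used, so the following are assumptions: the English Wikipedia Action API, `action=query` with `redirects` enabled so that aliases resolve to the target page, and `formatversion=2` so that missing titles carry a `missing` flag. The function names and the User-Agent string are placeholders.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def build_query_url(name: str) -> str:
    """Build an Action API query that follows redirects for the given title."""
    params = {
        "action": "query",
        "titles": name,
        "redirects": 1,
        "format": "json",
        "formatversion": 2,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def page_exists(response: dict) -> bool:
    """True if the parsed API response contains at least one existing page."""
    pages = response.get("query", {}).get("pages", [])
    return any(
        not page.get("missing", False) and not page.get("invalid", False)
        for page in pages
    )


def has_wikipedia_page(name: str) -> bool:
    """Query the live API (network access required) and report whether a page exists."""
    request = urllib.request.Request(
        build_query_url(name),
        headers={"User-Agent": "example-script/0.1"},  # placeholder identifier
    )
    with urllib.request.urlopen(request, timeout=10) as reply:
        return page_exists(json.load(reply))
```

A title match alone does not establish that the page is about the scholar in question (namesakes exist), which is presumably why a manual check of a sample of codings was carried out.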
ACKNOWLEDGMENTS.
We thank Jop Flameling for discussion on the research design and data collection. TY was partially supported by the Alan Turing Institute under the EPSRC grant no. EP/N510129/1.
1. Raymond J (2013) Most of us are biased.
Nature
Nature
PLoS ONE
Journal of Informetrics
Nature
Proceedings of the National Academy of Sciences
Proceedings of the National Academy of Sciences
Frontiers in Digital Humanities
Journal of Computer-Mediated Communication
American Economic Review
EPJ data science
Royal Society Open Science
Science advances
Nature.
18. Reagle J, Rhue L (2011) Gender bias in Wikipedia and Britannica. International Journal of Communication.
Proceedings of the 13th International Symposium on Open Collaboration. (ACM), p. 19.
20. Graells-Garrido E, Lalmas M, Menczer F (2015) First women, second sex: Gender bias in Wikipedia in Proceedings of the 26th ACM Conference on Hypertext & Social Media. (ACM), pp. 165–174.
21. Head MG, Fitchett JR, Cooke MK, Wurie FB, Atun R (2013) Differences in research funding for women scientists: a systematic comparison of UK investments in global infectious disease research during 1997–2010. BMJ Open
Journal of Women’s Health
Science advances
PloS one
Trends in Ecology & Evolution
Nature
Journal of the American Society for Information Science and Technology
Journal of the American Society for Information Science and Technology
Scientometrics
Proceedings of the 25th International Conference Companion on World Wide Web. (International World Wide Web Conferences Steering Committee), pp. 53–54.
Supplementary Information

Table S1. Regression Results for (1) Physics, (2) Economics and (3) Philosophy
Dependent variable: wiki_bool

                  (1)         (2)         (3)
gendermale        0.493∗∗∗    ∗∗∗         ∗∗∗
                  (0.150)     (0.101)     (0.148)
log(h.index)      0.746∗∗∗    ∗∗∗         ∗∗∗
                  (0.064)     (0.057)     (0.076)
Constant          −∗∗∗        −∗∗∗        −∗∗∗
                  (0.262)     (0.189)     (0.215)
Observations      6,090       7,063       1,896
Log Likelihood    −           −           −
Note: ∗ p < ∗∗ p < ∗∗∗ p <

Table S2. Robustness Checks
Dependent variable: wiki_bool

                  (1)         (2)
gendermale        0.546∗∗∗    ∗∗∗
                  (0.071)     (0.071)
H5 index          0.548∗∗∗
                  (0.034)
Citation Count                0.325∗∗∗
                              (0.016)
Constant          −∗∗∗        −∗∗∗
                  (0.110)     (0.133)
Observations      15,049      15,049
Log Likelihood    −           −
Note: ∗ p < ∗∗ p < ∗∗∗ p <

Table S3. Summary Statistics for Continuous Variables

Statistic      N        Mean        St. Dev.      Min   Pctl(25)   Pctl(75)   Max
h.index        16,098   23.224      20.606        1     11         29         258
h5.index       16,098   16.943      14.965        0     8          21         191
n.citations    16,098   5,090.149   15,786.540    1     537        3,949.8    911,692

Fig. S1. Distributions of h-index by gender and field. 1 = Physics, 2 = Economics, 3 = Philosophy. (Panels: female, male; x-axis: h.index; y-axis: count.)

Fig. S2. Distributions of h5-index by gender and field. 1 = Physics, 2 = Economics, 3 = Philosophy. (Panels: female, male; x-axis: h5.index; y-axis: count.)

Fig. S3. (Panels: female, male; x-axis: n.citations; y-axis: count.)