Featured Research

Digital Libraries

Gender Gap in Natural Language Processing Research: Disparities in Authorship and Citations

Disparities in authorship and citations across gender can have substantial adverse consequences not just on the disadvantaged genders, but also on the field of study as a whole. Measuring gender gaps is a crucial step towards addressing them. In this work, we examine female first author percentages and the citations to their papers in Natural Language Processing (1965 to 2019). We determine aggregate-level statistics using an existing manually curated author--gender list as well as first names strongly associated with a gender. We find that only about 29% of first authors are female and only about 25% of last authors are female. Notably, this percentage has not improved since the mid 2000s. We also show that, on average, female first authors are cited less than male first authors, even when controlling for experience and area of research. Finally, we discuss the ethical considerations involved in automatic demographic analysis.


Gender Inequality in Research Productivity During the COVID-19 Pandemic

We study the disproportionate impact of the lockdown as a result of the COVID-19 outbreak on female and male academics' research productivity in social science. The lockdown has caused substantial disruptions to academic activities, requiring people to work from home. How this disruption affects productivity and the related gender equity is an important operations and societal question. We collect data from the largest open-access preprint repository for social science on 41,858 research preprints in 18 disciplines produced by 76,832 authors across 25 countries over a span of two years. We use a difference-in-differences approach leveraging the exogenous pandemic shock. Our results indicate that, in the 10 weeks after the lockdown in the United States, although the total research productivity increased by 35%, female academics' productivity dropped by 13.9% relative to that of male academics. We also show that several disciplines drive such gender inequality. Finally, we find that this intensified productivity gap is more pronounced for academics in top-ranked universities, and the effect exists in six other countries. Our work points out the fairness issue in productivity caused by the lockdown, a finding that universities will find helpful when evaluating faculty productivity. It also helps organizations realize the potential unintended consequences that can arise from telecommuting.
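The difference-in-differences idea mentioned in this abstract can be illustrated with a minimal sketch in its simplest two-group, two-period form. The study itself most likely estimates a regression with additional controls; the function name `did_estimate` and all numbers below are hypothetical and chosen only for illustration.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Return the change in the treated group minus the change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly preprint counts (illustrative only, not data from the study).
female_pre = [10.0, 12.0, 11.0]
female_post = [11.0, 13.0, 12.0]
male_pre = [20.0, 22.0, 21.0]
male_post = [24.0, 26.0, 25.0]

# Female authors are the group of interest; male authors serve as the comparison group.
effect = did_estimate(female_pre, female_post, male_pre, male_post)
print(effect)
```

A negative value means the group of interest gained less (or lost more) than the comparison group over the same period, which is the kind of relative gap the abstract reports.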


Gender disparity in the authorship of biomedical research publications during the COVID-19 pandemic

Preliminary evidence suggests that women, including female researchers, are disproportionately affected by the COVID-19 pandemic in terms of unequal distribution of childcare, elderly care and other kinds of domestic and emotional labor. Sudden lockdowns and abrupt shifts in daily routines have disproportionate consequences on their productivity, which is reflected by a sudden drop in research output in biomedical research, consequently affecting the number of female authors of scientific publications. We investigate the proportion of male and female researchers who published scientific papers during the COVID-19 pandemic, using bibliometric data from biomedical preprint servers and selected Springer-Nature journals. Our findings document a decrease in the number of publications by female authors in the biomedical field during the global pandemic. This effect is particularly pronounced for papers related to COVID-19, indicating that women are producing fewer publications related to COVID-19 research. This sudden increase in the gender gap is persistent across the ten countries with the highest number of researchers. These results should be used to inform the scientific community of the worrying trend in COVID-19 research and the disproportionate effect that the pandemic has on female academics.


Gender trends in computer science authorship

A large-scale, up-to-date analysis of Computer Science literature (11.8M papers through 2019) reveals that, if trends from the last 50 years continue, parity between the number of male and female authors will not be reached in this century. In contrast, parity is projected to be reached within two to three decades or may have already been reached in other fields of study like Medicine or Sociology. Our analysis of collaboration trends in Computer Science reveals shifts in the size of the collaboration gap between authors of different perceived genders. The gap is persistent but shrinking, corresponding to a slow increase in the rate of cross-gender collaborations over time. Together, these trends describe a persistent gender gap in the authorship of Computer Science literature that may not close without systematic intervention.


Gender-Based Homophily in Research: A Large-Scale Study of Man-Woman Collaboration

We examined the male-female collaboration practices of all internationally visible Polish university professors (N = 25,463) based on their Scopus-indexed publications from 2009-2018 (158,743 journal articles). We merged a national registry of 99,935 scientists (with full administrative and biographical data) with the Scopus publication database, using probabilistic and deterministic record linkage. Our unique biographical, administrative, publication, and citation database (The Polish Science Observatory) included all professors with at least a doctoral degree employed in 85 research-involved universities. We determined what we term an individual publication portfolio for every professor, and we examined the respective impacts of biological age, academic position, academic discipline, average journal prestige, and type of institution on the same-sex collaboration ratio. The gender homophily principle (publishing predominantly with scientists of the same sex) was found to apply to male scientists - but not to females. The majority of male scientists collaborate solely with males; most female scientists, in contrast, do not collaborate with females at all. Across all age groups studied, all-female collaboration is marginal, while all-male collaboration is pervasive. Gender homophily in research-intensive institutions proved stronger for males than for females. Finally, we used a multi-dimensional fractional logit regression model to estimate the impact of gender and other individual-level and institutional-level independent variables on gender homophily in research collaboration.


Gendered impact of COVID-19 pandemic on research production: a cross-country analysis

The massive shock of the COVID-19 pandemic is already showing negative effects on economies around the world that are unprecedented in recent history. COVID-19 infections and containment measures have caused a general slowdown in research and new knowledge production. Because of the link between R&D spending and economic growth, a slowdown in research activities can in turn be expected to slow the global recovery from the pandemic. Many recent studies also claim an uneven impact on scientific production across gender. In this paper, we investigate the phenomenon across countries, analysing preprint depositions. Unlike other works, which compare the number of preprint depositions before and after the pandemic outbreak, we analyse deposition trends across geographical areas and contrast post-outbreak depositions with expected ones. Contrary to common belief and initial evidence, in a few countries female scientists increased their scientific output while that of males plunged.


Generate FAIR Literature Surveys with Scholarly Knowledge Graphs

Reviewing scientific literature is a cumbersome, time consuming but crucial activity in research. Leveraging a scholarly knowledge graph, we present a methodology and a system for comparing scholarly literature, in particular research contributions describing the addressed problem, utilized materials, employed methods and yielded results. The system can be used by researchers to quickly get familiar with existing work in a specific research domain (e.g., a concrete research question or hypothesis). Additionally, it can be used to publish literature surveys following the FAIR Data Principles. The methodology to create a research contribution comparison consists of multiple tasks, specifically: (a) finding similar contributions, (b) aligning contribution descriptions, (c) visualizing and finally (d) publishing the comparison. The methodology is implemented within the Open Research Knowledge Graph (ORKG), a scholarly infrastructure that enables researchers to collaboratively describe, find and compare research contributions. We evaluate the implementation using data extracted from published review articles. The evaluation also addresses the FAIRness of comparisons published with the ORKG.


Generating automatically labeled data for author name disambiguation: An iterative clustering method

To train algorithms for supervised author name disambiguation, many studies have relied on hand-labeled truth data that are very laborious to generate. This paper shows that labeled training data can be automatically generated using information features such as email address, coauthor names, and cited references that are available from publication records. For this purpose, high-precision rules for matching name instances on each feature are decided using an external-authority database. Then, selected name instances in target ambiguous data go through the process of pairwise matching based on the rules. Next, they are merged into clusters by a generic entity resolution algorithm. The clustering procedure is repeated over other features until further merging is impossible. Tested on 26,566 instances out of the population of 228K author name instances, this iterative clustering produced accurately labeled data with pairwise F1 = 0.99. The labeled data represented the population data in terms of name ethnicity and co-disambiguating name group size distributions. In addition, trained on the labeled data, machine learning algorithms disambiguated 24K names in test data with performance of pairwise F1 = 0.90 ~ 0.92. Several challenges are discussed for applying this method to resolving author name ambiguity in large-scale scholarly data.
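The feature-by-feature merging and the pairwise F1 metric described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: exact sharing of a feature value stands in for the high-precision matching rules (which the paper derives from an external-authority database), a union-find structure stands in for the generic entity resolution algorithm, and the record layout and function names are hypothetical.

```python
from itertools import combinations


def cluster_by_features(records, features):
    """Merge records that share a value on any feature, one feature at a time."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for feature in features:
        seen = {}  # feature value -> index of the first record carrying it
        for idx, rec in enumerate(records):
            for value in rec.get(feature, ()):
                if value in seen:
                    parent[find(idx)] = find(seen[value])
                else:
                    seen[value] = idx
    return [find(i) for i in range(len(records))]


def pairwise_f1(predicted, truth):
    """F1 over record pairs: a pair is positive when both records share a label."""
    pairs = list(combinations(range(len(truth)), 2))
    pred_pairs = {p for p in pairs if predicted[p[0]] == predicted[p[1]]}
    true_pairs = {p for p in pairs if truth[p[0]] == truth[p[1]]}
    tp = len(pred_pairs & true_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical name instances (illustrative only).
records = [
    {"email": {"a@x.org"}, "coauthors": {"Lee"}},
    {"email": set(), "coauthors": {"Lee", "Kim"}},
    {"email": {"a@x.org"}, "coauthors": set()},
    {"email": {"b@y.org"}, "coauthors": {"Park"}},
]
labels = cluster_by_features(records, ["email", "coauthors"])
truth = ["A", "A", "A", "B"]
score = pairwise_f1(labels, truth)
```

Because union-find merges are transitive, one pass per feature is enough to reach the point where no further merging is possible under exact-match rules; rules that depend on the current cluster contents would need the repeated passes the abstract describes.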


Geographical Distribution of Biomedical Research in the USA and China

We analyze nearly 20 million geocoded PubMed articles with author affiliations. Using K-means clustering for the lower 48 US states and mainland China, we find that the average published paper is within a relatively short distance of a few centroids. These centroids have shifted very little over the past 30 years, and the distribution of distances to these centroids has not changed much either. The overall country centroids have gradually shifted south (about 0.2° for the USA and 1.7° for China), while the longitude has not moved significantly. These findings indicate that there are few large scientific hubs in the USA and China and the typical investigator is within geographical reach of one such hub. This sets the stage to study centralization of biomedical research at national and regional levels across the globe, and over time.
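As a rough illustration of the K-means step mentioned in this abstract, the sketch below runs plain Lloyd's algorithm on (latitude, longitude) pairs. It assumes the coordinates can be treated as planar and that initial centroids are supplied by the caller; the abstract does not state the distance measure or initialization used, and the coordinates below are hypothetical.

```python
def kmeans(points, initial_centroids, iterations=20):
    """Lloyd's algorithm on (lat, lon) pairs, treated as planar coordinates."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for old, group in zip(centroids, groups):
            if group:
                lat = sum(p[0] for p in group) / len(group)
                lon = sum(p[1] for p in group) / len(group)
                updated.append((lat, lon))
            else:
                updated.append(old)  # keep a centroid that attracted no points
        if updated == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = updated
    return centroids


# Hypothetical affiliation coordinates (illustrative only).
points = [(40.0, -74.0), (42.0, -72.0), (34.0, -118.0), (38.0, -122.0)]
centers = kmeans(points, [(40.0, -74.0), (34.0, -118.0)])
```

For continental-scale data, a great-circle distance would be a more faithful choice than the squared planar distance used here.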


Getting Insights from a Large Corpus of Scientific Papers on Specialisted Comprehensive Topics -- the Case of COVID-19

COVID-19 is one of the most important topics these days, particularly on search engines and in the news. While fake news is easily shared, scientific papers are reliable sources from which information can be extracted. With about 24,000 scientific publications on COVID-19 and related research on PUBMED, automatic computer-assisted analysis is required. In this paper, we develop two methodologies to gain insights into specific sub-topics of interest and the latest research sub-topics. They rely on natural language processing and graph-based visualizations. We run these methodologies on two cases: the origin of the virus and the uses of existing drugs.
