Featured Research

Digital Libraries

A Glimpse of the First Eight Months of the COVID-19 Literature on Microsoft Academic Graph: Themes, Citation Contexts, and Uncertainties

As scientists worldwide search for answers to the overwhelming unknowns behind the deadly pandemic, the literature on COVID-19 has been growing exponentially. Keeping abreast of a body of literature advancing at such a rapid pace poses significant challenges not only to active researchers but also to society as a whole. Although numerous data resources have been made openly available, the analytic and synthetic process that is essential for navigating the vast amount of information, with its heightened levels of uncertainty, remains a significant bottleneck. We introduce a generic method that facilitates data collection and sense-making at multiple levels of granularity when dealing with a rapidly growing research domain such as COVID-19. The method integrates the analysis of structural and temporal patterns in scholarly publications with the delineation of thematic concentrations and of the types of uncertainty that may offer additional insights into the complexity of the unknown. We demonstrate the method in a study of the COVID-19 literature.

Digital Libraries

A Novel Approach to Predicting Exceptional Growth in Research

The prediction of exceptional or surprising growth in research is an issue with deep roots and few practical solutions. In this study we develop and validate a novel approach to forecasting growth in highly specific research communities. Each research community is represented by a cluster of papers. Multiple indicators were tested, and a composite indicator was created that predicts which research communities will experience exceptional growth over the next three years. The accuracy of this predictor was tested using hundreds of thousands of community-level forecasts and was found to exceed the performance benchmarks established in Intelligence Advanced Research Projects Activity's (IARPA) Foresight Using Scientific Exposition (FUSE) program in six of nine major fields in science. Furthermore, ten of eleven disciplines within the Computing Technologies field met the benchmarks. Specific detailed forecast examples are given and evaluated, and a critical evaluation of the forecasting approach is also provided.

Digital Libraries

A Quantitative History of A.I. Research in the United States and China

Motivated by recent interest in the status and consequences of competition between the U.S. and China in A.I. research, we analyze 60 years of abstract data scraped from Scopus to explore and quantify trends in publications on A.I. topics from institutions affiliated with each country. We find the total volume of publications produced in both countries grows with a remarkable regularity over tens of years. While China initially experienced faster growth in publication volume than the U.S., growth slowed in China when it reached parity with the U.S. and the growth rates of both countries are now similar. We also see both countries undergo a seismic shift in topic choice around 1990, and connect this to an explosion of interest in neural network methods. Finally, we see evidence that between 2000 and 2010, China's topic choice tended to lag that of the U.S. but that in recent decades the topic portfolios have come into closer alignment.

Digital Libraries

A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility

Data makes science possible. Sharing data improves visibility and makes the research process transparent. This increases trust in the work and allows for independent reproduction of results. However, a large proportion of data from published research is often available only to the original authors. Despite the obvious benefits of sharing data, and scientists' advocacy for it, most advice on sharing data discusses its broader benefits rather than the practical considerations involved. This paper provides practical, actionable advice on how to actually share data alongside research. The key message is that sharing data falls on a continuum, and entering that continuum should involve minimal barriers.

Digital Libraries

A Review of Geospatial Content in IEEE Visualization Publications

Geospatial analysis is crucial for addressing many of the world's most pressing challenges. Given this, there is immense value in improving and expanding the visualization techniques used to communicate geospatial data. In this work, we explore the intersection of geospatial analytics and visualization by examining a selection of recent IEEE VIS Conference papers (2017 to 2019) to assess the inclusion of geospatial data and geospatial analyses within them. After removing the papers with no geospatial data, we organize the remaining literature into geospatial data domain categories and provide insight into how these categories relate to VIS Conference paper types. We also contextualize our results by investigating the use of geospatial terms in IEEE Visualization publications over the last 30 years. Our work provides an understanding of the quantity and role of geospatial subject matter in recent IEEE VIS publications and lays a foundation for future meta-analytical work on geospatial analytics and geovisualization that may reveal opportunities for innovation.

Digital Libraries

A SIR epidemic model for citation dynamics

The study of citations in the scientific literature crosses the boundaries between the traditional branches of science and stands on its own as a highly fruitful research field dubbed the "science of science". Although understanding the citation histories of individual papers involves many intangible factors, the basic assumption that citations beget citations can explain most features of empirical citation patterns. Here we use the SIR epidemic model as a mechanistic model for the citation dynamics of well-cited papers published in selected journals of the American Physical Society. The estimated epidemiological parameters offer insight into otherwise unknown quantities, such as the size of the community that could cite a paper and the paper's ultimate impact on that community. We find good, though imperfect, agreement between the ranking of the journals obtained from the epidemiological parameters and their impact factor ranking.
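
To make the epidemic analogy concrete, here is a minimal sketch of the standard SIR equations applied to citations. It assumes a hypothetical mapping in which a community of N potential citers plays the susceptible population and cumulative citations are counted as N - S(t); the parameter values are illustrative, not estimates from the paper, and the names SIRParams, simulate_sir, and final_size are hypothetical. The final-size relation r = 1 - exp(-R0 * r), with R0 = beta/gamma, gives the fraction of the community that eventually cites the paper, which is one way to read "ultimate impact on that community".

```python
"""Minimal SIR sketch for citation dynamics (hypothetical mapping, not the paper's fitted model)."""
import math
from dataclasses import dataclass


@dataclass
class SIRParams:
    beta: float   # "transmission" rate: how quickly citers prompt new citers (illustrative)
    gamma: float  # "recovery" rate: how quickly citers lose interest (illustrative)
    n: float      # size of the community that could cite the paper


def simulate_sir(p: SIRParams, i0: float = 1.0, t_max: float = 50.0, dt: float = 0.01):
    """Integrate the SIR ODEs with forward-Euler steps.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I.
    Returns (times, cumulative_citations), assuming cumulative citations = N - S(t).
    """
    s, i, r = p.n - i0, i0, 0.0
    times, cites = [0.0], [p.n - s]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        new_inf = p.beta * s * i / p.n * dt  # new citers in this step
        new_rec = p.gamma * i * dt           # citers who stop spreading the paper
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        times.append(k * dt)
        cites.append(p.n - s)
    return times, cites


def final_size(r0: float, tol: float = 1e-12) -> float:
    """Solve r = 1 - exp(-R0 * r) by fixed-point iteration.

    Returns the fraction of the community that eventually cites (0 when R0 <= 1).
    """
    if r0 <= 1.0:
        return 0.0
    x = 0.5
    for _ in range(10_000):
        nxt = 1.0 - math.exp(-r0 * x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x


if __name__ == "__main__":
    params = SIRParams(beta=0.5, gamma=0.25, n=1000.0)  # R0 = 2, illustrative only
    _, c = simulate_sir(params, t_max=200.0)
    print(f"simulated fraction citing: {c[-1] / params.n:.3f}")
    print(f"final-size prediction:     {final_size(params.beta / params.gamma):.3f}")
```

Forward Euler keeps the sketch dependency-free; a real parameter fit would more likely use an adaptive ODE solver and a least-squares fit to observed citation counts.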

Digital Libraries

A Scalable Hybrid Research Paper Recommender System for Microsoft Academic

We present the design and methodology of the large-scale hybrid paper recommender system used by Microsoft Academic. The system provides recommendations for approximately 160 million English-language research papers and patents. Our approach handles incomplete citation information while also alleviating the cold-start problem that often affects other recommender systems. We use the Microsoft Academic Graph (MAG), titles, and available abstracts of research papers to build a recommendation list for every document, thereby combining co-citation and content-based approaches. Tuning system parameters also allows each approach to be blended and prioritized, which in turn lets us balance paper novelty against authority in the recommendation results. We evaluate the generated recommendations through a user study of 40 participants who graded over 2,400 recommendation pairs, and we discuss the quality of the results using P@10 and nDCG scores. We find a strong correlation between participant scores and the similarity rankings produced by our system, but more work is needed to improve recommender precision, particularly for content-based recommendations. The results of the user survey and the associated analysis scripts are available on GitHub, and the recommendations produced by our system are available as part of the MAG on Azure to facilitate further research and enable novel research paper recommendation applications.
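
For readers unfamiliar with the two evaluation metrics, the sketch below computes P@k and nDCG@k over a list of participant grades ordered by the system's ranking. It is a generic illustration, not the paper's analysis scripts: the integer grading scale, the relevance threshold, and the exponential gain 2^g - 1 (one common nDCG variant) are all assumptions, and the function names are hypothetical.

```python
"""Sketch of P@k and nDCG@k for graded recommendation lists (assumed grading scale)."""
import math
from typing import Sequence


def precision_at_k(grades: Sequence[int], k: int = 10, threshold: int = 1) -> float:
    """Fraction of the top-k recommendations graded at or above `threshold` (treated as relevant)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for g in grades[:k] if g >= threshold) / k


def dcg_at_k(grades: Sequence[int], k: int) -> float:
    """Discounted cumulative gain with exponential gain 2^g - 1 and log2(rank + 1) discount."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))


def ndcg_at_k(grades: Sequence[int], k: int = 10) -> float:
    """DCG of the system ranking divided by the DCG of the ideal (grade-sorted) ranking."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical grades for one source paper's top-10 recommendations, in ranked order.
    grades = [3, 2, 3, 0, 1, 2, 0, 0, 1, 0]
    print(f"P@10 = {precision_at_k(grades):.2f}, nDCG@10 = {ndcg_at_k(grades):.3f}")
```

P@10 only asks how many of the top ten are relevant, while nDCG also rewards placing the most highly graded papers first, which is why the two scores can diverge.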

Digital Libraries

A Semantically Enriched Dataset based on Biomedical NER for the COVID19 Open Research Dataset Challenge

Research into COVID-19 is a major challenge and highly relevant at the moment. New tools are required to provide medical experts with relevant and valuable information for their research. The COVID-19 Open Research Dataset Challenge (CORD-19) is a "call to action" for computer scientists to develop such innovative tools. Many of these applications rely on entity information, i.e., knowing which entities are mentioned within a sentence. For this paper, we developed a pipeline built on the latest Named Entity Recognition tools for chemicals, diseases, genes, and species. We apply our pipeline to the COVID-19 research challenge and share the resulting entity mentions with the community.

Digital Libraries

A System Dynamics Analysis of National R&D Performance Measurement System in Korea

Peer review is a useful and powerful performance measurement process. In Korea, the quality of R&D performance needs to increase, but bibliometric evaluation and a shortage of peer reviewers work against this. We used system dynamics to describe the Korean R&D performance measurement system and ways to increase performance quality. To reach a desired level of R&D performance quality, the fairness and quality of evaluation must increase. The size of the peer pool has decreased because of both the specialization of R&D projects and the Sangpi process, which makes securing both fairness and quality critical. In addition, shortening the evaluation period affects R&D performance quality by increasing workloads, limiting long-term and innovative R&D projects, and decreasing evaluation quality. Previous evaluation policies have acted as micro-controls on R&D activities, whereas increasing the size of the peer pool and changing the evaluation period would improve the quality and fairness of evaluation.

Digital Libraries

A Tale of Two Referees

Success in academia hinges on publishing in top-tier journals. This requires innovative results. And this requires a clear and convincing presentation of said results. Presentation alone can make a difference of one tier in journal level. A lot of useful advice on this topic is available online from well-respected outlets; see, for example, El-Omar (2014); Gould (2014); Neiles et al. (2015); Notz and Kafadar (2011); or Sachdeva (2020). This text provides a different angle.
