Featured Research

Digital Libraries

Nine Best Practices for Research Software Registries and Repositories: A Concise Guide

Scientific software registries and repositories serve various roles in their respective disciplines. These resources improve software discoverability and research transparency, provide information for software citations, and foster preservation of computational methods that might otherwise be lost over time, thereby supporting research reproducibility and replicability. However, developing these resources takes effort, and few guidelines are available to help prospective creators of registries and repositories. To address this need, we present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories. These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020. We believe that putting in place specific policies such as those presented here will help scientific software registries and repositories better serve their users and their disciplines.

Digital Libraries

Nine Million Book Items and Eleven Million Citations: A Study of Book-Based Scholarly Communication Using OpenCitations

Books have been widely used to share information and contribute to human knowledge. However, the quantitative use of books as a method of scholarly communication is relatively unexamined compared to journal articles and conference papers. This study uses the COCI dataset (a comprehensive open citation dataset provided by OpenCitations) to explore books' roles in scholarly communication. The COCI data we analyzed includes 445,826,118 citations from 46,534,705 bibliographic entities. By analyzing such a large amount of data, we provide a thorough, multifaceted understanding of books. Among the investigated factors are 1) temporal changes to book citations; 2) book citation distributions; 3) years to citation peak; 4) citation half-life; and 5) characteristics of the most-cited books. Results show that books have received less than 4% of total citations, and have been cited mainly by journal articles. Moreover, 97.96% of books have been cited fewer than ten times. Books take longer than other bibliographic materials to reach peak citation levels, yet are cited for the same duration as journal articles. Most-cited books tend to cover general (yet essential) topics, theories, and technological concepts in mathematics and statistics.
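As a rough illustration of two of the factors listed above, the sketch below computes years to citation peak and a citation half-life from per-year citation counts. The definitions are assumptions made for illustration and may differ from those used in the study: here, half-life means the number of years after publication by which half of an item's total citations have accrued. The function names and the sample counts are hypothetical.

```python
def years_to_peak(pub_year, citations_by_year):
    """Years from publication to the year with the most citations.

    Ties are resolved in favor of the earliest year. Returns None when
    there are no citation counts at all.
    """
    if not citations_by_year:
        return None
    peak_year = max(sorted(citations_by_year), key=lambda y: citations_by_year[y])
    return peak_year - pub_year


def citation_half_life(pub_year, citations_by_year):
    """Years from publication until half of all citations have accrued.

    This cumulative definition is an assumption for illustration only.
    Returns None for items that were never cited.
    """
    total = sum(citations_by_year.values())
    if total == 0:
        return None
    running = 0
    for year in sorted(citations_by_year):
        running += citations_by_year[year]
        if 2 * running >= total:
            return year - pub_year


# Hypothetical citation counts for a book published in 2000.
sample_counts = {2001: 2, 2002: 6, 2003: 4, 2004: 4, 2005: 4}
peak = years_to_peak(2000, sample_counts)
half_life = citation_half_life(2000, sample_counts)
```

In this toy case the citation peak falls two years after publication, while half of the citations have accrued only after three years.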

Digital Libraries

Non-English language publications in Citation Indexes -- quantity and quality

We analyzed publication data in WoS and Scopus to compare publications in native languages with publications in English and to identify distinctive patterns. We analyzed their distribution by research area, language, type of access, and citation patterns. The following trends were found: the share of English publications increases over time; native-language publications are read and cited less than English-language ones outside the country of origin; the impact of open access on views and citations is higher for native languages; and journal ranking correlates with the share of English publications for multi-language journals. We also conclude that the role of non-English publications in research evaluation in non-English-speaking countries is underestimated when research in the social sciences and humanities is assessed only by publications in Web of Science and Scopus.

Digital Libraries

ORCID-linked labeled data for evaluating author name disambiguation at scale

How can we evaluate the performance of a disambiguation method implemented on big bibliographic data? This study suggests that the open researcher profile system, ORCID, can be used as an authority source to label name instances at scale. This study demonstrates the potential by evaluating the disambiguation performance of Author-ity2009 (which algorithmically disambiguates author names in MEDLINE) using 3 million name instances that are automatically labeled through linkage to 5 million ORCID researcher profiles. Results show that although ORCID-linked labeled data do not effectively represent the population of name instances in Author-ity2009, they do effectively capture the 'high precision over high recall' performance of Author-ity2009. In addition, ORCID-linked labeled data can provide nuanced details about Author-ity2009's performance when name instances are evaluated within and across ethnicity categories. As ORCID continues to be expanded to include more researchers, labeled data via ORCID linkage can better represent the population of a whole disambiguated dataset and can be updated on a regular basis. This can benefit author name disambiguation researchers and practitioners who need large-scale labeled data but lack resources for manual labeling or access to other authority sources for linkage-based labeling. The ORCID-linked labeled data for Author-ity2009 are publicly available for validation and reuse.

Digital Libraries

OSDG -- Open-Source Approach to Classify Text Data by UN Sustainable Development Goals (SDGs)

Sustainable Development Goals (SDGs) bring together the diverse development community and provide a clear set of development targets for 2030. Given the large number of actors and initiatives related to these goals, there is a need for a way to accurately and reliably assign different text inputs (scientific research, research projects, technological output, or documents) to specific SDGs. In this paper we present the Open Source SDG (OSDG) project and tool, which does so by integrating existing research and previous classifications into a robust and coherent framework. This integration is based on linking the features from a variety of previous approaches, such as ontology items, keywords, or features from machine-learning models, to the topics in Microsoft Academic Graph.

Digital Libraries

On how Cognitive Computing will plan your next Systematic Review

Systematic literature reviews (SLRs) are at the heart of evidence-based research, setting the foundation for future research and practice. However, producing good-quality, timely contributions is a challenging and highly cognitive endeavor, which has lately motivated the exploration of automation and support in the SLR process. In this paper we address an often overlooked phase in this process, that of planning literature reviews, and explore through the lens of cognitive process augmentation how to overcome its most salient challenges. In doing so, we report on the insights from 24 SLR authors on planning practices and their challenges, as well as feedback on support strategies inspired by recent advances in cognitive computing. We frame our findings under the cognitive augmentation framework, and report on a prototype implementation and evaluation focusing on further informing the technical feasibility.

Digital Libraries

On the Performance of Hybrid Search Strategies for Systematic Literature Reviews in Software Engineering

Context: When conducting a Systematic Literature Review (SLR), researchers usually face the challenge of designing a search strategy that appropriately balances result quality and review effort. Using digital library (or database) searches or snowballing alone may not be enough to achieve high-quality results. On the other hand, using both digital library searches and snowballing together may increase the overall review effort. Objective: The goal of this research is to propose and evaluate hybrid search strategies that selectively combine database searches with snowballing. Method: We propose four hybrid search strategies combining database searches in digital libraries with iterative, parallel, or sequential backward and forward snowballing. We simulated the strategies over three existing SLRs in SE that adopted both database searches and snowballing. We compared the outcome of digital library searches, snowballing, and hybrid strategies using precision, recall, and F-measure to investigate the performance of each strategy. Results: Our results show that, for the analyzed SLRs, combining database searches from the Scopus digital library with parallel or sequential snowballing achieved the most appropriate balance of precision and recall. Conclusion: We put forward that, depending on the goals of the SLR and the available resources, using a hybrid search strategy involving a representative digital library and parallel or sequential snowballing tends to represent an appropriate alternative to be used when searching for evidence in SLRs.
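The comparison metrics named above can be sketched as follows. This is a minimal illustration, not the authors' simulation: the function name, the paper identifiers, and the toy result sets are hypothetical, and the hybrid strategy is reduced here to a plain union of a database search result with a snowballing result.

```python
def evaluate_search(retrieved, relevant):
    """Return (precision, recall, F-measure) of a search result.

    `retrieved` holds the studies a strategy returned; `relevant` holds
    the studies that the finished review actually included.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical identifiers, for illustration only.
relevant = {"p1", "p3", "p5", "p6"}
db_search = {"p1", "p2", "p3", "p4"}
snowballing = {"p3", "p5"}
hybrid = db_search | snowballing  # simplified stand-in for a hybrid strategy

db_scores = evaluate_search(db_search, relevant)
hybrid_scores = evaluate_search(hybrid, relevant)
```

In this toy case the hybrid set improves both precision and recall over the database search alone, but that outcome depends entirely on the made-up data; adding snowballing can just as easily lower precision while raising recall.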

Digital Libraries

On the Persistence of Persistent Identifiers of the Scholarly Web

Scholarly resources, just like any other resources on the web, are subject to reference rot as they frequently disappear or significantly change over time. Digital Object Identifiers (DOIs) are commonplace to persistently identify scholarly resources and have become the de facto standard for citing them. We investigate the notion of persistence of DOIs by analyzing their resolution on the web. We derive confidence in the persistence of these identifiers in part from the assumption that dereferencing a DOI will consistently return the same response, regardless of which HTTP request method we use or from which network environment we send the requests. Our experiments show, however, that persistence, according to our interpretation, is not warranted. We find that scholarly content providers respond differently to varying request methods and network environments and even change their response to requests against the same DOI. In this paper we present the results of our quantitative analysis that is aimed at informing the scholarly communication community about this disconcerting lack of consistency.

Digital Libraries

On the Programmatic Generation of Reproducible Documents

Reproducible document standards, like R Markdown, facilitate the programmatic creation of documents whose content is itself programmatically generated. While these documents are generally not complete, in the sense that they will not include prose content generated by an author to provide context, a narrative, etc., programmatic generation can provide substantial efficiencies for structuring and constructing documents. This paper explores the programmatic generation of reproducible documents by distinguishing components that can be created by computational means from those requiring human-generated prose, providing guidelines for the generation of these documents, and identifying a use case in clinical trial reporting. These concepts and the use case are illustrated through the listdown package for the R programming environment, which is currently available on the Comprehensive R Archive Network (CRAN).

Digital Libraries

On the challenges ahead of spatial scientometrics focusing on the city level

Since the mid-1970s, it has become widely accepted to measure and evaluate changes in international research collaborations and the scientific performance of institutions and countries through the prism of bibliometric and scientometric data. Spatial bibliometrics and scientometrics (henceforward spatial scientometrics) have traditionally focused on examining both country and regional levels; however, in recent years, numerous spatial analyses on the city level have been carried out. While city-level scientometric analyses have gained popularity among policymakers and statistical/economic research organizations, researchers in the field of bibliometrics are divided regarding whether it is possible to observe the spatial unit 'city' through bibliometric and scientometric tools. After systematically scrutinizing relevant studies in the field, we identified three major problems: 1) there is no standardized method of how cities should be defined and how metropolitan areas should be delineated, 2) there is no standardized method of how bibliometric and scientometric data on the city level should be collected and processed, and 3) it is not clearly defined how cities can profit from the results of bibliometric and scientometric analysis focusing on them. This paper investigates major challenges ahead of spatial scientometrics, focusing on the city level, and presents some possible solutions.
