Featured Research

Digital Libraries

2020 NDSA Agenda for Digital Stewardship

The NDSA Agenda is a comprehensive overview of the state of global digital preservation. It surveys current research trends, grants, projects, and other efforts spanning the preservation ecosystem. The agenda identifies successes and ongoing challenges and provides tangible recommendations to researchers and practitioners alike. As both an overview and a deep dive into digital preservation issues, it serves an audience ranging from high-level readers to hands-on experts. Funders can use this report as a signpost for the overall state of the profession.


A 25 Year Retrospective on D-Lib Magazine

In July 1995, the first issue of D-Lib Magazine was published as an online, HTML-only, open-access magazine, serving as the focal point for the then-emerging digital library research community. In 2017 it ceased publication, in part due to the maturity of the community it served as well as the increasing availability of and competition from eprints, institutional repositories, conferences, social media, and online journals: the very ecosystem that D-Lib Magazine nurtured and enabled. As long-time members of the digital library community and authors with the most contributions to D-Lib Magazine, we reflect on the history of the digital library community and D-Lib Magazine, taking its very first issue as guidance. It contained three articles, which described the Dublin Core Metadata Element Set, a project status report from the NSF/DARPA/NASA-funded Digital Library Initiative (DLI), and a summary of the Kahn-Wilensky Framework (KWF), which gave us, among other things, Digital Object Identifiers (DOIs). These technologies, as well as many more described in D-Lib Magazine through its 23 years, have had a profound and continuing impact on the digital library and general web communities.


A Bayesian Two-part Hurdle Quantile Regression Model for Citation Analysis

Quantile regression is a technique for analysing the effects of a set of independent variables on the entire distribution of a continuous response variable. It presents a complete picture of the effects on the location, scale, and shape of the dependent variable at all points, not just at the mean. This research focuses on two challenges for the analysis of citation counts by quantile regression: discontinuity and substantial mass points at lower counts, such as zero, one, two, and three. A Bayesian two-part hurdle quantile regression model was proposed by King and Song (2019) as a suitable candidate for modeling count data with a substantial mass point at zero. Their model allows the zeros and non-zeros to be modeled independently but simultaneously. It uses quantile regression for modeling the nonzero data and logistic regression for modeling the probability of zeros versus nonzeros. Nevertheless, the current paper shows that the substantial mass points at one, two, and three in citation counts will almost certainly affect the estimation of parameters in the quantile regression part of the model, in a similar manner to the mass point at zero. We update the King and Song model by shifting the hurdle point from zero to three, past the main mass points. The new model delivers more accurate quantile regression for moderately to highly cited articles, and enables estimates of the extent to which factors influence the chances that an article will be low cited. To illustrate the advantage and potential of this method, it is applied separately to simulated citation counts and to seven Scopus fields, with collaboration, title length, and journal internationality as independent variables.
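The two-part structure described above can be illustrated with a small sketch. It is not the Bayesian estimation from the paper: it only shows how counts might be split at a hurdle of three and how the check (pinball) loss that underlies quantile regression picks out a quantile without covariates. The function names, the toy data, and the reading of "low cited" as "at most three citations" are assumptions made for illustration.

```python
from typing import List, Tuple

HURDLE = 3  # hurdle point named in the abstract; "low cited" = count <= HURDLE is an assumption


def split_at_hurdle(counts: List[int], hurdle: int = HURDLE) -> Tuple[List[int], List[int]]:
    """Split counts into the two parts of a hurdle model.

    Returns (low_flags, high_counts):
      low_flags   - 1 if the count is at or below the hurdle, else 0
                    (the binary outcome a logistic regression would model)
      high_counts - the counts above the hurdle
                    (the data a quantile regression would model)
    """
    low_flags = [1 if c <= hurdle else 0 for c in counts]
    high_counts = [c for c in counts if c > hurdle]
    return low_flags, high_counts


def pinball_loss(y: float, pred: float, tau: float) -> float:
    """Check loss for quantile level tau (0 < tau < 1)."""
    diff = y - pred
    return tau * diff if diff >= 0 else (tau - 1) * diff


def empirical_quantile(values: List[int], tau: float) -> int:
    """Covariate-free quantile: the observed value minimising total pinball loss."""
    return min(sorted(values), key=lambda q: sum(pinball_loss(v, q, tau) for v in values))


if __name__ == "__main__":
    toy_counts = [0, 1, 1, 2, 3, 4, 5, 8, 13, 40]  # made-up citation counts
    flags, above = split_at_hurdle(toy_counts)
    print("share at or below hurdle:", sum(flags) / len(flags))
    print("median of counts above hurdle:", empirical_quantile(above, 0.5))
```

In a full model, each part would take the independent variables as regressors; here both parts are reduced to their simplest covariate-free form.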


A Bibliometric Analysis of Publications in Computer Networking Research

This study uses the article content and metadata of four important computer networking venues to address important bibliometric questions: IEEE Communications Surveys and Tutorials (COMST), IEEE/ACM Transactions on Networking (TON), ACM Special Interest Group on Data Communications (SIGCOMM), and IEEE International Conference on Computer Communications (INFOCOM). The data were obtained from ACM, IEEE Xplore, Scopus and CrossRef and cover an 18-year period (2000-2017). All of the venues are prestigious, yet they publish quite different research. The first two (COMST and TON) are highly reputed journals of the field, while SIGCOMM and INFOCOM are considered top conferences of the field. SIGCOMM and INFOCOM publish new original research. TON has a similar genre and publishes new original research as well as extended versions of research published at conferences such as SIGCOMM and INFOCOM, while COMST publishes surveys and reviews (which not only summarize previous works but also highlight future research opportunities). In this study, we aim to track the co-evolution of trends in the COMST and TON journals and compare them to the publication trends in INFOCOM and SIGCOMM. Our analyses of the computer networking literature include: (a) metadata analysis; (b) content-based analysis; and (c) citation analysis. In addition, we identify the significant trends and the most influential authors, institutes and countries, based on publication count as well as article citations. Through this study, we propose a methodology and framework for performing a comprehensive bibliometric analysis of computer networking research. To the best of our knowledge, no such study has been undertaken in computer networking until now.


A Comprehensive Dictionary and Term Variation Analysis for COVID-19 and SARS-CoV-2

The number of unique terms in the scientific literature used to refer to either SARS-CoV-2 or COVID-19 is remarkably large and has continued to increase rapidly despite well-established standardized terms. This high degree of term variation makes high-recall identification of these important entities difficult. In this manuscript we present an extensive dictionary of terms used in the literature to refer to SARS-CoV-2 and COVID-19. We use a rule-based approach to iteratively generate new term variants, then locate these variants in a large text corpus. We compare our dictionary to an extensive collection of terminological resources, demonstrating that our resource provides a substantial number of additional terms. We use our dictionary to analyze the usage of SARS-CoV-2 and COVID-19 terms over time and show that the number of unique terms continues to grow rapidly. Our dictionary is freely available online.
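As a rough illustration of the generate-then-locate idea, the sketch below applies a single hypothetical rule (swapping each hyphen for a hyphen, a space, or nothing) and keeps only the variants that actually occur in a text. The rule, the function names, and the sample sentence are assumptions for illustration and are not taken from the authors' rule set.

```python
import itertools
from typing import List, Set

# Hypothetical rule: each hyphen in a term may appear as a hyphen, a space, or nothing.
SEPARATORS = ["-", " ", ""]


def separator_variants(term: str) -> Set[str]:
    """Generate every variant of `term` obtained by replacing its hyphens."""
    parts = term.split("-")
    variants = set()
    for seps in itertools.product(SEPARATORS, repeat=len(parts) - 1):
        pieces = [parts[0]]
        for sep, part in zip(seps, parts[1:]):
            pieces.append(sep + part)
        variants.add("".join(pieces))
    return variants


def find_variants(variants: Set[str], corpus: str) -> List[str]:
    """Keep the variants that occur in the corpus (case-insensitive substring match)."""
    lowered = corpus.lower()
    return sorted(v for v in variants if v.lower() in lowered)


if __name__ == "__main__":
    candidates = separator_variants("SARS-CoV-2")
    sample = "Patients with SARS-CoV2 and sars cov 2 infection."  # made-up text
    print(find_variants(candidates, sample))
```

An iterative version would feed the attested variants back in as seeds for further rules; plain substring matching is used here only to keep the sketch short.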


A Computational Approach to Historical Ontologies

This paper presents a use case exploring the application of the Archival Resource Key (ARK) persistent identifier for promoting and maintaining ontologies. In particular, we look at improving computation with an in-house ontology server in the context of temporally aligned vocabularies. This effort demonstrates the utility of ARKs in preparing historical ontologies for computational archival science.


A Corpus of Adpositional Supersenses for Mandarin Chinese

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.


A Decade of In-text Citation Analysis based on Natural Language Processing and Machine Learning Techniques: An overview of empirical studies

Citation analysis is one of the most frequently used methods in research evaluation. We are seeing significant growth in citation analysis through bibliometric metadata, primarily due to the availability of citation databases such as the Web of Science, Scopus, Google Scholar, Microsoft Academic, and Dimensions. Due to better access to full-text publication corpora in recent years, information scientists have gone far beyond traditional bibliometrics by tapping into advancements in full-text data processing techniques to measure the impact of scientific publications in contextual terms. This has led to technical developments in citation context and content analysis, citation classifications, citation sentiment analysis, citation summarisation, and citation-based recommendation. This article aims to narratively review the studies on these developments. Its primary focus is on publications that have used natural language processing and machine learning techniques to analyse citations.


A Digital Library for Research Data and Related Information in the Social Sciences

In the social sciences, researchers search for information on the Web, but it is most often scattered across different websites, search portals, digital libraries, data archives, and databases. In this work, we present an integrated search system for social science information that allows users to find information related to research data in a single digital library. Users can search for research data sets, publications, survey variables, questions from questionnaires, survey instruments, and tools. Information items are linked to each other so that users can see, for example, which publications contain data citations to research data. The integration and linking of different kinds of information increase their visibility so that it is easier for researchers to find information for re-use. In a log-based usage study, we found that users search across different information types, that search sessions contain a high rate of positive signals, and that link information is often explored.


A Disciplinary View of Changes in Publications' Reference Lists After Peer Review

This paper provides insight into the changes manuscripts undergo during peer review, the potential reasons for these changes, and the differences between scientific fields. A growing body of literature is assessing the effect of peer review on manuscripts; however, much of this research currently focuses on the social and medical sciences. We matched more than 6,000 preprint-publication pairs across multiple fields and quantified the changes in their reference lists. We also quantified the change in references per full-text section for 565 pairs from PLOS journals. In addition, we conducted manual checks of a randomly chosen sample of 98 pairs to validate our results, and undertook a qualitative analysis based on the context of the reference to investigate the potential reasons for reference changes. We found 10 disciplines, mostly in the natural sciences, with high levels of removed references. Methods sections undergo the most relative change in the natural sciences, while in the medical and health sciences, the results and discussion sections undergo the most changes. Our qualitative analysis identified issues with our results due to incomplete preprint reference lists. In addition, we deduced 10 themes for changing references during peer review. This analysis suggested that manuscripts in the natural and medical sciences undergo more extensive reframing of the literature used to situate and interpret the results of studies than those in the social and agricultural sciences, which are further embedded in the existing literature through peer review. Peer review in engineering tends to focus on methodological details. Our results are useful to the body of literature examining the effectiveness of peer review in fulfilling its intended purposes.
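One simple way to quantify reference-list changes for a single preprint-publication pair is a set comparison, sketched below. This is an assumed approach for illustration, not the authors' pipeline: the function names, the normalisation step, and the placeholder identifiers are all hypothetical.

```python
from typing import Dict, List


def normalize(ref: str) -> str:
    """Lower-case and collapse whitespace so trivially different strings match."""
    return " ".join(ref.lower().split())


def reference_changes(preprint_refs: List[str], published_refs: List[str]) -> Dict[str, object]:
    """Compare two reference lists and report what was removed and added."""
    pre = {normalize(r) for r in preprint_refs}
    pub = {normalize(r) for r in published_refs}
    removed = pre - pub  # cited in the preprint but not in the published version
    added = pub - pre    # cited only in the published version
    return {
        "removed": sorted(removed),
        "added": sorted(added),
        # share of preprint references that did not survive peer review
        "removed_share": len(removed) / len(pre) if pre else 0.0,
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real references.
    preprint = ["ref-A", "ref-B", "ref-C", "ref-D"]
    published = ["ref-a", "ref-C", "ref-E"]
    print(reference_changes(preprint, published))
```

Exact matching on normalised strings will miss references that are formatted differently in the two versions, which is one reason a manual validation sample such as the one described above is useful.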
