Featured Research

Digital Libraries

Building Journal Impact Factor Quartile into the Assessment of Academic Performance: A Case Study

This study aims to provide information about the Q Concept, defined as the division of journal impact factors into quartiles within given field categories, so that the disadvantages of using journal impact factors directly can be avoided. While the number of "original articles published in journals indexed in Web of Science (WoS) databases such as SCI, SSCI and A&HCI" is an important indicator for research assessment in Turkey, neither the impact factors nor the Q categories of the journals publishing these papers have been taken into account. The present study uses the Q Concept to analyze the scientific output of Amasya University researchers in WoS-indexed journals in the period 2014-2018. The share of publications by Q category and the average citations received by works from Amasya University were compared with the averages for Turkey and for other countries. Results indicate that articles by Amasya University researchers were most often published in low impact factor journals (36.49% in Q4 journals); only a small share appeared in high impact journals (14.32% in Q1 journals). The share of papers published in low impact journals by researchers from Amasya University is higher than the Turkish average and much higher than that of scientifically leading countries. Papers published in Q1 journals received on average around six times as many citations as papers published in Q4 journals (8.92 vs. 1.56); thus papers in Q1 journals received 30.02% of citations although only 14.32% of the papers were published in these journals. The share of papers never cited in WoS was 27.48%, rising from 9.68% in Q1 to almost half (48.10%) in Q4. The study concludes with suggestions on how and where the Q Concept can be used.
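
As an illustration of the quartile idea described above, the following minimal sketch assigns journals to Q1 through Q4 by impact-factor rank within a single field category and then tallies the share of papers per quartile. The journal names, impact factors, and function names are hypothetical, and the boundary rule (rank percentile split into four equal bands) is one common convention assumed here, not necessarily the rule used in the study.

```python
import math
from collections import Counter

def assign_quartiles(impact_factors):
    """Map each journal to Q1..Q4 by impact-factor rank within one category.

    impact_factors: dict of journal name -> impact factor (hypothetical data).
    Assumed convention: the top 25% of ranks are Q1, the next 25% Q2, and so on.
    """
    ranked = sorted(impact_factors, key=impact_factors.get, reverse=True)
    n = len(ranked)
    quartiles = {}
    for rank, journal in enumerate(ranked, start=1):
        q = math.ceil(4 * rank / n)  # rank percentile mapped to 1..4
        quartiles[journal] = f"Q{q}"
    return quartiles

def quartile_shares(paper_journals, quartiles):
    """Percentage of papers that appeared in each quartile."""
    counts = Counter(quartiles[j] for j in paper_journals)
    total = len(paper_journals)
    return {q: 100 * counts.get(q, 0) / total for q in ("Q1", "Q2", "Q3", "Q4")}

# Hypothetical category of eight journals and four papers.
factors = {"A": 8.0, "B": 7.0, "C": 6.0, "D": 5.0,
           "E": 4.0, "F": 3.0, "G": 2.0, "H": 1.0}
quartiles = assign_quartiles(factors)
shares = quartile_shares(["A", "G", "H", "H"], quartiles)
```

Because quartiles are computed within a category, a journal with a modest impact factor can still be Q1 in a field where impact factors are generally low, which is the disadvantage of raw impact factors that the Q Concept is meant to remove.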

Building a PubMed knowledge graph

PubMed is an essential resource for the medical domain, but useful concepts are either difficult to extract or ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID, and identifying fine-grained affiliation data from MapAffil. By integrating these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities. The PKG is freely available on Figshare (this https URL, simplified version that excludes PubMed raw data) and the TACC website (this http URL, full version).

COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations

In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (this http URL). COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived from the data available in Crossref. These citations are described in RDF by means of the newly extended version of the OpenCitations Data Model (OCDM). We introduce the workflow we have developed for creating these data, and also show the additional services that facilitate the access to and querying of these data via different access points: a SPARQL endpoint, a REST API, bulk downloads, Web interfaces, and direct access to the citations via HTTP content negotiation. Finally, we present statistics regarding the use of COCI citation data, and we introduce several projects that have already started to use COCI data for different purposes.

COVID-19 Literature Topic-Based Search via Hierarchical NMF

A dataset of COVID-19-related scientific literature is compiled, combining the articles from several online libraries and selecting those with open access and full text available. Then, hierarchical nonnegative matrix factorization is used to organize literature related to the novel coronavirus into a tree structure that allows researchers to search for relevant literature based on detected topics. We discover eight major latent topics and 52 granular subtopics in the body of literature, related to vaccines, genetic structure and modeling of the disease and patient studies, as well as related diseases and virology. In order that our tool may help current researchers, an interactive website is created that organizes available literature using this hierarchical structure.
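
The two-level topic tree described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: it uses the standard Lee-Seung multiplicative updates for NMF, assigns each document to its dominant topic, and then factorizes each topic's documents again to obtain subtopics. The function names, the toy document-term matrix, and the choice of solver are all hypothetical.

```python
import random

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor v (docs x terms) as w @ h with nonnegative w (docs x k) and
    h (k x terms), using multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def hierarchical_nmf(v, k_top, k_sub):
    """Return {topic: {subtopic: [doc indices]}} for a two-level hierarchy."""
    w, _ = nmf(v, k_top)
    groups = {}
    for doc, weights in enumerate(w):
        groups.setdefault(weights.index(max(weights)), []).append(doc)
    tree = {}
    for topic, docs in groups.items():
        sub = {}
        if len(docs) >= k_sub:
            # Factorize only the rows of documents assigned to this topic.
            w_sub, _ = nmf([v[d] for d in docs], k_sub)
            for doc, weights in zip(docs, w_sub):
                sub.setdefault(weights.index(max(weights)), []).append(doc)
        else:
            sub[0] = docs  # too few documents to split further
        tree[topic] = sub
    return tree

# Toy document-term counts: eight documents, six terms (hypothetical).
v = [[3, 2, 1, 0, 0, 0], [6, 4, 2, 0, 0, 0], [3, 2, 1, 0, 0, 0], [9, 6, 3, 0, 0, 0],
     [0, 0, 0, 1, 2, 3], [0, 0, 0, 2, 4, 6], [0, 0, 0, 1, 2, 3], [0, 0, 0, 3, 6, 9]]
tree = hierarchical_nmf(v, k_top=2, k_sub=2)
```

A real pipeline would start from a TF-IDF matrix of the full-text articles and would use a numerical library rather than nested lists; the sketch only shows how a flat factorization becomes a searchable tree.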

COVID-19 publications: Database coverage, citations, readers, tweets, news, Facebook walls, Reddit posts

The COVID-19 pandemic requires a fast response from researchers to help address biological, medical and public health issues to minimize its impact. In this rapidly evolving context, scholars, professionals and the public may need to quickly identify important new studies. In response, this paper assesses the coverage of scholarly databases and impact indicators from 21 March to 18 April 2020. The results confirm a rapid increase in the volume of research, which is particularly accessible through Google Scholar and Dimensions, and less so through Scopus, the Web of Science, and PubMed. A few COVID-19 papers from the 21,395 in Dimensions were already highly cited, with substantial news and social media attention. For this topic, in contrast to previous studies, there seems to be a high degree of convergence between articles shared in the social web and citation counts, at least in the short term. In particular, articles that are extensively tweeted on the day they are first indexed are likely to be highly read and relatively highly cited three weeks later. Researchers needing wide-scope literature searches (rather than health-focused PubMed or medRxiv searches) should start with Google Scholar or Dimensions and can use tweet and Mendeley reader counts as indicators of likely importance.

COVIDScholar: An automated COVID-19 research aggregation and analysis platform

The ongoing COVID-19 pandemic has had far-reaching effects throughout society, and science is no exception. The scale, speed, and breadth of the scientific community's COVID-19 response has led to the emergence of new research literature on a remarkable scale: as of October 2020, over 81,000 COVID-19-related scientific papers have been released, at a rate of over 250 per day. This has created a challenge to traditional methods of engagement with the research literature; the volume of new research is far beyond the ability of any human to read, and the urgency of response has led to an increasingly prominent role for pre-print servers and a diffusion of relevant research across sources. These factors have created a need for new tools to change the way scientific literature is disseminated. COVIDScholar is a knowledge portal designed with the unique needs of the COVID-19 research community in mind, utilizing NLP to aid researchers in synthesizing the information spread across thousands of emergent research articles, patents, and clinical trials into actionable insights and new knowledge. The search interface for this corpus, this https URL, now serves over 2000 unique users weekly. We also present an analysis of trends in COVID-19 research over the course of 2020.

Can Google Scholar and Mendeley help to assess the scholarly impacts of dissertations?

Dissertations can be the single most important scholarly outputs of junior researchers. Whilst sets of journal articles are often evaluated with the help of citation counts from the Web of Science or Scopus, these do not index dissertations and so their impact is hard to assess. In response, this article introduces a new multistage method to extract Google Scholar citation counts for large collections of dissertations from repositories indexed by Google. The method was used to extract Google Scholar citation counts for 77,884 American doctoral dissertations from 2013-2017 via ProQuest, with a precision of over 95%. Some ProQuest dissertations that were dual indexed with other repositories could not be retrieved with ProQuest-specific searches but could be found with Google Scholar searches of the other repositories. The Google Scholar citation counts were then compared with Mendeley reader counts, a known source of scholarly-like impact data. A fifth of the dissertations had at least one citation recorded in Google Scholar and slightly fewer had at least one Mendeley reader. Based on numerical comparisons, the Mendeley reader counts seem to be more useful for impact assessment purposes for dissertations that are less than two years old, whilst Google Scholar citations are more useful for older dissertations, especially in social sciences, arts and humanities. Google Scholar citation counts may reflect a more scholarly type of impact than that of Mendeley reader counts because dissertations attract a substantial minority of their citations from other dissertations. In summary, the new method now makes it possible for research funders, institutions and others to systematically evaluate the impact of dissertations, although additional Google Scholar queries for other online repositories are needed to ensure comprehensive coverage.

Can pandemics transform scientific novelty? Evidence from COVID-19

Scientific novelty is important during a pandemic because of its critical role in generating new vaccines. Parachuting collaboration and international collaboration are two crucial channels through which teams expand their search for the broader scope of resources required to address a global challenge. Our analysis of 58,728 coronavirus papers suggests that scientific novelty, measured by a BioBERT model pre-trained on 29 million PubMed articles, and parachuting collaboration both increased dramatically after the outbreak of COVID-19, while international collaboration suddenly decreased. During the COVID-19 pandemic, papers with more parachuting collaboration and internationally collaborative papers are predicted to be more novel. The findings suggest the necessity of reaching out for distant resources, and the importance of maintaining a collaborative scientific community beyond established networks and nationalism during a pandemic.

Caution, DOI! Bibliographic detective story in the era of digitalization

An example of inconsistencies in information provided by popular bibliographic services is described and the reasons for these inconsistencies are discussed.

Characterisation of the χ-index and the rec-index

Axiomatic characterisation of a bibliometric index provides insight into the properties that the index satisfies and facilitates the comparison of different indices. A geometric generalisation of the h-index, called the χ-index, has recently been proposed to address some of the problems with the h-index, in particular, the fact that it is not scale invariant, i.e., multiplying the number of citations of each publication by a positive constant may change the relative ranking of two researchers. While the square of the h-index is the area of the largest square under the citation curve of a researcher, the square of the χ-index, which we call the rec-index (or rectangle-index), is the area of the largest rectangle under the citation curve. Our main contribution here is to provide a characterisation of the rec-index via three properties: monotonicity, uniform citation and uniform equivalence. Monotonicity is a natural property that we would expect any bibliometric index to satisfy, while the other two properties constrain the value of the rec-index to be the area of the largest rectangle under the citation curve. The rec-index also allows us to distinguish between influential researchers who have relatively few, but highly-cited, publications and prolific researchers who have many, but less-cited, publications.
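
The geometric definitions above translate directly into code. The sketch below is an illustration under one assumption: the citation curve is taken to be the step function of a researcher's citation counts sorted in decreasing order, with squares and rectangles anchored at the origin. The function names and the example citation lists are hypothetical.

```python
import math

def h_index(citations):
    """Side of the largest square under the citation curve:
    the largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(ranked, start=1) if c >= i)

def rec_index(citations):
    """Area of the largest rectangle under the citation curve:
    the maximum of i * c_i over the ranked publications."""
    ranked = sorted(citations, reverse=True)
    return max((i * c for i, c in enumerate(ranked, start=1)), default=0)

def chi_index(citations):
    """Square root of the rec-index."""
    return math.sqrt(rec_index(citations))

# Two hypothetical researchers: one with few, more-cited papers,
# one with many, less-cited papers.
influential = [5, 5, 5]
prolific = [2] * 10

# Multiply every citation count by the same positive constant.
scaled_influential = [5 * c for c in influential]
scaled_prolific = [5 * c for c in prolific]
```

With these lists the h-index ranks the first researcher higher before scaling and the second higher after scaling, whereas the rec-index (and hence the χ-index) orders them the same way in both cases, because multiplying all citations by a constant multiplies every rectangle's area by that constant.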
