Featured Research

Digital Libraries

How to interpret algorithmically constructed topical structures of research specialties? A case study comparing an internal and an external mapping of the topical structure of invasion biology

In our paper we seek to address a shortcoming in the scientometric literature: given the proliferation of algorithmic approaches to topic detection from bibliometric data, there is a relative lack of studies that validate and create a deeper understanding of the topical structures these algorithmic approaches generate. To take a closer look at this issue, we investigate the results of the new Leiden algorithm when applied to the direct citation network of a field-level data set. We compare this internal perspective, which is constructed from the citation links within a data set of 30,000 publications in invasion biology, with an external perspective on the topic structures in this research specialty, which is based on a global science map in the form of the CWTS microfield classification underlying the Leiden Ranking. We present an initial comparative analysis of the results and lay out our next steps, which will involve engaging with domain experts to examine how the algorithmically identified topics relate to understandings of topics and topical perspectives that operate within this research specialty.

Digital Libraries

How to use Software Heritage for archiving and referencing your source code: guidelines and walkthrough

Software source code is an essential research output, and many research communities strongly encourage making the source code of the artefact available by archiving it in publicly accessible long-term archives. Software Heritage is a non-profit, long-term universal archive specifically designed for software source code, and able to store not only a software artifact, but also its full development history. It provides the ideal place to preserve research software artifacts, and offers powerful mechanisms to enhance research articles with precise references to relevant fragments of your source code. Using Software Heritage for your research software artifacts is straightforward and involves three simple steps. This document details each of these three steps, providing guidelines for making the most out of Software Heritage for your research.
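The abstract does not spell out how the precise references are built. As background, Software Heritage identifies archived objects with persistent identifiers (SWHIDs), and for a single file's content the identifier uses the same hash that Git computes for a blob. The following is a minimal sketch of that one case only; the function name `content_swhid` is illustrative and not part of any Software Heritage tooling.

```python
import hashlib

def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file's content (object type 'cnt')."""
    # Same construction as a Git blob hash: SHA-1 over a small header
    # ("blob <length>\0") followed by the raw file bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest

# Illustrative input: the bytes of a one-line file.
print(content_swhid(b"hello\n"))
```

Because the identifier depends only on the bytes of the file, anyone holding a copy can recompute it and check that the referenced content is unchanged.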

Digital Libraries

How well does I3 perform for impact measurement compared to other bibliometric indicators? The convergent validity of several (field-normalized) indicators

Recently, the integrated impact indicator (I3) was introduced, in which citations are weighted in accordance with the percentile rank class of each publication in a set of publications. I3 can also be used as a field-normalized indicator. Field-normalization is common practice in bibliometrics, especially when institutions and countries are compared. Publication and citation practices are so different among fields that citation impact is normalized for cross-field comparisons. In this study, we test the ability of the indicator to discriminate between quality levels of papers as defined by Faculty members at F1000Prime. F1000Prime is a post-publication peer review system for assessing papers in the biomedical area. Thus, we test the convergent validity of I3 (in this study, we test I3/N, the size-independent variant of I3 where I3 is divided by the number of papers) using assessments by peers as baseline and compare its validity with several other (field-normalized) indicators: the mean-normalized citation score (MNCS), relative-citation ratio (RCR), citation score normalized by cited references (CSNCR), characteristic scores and scales (CSS), source-normalized citation score (SNCS), citation percentile, and proportion of papers which belong to the x% most frequently cited papers (PPtop x%). The results show that the PPtop 1% indicator discriminates best among different quality levels. I3 performs similarly to (slightly better than) most of the other field-normalized indicators. Thus, the results suggest that the indicator could be a valuable alternative to other indicators in bibliometrics.

Digital Libraries

Identification, Tracking and Impact: Understanding the trade secret of catchphrases

Understanding the topical evolution in industrial innovation is a challenging problem. With the advancement of digital repositories in the form of patent documents, it is becoming increasingly feasible to understand the innovation secrets -- "catchphrases" of organizations. However, searching and understanding this enormous textual information is a natural bottleneck. In this paper, we propose an unsupervised method for the extraction of catchphrases from the abstracts of patents granted by the U.S. Patent and Trademark Office over the years. Our proposed system achieves substantial improvement, both in terms of precision and recall, against state-of-the-art techniques. As a second objective, we conduct an extensive empirical study to understand the temporal evolution of the catchphrases across various organizations. We also show how the overall innovation evolution, in the form of the introduction of newer catchphrases in an organization's patents, correlates with the future citations received by the patents filed by that organization. Our code and data sets will be placed in the public domain soon.

Digital Libraries

Identifying Historical Travelogues in Large Text Corpora Using Machine Learning

Travelogues represent an important and intensively studied source for scholars in the humanities, as they provide insights into people, cultures, and places of the past. However, existing studies rarely utilize more than a dozen primary sources, since the human capacity for working with a large number of historical sources is naturally limited. In this paper, we define the notion of travelogue and report upon an interdisciplinary method that, using machine learning as well as domain knowledge, can effectively identify German travelogues in the digitized inventory of the Austrian National Library with F1 scores between 0.94 and 1.00. We applied our method to a corpus of 161,522 German volumes and identified 345 travelogues that could not be identified using traditional search methods, resulting in the most extensive collection of early modern German travelogues ever created. To our knowledge, this is the first time such a method was implemented for the bibliographic indexing of a text corpus on this scale, improving and extending the traditional methods in the humanities. Overall, we consider our technique to be an important first step in a broader effort of developing a novel mixed-method approach for the large-scale serial analysis of travelogues.

Digital Libraries

Identifying and Mapping the Global Research Output on Coronavirus Disease: A Scientometric Study

The paper explores and analyses the trend of world literature on "Coronavirus Disease" in terms of the output of research publications indexed in the Science Citation Index Expanded (SCI-E) of Web of Science during the period from 2011 to 2020. The study found that 6071 research records had been published on Coronavirus Disease up to March 20, 2020. Various scientometric components of the research records published in the study period were studied. The study covers aspects of the Coronavirus Disease literature such as year-wise distribution, relative growth rate, doubling time, and distributions by geography, organization, language, document form, most prolific authors, and source. The highest number of articles was published in 2019, while the lowest number of research articles was reported in 2020. Further, the relative growth rate gradually increases while the doubling time decreases. Most of the research publications are in English, and most were published in the form of research articles. The USA is the highest contributor to the Coronavirus Disease literature.
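The abstract does not state how relative growth rate and doubling time were computed. A minimal sketch, assuming the definitions commonly used in scientometrics: the relative growth rate is the difference of the natural logarithms of the cumulative publication counts divided by the elapsed time, and the doubling time is ln 2 divided by that rate. The counts below are hypothetical and not taken from the study.

```python
import math

def relative_growth_rate(w1, w2, t1, t2):
    """Mean relative growth rate between times t1 and t2, from cumulative counts."""
    return (math.log(w2) - math.log(w1)) / (t2 - t1)

def doubling_time(rgr):
    """Time needed for the cumulative count to double at a constant growth rate."""
    return math.log(2) / rgr

# Hypothetical cumulative publication counts by year (not from the study).
cumulative = {2017: 1000, 2018: 1500, 2019: 3000}

years = sorted(cumulative)
for prev, curr in zip(years, years[1:]):
    rgr = relative_growth_rate(cumulative[prev], cumulative[curr], prev, curr)
    print(curr, round(rgr, 3), round(doubling_time(rgr), 3))
```

Under these definitions the two quantities are inversely related, so a rising growth rate goes together with a falling doubling time.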

Digital Libraries

Identifying the Development and Application of Artificial Intelligence in Scientific Text

We describe a strategy for identifying the universe of research publications relevant to the application and development of artificial intelligence. The approach leverages the arXiv corpus of scientific preprints, in which authors choose subject tags for their papers from a set defined by editors. We compose a functional definition of AI relevance by learning these subjects from paper metadata, and then inferring the arXiv-subject labels of papers in larger corpora: Clarivate Web of Science, Digital Science Dimensions, and Microsoft Academic Graph. This yields predictive classification F1 scores between .75 and .86 for Natural Language Processing (cs.CL), Computer Vision (cs.CV), and Robotics (cs.RO). For a single model that learns these and four other AI-relevant subjects (cs.AI, cs.LG, stat.ML, and cs.MA), we see precision of .83 and recall of .85. We evaluate the out-of-domain performance of our classifiers against other sources of topic information and predictions from alternative methods. We find that a supervised solution can generalize to identify publications that belong to the high-level fields of study represented on arXiv. This offers a method for identifying AI-relevant publications that updates at the pace of research output, without reliance on subject-matter experts for query development or labeling.
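For readers comparing the figures above, the F1 score is the harmonic mean of precision and recall. The sketch below applies that standard formula to the precision and recall quoted for the single model. How the authors aggregated scores across subjects is not stated in the abstract, so the result is only indicative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall quoted in the abstract for the single model.
print(round(f1_score(0.83, 0.85), 2))  # 0.84
```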

Digital Libraries

Impact of HTTP Cookie Violations in Web Archives

Certain HTTP Cookies on certain sites can be a source of content bias in archival crawls. Accommodating Cookies at crawl time, but not utilizing them at replay time may cause cookie violations, resulting in defaced composite mementos that never existed on the live web. To address these issues, we propose that crawlers store Cookies with short expiration time and archival replay systems account for values in the Vary header along with URIs.
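The second proposal can be pictured as a replay index whose lookup key combines the URI with the values of the request headers named in the capture's Vary header. The sketch below is an assumption about how such a key could be built, not the authors' implementation. The function `replay_key`, the in-memory `archive` dictionary, and the example URI are all hypothetical.

```python
def replay_key(uri, vary, request_headers):
    """Build a lookup key from the URI plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalize before comparing.
    names = sorted({name.strip().lower() for name in vary.split(",") if name.strip()})
    headers = {name.lower(): value for name, value in request_headers.items()}
    return (uri, tuple((name, headers.get(name, "")) for name in names))

# Hypothetical captures of one URI crawled with two different cookies.
archive = {}
archive[replay_key("http://example.com/", "Cookie", {"Cookie": "lang=en"})] = "English page"
archive[replay_key("http://example.com/", "Cookie", {"Cookie": "lang=fr"})] = "French page"

# At replay time the same key construction selects the matching capture.
request = {"cookie": "lang=fr"}
print(archive[replay_key("http://example.com/", "Cookie", request)])
```

Keying on the URI alone would collapse both captures into one entry, which is the kind of mismatch the abstract describes as a cookie violation.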

Digital Libraries

Impact of Web 2.0 Technologies on Academic Libraries: A Survey on Affiliated Colleges of Solapur University

The paper presents the results of a survey of academic libraries on the adoption and perceived impact of Web 2.0 technologies. A total of 26 college libraries affiliated with Solapur University participated. It was found that each library was using some form of technology, such as RSS, blogs, social networking sites, wikis, and instant messaging. Analysis of web technology usage across the colleges shows that most of the web technologies are not used by the majority of users due to lack of awareness, training, etc. The college libraries should conduct accurate and appropriate training according to the needs of the users. Systematic training will help users make maximum use of the library's e-resources. The leading web technologies, such as internet surfing, email, search engines, wikis, photo sharing, etc., are used by a great number of users on a daily or weekly basis. On the other hand, the majority of the web technologies are never used by a great number of users.

Digital Libraries

Impact of h-index on authors ranking: A comparative analysis of Scopus and WoS

In academia, the research performance of faculty members is evaluated either by the number of publications or by the number of citations. The h-index is widely used during hiring and faculty performance evaluation. The h-index is reported by various databases; however, there is no recent or systematic evidence about the differences between them. In this study, we compare the h-index compiled with Scopus and Web of Science (WoS) with the aim of analyzing the ranking of authors within a university. We analyze the publication records of 350 authors from Monash University (Australia). We also investigate the discipline-wise variation in the authors' ranking. 31% of the authors' profiles show no variation between the two datasets, whereas 55% of the authors' profiles show a higher count in Scopus and 9% in WoS. The maximum difference in h-index count between Scopus and WoS is 3. On average, 12.4% of publications per author are unique to Scopus and 4.1% to WoS; 53.5% of publications are common to both Scopus and WoS. Despite the larger number of unique publications in Scopus, there is no difference in the Spearman correlation coefficient between WoS and Scopus citation counts and h-index.
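For reference, an author's h-index is the largest h such that h of their publications have at least h citations each. Because each database indexes a different set of publications and citations, the same author can receive different values. A minimal sketch with made-up citation counts (the lists below are illustrative and not data from the study):

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one author as seen by two databases.
scopus_counts = [25, 18, 12, 9, 7, 6, 3, 1]
wos_counts = [22, 15, 10, 7, 5, 4, 2]

print(h_index(scopus_counts), h_index(wos_counts))  # 6 5
```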

