Featured Research

Digital Libraries

Large publishing consortia produce higher citation impact research but co-author contributions are hard to evaluate

This paper introduces a simple agglomerative clustering method to identify large publishing consortia with at least 20 authors and 80% shared authorship between articles. Based on Scopus journal articles 1996-2018, under these criteria, nearly all (88%) of the large consortia published research with citation impact above the world average, with the exceptions being mainly the newer consortia for which average citation counts are unreliable. On average, consortium research had almost double (1.95) the world average citation impact on the log scale used (Mean Normalised Log Citation Score). At least partial alphabetical author ordering was the norm in most consortia. The 250 largest consortia were in nuclear physics and astronomy, organised around expensive equipment, and in predominantly health-related areas of genomics, medicine, public health, microbiology and neuropsychology. For the health-related consortia, except for the first and last few authors, authorship seems primarily to indicate contributions to the shared project infrastructure necessary to gather the raw data. It is impossible for research evaluators to identify the contributions of individual authors in the huge alphabetical consortia of physics and astronomy, and problematic for the middle and end authors of health-related consortia. For small-scale evaluations, authorship contribution statements could be used, when available.
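The clustering criteria above can be illustrated with a minimal sketch. It is not the paper's implementation: the overlap measure (shared authors divided by the smaller author set), the greedy merge order, and the names `shared_fraction` and `cluster_consortia` are all assumptions made for illustration.

```python
from itertools import combinations

MIN_AUTHORS = 20   # threshold stated in the abstract
MIN_SHARED = 0.8   # 80% shared authorship stated in the abstract


def shared_fraction(a, b):
    """Fraction of the smaller author set found in the other set (assumed definition)."""
    return len(a & b) / min(len(a), len(b))


def cluster_consortia(articles, min_authors=MIN_AUTHORS, min_shared=MIN_SHARED):
    """Greedy agglomerative clustering of articles by author overlap.

    articles: dict mapping article id -> set of author ids (hypothetical input format).
    Returns a list of (article_ids, author_ids) tuples, one per cluster.
    """
    # Start with one cluster per sufficiently large article.
    clusters = [({aid}, set(authors))
                for aid, authors in articles.items()
                if len(authors) >= min_authors]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            ids_i, auth_i = clusters[i]
            ids_j, auth_j = clusters[j]
            if shared_fraction(auth_i, auth_j) >= min_shared:
                # Merge cluster j into cluster i, then rescan from the start.
                clusters[i] = (ids_i | ids_j, auth_i | auth_j)
                del clusters[j]
                merged = True
                break
    return clusters


# Toy data: a1 and a2 share 22 authors; a3 is unrelated; a4 is too small.
articles = {
    "a1": set(range(25)),
    "a2": set(range(22)) | {100, 101},
    "a3": set(range(200, 230)),
    "a4": set(range(5)),
}
consortia = cluster_consortia(articles)
```

Representing each cluster by the union of its members' authors keeps the sketch short; a different linkage rule would be an equally plausible reading of the abstract.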


Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic

We present a large-scale comparison of five multidisciplinary bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. The comparison considers scientific documents from the period 2008-2017 covered by these data sources. Scopus is compared in a pairwise manner with each of the other data sources. We first analyze differences between the data sources in the coverage of documents, focusing for instance on differences over time, differences per document type, and differences per discipline. We then study differences in the completeness and accuracy of citation links. Based on our analysis, we discuss strengths and weaknesses of the different data sources. We emphasize the importance of combining a comprehensive coverage of the scientific literature with a flexible set of filters for making selections of the literature.
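A pairwise coverage comparison over time could be sketched as follows. This assumes documents in both sources have already been matched to a shared identifier, which is itself a hard step not shown here; the function name `coverage_by_year` and the input format are hypothetical.

```python
from collections import Counter


def coverage_by_year(reference, other):
    """Share of reference documents also present in another source, per year.

    reference, other: dicts mapping a shared document id -> publication year
    (assumed input format; matching documents across sources is not shown).
    """
    total = Counter(reference.values())
    shared = Counter(year for doc_id, year in reference.items() if doc_id in other)
    # Counter returns 0 for years with no shared documents.
    return {year: shared[year] / total[year] for year in sorted(total)}


# Toy data with made-up identifiers.
scopus = {"d1": 2008, "d2": 2008, "d3": 2009, "d4": 2010}
crossref = {"d1": 2008, "d3": 2009, "d9": 2010}
print(coverage_by_year(scopus, crossref))
```

The same function could be keyed on document type or discipline instead of year to mirror the other breakdowns mentioned in the abstract.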


Lest We Forget: A Dataset of Coronavirus-Related News Headlines in Swiss Media

We release our COVID-19 news dataset, containing more than 10,000 links to news articles related to the Coronavirus pandemic published in the Swiss media since early January 2020. This collection can prove beneficial for mining and analysing the reaction of the Swiss media to the COVID-19 pandemic and for extracting insightful information for further research. We hope this dataset helps researchers and the public deliver results that will help analyse the pandemic and potentially lead to a better understanding of the events.


Like-for-like bibliometric substitutes for peer review: advantages and limits of indicators calculated from the ep index

The use of bibliometric indicators would simplify research assessments. The 2014 Research Excellence Framework (REF) is a peer review assessment of UK universities, whose results can be taken as benchmarks for bibliometric indicators. In this study we use the REF results to investigate whether the ep index and a top percentile of most cited papers could substitute for peer review. The probability that a randomly chosen paper from a university reaches a certain top percentile in the global distribution of papers is a power of the ep index, which can be calculated from the citation-based distribution of the university's papers in global top percentiles. Making use of the ep index in each university and research area, we calculated the ratios between the percentage of 4-star-rated outputs in REF and the percentages of papers in global top percentiles. Then, we fixed the assessment percentile so that the mean ratio between these two indicators across universities is 1.0. This method was applied to four units of assessment in REF: Chemistry, Economics & Econometrics (joined with Business & Management Studies), and Physics. Some relevant deviations from the 1.0 ratio could be explained by the evaluation procedure in REF or by the characteristics of the research field; other deviations need specific studies by experts in the research area. The present results indicate that in many research areas the substitution of a top percentile indicator for peer review is possible. However, this substitution cannot be made straightforwardly; more research is needed to establish the conditions of the bibliometric assessment.


Linking Publications to Funding at Project Level: A curated dataset of publications reported by FP7 projects

Datasets explicitly linking publications to funding at project level are the basis of evaluative bibliometric analysis of funding programmes. Analysis of the impact of the EU funding programmes has often been frustrated by the lack of data on publications to which the funding has contributed. Here we present a dataset of scholarly publications reported by the projects funded by the European Union under the 7th Framework Programme. The dataset was created by first consolidating data from different reporting channels, then validating the records by systematically matching them to external authoritative sources and assigning them external identifiers. The initial dataset had 299,000 records linked to one or more projects, of which 68% had a digital object identifier (DOI). Through data quality assurance, we validated 92% of the initial records (277,000) and assigned a DOI to 89% of them (267,000). The resulting dataset has 240,000 unique DOIs. It is, to our knowledge, the first comprehensive and curated dataset of scholarly outputs of the Framework Programme. The dataset could only be created thanks to significant improvements and investments made in the reporting systems used by EU-funded projects. The dataset is available on Zenodo: this https URL
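One small part of such a pipeline, normalising reported DOIs and counting unique ones, might look like the sketch below. It only checks DOI syntax and does not reproduce the authors' validation against external authoritative sources; the record keys `project` and `doi`, the function names, and the regular expression are assumptions.

```python
import re

# Permissive syntactic pattern for modern DOIs (an assumption, not an official rule).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(raw):
    """Lower-case a reported DOI and strip common resolver or 'doi:' prefixes."""
    doi = raw.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def summarize(records):
    """Count records, syntactically valid DOIs, and unique DOIs.

    records: list of dicts with hypothetical keys "project" and "doi";
    "doi" may be missing or None.
    """
    valid = []
    for record in records:
        raw = record.get("doi")
        if not raw:
            continue
        doi = normalize_doi(raw)
        if DOI_PATTERN.match(doi):
            valid.append(doi)
    return {
        "records": len(records),
        "with_valid_doi": len(valid),
        "unique_dois": len(set(valid)),
    }


# Toy records with made-up identifiers.
records = [
    {"project": "p1", "doi": "https://doi.org/10.1000/ABC.1"},
    {"project": "p2", "doi": "doi:10.1000/abc.1"},
    {"project": "p1", "doi": "not-a-doi"},
    {"project": "p3", "doi": None},
    {"project": "p3", "doi": "10.5555/xyz"},
]
print(summarize(records))
```

Because one publication can be reported by several projects, the number of unique DOIs is expected to be lower than the number of records with a DOI, as in the figures quoted above.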


Looking Through Glass: Knowledge Discovery from Materials Science Literature using Natural Language Processing

Most of the knowledge in materials science literature is in the form of unstructured data such as text and images. Here, we present a framework employing natural language processing, which automates text and image comprehension and precision knowledge extraction from the literature on inorganic glasses. The abstracts are automatically categorized using latent Dirichlet allocation (LDA), providing a way to classify and search semantically linked publications. Similarly, a comprehensive summary of images and plots is presented using the 'Caption Cluster Plot' (CCP), which provides direct access to the images buried in the papers. Finally, we combine the LDA and CCP with the chemical elements occurring in the manuscript to present an 'Elemental map', a topical and image-wise distribution of chemical elements in the literature. Overall, the framework presented here can be a generic and powerful tool to extract and disseminate material-specific information on composition-structure-processing-property dataspaces, allowing insights into fundamental problems relevant to the materials science community and accelerated materials discovery.


Lost or found? Discovering data needed for research

Finding data is a necessary precursor to being able to reuse data, although relatively little large-scale empirical evidence exists about how researchers discover, make sense of and (re)use data for research. This study presents evidence from the largest known survey investigating how researchers discover and use data that they do not create themselves. We examine the data needs and discovery strategies of respondents, propose a typology for data reuse and probe the role of social interactions and literature search in data discovery. We consider how data communities can be conceptualized according to data uses and propose practical applications of our findings for designers of data discovery systems and repositories. Specifically, we consider how to design for a diversity of practices, how communities of use can serve as an entry point for design and the role of metadata in supporting both sensemaking and social interactions.


Lotka's Law and Pattern of Author Productivity in the Field of Brain Concussion Research: A Scientometric Analysis

This study presents a scientometric analysis of 8486 publications retrieved from the Web of Science database for the period 2008 to 2017. Data were collected and analyzed using Bibexcel software. The study focuses on various aspects of quantitative research, such as year-wise growth of papers, Collaborative Index (CI), Degree of Collaboration (DC), Co-authorship Index (CAI), Collaborative Coefficient (CC), Modified Collaborative Coefficient (MCC), Lotka's exponent value, and the Kolmogorov-Smirnov test (K-S test).
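Several of the indicators listed above have simple standard definitions. The sketch below uses the usual textbook forms (DC as the share of multi-authored papers, CI as mean authors per paper, CC as one minus the mean of 1/j over papers with j authors, and Lotka's law as C / x^n). It is not taken from the study itself, and the input format and function names are assumptions.

```python
def collaboration_indicators(papers_by_author_count):
    """Compute DC, CI and CC from a distribution of papers by number of authors.

    papers_by_author_count: dict mapping j (authors per paper) -> number of papers
    with exactly j authors (hypothetical input format).
    """
    total = sum(papers_by_author_count.values())
    single = papers_by_author_count.get(1, 0)
    # Degree of Collaboration: share of multi-authored papers.
    dc = (total - single) / total
    # Collaborative Index: mean number of authors per paper.
    ci = sum(j * f for j, f in papers_by_author_count.items()) / total
    # Collaborative Coefficient: 1 minus the mean of 1/j over all papers.
    cc = 1 - sum(f / j for j, f in papers_by_author_count.items()) / total
    return {"DC": dc, "CI": ci, "CC": cc}


def lotka_expected(c, n, x):
    """Expected number of authors with x papers under Lotka's law, y = C / x**n."""
    return c / x ** n


# Toy distribution: 2 single-author, 2 two-author and 1 four-author paper.
print(collaboration_indicators({1: 2, 2: 2, 4: 1}))
print(lotka_expected(100, 2, 2))
```

In practice the exponent n and constant C would be estimated from the observed author productivity data, and the fit checked with a K-S test, which is not shown here.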


MITAO: a tool for enabling scholars in the Humanities to use Topic Modelling in their studies

Automatic text analysis methods, such as Topic Modelling, are gaining much attention in the Humanities. However, scholars need extensive coding skills to use such methods appropriately. The need for this technical expertise prevents the broad adoption of these methods in Humanities research. In this paper, to help Humanities scholars with no or limited coding skills use Topic Modelling, we introduce MITAO, a web-based tool that allows the definition of a visual workflow which embeds various automatic text analysis operations. It also allows one to store and share both the workflow and the results of its execution with other researchers, which enables the reproducibility of the analysis. We present an example application of Topic Modelling with MITAO using a collection of English abstracts of the articles published in "Umanistica Digitale". The results returned by MITAO are shown with dynamic web-based visualizations, which allowed us to gain preliminary insights into the evolution of the topics treated over time in the articles published in "Umanistica Digitale". All the results, along with the defined workflows, are published and accessible for further studies.


MODMA dataset: a Multi-modal Open Dataset for Mental-disorder Analysis

According to the World Health Organization, the number of mental disorder patients, especially depression patients, has grown rapidly and become a leading contributor to the global burden of disease. However, the present common practice of depression diagnosis is based on interviews and clinical scales carried out by doctors, which is both labor-intensive and time-consuming. One important reason is the lack of physiological indicators for mental disorders. With the rise of tools such as data mining and artificial intelligence, using physiological data to explore new possible physiological indicators of mental disorder and creating new applications for mental disorder diagnosis has become a hot research topic. However, good quality physiological data for mental disorder patients are hard to acquire. We present a multi-modal open dataset for mental-disorder analysis. The dataset includes EEG and audio data from clinically depressed patients and matching normal controls. All our patients were carefully diagnosed and selected by professional psychiatrists in hospitals. The EEG dataset includes not only data collected using a traditional 128-electrode elastic cap, but also data from a novel wearable 3-electrode EEG collector for pervasive applications. The 128-electrode EEG signals of 53 subjects were recorded both in resting state and under stimulation; the 3-electrode EEG signals of 55 subjects were recorded in resting state; the audio data of 52 subjects were recorded during interviewing, reading, and picture description. We encourage other researchers in the field to use it for testing their methods of mental-disorder analysis.
