Petr Knoth | Researchain

Publication

Featured research published by Petr Knoth.

Archive | 2014

Predicting Student Performance from Combined Data Sources

Annika Wolff; Zdenek Zdrahal; Drahomira Herrmannova; Petr Knoth

This chapter will explore the use of predictive modeling methods for identifying students who will benefit most from tutor interventions. This is a growing area of research and is especially useful in distance learning where tutors and students do not meet face to face. The methods discussed will include decision-tree classification, support vector machine (SVM), general unary hypotheses automaton (GUHA), Bayesian networks, and linear and logistic regression. These methods have been trialed through building and testing predictive models using data from several Open University (OU) modules. The Open University offers a good test-bed for this work, as it is one of the largest distance learning institutions in Europe. The chapter will discuss how the predictive capacity of the different sources of data changes as the course progresses. It will also highlight the importance of understanding how a student’s pattern of behavior changes during the course.

D-Lib Magazine | 2012

Visual Search for Supporting Content Exploration in Large Document Collections

Drahomira Herrmannova; Petr Knoth

In recent years a number of new approaches for visualising and browsing document collections have been developed. These approaches try to address the problems associated with the growing amounts of content available and the changing patterns in the way people interact with information. Users now demand better support for exploring document collections to discover connections, compare and contrast information. Although visual search interfaces have the potential to improve the user experience in exploring document collections compared to textual search interfaces, they have not yet become as popular among users. The reasons for this range from the design of such visual interfaces to the way these interfaces are implemented and used. In this paper we study these reasons and determine the factors that contribute to an improved visual browsing experience. Consequently, by taking these factors into account, we propose a novel visual search interface that improves exploratory search and the discovery of document relations. We explain our universal approach, and how it could be applied to any document collection, such as news articles, cultural heritage artifacts or research papers.

D-Lib Magazine | 2016

An Analysis of the Microsoft Academic Graph

Drahomira Herrmannova; Petr Knoth

In this paper we analyse a new dataset of scholarly publications, the Microsoft Academic Graph (MAG). The MAG is a heterogeneous graph comprised of over 120 million publication entities and related authors, institutions, venues and fields of study. It is also the largest publicly available dataset of citation data. As such, it is an important resource for scholarly communications research. As the dataset is assembled using automatic methods, it is important to understand its strengths and limitations, especially whether there is any noise or bias in the data, before applying it to a particular task. This article studies the characteristics of the dataset and provides a correlation analysis with other publicly available research publication datasets to answer these questions. Our results show that the citation data and publication metadata correlate well with external datasets. The MAG also has very good coverage across different domains with a slight bias towards technical disciplines. On the other hand, there are certain limitations to completeness. Only 30 million papers out of 127 million have some citation data. While these papers have a good mean total citation count that is consistent with expectations, there is some level of noise when extreme values are considered. Other current limitations of MAG are the availability of affiliation information, as only 22 million papers have these data, and the normalisation of institution names.

Proceedings of the 1st Workshop on Scholarly Web Mining | 2017

Building recommender systems for scholarly information

Maya Hristakeva; Daniel Kershaw; Marco Rossetti; Petr Knoth; Benjamin Pettit; Saúl Vargas; Kris Jack

The depth and breadth of research now being published is overwhelming for an individual researcher to keep track of, let alone consume. Recommender systems have been developed to make it easier for researchers to discover relevant content. However, these have predominantly taken the form of item-to-item recommendations using citation network features or text similarity features. This paper details how the Mendeley Suggest recommender system has been designed and developed. We show how implicit user feedback (based on activity data from the reference manager) and collaborative filtering (CF) are used to generate the recommendations for Mendeley Suggest. Because collaborative filtering suffers from the cold start problem (the inability to serve recommendations to new users), we developed additional recommendation methods based on user-defined attributes, such as discipline and research interests. Our off-line evaluation shows that, where possible, recommendations based on collaborative filtering perform best, followed by recommendations based on recent activity. However, for cold users (for whom collaborative filtering was not possible), recommendations based on discipline performed best. Additionally, when we segmented users by career stage, we found that among senior academics, content-based recommendations from recent activity had comparable performance to collaborative filtering. This justifies our approach of developing a variety of recommendation methods in order to serve a range of users across the academic spectrum.
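As an illustration of the fallback idea described in this abstract, the following is a minimal sketch, not the Mendeley Suggest implementation. It assumes a toy overlap-based collaborative filter and a discipline-popularity fallback for cold users; the function `recommend`, the data layout, and the example users and papers are all hypothetical.

```python
from collections import Counter


def recommend(user, interactions, discipline_of, k=3):
    """Return up to k item ids for `user` (hypothetical toy recommender).

    interactions: dict mapping user -> set of item ids (implicit feedback)
    discipline_of: dict mapping user -> discipline label (user-defined attribute)
    """
    seen = interactions.get(user, set())
    scores = Counter()

    if seen:
        # Collaborative filtering: score unseen items from other users,
        # weighted by how many items their library shares with this user.
        for other, items in interactions.items():
            if other == user:
                continue
            overlap = len(seen & items)
            if overlap:
                for item in items - seen:
                    scores[item] += overlap

    if not scores:
        # Cold start: no usable overlap, so fall back to items that are
        # popular among users who declared the same discipline.
        for other, items in interactions.items():
            if other != user and discipline_of.get(other) == discipline_of.get(user):
                for item in items - seen:
                    scores[item] += 1

    # Highest score first; ties broken by item id so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Invented example data: "dan" has an empty library and is therefore a cold user.
interactions = {
    "ann": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "cat": {"p3", "p4"},
    "dan": set(),
}
discipline_of = {"ann": "cs", "bob": "cs", "cat": "bio", "dan": "bio"}

warm_recs = recommend("ann", interactions, discipline_of)
cold_recs = recommend("dan", interactions, discipline_of)
```

In this sketch the warm user receives recommendations from overlapping libraries, while the cold user is served from the declared discipline only. A production system would likely use a proper similarity model and separate evaluation per user segment.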

ACM/IEEE Joint Conference on Digital Libraries | 2014

Design of Europeana cloud technical infrastructure

Pavel Kats; Petr Knoth; Georgios Mamakis; Marcin Mielnicki; Markus Muhr; Marcin Werla

In this paper, we present an overview of the Europeana Cloud system, a new undertaking of the Europeana Foundation and partner institutions that aims to provide a shared, cloud-based infrastructure for the aggregation and exchange of cultural heritage metadata and content among European institutions.

International Conference on Theory and Practice of Digital Libraries | 2017

Incidental or Influential? - Challenges in Automatically Detecting Citation Importance Using Publication Full Texts

David Pride; Petr Knoth

This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publications’ full text. We analyse a range of features that have been previously used in this task. Our experimental results confirm that the number of in-text references is highly predictive of influence. Contrary to the work of Valenzuela et al. (2015) [1], we find abstract similarity to be one of the most predictive features. Overall, we show that many of the features previously described in the literature are not particularly predictive. Consequently, we discuss challenges and potential improvements in the classification pipeline, provide a critical review of the performance of individual features and address the importance of constructing a large-scale gold-standard reference dataset.

Theory and Practice of Digital Libraries | 2011

Connecting repositories in the open access domain using text mining and semantic data

Petr Knoth; Vojtech Robotka; Zdenek Zdrahal

This paper presents CORE (COnnecting REpositories), a system that aims to facilitate access to and navigation across scientific papers stored in Open Access repositories. This is achieved by harvesting metadata and full-text content from Open Access repositories, by applying text mining techniques to discover semantically related articles, and by representing and exposing these relations as Linked Data. The information about associations between articles expressed in an interoperable format will enable the emergence of a wide range of applications. The potential of CORE can be demonstrated on two use cases: (1) improving the navigation capabilities of digital libraries by means of a CORE plugin, and (2) providing access to digital content from smartphones and tablet devices by means of the CORE Mobile application.

Proceedings of the 1st Workshop on Scholarly Web Mining | 2017

Citations and Readership are Poor Indicators of Research Excellence: Introducing TrueImpactDataset, a New Dataset for Validating Research Evaluation Metrics

Drahomira Herrmannova; Robert M. Patton; Petr Knoth; Christopher G. Stahl

In this paper we show that citation counts and Mendeley readership are poor indicators of research excellence. Our experimental design builds on the assumption that a good evaluation metric should be able to distinguish publications that have changed a research field from those that have not. The experiment has been conducted on a new dataset for bibliometric research which we call TrueImpactDataset. TrueImpactDataset is a collection of research publications of two types: research papers which are considered seminal work in their area, and papers which provide a survey (a literature review) of a research area. The dataset also contains related metadata, which include DOIs, titles, authors and abstracts. We describe how the dataset was built and provide overview statistics of the dataset. We propose to use the dataset for validating research evaluation metrics. By using this data, we show that widely used research metrics only poorly distinguish excellent research.

ACM/IEEE Joint Conference on Digital Libraries | 2016

Semantometrics: Towards Fulltext-based Research Evaluation

Drahomira Herrmannova; Petr Knoth

In recent years, there has been a growing interest in developing new research evaluation methods that could go beyond the traditional citation-based metrics. This interest is motivated on one side by the wider availability or even emergence of new information evidencing research performance, such as article downloads, views and Twitter mentions, and on the other side by the continued frustrations and problems surrounding the application of purely citation-based metrics to evaluate research performance in practice. Semantometrics are a new class of research evaluation metrics which build on the premise that full text is needed to assess the value of a publication. This paper reports on an analysis carried out to investigate the properties of the semantometric contribution measure [1], which uses semantic similarity of publications to estimate research contribution, and provides a comparative study of the contribution measure with traditional bibliometric measures based on citation counting.

International Conference on Theory and Practice of Digital Libraries | 2018

Peer Review and Citation Data in Predicting University Rankings, a Large-Scale Analysis

David Pride; Petr Knoth

Most Performance-based Research Funding Systems (PRFS) draw on peer review and bibliometric indicators, two different methodologies which are sometimes combined. A common argument against the use of indicators in such research evaluation exercises is their low correlation at the article level with peer review judgments. In this study, we analyse 191,000 papers from 154 higher education institutes which were peer reviewed in a national research evaluation exercise. We combine these data with 6.95 million citations to the original papers. We show that when citation-based indicators are applied at the institutional or departmental level, rather than at the level of individual papers, surprisingly large correlations with peer review judgments can be observed, up to \(r \leq 0.802,\ n = 37,\ p < 0.001\) for some disciplines. In our evaluation of ranking prediction performance based on citation data, we show we can reduce the mean rank prediction error by 25% compared to previous work. This suggests that citation-based indicators are sufficiently aligned with peer review results at the institutional level to be used to lessen the overall burden of peer review on national evaluation exercises leading to considerable cost savings.
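To make the contrast between paper-level and institution-level correlation concrete, here is a small sketch under stated assumptions: it uses a Pearson correlation coefficient and per-institution means as the aggregate indicator, which may differ from the indicators used in the study. The helper names and the six toy records are invented for illustration and are not data from the paper.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def institution_means(papers):
    """Aggregate (institution, citations, peer_score) records per institution."""
    groups = defaultdict(list)
    for institution, citations, peer_score in papers:
        groups[institution].append((citations, peer_score))
    names = sorted(groups)
    mean_citations = [sum(c for c, _ in groups[i]) / len(groups[i]) for i in names]
    mean_scores = [sum(s for _, s in groups[i]) / len(groups[i]) for i in names]
    return names, mean_citations, mean_scores


# Invented toy records: (institution, citation count, peer review score).
papers = [
    ("A", 2, 3), ("A", 10, 2),
    ("B", 1, 2), ("B", 5, 1),
    ("C", 20, 4), ("C", 0, 3),
]

# Correlation computed over individual papers.
paper_r = pearson([c for _, c, _ in papers], [s for _, _, s in papers])

# Correlation computed over per-institution averages.
_, inst_citations, inst_scores = institution_means(papers)
institution_r = pearson(inst_citations, inst_scores)
```

In this toy example, averaging within institutions smooths out paper-level noise, so `institution_r` comes out higher than `paper_r`. That mirrors the direction of the effect reported in the abstract, but says nothing about its size on real data.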
