Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Martin Hlosta is active.

Publication


Featured research published by Martin Hlosta.


data warehousing and knowledge discovery | 2014

VGEN: Fast Vertical Mining of Sequential Generator Patterns

Philippe Fournier-Viger; Antonio Gomariz; Michal Šebek; Martin Hlosta

Sequential pattern mining is a popular data mining task with wide applications. However, the set of all sequential patterns can be very large. To discover fewer but more representative patterns, several compact representations of sequential patterns have been studied. The set of sequential generators is one of the most popular representations. It has been shown to provide higher accuracy for classification than using all or only closed sequential patterns. Furthermore, mining generators is a key step in several other data mining tasks such as sequential rule generation. However, mining generators is computationally expensive. To address this issue, we propose a novel mining algorithm named VGEN (Vertical sequential GENerator miner). An experimental study on five real datasets shows that VGEN is up to two orders of magnitude faster than the state-of-the-art algorithms for sequential generator mining.
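
For context, the sketch below illustrates the generator property that VGEN mines: a sequential pattern is a generator if none of its proper subsequences has the same support. Sequences are simplified to lists of single items here; the paper itself works with sequences of itemsets and a vertical database representation, so this is only an illustration of the definition, not of the algorithm.

```python
# Sketch of the "sequential generator" property (simplified: sequences of
# single items, brute-force support counting; VGEN uses a vertical layout).
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` with order preserved."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def support(pattern, database):
    """Number of sequences in the database containing the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)

def is_generator(pattern, database):
    """A generator has no proper subsequence with the same support."""
    sup = support(pattern, database)
    for k in range(len(pattern)):
        for sub in combinations(pattern, k):
            if support(list(sub), database) == sup:
                return False
    return True

db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "c"]]
print(is_generator(["a", "b"], db))  # True: no proper subsequence has support 2
```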


international learning analytics knowledge conference | 2017

Implementing predictive learning analytics on a large scale: the teacher's perspective

Christothea Herodotou; Bart Rienties; Avinash Boroowa; Zdenek Zdrahal; Martin Hlosta; Galina Naydenova

In this paper, we describe a large-scale study of the use of predictive learning analytics data with 240 teachers in 10 modules at a distance learning higher education institution. The aim of the study was to illuminate teachers' uses and practices of predictive data, and in particular to identify how predictive data was used to support students at risk of not completing or failing a module. Data were collected from statistical analysis of 17,033 students' performance by the end of the intervention, teacher usage statistics, and five individual semi-structured interviews with teachers. Findings revealed that teachers endorse the use of predictive data to support their practice, albeit in diverse ways, and raised the need for devising appropriate intervention strategies to support students at risk.


international learning analytics knowledge conference | 2017

Ouroboros: early identification of at-risk students without models based on legacy data

Martin Hlosta; Zdenek Zdrahal; Jaroslav Zendulka

This paper focuses on the problem of identifying students who are at risk of failing their course. The presented method proposes a solution in the absence of data from previous courses, which are usually used for training machine learning models. This situation typically occurs in new courses. We present the concept of a self-learner that builds the machine learning models from the data generated during the current course. The approach utilises information about already submitted assessments, which introduces the problem of imbalanced data for training and testing the classification models. There are three main contributions of this paper: (1) the concept of training the models for identifying at-risk students using data from the current course, (2) specifying the problem as a classification task, and (3) tackling the challenge of imbalanced data, which appears in both training and testing data. The results show a comparison with the traditional approach of learning the models from legacy course data, validating the proposed concept.
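
A minimal sketch of the self-learner idea, with made-up features and labels (total clicks and active days are illustrative placeholders, not the paper's feature set): a model is trained only on data from the current course, using whether a student has already submitted an assessment as the label, and its predicted probabilities flag not-yet-submitted students as potentially at risk.

```python
# Self-learner sketch on synthetic current-course data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_students = 500

# Assumed per-student features observed so far in the current course.
clicks = rng.poisson(lam=40, size=n_students)
active_days = rng.integers(0, 20, size=n_students)
X = np.column_stack([clicks, active_days])

# Label: 1 if the student has already submitted the assessment -> the classes
# are imbalanced early in the course, which is the problem the paper tackles.
submitted = (clicks + 2 * active_days + rng.normal(0, 10, n_students) > 70).astype(int)

# class_weight="balanced" is one simple way to counter the imbalance; the
# authors study the issue in more depth.
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, submitted)

# Students who have not submitted yet and have a low predicted probability of
# submitting are treated as "at risk".
at_risk_score = 1 - model.predict_proba(X[submitted == 0])[:, 1]
print(at_risk_score[:5])
```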


Scientific Data | 2017

Open University Learning Analytics dataset

Jakub Kuzilek; Martin Hlosta; Zdenek Zdrahal

Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and by optimising learning materials. To support research in this area, we have developed a dataset containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
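
A minimal sketch of working with the dataset in pandas, assuming the CSV files have been downloaded from the URL above and unpacked into a local OULAD/ directory; file and column names follow the dataset description, so adjust the paths if your copy differs.

```python
# Join demographics with aggregated VLE clickstream data from the OULAD CSVs.
import pandas as pd

students = pd.read_csv("OULAD/studentInfo.csv")   # demographics and final result
clicks = pd.read_csv("OULAD/studentVle.csv")      # daily click summaries per VLE item

# Total clicks per student per module presentation.
total_clicks = (clicks
                .groupby(["code_module", "code_presentation", "id_student"])["sum_click"]
                .sum()
                .reset_index())

# Merge clickstream aggregates with demographics for further analysis.
merged = students.merge(total_clicks,
                        on=["code_module", "code_presentation", "id_student"],
                        how="left")
print(merged[["id_student", "final_result", "sum_click"]].head())
```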


learning analytics and knowledge | 2016

Data literacy for learning analytics

Annika Wolff; John Moore; Zdenek Zdrahal; Martin Hlosta; Jakub Kuzilek

This workshop explores how data literacy impacts on learning analytics, both for practitioners and for end users. The term data literacy is used to broadly describe the set of abilities around the use of data as part of everyday thinking and reasoning for solving real-world problems. It is a skill required both by learning analytics practitioners to derive actionable insights from data and by the intended end users, such that it affects their ability to accurately interpret and critique presented analyses of data. The latter is particularly important, since learning analytics outcomes can be targeted at a wide range of end users, some of whom will be young students and many of whom are not data specialists. Whilst data literacy is rarely an end goal of learning analytics projects, this workshop aims to find where issues related to data literacy have impacted on project outcomes and where important insights have been gained. The workshop will further encourage the sharing of knowledge and experience through practical activities with datasets and visualisations, and aims to highlight the need for a greater understanding of data literacy as a field of study, especially with regard to communicating around large, complex data sets.


International Journal of Machine Learning and Computing | 2013

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Martin Hlosta; Rostislav Stríž; Jan Kupčík; Jaroslav Zendulka; Tomáš Hruška

Imbalance in data classification is a frequently discussed problem that is not well handled by classical classification techniques. The problem we tackled was to learn a binary classification model from large data with an accuracy constraint for the minority class. We propose a new meta-learning method that creates initial models using cost-sensitive learning by logistic regression and uses these models as initial chromosomes for a genetic algorithm. The method has been successfully tested on a large real-world data set from our internet security research. Experiments show that our method always leads to better results than using logistic regression or a genetic algorithm alone. Moreover, the method produces an easily understandable classification model.
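
A rough sketch of the meta-learning idea under illustrative assumptions: cost-sensitive logistic regression models trained with different class weights seed the initial population of a genetic algorithm, which then searches for coefficient vectors maximising accuracy subject to a minimum-recall constraint on the minority class. The synthetic data, fitness function, and toy mutation-only loop below are placeholders, not the authors' implementation.

```python
# Cost-sensitive logistic regression seeding a toy genetic algorithm.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=1)

def fitness(w, min_recall=0.8):
    """Accuracy of the linear model w, heavily penalised if the
    minority-class recall constraint is violated."""
    pred = (X @ w[:-1] + w[-1] > 0).astype(int)
    recall = pred[y == 1].mean()
    acc = (pred == y).mean()
    return acc if recall >= min_recall else acc - 1.0

# Initial chromosomes: coefficients of cost-sensitive logistic regression models.
population = []
for w1 in (1, 5, 10, 20):
    clf = LogisticRegression(class_weight={0: 1, 1: w1}, max_iter=1000).fit(X, y)
    population.append(np.append(clf.coef_[0], clf.intercept_[0]))

# Toy GA: keep the fittest chromosome and add mutated copies each generation.
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    best = population[0]
    population = [best] + [best + rng.normal(0, 0.05, best.shape) for _ in range(7)]

best = max(population, key=fitness)
print("best constrained fitness:", round(fitness(best), 3))
```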


artificial intelligence in education | 2018

Investigating Influence of Demographic Factors on Study Recommenders

Michal Huptych; Martin Hlosta; Zdenek Zdrahal; Jakub Kocvara

Recommender systems in e-learning platforms can utilise various data about learners in order to provide them with the next best material to study. We build on our previous work, which defines the recommendations in terms of two measures (relevance and effort) calculated from data of successful students in previous runs of the courses. In this paper, we investigate the impact of students' socio-demographic factors and analyse how these factors improve the recommendations. Education and age were found to have a significant impact on engagement with materials.


Knowledge Based Systems | 2018

Are we meeting a deadline? classification goal achievement in time in the presence of imbalanced data

Martin Hlosta; Zdenek Zdrahal; Jaroslav Zendulka

This paper addresses the problem of a finite set of entities that are required to achieve a goal within a predefined deadline. For example, a group of students is supposed to submit homework by a specified cut-off date. Further, we are interested in predicting which entities will achieve the goal within the deadline. The predictive models are built based only on the data from that population. The predictions are computed at various time instants by taking into account updated data about the entities. The first contribution of the paper is a formal description of the problem. The important characteristic of the proposed method for model building is the use of the properties of entities that have already achieved the goal. We call such an approach “Self-Learning”. Since typically only a few entities have achieved the goal at the beginning and their number gradually grows, the problem is inherently imbalanced. To mitigate the curse of imbalance, we improved the Self-Learning method by tackling information loss and by several sampling techniques. The original Self-Learning method and its modifications have been evaluated in a case study on predicting submission of the first assessment in distance higher education courses. The results show that the proposed improvements outperform the two specified baseline models and the original Self-Learner, and that the best results are achieved when domain-driven techniques are used to tackle the imbalance problem. We also show that these improvements are statistically significant using the Wilcoxon signed-rank test.
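
One of the simplest sampling techniques of the kind compared in the paper is random oversampling of the minority class before training; the sketch below uses placeholder features and labels and does not reproduce the domain-driven variants that performed best.

```python
# Random oversampling of the minority class ("already achieved the goal").
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))                 # placeholder features
y = (rng.random(300) < 0.08).astype(int)      # ~8% minority class

# Duplicate randomly chosen minority rows until the classes are balanced.
minority_idx = np.flatnonzero(y == 1)
extra = rng.choice(minority_idx, size=(y == 0).sum() - minority_idx.size, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

model = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print("balanced training size:", len(y_bal), "positives:", int(y_bal.sum()))
```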


international learning analytics knowledge conference | 2017

Measures for recommendations based on past students' activity

Michal Huptych; Michal Bohuslavek; Martin Hlosta; Zdenek Zdrahal

This paper introduces two measures for the recommendation of study materials based on students' past study activity. We use records from the Virtual Learning Environment (VLE) and analyse the activity of previous students. We assume that the activity of past students represents patterns which can be used as a basis for recommendations to current students. The measures we define are Relevance, which describes the expected VLE activity derived from previous students of the course, and Effort, which represents the actual effort of individual current students. Based on these measures, we propose a composite measure, which we call Importance. We use data from previous course presentations to evaluate the consistency of students' behaviour. We use the correlation of the defined measures Relevance and Average Effort to evaluate the behaviour of two different student cohorts, and the Root Mean Square Error to measure the deviation between Average Effort and individual student Effort.
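
A rough sketch of the spirit of the three measures, with assumed formulas (the paper's actual definitions may differ): Relevance of a material is taken as the fraction of successful past students who accessed it, Effort as a current student's clicks normalised by the past students' average clicks, and Importance as Relevance weighted by the shortfall in Effort.

```python
# Illustrative (assumed) versions of Relevance, Effort and Importance.
past_clicks = {"quiz_1": [5, 3, 0, 4], "reading_2": [1, 0, 0, 2]}  # successful past students
current_clicks = {"quiz_1": 1, "reading_2": 2}                     # one current student

def relevance(material):
    counts = past_clicks[material]
    return sum(c > 0 for c in counts) / len(counts)

def effort(material):
    avg_past = sum(past_clicks[material]) / len(past_clicks[material])
    return current_clicks.get(material, 0) / avg_past if avg_past else 0.0

def importance(material):
    return relevance(material) * max(0.0, 1.0 - effort(material))

for m in past_clicks:
    print(m, round(relevance(m), 2), round(effort(m), 2), round(importance(m), 2))
```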


advanced data mining and applications | 2013

MLSP: Mining Hierarchically-Closed Multi-Level Sequential Patterns

Michal Šebek; Martin Hlosta; Jaroslav Zendulka; Tomáš Hruška

The problem of mining sequential patterns has been widely studied, and many efficient algorithms for solving it have been published. In some cases, there can be implicitly or explicitly defined taxonomies (hierarchies) over the input items (e.g. product categories in an e-shop or sub-domains in the DNS system). However, how to deal with taxonomies in sequential pattern mining has been only marginally discussed. In this paper, we formulate the problem of mining hierarchically-closed multi-level sequential patterns and demonstrate its usefulness. We present the MLSP algorithm, based on on-demand generalization, which outperforms other similar algorithms for mining multi-level sequential patterns.
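
A minimal sketch of the taxonomy ("multi-level") aspect, with an illustrative hierarchy and items rather than the MLSP algorithm itself: a pattern expressed at a higher level of the taxonomy is matched against sequences of leaf items by walking up the child-to-parent hierarchy.

```python
# Matching a generalised (multi-level) pattern against sequences of leaf items.
taxonomy = {                      # child -> parent
    "iphone": "phone", "galaxy": "phone",
    "phone": "electronics", "tv": "electronics",
}

def ancestors(item):
    """The item together with all of its ancestors in the taxonomy."""
    result = {item}
    while item in taxonomy:
        item = taxonomy[item]
        result.add(item)
    return result

def matches(pattern, sequence):
    """Order-preserving subsequence match that allows generalised pattern items."""
    it = iter(sequence)
    return all(any(p in ancestors(s) for s in it) for p in pattern)

print(matches(["phone", "electronics"], ["iphone", "book", "tv"]))  # True
```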

Collaboration


Dive into Martin Hlosta's collaborations.

Top Co-Authors

Jakub Kuzilek
Czech Technical University in Prague

Jaroslav Zendulka
Brno University of Technology

Michal Šebek
Brno University of Technology

Tomáš Hruška
Brno University of Technology