Mykola Pechenizkiy | Researchain

Publication

Featured research published by Mykola Pechenizkiy.

ACM Computing Surveys | 2014

A survey on concept drift adaptation

João Gama; Indrė Žliobaitė; Albert Bifet; Mykola Pechenizkiy; Abdelhamid Bouchachia

Concept drift primarily refers to an online supervised learning scenario when the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning in this article, we characterize adaptive learning processes; categorize existing strategies for handling concept drift; overview the most representative, distinct, and popular techniques and algorithms; discuss evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.

Information Fusion | 2005

Diversity in search strategies for ensemble feature selection

Alexey Tsymbal; Mykola Pechenizkiy; Pádraig Cunningham

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. It was shown theoretically and experimentally that in order for an ensemble to be effective, it should consist of base classifiers that have diversity in their predictions. One technique, which proved to be effective for constructing an ensemble of diverse base classifiers, is the use of different feature subsets, or so-called ensemble feature selection. Many ensemble feature selection strategies incorporate diversity as an objective in the search for the best collection of feature subsets. A number of ways are known to quantify diversity in ensembles of classifiers, and little research has been done about their appropriateness to ensemble feature selection. In this paper, we compare five measures of diversity with regard to their possible use in ensemble feature selection. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the ensemble accuracy and other characteristics for the ensembles built with ensemble feature selection based on the considered measures of diversity. We consider four search strategies for ensemble feature selection together with the simple random subspacing: genetic search, hill-climbing, and ensemble forward and backward sequential selection. In the experiments, we show that, in some cases, the ensemble feature selection process can be sensitive to the choice of the diversity measure, and that the question of the superiority of a particular measure depends on the context of the use of diversity and on the data being processed. In many cases and on average, the plain disagreement measure is the best. Genetic search, kappa, and dynamic voting with selection form the best combination of a search strategy, diversity measure and integration method.
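As an illustration of the plain disagreement measure mentioned in the abstract above, the following is a minimal sketch. It assumes the usual reading of the measure (the fraction of instances on which two classifiers predict different labels, averaged over all pairs for an ensemble). The function names and toy predictions are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def plain_disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and of equal length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def ensemble_diversity(all_preds):
    """Average pairwise plain disagreement over all pairs of base classifiers."""
    pairs = list(combinations(all_preds, 2))
    if not pairs:
        raise ValueError("need at least two classifiers")
    return sum(plain_disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions of three base classifiers on four instances.
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
diversity = ensemble_diversity(preds)
print(round(diversity, 4))
```

A search strategy for ensemble feature selection could use a score like this as the diversity term of its objective; how it is weighted against accuracy is left open here.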

adaptive hypermedia conference | 2009

AH 12 years later: a comprehensive survey of adaptive hypermedia methods and techniques

Evgeny Knutov; Paul De Bra; Mykola Pechenizkiy

A hypermedia application offers its users much freedom to navigate through a large hyperspace. Adaptive hypermedia (AH) offers personalized content, presentation, and navigation support. Many adaptive hypermedia systems (AHS) are tightly integrated with one specific application and/or use a limited number of techniques and methods. This makes it difficult to capture all of them in one generic model. In this paper we examine adaptation questions stated in the very beginning of the AH era and elaborate on their recent interpretations. We will reconsider design issues for application independent generic AHS, review open questions of system extensibility introduced in adjacent research fields and try to come up with an up-to-date taxonomy of adaptation techniques and an extensive set of requirements for a new adaptive system reference model or architecture, to be developed in the future.

Information Fusion | 2008

Dynamic integration of classifiers for handling concept drift

Alexey Tsymbal; Mykola Pechenizkiy; Pádraig Cunningham; Seppo Puuronen

In the real world concepts are often not stable but change with time. A typical example of this in the biomedical context is antibiotic resistance, where pathogen sensitivity may change over time as new pathogen strains develop resistance to antibiotics that were previously effective. This problem, known as concept drift, complicates the task of learning a model from data and requires special approaches, different from commonly used techniques that treat arriving instances as equally important contributors to the final concept. The underlying data distribution may change as well, making previously built models useless. This is known as virtual concept drift. Both types of concept drifts make regular updates of the model necessary. Among the most popular and effective approaches to handle concept drift is ensemble learning, where a set of models built over different time periods is maintained and the best model is selected or the predictions of models are combined, usually according to their expertise level regarding the current concept. In this paper we propose the use of an ensemble integration technique that would help to better handle concept drift at an instance level. In dynamic integration of classifiers, each base classifier is given a weight proportional to its local accuracy with regard to the instance tested, and the best base classifier is selected, or the classifiers are integrated using weighted voting. Our experiments with synthetic data sets simulating abrupt and gradual concept drifts and with a real-world antibiotic resistance data set demonstrate that dynamic integration of classifiers built over small time intervals or fixed-sized data blocks can be significantly better than majority voting and weighted voting, which are currently the most commonly used integration techniques for handling concept drift with ensembles.
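The dynamic integration idea described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes local accuracy is estimated on the k nearest labelled neighbours of the test instance using squared Euclidean distance, and all function names and toy data are hypothetical.

```python
from collections import defaultdict


def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def local_accuracies(classifiers, val_X, val_y, x, k=3):
    """Accuracy of each classifier on the k labelled instances nearest to x."""
    nearest = sorted(range(len(val_X)),
                     key=lambda i: squared_distance(val_X[i], x))[:k]
    return [sum(clf(val_X[i]) == val_y[i] for i in nearest) / len(nearest)
            for clf in classifiers]


def dynamic_selection(classifiers, val_X, val_y, x, k=3):
    """Use only the classifier with the highest local accuracy around x."""
    weights = local_accuracies(classifiers, val_X, val_y, x, k)
    best = max(range(len(classifiers)), key=weights.__getitem__)
    return classifiers[best](x)


def dynamic_voting(classifiers, val_X, val_y, x, k=3):
    """Weighted vote, each classifier weighted by its local accuracy around x."""
    weights = local_accuracies(classifiers, val_X, val_y, x, k)
    votes = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        votes[clf(x)] += w
    return max(votes, key=votes.get)


# Toy drift: the current concept labels x < 0.5 as 1; two models are outdated.
clf_old = lambda x: int(x[0] > 0.5)
clf_new = lambda x: int(x[0] < 0.5)
clf_zero = lambda x: 0
ensemble = [clf_old, clf_new, clf_zero]

val_X = [(0.1,), (0.2,), (0.3,), (0.8,), (0.9,)]
val_y = [1, 1, 1, 0, 0]

query = (0.25,)
majority = max({0, 1}, key=lambda c: sum(clf(query) == c for clf in ensemble))
print(majority, dynamic_selection(ensemble, val_X, val_y, query),
      dynamic_voting(ensemble, val_X, val_y, query))
```

In this toy setting plain majority voting follows the two outdated models, whereas both dynamic variants follow the model that is locally accurate near the query instance.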

conference on advanced information systems engineering | 2011

Handling concept drift in process mining

R. P. Jagadeesh Chandra Bose; Wil M. P. van der Aalst; Indrė Žliobaitė; Mykola Pechenizkiy

Operational processes need to change to adapt to changing circumstances, e.g., new legislation, extreme variations in supply and demand, seasonal effects, etc. While the topic of flexibility is well-researched in the BPM domain, contemporary process mining approaches assume the process to be in steady state. When discovering a process model from event logs, it is assumed that the process at the beginning of the recorded period is the same as the process at the end of the recorded period. Obviously, this is often not the case due to the phenomenon known as concept drift. While cases are being handled, the process itself may be changing. This paper presents an approach to analyze such second-order dynamics. The approach has been implemented in ProM and evaluated by analyzing an evolving process.

international conference on data mining | 2010

Discrimination Aware Decision Tree Learning

Faisal Kamiran; Toon Calders; Mykola Pechenizkiy

Recently, the following discrimination aware classification problem was introduced: given a labeled dataset and an attribute B, find a classifier with high predictive accuracy that at the same time does not discriminate on the basis of the given attribute B. This problem is motivated by the fact that often available historic data is biased due to discrimination, e.g., when B denotes ethnicity. Using the standard learners on this data may lead to wrongfully biased classifiers, even if the attribute B is removed from training data. Existing solutions for this problem consist in “cleaning away” the discrimination from the dataset before a classifier is learned. In this paper we study an alternative approach in which the non-discrimination constraint is pushed deeply into a decision tree learner by changing its splitting criterion and pruning strategy. Experimental evaluation shows that the proposed approach advances the state-of-the-art in the sense that the learned decision trees have a lower discrimination than models provided by previous methods, with little loss in accuracy.
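The abstract above does not define how discrimination is measured. A minimal sketch, assuming one common formalization (the difference in positive-prediction rates between a favored and a deprived group defined by attribute B), could look like this; the function names, group labels, and data are hypothetical.

```python
def positive_rate(preds, groups, group, positive=1):
    """Share of instances in `group` that receive the positive prediction."""
    members = [p for p, g in zip(preds, groups) if g == group]
    if not members:
        raise ValueError(f"no instances in group {group!r}")
    return sum(p == positive for p in members) / len(members)


def discrimination(preds, groups, favored, deprived, positive=1):
    """Assumed measure: positive rate of the favored group minus that of the deprived group."""
    return (positive_rate(preds, groups, favored, positive)
            - positive_rate(preds, groups, deprived, positive))


# Hypothetical classifier output for eight instances and their B values.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

score = discrimination(preds, groups, favored="a", deprived="b")
print(score)
```

A value near zero would indicate similar treatment of both groups under this assumed measure; a discrimination-aware learner would aim to keep it low while preserving accuracy.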

IEEE Transactions on Neural Networks | 2014

Dealing With Concept Drifts in Process Mining

R. P. Jagadeesh Chandra Bose; Wil M. P. van der Aalst; Indrė Žliobaitė; Mykola Pechenizkiy

Although most business processes change over time, contemporary process mining techniques tend to analyze these processes as if they are in a steady state. Processes may change suddenly or gradually. The drift may be periodic (e.g., because of seasonal influences) or one-of-a-kind (e.g., the effects of new legislation). For the process management, it is crucial to discover and understand such concept drifts in processes. This paper presents a generic framework and specific techniques to detect when a process changes and to localize the parts of the process that have changed. Different features are proposed to characterize relationships among activities. These features are used to discover differences between successive populations. The approach has been implemented as a plug-in of the ProM process mining framework and has been evaluated using both simulated event data exhibiting controlled concept drifts and real-life event data from a Dutch municipality.
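The idea of comparing successive populations of feature values, as described in the abstract above, can be illustrated with a small sketch. It assumes a single numeric feature per trace and uses the two-sample Kolmogorov-Smirnov statistic as the comparison; that choice, the fixed window size, and all names are assumptions made for illustration rather than a description of the ProM plug-in.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))

    def ecdf(sorted_sample, v):
        return bisect_right(sorted_sample, v) / len(sorted_sample)

    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in points)


def drift_scores(values, window):
    """Score every split point by comparing the windows just before and after it."""
    scores = []
    for split in range(window, len(values) - window + 1):
        left = values[split - window:split]
        right = values[split:split + window]
        scores.append((split, ks_statistic(left, right)))
    return scores


# Hypothetical feature values per trace; the process changes after six traces.
values = [1, 2, 1, 2, 1, 2, 5, 6, 5, 6, 5, 6]
scores = drift_scores(values, window=4)
change_point, strength = max(scores, key=lambda s: s[1])
print(change_point, strength)
```

In practice the statistic would be turned into a significance test and the window size tuned; both are left out of this sketch.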

european conference on machine learning | 2006

Dynamic integration with random forests

Alexey Tsymbal; Mykola Pechenizkiy; Pádraig Cunningham

Random Forests (RF) are a successful ensemble prediction technique that uses majority voting or averaging as a combination function. However, it is clear that each tree in a random forest may have a different contribution in processing a certain instance. In this paper, we demonstrate that the prediction performance of RF may still be improved in some domains by replacing the combination function with dynamic integration, which is based on local performance estimates. Our experiments also demonstrate that the RF Intrinsic Similarity is better than the commonly used Heterogeneous Euclidean/Overlap Metric in finding a neighbourhood for local estimates in the context of dynamic integration of classification random forests.

acm symposium on applied computing | 2009

Using minimum description length for process mining

Toon Calders; Christian Günther; Mykola Pechenizkiy; Anne Rozinat

In the field of process mining, the goal is to automatically extract process models from event logs. Recently, many algorithms have been proposed for this task. For comparing these models, different quality measures have been proposed. Most of these measures, however, have several disadvantages: they are model-dependent, assume that the model that generated the log is known, or need negative examples of event sequences. In this paper we propose a new measure for evaluating the quality of process models, based on the minimum description length principle, that does not have these disadvantages. To illustrate the properties of the new measure we conduct experiments and discuss the trade-off between model complexity and compression.

computer based medical systems | 2013

Stress detection from speech and Galvanic Skin Response signals

Hindra Kurniawan; Alexandr V. Maslov; Mykola Pechenizkiy

The problem of stress management has been receiving increasing attention in related research communities, due to wider recognition of the potential problems caused by chronic stress and to recent developments in technologies that provide non-intrusive ways of continuously collecting objective measurements to monitor a person's stress level. Experimental studies have already shown that stress level can be judged based on the analysis of Galvanic Skin Response (GSR) and speech signals. In this paper we investigate how classification techniques can be used to automatically determine periods of acute stress, relying on information contained in the GSR and/or speech of a person.