Krystyna Napierala | Researchain

Publication

Featured research published by Krystyna Napierala.

RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing | 2010

Learning from imbalanced data in presence of noisy and borderline examples

Krystyna Napierala; Jerzy Stefanowski; Szymon Wilk

In this paper we studied re-sampling methods for learning classifiers from imbalanced data. We carried out a series of experiments on artificial data sets to explore the impact of noisy and borderline examples from the minority class on classifier performance. Results showed that if the data was sufficiently disturbed by these factors, then the focused re-sampling methods, NCR and our SPIDER2, strongly outperformed the oversampling methods. They were also better for real-life data, where PCA visualizations suggested the possible existence of noisy examples and large overlapping areas between classes.

intelligent information systems | 2012

BRACID: a comprehensive approach to learning rules from imbalanced data

Krystyna Napierala; Jerzy Stefanowski

In this paper we consider induction of rule-based classifiers from imbalanced data, where one class (a minority class) is under-represented in comparison to the remaining majority classes. The minority class is usually of primary interest. However, most rule-based classifiers are biased towards the majority classes and have difficulty recognizing the minority class correctly. We discuss sources of these difficulties related to data characteristics or to the algorithm itself. Among the problems related to the data distribution we focus on the role of small disjuncts, overlapping of classes and the presence of noisy examples. Then, we show that standard techniques for induction of rule-based classifiers, such as sequential covering, top-down induction of rules or classification strategies, were created with the assumption of a balanced data distribution, and we explain why they are biased towards the majority classes. Some modifications of rule-based classifiers have already been introduced, but they usually concentrate on individual problems. Therefore, we propose a novel algorithm, BRACID, which more comprehensively addresses the issues associated with imbalanced data. Its main characteristics include a hybrid representation of rules and single examples, bottom-up learning of rules and a local classification strategy using nearest rules. The usefulness of BRACID has been evaluated in experiments on several imbalanced datasets. The results show that BRACID significantly outperforms the well-known rule-based classifiers C4.5rules, RIPPER, PART, CN2 and MODLEM, as well as other related classifiers such as RISE or K-NN. Moreover, it is comparable to or better than the studied approaches specialized for imbalanced data, such as generalizations of rule algorithms or combinations of SMOTE + ENN preprocessing with PART. Finally, it improves the support of minority class rules, leading to better recognition of the minority class examples.

intelligent information systems | 2016

Types of minority class examples and their influence on learning classifiers from imbalanced data

Krystyna Napierala; Jerzy Stefanowski

Many real-world applications reveal difficulties in learning classifiers from imbalanced data. Although several methods for improving classifiers have been introduced, identifying the conditions under which a particular method can be used efficiently is still an open research problem. It is also worth studying the nature of imbalanced data, the characteristics of the minority class distribution and their influence on classification performance. However, current studies on imbalanced data difficulty factors have mainly been done with artificial datasets, and their conclusions are not easily applicable to real-world problems, partly because the methods for identifying these factors are not sufficiently developed. In our paper, we capture difficulties of class distribution in real datasets by considering four types of minority class examples: safe, borderline, rare and outliers. First, we confirm their occurrence in real data by exploring multidimensional visualizations of selected datasets. Then, we introduce a method for identifying these types of examples, which is based on analyzing the class distribution in a local neighbourhood of the considered example. Two ways of modeling this neighbourhood are presented: with k-nearest examples and with kernel functions. Experiments with artificial datasets show that these methods are able to re-discover simulated types of examples. Further contributions of this paper include a comprehensive experimental study with 26 real-world imbalanced datasets, where (1) we identify new data characteristics based on the analysis of types of minority examples; (2) we demonstrate that considering the results of this analysis allows us to differentiate the classification performance of popular classifiers and pre-processing methods and to evaluate their areas of competence. Finally, we highlight directions for exploiting the results of our analysis in developing new algorithms for learning classifiers and pre-processing methods.
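
The k-nearest-examples variant of the neighbourhood analysis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `label_minority_examples`, the Euclidean distance, and the count thresholds for k = 5 are all assumptions made for the example, since the abstract does not state them.

```python
import math


def label_minority_examples(X, y, minority, k=5):
    """Label each minority example by the class mix of its k nearest neighbours.

    Returns a dict mapping the index of every minority example to one of
    "safe", "borderline", "rare" or "outlier".
    """
    labels = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority:
            continue
        # Euclidean distance to every other example, nearest first.
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        same = sum(1 for _, j in dists[:k] if y[j] == minority)
        # Thresholds below are an assumption chosen for k = 5.
        if same >= 4:
            labels[i] = "safe"
        elif same >= 2:
            labels[i] = "borderline"
        elif same == 1:
            labels[i] = "rare"
        else:
            labels[i] = "outlier"
    return labels


# Toy data: a compact minority cluster, a majority cluster, and one
# minority point placed inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.5, 0),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (10.5, 10),
     (10.4, 10.4)]
y = [1] * 6 + [0] * 6 + [1]
labels = label_minority_examples(X, y, minority=1, k=5)
```

In this toy data the six clustered minority points are labelled safe, while the minority point surrounded by majority examples is labelled an outlier.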

hybrid artificial intelligence systems | 2012

Identification of different types of minority class examples in imbalanced data

Krystyna Napierala; Jerzy Stefanowski

The characteristics of the minority class distribution in imbalanced data are studied. Four types of minority examples (safe, borderline, rare and outlier) are distinguished and analysed. We propose a new method for identifying these examples in the data, based on analysing the local neighbourhoods of examples. Its application to UCI imbalanced datasets shows that the minority class is often scattered, with few safe examples. This characteristic of data distributions is also confirmed by another analysis with Multidimensional Scaling visualization. We examine the influence of these types of examples on 6 different classifiers learned over various real-world datasets. Experimental results show that the individual classifiers differ in their sensitivity to the type of examples.

parallel processing and applied mathematics | 2011

Efficient isosurface extraction using marching tetrahedra and histogram pyramids on multiple GPUs

Milosz Ciznicki; Michal Kierzynka; Krzysztof Kurowski; Bogdan Ludwiczak; Krystyna Napierala; Jarosław Palczyński

Algorithms for isosurface extraction have become crucial in the petroleum industry, medicine and many other fields in recent years. Current market demands create a need for methods that not only construct accurate 3D models but also do so efficiently. Recently, a few highly optimized approaches taking advantage of modern graphics processing units (GPUs) have been published in the literature. However, despite their satisfactory speed, they may all be unsuitable for real-life applications due to limits on the maximum domain size they can process. In this paper we present a novel approach to surface extraction that combines the Marching Tetrahedra algorithm with the idea of Histogram Pyramids. Our GPU-based application can process CT and MRI scan data. Thanks to domain decomposition, the only limiting factor for the size of the input instance is the amount of memory needed to store the resulting model. The solution is also very fast, achieving up to 107-fold speedup compared to a serial CPU code. Moreover, multi-GPU support makes it very scalable. The provided tool enables the user to visualize the generated model and to modify it interactively.
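
The Histogram Pyramid idea mentioned above can be illustrated with a small CPU-side sketch. It is a simplified 1D, pairwise-sum version written for clarity; it is not the GPU implementation from the paper, and the function names `build_pyramid` and `locate` are hypothetical. Each base cell holds the number of output elements it will emit (for example, triangles produced by one cell), and the pyramid lets any output index be mapped back to its source cell.

```python
def build_pyramid(counts):
    """Build pyramid levels from the base (per-cell counts) up to one total."""
    # Pad the base to a power of two so every level halves cleanly.
    n = 1
    while n < len(counts):
        n *= 2
    level = list(counts) + [0] * (n - len(counts))
    levels = [level]
    while len(level) > 1:
        # Each parent stores the sum of its two children.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def locate(levels, k):
    """Map output index k to (base cell index, index within that cell)."""
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError("output index out of range")
    idx = 0
    # Walk from just below the top down to the base level.
    for level in reversed(levels[:-1]):
        idx *= 2
        left = level[idx]
        if k >= left:
            # The k-th output lies in the right child; skip the left count.
            k -= left
            idx += 1
    return idx, k


levels = build_pyramid([0, 2, 0, 1, 3])
total_outputs = levels[-1][0]
cells = [locate(levels, k) for k in range(total_outputs)]
```

The top of the pyramid gives the total output size, so the output buffer can be allocated exactly, and each output element can be resolved independently, which is what makes the scheme suitable for parallel hardware.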

Expert Systems With Applications | 2015

Addressing imbalanced data with argument based rule learning

Krystyna Napierala; Jerzy Stefanowski

We improve learning rules from imbalanced data by using expert knowledge. An expert explains the decision for some critical examples, giving arguments. Three methods of identifying critical examples are proposed and compared. Induced rules reflect the expert knowledge and better classify the minority examples. Trade-off between the recognition of the minority and majority classes is maintained.

In this paper we focus on improving rule-based classifiers learned from class-imbalanced data by incorporating expert knowledge into the learning process. Applying expert knowledge should overcome limitations of standard methods for imbalanced data when minority classes contain many rare examples and outliers. It should also improve recognition of the minority class while maintaining better classification accuracy of the majority classes than the standard methods. Unlike existing proposals for integrating global expert knowledge into rule induction, the class imbalance requires considering local characteristics of class distributions. Therefore, we consider argument based learning, where a domain expert can annotate (explain) some of the learning examples to describe reasons for assigning them to specific classes. Using local arguments should improve the interpretability of rules and their consistency with the domain knowledge, and should also result in better recognition of the minority class. The main aim of our study is to show how argument based learning can be adapted to learn rules from imbalanced data. To achieve this, we introduce a new argument based rule induction algorithm, ABMODLEM, with a specialized classification strategy for imbalanced classes. Then, we propose new methods for identifying the examples which should be explained by an expert. They exploit the idea of active learning with query by an ensemble. The proposed approach has been evaluated in an extensive computational experiment. Results show that argument based learning improves the minority class recognition, especially for difficult data distributions with rare examples and outliers. Moreover, ABMODLEM is compared against standard rule classifiers and their extensions with SMOTE preprocessing.

RSCTC'10 Proceedings of the 7th international conference on Rough sets and current trends in computing | 2010

Argument based generalization of MODLEM rule induction algorithm

Krystyna Napierala; Jerzy Stefanowski

Argument based learning allows experts to express their local domain knowledge about the circumstances of making classification decisions for some learning examples. In this paper we have incorporated this idea in rule induction as a generalization of the MODLEM algorithm. To adjust the algorithm to the redefined task, a new measure for evaluating rule conditions and a new classification strategy with rules had to be introduced. Experimental studies showed that using arguments improved classification accuracy and the structure of rules. Moreover, proper argumentation improved recognition of the minority class in imbalanced data without substantially decreasing recognition of the majority classes.

international symposium on methodologies for intelligent systems | 2014

Local Characteristics of Minority Examples in Pre-processing of Imbalanced Data

Jerzy Stefanowski; Krystyna Napierala; Małgorzata Trzcielińska

Informed pre-processing methods for improving classifiers learned from class-imbalanced data are considered. We discuss different ways of analyzing the characteristics of local distributions of examples in such data. Then, we experimentally compare the main informed pre-processing methods and show that identifying types of minority examples based on their k-nearest neighbourhood may help explain differences in the performance of these methods. Finally, we exploit the information about the local neighbourhood to modify the oversampling ratio in a SMOTE-related method.

International Workshop on New Frontiers in Mining Complex Patterns | 2016

Increasing the Interpretability of Rules Induced from Imbalanced Data by Using Bayesian Confirmation Measures

Krystyna Napierala; Jerzy Stefanowski; Izabela Szczȩch

Approaches to support the interpretation of rules induced from imbalanced data are discussed. In this paper, the rule learning algorithm BRACID, dedicated to class imbalance, is considered. Because it may induce too many rules, which hinders their interpretation, the rules are filtered. We introduce three different strategies, which aim at selecting rules with good descriptive characteristics. The strategies are based on combining Bayesian confirmation measures with rule support, a combination which has not yet been studied in the class imbalance context. Experimental results show that these strategies reduce the number of rules and improve the values of rule interestingness measures at the same time, without considerable losses of prediction abilities, especially for the minority class.
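
As a rough illustration of combining a Bayesian confirmation measure with rule support, the sketch below filters rules of the form "if E then H" using the difference measure D(H, E) = P(H | E) - P(H), one well-known confirmation measure. Which measures and which combination strategies the paper actually uses is not stated in the abstract, so the choice of D, the filtering rule, and the names `difference_confirmation` and `filter_rules` are assumptions made for this example.

```python
def difference_confirmation(a, b, c, d):
    """Difference measure D(H, E) = P(H | E) - P(H) for a rule "if E then H".

    a: examples matching E and H        b: examples matching H but not E
    c: examples matching E but not H    d: examples matching neither
    """
    n = a + b + c + d
    if n == 0 or a + c == 0:
        return 0.0  # Rule covers nothing, so it confirms nothing.
    confidence = a / (a + c)   # P(H | E)
    prior = (a + b) / n        # P(H)
    return confidence - prior


def filter_rules(rules, min_support):
    """Keep rules that confirm their conclusion and have enough support.

    rules maps a rule name to its (a, b, c, d) contingency counts.
    This filtering strategy is hypothetical, chosen only for illustration.
    """
    scored = []
    for name, (a, b, c, d) in rules.items():
        conf = difference_confirmation(a, b, c, d)
        if conf > 0 and a >= min_support:
            scored.append((conf, name))
    # Strongest confirmation first.
    return [name for conf, name in sorted(scored, reverse=True)]


rules = {
    "r1": (8, 2, 2, 88),   # well supported, strongly confirming
    "r2": (1, 9, 9, 81),   # confidence equals the prior: no confirmation
    "r3": (1, 9, 0, 90),   # confirming but supported by one example only
}
kept = filter_rules(rules, min_support=2)
```

Here only `r1` survives: `r2` offers no confirmation beyond the class prior, and `r3` is too weakly supported to be kept.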

Fundamenta Informaticae | 2016

Post-processing of BRACID Rules Induced from Imbalanced Data

Krystyna Napierala; Jerzy Stefanowski

Rule-based classifiers constructed from imbalanced data often fail to correctly classify instances from the minority class. Solutions to this problem should deal with data and algorithmic difficulty factors. The new algorithm BRACID addresses these factors more comprehensively than other proposals. The experimental evaluation of the classification abilities of BRACID shows that it significantly outperforms other rule approaches specialized for imbalanced data. However, it may generate too many rules, which hinders human interpretation of the discovered rules. Thus, a method for post-processing BRACID rules is presented. It aims at selecting rules characterized by high support, in particular for the minority class, and covering diversified subsets of examples. Experimental studies confirm its usefulness.