Bartosz Krawczyk
Virginia Commonwealth University
Publication
Featured research published by Bartosz Krawczyk.
Progress in Artificial Intelligence | 2016
Bartosz Krawczyk
Despite more than two decades of continuous development, learning from imbalanced data is still a focus of intense research. Starting as a problem of skewed distributions in binary tasks, this topic has evolved far beyond its original conception. With the expansion of machine learning and data mining, combined with the arrival of the big data era, we have gained a deeper insight into the nature of imbalanced learning, while at the same time facing new emerging challenges. Data-level and algorithm-level methods are constantly being improved, and hybrid approaches are gaining popularity. Recent trends focus on analyzing not only the disproportion between classes, but also other difficulties embedded in the nature of the data. New real-life problems motivate researchers to focus on computationally efficient, adaptive and real-time methods. This paper aims at discussing open issues and challenges that need to be addressed to further develop the field of imbalanced learning. Seven vital areas of research in this topic are identified, covering the full spectrum of learning from imbalanced data: classification, regression, clustering, data streams, big data analytics, and applications, e.g., in social media and computer vision. This paper provides a discussion and suggestions concerning lines of future research for each of them.
Applied Soft Computing | 2014
Bartosz Krawczyk; Michał Woźniak; Gerald Schaefer
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximizing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
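The two building blocks named in this abstract, random feature subspaces for member diversity and weighted fusion of committee votes, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are made up, and the evolutionary search for weights is replaced by weights passed in directly.

```python
import random

def draw_subspaces(n_features, n_members, subspace_size, seed=0):
    """Pick a random feature subset for each ensemble member,
    so members are trained on different views of the data."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), subspace_size))
            for _ in range(n_members)]

def weighted_vote(member_labels, member_weights):
    """Fuse member predictions by weighted majority voting:
    each member adds its weight to the support of its predicted label."""
    support = {}
    for label, w in zip(member_labels, member_weights):
        support[label] = support.get(label, 0.0) + w
    return max(support, key=support.get)
```

For example, with member weights 0.5, 0.2 and 0.4, two members voting for the minority class outvote one member voting against it, since 0.9 > 0.2.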
Information Sciences | 2014
Bartosz Krawczyk; Michał Woźniak; Bogusław Cyganek
This paper presents a novel multi-class classifier based on weighted one-class support vector machines (OCSVM) operating in the clustered feature space. We show that splitting the target class into atomic subsets and using these as input for one-class classifiers leads to an efficient and stable recognition algorithm. The proposed system extends our previous works on combining OCSVM classifiers to solve both one-class and multi-class classification tasks. The main contribution of this work is the novel architecture for class decomposition and combination of classifier outputs. Based on the results of a large number of computational experiments we show that the proposed method outperforms both the OCSVM for a single class, as well as the multi-class SVM for multi-class classification problems. Other advantages are the highly parallel structure of the proposed solution, which facilitates parallel training and execution stages, and the relatively small number of control parameters.
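The decomposition idea described here, splitting the target class into atomic subsets and fitting a one-class model to each, can be illustrated with a toy stand-in for the weighted OCSVM: a simple enclosing hypersphere per subset. The clustering step is replaced by subsets passed in directly, and all names are illustrative rather than taken from the paper.

```python
import math

def fit_hypersphere(points):
    """Toy one-class model: class centroid plus the radius that
    encloses all training points (a stand-in for a weighted OCSVM)."""
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    radius = max(math.dist(p, centroid) for p in points)
    return centroid, radius

def decompose_and_fit(subsets):
    """Fit one one-class model per atomic subset of the target class."""
    return [fit_hypersphere(s) for s in subsets]

def predict(models, x):
    """Accept x if any per-subset model encloses it; report the index
    of the closest accepting subset, or None for an outlier."""
    gaps = [math.dist(x, c) - r for c, r in models]
    best = min(range(len(models)), key=lambda i: gaps[i])
    return best if gaps[best] <= 0 else None
```

Because each subset's model is fitted independently, the training stage is trivially parallel, which mirrors the highly parallel structure the abstract mentions.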
Neurocomputing | 2014
Bartosz Krawczyk; Michał Woźniak
One-class classification is one of the most challenging topics in contemporary machine learning, and not much attention has been paid to the task of creating efficient one-class ensembles. This paper deals with the problem of designing a combined recognition system based on pools of individual one-class classifiers. We propose a new model dedicated to one-class classification and introduce novel diversity measures dedicated to it. The proposed one-class classifier committee may be used for both single-class and multi-class classification tasks. The proposed measures and classification models were evaluated through computer experiments carried out on a diverse set of benchmark datasets. The results confirm that introducing diversity measures dedicated to one-class ensembles is a worthwhile research direction and show that the proposed models are valuable propositions that can outperform traditional methods for one-class classification.
International Journal of Applied Mathematics and Computer Science | 2012
Michał Woźniak; Bartosz Krawczyk
This paper presents a significant modification to the AdaSS (Adaptive Splitting and Selection) algorithm, which was developed several years ago. The method is based on the simultaneous partitioning of the feature space and an assignment of a compound classifier to each of the subsets. The original version of the algorithm uses a classifier committee and a majority voting rule to arrive at a decision. The proposed modification replaces the fairly simple fusion method with a combined classifier, which makes a decision based on a weighted combination of the discriminant functions of the individual classifiers selected for the committee. The weights mentioned above are dependent not only on the classifier identifier, but also on the class number. The proposed approach is based on the results of previous works, where it was proven that such a combined classifier method could achieve significantly better results than simple voting systems. The proposed modification was evaluated through computer experiments, carried out on diverse benchmark datasets. The results are very promising in that they show that, for most of the datasets, the proposed method outperforms similar techniques based on the clustering and selection approach.
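The core of the modification described above, a weighted combination of discriminant functions in which the weight depends on both the classifier and the class, can be sketched in a few lines. This is a minimal illustration of the fusion rule only; the AdaSS training that produces the weights and the feature-space partitioning are not shown.

```python
def fused_decision(discriminants, weights):
    """Combine discriminant (support) values of committee members.
    discriminants[k][c] -- support of classifier k for class c
    weights[k][c]       -- weight of classifier k for class c
    Returns the index of the class with the highest fused support."""
    n_classes = len(discriminants[0])
    support = [sum(d[c] * w[c] for d, w in zip(discriminants, weights))
               for c in range(n_classes)]
    return max(range(n_classes), key=support.__getitem__)
```

With uniform weights this reduces to plain support averaging; class-dependent weights let a classifier that is reliable only for some classes contribute strongly there and weakly elsewhere, which is the point of the modification.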
Neurocomputing | 2015
Bartosz Krawczyk
This paper introduces a novel technique for forming efficient one-class classifier ensembles. It combines an ensemble pruning algorithm with a weighted classifier fusion module. The ensemble pruning is formulated as a search problem and implemented through a swarm intelligence approach: a firefly algorithm is selected as the framework for managing the process of reducing the size of the classifier pool. Input classifiers are coded as population members. The interactions between fireflies are governed by the consistency measure, which describes the effectiveness of individual one-class classifiers. A new pairwise diversity measure, based on calculating the intersections between spherical one-class classifiers, is used for controlling the movements of fireflies. With this, we indirectly implement a multi-objective optimization, as the selected classifiers have high individual accuracy and are at the same time mutually diverse. The fireflies form groups, and for each group the best representative is selected, thus realizing the pruning task. Additionally, a classifier weight calculation scheme based on the brightness of fireflies is applied for weighted fusion. Experimental analysis, backed up with statistical tests, proves the quality of the proposed method and its ability to outperform state-of-the-art algorithms for selecting one-class classifiers for classification committees.
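The accuracy-plus-diversity trade-off that drives the pruning can be illustrated with a deliberately simplified greedy selection standing in for the firefly search: pick, at each step, the classifier whose individual quality most exceeds its overlap with the members already chosen. The consistency and intersection measures of the paper are abstracted here into precomputed scores.

```python
def prune_pool(accuracies, overlap, k):
    """Greedy stand-in for swarm-based ensemble pruning.
    accuracies[i]  -- individual quality of classifier i
    overlap[i][j]  -- pairwise overlap in [0, 1]; lower = more diverse
    Returns the (sorted) indices of up to k selected classifiers."""
    selected = []
    candidates = set(range(len(accuracies)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize a candidate by its worst overlap with the
            # classifiers already in the committee.
            penalty = max((overlap[i][j] for j in selected), default=0.0)
            return accuracies[i] - penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)
```

Note how a slightly weaker but diverse classifier beats a strong near-duplicate of an already-selected member, the same effect the firefly movements are designed to produce.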
Neurocomputing | 2017
Sergio Ramírez-Gallego; Bartosz Krawczyk; Salvador García; Michał Woźniak; Francisco Herrera
Data preprocessing and reduction have become essential techniques in current knowledge discovery scenarios, dominated by increasingly large datasets. These methods aim at reducing the complexity inherent to real-world datasets, so that they can be easily processed by current data mining solutions. Advantages of such approaches include, among others, a faster and more precise learning process and a more understandable structure of raw data. However, data preprocessing techniques for data streams still have a long road ahead of them, even though online learning is growing in importance thanks to the development of the Internet and technologies for massive data collection. Throughout this survey, we summarize, categorize and analyze the contributions on data preprocessing that cope with streaming data. This work also takes into account the existing relationships between the different families of methods (feature and instance selection, and discretization). To enrich our study, we conduct thorough experiments using the most relevant contributions and present an analysis of their predictive performance, reduction rates, computational time, and memory usage. Finally, we offer general advice about existing data stream preprocessing algorithms and discuss emerging future challenges in the domain of data stream preprocessing.
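To make the streaming setting concrete, here is a toy sketch of one of the method families the survey covers, feature selection over a data stream: keep a sliding window of labelled samples and rank features by the absolute gap between their class-conditional means. The class name and the ranking criterion are illustrative assumptions, not any algorithm from the survey.

```python
from collections import deque

class WindowedFeatureSelector:
    """Toy online feature selector for a two-class stream: maintain a
    sliding window and rank features by class-conditional mean gap."""

    def __init__(self, window_size):
        # deque with maxlen gives sliding-window forgetting for free.
        self.window = deque(maxlen=window_size)

    def update(self, x, y):
        """Observe one labelled sample (x: feature tuple, y: class)."""
        self.window.append((x, y))

    def top_features(self, k):
        """Return indices of the k features with the largest gap
        between per-class means over the current window."""
        by_class = {}
        for x, y in self.window:
            by_class.setdefault(y, []).append(x)
        if len(by_class) != 2:
            return []  # need both classes in the window to rank
        xs0, xs1 = by_class.values()
        n_feat = len(xs0[0])

        def gap(f):
            m0 = sum(x[f] for x in xs0) / len(xs0)
            m1 = sum(x[f] for x in xs1) / len(xs1)
            return abs(m0 - m1)

        return sorted(range(n_feat), key=gap, reverse=True)[:k]
```

The bounded window is what keeps memory usage constant as the stream grows, the central constraint that, as the survey argues, separates stream preprocessing from its batch counterparts.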
International Journal of Neural Systems | 2014
Konrad Jackowski; Bartosz Krawczyk; Michał Woźniak
Currently, methods of combined classification are the focus of intense research. A properly designed group of combined classifiers exploiting knowledge gathered in a pool of elementary classifiers can successfully outperform a single classifier. There are two essential issues to consider when creating combined classifiers: how to establish the most comprehensive pool, and how to design a fusion model that takes full advantage of the collected knowledge. In this work, we address these issues and propose AdaSS+, a training algorithm dedicated to the compound classifier system that effectively exploits local specialization of the elementary classifiers. The training procedure consists of two phases. The first phase detects the classifier competencies and adjusts the respective fusion parameters. The second phase boosts classification accuracy by elevating the degree of local specialization. The quality of the proposed algorithm is evaluated on the basis of a wide range of computer experiments, which show that AdaSS+ can outperform the original method and several reference classifiers.
Pattern Recognition | 2016
José A. Sáez; Bartosz Krawczyk; Michał Woźniak
Canonical machine learning algorithms assume that the numbers of objects in the considered classes are roughly similar. However, in many real-life situations the distribution of examples is skewed, since examples of some of the classes appear much more frequently. This poses a difficulty for learning algorithms, as they will be biased towards the majority classes. In recent years many solutions have been proposed to tackle imbalanced classification, yet they mainly concentrate on binary scenarios. Multi-class imbalanced problems are far more difficult, as the relationships between the classes are no longer straightforward. Additionally, one should analyze not only the imbalance ratio but also the characteristics of the objects within each class. In this paper we present a study on oversampling for multi-class imbalanced datasets that focuses on the analysis of the class characteristics. We detect subsets of specific examples in each class and fix the oversampling for each of them independently. Thus, we are able to use information about the class structure and boost the more difficult and important objects. We carry out an extensive experimental analysis, backed up with statistical analysis, in order to check when the preprocessing of some types of examples within a class may improve on the indiscriminate preprocessing of all the examples in all the classes. The results obtained show that oversampling concrete types of examples may lead to a significant improvement over standard multi-class preprocessing that does not consider the importance of example types.
Highlights:
- A thorough analysis of oversampling for handling multi-class imbalanced datasets.
- Proposition to detect underlying structures and example types in considered classes.
- Smart oversampling based on extracted knowledge about imbalance distribution types.
- In-depth insight into the importance of selecting proper examples for oversampling.
- Guidelines that allow the design of efficient classifiers for multi-class imbalanced data.
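The idea of oversampling only specific example types within a class, rather than replicating minority examples indiscriminately, can be sketched as follows. The type labels and the random-replication strategy are hypothetical illustrations; the paper's actual detection of example types and its oversampling scheme are more elaborate.

```python
import random

def type_aware_oversample(class_examples, target_size, boost_types, seed=0):
    """Grow a minority class to target_size by replicating only the
    'difficult' example types (e.g. borderline ones).
    class_examples -- list of (features, example_type) pairs
    boost_types    -- set of type labels deemed worth reinforcing"""
    rng = random.Random(seed)
    # Draw extra samples only from the boosted types; if none exist,
    # fall back to the whole class so the target size is still reached.
    pool = [ex for ex in class_examples if ex[1] in boost_types] \
        or list(class_examples)
    out = list(class_examples)
    while len(out) < target_size:
        out.append(rng.choice(pool))
    return out
```

All original examples are kept, and every added copy comes from the boosted types, so the class structure information directly shapes the resulting distribution.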
Applied Soft Computing | 2016
Bartosz Krawczyk; Mikel Galar; Łukasz Jeleń; Francisco Herrera
Highlights:
- Automatic clinical decision support system for breast cancer malignancy grading.
- Different methodologies for segmentation and feature extraction from FNA slides.
- An efficient classifier ensemble for imbalanced problems with difficult data.
- Ensemble combines boosting with evolutionary undersampling.
- Extensive computational experiments on a large database collected by the authors.

In this paper, we propose a complete, fully automatic and efficient clinical decision support system for breast cancer malignancy grading. The estimation of the level of a cancer's malignancy is important to assess the degree of its progress and to elaborate a personalized therapy. Our system makes use of both image processing and machine learning techniques to perform the analysis of biopsy slides. Three different image segmentation methods (fuzzy c-means color segmentation, the level set active contours technique and a grey-level quantization method) are considered to extract the features used by the proposed classification system. In this classification problem, the highest malignancy grade is the most important to detect early, even though it occurs in the lowest number of cases, and hence malignancy grading is an imbalanced classification problem. To overcome this difficulty, we propose the use of an efficient ensemble classifier named EUSBoost, which combines a boosting scheme with evolutionary undersampling to produce balanced training sets for each of the base classifiers in the final ensemble. The evolutionary approach allows us to select the most significant samples for the classifier learning step (in terms of accuracy and a new diversity term included in the fitness function), thus alleviating the problems produced by the imbalanced scenario in a guided and effective way.
Experiments, carried out on a large dataset collected by the authors, confirm the high efficiency of the proposed system, show that the level set active contours technique leads to the extraction of features with the highest discriminative power, and prove that EUSBoost is able to outperform state-of-the-art ensemble classifiers in a real-life imbalanced medical problem.