Marcin Czajkowski
Bialystok University of Technology
Publication
Featured research published by Marcin Czajkowski.
Artificial Intelligence in Medicine | 2014
Marcin Czajkowski; Marek Grześ; Marek Kretowski
OBJECTIVE: A desirable property of tools used to investigate biological data is that their models and predictive decisions are easy to understand. Decision trees are particularly promising in this regard due to their comprehensible nature, which resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. METHODS: We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. RESULTS: Experimental validation was performed on several real-life gene expression datasets. Comparison with eight classifiers shows that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and that it is highly competitive with ensemble learning algorithms. The proposed solution outperformed its baseline algorithm on 14 datasets by an average of 6%. A study performed on one of the datasets showed that the genes used in the MTDT classification model are supported by biological evidence in the literature. CONCLUSION: This paper introduces a new type of decision tree that is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high-dimensional microarray data than their popular counterparts.
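The idea of placing several univariate tests in one non-terminal node can be sketched as follows. This is only an illustration: the abstract does not say how the tests are combined, so the unweighted majority vote used here is an assumption, and the names `UnivariateTest` and `MultiTestNode` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class UnivariateTest:
    """A single threshold test on one feature: x[feature] <= threshold."""
    feature: int
    threshold: float

    def goes_left(self, x: Sequence[float]) -> bool:
        return x[self.feature] <= self.threshold


@dataclass
class MultiTestNode:
    """A non-terminal node that holds several univariate tests."""
    tests: List[UnivariateTest]

    def goes_left(self, x: Sequence[float]) -> bool:
        # Assumption: the tests are combined by an unweighted majority vote.
        votes_left = sum(1 for t in self.tests if t.goes_left(x))
        return 2 * votes_left > len(self.tests)


# Three tests on three different features (for example, three genes).
node = MultiTestNode([
    UnivariateTest(feature=0, threshold=5.0),
    UnivariateTest(feature=1, threshold=2.0),
    UnivariateTest(feature=2, threshold=0.5),
])
```

With an odd number of tests the vote cannot tie, and a single noisy feature value cannot change the routing on its own, which is one plausible reading of the stability claim.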
Soft Computing | 2010
Marek Kretowski; Marcin Czajkowski
In the paper a new evolutionary algorithm for induction of univariate regression trees is proposed. In contrast to typical top-down approaches it globally searches for the best tree structure and tests in internal nodes. The population of initial trees is created with diverse top-down methods on randomly chosen sub-samples of the training data. Specialized genetic operators allow the algorithm to efficiently evolve regression trees. The complexity term introduced in the fitness function helps to mitigate the over-fitting problem. The preliminary experimental validation is promising as the resulting trees can be significantly less complex with at least comparable performance to the classical top-down counterpart.
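A fitness function with a complexity term can be sketched as below. The abstract does not give the exact form, so the additive penalty, the function name `penalized_fitness`, and the weight `alpha` are all assumptions made for illustration.

```python
from typing import Sequence


def penalized_fitness(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    tree_size: int,
    alpha: float = 0.01,
) -> float:
    """Return mean squared error plus a size penalty (lower is better).

    `alpha` is a hypothetical weight that trades accuracy against tree size.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return mse + alpha * tree_size
```

Under this form, two trees with the same training error are ranked by size, so the search is pushed toward the smaller one.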
Information Sciences | 2014
Marcin Czajkowski; Marek Kretowski
Metaheuristics, such as evolutionary algorithms (EAs), have been successfully applied to the problem of decision tree induction. Recently, an EA was proposed to evolve model trees, which are a particular type of decision tree that is employed to solve regression problems. However, there is a need to specialize the EAs in order to exploit the full potential of evolutionary induction. The main contribution of this paper is a set of solutions and techniques that incorporate knowledge about the induction problem for the global model tree into the evolutionary search. The objective of this paper is to demonstrate that a specialized EA can find more accurate and less complex solutions than the traditional greedy-induced counterparts and the straightforward application of an EA. This paper proposes a novel solution for each step of the evolutionary process and presents a new specialized EA for model tree induction called the Global Model Tree (GMT). An empirical investigation shows that trees induced by the GMT are one order of magnitude less complex than trees induced by popular greedy algorithms, and they are equivalent in terms of predictive accuracy with output models from straightforward implementations of evolutionary induction and state-of-the-art methods.
Parallel Problem Solving from Nature | 2010
Marcin Czajkowski; Marek Kretowski
In the paper, we propose a new evolutionary algorithm for induction of univariate regression trees that associate leaves with simple linear regression models. In contrast to typical top-down approaches, it globally searches for the best tree structure, tests in internal nodes, and models in leaves. The population of initial trees is created with diverse top-down methods on randomly chosen subsamples of the training data. Specialized genetic operators allow the algorithm to efficiently evolve regression trees. Akaike's information criterion (AIC), used as the fitness function, helps to mitigate the overfitting problem. The preliminary experimental validation is promising, as the resulting trees can be significantly less complex with at least comparable performance to the classical top-down counterparts.
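As a rough sketch of how AIC can serve as a fitness value for a regression tree, the function below uses the common least-squares form AIC = n ln(RSS / n) + 2k. The abstract does not state which form the authors use or how they count parameters, so both the formula variant and the meaning of `n_params` are assumptions.

```python
import math
from typing import Sequence


def aic_fitness(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    n_params: int,
) -> float:
    """Least-squares AIC, n * ln(RSS / n) + 2 * k (lower is better).

    `n_params` is a hypothetical count of free parameters in the tree,
    for example split thresholds plus regression coefficients in the leaves.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(rss / n) + 2 * n_params
```

The `2 * n_params` term grows with tree size, so a larger tree is preferred only when it reduces the residual error enough to pay for its extra parameters.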
Soft Computing | 2017
Krzysztof Jurczuk; Marcin Czajkowski; Marek Kretowski
Evolutionary induction of decision trees is an emerging alternative to greedy top-down approaches. Its growing popularity results from good prediction performance and less complex output trees. However, one of the major drawbacks associated with the application of evolutionary algorithms is the tree induction time, especially for large-scale data. In the paper, we design and implement a graphics processing unit (GPU)-based parallelization of evolutionary induction of decision trees. We apply a Compute Unified Device Architecture programming model, which supports general-purpose computation on a GPU (GPGPU). The selection and genetic operators are performed sequentially on a CPU, while the evaluation process for the individuals in the population is parallelized. The data-parallel approach is applied, and thus, the parts of a dataset are spread over GPU cores. Each core processes the assigned chunk of the data. Finally, the results from all GPU cores are merged and the sought tree metrics are sent to the CPU. Computational performance of the proposed approach is validated experimentally on artificial and real-life datasets. A comparison with the traditional CPU version shows that evolutionary induction of decision trees supported by GPGPU can be accelerated significantly (even up to 800 times) and allows for processing of much larger datasets.
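The data-parallel scheme described above (split the dataset, process each chunk independently, merge the partial results) can be sketched on a CPU as follows. This is not CUDA code and not the authors' implementation: the chunks are processed in a plain loop that stands in for GPU cores, the classification error is used as an example metric, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def split_into_chunks(data: Sequence[Sample], n_chunks: int) -> List[Sequence[Sample]]:
    """Split the dataset into at most n_chunks contiguous parts."""
    if not data or n_chunks < 1:
        raise ValueError("data must be non-empty and n_chunks must be positive")
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def evaluate_chunk(predict: Callable[[Sequence[float]], int],
                   chunk: Sequence[Sample]) -> Tuple[int, int]:
    """Work done by one core: count errors on the assigned chunk only."""
    errors = sum(1 for x, y in chunk if predict(x) != y)
    return errors, len(chunk)


def evaluate_tree(predict: Callable[[Sequence[float]], int],
                  data: Sequence[Sample], n_chunks: int = 4) -> float:
    """Merge the partial counts from all chunks into one error rate."""
    partials = [evaluate_chunk(predict, c) for c in split_into_chunks(data, n_chunks)]
    total_errors = sum(e for e, _ in partials)
    total_samples = sum(n for _, n in partials)
    return total_errors / total_samples
```

Because each chunk yields only a pair of counters, the merge step is cheap and the result does not depend on how the data is partitioned.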
Applied Soft Computing | 2016
Marcin Czajkowski; Marek Kretowski
Highlights: We investigate the role of regression tree representation. A new EA for decision tree induction with heterogeneous representation is studied. An inducer that is capable of self-adapting to the analyzed data is proposed.
A regression tree is a type of decision tree that can be applied to solve regression problems. One of its characteristics is that it may have at least four different node representations: internal nodes can be associated with univariate or oblique tests, whereas the leaves can be linked with simple constant predictions or multivariate regression models. The objective of this paper is to demonstrate the impact of particular representations on the induced decision trees. As it is difficult, if not impossible, to choose the best representation for a particular problem in advance, the issue is investigated using a new evolutionary algorithm for decision tree induction with a structure that can self-adapt to the currently analyzed data. The proposed solution allows different leaf and internal node representations within a single tree. Experiments performed using artificial and real-life datasets show the importance of tree representation in terms of error minimization and tree size. In addition, the presented solution managed to outperform popular tree inducers with defined homogeneous representations.
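The four node representations named above can coexist in one tree if every node type exposes the same prediction interface. The sketch below shows one way to model that; the class names are hypothetical and nothing here reflects the authors' actual data structures.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class ConstantLeaf:
    """Leaf with a simple constant prediction."""
    value: float

    def predict(self, x: Sequence[float]) -> float:
        return self.value


@dataclass
class LinearLeaf:
    """Leaf with a multivariate linear regression model."""
    intercept: float
    weights: Sequence[float]

    def predict(self, x: Sequence[float]) -> float:
        return self.intercept + sum(w * v for w, v in zip(self.weights, x))


@dataclass
class UnivariateNode:
    """Internal node with an axis-parallel test: x[feature] <= threshold."""
    feature: int
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x: Sequence[float]) -> float:
        branch = self.left if x[self.feature] <= self.threshold else self.right
        return branch.predict(x)


@dataclass
class ObliqueNode:
    """Internal node with an oblique test: dot(weights, x) <= threshold."""
    weights: Sequence[float]
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x: Sequence[float]) -> float:
        score = sum(w * v for w, v in zip(self.weights, x))
        branch = self.left if score <= self.threshold else self.right
        return branch.predict(x)


Node = Union[ConstantLeaf, LinearLeaf, UnivariateNode, ObliqueNode]
```

A mutation operator in such a design could swap one node type for another in place, which is one way a tree's structure might adapt to the data during the search.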
SIDE'12: Proceedings of the 2012 International Conference on Swarm and Evolutionary Computation | 2012
Marcin Czajkowski; Marek Kretowski
Memetic algorithms are popular approaches to improving pure evolutionary methods. However, where and when in the system the local search should be applied, and whether it really speeds up the evolutionary search, is still an open question. In this paper, we investigate the influence of memetic extensions on globally induced regression and model trees. These evolutionarily induced trees, in contrast to the typical top-down approaches, globally search for the best tree structure, tests at internal nodes, and models at the leaves. Specialized genetic operators together with local greedy search extensions allow for efficient tree evolution. The fitness function is based on the Bayesian information criterion and mitigates the over-fitting problem. The proposed method is experimentally validated on synthetic and real-life datasets, and preliminary results show that, to some extent, the memetic approach successfully improves evolutionary induction.
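A memetic operator pairs a random evolutionary step with a local search. The sketch below applies this to the threshold of a single split: a random mutation proposes a threshold, then a greedy search over midpoints between observed values refines it. The abstract does not describe the actual operators, so everything here, including the function names, is an assumed illustration.

```python
import random
from typing import Sequence


def sse_for_threshold(xs: Sequence[float], ys: Sequence[float], threshold: float) -> float:
    """Sum of squared errors of a one-split stump that predicts the mean on each side."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sse(left) + sse(right)


def random_mutation(xs: Sequence[float], rng: random.Random) -> float:
    """Pure evolutionary step: draw a random threshold within the data range."""
    return rng.uniform(min(xs), max(xs))


def local_search(xs: Sequence[float], ys: Sequence[float], threshold: float) -> float:
    """Memetic step: keep the current threshold unless a midpoint is strictly better."""
    points = sorted(set(xs))
    candidates = [threshold] + [(a + b) / 2 for a, b in zip(points, points[1:])]
    return min(candidates, key=lambda t: sse_for_threshold(xs, ys, t))


def memetic_mutation(xs: Sequence[float], ys: Sequence[float], rng: random.Random) -> float:
    """Random mutation followed by greedy local refinement."""
    return local_search(xs, ys, random_mutation(xs, rng))
```

The local search costs one error evaluation per candidate, which illustrates the trade-off raised in the abstract: each refinement improves an individual but adds time to every mutation.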
International Symposium on Methodologies for Intelligent Systems | 2011
Marcin Czajkowski; Marek Kretowski
In the paper, we present a new evolutionary algorithm for induction of regression trees. In contrast to the typical top-down approaches, it globally searches for the best tree structure, tests at internal nodes, and models at the leaves. The general structure of the proposed solution follows a framework of evolutionary algorithms with an unstructured population and a generational selection. Specialized genetic operators efficiently evolve regression trees with multivariate linear models. The Bayesian information criterion, used as the fitness function, mitigates the over-fitting problem. The preliminary experimental validation is promising, as the resulting trees are less complex with at least comparable performance to the classical top-down counterpart.
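The general framework mentioned above, an unstructured (flat) population with generational replacement, can be sketched as a generic loop. The abstract does not name the selection scheme, so the binary tournament and the single elite individual are assumptions, crossover is omitted for brevity, and `evolve` is a hypothetical name.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(
    population: List[T],
    fitness: Callable[[T], float],
    mutate: Callable[[T, random.Random], T],
    generations: int,
    rng: random.Random,
) -> T:
    """Generational EA on a flat population; lower fitness is better."""
    best = min(population, key=fitness)
    for _ in range(generations):
        offspring = [best]  # assumption: elitism keeps the best individual
        while len(offspring) < len(population):
            a, b = rng.choice(population), rng.choice(population)
            parent = a if fitness(a) <= fitness(b) else b  # binary tournament
            offspring.append(mutate(parent, rng))
        population = offspring  # the whole generation is replaced at once
        best = min(population, key=fitness)
    return best
```

In a tree inducer, `T` would be a tree, `mutate` a specialized operator that changes a test or a leaf model, and `fitness` an information criterion such as BIC.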
International Conference on Artificial Intelligence and Soft Computing | 2015
Marcin Czajkowski; Krzysztof Jurczuk; Marek Kretowski
One of the important and still not fully addressed issues in evolving decision trees is the induction time, especially for large datasets. In this paper, the authors propose a parallel implementation of the Global Decision Tree system that combines the shared memory (OpenMP) and message passing (MPI) paradigms to improve the speed of evolutionary induction of decision trees. The proposed solution is based on the classical master-slave model. The population is evenly distributed to the available nodes and cores, and time-consuming operations such as fitness evaluation and genetic operators are executed in parallel on the slaves. Only the selection is performed on the master node. The efficiency and scalability of the proposed implementation are validated experimentally on artificial datasets. The results show a noticeable speedup and the possibility of efficiently processing large datasets.
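The master-slave distribution described above can be sketched with a thread pool in place of MPI processes and OpenMP threads. This is only an analogy: the threads stand in for the slaves, only fitness evaluation is delegated (the genetic operators are left out), and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(population: Sequence[T], n_workers: int) -> List[List[T]]:
    """Distribute individuals evenly (round-robin) across the workers."""
    return [list(population[i::n_workers]) for i in range(n_workers)]


def worker_evaluate(individuals: List[T],
                    fitness: Callable[[T], float]) -> List[Tuple[T, float]]:
    """Work done on one worker: evaluate its share of the population."""
    return [(ind, fitness(ind)) for ind in individuals]


def master_step(population: Sequence[T], fitness: Callable[[T], float],
                n_workers: int = 4, n_survivors: int = 2) -> List[T]:
    """Master: scatter the population, gather the scores, then select."""
    parts = partition(population, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda part: worker_evaluate(part, fitness), parts))
    scored = [pair for part in results for pair in part]
    scored.sort(key=lambda pair: pair[1])  # lower fitness is better
    return [ind for ind, _ in scored[:n_survivors]]
```

Selection stays on the master because it needs the scores of the whole population, whereas evaluating one individual needs no information about the others.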
International Journal of Data Mining, Modelling and Management | 2013
Marcin Czajkowski; Marek Kretowski
Most tree-based algorithms are typical top-down approaches that search only for locally optimal decisions at each node and do not guarantee a globally optimal solution. In this paper, we propose a new evolutionary algorithm for global induction of univariate regression trees and model trees that associate leaves with simple linear regression models. The general structure of our solution follows a typical framework of evolutionary algorithms with an unstructured population and a generational selection. We propose specialised genetic operators to mutate and cross over individuals (trees), a fitness function based on the Bayesian information criterion, and a smoothing process that improves the prediction accuracy of the model tree. Experiments performed on 15 real-life datasets show that trees produced by the proposed solution can be significantly less complex with at least comparable performance to the classical top-down counterparts.
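The abstract does not specify the smoothing process. As an assumption, the sketch below uses the smoothing known from M5 model trees, where the leaf prediction is blended with the models of its ancestors on the way up to the root: p' = (n * p + k * q) / (n + k). The function name and the default `k = 15` are illustrative choices.

```python
from typing import Sequence, Tuple


def smooth_prediction(
    leaf_prediction: float,
    path: Sequence[Tuple[int, float]],
    k: float = 15.0,
) -> float:
    """Blend a leaf prediction with ancestor models (M5-style smoothing).

    `path` lists, from the leaf's parent up to the root, pairs of
    (number of training samples in the child just left, prediction of
    this ancestor's own model for the same input).
    """
    p = leaf_prediction
    for n_child, q in path:
        p = (n_child * p + k * q) / (n_child + k)
    return p
```

Leaves built from few samples are pulled strongly toward their ancestors' predictions, while well-populated leaves are left almost unchanged.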