Leon Bobrowski
Bialystok University of Technology
Publication
Featured research published by Leon Bobrowski.
Archive | 2011
Leon Bobrowski; Tomasz Łukaszuk
Feature selection is one of the active research areas in pattern recognition and data mining (Duda et al., 2001). The importance of feature selection methods becomes apparent in the context of the rapidly growing amount of data collected in contemporary databases (Liu & Motoda, 2008). Feature subset selection procedures aim to discard as many features (measurements) as possible that are irrelevant or redundant for a given problem. The feature subset resulting from a selection procedure should allow a model to be built, on the basis of the available learning data sets, that generalizes better to new (unseen) data. For the purpose of designing classification or prediction models, feature subset selection procedures are expected to yield higher classification or prediction accuracy. The feature selection problem is particularly important and challenging when the number of objects represented in a given database is low in comparison to the number of features used to characterise these objects. Such a situation typically arises in the exploration of genomic data sets, where the number of features can be thousands of times greater than the number of objects. Here we consider the relaxed linear separability (RLS) method of feature subset selection (Bobrowski & Łukaszuk, 2009). This approach to the feature selection problem refers to the concept of linear separability of the learning sets (Bobrowski, 2008). The term "relaxation" means here the deterioration of linear separability caused by the gradual neglect of selected features. The considered approach to feature selection is based on repetitive minimization of convex and piecewise-linear (CPL) criterion functions. These CPL criterion functions, which have their origins in the theory of neural networks, include the costs of individual features (Bobrowski, 2005). Increasing the cost of a feature causes that feature to drop out of the feature subspace.
The quality of the reduced feature subspaces is assessed by the accuracy of the CPL-optimal classifiers built in these subspaces. The article contains new theoretical and experimental results related to the RLS method of feature subset selection. The experimental results have been obtained through the analysis of, inter alia, two genetic data sets.
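To illustrate the general idea, a minimal sketch follows: a hinge-type convex and piecewise-linear criterion is augmented with per-feature costs (an L1-like penalty), so that increasing a feature's cost drives its weight toward zero and the feature drops out. The function name, default parameters, and the subgradient-descent solver are illustrative assumptions, not the paper's actual RLS algorithm.

```python
import numpy as np

def cpl_feature_selection(X, y, feature_cost=0.1, lr=0.05, epochs=200):
    """Sketch of CPL-style feature selection: minimize a hinge-type,
    convex and piecewise-linear criterion plus per-feature costs by
    subgradient descent.  Features whose weights stay near zero are
    treated as dropped.  Illustrative only; not the paper's RLS method."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)          # labels y are in {-1, +1}
        viol = margins < 1.0               # hinge-type CPL penalty active
        # subgradient of the sum of hinge terms
        gw = -(y[viol, None] * X[viol]).sum(axis=0)
        gb = -y[viol].sum()
        # subgradient of the feature-cost (L1) term
        gw = gw + feature_cost * np.sign(w)
        w -= lr * gw / n
        b -= lr * gb / n
    selected = np.flatnonzero(np.abs(w) > 1e-3)
    return w, b, selected
```

On a toy set where only the first feature carries the class information, the second feature's weight is suppressed by its cost while the first survives the selection threshold.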
Industrial Conference on Data Mining | 2014
Leon Bobrowski
Data mining problems and tools are linked to the task of extracting important regularities (patterns) from multivariate data sets. In some cases, flat patterns can be located on vertexical planes in a multidimensional data space. Vertexical planes are linked to vertices in parameter space. Patterns located on vertexical planes can be discovered in large data sets through minimization of the convex and piecewise linear (CPL) criterion functions.
Computer Recognition Systems | 2013
Leon Bobrowski; Magdalena Topczewska
Layers of binary classifiers can be used in the transformation of data sets composed of multivariate feature vectors. In this way, a new representation of the data sets is obtained that depends on the parameters of the classifiers in the layer. By a special, data-driven choice of these parameters, a ranked layer can be designed. The ranked layer has an important property of data set linearization: the data sets become linearly separable after transformation by the ranked layer. The ranked layer can be built, inter alia, from radial or nearest-neighbor binary classifiers.
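The transformation itself can be sketched as follows: each input vector is mapped to the vector of binary outputs of the classifiers in the layer. This sketch uses simple radial units with hand-picked centers and radii; the ranked, data-driven choice of the units described in the paper is deliberately omitted, so the names and parameters here are illustrative assumptions.

```python
import numpy as np

def radial_binary_classifier(center, radius):
    """Illustrative radial binary unit: outputs 1 inside the ball."""
    def unit(x):
        return 1 if np.linalg.norm(x - center) <= radius else 0
    return unit

def layer_transform(X, units):
    """Map each feature vector to the binary output vector of the layer.
    This is only the re-representation step; the ranked design of the
    units (which guarantees linear separability) is not reproduced here."""
    return np.array([[u(x) for u in units] for x in X])
```

For example, a layer of two radial units maps each point to a two-bit code indicating which balls contain it.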
International Conference on Bioinformatics and Biomedical Engineering | 2017
Leon Bobrowski
A data mining technique based on minimization of the convex and piecewise linear (CPL) criterion functions can be used to extract collinear (flat) patterns from large, multidimensional data sets. Flat patterns consist of data vectors located on planes in a multidimensional feature space. Data subsets located on such planes can represent linear interactions between multiple variables (features). A new method of collinear biclustering can also be developed through this technique.
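The core notion — a subset of data vectors lying on (or near) a common hyperplane — can be illustrated with a small sketch that fits the best hyperplane by SVD and reports which vectors lie within a tolerance of it. Note that this uses plain least-squares geometry in place of the paper's CPL minimization; the function name and tolerance are assumptions for illustration.

```python
import numpy as np

def flat_pattern_support(X, tol=1e-6):
    """Sketch: indices of data vectors lying on (or near) the best-fit
    hyperplane through the data.  Uses SVD rather than the paper's CPL
    criterion minimization, which is not reproduced here."""
    Xc = X - X.mean(axis=0)
    # the right singular vector of the smallest singular value is the
    # normal of the best-fit hyperplane through the centroid
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    normal = Vt[-1]
    dist = np.abs(Xc @ normal)
    return np.flatnonzero(dist <= tol)
```

A data set satisfying an exact linear relation, e.g. z = x + y, yields a flat pattern containing every vector.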
IFAC Proceedings Volumes | 2012
Leon Bobrowski
Clustering algorithms are basic tools for solving pattern recognition and data mining problems. The most popular iterative clustering algorithm is K-means. The basic idea behind the K-means algorithm is to divide a given set of vectors into subsets around central points (class prototypes). In the case of K-lines clustering algorithms, vectors are partitioned into K subsets by using central lines in the n-dimensional feature space. The proposed K-lines clustering is based on minimization of the CPL criterion functions.
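The K-lines idea can be sketched by analogy with K-means: alternate between assigning each vector to its nearest central line and refitting each line to its assigned vectors. The sketch below refits lines by least squares (SVD) rather than by the CPL criterion minimization proposed in the paper, so it should be read as an illustrative analogue, not the proposed algorithm.

```python
import numpy as np

def fit_line(P):
    """Least-squares line through points P: (point on line, unit direction)."""
    c = P.mean(axis=0)
    _, _, Vt = np.linalg.svd(P - c, full_matrices=False)
    return c, Vt[0]

def dist_to_line(X, c, d):
    """Orthogonal distance of each row of X to the line through c along d."""
    diff = X - c
    proj = np.outer(diff @ d, d)
    return np.linalg.norm(diff - proj, axis=1)

def k_lines(X, K=2, iters=20, seed=0):
    """K-means-style alternation for K-lines clustering: assign each
    vector to its nearest central line, then refit each line.  The paper
    minimizes CPL criterion functions; this is a least-squares sketch."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))
    lines = []
    for _ in range(iters):
        for k in range(K):                 # revive any empty cluster
            if not np.any(labels == k):
                labels[rng.integers(len(X))] = k
        lines = [fit_line(X[labels == k]) for k in range(K)]
        D = np.column_stack([dist_to_line(X, c, d) for c, d in lines])
        labels = D.argmin(axis=1)
    return labels, lines
```

With K = 1 this degenerates to fitting a single central line, which passes exactly through any perfectly collinear data set.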
PLOS ONE | 2014
Leon Bobrowski; Tomasz Łukaszuk; Bengt Lindholm; Peter Stenvinkel; Olof Heimbürger; Jonas Axelsson; Peter Bárány; Juan Jesus Carrero; Abdul Rashid Qureshi; Karin Luttropp; Malgorzata Debowska; Louise Nordfors; Martin Schalling; Jacek Waniewski
Identification of risk factors in patients with a particular disease can be performed on clinical data sets by using feature selection procedures from pattern recognition and data mining. The applicability of the relaxed linear separability (RLS) method of feature subset selection was tested on high-dimensional, mixed-type (genetic and phenotypic) clinical data from patients with end-stage renal disease. The RLS method allowed for a substantial reduction of dimensionality by omitting redundant features while maintaining the linear separability of the data sets of patients with high and low levels of an inflammatory biomarker. A synergy between genetic and phenotypic features in differentiating between these two subgroups was demonstrated.
International Conference on Data Mining | 2012
Leon Bobrowski; Tomasz Łukaszuk
Designing linear prognostic models on the basis of a multivariate learning set with a censored dependent variable is considered in the paper. The task of designing a linear regression model is reformulated here as the problem of testing the linear separability of two sets. The convex and piecewise linear (CPL) criterion functions are used both for estimation of the model parameters and for the feature selection task. Feature selection aims at discarding a possibly large number of independent variables while improving the resulting model's quality. Particular attention is paid to modeling the censored data used in survival analysis. Experiments with the RLS method of gene subset selection in prognostic model selection with a censored dependent variable are also described in the paper.
International Conference on Computational Collective Intelligence | 2017
Leon Bobrowski
Aggregation of large data sets is one of the current topics in exploratory analysis and pattern recognition. Integration of data sets is a useful and necessary step towards knowledge extraction from large data sets. The possibility of separable integration of multidimensional data sets by one-dimensional binary classifiers is analyzed in the paper, as well as the design of a layer of binary classifiers for separable aggregation. The optimization problem of designing a separable layer is formulated, and a dipolar strategy aimed at optimizing the separable aggregation of large data sets is proposed.
Asian Conference on Intelligent Information and Database Systems | 2016
Leon Bobrowski; Pawel Zabielski
A collinear (flat) pattern appears in a given set of multidimensional feature vectors when many of these vectors are located on (or near) some plane in the feature space. A flat pattern discovered in a given data set can provide indications for creating a model of interactions between selected features. Patterns located on planes can be discovered even in large, multidimensional data sets through minimization of the convex and piecewise linear (CPL) criterion functions. Discovering flat patterns can be based on the search for degenerate vertices in the parameter space. The possibility of using learning algorithms for this purpose is examined in this paper.
International Conference on Intelligent Decision Technologies | 2016
Leon Bobrowski
Data mining algorithms are used for discovering general regularities based on the observed patterns in data sets. Flat (multicollinear) patterns can be observed in data sets when many feature vectors are located on planes in the multidimensional feature space. Collinear patterns can be useful in modeling linear interactions between multiple variables (features) and can also be used in a decision support process. Flat patterns can be efficiently discovered in large, multivariate data sets through minimization of the convex and piecewise linear (CPL) criterion functions.