Oleg Okun
University of Oulu
Publication
Featured research published by Oleg Okun.
International Conference on Artificial Neural Networks | 2003
Dick de Ridder; Olga Kouropteva; Oleg Okun; Matti Pietikäinen; Robert P. W. Duin
Locally linear embedding (LLE) is a recently proposed method for unsupervised nonlinear dimensionality reduction. It has a number of attractive features: it does not require an iterative algorithm, and just a few parameters need to be set. Two extensions of LLE to supervised feature extraction were independently proposed by the authors of this paper. Here, both methods are unified in a common framework and applied to a number of benchmark data sets. Results show that they perform very well on high-dimensional data which exhibits a manifold structure.
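The non-iterative character of LLE mentioned above can be illustrated with its weight step, which is a small linear solve per data point. The following is a minimal sketch of the standard LLE reconstruction weights, not the supervised extensions from this paper; the function name `lle_weights` and the regularization constant `reg` are illustrative assumptions.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights that best reconstruct x from its neighbors (one per row)."""
    Z = neighbors - x                         # shift so that x is the origin
    C = Z @ Z.T                               # local Gram matrix, shape (k, k)
    C += reg * np.trace(C) * np.eye(len(C))   # regularize: C can be singular
    w = np.linalg.solve(C, np.ones(len(C)))   # closed-form solve, no iterations
    return w / w.sum()                        # enforce the sum-to-one constraint

# Toy case: x is the midpoint of its two neighbors.
x = np.array([0.5, 0.5])
neighbors = np.array([[0.0, 0.0], [1.0, 1.0]])
w = lle_weights(x, neighbors)
```

In this toy case both neighbors receive the same weight by symmetry. A full LLE would repeat the solve for every point and then obtain the embedding from an eigendecomposition.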
Pattern Recognition | 2005
Olga Kouropteva; Oleg Okun; Matti Pietikäinen
The locally linear embedding (LLE) algorithm belongs to a group of manifold learning methods that not only reduce data dimensionality, but also attempt to discover the true low-dimensional structure of the data. In this paper, we propose an incremental version of LLE and experimentally demonstrate its advantages in terms of topology preservation. In addition, compared to the original (batch) LLE, the incremental LLE needs to solve a much smaller optimization problem.
Iberian Conference on Pattern Recognition and Image Analysis | 2003
Olga Kouropteva; Oleg Okun; Matti Pietikäinen
The dimensionality of the input data often far exceeds their intrinsic dimensionality. As a result, it may be difficult to recognize multidimensional data, especially if the number of samples in a dataset is not large. In addition, the more dimensions the data have, the longer the recognition time is. This makes it necessary to perform dimensionality reduction before pattern recognition. Locally linear embedding (LLE) [5, 6] is one of the methods intended for this task. In this paper, we investigate its extension, called supervised locally linear embedding (SLLE), which uses the class labels of data points when mapping them into a low-dimensional space. An efficient eigendecomposition scheme for SLLE is derived. Two variants of SLLE are analyzed in combination with a k-nearest-neighbor classifier and tested on real-world images. Preliminary results demonstrate that both variants yield identical best accuracy, despite being conceptually different.
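One way class labels can enter the mapping is through the neighbor search: distances between points of different classes are inflated before the nearest neighbors are selected. The abstract does not give the formula, so the form below (add `alpha * max(D)` to every between-class distance) is an assumption taken from common descriptions of SLLE, and `supervised_distances` is a hypothetical helper name.

```python
import numpy as np

def supervised_distances(X, labels, alpha):
    """Pairwise distances with a penalty on between-class pairs (assumed form)."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))           # plain Euclidean distances
    different = labels[:, None] != labels[None, :]  # True where classes differ
    return D + alpha * D.max() * different          # alpha=0 gives unsupervised LLE

# Toy 1-D data: point 1 is closest to point 2, which belongs to another class.
X = np.array([[0.0], [1.0], [1.2], [3.0]])
labels = np.array([0, 0, 1, 1])

D_unsup = supervised_distances(X, labels, alpha=0.0)
D_sup = supervised_distances(X, labels, alpha=1.0)

# Index 0 of argsort is the point itself, so index 1 is its nearest neighbor.
nearest_unsup = np.argsort(D_unsup[1])[1]
nearest_sup = np.argsort(D_sup[1])[1]
```

With `alpha=0.0` the nearest neighbor of point 1 is point 2 from the other class; with `alpha=1.0` it becomes point 0 from the same class.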
Signal Processing | 2007
Oleg Okun; Helen Priisalu
We propose a data reduction method based on fuzzy clustering and nonnegative matrix factorisation. In contrast to the variants of data set editing typically used for data reduction, our method is completely unsupervised, i.e., it does not need class labels to eliminate examples from a data set. Thus, it is useful in exploratory data analysis, when class labels of examples are unknown or unavailable, for gaining insight into the structure of different groups of patterns. Also, unlike many types of unsupervised clustering that relate a single example (cluster centroid) to each cluster, our method associates a set of the most representative examples with each cluster. Hence, it makes cluster structure more transparent to a data analyst.
Archive | 2009
Oleg Okun; Giorgio Valentini
This book contains the extended papers presented at the 2nd Workshop on Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA), held on 21-22 July 2008 in Patras, Greece, in conjunction with the 18th European Conference on Artificial Intelligence (ECAI 2008). This workshop was a successor of the smaller event held in 2007 in conjunction with the 3rd Iberian Conference on Pattern Recognition and Image Analysis, Girona, Spain. The success of that event, as well as the publication of the workshop papers in the edited book Supervised and Unsupervised Ensemble Methods and their Applications, published by Springer-Verlag as volume 126 of the Studies in Computational Intelligence series, encouraged us to continue a good tradition. The purpose of this book is to help practitioners in various branches of science and technology adopt the ensemble approach in their daily research work. We hope that the fourteen chapters composing the book will be a good guide in the sea of numerous opportunities for ensemble methods.
EURASIP Journal on Advances in Signal Processing | 2006
Oleg Okun; Helen Priisalu
Linear and unsupervised dimensionality reduction via matrix factorization with nonnegativity constraints is studied. Because of these constraints, it stands apart from other linear dimensionality reduction methods. Here we explore nonnegative matrix factorization in combination with three nearest-neighbor classifiers for protein fold recognition. Since matrix factorization is typically done iteratively, convergence can be slow. To speed up convergence, we perform feature scaling (normalization) before the iterations begin. This results in a significantly (more than 11 times) faster algorithm. A justification of why this happens is provided. Another modification of the standard nonnegative matrix factorization algorithm combines two known techniques for mapping unseen data. This operation is typically necessary before classifying the data in the low-dimensional space. Combining the two mapping techniques can yield better accuracy than using either technique alone. The gains, however, depend on the state of the random number generator used to initialize the iterations, on the classifier, and on its parameters. In particular, when employing the best of the three classifiers and reducing the original dimensionality by around 30%, these gains can reach more than 4% compared to classification in the original, high-dimensional space.
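The idea of scaling features before the iterations start can be sketched as follows. The abstract names neither the update rule nor the scaling scheme, so this sketch assumes the widely used multiplicative updates for the Frobenius-norm objective and min-max scaling of each feature to [0, 1]; `nmf` and `minmax_scale` are illustrative names, and the sketch does not attempt to reproduce the reported speed-up.

```python
import numpy as np

def minmax_scale(V):
    """Scale each feature (row of V) to [0, 1]; constant rows map to 0."""
    lo = V.min(axis=1, keepdims=True)
    hi = V.max(axis=1, keepdims=True)
    return (V - lo) / np.where(hi > lo, hi - lo, 1.0)

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (features x samples) as W @ H with nonnegative W and H."""
    rng = np.random.default_rng(seed)        # result depends on this seed
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # multiplicative update of H
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # multiplicative update of W
    return W, H

# Toy data whose features live on very different scales.
rng = np.random.default_rng(1)
V = rng.random((6, 20)) * np.array([[1.0], [10.0], [100.0], [1.0], [10.0], [100.0]])
V_scaled = minmax_scale(V)

W, H = nmf(V_scaled, rank=3)
error = np.linalg.norm(V_scaled - W @ H)

# Same seed, a single iteration: the reference point for the error decrease.
W1, H1 = nmf(V_scaled, rank=3, n_iter=1)
error_after_one = np.linalg.norm(V_scaled - W1 @ H1)
```

The columns of `H` are the low-dimensional representations of the samples, which is what a nearest-neighbor classifier would consume.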
International Conference on Document Analysis and Recognition | 2001
Matti Pietikäinen; Oleg Okun
Detection of text from documents in which text is embedded in complex colored and textured backgrounds is a very challenging problem. In this paper, we propose a simple texture-based approach based on edge information for this task. The performance of our method is compared to that obtained by a method based on the discrete cosine transform which was recently proposed by Y. Zhong et al. (2000) for text localization in compressed digital video. In our experiments, both methods performed about equally well for small-sized text, but our method was better in the case of large-sized text. The principal advantage of our approach is that in addition to the text detection problem, the same edge representation can also be used for other image interpretation tasks.
Iberian Conference on Pattern Recognition and Image Analysis | 2007
Oleg Okun; Helen Priisalu
Random forest is a collection (ensemble) of decision trees. It is a popular ensemble technique in pattern recognition. In this article, we apply random forest to cancer classification based on gene expression and address two issues that have so far been overlooked in other works. First, we demonstrate on two different real-world datasets that the performance of random forest is strongly influenced by dataset complexity. When estimated before running random forest, this complexity can serve as a useful performance indicator, and it can explain a difference in performance on different datasets. Second, we show that one should be cautious in relying on feature importance to rank genes: two forests generated with different numbers of features per node split may have very similar classification errors on the same dataset, yet the respective lists of genes ranked according to feature importance can be only weakly correlated.
International Journal on Document Analysis and Recognition | 1999
Oleg Okun; Matti Pietikäinen; Jaakko J. Sauvola
The existing skew estimation techniques usually assume that the input image is of high resolution and that the detectable angle range is limited. We present a more generic solution for this task that overcomes these restrictions. Our method is based on determination of the first eigenvector of the data covariance matrix. The solution comprises image resolution reduction, connected component analysis, component classification using a fuzzy approach, and skew estimation. Experiments on a large set of various document images and performance comparison with two Hough transform-based methods show a good accuracy and robustness for our method.
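The core step, estimating skew from the first eigenvector of the data covariance matrix, can be sketched on synthetic points. This is only an illustration under stated assumptions: the points stand in for the positions of text components, the earlier stages (resolution reduction, connected component analysis, fuzzy classification) are omitted, and `skew_angle_degrees` is a hypothetical name.

```python
import numpy as np

def skew_angle_degrees(points):
    """Angle of the dominant direction of 2-D points, in (-90, 90] degrees."""
    cov = np.cov(points, rowvar=False)       # 2 x 2 covariance of (x, y)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    vx, vy = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue
    angle = np.degrees(np.arctan2(vy, vx))
    # The sign of an eigenvector is arbitrary, so fold the angle into (-90, 90].
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return angle

# Synthetic "text line": points along a line rotated by 12 degrees, with jitter.
true_angle = 12.0
theta = np.radians(true_angle)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 50)              # position along the line
n = rng.normal(scale=0.5, size=t.size)       # small offset across the line
points = np.column_stack([t * np.cos(theta) - n * np.sin(theta),
                          t * np.sin(theta) + n * np.cos(theta)])

estimated = skew_angle_degrees(points)
```

Because the eigenvector direction is not tied to a limited search interval, the estimate covers the full range of orientations, unlike methods that scan a fixed set of candidate angles. In image coordinates, where the y axis points down, the sign of the angle would flip.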
Artificial Intelligence in Medicine | 2009
Oleg Okun; Helen Priisalu
OBJECTIVE: We explore the link between dataset complexity, which determines how difficult a dataset is to classify, and classification performance, defined by the low-variance, low-bias bolstered resubstitution error of k-nearest neighbor classifiers. METHODS AND MATERIAL: Gene expression based cancer classification is used as the task in this study. Six gene expression datasets containing different types of cancer constitute the test data. RESULTS: Through extensive simulation coupled with the copula method for analysis of association in bivariate data, we show that dataset complexity and bolstered resubstitution error are statistically dependent. As a result, we propose a new scheme for generating ensembles of classifiers that selects feature subsets of low complexity for ensemble members, which are the accurate members according to the dependence relation found. CONCLUSION: Experiments with six gene expression datasets demonstrate that our ensemble generation scheme, based on the dependence between dataset complexity and classification error, is superior to the single best classifier in the ensemble and to the traditional ensemble construction scheme, which ignores dataset complexity.