Jakramate Bootkrajang

Publication

Featured research published by Jakramate Bootkrajang.

European Conference on Machine Learning | 2012

Label-Noise robust logistic regression and its applications

Jakramate Bootkrajang; Ata Kabán

The classical problem of learning a classifier relies on a set of labelled examples, without ever questioning the correctness of the provided label assignments. However, there is an increasing realisation that labelling errors are not uncommon in real situations. In this paper we consider a label-noise robust version of the logistic regression and multinomial logistic regression classifiers and make the following contributions: (i) We derive efficient multiplicative updates to estimate the label flipping probabilities, and we give a proof of convergence for our algorithm. (ii) We develop a novel sparsity-promoting regularisation approach which allows us to tackle challenging high dimensional noisy settings. (iii) Finally, we thoroughly evaluate the performance of our approach in synthetic experiments and we demonstrate several real applications including gene expression analysis, class topology discovery and learning from crowdsourcing data.
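The label-flipping idea in this abstract can be sketched for the binary case as follows. This is a minimal illustration, not the authors' implementation: the names `gamma01` and `gamma10` and the function names are assumptions, and the multiplicative updates and sparsity-promoting regulariser mentioned in the abstract are not reproduced here.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def noisy_label_prob(w, x, gamma01, gamma10):
    """P(observed label = 1 | x) under a class-conditional flip model.

    gamma01: assumed probability that a true 0 is recorded as 1.
    gamma10: assumed probability that a true 1 is recorded as 0.
    """
    p_true1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return (1.0 - gamma10) * p_true1 + gamma01 * (1.0 - p_true1)

def log_likelihood(w, X, y_obs, gamma01, gamma10):
    """Log-likelihood of the observed (possibly flipped) labels."""
    total = 0.0
    for x, y in zip(X, y_obs):
        s = noisy_label_prob(w, x, gamma01, gamma10)
        total += math.log(s) if y == 1 else math.log(1.0 - s)
    return total
```

With both flip probabilities set to zero, `noisy_label_prob` reduces to ordinary logistic regression.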

Bioinformatics | 2013

Classification of mislabelled microarrays using robust sparse logistic regression

Jakramate Bootkrajang; Ata Kabán

MOTIVATION: Previous studies reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. RESULTS: In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, is effective at identifying marker genes, and simultaneously detects mislabelled arrays with high accuracy. AVAILABILITY: The code is available from http://cs.bham.ac.uk/∼jxb008. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Evolutionary Computation | 2016

Toward large-scale continuous EDA: A random matrix theory perspective

Ata Kabán; Jakramate Bootkrajang; Robert J. Durrant

Estimation of distribution algorithms (EDAs) are a major branch of evolutionary algorithms (EAs) with some unique advantages in principle. They are able to take advantage of correlation structure to drive the search more efficiently, and they are able to provide insights about the structure of the search space. However, model building in high dimensions is extremely challenging, and as a result existing EDAs may become less attractive in large-scale problems because of the associated large computational requirements. Large-scale continuous global optimisation is key to many modern-day real-world problems. Scaling up EAs to large-scale problems has become one of the biggest challenges of the field. This paper pins down some fundamental roots of the problem and makes a start at developing a new and generic framework to yield effective and efficient EDA-type algorithms for large-scale continuous global optimisation problems. Our concept is to introduce an ensemble of random projections to low dimensions of the set of fittest search points as a basis for developing a new and generic divide-and-conquer methodology. Our ideas are rooted in the theory of random projections developed in theoretical computer science, and in developing and analysing our framework we exploit some recent results in nonasymptotic random matrix theory.
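The projection step described above might look roughly like the sketch below. It only shows how a set of selected search points could be mapped to several low-dimensional subspaces. The Gaussian entries with variance 1/d, the function names, and the parameters `k` (target dimension) and `m` (ensemble size) are assumptions made for illustration; how models are fitted in each subspace and recombined is not given in the abstract and is not shown.

```python
import random

def random_projection_matrix(k, d, rng):
    """k x d matrix with i.i.d. Gaussian entries of variance 1/d (assumed scaling)."""
    scale = (1.0 / d) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]

def project(R, x):
    """Map a d-dimensional point x to k dimensions via R."""
    return [sum(r_ij * x_j for r_ij, x_j in zip(row, x)) for row in R]

def project_ensemble(points, k, m, seed=0):
    """Project the same points with m independent random matrices.

    Returns a list of (matrix, projected_points) pairs, one per ensemble member.
    """
    rng = random.Random(seed)
    d = len(points[0])
    ensemble = []
    for _ in range(m):
        R = random_projection_matrix(k, d, rng)
        ensemble.append((R, [project(R, x) for x in points]))
    return ensemble
```

Each ensemble member works in only `k` dimensions, which is what makes a divide-and-conquer treatment of a high-dimensional search space plausible.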

Neurocomputing | 2016

A generalised label noise model for classification in the presence of annotation errors

Jakramate Bootkrajang

Supervised learning from annotated data is becoming more challenging due to inherent imperfection of training labels. Previous studies of learning in the presence of label noise have focused on label noise which occurs randomly, while the study of label noise that is influenced by input features, which is intuitively more realistic, is still lacking. In this paper, we propose a new, generalised label noise model which is able to withstand the negative effect of random label noise and a wide range of non-random label noise. Empirical studies using a battery of synthetic data and four real-world datasets with inherent annotation errors demonstrate that the proposed generalised label noise model improves, in terms of classification accuracy, upon existing label noise modelling approaches.

Intelligent Data Engineering and Automated Learning | 2013

Learning a Label-Noise Robust Logistic Regression: Analysis and Experiments

Jakramate Bootkrajang; Ata Kabán

Label-noise robust logistic regression (rLR) is an extension of logistic regression that includes a model of random mislabelling. This paper attempts a theoretical analysis of rLR. By decomposing and interpreting the gradient of the likelihood objective of rLR as employed in gradient ascent optimisation, we get insights into the ability of the rLR learning algorithm to counteract the negative effect of mislabelling as a result of an intrinsic re-weighting mechanism. We also give an upper bound on the error of rLR using Rademacher complexities.
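As a rough illustration of the re-weighting idea, consider the binary case with class-conditional flip probabilities. The notation below is assumed for this sketch and may differ from the paper's: $\sigma_n = \sigma(w^\top x_n)$ is the clean-label posterior, $\gamma_{01}$ and $\gamma_{10}$ are the probabilities of recording a true 0 as 1 and a true 1 as 0, and $S_n$ is the probability of observing label 1.

```latex
\begin{align}
S_n &= (1-\gamma_{10})\,\sigma_n + \gamma_{01}\,(1-\sigma_n), \\
\mathcal{L}(w) &= \sum_n \tilde{y}_n \log S_n + (1-\tilde{y}_n)\log(1-S_n), \\
\nabla_w \mathcal{L} &= \sum_n
  \underbrace{\frac{(1-\gamma_{10}-\gamma_{01})\,\sigma_n(1-\sigma_n)}{S_n(1-S_n)}}_{\text{per-example weight}}
  \,(\tilde{y}_n - S_n)\, x_n .
\end{align}
```

With $\gamma_{01} = \gamma_{10} = 0$ we have $S_n = \sigma_n$, the weight equals 1, and the expression reduces to the usual logistic regression gradient $\sum_n (\tilde{y}_n - \sigma_n)\,x_n$.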

Pattern Analysis and Applications | 2018

Towards instance-dependent label noise-tolerant classification: a probabilistic approach

Jakramate Bootkrajang; Jeerayut Chaijaruwanich

Learning from labelled data is becoming more and more challenging due to inherent imperfection of training labels. Existing label noise-tolerant learning machines were primarily designed to tackle class-conditional noise which occurs at random, independently from input instances. However, relatively less attention was given to a more general type of label noise which is influenced by input features. In this paper, we try to address the problem of learning a classifier in the presence of instance-dependent label noise by developing a novel label noise model which is expected to capture the variation of label noise rate within a class. This is accomplished by adopting a probability density function of a mixture of Gaussians to approximate the label flipping probabilities. Experimental results demonstrate the effectiveness of the proposed method over existing approaches.

International Journal on Document Analysis and Recognition | 2018

Recognition-based character segmentation for multi-level writing style

Papangkorn Inkeaw; Jakramate Bootkrajang; Phasit Charoenkwan; Sanparith Marukatat; Shinn-Ying Ho; Jeerayut Chaijaruwanich

Character segmentation is an important task in optical character recognition (OCR). The quality of any OCR system is highly dependent on the character segmentation algorithm. Despite the availability of various character segmentation methods proposed to date, existing methods cannot satisfactorily segment characters belonging to some complex writing styles such as the Lanna Dhamma characters. In this paper, a new character segmentation method named graph partitioning-based character segmentation is proposed to address the problem. The proposed method can deal with multi-level writing style as well as touching and broken characters. It can be regarded as a generalization of existing approaches to multi-level writing style. The proposed method consists of three phases. In the first phase, a newly devised over-segmentation technique based on morphological skeleton is used to obtain redundant fragments of a word image. In the second phase, the fragments are used to form a segmentation hypotheses graph. In the last phase, the hypotheses graph is partitioned into subgraphs, each corresponding to a segmented character, using a partitioning algorithm developed specifically for character segmentation. Experimental results on handwritten Lanna Dhamma character datasets showed that the proposed method achieved a high correct segmentation rate and outperformed existing methods for the Lanna Dhamma alphabet.

Intelligent Data Engineering and Automated Learning | 2017

Predicting Physical Activities from Accelerometer Readings in Spherical Coordinate System

Kittikawin Lehsan; Jakramate Bootkrajang

Recent advances in mobile computing devices give smartphones the ability to sense and collect various potentially useful data from a wide range of sensors. Combining these data with current data mining and machine learning techniques yields interesting applications which were not conceivable in the past. One of the most interesting applications is user activity recognition, accomplished by analysing information from an accelerometer. In this work, we present a novel framework for classifying physical activities, namely walking, jogging, push-up, squatting and sit-up, using readings from a mobile phone's accelerometer. In contrast to existing methods, our approach first converts the readings, which are originally in the Cartesian coordinate system, into representations in the spherical coordinate system prior to a classification step. Experimental results demonstrate that activities involving rotational movements can be better differentiated in the spherical coordinate system.
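The coordinate conversion mentioned above can be sketched as follows. This is a minimal example assuming the common physics convention (polar angle measured from the +z axis, azimuth measured in the x-y plane); the paper may use a different convention, and the function name `to_spherical` is hypothetical.

```python
import math

def to_spherical(x, y, z):
    """Convert one accelerometer reading (x, y, z) to (r, theta, phi).

    r:     magnitude of the acceleration vector
    theta: polar angle from the +z axis, in [0, pi]
    phi:   azimuth in the x-y plane, in [-pi, pi]
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # Angles are undefined for a zero vector; return zeros by convention.
        return 0.0, 0.0, 0.0
    # Clamp to guard against rounding pushing the ratio just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, z / r))
    theta = math.acos(cos_theta)
    phi = math.atan2(y, x)
    return r, theta, phi
```

The resulting `(r, theta, phi)` triples would then replace the raw `(x, y, z)` readings as classifier input features.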

International Conference on Asian Digital Libraries | 2016

Rule-Based Page Segmentation for Palm Leaf Manuscript on Color Image

Papangkorn Inkeaw; Jakramate Bootkrajang; Phasit Charoenkwan; Sanparith Marukatat; Shinn-Ying Ho; Jeerayut Chaijaruwanich

Palm leaf manuscripts are an important source of history and ancient wisdom. A large number of manuscripts have already been digitized in the form of folio images. To extract useful information, optical character recognition (OCR) is often considered the first step towards text mining. Unfortunately, folio images contain multiple unsegmented palm leaf images, making them difficult to handle in the OCR process. This motivates us to propose a new page segmentation method for palm leaf manuscripts. The method consists of two main steps. The first is the detection of objects in folio images using the Connected Component Labeling method in a transformed L*a*b* color space. The second is rule-based classification of each object as either palm leaf or not palm leaf. Experiments performed on 20 publicly available palm leaf manuscripts comprising 384 folio images demonstrated that the proposed method effectively segmented folio images into separate palm leaf images, with 99.86 % precision and 96.67 % recall.

Simulated Evolution and Learning | 2010

An XML format for sharing evolutionary algorithm output and analysis

Dharani Punithan; Jerome Marhic; Kangil Kim; Jakramate Bootkrajang; Robert I. McKay; Naoki Mori

Analysis of artificial evolutionary systems uses post-processing to extract information from runs. Many effective methods have been developed, but format incompatibilities limit their adoption. We propose a solution combining XML and compression, which imposes modest overhead. We describe the steps to integrate our schema in existing systems and tools, demonstrating a realistic application. We measure the overhead relative to current methods, and discuss the extension of this approach into a community-wide standard representation.
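The combination of XML and compression could be sketched as below, using only the Python standard library. The element and attribute names (`run`, `generation`, `individual`, `fitness`) are hypothetical and are not the schema proposed in the paper, and gzip is assumed here only as one possible compression method.

```python
import gzip
import xml.etree.ElementTree as ET

def run_to_xml(run_id, generations):
    """Serialise per-generation fitness values to XML bytes.

    generations: list of lists of floats, one inner list per generation.
    """
    root = ET.Element("run", {"id": str(run_id)})
    for gen_index, fitnesses in enumerate(generations):
        gen = ET.SubElement(root, "generation", {"index": str(gen_index)})
        for f in fitnesses:
            ET.SubElement(gen, "individual", {"fitness": repr(f)})
    return ET.tostring(root, encoding="utf-8")

def compress(xml_bytes):
    """Compress the XML document with gzip."""
    return gzip.compress(xml_bytes)

def load(compressed):
    """Decompress and parse back into a list of per-generation fitness lists."""
    root = ET.fromstring(gzip.decompress(compressed))
    return [[float(ind.get("fitness")) for ind in gen] for gen in root]
```

A round trip such as `load(compress(run_to_xml(1, data)))` recovers the original fitness lists, and comparing `len(xml_bytes)` with `len(compress(xml_bytes))` gives a simple view of the storage overhead.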
