Topon Kumar Paul | Researchain

Publication

Featured research published by Topon Kumar Paul.

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2009

Prediction of Cancer Class with Majority Voting Genetic Programming Classifier Using Gene Expression Data

Topon Kumar Paul; Hitoshi Iba

To better understand different types of cancer and to find possible biomarkers for diseases, many researchers have recently been analyzing gene expression data with various machine learning techniques. However, because the number of training samples is very small compared to the huge number of genes, and because of class imbalance, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples to determine their labels by majority voting. By performing experiments on four different public cancer data sets, including multiclass data sets, we have found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancer studied in this paper.
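
The voting step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' MVGPC implementation: the three rules below are hypothetical hand-written stand-ins for GP-evolved rules, the function name majority_vote is an assumption, and ties are broken by the smallest label purely to keep the example deterministic.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A rule maps one sample (a vector of expression values) to a class label.
Rule = Callable[[Sequence[float]], int]


def majority_vote(rules: List[Rule], sample: Sequence[float]) -> int:
    """Return the label predicted by the largest number of rules."""
    votes = Counter(rule(sample) for rule in rules)
    top = max(votes.values())
    # Assumption: break ties by choosing the smallest label.
    return min(label for label, count in votes.items() if count == top)


# Hypothetical rules standing in for rules evolved by GP.
rules: List[Rule] = [
    lambda x: 1 if x[0] - x[1] > 0 else 0,
    lambda x: 1 if x[2] > 0.5 else 0,
    lambda x: 1 if x[0] * x[2] > 0.2 else 0,
]

label_a = majority_vote(rules, [0.9, 0.1, 0.8])  # all three rules vote 1
label_b = majority_vote(rules, [0.1, 0.9, 0.6])  # two of three rules vote 0
```

Because each rule only casts one vote, a single overfitted rule cannot decide the label on its own, which is the intuition behind combining several rules.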

Genetic and Evolutionary Computation Conference | 2006

Identification of weak motifs in multiple biological sequences using genetic algorithm

Topon Kumar Paul; Hitoshi Iba

Recognition of motifs in multiple unaligned sequences provides insight into protein structure and function. Discovering these motifs is very challenging because most of them occur in different sequences as different mutated forms of the original consensus motif and thus have weakly conserved regions. Different score metrics and algorithms have been proposed for motif recognition. In this paper, we propose a new genetic algorithm based method for identification of multiple motif instances in multiple biological sequences. The experimental results on simulated and real data show that our algorithm can identify multiple occurrences of a weak motif in single sequences as well as in multiple sequences. Moreover, it can identify weakly conserved regions more accurately than other genetic algorithm based motif discovery methods.

Congress on Evolutionary Computation | 2004

Selection of the most useful subset of genes for gene expression-based classification

Topon Kumar Paul; Hitoshi Iba

Recently, there has been growing interest in the classification of patient samples based on gene expression. The classification task is made more difficult by the noisy nature of the data and by the overwhelming number of genes relative to the number of available training samples in the data set. Moreover, many of these genes are irrelevant for classification and have a negative effect on the accuracy of the classifier and on its required learning time. We propose a new evolutionary computation method to select the most useful subset of genes for molecular classification. We apply this method to three benchmark data sets and present our unbiased experimental results.

Genetic and Evolutionary Computation Conference | 2003

Reinforcement learning estimation of distribution algorithm

Topon Kumar Paul; Hitoshi Iba

This paper proposes an algorithm for combinatorial optimization that uses reinforcement learning and estimation of the joint probability distribution of promising solutions to generate a new population of solutions. We call it Reinforcement Learning Estimation of Distribution Algorithm (RELEDA). To estimate the joint probability distribution, we treat each variable as univariate. We then update the probability of each variable by applying a reinforcement learning method. Though we consider the variables independent of one another, the proposed method can solve problems with highly correlated variables. To compare the efficiency of our proposed algorithm with that of other Estimation of Distribution Algorithms (EDAs), we provide experimental results on two problems: the four peaks problem and the bipolar function.
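
The general shape of a univariate estimation of distribution algorithm, as outlined in the abstract above, can be sketched as follows. The abstract does not give RELEDA's update rule, so this sketch swaps in a PBIL-style update (each probability is moved toward the frequency of ones among the best solutions). The function name univariate_eda, all parameter values, and the OneMax test problem (maximize the number of ones) are assumptions chosen only for illustration.

```python
import random
from typing import Callable, List, Tuple


def univariate_eda(
    fitness: Callable[[List[int]], float],
    n_bits: int,
    pop_size: int = 50,
    n_best: int = 10,
    alpha: float = 0.1,
    generations: int = 30,
    seed: int = 0,
) -> Tuple[List[int], float, List[float]]:
    """Maximize `fitness` over bit strings using one probability per bit."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits  # start with no preference for 0 or 1
    best: List[int] = []
    best_fit = float("-inf")
    for _ in range(generations):
        # Sample a population, treating every bit as independent.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=fitness, reverse=True)
        elite = population[:n_best]
        if fitness(elite[0]) > best_fit:
            best, best_fit = elite[0], fitness(elite[0])
        # PBIL-style update (an assumption, not RELEDA's rule):
        # move each probability toward the elite frequency of ones.
        for i in range(n_bits):
            freq = sum(ind[i] for ind in elite) / n_best
            probs[i] = (1 - alpha) * probs[i] + alpha * freq
    return best, best_fit, probs


# OneMax: the fitness of a bit string is simply its number of ones.
best, best_fit, probs = univariate_eda(sum, n_bits=20)
```

Since the update is a convex combination of the old probability and a frequency, every probability stays in the interval [0, 1] without clipping.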

Archive | 2009

Applied Genetic Programming and Machine Learning

Hitoshi Iba; Yoshihiko Hasegawa; Topon Kumar Paul

Reflecting rapidly developing concepts and newly emerging paradigms in intelligent machines, this text is the first to integrate genetic programming and machine learning techniques to solve diverse real-world tasks. These tasks include financial data prediction, day-trading rule development, and biomarker selection. Written by a leading authority, this text will teach readers how to use machine learning techniques, make learning operators that efficiently sample a search space, navigate the search process through the design of objective fitness functions, and examine the search performance of the evolutionary system. All source code and GUIs are available for download from the authors' website.

Genetic and Evolutionary Computation Conference | 2004

Identification of Informative Genes for Molecular Classification Using Probabilistic Model Building Genetic Algorithm

Topon Kumar Paul; Hitoshi Iba

DNA microarrays allow the expression levels of thousands of genes in an organism to be monitored and measured simultaneously. A systematic computational analysis of this vast amount of data provides understanding and insight into many aspects of biological processes. Recently, there has been growing interest in the classification of patient samples based on these gene expressions. The main challenge is the overwhelming number of genes relative to the number of available training samples in the data set; moreover, many of these genes are irrelevant for classification and have a negative effect on the accuracy of the classifier. The choice of genes affects several aspects of classification: accuracy, required learning time, cost, and the number of training samples needed. In this paper, we propose a new Probabilistic Model Building Genetic Algorithm (PMBGA) for the identification of informative genes for molecular classification and present our unbiased experimental results on three benchmark data sets.

IEEE International Conference on Evolutionary Computation | 2006

Classification of Gene Expression Data by Majority Voting Genetic Programming Classifier

Topon Kumar Paul; Yoshihiko Hasegawa; Hitoshi Iba

Recently, genetic programming (GP) has been applied to the classification of gene expression data. In a typical implementation, a single rule or a single set of rules is evolved with GP on the training data and then applied to the test data to obtain the generalized test accuracy. However, in most cases, the generalized test accuracy is not high. In this paper, we propose a majority voting technique for predicting the labels of test samples. Instead of a single rule or a single set of rules, we evolve multiple rules with GP and then apply those rules to test samples to determine their labels by majority voting. We demonstrate the effectiveness of our proposed method by performing different types of experiments on two microarray data sets.

International Conference on Intelligent Transportation Systems | 2014

Operation and Charging Scheduling of Electric Buses in a City Bus Route Network

Topon Kumar Paul; Hisashi Yamada

To promote widespread adoption of Electric Vehicles (EVs), government organizations in various countries are subsidizing the installation of charging infrastructure and encouraging the adoption of Electric Buses (EV Buses) for city transportation. Though EV Buses are environment-friendly, the buses as well as the charging facilities, including quick chargers and rechargeable Stationary Storage Batteries (SSBs), are expensive. Therefore, it is expected that in the near future some of the city buses on a route will be replaced with EV Buses, and that SSBs along with grid power will be used to charge these buses. In that situation, proper operation and charging scheduling of the EV Buses will be needed. In this paper, we propose a k-Greedy Algorithm-based approach for this task. We perform simulations using the real bus diagram data of four bus routes of a city bus route network in Japan and evaluate the algorithm from various perspectives. Our simulation results suggest that the k-Greedy Algorithm can maximize the travel distance of the EV Buses, which contributes to reducing fuel cost and CO2 emissions for a bus operator.

International Conference on Machine Learning and Applications | 2008

Prioritizing Health Promotion Plans with k-Bayesian Network Classifier

Ken Ueno; Toshio Hayashi; Koichiro Iwata; Nobuyoshi Honda; Youichi Kitahara; Topon Kumar Paul

Recently, Bayesian network classifiers (BNCs) have attracted many researchers because they can produce classification models with dependencies among attributes. From the application viewpoint, however, BNCs sometimes produce models too complicated to interpret easily. In this paper, we propose the k-Bayesian network classifier (k-BNC), a new method to reconstruct the attribute-dependency relationship from data for health promotion planning. From the health promotion viewpoint, it would be highly advantageous if occupational physicians could make effective plans for employees, and if employees could carry out the plans easily. Therefore, we focus on the attribute dependencies in classification models represented as a directed acyclic graph (DAG), and find the effective attributes by measuring the standardized Kullback-Leibler divergence from parent attributes to their children. In the experimental evaluation, we first compare the accuracy of k-BNC with that of Naive Bayes classifiers and of other well-known Bayesian networks and structure learning methods (the K2 algorithm, etc.) on some public datasets. We then show that our proposed k-BNC method successfully produces classification models for the prioritization of health promotion plans on our health checkup data.
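
For readers unfamiliar with the measure named in the abstract above, here is a minimal sketch of the plain Kullback-Leibler divergence between two discrete distributions, D(P || Q) = sum_i p_i * log(p_i / q_i). The abstract does not define its standardization, so that step is not reproduced here; the function name kl_divergence and the example distributions are assumptions for illustration only.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Plain (unstandardized) KL divergence D(P || Q) in nats."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # P puts mass where Q puts none
        total += pi * math.log(pi / qi)
    return total


same = kl_divergence([0.5, 0.5], [0.5, 0.5])    # identical distributions
skewed = kl_divergence([1.0, 0.0], [0.5, 0.5])  # equals log(2)
```

The divergence is not symmetric: swapping the arguments of the second example gives infinity, because the second distribution then places mass on an outcome the first one excludes.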

Simulated Evolution and Learning | 2008

Genetic Algorithm Based Methods for Identification of Health Risk Factors Aimed at Preventing Metabolic Syndrome

Topon Kumar Paul; Ken Ueno; Koichiro Iwata; Toshio Hayashi; Nobuyoshi Honda

In recent years, metabolic syndrome has emerged as a major health concern because it increases the risk of developing lifestyle diseases such as diabetes, hypertension, and cardiovascular disease. Some of the symptoms of metabolic syndrome are high blood pressure, decreased HDL cholesterol, and elevated triglycerides (TG). To prevent the development of metabolic syndrome, it is very important to accurately predict the future values of these health risk factors and to identify other factors in the health checkup and lifestyle data that are closely related to them. In this paper, we propose a new framework, based on the genetic algorithm and its variants, for identifying those important health factors and predicting the future health risk of a person with high accuracy. We show the effectiveness of the proposed system by applying it to the health checkup and lifestyle data of Toshiba Corporation.
