Sushmita Paul | Researchain

Publication

Featured research published by Sushmita Paul.

International Journal of Approximate Reasoning | 2011

Rough set based maximum relevance-maximum significance criterion and Gene selection from microarray data

Pradipta Maji; Sushmita Paul

Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a certain diagnostic test. In this regard, a new feature selection algorithm is presented based on rough set theory. It selects a set of genes from microarray data by maximizing the relevance and significance of the selected genes. A theoretical analysis is presented to justify the use of both relevance and significance criteria for selecting a reduced gene set with high predictive accuracy. The importance of rough set theory for computing both relevance and significance of the genes is also established. The performance of the proposed algorithm, along with a comparison with other related methods, is studied using the predictive accuracy of the K-nearest neighbor rule and support vector machine on five cancer and two arthritis microarray data sets. Among the seven data sets, the proposed algorithm attains 100% predictive accuracy for three cancer and two arthritis data sets, while the two existing rough set based algorithms attain this accuracy for only one cancer data set.
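The relevance and significance criteria named in this abstract can be illustrated with the standard rough set dependency degree. The following is a minimal sketch, not the authors' algorithm: it assumes gene expression values have already been discretized, takes the relevance of a gene to be the dependency of the class labels on that gene alone, and takes the significance of a candidate with respect to an already selected gene to be the gain in dependency when the two are used together. The function names, the greedy scoring rule, and the toy data are all hypothetical.

```python
from collections import defaultdict

def dependency(rows, attrs, labels):
    """Rough set dependency degree: the fraction of objects whose
    equivalence class under `attrs` contains a single class label."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    positive = sum(len(idx) for idx in classes.values()
                   if len({labels[i] for i in idx}) == 1)
    return positive / len(rows)

def select_genes(rows, labels, k):
    """Greedy selection (an assumed scoring rule): start with the most
    relevant gene, then repeatedly add the gene with the highest
    relevance plus average significance w.r.t. the selected genes."""
    remaining = set(range(len(rows[0])))
    relevance = {a: dependency(rows, [a], labels) for a in remaining}
    first = max(sorted(remaining), key=lambda a: relevance[a])
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < k:
        def score(a):
            gain = sum(dependency(rows, [s, a], labels)
                       - dependency(rows, [s], labels) for s in selected)
            return relevance[a] + gain / len(selected)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy discretized data: gene 2 duplicates gene 0, gene 1 complements it.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1), (2, 0, 2), (2, 1, 2)]
labels = ["a", "b", "b", "a", "c", "c"]
chosen = select_genes(rows, labels, 2)
```

In the toy data, gene 2 is as relevant as gene 0 but adds nothing once gene 0 is selected, whereas gene 1 has zero relevance alone yet makes every object distinguishable together with gene 0. This is the kind of case in which a significance term changes the selection.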

IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2013

Rough-Fuzzy Clustering for Grouping Functionally Similar Genes from Microarray Data

Pradipta Maji; Sushmita Paul

Gene expression data clustering is one of the important tasks of functional genomics, as it provides a powerful tool for studying functional relationships of genes in a biological process. Identifying coexpressed groups of genes represents the basic challenge in the gene clustering problem. In this regard, a gene clustering algorithm, termed robust rough-fuzzy c-means, is proposed, judiciously integrating the merits of rough sets and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in cluster definition, the integration of probabilistic and possibilistic memberships of fuzzy sets enables efficient handling of overlapping partitions in a noisy environment. The concept of possibilistic lower bound and probabilistic boundary of a cluster, introduced in robust rough-fuzzy c-means, enables efficient selection of gene clusters. An efficient method is proposed to select initial prototypes of different gene clusters, which enables the proposed c-means algorithm to converge to an optimum or near-optimum solution and helps to discover coexpressed gene clusters. The effectiveness of the algorithm, along with a comparison with other algorithms, is demonstrated both qualitatively and quantitatively on 14 yeast microarray data sets.

Systems, Man, and Cybernetics | 2010

Rough Sets for Selection of Molecular Descriptors to Predict Biological Activity of Molecules

Pradipta Maji; Sushmita Paul

Quantitative structure-activity relationship (QSAR) is one of the important disciplines of computer-aided drug design that deals with the predictive modeling of properties of a molecule. In general, each QSAR dataset is small in size with a large number of features or descriptors. Among the large number of descriptors present in the QSAR dataset, only a small fraction is effective for performing the predictive modeling task. In this paper, a new feature selection algorithm is presented, based on rough set theory, to select a set of effective molecular descriptors from a given QSAR dataset. The proposed algorithm selects the set of molecular descriptors by maximizing both relevance and significance of the descriptors. An important finding is that the proposed feature selection algorithm is effective in selecting relevant and significant molecular descriptors from the QSAR dataset for predictive modeling. The performance of the proposed algorithm is studied using the R² statistic of the support vector regression method. The effectiveness of the proposed algorithm, along with a comparison with existing algorithms, is demonstrated on three QSAR datasets.

Bioinformatics and Biomedicine | 2011

Microarray Time-Series Data Clustering Using Rough-Fuzzy C-Means Algorithm

Pradipta Maji; Sushmita Paul

Clustering is one of the important analyses in functional genomics that discovers groups of co-expressed genes from microarray data. In this paper, the application of the rough-fuzzy c-means (RFCM) algorithm is presented to discover co-expressed gene clusters. One of the major issues of RFCM based microarray data clustering is how to select initial prototypes of different clusters. To overcome this limitation, a method is proposed to select initial cluster centers. It enables the RFCM algorithm to converge to an optimum or near-optimum solution and helps to discover co-expressed gene clusters. A method is also introduced, based on Dunn's cluster validity index, to identify optimum values of different parameters of the initialization method and the RFCM algorithm. The effectiveness of the RFCM algorithm, along with a comparison with other related methods, is demonstrated on five yeast gene expression time-series data sets using the Silhouette index, the Davies-Bouldin index, and gene ontology based analysis.
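Dunn's cluster validity index, mentioned in this abstract as the basis for parameter tuning, can be sketched as follows. This is the classic form of the index (smallest distance between points of different clusters divided by the largest within-cluster diameter), and it is an assumption that this is the variant used in the paper. The function name `dunn_index` and the sample points are hypothetical; the sketch assumes Euclidean distance, at least two clusters, and at least one cluster with two or more distinct points.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Classic Dunn index: minimum between-cluster point distance
    divided by the maximum within-cluster diameter (higher is better)."""
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_diameter

# Two compact, well-separated clusters of 2-D points.
clusters = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
value = dunn_index(clusters)
```

A parameter search of the kind described would presumably run the clustering for each candidate parameter setting and keep the setting whose partition gives the largest index value.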

Fundamenta Informaticae | 2013

Robust Rough-Fuzzy C-Means Algorithm: Design and Applications in Coding and Non-coding RNA Expression Data Clustering

Pradipta Maji; Sushmita Paul

Cluster analysis is a technique that divides a given data set into a set of clusters in such a way that two objects from the same cluster are as similar as possible and the objects from different clusters are as dissimilar as possible. Against this background, different rough-fuzzy clustering algorithms have been shown to be successful for finding overlapping and vaguely defined clusters. However, the crisp lower approximation of a cluster in existing rough-fuzzy clustering algorithms is usually assumed to be spherical in shape, which restricts the discovery of clusters of arbitrary shape. In this regard, this paper presents a new rough-fuzzy clustering algorithm, termed robust rough-fuzzy c-means. Each cluster in the proposed clustering algorithm is represented by a set of three parameters, namely, a cluster prototype, a possibilistic fuzzy lower approximation, and a probabilistic fuzzy boundary. The possibilistic lower approximation helps in discovering clusters of various shapes. The cluster prototype depends on the weighted average of the possibilistic lower approximation and the probabilistic boundary. The proposed algorithm is robust in the sense that it can find overlapping and vaguely defined clusters with arbitrary shapes in a noisy environment. An efficient method is presented, based on Pearson's correlation coefficient, to select initial prototypes of different clusters. A method is also introduced, based on a cluster validity index, to identify optimum values of different parameters of the initialization method and the proposed clustering algorithm. The effectiveness of the proposed algorithm, along with a comparison with other clustering algorithms, is demonstrated on synthetic as well as coding and non-coding RNA expression data sets using some cluster validity indices.
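The statement that the prototype is a weighted average of the lower approximation and the boundary can be made concrete with a small sketch. This is an illustrative assumption rather than the paper's update rule: each region contributes a membership-weighted centroid, and the two centroids are blended with a weight `w`. The names `weighted_centroid` and `update_prototype`, the fuzzifier `m`, and the default values are hypothetical, and memberships are assumed to be strictly positive.

```python
def weighted_centroid(points, memberships, m=2.0):
    """Centroid of `points`, each weighted by its membership raised to m."""
    weights = [u ** m for u in memberships]
    total = sum(weights)
    dim = len(points[0])
    return [sum(wt * p[d] for wt, p in zip(weights, points)) / total
            for d in range(dim)]

def update_prototype(lower_pts, lower_mem, boundary_pts, boundary_mem,
                     w=0.9, m=2.0):
    """Blend the lower-approximation centroid (weight w) with the
    boundary centroid (weight 1 - w); fall back to whichever region
    is non-empty when the other has no objects."""
    if lower_pts and boundary_pts:
        core = weighted_centroid(lower_pts, lower_mem, m)
        edge = weighted_centroid(boundary_pts, boundary_mem, m)
        return [w * c + (1 - w) * e for c, e in zip(core, edge)]
    if lower_pts:
        return weighted_centroid(lower_pts, lower_mem, m)
    return weighted_centroid(boundary_pts, boundary_mem, m)

# Two core objects and one boundary object of a single cluster.
prototype = update_prototype(
    lower_pts=[(0.0, 0.0), (2.0, 0.0)], lower_mem=[1.0, 1.0],
    boundary_pts=[(5.0, 0.0)], boundary_mem=[0.5],
    w=0.75,
)
```

Choosing `w` close to 1 keeps the prototype near the objects that definitely belong to the cluster, so boundary objects and noise pull it only slightly.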

Information Sciences | 2017

RelSim: An integrated method to identify disease genes using gene expression profiles and PPIN based similarity measure

Pradipta Maji; Ekta Shah; Sushmita Paul

One of the important problems in functional genomics is how to select the disease genes. In this regard, the paper presents a new gene selection algorithm, termed RelSim, to identify disease genes. It judiciously integrates the information of gene expression profiles and protein-protein interaction networks. A new similarity measure, based on the information of protein-protein interaction networks, is introduced to compute the functional similarity between two genes efficiently. The proposed algorithm selects a set of genes as disease genes, considering both microarray and protein-protein interaction data, by maximizing the relevance and functional similarity of the selected genes. While gene expression profiles are used to identify differentially expressed genes, the protein-protein interaction networks help to compute the functional similarity among genes. The performance of the proposed algorithm, along with a comparison with other related methods, is demonstrated on several colon cancer data sets.

Scientific Reports | 2016

Model-based genotype-phenotype mapping used to investigate gene signatures of immune sensitivity and resistance in melanoma micrometastasis

Guido Santos; Xin Lai; Martin Eberhardt; Florian S. Dreyer; Sushmita Paul; Gerold Schuler; Julio Vera

In this paper, we combine kinetic modelling and patient gene expression data analysis to elucidate biological mechanisms by which melanoma becomes resistant to the immune system and to immunotherapy. To this end, we systematically perturbed the parameters in a kinetic model and performed a mathematical analysis of their impact, thereby obtaining signatures associated with the emergence of phenotypes of melanoma immune sensitivity and resistance. Our phenotypic signatures were compared with published clinical data on pretreatment tumor gene expression in patients subjected to immunotherapy against metastatic melanoma. To this end, the differentially expressed genes were annotated with standard gene ontology terms and aggregated into metagenes. Our method sheds light on putative mechanisms by which melanoma may develop immunoresistance. Precisely, our results and the clinical data point to the existence of a signature of intermediate expression levels for genes related to antigen presentation that constitutes an intriguing resistance mechanism, whereby micrometastases are able to minimize the combined anti-tumor activity of complementary responses mediated by cytotoxic T cells and natural killer cells, respectively. Finally, we computationally explored the efficacy of cytokines used as low-dose co-adjuvants for the therapeutic anticancer vaccine to overcome tumor immunoresistance.

BMC Bioinformatics | 2013

μHEM for identification of differentially expressed miRNAs using hypercuboid equivalence partition matrix

Sushmita Paul; Pradipta Maji

Background: The miRNAs, a class of short, approximately 22-nucleotide non-coding RNAs, often act post-transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in carrying out normal functioning of a cell, as they play an important role in various cellular processes. However, dysregulation of miRNAs is found to be a major cause of disease. It has been demonstrated that miRNA expression is altered in many human cancers, suggesting that they may play an important role as disease biomarkers. Multiple reports have also noted the utility of miRNAs for the diagnosis of cancer. Among the large number of miRNAs present in microarray data, a modest number might be sufficient to classify human cancers. Hence, the identification of differentially expressed miRNAs is an important problem, particularly for data sets with a large number of miRNAs and a small number of samples.

Results: In this regard, a new miRNA selection algorithm, called μHEM, is presented based on the rough hypercuboid approach. It selects a set of miRNAs from microarray data by maximizing both relevance and significance of the selected miRNAs. The degree of dependency of sample categories on miRNAs is defined, based on the concept of the hypercuboid equivalence partition matrix, to measure both relevance and significance of miRNAs. The effectiveness of the new approach is demonstrated on six publicly available miRNA expression data sets using support vector machine. The .632+ bootstrap error estimate is used to minimize the variability and bias of the derived results.

Conclusions: An important finding is that the μHEM algorithm achieves the lowest B.632+ error rate of support vector machine with a reduced set of differentially expressed miRNAs on four expression data sets compared to some existing machine learning and statistical methods, while for the other two data sets, the error rate of the μHEM algorithm is comparable with the existing techniques. The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on the miRNA selection problem. The method is a potentially useful tool for exploration of miRNA expression data and identification of differentially expressed miRNAs worth further investigation.
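For readers unfamiliar with the .632+ bootstrap error estimate referred to above, the following sketch shows how the final estimate is commonly assembled from three quantities, following the usual Efron-Tibshirani formulation. It is not taken from the paper: the function name `b632_plus` and its parameter names are hypothetical, and the three inputs (resubstitution error, average out-of-bag bootstrap error, and no-information error rate) are assumed to have been computed elsewhere.

```python
def b632_plus(resub_err, oob_err, no_info_rate):
    """Combine resubstitution and out-of-bag errors into the .632+ estimate.

    resub_err    -- error of the classifier on its own training data
    oob_err      -- average error on samples left out of each bootstrap draw
    no_info_rate -- expected error if features and labels were independent
    """
    # Cap the out-of-bag error at the no-information rate.
    oob = min(oob_err, no_info_rate)
    # Relative overfitting rate, 0 (no overfitting) to 1 (severe).
    if oob > resub_err and no_info_rate > resub_err:
        r = (oob - resub_err) / (no_info_rate - resub_err)
    else:
        r = 0.0
    # The weight grows from 0.632 toward 1 as overfitting increases.
    w = 0.632 / (1.0 - 0.368 * r)
    return (1.0 - w) * resub_err + w * oob

estimate = b632_plus(resub_err=0.1, oob_err=0.3, no_info_rate=0.5)
```

With no overfitting the weight stays at 0.632, which gives the ordinary .632 estimate. When the out-of-bag error reaches the no-information rate the weight becomes 1, so the optimistic resubstitution error is ignored entirely.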

International Conference on Multimedia Computing and Systems | 2010

Rough set based gene selection algorithm for microarray sample classification

Sushmita Paul; Pradipta Maji

Gene selection from microarray data is an important issue for gene expression based classification and for carrying out a diagnostic test. In this regard, a rough set based gene selection algorithm is presented. It selects the set of genes by maximizing the relevance and significance of the genes, which are calculated based on the theory of rough sets. Using the predictive accuracy of the K-nearest neighbor rule and support vector machine, the performance of the proposed algorithm, along with a comparison with other related methods, is studied on five cancer and two arthritis microarray data sets. The proposed gene selection algorithm achieves promising performance, selecting relevant and significant genes from microarray data sets in a reasonable time.

Natural Computing | 2016

Gene expression and protein-protein interaction data for identification of colon cancer related genes using f-information measures

Sushmita Paul; Pradipta Maji

One of the most important and challenging problems in functional genomics is how to select the disease genes. In this regard, the paper presents a new computational method to identify disease genes. It judiciously integrates the information of gene expression profiles and shortest path analysis of protein–protein interaction networks. While the
