Publication


Featured research published by Shib Sankar Bhowmick.


Journal of Chemical Information and Modeling | 2015

Binding Activity Prediction of Cyclin-Dependent Inhibitors.

Indrajit Saha; Benedykt Rak; Shib Sankar Bhowmick; Ujjwal Maulik; Debotosh Bhattacharjee; Uwe Koch; Michal Lazniewski; Dariusz Plewczynski

The Cyclin-Dependent Kinases (CDKs) are the core components coordinating the eukaryotic cell division cycle. Generally, the crystal structures of CDKs provide information on possible molecular mechanisms of ligand binding. However, reliable and robust estimation of ligand binding activity has been a challenging task in drug design. In this regard, various machine learning techniques, such as Support Vector Machine, Naive Bayesian classifier, Decision Tree, and K-Nearest Neighbor classifier, have been used. The performance of these heterogeneous classification techniques depends on proper selection of features from the data set. This fact motivated us to propose an integrated classification technique using a Genetic Algorithm (GA), a Rotational Feature Selection (RFS) scheme, and an Ensemble of Machine Learning methods, named the Genetic Algorithm integrated Rotational Ensemble based classification technique, for the prediction of ligand binding activity of CDKs. This technique can automatically find the important features and the ensemble size. For this purpose, the GA encodes the features and the ensemble size in a chromosome as a binary string. The encoded features are then used to create diverse sets of training points using RFS in order to train the machine learning method multiple times. The RFS scheme relies on Principal Component Analysis (PCA) to preserve the variability information of the rotational nonoverlapping subsets of the original data. Thereafter, the testing points are fed to the different instances of the trained machine learning method in order to produce the ensemble result. Here, accuracy is computed as the final result after 10-fold cross validation, and it is also used as the objective function for the GA to maximize. The effectiveness of the proposed classification technique has been demonstrated quantitatively and visually in comparison with different machine learning methods for 16 ligand binding CDK docking and rescoring data sets. In addition, the best possible features have been reported for the CDK docking and rescoring data sets separately. Finally, the Friedman test has been conducted to judge the statistical significance of the results produced by the proposed technique. The results indicate that the integrated classification technique has high relevance in predicting protein-ligand binding activity.
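The binary chromosome described in this abstract can be illustrated with a small decoding sketch. The abstract does not give the bit layout, so the one used here is an assumption: the first `n_features` bits form a feature mask and the remaining bits encode the ensemble size as a binary number (plus one, so the ensemble is never empty). The name `decode_chromosome` is hypothetical.

```python
from typing import List, Tuple


def decode_chromosome(bits: str, n_features: int) -> Tuple[List[int], int]:
    """Split a binary chromosome into selected feature indices and an ensemble size.

    Assumed layout (not taken from the paper): feature mask first,
    then the ensemble size encoded in binary.
    """
    if len(bits) <= n_features or set(bits) - {"0", "1"}:
        raise ValueError("chromosome must be a binary string longer than n_features")
    feature_bits, size_bits = bits[:n_features], bits[n_features:]
    # A '1' at position i means feature i is selected.
    selected = [i for i, bit in enumerate(feature_bits) if bit == "1"]
    # Add 1 so that an all-zero size field still yields one ensemble member.
    ensemble_size = int(size_bits, 2) + 1
    return selected, ensemble_size


# Seven candidate features followed by a 4-bit ensemble-size field.
selected, ensemble_size = decode_chromosome("1011001" + "0101", n_features=7)
print(selected, ensemble_size)
```

In a full pipeline, the GA would score each decoded chromosome by the cross-validated accuracy of the resulting ensemble; that evaluation step is omitted here.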


Pattern Recognition and Machine Intelligence | 2015

MaER: A New Ensemble Based Multiclass Classifier for Binding Activity Prediction of HLA Class II Proteins

Giovanni Mazzocco; Shib Sankar Bhowmick; Indrajit Saha; Ujjwal Maulik; Debotosh Bhattacharjee; Dariusz Plewczynski

Human Leukocyte Antigen class II (HLA II) proteins are crucial for the activation of the adaptive immune response. A high rate of polymorphism has been observed in HLA class II molecules. Hence, the accurate prediction of HLA II-peptide interactions is a challenging task that can both improve the understanding of immunological processes and facilitate decision-making in vaccine design. In this regard, various computational tools have been developed during the last decade, which mainly focused on predicting the binding activity of different HLA II isotypes (such as DP, DQ and DR) separately. This fact motivated us to make a humble contribution towards the prediction of isotype binding propensity as a multiclass classification task. To this end, we have analysed a binding affinity dataset, which contains the interactions of 27 HLA II proteins with 636 variable length peptides, in order to prepare new multiclass datasets for strong and weak binding peptides. Thereafter, a new ensemble based multiclass classifier, called Meta EnsembleR (MaER), is proposed to predict the activity of weak/unknown binding peptides by integrating the results of various heterogeneous classifiers. It pre-processes the training and testing datasets by building feature subsets and bootstrap samples and then creating diverse datasets using principal component analysis, which are then used to train and test MaER. The performance of MaER with respect to other existing state-of-the-art classifiers has been estimated using validity measures, ROC curves and gain value analysis. Finally, a statistical test, the Friedman test, has been conducted to judge the statistical significance of the results produced by MaER.
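Several of these abstracts rely on the Friedman test to compare classifiers across data sets. The following minimal sketch computes the standard Friedman chi-square statistic, 12N / (k(k+1)) * (sum of squared mean ranks - k(k+1)^2 / 4), for N data sets and k classifiers. The accuracy table is invented for illustration and is not taken from any of the papers; no tie correction is applied, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank the scores of one data set; rank 1 is the highest score, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Return the Friedman chi-square statistic and the mean rank of each classifier."""
    n, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(row[j] for row in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
    return chi2, mean_ranks


# Hypothetical accuracies: rows are data sets, columns are three classifiers.
accuracies = [
    [0.91, 0.85, 0.80],
    [0.88, 0.84, 0.79],
    [0.93, 0.90, 0.86],
    [0.87, 0.83, 0.81],
]
chi2, mean_ranks = friedman_statistic(accuracies)
print(chi2, mean_ranks)
```

The statistic is then compared against a chi-square distribution with k - 1 degrees of freedom; a large value suggests that the classifiers do not all perform equally.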


Biomedical Engineering Systems and Technologies | 2014

Application of RotaSVM for HLA Class II Protein-Peptide Interaction Prediction

Shib Sankar Bhowmick; Indrajit Saha; Giovanni Mazzocco; Ujjwal Maulik; Luís Rato; Debotosh Bhattacharjee; Dariusz Plewczynski

In this article, the recently developed RotaSVM is used for accurate prediction of peptides binding to Human Leukocyte Antigen class II (HLA class II) proteins. The HLA II-peptide complexes are generated in the antigen presenting cells (APC) and transported to the cell membrane to elicit an immune response via T-cell activation. Understanding the HLA class II protein-peptide binding interaction facilitates the design of peptide-based vaccines, where the high rate of polymorphism in HLA class II molecules poses a major challenge. To determine the binding activity of 636 non-redundant peptides, a set of 27 HLA class II proteins is considered in the present study. The prediction of HLA class II-peptide binding is carried out by an ensemble classifier called RotaSVM. In RotaSVM, the feature selection scheme generates bootstrap samples that are further used to create a diverse set of features using Principal Component Analysis. Thereafter, Support Vector Machines are trained with these bootstrap samples with the integration of their original feature values. The effectiveness of RotaSVM for HLA class II protein-peptide binding prediction is demonstrated in comparison with other traditional classifiers by evaluating several validity measures together with ROC curve plots. Finally, the Friedman test is conducted to judge the statistical significance of RotaSVM in predicting peptides that bind to HLA class II proteins.


International Conference on Recent Advances in Information Technology | 2016

Identification of miRNA signature using Next-Generation Sequencing data of prostate cancer

Shib Sankar Bhowmick; Indrajit Saha; Ujjwal Maulik; Debotosh Bhattacharjee

MicroRNAs (miRNAs) are a class of ~22-nucleotide endogenous noncoding RNAs that have critical functions across various biological processes. It is well known that miRNAs play a crucial role in regulating target gene expression by repressing translation or promoting messenger RNA degradation. Therefore, identification of discriminative and differentially expressed miRNAs as a signature is an important task for cancer therapy. In this regard, Next-Generation Sequencing (NGS) data of miRNAs, available at The Cancer Genome Atlas (TCGA) repository, is analyzed here for prostate cancer. This cancer type is a serious threat to the health of men, as reported in the literature. Hence, finding a miRNA signature using NGS based miRNA expression data for prostate cancer is an important research direction. Motivated by this fact, a new miRNA signature identification method for prostate cancer is proposed. The proposed method uses a global optimization technique called Simulated Annealing (SA), Principal Component Analysis (PCA) and a Support Vector Machine (SVM) classifier. Here SA encodes L features, in this case miRNAs. The same number, L, of top principal components of the original dataset is extracted using PCA. Thereafter, these components are multiplied with the reduced subset of data so that the classification task can be done on a diverse dataset using SVM. The classification accuracy of SVM is the underlying objective that SA optimizes. The proposed method can be seen as a feature selection technique for finding a potential miRNA signature. Finally, the experimental results provide a set of miRNAs with optimal classification accuracy. However, because the algorithm is stochastic, a list of miRNAs is prepared. From the top 15 miRNAs of that list, four miRNAs, hsa-mir-152, hsa-mir-23a, hsa-mir-302f and hsa-mir-101-1, are associated with prostate cancer. Moreover, the performance of the proposed method has also been compared with other widely used state-of-the-art techniques. Furthermore, the obtained results have been justified by means of a statistical test along with biological significance tests for the selected miRNAs.
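The search loop in this abstract can be sketched as simulated annealing over subsets of L features. This is a generic illustration under stated assumptions, not the authors' implementation: the PCA step is omitted, the neighbourhood move (swap one selected feature for an unselected one) and the geometric cooling schedule are choices made here, and a toy scoring function stands in for the SVM classification accuracy. All names are hypothetical.

```python
import math
import random
from typing import Callable, FrozenSet, Tuple


def anneal_subset(
    n_features: int,
    subset_size: int,
    score: Callable[[FrozenSet[int]], float],
    steps: int = 2000,
    t_start: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Search for a high-scoring subset of fixed size (requires subset_size < n_features)."""
    rng = random.Random(seed)
    current = set(rng.sample(range(n_features), subset_size))
    current_score = score(frozenset(current))
    best, best_score = frozenset(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour: swap one selected feature for one unselected feature.
        removed = rng.choice(sorted(current))
        added = rng.choice([f for f in range(n_features) if f not in current])
        candidate = (current - {removed}) | {added}
        candidate_score = score(frozenset(candidate))
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = frozenset(current), current_score
    return best, best_score


# Toy objective standing in for SVM accuracy: fraction of "informative" features selected.
INFORMATIVE = frozenset({1, 4, 7})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & INFORMATIVE) / len(INFORMATIVE)


best, best_score = anneal_subset(n_features=10, subset_size=3, score=toy_score, steps=500)
print(sorted(best), best_score)
```

Because the search is stochastic, different seeds can return different subsets, which is consistent with the abstract's remark that a list of miRNAs is compiled rather than a single answer.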


Advances in Computing and Communications | 2016

Biomarker identification using next generation sequencing data of RNA

Shib Sankar Bhowmick; Indrajit Saha; Ujjwal Maulik; Debotosh Bhattacharjee

Over the years, numerous studies have been performed in order to identify messenger RNAs (mRNAs) that are differentially expressed under different biological conditions for various diseases, including cancer. In this regard, obtaining complete and noise-free data was always very challenging with earlier technologies, whereas the inception of Next-Generation Sequencing (NGS) technology has revolutionized genome research, especially in the field of mRNA expression profile analysis. Here, such data for breast cancer from The Cancer Genome Atlas (TCGA) are used to identify cancer biomarkers. For this purpose, the data have been preprocessed using a statistical test and fold change concepts so that a significant number of differentially expressed up- and down-regulated mRNAs can be recognized. Thereafter, a wrapper based feature selection approach using Particle Swarm Optimization (PSO) and Support Vector Machine (SVM) has been applied to the preprocessed dataset to identify potential mRNAs as biomarkers. The top 10 identified biomarkers are COMP, LRRC15, CTHRC1, CILP2, FOXF1, FIGF, PRDM16, LMX1B, IRX5 and LEPREL1. The quantitative results of the proposed method have been demonstrated in comparison with other state-of-the-art methods. Finally, enrichment analysis and KEGG pathway analysis have also been conducted for the selected mRNAs.
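The fold change part of the preprocessing can be illustrated with a short sketch. The abstract does not state the thresholds, the pseudocount, or the statistical test that was used, so the values below are assumptions and the statistical test is left out. The gene names and expression values are made up.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def log2_fold_change(tumor: List[float], normal: List[float], pseudocount: float = 1.0) -> float:
    """Log2 ratio of mean tumor expression to mean normal expression."""
    # The pseudocount (an assumption here) avoids division by zero for unexpressed genes.
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def label_regulation(
    expression: Dict[str, Tuple[List[float], List[float]]], threshold: float = 1.0
) -> Dict[str, str]:
    """Label each gene as up-regulated, down-regulated, or unchanged in tumor samples."""
    labels = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        if lfc >= threshold:
            labels[gene] = "up"
        elif lfc <= -threshold:
            labels[gene] = "down"
        else:
            labels[gene] = "unchanged"
    return labels


# Hypothetical expression values: gene -> (tumor samples, normal samples).
toy_expression = {
    "GENE_A": ([30, 34, 29], [7, 7, 7]),
    "GENE_B": ([3, 3, 3], [15, 15, 15]),
    "GENE_C": ([10, 11, 9], [10, 10, 10]),
}
labels = label_regulation(toy_expression)
print(labels)
```

Genes labelled "up" or "down" would then be passed to the wrapper based feature selection stage, typically after also passing a significance test.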


Swarm Evolutionary and Memetic Computing | 2015

Analysis of Next-Generation Sequencing Data of miRNA for the Prediction of Breast Cancer

Indrajit Saha; Shib Sankar Bhowmick; Filippo Geraci; Marco Pellegrini; Debotosh Bhattacharjee; Ujjwal Maulik; Dariusz Plewczynski

Recently, Next-Generation Sequencing (NGS) has emerged as a revolutionary technique in the fields of ‘-omics’ research. The Cancer Genome Atlas (TCGA) is a great example, where a massive amount of sequencing data is available for miRNA and mRNA. Analysing these data could yield potential biological insights. Moreover, developing a prognostic system based on this newly available sequencing data would greatly help cancer diagnosis. Hence, in this article, we have made an attempt to analyse such sequencing data of miRNA for accurate prediction of breast cancer. miRNAs are small non-coding RNAs that have been shown to participate in several carcinogenic processes, acting either as tumor suppressors or as oncogenes. This is the reason the clinical treatment of breast cancer patients has changed in recent years. Thus, it is interesting to understand the role of miRNAs in the prediction of breast cancer. In this regard, we have developed a technique using the Gravitational Search Algorithm, which optimizes the underlying classification performance of a Support Vector Machine. The proposed technique is able to select the potential features, in this case miRNAs, in order to achieve better prediction accuracy. In this study, we have achieved a classification accuracy of up to 95.29% by automatically selecting approximately 1.5% of the miRNAs in the whole dataset. Thereafter, a ranked list of miRNAs is created. Among the top 15 miRNAs in the list, 6 are associated with breast cancer, 5 are associated with other cancer types, and 4 are unknown miRNAs. The performance of the proposed technique is compared with seven other state-of-the-art techniques. Finally, the results have been justified by means of a statistical test along with biological significance analysis of the selected miRNAs.


Congress on Evolutionary Computation | 2016

A new evolutionary microRNA marker selection using next-generation sequencing data

Adrian Lancucki; Indrajit Saha; Shib Sankar Bhowmick; Ujjwal Maulik; Piotr Lipinski

Next-generation sequencing allows high-throughput measurements of non-coding RNA expression levels in tissues. Analysis of microRNAs (miRNAs) is particularly effective in differentiating cancerous tissue samples based on patterns of their expression levels. The paper presents a wrapper feature selection approach based on t-Distributed Stochastic Neighbor Embedding (t-SNE), Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Support Vector Machine (SVM). The advantage of t-SNE is the amplification of pairwise similarities by means of a Student's t neighborhood function. The attributes are embedded into 1-D space to reveal similarities between the features. This information is used by CMA-ES through real-valued encoding in order to model pairwise relations between miRNAs with covariance matrices. Finally, the wrapper uses SVM to evaluate the objective, which expresses the tradeoff between classification quality and the desired number of features. The approach is tested on eight different cancer types from The Cancer Genome Atlas. For each cancer type, it finds small sets of miRNAs that differentiate a single tumor class from the normal one with high certainty.


Archive | 2013

RotaSVM: A New Ensemble Classifier

Shib Sankar Bhowmick; Indrajit Saha; Luís Rato; Debotosh Bhattacharjee

In this paper, an ensemble classifier, namely RotaSVM, is proposed that uses a recently developed rotational feature selection approach and the Support Vector Machine classifier cohesively. RotaSVM generates a predefined number of Support Vector Machine outputs. For each Support Vector Machine, the training data is generated by splitting the feature set randomly into S subsets. Subsequently, principal component analysis is applied to each subset to create new feature sets, and all the principal components are retained to preserve the variability information in the training data. Thereafter, these features are used to train a Support Vector Machine. During the testing phase of RotaSVM, the rotation specific Support Vector Machines are first applied to the test data, and then the average posterior probability is computed to classify the sample data. The effectiveness of RotaSVM is demonstrated quantitatively by comparing it with other widely used ensemble based classifiers such as Bagging, AdaBoost, MultiBoost and Rotation Forest on 10 real-life data sets. Finally, a statistical test has been conducted to establish the superiority of the results produced by the proposed RotaSVM.
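The rotation and averaging steps described in this abstract can be sketched with NumPy as follows. This is a minimal illustration under assumptions, not the published implementation: PCA is computed through a singular value decomposition of the centered feature block, the SVM training itself is left out, and the function names and toy data are hypothetical.

```python
import numpy as np


def rotation_matrix(X: np.ndarray, n_subsets: int, rng: np.random.Generator) -> np.ndarray:
    """Build one block rotation matrix from PCA on random, disjoint feature subsets."""
    n_features = X.shape[1]
    # Randomly partition the feature indices into n_subsets disjoint groups.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        block = X[:, idx] - X[:, idx].mean(axis=0)
        # Rows of vt are the principal axes; full_matrices=True retains all of them.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        R[np.ix_(idx, idx)] = vt.T
    return R


def average_posteriors(member_probs: list) -> np.ndarray:
    """Average class probabilities over ensemble members and pick the most probable class."""
    return np.mean(np.stack(member_probs), axis=0).argmax(axis=1)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))  # toy data: 20 samples, 6 features
R = rotation_matrix(X, n_subsets=3, rng=rng)
X_rotated = X @ R  # each ensemble member would train on its own rotated copy

# Hypothetical posterior probabilities from two members for two test samples.
labels = average_posteriors([
    np.array([[0.6, 0.4], [0.2, 0.8]]),
    np.array([[0.8, 0.2], [0.6, 0.4]]),
])
print(X_rotated.shape, labels)
```

Because every principal component is kept, each rotation matrix is orthogonal, so the rotated data preserves the variability of the original features while giving each classifier a different view of them.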


PLOS ONE | 2018

Genome-wide analysis of NGS data to compile cancer-specific panels of miRNA biomarkers

Shib Sankar Bhowmick; Indrajit Saha; Debotosh Bhattacharjee; Loredana M. Genovese; Filippo Geraci

MicroRNAs are small non-coding RNAs that influence gene expression by binding to the 3’ UTR of target mRNAs in order to repress protein synthesis. Soon after their discovery, microRNA dysregulation was associated with several pathologies. In particular, microRNAs have often been reported as differentially expressed in healthy and tumor samples. This fact suggested that microRNAs are likely to be good candidate biomarkers for cancer diagnosis and personalized medicine. With the advent of Next-Generation Sequencing (NGS), measuring the expression level of the whole miRNAome at once is now routine. Moreover, the collaborative effort of sharing data opens up the possibility of population analyses. This context motivated us to perform an in-silico study to distill cancer-specific panels of microRNAs that can serve as biomarkers. We observed that the problem of finding biomarkers can be modeled as a two-class classification task where, given the miRNAomes of a population of healthy and cancerous samples, we want to find the subset of microRNAs that leads to the highest classification accuracy. We accomplish this task by leveraging a sensible combination of data mining tools. In particular, we used differential evolution for candidate selection, component analysis to preserve the relationships among miRNAs, and SVM for sample classification. We identified 10 cancer-specific panels whose classification accuracy is always higher than 92%. These panels have very little overlap, suggesting that miRNAs are not only predictive of the onset of cancer, but can be used for classification purposes as well. We experimentally validated the contribution of each of the employed tools to the selection of discriminating miRNAs. Moreover, we tested the significance of each panel for the corresponding cancer type. In particular, enrichment analysis showed that the selected miRNAs are involved in oncogenesis pathways, while survival analysis proved that miRNAs can be used to evaluate cancer severity. In summary, the results demonstrate that our method is able to produce cancer-specific panels that are promising candidates for subsequent in vitro validation.


Archive | 2018

Finding the Association of mRNA and miRNA Using Next Generation Sequencing Data of Kidney Renal Cell Carcinoma

Shib Sankar Bhowmick; Luís Rato; Debotosh Bhattacharjee

MicroRNAs (miRNAs) are a class of 22-nucleotide endogenous noncoding RNAs that play an important role in regulating target gene expression by repressing translation or promoting messenger RNA (mRNA) degradation. Numerous researchers have found that miRNAs have serious effects on cancer. Therefore, studying mRNAs and miRNAs together through the integrated analysis of mRNA and miRNA expression profiles could provide deeper insight into cancer research. In this regard, high-throughput sequencing data of kidney renal cell carcinoma is used here. The proposed method focuses on identifying mRNA-miRNA pairs that have a signature in kidney tumor samples. For this analysis, random forests, particle swarm optimization, and a support vector machine classifier are used to obtain the best sets of mRNA-miRNA pairs. Additionally, the significance of the selected mRNA-miRNA pairs is tested using gene ontology and pathway analysis tools. Moreover, the selected mRNA-miRNA pairs are searched based on changes in expression values in the mRNA and miRNA datasets used.

Collaboration


Dive into Shib Sankar Bhowmick's collaborations.

Top Co-Authors

Filippo Geraci

National Research Council
