Sujay Saha
Heritage Institute of Technology
Publication
Featured research published by Sujay Saha.
Proceedings of the International Conference on Advances in Computer Science and Electronics Engineering | 2012
Sumit Chakraborty; Sujay Saha; Kashinath Dey
DNA microarray technology, which is used in molecular biology, allows the observation of expression levels of thousands of genes under a variety of conditions. The analysis of microarray data has been successfully applied in a number of studies over a broad range of biological disciplines. Unfortunately, many microarray experiments generate data sets containing missing values. Since most algorithms for gene expression analysis require a complete gene array as input, the missing values need to be estimated. Existing methods for estimating missing values include KNNimpute, SVDimpute, LLSimpute, and LLS-SVDimpute. In this paper we present a new fuzzy technique, Fuzzy Difference Vector Impute (FDVimpute), for estimating missing values in a DNA microarray.
ubiquitous computing | 2016
Dibyendu Bikash Seal; Sujay Saha; Prokriti Mukherjee; Mayukh Chatterjee; Aradhita Mukherjee; Kashi Nath Dey
Cancer involves abnormal cell growth and has the potential to spread to other parts of the body. Today, technology provides many methods to study the expression patterns of thousands of cancer genes simultaneously. Microarray gene expression data often comprise a huge number of genes and a very small number of samples or observations. Our task is to identify those genes that are most significant in the expression of a particular disease, in this case cancer. To achieve that, it is useful to rank the genes. In this article, we propose a novel method for ranking genes using Relative Entropy and Decision Trees. Relative Entropy is used to reduce the dimensionality of the microarray dataset and rank the genes. The final reduced set of genes is then used for classification using decision trees with 10-fold cross-validation. The proposed method has been applied to eight benchmark datasets, and results show that it can reach 70-100% classification accuracy with very few dominant genes.
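As a rough illustration of entropy-based gene ranking, the sketch below scores each gene by the symmetric relative entropy (Kullback-Leibler divergence) between its expression values in two classes, modelling each class as a univariate Gaussian. The Gaussian assumption, the symmetric form, and the names `gaussian_kl`, `relative_entropy_score` and `rank_genes` are illustrative choices, not details taken from the paper.

```python
import math
from statistics import mean, pvariance


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for two univariate Gaussians P and Q."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def relative_entropy_score(class_a, class_b, eps=1e-9):
    """Symmetric relative entropy between two groups of expression values.

    eps keeps the variances strictly positive (an assumption of this sketch).
    """
    mu_a, mu_b = mean(class_a), mean(class_b)
    var_a = pvariance(class_a) + eps
    var_b = pvariance(class_b) + eps
    return (gaussian_kl(mu_a, var_a, mu_b, var_b)
            + gaussian_kl(mu_b, var_b, mu_a, var_a))


def rank_genes(expression, labels):
    """Return gene names sorted from most to least discriminative.

    expression: dict mapping gene name -> list of values, one per sample
    labels:     list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = relative_entropy_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)
```

In a pipeline like the one described, the top-ranked genes from such a scoring step would then be passed to a classifier evaluated with cross-validation.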
advances in computing and communications | 2016
Sujay Saha; Saikat Bandopadhyay; Anupam Ghosh; Kashi Nath Dey
DNA microarray experiments normally generate gene expression profiles in the form of high-dimensional matrices. These gene expression data may contain many missing values for several reasons, such as image disruption, hybridization error, dust, or moderate resolution. These missing values may significantly affect the performance of subsequent statistical and machine learning experiments. Various missing value estimation algorithms exist. In this work we propose a modification to an existing imputation approach, Collaborative Filtering Based on Rough-Set Theory (CFBRST) [10]. The proposed approach (CFBRSTFDV) uses a Fuzzy Difference Vector (FDV) along with rough-set-based collaborative filtering, which analyzes historical interactions and helps to estimate the missing values. It is a suggestion-based system that works on the same principle by which items or products are suggested to an individual using Facebook or Twitter, or looking for books on Amazon. We have applied the proposed algorithm to two benchmark datasets, SPELLMAN and Tumor Cell (GDS2932), and the experiments show that the modified approach, CFBRSTFDV, outperforms other existing state-of-the-art methods in terms of RMSE, particularly as the number of missing values increases.
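The RMSE comparison mentioned above can be sketched as follows: hide some known cells, impute them, and measure the root mean squared error over exactly those cells. This is a minimal sketch; the function name `imputation_rmse` and the list-of-lists matrix layout are assumptions, and the paper may use a normalized variant of RMSE.

```python
import math


def imputation_rmse(original, imputed, missing_positions):
    """RMSE over the cells that were hidden and then imputed.

    original, imputed:  matrices as lists of rows (genes x samples)
    missing_positions:  list of (row, column) pairs that were masked
    """
    if not missing_positions:
        raise ValueError("no missing positions to evaluate")
    squared_errors = [
        (original[i][j] - imputed[i][j]) ** 2 for i, j in missing_positions
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower value means the imputed entries are closer to the true expression values.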
International Journal of Bioinformatics Research and Applications | 2016
Sujay Saha; Dibyendu Bikash Seal; Anupam Ghosh; Kashi Nath Dey
Over the last few decades, a large amount of research has been carried out on genomic data. Cancer makes cells in specific tissues of the body undergo uncontrolled division, which results in malignant growth or a tumour. Today, DNA microarray technologies allow us to simultaneously monitor the expression patterns of thousands of genes. Microarray gene expression data are characterised by very high dimensionality (genes) and a relatively small number of samples (observations). To identify, among these thousands of gene expressions, the genes responsible for a disease like cancer, it is useful to rank the genes. In this paper, we propose a novel gene ranking method based on the Wilcoxon Rank Sum Test (WRST) and a genetic algorithm. WRST is used to reduce dimensionality, and the genetic algorithm to find the differentially expressed genes. The final subset of genes is cross-validated using the k-fold LOOCV method (k varied for different datasets) and thereafter used for classification using an SVM with a linear kernel. The proposed method is first applied to two relatively new benchmark datasets, the GDS4382 colorectal cancer dataset and the GDS4794 small cell lung cancer dataset, and the results show that it can reach up to 100% classification accuracy with very few dominant genes, which indirectly validates the biological and statistical significance of the proposed method. It is then applied to five real-life datasets, and the results are compared with a recent state-of-the-art approach on the basis of accuracy, sensitivity, specificity, and other measures.
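To illustrate how a rank-sum test can act as a dimensionality-reduction filter, the sketch below computes the Wilcoxon rank-sum statistic for one gene and converts it to a z-score with the usual normal approximation. It is a simplified sketch: the function names are hypothetical, the variance is not corrected for ties, and the filtering threshold a real pipeline would apply is not specified in the abstract.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """z-score of the rank sum of group_a (normal approximation, no tie correction)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    w = sum(ranks[:n_a])
    mean_w = n_a * (n_a + n_b + 1) / 2.0
    sd_w = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    return (w - mean_w) / sd_w
```

Genes could then be ordered by the absolute value of this z-score, keeping only the most differentially expressed ones for the later search step.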
Advances in Fuzzy Systems | 2016
Sujay Saha; Anupam Ghosh; Dibyendu Bikash Seal; Kashi Nath Dey
Most gene expression data analysis algorithms require the entire gene expression matrix without any missing values. Hence, it is necessary to devise methods that impute missing data values accurately. A number of imputation algorithms exist to estimate those missing values. This work starts with a microarray dataset containing multiple missing values. We first apply a modified version of the existing fuzzy-theory-based method LRFDVImpute to impute multiple missing values in time series gene expression data, and then validate the result of imputation using a genetic algorithm (GA) based gene ranking methodology along with regular statistical validation techniques such as RMSE. To the best of our knowledge, gene ranking has not yet been used to validate the result of missing value estimation. The proposed method is first tested on the widely used Spellman dataset, and the results show that error margins are drastically reduced compared to some previous works, which indirectly validates the statistical significance of the proposed method. It is then applied to four other 2-class benchmark datasets, namely the Colorectal Cancer tumours dataset (GDS4382), the Breast Cancer dataset (GSE349-350), the Prostate Cancer dataset, and DLBCL-FL Leukaemia, for both missing value estimation and gene ranking, and the results show that the proposed method can reach 100% classification accuracy with very few dominant genes, which indirectly validates its biological significance.
2016 Second International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN) | 2016
Sujay Saha; Saikat Bandopadhyay; Anupam Ghosh; Kashi Nath Dey
DNA microarrays are normally used to measure the expression values of thousands of genes simultaneously in the form of large matrices. This raw gene expression data may contain some missing cells, and these missing values may affect subsequent analysis of the data. Several imputation methods, such as K-Nearest Neighbor Imputation (KNNImpute), Singular Value Decomposition Imputation (SVDImpute), Local Least Squares Imputation (LLSImpute), and Bayesian Principal Component Analysis (BPCAImpute), have already been proposed to impute those missing values. In this work we propose an ensemble classifier based Artificial Neural Network implementation, ANNImpute, to enhance the accuracy of missing value imputation by applying a two-layer perceptron learning algorithm. Ensemble classification is done on parameters such as the learning rate α, the weight vector, and the bias. We have applied our algorithm to two benchmark datasets, SPELLMAN and Tumour (GDS2932), and the results show that this approach performs well compared to other existing methods in terms of RMSE.
advances in information technology | 2011
Sujay Saha; Arnab Kole; Kashinath Dey
A continuous version of particle swarm optimization (CPSO) is employed to solve the uncapacitated facility location (UFL) problem, one of the most widely studied problems in combinatorial optimization. The basic algorithm was previously published in the research article “A Discrete Particle Swarm Optimization Algorithm for Uncapacitated Facility Location Problem” [1]. Here, the algorithm is slightly modified to obtain better results in less time. To make a reasonable comparison, the same benchmark suites, collected from the OR-library [6], are used. The results show that the modified CPSO algorithm is slightly better than the published CPSO algorithm.
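For context, the objective that any UFL solver minimizes can be sketched as below: the fixed cost of every opened facility plus, for each customer, the cheapest service cost among the opened facilities. This shows only the standard UFL cost function, not the CPSO update rules, which the abstract does not describe; the name `ufl_cost` and the data layout are assumptions.

```python
def ufl_cost(open_facilities, fixed_costs, service_costs):
    """Total cost of a candidate UFL solution.

    open_facilities: list of bool, True if facility i is opened
    fixed_costs:     fixed_costs[i] is the cost of opening facility i
    service_costs:   service_costs[i][j] is the cost of serving
                     customer j from facility i
    """
    opened = [i for i, is_open in enumerate(open_facilities) if is_open]
    if not opened:
        # With no open facility, no customer can be served.
        return float("inf")
    total = sum(fixed_costs[i] for i in opened)
    n_customers = len(service_costs[0])
    for j in range(n_customers):
        # Each customer is assigned to its cheapest open facility.
        total += min(service_costs[i][j] for i in opened)
    return total
```

A swarm-based solver would evaluate this function for each particle after mapping the particle's position to an open/closed decision per facility.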
international conference on bioinformatics and biomedical engineering | 2018
Sujay Saha; Sukriti Roy; Anupam Ghosh; Kashi Nath Dey
Logical interaction between every pair of genes in a gene interaction network affects the observable behavior of an organism. This genetic interaction helps to identify pathways of associated genes for various diseases and to find the level of interaction between the genes in the network. In this paper, we first use three correlation measures, Pearson, Spearman and Kendall-Tau, to find the interaction level in a gene interaction network. Rough sets can also be used to find the level of interaction, as well as the direction of interaction, between every pair of genes. Therefore, in the second phase of the experiment, an entropy measure and rough set theory are used to determine the level of interaction between every pair of genes and the direction of interaction, which indicates which gene regulates which other genes. Experiments are performed separately on normal and diseased samples of the Colorectal Cancer dataset (GDS4382). Finally, we try to identify the interactions responsible for this cancer. To validate the experimental results biologically, we compare them with the interactions given in the NCBI database.
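The three correlation measures named above can be sketched in plain Python for a single pair of genes, where each gene is a list of expression values across the same samples. This is a minimal sketch under stated assumptions: the function names are hypothetical, the Kendall variant shown is tau-a (no tie adjustment), and the abstract does not say how the resulting values are thresholded into interaction levels.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2.0 for v in values]


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Applying one of these functions to every pair of genes yields a symmetric matrix of interaction levels; correlation alone carries no direction, which is consistent with the abstract's use of rough sets for that purpose.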
Archive | 2018
Sujay Saha; Priyojit Das; Anupam Ghosh; Kashi Nath Dey
Genes need to be investigated either in a gene interaction network or in DNA microarray gene expression data to understand the role they play in complex diseases like cancer. The prioritized genes can help us understand the molecular mechanism, as well as discover promising candidate genes for cancer. Several gene ranking algorithms have already been proposed that produce the top-ranked genes according to their importance with respect to a particular disease. In this work, we have developed a Genetic Algorithm (GA) based algorithm, MicroarrayGA, to rank the genes responsible for a particular cancer. The study uses six datasets, Colorectal Cancer, Diffuse Large B-Cell Lymphoma, Pediatric Immune Thrombocytopenia (ITP), Small Cell Lung Cancer (SCLC), Breast Cancer and Prostate Cancer, publicly available from the NCBI (National Center for Biotechnology Information) online repository. We validate the outcome of the proposed algorithm with a classification step using a Support Vector Machine (SVM) classifier, and we also compare the results of MicroarrayGA with three existing methods on the basis of accuracy, precision, recall, F1-score and G-mean.
ubiquitous computing | 2016
Dibyendu Bikash Seal; Sujay Saha; Mayukh Chatterjee; Prokriti Mukherjee; Aradhita Mukherjee; Bipasha Mukhopadhyay; Sohini Mukherjee
Gene-gene interaction is a logical interaction between two genes that affects the observable behavior of an organism. This genetic interaction helps to identify pathways of associated genes for various diseases. In this paper we use two metrics, correlation and entropy, to find the level of interaction between genes in gene interaction networks. We have applied our algorithm to three benchmark cancer datasets: Colorectal, Leukaemia and CML. The results are weighted graphs in which the weight on each edge represents the level of interaction between two genes in a particular network.