Sunanda Das | Researchain

Publication

Featured researches published by Sunanda Das.

Knowledge Based Systems | 2017

Ensemble feature selection using bi-objective genetic algorithm

Asit Kumar Das; Sunanda Das; Arka Ghosh

An ensemble feature selection method based on a parallel bi-objective genetic algorithm is proposed. Rough set theory and mutual information gain are used to select informative data and remove vague data. Parallel processing in the genetic algorithm reduces time complexity. The method is compared with existing state-of-the-art methods on suitable datasets. Its classification accuracy and statistical measures outperform those of the other state-of-the-art methods.

The feature selection problem in data mining is addressed here by proposing a bi-objective genetic algorithm based feature selection method. Boundary region analysis from rough set theory and multivariate mutual information from information theory are used as the two objective functions, so that only precise and informative data are selected from the data set. The data set is sampled with replacement, and the method is applied to determine non-dominated feature subsets from each sampled data set. Finally, an ensemble of such bi-objective genetic algorithm based feature selectors is developed with the help of parallel implementations to produce a more general feature subset. The individual feature selector outputs are aggregated using a novel dominance-based principle to produce the final feature subset. The proposed work is validated on a repository dedicated to feature selection datasets as well as on UCI Machine Learning Repository datasets, and the experimental results are compared with related state-of-the-art feature selection methods to show the effectiveness of the proposed ensemble feature selection method.
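The notion of a non-dominated feature subset mentioned in this abstract can be illustrated with a minimal sketch. It assumes both objectives are to be maximized and that each candidate subset has already been scored; the names `dominates`, `non_dominated`, and the score values are hypothetical and are not taken from the paper, whose actual objectives (rough set boundary region and multivariate mutual information) are not reproduced here.

```python
from typing import List, Tuple

# (objective_1, objective_2); both are assumed to be maximized in this sketch.
Score = Tuple[float, float]


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def non_dominated(scores: List[Score]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical objective values for four candidate feature subsets.
candidate_scores = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
front = non_dominated(candidate_scores)
print(front)
```

In this toy input the third candidate is dominated by the second, so it is left out of the front. The pairwise check is quadratic in the number of candidates, which is usually acceptable for small populations.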

Archive | 2015

An Approach Towards Most Cancerous Gene Selection from Microarray Data

Sunanda Das; Asit Kumar Das

Microarray gene datasets are often very high-dimensional, which presents complicated problems such as degraded performance in data access, data manipulation, and query processing. Dimensionality reduction tackles this problem efficiently and helps to visualize the intrinsic properties hidden in the dataset. Therefore, rough set theory (RST) has been used to select only the relevant attributes of the dataset, called a reduct, which are sufficient to characterize the information system. The investigation was carried out on a publicly available microarray dataset. The analysis revealed that rough set theory, using the concept of dependency among genes, is able to extract the dominant genes, in terms of reducts, that play an important role in causing the disease. Experimental results show the effectiveness of the algorithm.
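The rough set notion of dependency that underlies a reduct can be sketched as follows. This is the standard dependency degree (size of the positive region divided by the number of objects), shown on a small discretized toy table; the function name `dependency_degree` and the data are assumptions for illustration, and the paper's own procedure may differ.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def dependency_degree(
    rows: Sequence[Sequence[Hashable]],
    labels: Sequence[Hashable],
    attrs: Sequence[int],
) -> float:
    """Fraction of objects whose class is fully determined by the chosen attributes."""
    if not rows:
        return 0.0
    seen_labels = defaultdict(set)  # attribute-value tuple -> labels seen with it
    counts = defaultdict(int)       # attribute-value tuple -> number of objects
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        seen_labels[key].add(label)
        counts[key] += 1
    # An equivalence class belongs to the positive region if it has a single label.
    positive = sum(counts[k] for k, labs in seen_labels.items() if len(labs) == 1)
    return positive / len(rows)


# Hypothetical discretized expression table: 4 samples, 2 genes.
toy_rows = [(0, 1), (0, 1), (1, 0), (1, 1)]
toy_labels = ["normal", "disease", "disease", "disease"]
print(dependency_degree(toy_rows, toy_labels, [0]))
print(dependency_degree(toy_rows, toy_labels, [0, 1]))
```

A subset of attributes is a candidate reduct when its dependency degree equals that of the full attribute set and no attribute can be dropped without lowering it.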

international conference on big data | 2017

Strength pareto evolutionary algorithm based gene subset selection

Swagatam Basu; Sunanda Das; Sujata Ghatak; Asit Kr. Das

Microarray gene expression data are voluminous, and very few genes in a dataset are informative for disease analysis. Selecting those genes from the whole dataset is a very challenging task. Researchers have used many optimization techniques for gene subset selection, but none of them provides a globally optimal solution for all gene datasets. In this paper, we propose a strength Pareto evolutionary algorithm based gene subset selection technique that selects the informative gene subset for analyzing and identifying the disease efficiently. It is a multi-objective optimization algorithm that explores the search space and provides a non-dominated Pareto front from which an optimal gene subset is obtained. An external cluster validation index and the number of genes in a sample are used as the two objective functions, and the chromosomes in the population are evaluated against them. After the algorithm converges, the chromosomes in the non-dominated Pareto front give the important gene subset. The experimental results on the selected gene subset demonstrate the usefulness of the method.

International Journal of Biomedical Engineering and Technology | 2016

Gene selection and decision tree based classification for cancerous sample detection

Sunanda Das; Asit Kumar Das

Gene expression data are generally high-dimensional, which degrades the performance of gene data analysis for disease prediction. It is therefore difficult for traditional classifiers to perform well on high-dimensional microarray data, where the number of genes far exceeds the number of samples. In the proposed work, Pearson's correlation coefficient is first computed between every pair of genes, and a gene dependency set is formed from these coefficients. For every pair of gene dependencies in the gene dependency set, a similarity coefficient between the two genes is measured using the Jaccard coefficient; from these, a gene similarity matrix is computed and each gene is assigned a rank indicating its importance. The highest-ranked gene is considered the core, or most important, gene of the gene set. Next, a rough set theory based quick reduct algorithm is applied to select only the most informative genes, called the reduct, which are sufficient to fully characterise the overall class structure of the gene dataset for disease analysis. Finally, from the reduced gene set of all samples, a rule-based classifier, namely a decision tree, is constructed and applied to unknown samples to predict whether each is a diseased or a normal sample. Experimental results show the effectiveness of the algorithm.
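The first two steps of this abstract, Pearson correlation followed by a Jaccard similarity, might be sketched as below. The abstract does not say how the dependency set is built from the correlations, so the thresholding rule here (a gene depends on every other gene whose absolute correlation meets `threshold`) is purely an assumption, as are the function names and the toy expression values.

```python
import math
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def dependency_sets(expr: Dict[str, List[float]], threshold: float) -> Dict[str, Set[str]]:
    """ASSUMED rule: gene g depends on h when |r(g, h)| >= threshold."""
    genes = list(expr)
    return {
        g: {h for h in genes if h != g and abs(pearson(expr[g], expr[h])) >= threshold}
        for g in genes
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical expression levels for four genes over four samples.
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [1.0, 3.0, 2.0, 4.0],
}
deps = dependency_sets(toy_expr, threshold=0.9)
print(deps)
print(jaccard(deps["g1"], deps["g2"]))
```

With this toy input, `g4` correlates only moderately with the others and ends up with an empty dependency set, so its Jaccard similarity to every other gene is zero.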

RAIT | 2014

Selection of Graph-Based Features for Character Recognition Using Similarity Based Feature Dependency and Rough Set Theory

Sunanda Das; Suvra jyoti Choudhury; Asit Kumar Das; Jaya Sil

Large amounts of data are now generated in almost every field, and analysing them is a challenging task for the data mining community. Feature-based character recognition is a well-known field of research in which numerous features are used without analyzing their importance, resulting in a lengthy recognition process. Feature selection plays an important role in the character recognition problem, but it has not been explored. In this paper, the characters are represented by graphs, and features of the graphs form the feature vectors. A novel feature selection method is proposed that uses the concepts of feature dependency and rough set theory to select only the features that are important for character recognition. Initially, feature dependency is measured based on correlation coefficients, similarity among the features is evaluated using feature dependency, and the features are ranked accordingly. A rough set theory based quick reduct generation algorithm then selects the important features using the feature ranking. The method is applied to a character data set as well as to various benchmark data sets, and the experimental results are compared with well-defined dimension reduction techniques to demonstrate the effectiveness of the method.

Multi-Objective Optimization | 2018

A Comparative Study on Different Versions of Multi-Objective Genetic Algorithm for Simultaneous Gene Selection and Sample Categorization

Asit Kumar Das; Sunanda Das

Gene selection from microarray gene expression datasets and clustering of samples into different groups are important data mining tasks for disease identification. Selecting more interpretable genes from a gene expression dataset is an essential data-preprocessing task that helps in the study of cancer. Gene selection during sample clustering is inherently difficult, as there is no obvious criterion to guide the search. Simultaneous gene selection and sample clustering is a two-way data analysis technique that has recently gained research attention. Traditional clustering techniques are unable to handle noisy data properly, so effective clustering algorithms that work with relevant, noise-free data are desirable. Selecting target genes before sample clustering is therefore essential, and it is naturally effective if both tasks are done simultaneously. In this chapter, an optimal gene subset is selected and sample clustering is performed simultaneously using a Multi-Objective Genetic Algorithm (MOGA). Different versions of MOGA are employed to choose the optimal gene subset, and the natural number of optimal clusters of samples is obtained automatically at the end of the process. The non-dominated sorting genetic algorithm (NSGA), the strength Pareto evolutionary algorithm (SPEA), and its modified version SPEA2 are applied for this purpose. The methods use nonlinear hybrid uniform cellular automata to generate the initial population, a tournament selection strategy, a two-point crossover operation, and a suitable jumping gene mutation mechanism to maintain diversity in the population. They use the mutual correlation coefficient and internal and external cluster validation indices as objective functions to find the non-dominated solutions. To measure the cluster validation indices, a clustering algorithm is applied to the data subset associated with each chromosome in the population to find the different clusters. After the genetic algorithm converges, the best of the non-dominated solutions is identified; it provides the important genes and categorizes the samples into clusters. The experimental results demonstrate the correctness of the proposed simultaneous gene selection and sample categorization method. The optimality of the clusters obtained using the different genetic algorithms is assessed by comparing various cluster validation indices.
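Two of the genetic operators named in this abstract, tournament selection and two-point crossover, can be sketched in their textbook form. This is not the chapter's implementation: the scalar `fitness` list is a simplifying assumption (the chapter is multi-objective), chromosomes are assumed to be bit lists marking selected genes, and the cellular automata initialization and jumping gene mutation are not reproduced.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[int]  # assumed encoding: 1 = gene selected, 0 = gene dropped


def tournament_select(
    population: Sequence[Chromosome],
    fitness: Sequence[float],
    k: int,
    rng: random.Random,
) -> Chromosome:
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness[i])
    return population[best]


def two_point_crossover(
    p1: Chromosome, p2: Chromosome, rng: random.Random
) -> Tuple[Chromosome, Chromosome]:
    """Swap the segment between two random cut points of equal-length parents."""
    if len(p1) != len(p2):
        raise ValueError("parents must have the same length")
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    child1 = p1[:i] + p2[i:j] + p1[j:]
    child2 = p2[:i] + p1[i:j] + p2[j:]
    return child1, child2


rng = random.Random(0)  # seeded only to make the sketch repeatable
toy_population = [[1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1], [1, 1, 1, 0, 0, 0]]
toy_fitness = [0.4, 0.9, 0.6]  # hypothetical scalar fitness values
parent_a = tournament_select(toy_population, toy_fitness, k=2, rng=rng)
parent_b = tournament_select(toy_population, toy_fitness, k=2, rng=rng)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
print(child_a, child_b)
```

A larger tournament size `k` increases selection pressure; with `k` equal to the population size the fittest individual is always returned.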

International Journal of Rough Sets and Data Analysis (IJRSDA) | 2018

Probability Based Most Informative Gene Selection From Microarray Data

Sunanda Das; Asit Kumar Das

Microarray datasets have wide application in bioinformatics research. Analysing this kind of high-throughput data, which measures the expression level of thousands of genes, can help in finding the cause and subsequent treatment of a disease. There are many techniques in gene analysis for extracting biologically relevant information from inconsistent and ambiguous data. In this paper, the concepts of functional dependency and closure of an attribute from database technology are used to find the most important set of genes for cancer detection. First, the method computes a similarity factor between each pair of genes. Based on the similarity factors, a set of gene dependencies is formed, from which the closure set is obtained. Subsequently, conditional probability based interestingness measurements are used to determine the most informative gene for disease classification. The proposed method is applied to some publicly available cancerous gene expression datasets. The results show the effectiveness and robustness of the algorithm.

Keywords: Important Gene Set, Most Informative Gene Selection, Probability Factor, Similarity Based Gene Dependency

computer and information technology | 2016

Simultaneous Feature Selection and Cluster Analysis Using Genetic Algorithm

Sunanda Das; Shreya Chaudhuri; Sujata Ghatak; Asit Kumar Das

Cluster analysis is one of the important techniques of data mining and is applied in several fields such as bioinformatics, social networks, and computer vision. It is an unsupervised learning technique for exploring the structure of data without class labels. Many clustering algorithms have been proposed to analyze high volumes of data, but very few of them evaluate the quality of the clusters, which suffers from irrelevant and inconsistent features present in the dataset. Feature selection is therefore an important pre-processing step in data analysis, mainly for high-dimensional datasets. In this paper, we select an optimal subset of features and perform cluster analysis simultaneously using a genetic algorithm. The genetic algorithm selects the optimal subset of features and automatically finds the optimal number of clusters at the end of the process. The optimality of the clusters is measured by calculating various cluster validation indices. The overall performance of the method is investigated on popular UCI datasets, and the experimental results are compared with the Fuzzy C-Means algorithm to demonstrate the effectiveness of the proposed method.

Archive | 2016

Sample Classification Based on Gene Subset Selection

Sunanda Das; Asit Kumar Das

Microarray datasets contain genetic information about patients, the analysis of which can reveal new findings about the cause and subsequent treatment of a disease. Many techniques are used in gene analysis with the objective of extracting biologically relevant information from these datasets. In this paper, the concepts of functional dependency and closure of an attribute from database technology are applied to find the most important gene subset, based on which the samples of the gene datasets are classified as normal or diseased. The gene dependency is defined as the number of genes dependent on a particular gene, using gene similarity measurement on the collected samples. The closure of a gene is computed using the gene dependency set, which shows how many genes are logically implied by it. Finally, the minimum number of genes whose closure logically implies all the genes in the dataset is selected for sample classification.
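The closure idea in this abstract can be sketched under simplifying assumptions. Dependencies are modelled here as a mapping from one gene to the genes it implies, the closure is the usual transitive closure, and the selection step is a greedy heuristic that is not guaranteed to return the minimum set the abstract refers to. The names `closure`, `greedy_cover`, and the toy dependencies are hypothetical.

```python
from typing import Dict, List, Set


def closure(seed: Set[str], depends: Dict[str, Set[str]]) -> Set[str]:
    """All genes implied, directly or transitively, by the genes in seed."""
    result = set(seed)
    frontier = list(seed)
    while frontier:
        gene = frontier.pop()
        for implied in depends.get(gene, ()):
            if implied not in result:
                result.add(implied)
                frontier.append(implied)
    return result


def greedy_cover(depends: Dict[str, Set[str]], all_genes: Set[str]) -> List[str]:
    """Greedy heuristic (ASSUMPTION): repeatedly pick the gene whose closure covers most uncovered genes."""
    chosen: List[str] = []
    covered: Set[str] = set()
    while not all_genes <= covered:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(all_genes - covered),
            key=lambda g: len(closure({g}, depends) - covered),
        )
        chosen.append(best)
        covered |= closure({best}, depends)
    return chosen


# Hypothetical gene dependencies: g1 implies g2, g2 implies g3, g4 implies g5.
toy_depends = {"g1": {"g2"}, "g2": {"g3"}, "g4": {"g5"}, "g5": set()}
toy_genes = {"g1", "g2", "g3", "g4", "g5"}
print(closure({"g1"}, toy_depends))
print(greedy_cover(toy_depends, toy_genes))
```

Because every gene is in its own closure, each greedy step covers at least one new gene, so the loop always terminates.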

swarm evolutionary and memetic computing | 2014

A Neighbourhood Based Hybrid Genetic Search Model for Feature Selection

Sunanda Das; Arka Ghosh; Asit Kumar Das

The paper presents a hybrid genetic search model (HGSM) with a novel neighbourhood based uniform local search that selects a subset of salient features and removes redundant information from the universe of discourse. The method uses least square regression error as the fitness function for selecting the most feasible set of features from a large set of features. The proposed work is validated using our simulated character dataset and some real-world datasets available in the UCI Machine Learning Repository, and its performance is compared with that of some other state-of-the-art feature selection methods.
