Publication


Featured research published by Sudipta Acharya.


IEEE Journal of Biomedical and Health Informatics | 2016

Multiobjective Simulated Annealing-Based Clustering of Tissue Samples for Cancer Diagnosis

Sudipta Acharya; Sriparna Saha; Yamini Thadisina

In the field of pattern recognition, the study of the gene expression profiles of different tissue samples over different experimental conditions has become feasible with the arrival of microarray-based technology. In cancer research, classification of tissue samples is necessary for cancer diagnosis, which can be done with the help of microarray technology. In this paper, we present a multiobjective optimization (MOO)-based clustering technique that uses archived multiobjective simulated annealing (AMOSA) as the underlying optimization strategy for classifying tissue samples from cancer datasets. The technique is evaluated on three open-source benchmark cancer datasets: Brain tumor, Adult Malignancy, and Small Round Blue Cell Tumors (SRBCT). To evaluate the quality of the produced clusters, two cluster quality measures, the adjusted Rand index and classification accuracy (%CoA), are calculated. Comparative results against ten state-of-the-art clustering techniques are shown for the three benchmark datasets. We have also conducted a statistical significance test (t-test) to support the superiority of the presented MOO-based clustering technique over the other clustering techniques. Moreover, significant gene markers have been identified and demonstrated visually from the clustering solutions obtained. This study can have an important impact in the field of cancer subtype prediction.
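As a rough illustration of the adjusted Rand index mentioned in this abstract, the following Python sketch computes it from the standard pair-counting formula. The function name `adjusted_rand_index` and the toy labelings are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        # Fewer than two samples: the labelings trivially agree.
        return 1.0

    # Contingency counts: samples sharing a (true class, predicted cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Toy example (hypothetical labels): same grouping, different label names.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy example: the predicted clusters cut across the true classes.
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to how clusters are named, which is why it suits comparing a clustering against known tissue classes.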


Molecular BioSystems | 2016

Importance of proximity measures in clustering of cancer and miRNA datasets: proposal of an automated framework

Sudipta Acharya; Sriparna Saha

Distance plays an important role in the clustering process for allocating data points to different clusters. Several distance or proximity measures have been developed and reported in the literature to determine dissimilarities between two given points. The best choice of distance measure depends on the domain and can differ between data sets of the same domain. It is therefore important to automatically determine the distance measure that works best for a particular data set. In this study we have developed an automatic clustering technique that uses the search capability of multiobjective optimization to determine both the relevant distance measure and the corresponding partitioning of a given data set. The proposed automated framework is generic in nature, i.e., any number of different distance measures can be incorporated into it. In our work, four widely used distance measures (Euclidean, line symmetry, point symmetry and city block distance) are explored for each data set. To measure the quality of a partitioning obtained with a particular distance, four cluster validity indices are used: the Silhouette index, the DB index, the adjusted Rand index and classification accuracy. A new encoding strategy that encodes both the set of cluster centers and the particular distance function is used to represent the problem. The efficiency of the proposed technique is shown by clustering three microRNA and three microarray gene expression data sets of varying complexity. The results show the usefulness of the proposed automated approach.
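To make the idea of pairing cluster centers with a selectable distance function concrete, here is a minimal Python sketch. It covers only the Euclidean and city block distances; the point symmetry and line symmetry distances from the abstract are omitted. The names `DISTANCES` and `assign_to_centers` and the toy coordinates are hypothetical and do not reproduce the paper's encoding.

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    """City block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical registry: a candidate solution names one of these measures.
DISTANCES = {"euclidean": euclidean, "city_block": city_block}


def assign_to_centers(points, centers, distance_name):
    """Assign each point to the index of its nearest center under the chosen distance."""
    dist = DISTANCES[distance_name]
    return [
        min(range(len(centers)), key=lambda k: dist(p, centers[k]))
        for p in points
    ]


# Toy example: the same point is assigned differently under the two measures.
points = [(0, 0)]
centers = [(3, 3), (0, 5)]
by_euclidean = assign_to_centers(points, centers, "euclidean")    # nearest is (3, 3)
by_city_block = assign_to_centers(points, centers, "city_block")  # nearest is (0, 5)
```

Because the partitioning changes with the measure, treating the distance as part of the candidate solution lets the optimizer compare partitionings across measures.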


Advances in Computing and Communications | 2014

Multi-objective clustering of tissue samples for cancer diagnosis

Sudipta Acharya; Yamini Thadisina; Sriparna Saha

In the field of pattern recognition, the study of the gene expression profiles of different tissue samples over different experimental conditions has become feasible with the arrival of microarray-based technology. In cancer research, classification of tissue samples is necessary for cancer diagnosis, which can be done with the help of microarray technology. In this article we present a multi-objective optimization (MOO) based clustering technique that uses AMOSA (Archived Multi-Objective Simulated Annealing) as the underlying optimization strategy for classifying tissue samples from cancer data sets. Three cluster validity indices, namely the XB, PBM, and FCM indices, are optimized simultaneously as objective functions to form more accurate clusters of tissue samples. The technique is evaluated on two open-source benchmark cancer data sets, the Brain tumor data set and the Adult Malignancy data set. To evaluate the quality of the produced clusters, two cluster quality measures, the Adjusted Rand Index (ARI) and Classification Accuracy (%CoA), are calculated for each data set. Comparative results against 10 state-of-the-art single-objective and multi-objective clustering algorithms are shown for the two benchmark data sets.
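One of the objective functions named in this abstract, the XB (Xie-Beni) index, can be sketched as follows. This is a simplified crisp (hard-assignment) variant written for illustration, assuming each sample belongs to exactly one cluster; the fuzzy memberships of the original index and the paper's exact formulation are not reproduced. The function names and toy data are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def xie_beni_crisp(points, centers, labels):
    """Crisp Xie-Beni index: compactness divided by (n * minimum center separation).

    Lower values indicate compact, well-separated clusters.
    """
    if len(centers) < 2:
        raise ValueError("at least two cluster centers are required")
    # Compactness: total squared distance of samples to their assigned centers.
    compactness = sum(
        squared_distance(p, centers[k]) for p, k in zip(points, labels)
    )
    # Separation: smallest squared distance between any two distinct centers.
    separation = min(
        squared_distance(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    return compactness / (len(points) * separation)


# Toy example: two tight groups far apart give a small index value.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
centers = [(0, 1), (10, 1)]
labels = [0, 0, 1, 1]
xb_value = xie_beni_crisp(points, centers, labels)
```

In a multi-objective setting such an index would be one of several scores evaluated for each candidate set of centers, rather than a single criterion to minimize on its own.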


BMC Bioinformatics | 2017

Unsupervised gene selection using biological knowledge: application in sample clustering

Sudipta Acharya; Sriparna Saha; N. Nikhil

Background: Classification of biological samples of gene expression data is a basic building block in solving several problems in the field of bioinformatics, such as diagnosing cancer and other diseases and making a proper treatment plan. One big challenge in sample classification is handling high-dimensional and redundant gene expression data. To reduce the complexity of handling this high-dimensional data, gene/feature selection plays a major role. Results: The current paper explores the use of biological knowledge acquired from the Gene Ontology database in selecting a proper subset of genes, which can then participate in clustering of samples. The proposed feature selection technique is unsupervised in nature, as it does not use any class label information in the process of gene selection. Finally, a multi-objective clustering approach is deployed to cluster the available set of samples in the reduced gene space. Conclusions: Reported results show that considering biological knowledge in the gene selection technique not only reduces the feature space dimensionality to a great extent but also improves the accuracy of sample classification. The obtained reduced gene space is validated using strong biological significance tests. To support the superiority of the proposed gene-selection-based sample clustering technique, a thorough comparative analysis has also been performed with state-of-the-art techniques.


Soft Computing | 2016

Use of line based symmetry for developing cluster validity indices

Sudipta Acharya; Sriparna Saha; Sanghamitra Bandyopadhyay

Automatically identifying the number of clusters in a data set is an important task in unsupervised classification. To address this issue, the current paper focuses on the symmetry property of clusters. Point and line symmetry are two important attributes of data partitions. Here we propose line-symmetry versions of eight well-known validity indices (XB, PBM, FCM, PS, FS, K, SV, and DB) so that they can identify the correct number of partitions in data sets containing clusters with the line-symmetry property. The global optimality of two of these newly developed indices is established mathematically. Eight artificially generated data sets of varying dimensions, containing clusters of different convexities and shapes, and three real-life data sets are used in the experiments. Initially, to obtain different partitions, an existing genetic clustering technique that uses the line-symmetry property (GALS clustering) is applied to the data sets while varying the number of clusters. We have also provided a comparative study of our proposed line-symmetry-based cluster validity indices with their point-symmetry-based versions and original versions based on Euclidean distance. The experimental results reveal that most of the line-symmetry-distance-based cluster validity indices perform better than their point-symmetry and Euclidean-distance-based versions.


International Conference on Pattern Recognition | 2016

Automatic generation of biclusters from gene expression data using multi-objective simulated annealing approach

Pracheta Sahoo; Sudipta Acharya; Sriparna Saha

The invention of microarray technology aids in the successful monitoring of gene expression patterns. Biclustering is a method in which a number of co-regulated genes are identified over a subset of conditions. Our aim is to detect all the non-trivial biclusters having low mean squared residue (MSR) and high row variance. In this paper, we propose a multi-objective simulated annealing based solution framework to solve the biclustering problem for gene expression data sets. Two objective functions, MSR and row variance, which capture two important properties of biclusters, are optimized in parallel using the search capability of the multi-objective simulated annealing based optimization technique AMOSA. A new encoding strategy and several different search operators are defined for fast convergence of the algorithm. We have conducted experiments on two real-life data sets, and the obtained results are quantified using several cluster validity indices. We have compared our results with some state-of-the-art biclustering techniques.
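The two bicluster properties named in this abstract can be sketched in Python as follows, using the commonly cited definitions of mean squared residue and row variance over a selected set of rows (genes) and columns (conditions). The function names, the helper `_submatrix`, and the toy expression matrix are illustrative assumptions, not the paper's implementation.

```python
def _submatrix(matrix, rows, cols):
    """Extract the bicluster defined by the selected row and column indices."""
    return [[matrix[i][j] for j in cols] for i in rows]


def mean_squared_residue(matrix, rows, cols):
    """Average squared residue a_ij - rowmean_i - colmean_j + overall mean."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


def row_variance(matrix, rows, cols):
    """Average squared deviation of each entry from its own row mean."""
    sub = _submatrix(matrix, rows, cols)
    n_rows, n_cols = len(rows), len(cols)
    row_means = [sum(r) / n_cols for r in sub]
    total = sum(
        (sub[i][j] - row_means[i]) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Toy matrix: the first three rows differ only by an additive shift, so they
# form a coherent bicluster; the last row does not follow the pattern.
expression = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [4.0, 5.0, 6.0],
    [9.0, 1.0, 5.0],
]
msr = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
variance = row_variance(expression, rows=[0, 1, 2], cols=[0, 1, 2])
```

A low residue indicates that the selected genes vary coherently across the selected conditions, while a high row variance filters out flat, trivial biclusters, which is why the two are treated as competing objectives.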


International Conference on Information Technology | 2014

Identifying Co-expressed miRNAs using Multiobjective Optimization

Sudipta Acharya; Sriparna Saha

MicroRNAs (miRNAs) are short non-coding RNAs that are capable of regulating gene expression at the post-transcriptional level. A huge volume of data is generated by expression profiling of miRNAs. Various studies have shown that a large proportion of miRNAs tend to form clusters on chromosomes. In this article we therefore propose a multi-objective optimization based clustering algorithm for extracting relevant information from miRNA expression data. The proposed method combines a point symmetry based distance with an existing multi-objective optimization based clustering technique, AMOSA, to identify co-regulated or co-expressed miRNA clusters. The superiority of the proposed approach over other state-of-the-art clustering methods is demonstrated on two publicly available miRNA expression data sets using the Davies-Bouldin index, an external cluster validity index.


Gene | 2018

Novel symmetry-based gene-gene dissimilarity measures utilizing Gene Ontology: Application in gene clustering

Sudipta Acharya; Sriparna Saha; Prasanna Pradhan

In recent years DNA microarray technology, leading to the generation of high-volume biological data, has gained significant attention. Clustering is a powerful tool for analyzing this high-volume gene expression data. The efficiency of any clustering algorithm depends largely on the underlying similarity/dissimilarity measure. During the analysis of such data there is often a need to explore the similarity of genes not only with respect to their expression values but also with respect to their functional annotations, which can be obtained from Gene Ontology (GO) databases. In the existing literature, several novel clustering and bi-clustering approaches have been proposed to identify co-regulated genes from gene expression datasets. Identifying co-regulated genes from gene expression data alone misses some important biological information about the functionalities of genes, which is necessary to identify semantically related genes. In this paper, we propose sixteen different semantic gene-gene dissimilarity measures that use biological information about genes retrieved from a global biological database, namely Gene Ontology (GO). Four proximity measures, viz. Euclidean, cosine, point symmetry and line symmetry, are used along with four different representations of gene-GO-term annotation vectors to develop a total of sixteen gene-gene dissimilarity measures. To illustrate the usefulness of the developed dissimilarity measures, some multi-objective as well as single-objective clustering algorithms are applied with the proposed measures to identify functionally similar genes from Mouse genome and Yeast datasets. Furthermore, we have compared the performance of our proposed sixteen dissimilarity measures with three existing state-of-the-art semantic similarity and distance measures.


Expert Systems With Applications | 2018

Fusion of stability and multi-objective optimization for solving cancer tissue classification problem

Sayantan Mitra; Sriparna Saha; Sudipta Acharya

The concept of stability is one of the commonly used physical phenomena. The current paper builds on the hypothesis that the optimal number of clusters present in a dataset corresponds to the partitioning that is most stable under small changes in the dataset. To quantify the degree of stability, a new measure is also proposed. An expert clustering approach is then developed that uses the properties of stability to automatically detect the number of clusters in a given dataset. Initially, several different variants of the dataset are generated by introducing small perturbations. A multi-objective based expert clustering framework is developed to automatically partition the different variants of the data. A new objective function capturing the stability property of a clustering solution, namely the ‘Agreement-index’, is optimized simultaneously with two well-known objective functions using a multi-objective simulated annealing based process, namely AMOSA, for the purpose of clustering. Finally, the problem of cancer classification is addressed as the application domain of the proposed expert framework. Results of the newly developed stability based clustering, namely Stab-clustering, are compared with existing approaches on twelve microarray cancer datasets in terms of different cluster quality measures. The obtained results confirm the robustness of the proposed technique over the state of the art. Thorough biological and statistical significance tests are also conducted to support the effectiveness of the proposed approach.


Applications of Natural Language to Data Bases | 2016

Multi-objective Word Sense Induction Using Content and Interlink Connections

Sudipta Acharya; Asif Ekbal; Sriparna Saha; Prabhakaran Santhanam; Jose G. Moreno; Gaël Dias

In this paper, we propose a multi-objective optimization based clustering approach to address the word sense induction problem by leveraging both the content of documents and their structure on the Web. Recent works attempt to tackle this problem from the perspective of a content analysis framework. However, in this paper, we show that the contents and hyperlinks existing on the Web are important and complementary sources of information. Our strategy is based on the adaptation of a simulated annealing algorithm to take into account second-order similarity measures as well as structural information obtained with a PageRank based similarity kernel. Exhaustive results on the benchmark datasets show that our proposed approach attains better accuracy than either the content based or the hyperlink based strategy alone, which encourages combining these sources.

Collaboration


Top Co-Authors

Sriparna Saha, Indian Institute of Technology Patna
Asif Ekbal, Indian Institute of Technology Patna
Pracheta Sahoo, Indian Institute of Technology Patna
Yamini Thadisina, Indian Institute of Technology Patna
Abhay Kumar Alok, Indian Institute of Technology Patna
Kavya K, Indian Institute of Technology Patna
Kuldeep Kaushik, Indian Institute of Technology Patna
N. Nikhil, Indian Institute of Technology Ropar