Abhay Kumar Alok
Indian Institute of Technology Patna
Publication
Featured research published by Abhay Kumar Alok.
Expert Systems With Applications | 2016
Sriparna Saha; Abhay Kumar Alok; Asif Ekbal
Multiple centers are used to represent a partitioning. Actual class labels are assumed to be known for 10% of the data points. The objective functions are based on internal and external cluster validity indices. AMOSA is used to optimize these three objective functions. Three different mutation operators are used to obtain the global Pareto front.
The objective of brain image segmentation is to partition brain images into non-overlapping homogeneous regions representing the different anatomical structures. Magnetic resonance (MR) brain image segmentation has a large number of applications in the diagnosis of neurological disorders such as Alzheimer's disease and Parkinson-related syndromes. But automatically segmenting an MR brain image is not an easy task. To solve this problem, several unsupervised and supervised classification techniques have been developed in the literature. Supervised classification techniques are more time-consuming and costly because they require sufficient labeled data. In contrast, unsupervised classification techniques work without any prior information, but they suffer from the local-trap problem. To overcome the problems associated with unsupervised and supervised classification techniques, we have proposed a new semi-supervised clustering technique using the concepts of multiobjective optimization and applied it to the automatic segmentation of MR brain images in the intensity space. Multiple centers are used to encode a cluster in the form of a string. The proposed clustering technique uses the intensity values of the brain pixels as features. It also assumes that the actual class labels of 10% of the points of a particular image data set are known. Three cluster validity indices are used as the objective functions, which are simultaneously optimized using AMOSA, a modern multiobjective optimization technique based on the concepts of simulated annealing. The first two cluster validity indices are the symmetry distance based Sym-index and the Euclidean distance based I-index, which are based on unsupervised properties. The last one is a supervised information based cluster validity index, the Minkowski index. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on several simulated MR normal brain images and MR brain images with multiple sclerosis lesions. Its performance is compared with that of other popular image segmentation techniques such as Fuzzy C-means and Expectation Maximization, and with recent image clustering techniques such as the multiobjective MCMOClust technique and the Fuzzy-VGAPS clustering technique.
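The supervised objective named in this abstract can be illustrated with a short sketch. The abstract does not spell out the formula, so the code below assumes the commonly used pair-counting definition of the Minkowski score, sqrt((n01 + n10) / (n11 + n10)), where n11 counts point pairs grouped together in both labellings, n10 pairs grouped together only in the reference, and n01 pairs grouped together only in the obtained clustering. The function name `minkowski_score` is hypothetical. In the semi-supervised setting it would be evaluated only on the 10% of points whose labels are known.

```python
from itertools import combinations
from math import sqrt


def minkowski_score(true_labels, pred_labels):
    """Pair-counting Minkowski score (assumed definition); 0 is a perfect match."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n11 = n10 = n01 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            n11 += 1          # together in both labellings
        elif same_true:
            n10 += 1          # together only in the reference
        elif same_pred:
            n01 += 1          # together only in the obtained clustering
    if n11 + n10 == 0:
        raise ValueError("reference labelling has no same-class pairs")
    return sqrt((n01 + n10) / (n11 + n10))


# Cluster names do not matter, only the grouping does.
print(minkowski_score([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0
```

Lower values are better under this definition, so an optimizer would minimize the score.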
Applied Intelligence | 2015
Abhay Kumar Alok; Sriparna Saha; Asif Ekbal
Semi-supervised clustering techniques have been proposed in the literature to overcome the problems associated with unsupervised and supervised classification. They consider a small amount of labeled data together with the whole data distribution while clustering a data set. In this paper, a new approach towards semi-supervised clustering is implemented using a multiobjective optimization (MOO) framework. Four objective functions are optimized using the search capability of a multiobjective simulated annealing based technique, AMOSA. These objective functions are based on unsupervised and supervised information. The first three objective functions represent, respectively, the goodness of the partitioning in terms of Euclidean distance, the total symmetry present in the clusters, and the cluster connectedness. For the last objective function, we have considered different external cluster validity indices, including the adjusted Rand index, the Rand index, a newly developed min-max distance based MMI index, the NMMI index and the Minkowski Score. Results show that the proposed semi-supervised clustering technique can effectively detect the appropriate number of clusters as well as the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Twenty-four artificial and five real-life data sets have been used in the evaluation. We develop five different versions of the Semi-GenClustMOO clustering technique by varying the external cluster validity index. The partitioning results are compared with those of another recently developed multiobjective semi-supervised clustering technique, Mock-Semi. At the end of the paper, the effectiveness of the proposed Semi-GenClustMOO clustering technique is shown in segmenting a remote sensing satellite image of a part of the city of Kolkata.
International Conference on Hybrid Intelligent Systems | 2012
Sriparna Saha; Asif Ekbal; Abhay Kumar Alok
Semi-supervised clustering uses information from both unsupervised and supervised learning to overcome the problems associated with each. This information is supplied during the clustering process in the form of class labels and the data distribution. In this paper the problem of semi-supervised clustering is formulated under the framework of multiobjective optimization (MOO). Thereafter, a multiobjective clustering technique is extended to solve the semi-supervised clustering problem. The newly developed semi-supervised multiobjective clustering algorithm, Semi-GenClustMOO, is used to partition data into an appropriate number of clusters. Four objective functions are optimized, of which the first three use unsupervised information and the last uses supervised information. These four objective functions represent, respectively, the total compactness of the partitioning, the total symmetry present in the clusters, the cluster connectedness and the adjusted Rand index. They are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method. Results show that it can detect the appropriate number of clusters as well as the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. Seven artificial and four real-life data sets have been used to evaluate the effectiveness of the Semi-GenClustMOO technique. In each case the class information of 10% randomly chosen data points is known.
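Optimizing four objectives simultaneously means comparing candidate partitionings by Pareto dominance rather than by a single score. The following is a minimal sketch of that comparison only, not of AMOSA itself (no annealing schedule or archive management is shown). It assumes every objective is expressed so that smaller is better; an objective that is maximized, such as the adjusted Rand index, would be negated first. The function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Keep the objective vectors that no other vector in the list dominates."""
    return [
        s for i, s in enumerate(solutions)
        if not any(dominates(t, s) for j, t in enumerate(solutions) if j != i)
    ]


# Four objective values per candidate partitioning (illustrative numbers only).
candidates = [
    (0.2, 0.3, 0.1, 0.4),
    (0.2, 0.5, 0.1, 0.4),   # dominated by the first candidate
    (0.1, 0.6, 0.2, 0.3),   # a trade-off: better on some objectives, worse on others
]
print(non_dominated(candidates))
```

The surviving vectors form the Pareto front from which a final partitioning is selected.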
International Journal of Machine Learning and Cybernetics | 2017
Abhay Kumar Alok; Sriparna Saha; Asif Ekbal
Studying the patterns hidden in gene expression data helps to understand the functionality of genes. But due to the large number of genes and the complexity of biological networks, it is difficult to study the resulting mass of data, which often consists of millions of measurements. Clustering techniques are applied to reveal natural structures and to identify interesting patterns in a given gene expression data set. Semi-supervised classification is a new direction in machine learning. It uses a large amount of unlabeled data together with a small amount of labeled data, and in general it performs better than unsupervised classification. But to the best of our knowledge there is no existing work that solves the gene expression data clustering problem using semi-supervised classification techniques. In the current paper we attempt to solve the gene expression data clustering problem using a multiobjective optimization based semi-supervised classification technique, with the aim of attaining good-quality partitions from a small amount of labeled data. The labeled data are generated by first applying the Fuzzy C-means clustering technique. To determine the partitioning automatically, multiple cluster centers corresponding to a cluster are encoded in the form of a string. The quality of the obtained partitioning is measured by the values of five objective functions. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparison with existing techniques for gene expression data clustering indicates that the proposed method is the most effective of those compared. Statistical and biological significance tests have also been carried out.
International Conference on Hybrid Intelligent Systems | 2012
Abhay Kumar Alok; Sriparna Saha; Asif Ekbal
Evaluating a given clustering result is a very difficult problem in the real world. Cluster validity indices are developed for this purpose. There are two types of cluster validity indices: external and internal. External cluster validity indices use some supervised information, and internal cluster validity indices use the intrinsic structure of the data. In this paper a new external cluster validity index, MMI, has been implemented based on the max-min distance among data points and prior information based on the structure of the data. A new probabilistic approach has been implemented to find the correct correspondence between the true and the obtained clustering. The genetic K-means algorithm (GAK-means) and single linkage are used as the underlying partitioning techniques, with the number of clusters varied over a range. The MMI index is then used to determine the appropriate number of clusters. Results of the proposed index in identifying the appropriate number of clusters are shown for five artificial and two real-life data sets. The performance of MMI is compared with that of existing external cluster validity indices, the adjusted Rand index (ARI) and the Rand index (RI). It works well for two-class and multi-class data sets.
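The two baseline external indices mentioned here, RI and ARI, have standard pair-counting definitions that can be sketched briefly. The MMI index itself is not defined in the abstract, so it is not reproduced. The function name `rand_indices` is hypothetical; the formulas are the usual contingency-table forms of the Rand index and the Hubert-Arabie adjusted Rand index.

```python
from collections import Counter
from math import comb


def rand_indices(true_labels, pred_labels):
    """Return (Rand index, adjusted Rand index) for two labellings of the same points."""
    n = len(true_labels)
    if n != len(pred_labels) or n < 2:
        raise ValueError("need two labellings of the same length, at least 2 points")
    pairs = comb(n, 2)
    # Pairs placed together in both labellings, in the reference, and in the clustering.
    together_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    together_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    # Agreements = pairs together in both + pairs apart in both.
    ri = (pairs + 2 * together_both - together_true - together_pred) / pairs
    expected = together_true * together_pred / pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        # Only happens when both labellings are the same trivial partition.
        return ri, 1.0
    return ri, (together_both - expected) / (maximum - expected)


ri, ari = rand_indices([0, 0, 1, 1], [0, 1, 0, 1])
print(ri, ari)
```

Unlike RI, the adjusted index is corrected for chance agreement, so it can be negative when a clustering agrees with the reference less than a random labelling would.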
IEEE Journal of Biomedical and Health Informatics | 2016
Sriparna Saha; Abhay Kumar Alok; Asif Ekbal
Studying the patterns hidden in gene-expression data helps to understand the functionality of genes. In general, clustering techniques are widely used to identify natural partitionings in gene-expression data. Feature selection is a key issue in constraining dimensionality, because not all features are important from a clustering point of view. Moreover, a limited amount of supervised information can help to fine-tune the obtained clustering solution. In this paper, the problem of simultaneous feature selection and semisupervised clustering is formulated as a multiobjective optimization (MOO) task. A modern simulated annealing-based MOO technique, AMOSA, is used as the underlying optimization methodology. Here, features and cluster centers are represented in the form of a string, and genes are assigned to clusters using a point symmetry-based distance. Six optimization criteria based on several internal and external cluster validity indices are used. The supervised information is generated using a popular clustering technique, Fuzzy C-means. The appropriate subset of features, the proper number of clusters and the proper partitioning are determined using the search capability of AMOSA. The effectiveness of the proposed semisupervised clustering technique, Semi-FeaClustMOO, is demonstrated on five publicly available benchmark gene-expression datasets. Comparison with existing techniques for gene-expression data clustering again reveals the superiority of the proposed technique. Statistical and biological significance tests have also been carried out.
SpringerPlus | 2014
Sriparna Saha; Asif Ekbal; Abhay Kumar Alok; Rachamadugu Spandana
In this paper we have coupled the feature selection problem with semi-supervised clustering. Semi-supervised clustering uses information from both unsupervised and supervised learning in order to overcome the problems related to each. But in general not all the features present in a data set are important for clustering. Thus the appropriate selection of features from the set of all features is highly relevant from a clustering point of view. In this paper we have solved the problem of automatic feature selection and semi-supervised clustering using multiobjective optimization. A recently developed simulated annealing based multiobjective optimization technique, archived multiobjective simulated annealing (AMOSA), is used as the underlying optimization technique. Here features and cluster centers are encoded in the form of a string. We assume that for each data set the class label information of 10% of the data points is known. Four objectives are optimized using the search capability of AMOSA: two internal cluster validity indices reflecting different data properties, an external cluster validity index measuring the similarity between the obtained partitioning and the true labelling of the 10% data points, and a measure counting the number of features present in a particular string. AMOSA is used to detect the appropriate subset of features, the appropriate number of clusters and the appropriate partitioning for any given data set. The effectiveness of the proposed semi-supervised feature selection technique compared with existing techniques is shown for seven real-life data sets of varying complexity.
International Conference on Industrial and Information Systems | 2014
Abhay Kumar Alok; Sriparna Saha; Asif Ekbal
Classifying the pixels of satellite images into homogeneous regions is a very challenging task, as different regions have different types of land cover. Some land covers span large regions, while others span relatively small regions (e.g., bridges, roads). In satellite image segmentation, no prior information is available about the number of clusters. In this paper, we have solved this problem using the concepts of semi-supervised clustering, which combines the properties of unsupervised and supervised classification. Three cluster validity indices are used, which are simultaneously optimized using AMOSA, a modern multiobjective optimization technique based on the concepts of simulated annealing. The first two cluster validity indices are the symmetry distance based Sym-index and the Euclidean distance based I-index, which are based on unsupervised properties. The last one is a supervised information based cluster validity index, the Minkowski index. To obtain the supervised information, the Fuzzy C-means clustering technique is applied first. Thereafter, based on the highest membership values of the data points with respect to the different clusters, 10% of the data points are chosen at random together with their class labels. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on one Indian satellite image data set.
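The label-generation step described above can be sketched as follows. The abstract does not give the exact procedure, so this is one plausible reading: each point takes the label of the cluster in which it has the highest fuzzy membership, and 10% of the points are then drawn uniformly at random. The membership matrix is assumed to come from a Fuzzy C-means run that is not shown, and the function name `pick_labeled_subset` and its parameters are hypothetical.

```python
import random


def pick_labeled_subset(memberships, fraction=0.10, seed=0):
    """Map a random subset of point indices to their highest-membership cluster.

    memberships: one row per data point, one fuzzy membership value per cluster
    (assumed to be the output of a Fuzzy C-means run).
    """
    if not memberships:
        raise ValueError("membership matrix is empty")
    rng = random.Random(seed)
    # Hard label = index of the cluster with the largest membership value.
    labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
    k = max(1, round(fraction * len(memberships)))
    chosen = sorted(rng.sample(range(len(memberships)), k))
    return {i: labels[i] for i in chosen}


# 20 points, 2 clusters: the first 10 lean to cluster 0, the rest to cluster 1.
memberships = [[0.9, 0.1]] * 10 + [[0.2, 0.8]] * 10
subset = pick_labeled_subset(memberships)
print(len(subset))  # 2
```

The returned mapping plays the role of the known class labels against which an external index such as the Minkowski index is computed.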
International Conference on Industrial and Information Systems | 2014
Abhay Kumar Alok; Neha Kanekar; Sriparna Saha; Asif Ekbal
In this paper, the problem of simultaneous feature selection and automatic clustering is formulated as a multiobjective optimization task. Studying the patterns hidden in gene expression data helps to understand the functionality of genes. But due to the large number of genes and the highly complex biological networks, sophisticated techniques are required to study the available data, which consist of a large number of measurements. In general, clustering techniques are used as a first step in studying gene expression data, to identify natural partitionings and detect interesting patterns. But not all the features present in a data set are important for clustering. Thus the appropriate selection of features from the set of all features is highly relevant from a clustering point of view. A modern simulated annealing based multiobjective optimization technique, AMOSA, is used as the underlying optimization methodology. Here features and cluster centers are represented in the form of a string. Three optimization criteria are used: i) a function representing the total compactness of the partitioning based on the Euclidean distance, ii) a function representing the total compactness of the partitioning based on the point symmetry based distance and iii) a function counting the number of features. The objective is to optimize the values of the cluster validity indices while increasing the number of features, in order to remove the bias of internal cluster validity indices towards dimensionality. The appropriate subset of features, the proper number of clusters and the proper partitioning are determined using the search capability of AMOSA. A recently introduced distance, the point symmetry based distance, is used to assign cluster labels to all points. The effectiveness of the proposed Fea-GenClustMOO technique is shown for automatically clustering publicly available gene-expression data sets. Results are compared with existing techniques for gene expression data clustering.
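The idea of representing features and cluster centers together in one string can be illustrated with a small decoder. The abstract does not specify the layout, so the one used here is an assumption: the first `n_features` entries are 0/1 flags marking selected features, followed by the coordinates of each cluster center in full feature space. The function name `decode` is hypothetical.

```python
def decode(string, n_features):
    """Split an encoded solution into selected feature indices and projected centers.

    Assumed layout: [feature flags (n_features entries)] + [center_1 ... center_K],
    each center having n_features coordinates.
    """
    mask, flat = string[:n_features], string[n_features:]
    if len(mask) != n_features or not flat or len(flat) % n_features != 0:
        raise ValueError("string length does not match the assumed layout")
    selected = [i for i, flag in enumerate(mask) if flag]
    centers = [flat[s:s + n_features] for s in range(0, len(flat), n_features)]
    # Keep only the coordinates of the selected features in each center.
    projected = [[center[i] for i in selected] for center in centers]
    return selected, projected


# 3 features, of which features 0 and 2 are selected, and 2 cluster centers.
string = [1, 0, 1, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
selected, centers = decode(string, n_features=3)
print(selected, centers)  # [0, 2] [[0.1, 0.3], [0.7, 0.9]]
print(len(selected))      # feature-count objective for this string: 2
```

Because the number of centers is inferred from the string length, strings of different lengths can encode different numbers of clusters, and `len(selected)` gives the feature-count criterion directly.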
International Conference on Signal Processing | 2015
Abhay Kumar Alok; Sriparna Saha; Asif Ekbal; Neha Kanekar
In this paper, a new multiobjective optimization based technique is developed for simultaneous feature selection and semi-supervised clustering. The proposed technique is then applied to the problem of classifying gene expression data. A modern simulated annealing based multiobjective optimization technique, AMOSA, is used as the underlying optimization methodology. Features and cluster centers are represented in the form of a string. Based on the available features and the cluster centers, genes are assigned to clusters using the point symmetry based distance. Four objective functions are simultaneously optimized by AMOSA to obtain the appropriate partitioning. The first two are cluster validity indices based on unsupervised properties: the symmetry distance based Sym-index and the Euclidean distance based XB-index. The third is a supervised information based cluster validity index, the Minkowski index, and the last is a function counting the number of features. To generate the supervised information, the Fuzzy C-means clustering technique is first applied to the given gene expression data set. Thereafter, based on the highest membership values of the data points in their respective clusters, 10% of the data points are chosen at random together with their class labels for measuring the external validity index (the Minkowski index). The proposed technique is applied to some publicly available gene-expression data sets. Results are compared with existing techniques for gene expression data clustering.