Is this you? Create Your Porfile

Sriparna Saha

Indian Institute of Technology Patna

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sriparna Saha is active.

Explore More

Publication

Featured researches published by Sriparna Saha.

IEEE Transactions on Evolutionary Computation | 2008

A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA

Sanghamitra Bandyopadhyay; Sriparna Saha; Ujjwal Maulik; Kalyanmoy Deb

This paper describes a simulated annealing based multiobjective optimization algorithm that incorporates the concept of archive in order to provide a set of tradeoff solutions for the problem under consideration. To determine the acceptance probability of a new solution vis-a-vis the current solution, an elaborate procedure is followed that takes into account the domination status of the new solution with the current solution, as well as those in the archive. A measure of the amount of domination between two solutions is also used for this purpose. A complexity analysis of the proposed algorithm is provided. An extensive comparative study of the proposed algorithm with two other existing and well-known multiobjective evolutionary algorithms (MOEAs) demonstrate the effectiveness of the former with respect to five existing performance measures, and several test problems of varying degrees of difficulty. In particular, the proposed algorithm is found to be significantly superior for many objective test problems (e.g., 4, 5, 10, and 15 objective problems), while recent studies have indicated that the Pareto ranking-based MOEAs perform poorly for such problems. In a part of the investigation, comparison of the real-coded version of the proposed algorithm is conducted with a very recent multiobjective simulated annealing algorithm, where the performance of the former is found to be generally superior to that of the latter.

Pattern Recognition | 2007

GAPS: A clustering method using a new point symmetry-based distance measure

Sanghamitra Bandyopadhyay; Sriparna Saha

In this paper, an evolutionary clustering technique is described that uses a new point symmetry-based distance measure. The algorithm is therefore able to detect both convex and non-convex clusters. Kd-tree based nearest neighbor search is used to reduce the complexity of finding the closest symmetric point. Adaptive mutation and crossover probabilities are used. The proposed GA with point symmetry (GAPS) distance based clustering algorithm is able to detect any type of clusters, irrespective of their geometrical shape and overlapping nature, as long as they possess the characteristic of symmetry. GAPS is compared with existing symmetry-based clustering technique SBKM, its modified version, and the well-known K-means algorithm. Sixteen data sets with widely varying characteristics are used to demonstrate its superiority. For real-life data sets, ANOVA and MANOVA statistical analyses are performed.

Pattern Recognition | 2010

A symmetry based multiobjective clustering technique for automatic evolution of clusters

Sriparna Saha; Sanghamitra Bandyopadhyay

In this paper the problem of automatic clustering a data set is posed as solving a multiobjective optimization (MOO) problem, optimizing a set of cluster validity indices simultaneously. The proposed multiobjective clustering technique utilizes a recently developed simulated annealing based multiobjective optimization method as the underlying optimization strategy. Here variable number of cluster centers is encoded in the string. The number of clusters present in different strings varies over a range. The points are assigned to different clusters based on the newly developed point symmetry based distance rather than the existing Euclidean distance. Two cluster validity indices, one based on the Euclidean distance, XB-index, and another recently developed point symmetry distance based cluster validity index, Sym-index, are optimized simultaneously in order to determine the appropriate number of clusters present in a data set. Thus the proposed clustering technique is able to detect both the proper number of clusters and the appropriate partitioning from data sets either having hyperspherical clusters or having point symmetric clusters. A new semi-supervised method is also proposed in the present paper to select a single solution from the final Pareto optimal front of the proposed multiobjective clustering technique. The efficacy of the proposed algorithm is shown for seven artificial data sets and six real-life data sets of varying complexities. Results are also compared with those obtained by another multiobjective clustering technique, MOCK, two single objective genetic algorithm based automatic clustering techniques, VGAPS clustering and GCUK clustering.

data and knowledge engineering | 2013

Combining multiple classifiers using vote based classifier ensemble technique for named entity recognition

Sriparna Saha; Asif Ekbal

In this paper, we pose the classifier ensemble problem under single and multiobjective optimization frameworks, and evaluate it for Named Entity Recognition (NER), an important step in almost all Natural Language Processing (NLP) application areas. We propose the solutions to two different versions of the ensemble problem for each of the optimization frameworks. We hypothesize that the reliability of predictions of each classifier differs among the various output classes. Thus, in an ensemble system it is necessary to find out either the eligible classes for which a classifier is most suitable to vote (i.e., binary vote based ensemble) or to quantify the amount of voting for each class in a particular classifier (i.e., real vote based ensemble). We use seven diverse classifiers, namely Naive Bayes, Decision Tree (DT), Memory Based Learner (MBL), Hidden Markov Model (HMM), Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) to build a number of models depending upon the various representations of the available features that are identified and selected mostly without using any domain knowledge and/or language specific resources. The proposed technique is evaluated for three resource-constrained languages, namely Bengali, Hindi and Telugu. Results using multiobjective optimization (MOO) based technique yield the overall recall, precision and F-measure values of 94.21%, 94.72% and 94.74%, respectively for Bengali, 99.07%, 90.63% and 94.66%, respectively for Hindi and 82.79%, 95.18% and 88.55%, respectively for Telugu. Results for all the languages show that the proposed MOO based classifier ensemble with real voting attains the performance level which is superior to all the individual classifiers, three baseline ensembles and the corresponding single objective based ensemble.

IEEE Geoscience and Remote Sensing Letters | 2008

Application of a New Symmetry-Based Cluster Validity Index for Satellite Image Segmentation

Sriparna Saha; Sanghamitra Bandyopadhyay

An important approach for image segmentation is clustering pixels based on their spectral properties. In particular, satellite images contain land cover types, some of which cover significantly large areas, while some (e.g., bridges and roads) occupy relatively much smaller regions. Automatically detecting regions or clusters of such widely varying sizes presents a challenging task. In this letter, a symmetry-based cluster validity index, named Sym-index (Symmetry distance-based index), is proposed. It is able to correctly indicate the presence of clusters of different sizes as long as they are internally symmetrical. A genetic-algorithm-based clustering technique that optimizes the Sym-index is used for image segmentation where the number of clusters is determined automatically. The superiority of the proposed index, as compared to other indices, is established for automatically segmenting the land cover types from SPOT and Indian Remote Sensing satellite images of two different cities in India.

Applied Soft Computing | 2013

A generalized automatic clustering algorithm in a multiobjective framework

Sriparna Saha; Sanghamitra Bandyopadhyay

In this paper a new multiobjective (MO) clustering technique (GenClustMOO) is proposed which can automatically partition the data into an appropriate number of clusters. Each cluster is divided into several small hyperspherical subclusters and the centers of all these small sub-clusters are encoded in a string to represent the whole clustering. For assigning points to different clusters, these local sub-clusters are considered individually. For the purpose of objective function evaluation, these sub-clusters are merged appropriately to form a variable number of global clusters. Three objective functions, one reflecting the total compactness of the partitioning based on the Euclidean distance, the other reflecting the total symmetry of the clusters, and the last reflecting the cluster connectedness, are considered here. These are optimized simultaneously using AMOSA, a newly developed simulated annealing based multiobjective optimization method, in order to detect the appropriate number of clusters as well as the appropriate partitioning. The symmetry present in a partitioning is measured using a newly developed point symmetry based distance. Connectedness present in a partitioning is measured using the relative neighborhood graph concept. Since AMOSA, as well as any other MO optimization technique, provides a set of Pareto-optimal solutions, a new method is also developed to determine a single solution from this set. Thus the proposed GenClustMOO is able to detect the appropriate number of clusters and the appropriate partitioning from data sets having either well-separated clusters of any shape or symmetrical clusters with or without overlaps. The effectiveness of the proposed GenClustMOO in comparison with another recent multiobjective clustering technique (MOCK), a single objective genetic algorithm based automatic clustering technique (VGAPS-clustering), K-means and single linkage clustering techniques is comprehensively demonstrated for nineteen artificial and seven real-life data sets of varying complexities. In a part of the experiment the effectiveness of AMOSA as the underlying optimization technique in GenClustMOO is also demonstrated in comparison to another evolutionary MO algorithm, PESA2.

Information Sciences | 2009

A new point symmetry based fuzzy genetic clustering technique for automatic evolution of clusters

Sriparna Saha; Sanghamitra Bandyopadhyay

In this paper a fuzzy point symmetry based genetic clustering technique (Fuzzy-VGAPS) is proposed which can automatically determine the number of clusters present in a data set as well as a good fuzzy partitioning of the data. The clusters can be of any size, shape or convexity as long as they possess the property of symmetry. Here the membership values of points to different clusters are computed using the newly proposed point symmetry based distance. A variable number of cluster centers are encoded in the chromosomes. A new fuzzy symmetry based cluster validity index, FSym-index is first proposed here and thereafter it is utilized to measure the fitness of the chromosomes. The proposed index can detect non-convex, as well as convex-non-hyperspherical partitioning with variable number of clusters. It is mathematically justified via its relationship to a well-defined hard cluster validity function: the Dunns index, for which the condition of uniqueness has already been established. The results of the Fuzzy-VGAPS are compared with those obtained by seven other algorithms including both fuzzy and crisp methods on four artificial and four real-life data sets. Some real-life applications of Fuzzy-VGAPS to automatically cluster the gene expression data as well as segmenting the magnetic resonance brain image with multiple sclerosis lesions are also demonstrated.

Knowledge and Information Systems | 2010

A new multiobjective clustering technique based on the concepts of stability and symmetry

Sriparna Saha; Sanghamitra Bandyopadhyay

Most clustering algorithms operate by optimizing (either implicitly or explicitly) a single measure of cluster solution quality. Such methods may perform well on some data sets but lack robustness with respect to variations in cluster shape, proximity, evenness and so forth. In this paper, we have proposed a multiobjective clustering technique which optimizes simultaneously two objectives, one reflecting the total cluster symmetry and the other reflecting the stability of the obtained partitions over different bootstrap samples of the data set. The proposed algorithm uses a recently developed simulated annealing-based multiobjective optimization technique, named AMOSA, as the underlying optimization strategy. Here, points are assigned to different clusters based on a newly defined point symmetry-based distance rather than the Euclidean distance. Results on several artificial and real-life data sets in comparison with another multiobjective clustering technique, MOCK, three single objective genetic algorithm-based automatic clustering techniques, VGAPS clustering, GCUK clustering and HNGA clustering, and several hybrid methods of determining the appropriate number of clusters from data sets show that the proposed technique is well suited to detect automatically the appropriate number of clusters as well as the appropriate partitioning from data sets having point symmetric clusters. The performance of AMOSA as the underlying optimization technique in the proposed clustering algorithm is also compared with PESA-II, another evolutionary multiobjective optimization technique.

Applied Soft Computing | 2012

Some connectivity based cluster validity indices

Sriparna Saha; Sanghamitra Bandyopadhyay

Identification of the correct number of clusters and the appropriate partitioning technique are some important considerations in clustering where several cluster validity indices, primarily utilizing the Euclidean distance, have been used in the literature. In this paper a new measure of connectivity is incorporated in the definitions of seven cluster validity indices namely, DB-index, Dunn-index, Generalized Dunn-index, PS-index, I-index, XB-index and SV-index, thereby yielding seven new cluster validity indices which are able to automatically detect clusters of any shape, size or convexity as long as they are well-separated. Here connectivity is measured using a novel approach following the concept of relative neighborhood graph. It is empirically established that incorporation of the property of connectivity significantly improves the capabilities of these indices in identifying the appropriate number of clusters. The well-known clustering techniques, single linkage clustering technique and K-means clustering technique are used as the underlying partitioning algorithms. Results on eight artificially generated and three real-life data sets show that connectivity based Dunn-index performs the best as compared to all the other six indices. Comparisons are made with the original versions of these seven cluster validity indices.

ACM Transactions on Asian Language Information Processing | 2011

Weighted Vote-Based Classifier Ensemble for Named Entity Recognition: A Genetic Algorithm-Based Approach

Asif Ekbal; Sriparna Saha

In this article, we report the search capability of Genetic Algorithm (GA) to construct a weighted vote-based classifier ensemble for Named Entity Recognition (NER). Our underlying assumption is that the reliability of predictions of each classifier differs among the various named entity (NE) classes. Thus, it is necessary to quantify the amount of voting of a particular classifier for a particular output class. Here, an attempt is made to determine the appropriate weights of voting for each class in each classifier using GA. The proposed technique is evaluated for four leading Indian languages, namely Bengali, Hindi, Telugu, and Oriya, which are all resource-poor in nature. Evaluation results yield the recall, precision and F-measure values of 92.08%, 92.22%, and 92.15%, respectively for Bengali; 96.07%, 88.63%, and 92.20%, respectively for Hindi; 78.82%, 91.26%, and 84.59%, respectively for Telugu; and 88.56%, 89.98%, and 89.26%, respectively for Oriya. Finally, we evaluate our proposed approach with the benchmark dataset of CoNLL-2003 shared task that yields the overall recall, precision, and F-measure values of 88.72%, 88.64%, and 88.68%, respectively. Results also show that the vote based classifier ensemble identified by the GA-based approach outperforms all the individual classifiers, three conventional baseline ensembles, and some other existing ensemble techniques. In a part of the article, we formulate the problem of feature selection in any classifier under the single objective optimization framework and show that our proposed classifier ensemble attains superior performance to it.

Explore More