Pabitra Mitra

Indian Institute of Technology Kharagpur

Publication

Featured research published by Pabitra Mitra.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2002

Unsupervised feature selection using feature similarity

Pabitra Mitra; C. A. Murthy; Sankar K. Pal

In this article, we describe an unsupervised feature selection algorithm suitable for data sets, large in both dimension and size. The method is based on measuring similarity between features whereby redundancy therein is removed. This does not need any search and, therefore, is fast. A new feature similarity measure, called maximum information compression index, is introduced. The algorithm is generic in nature and has the capability of multiscale representation of data sets. The superiority of the algorithm, in terms of speed and performance, is established extensively over various real-life data sets of different sizes and dimensions. It is also demonstrated how redundancy and information loss in feature selection can be quantified with an entropy measure.
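The abstract names the maximum information compression index but does not define it. The following is a minimal sketch, assuming the index for a pair of features is the smallest eigenvalue of their 2x2 covariance matrix; the function names are illustrative and not taken from the paper.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _cov(xs, ys):
    # Population covariance of two equally long sequences.
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def mici(xs, ys):
    """Smallest eigenvalue of the covariance matrix of features xs and ys.

    Assumed definition: the value is 0 when the two features are linearly
    dependent and grows as they become less redundant.
    """
    vx, vy, cxy = _cov(xs, xs), _cov(ys, ys), _cov(xs, ys)
    trace = vx + vy
    det = vx * vy - cxy ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    disc = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace - math.sqrt(disc))


redundant = mici([1, 2, 3, 4], [2, 4, 6, 8])        # second feature = 2 * first
independent = mici([1, -1, 1, -1], [1, 1, -1, -1])  # uncorrelated features
```

Under this assumption, a pair of features with a near-zero index carries almost the same information, so one of the two can be dropped without any search over feature subsets.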

IEEE Transactions on Neural Networks | 2002

Data mining in soft computing framework: a survey

Sankar K. Pal; Pabitra Mitra

The present article provides a survey of the available literature on data mining using soft computing. A categorization has been provided based on the different soft computing tools and their hybridizations used, the data mining function implemented, and the preference criterion selected by the model. The utility of the different soft computing methodologies is highlighted. Generally, fuzzy sets are suitable for handling the issues related to understandability of patterns, incomplete/noisy data, mixed media information and human interaction, and can provide approximate solutions faster. Neural networks are nonparametric, robust, and exhibit good learning and generalization capabilities in data-rich environments. Genetic algorithms provide efficient search algorithms to select a model, from mixed media data, based on some preference criterion/objective function. Rough sets are suitable for handling different types of uncertainty in data. Some challenges to data mining and the application of soft computing methodologies are indicated. An extensive bibliography is also included.

Pattern Recognition Letters | 2004

Segmentation of multispectral remote sensing images using active support vector machines

Pabitra Mitra; B. Uma Shankar; Sankar K. Pal

The problem of scarcity of labeled pixels, required for segmentation of remotely sensed satellite images in a supervised pixel classification framework, is addressed in this article. A support vector machine (SVM) is considered for classifying the pixels into different landcover types. It is initially designed using a small set of labeled points, and subsequently refined by actively querying for the labels of pixels from a pool of unlabeled data. The label of the most interesting/ambiguous unlabeled point is queried at each step. Here, active learning is exploited to reduce the number of labeled data used by the SVM classifier by several orders of magnitude. These features are demonstrated on an IRS-1A four-band multispectral image. Comparison with related methods is made in terms of number of data points used, computational time and a cluster quality measure.
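The query step described above can be sketched as follows. This is only an illustration: it assumes a linear decision function and treats "most ambiguous" as "closest to the current separating hyperplane", which the abstract does not state explicitly. The names `decision_value` and `most_ambiguous` are hypothetical.

```python
def decision_value(w, b, x):
    """Signed score of point x under the linear decision function w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def most_ambiguous(w, b, pool):
    """Index of the unlabeled point whose score is closest to zero.

    Assumption: ambiguity is measured by |w.x + b|; a real SVM would
    use its own (possibly kernelized) decision function here.
    """
    return min(range(len(pool)), key=lambda i: abs(decision_value(w, b, pool[i])))


# Toy pool of unlabeled two-band "pixels" and a hypothetical current classifier.
pool = [[3.0, 0.0], [1.0, 0.9], [-2.0, 2.0]]
query_index = most_ambiguous([1.0, -1.0], 0.0, pool)
```

In an active learning loop, the label of `pool[query_index]` would be requested, the point moved to the training set, and the classifier retrained before the next query.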

IEEE Transactions on Knowledge and Data Engineering | 2004

Case generation using rough sets with fuzzy representation

Sankar K. Pal; Pabitra Mitra

We propose a rough-fuzzy hybridization scheme for case generation. Fuzzy set theory is used for linguistic representation of patterns, thereby producing a fuzzy granulation of the feature space. Rough set theory is used to obtain dependency rules which model informative regions in the granulated feature space. The fuzzy membership functions corresponding to the informative regions are stored as cases along with the strength values. Case retrieval is made using a similarity measure based on these membership functions. Unlike the existing case selection methods, the cases here are cluster granules and not sample points. Also, each case involves a reduced number of relevant features. This makes the algorithm suitable for mining data sets that are large in both dimension and size, due to its low time requirement in case generation as well as retrieval. Superiority of the algorithm in terms of classification accuracy and case generation and retrieval times is demonstrated on some real-life data sets.

International World Wide Web Conferences | 2008

Feature weighting in content based recommendation system using social network analysis

Souvik Debnath; Niloy Ganguly; Pabitra Mitra

We propose a hybridization of collaborative filtering and content based recommendation system. Attributes used for content based recommendations are assigned weights depending on their importance to users. The weight values are estimated from a set of linear regression equations obtained from a social network graph which captures human judgment about similarity of items.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2002

Density-based multiscale data condensation

Pabitra Mitra; C. A. Murthy; Sankar K. Pal

A problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion, which distinguishes it from existing density-based approaches. The accuracy of representation by the condensed set is measured in terms of the error in density estimates of the original and reduced sets. Experimental studies on several real-life data sets show that the multiscale approach is superior to several related condensation methods both in terms of condensation ratio and estimation error. The condensed set obtained was also experimentally shown to be effective for some important data mining tasks like classification, clustering, and rule generation on large data sets. Moreover, it is empirically found that the algorithm is efficient in terms of sample complexity.
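The abstract does not spell out the selection procedure. The sketch below is one plausible reading, assuming a k-nearest-neighbor formulation: compute each point's distance to its k-th nearest neighbor, repeatedly keep the point in the densest region (smallest such distance), and discard everything within twice that distance of it. The function name `condense` and the tie-breaking rule are assumptions made for illustration.

```python
import math


def condense(points, k):
    """Return a density-aware representative subset of `points`.

    Assumed scheme: dense regions yield small discs (many representatives
    per unit area), sparse regions yield large discs (few representatives).
    """
    n = len(points)
    radius = []
    for i in range(n):
        dists = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        # Distance to the k-th nearest neighbor (or the farthest point if fewer exist).
        radius.append(dists[min(k, len(dists)) - 1] if dists else 0.0)

    remaining = set(range(n))
    selected = []
    while remaining:
        # Pick the remaining point with the smallest k-NN radius; ties go to the lower index.
        i = min(remaining, key=lambda idx: (radius[idx], idx))
        selected.append(points[i])
        remaining = {
            j for j in remaining
            if j != i and math.dist(points[i], points[j]) > 2 * radius[i]
        }
    return selected


data = [(0, 0), (1, 0), (0, 1), (50, 50), (51, 50), (50, 51), (100, 0)]
reduced = condense(data, k=2)
```

In this reading, the parameter `k` controls the scale of the representation: larger values give larger discs and therefore a smaller condensed set.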

ACM Transactions on Information Systems | 2007

YASS: Yet another suffix stripper

Prasenjit Majumder; Mandar Mitra; Swapan K. Parui; Gobinda Kole; Pabitra Mitra; Kalyankumar Datta

Stemmers attempt to reduce a word to its stem or root form and are used widely in information retrieval tasks to increase the recall rate. Most popular stemmers encode a large number of language-specific rules built over a length of time. Such stemmers with comprehensive rules are available only for a few languages. In the absence of extensive linguistic resources for certain languages, statistical language processing tools have been successfully used to improve the performance of IR systems. In this article, we describe a clustering-based approach to discover equivalence classes of root words and their morphological variants. A set of string distance measures is defined, and the lexicon for a given text collection is clustered using the distance measures to identify these equivalence classes. The proposed approach is compared with the Porter and Lovins stemmers on the AP and WSJ subcollections of the Tipster dataset using 200 queries. Its performance is comparable to that of the Porter and Lovins stemmers, both in terms of average precision and the total number of relevant documents retrieved. The proposed stemming algorithm also provides consistent improvements in retrieval performance for French and Bengali, which are currently resource-poor.
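The abstract does not give the string distance measures themselves. As a rough illustration of the idea, the sketch below uses a hypothetical distance that penalizes early character mismatches more heavily than late ones, so that words sharing a long prefix end up close together. It is not claimed to be one of the paper's measures.

```python
def prefix_weighted_distance(a, b):
    """Hypothetical distance: a mismatch at position i costs 1 / 2**i.

    The shorter word is padded with a null character so that length
    differences count as mismatches at the trailing positions.
    """
    n = max(len(a), len(b))
    a, b = a.ljust(n, "\0"), b.ljust(n, "\0")
    return sum(1 / 2 ** i for i in range(n) if a[i] != b[i])


same_root = prefix_weighted_distance("astronomer", "astronomically")
different_root = prefix_weighted_distance("astronomer", "astonish")
```

With a measure of this kind, a standard clustering algorithm over the lexicon would tend to group morphological variants of one root, and each cluster could then be mapped to a single stem.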

IEEE Transactions on Geoscience and Remote Sensing | 2002

Multispectral image segmentation using the rough-set-initialized EM algorithm

Sankar K. Pal; Pabitra Mitra

The problem of segmentation of multispectral satellite images is addressed. An integration of rough-set-theoretic knowledge extraction, the Expectation Maximization (EM) algorithm, and minimal spanning tree (MST) clustering is described. EM provides the statistical model of the data and handles the associated measurement and representation uncertainties. Rough-set theory helps in faster convergence and in avoiding the local minima problem, thereby enhancing the performance of EM. For rough-set-theoretic rule generation, each band is discretized using fuzzy-correlation-based gray-level thresholding. MST enables determination of nonconvex clusters. Since this is applied on Gaussians, determined by granules, rather than on the original data points, time required is very low. These features are demonstrated on two IRS-1A four-band images. Comparison with related methods is made in terms of computation time and a cluster quality measure.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2004

A probabilistic active support vector learning algorithm

Pabitra Mitra; C. A. Murthy; Sankar K. Pal

The paper describes a probabilistic active learning strategy for support vector machine (SVM) design in large data applications. The learning strategy is motivated by the statistical query model. While most existing methods of active SVM learning query for points based on their proximity to the current separating hyperplane, the proposed method queries for a set of points according to a distribution determined by the current separating hyperplane and a newly defined concept of an adaptive confidence factor. This enables the algorithm to have more robust and efficient learning capabilities. The confidence factor is estimated from local information using the k nearest neighbor principle. The effectiveness of the method is demonstrated on real-life data sets in terms of generalization performance, query complexity, and training time.

International Conference on Pattern Recognition | 2000

Data condensation in large databases by incremental learning with support vector machines

Pabitra Mitra; C. A. Murthy; Sankar K. Pal

An algorithm for data condensation using support vector machines (SVM) is presented. The algorithm extracts data points lying close to the class boundaries, which form a much reduced but critical set for classification. The problem of large memory requirements for training SVM in batch mode is circumvented by adopting an active incremental learning algorithm. The learning strategy is motivated by the condensed nearest neighbor classification technique. The experimental results show that such active incremental learning is superior to related methods in terms of computation time and condensation ratio.
