Publication


Featured research published by Anil K. Ghosh.


Computational Statistics & Data Analysis | 2006

On optimum choice of k in nearest neighbor classification

Anil K. Ghosh

A major issue in k-nearest neighbor classification is how to choose the optimum value of the neighborhood parameter k. Popular cross-validation techniques often fail to guide us well in selecting k mainly due to the presence of multiple minimizers of the estimated misclassification rate. This article investigates a Bayesian method in this connection, which solves the problem of multiple optimizers. The utility of the proposed method is illustrated using some benchmark data sets.
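The multiple-minimizer problem described above is easy to reproduce. The following sketch (a plain NumPy illustration on synthetic Gaussian data, not the authors' Bayesian method) computes the leave-one-out cross-validated error of the k-nearest neighbor rule over a grid of k and collects every value of k attaining the minimum estimated error:

```python
import numpy as np

def loo_cv_error(X, y, k):
    """Leave-one-out cross-validated misclassification rate of k-NN."""
    n = len(X)
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    errors = 0
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        vote = np.bincount(y[nbrs]).argmax()  # majority vote
        errors += vote != y[i]
    return errors / n

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.repeat([0, 1], 30)
errs = {k: loo_cv_error(X, y, k) for k in range(1, 16)}
best = min(errs.values())
# typically several k attain the same minimum, which is exactly
# the multiple-minimizer issue the article addresses
minimizers = [k for k, e in errs.items() if e == best]
```

Because the estimated error takes only finitely many values, ties among candidate k are common, and cross-validation alone cannot break them.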


Technometrics | 2006

Classification Using Kernel Density Estimates: Multiscale Analysis and Visualization

Anil K. Ghosh; Probal Chaudhuri; Debasis Sengupta

The use of kernel density estimates in discriminant analysis is quite well known among scientists and engineers interested in statistical pattern recognition. Using a kernel density estimate involves properly selecting the scale of smoothing, namely the bandwidth parameter. The bandwidth that is optimum for the mean integrated square error of a class density estimator may not always be good for discriminant analysis, where the main emphasis is on the minimization of misclassification rates. On the other hand, cross-validation–based methods for bandwidth selection, which try to minimize estimated misclassification rates, may require huge computation when there are several competing populations. Besides, such methods usually allow only one bandwidth for each population density estimate, whereas in a classification problem, the optimum bandwidth for a class density estimate may vary significantly, depending on its competing class densities and their prior probabilities. Therefore, in a multiclass problem, it would be more meaningful to have different bandwidths for a class density when it is compared with different competing class densities. Moreover, good choice of bandwidths should also depend on the specific observation to be classified. Consequently, instead of concentrating on a single optimum bandwidth for each population density estimate, it is more useful in practice to look at the results for different scales of smoothing for the kernel density estimates. This article presents such a multiscale approach along with a graphical device leading to a more informative discriminant analysis than the usual approach based on a single optimum scale of smoothing for each class density estimate. When there are more than two competing classes, this method splits the problem into a number of two-class problems, which allows the flexibility of using different bandwidths for different pairs of competing classes and at the same time reduces the computational burden that one faces for usual cross-validation–based bandwidth selection in the presence of several competing populations. We present some benchmark examples to illustrate the usefulness of the proposed methodology.
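The multiscale idea can be illustrated as follows (a minimal NumPy sketch, not the authors' graphical device; the synthetic Gaussian classes, equal priors, and the bandwidth grid are assumptions): classify one observation with Gaussian kernel density estimates at several scales of smoothing and report the predicted class at each bandwidth:

```python
import numpy as np

def gaussian_kde(x, sample, h):
    """Gaussian kernel density estimate at point x with bandwidth h."""
    d = sample.shape[1]
    u = (sample - x) / h
    k = np.exp(-0.5 * (u ** 2).sum(1)) / ((2 * np.pi) ** (d / 2) * h ** d)
    return k.mean()

def classify_multiscale(x, class_samples, bandwidths):
    """Predicted class label at each scale of smoothing (equal priors)."""
    out = {}
    for h in bandwidths:
        dens = [gaussian_kde(x, s, h) for s in class_samples]
        out[h] = int(np.argmax(dens))
    return out

rng = np.random.default_rng(1)
A = rng.normal(0, 1, (50, 2))   # class 0
B = rng.normal(3, 1, (50, 2))   # class 1
labels = classify_multiscale(np.array([0.2, -0.1]), [A, B], [0.2, 0.5, 1.0, 2.0])
```

Agreement of the predicted label across bandwidths, as here, is the kind of stability the article's graphical device is designed to reveal; disagreement would flag sensitivity to the scale of smoothing.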


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005

On visualization and aggregation of nearest neighbor classifiers

Anil K. Ghosh; Probal Chaudhuri; C. A. Murthy

Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. A major issue in k-nearest neighbor classification is how to find an optimal value of the neighborhood parameter k. In practice, this value is generally estimated by the method of cross-validation. However, the ideal value of k in a classification problem not only depends on the entire data set, but also on the specific observation to be classified. Instead of using any single value of k, this paper studies results for a finite sequence of classifiers indexed by k. Along with the usual posterior probability estimates, a new measure, called the Bayesian measure of strength, is proposed and investigated in this paper as a measure of evidence for different classes. The results of these classifiers and their corresponding estimated misclassification probabilities are visually displayed using shaded strips. These plots provide an effective visualization of the evidence in favor of different classes when a given data point is to be classified. We also propose a simple weighted averaging technique that aggregates the results of different nearest neighbor classifiers to arrive at the final decision. Based on the analysis of several benchmark data sets, the proposed method is found to be better than using a single value of k.
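The weighted-averaging idea can be sketched as follows (a simplified NumPy illustration with uniform weights on synthetic data; it does not implement the paper's Bayesian measure of strength): posterior probability estimates from nearest neighbor classifiers with different k are averaged, and the class with the largest aggregated posterior is reported:

```python
import numpy as np

def knn_posterior(x, X, y, k, n_classes):
    """Posterior probability estimates from the k nearest neighbors of x."""
    d2 = ((X - x) ** 2).sum(1)
    nbrs = np.argsort(d2)[:k]
    counts = np.bincount(y[nbrs], minlength=n_classes)
    return counts / k

def aggregated_knn(x, X, y, ks, weights, n_classes=2):
    """Weighted average of k-NN posteriors over a sequence of k values."""
    w = np.asarray(weights, float)
    w = w / w.sum()  # normalize the weights
    p = sum(wi * knn_posterior(x, X, y, k, n_classes) for wi, k in zip(w, ks))
    return int(np.argmax(p)), p

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.repeat([0, 1], 30)
label, probs = aggregated_knn(np.array([4.0, 4.0]), X, y,
                              ks=[1, 3, 5, 7], weights=[1, 1, 1, 1])
```

In practice the weights would reflect the estimated reliability of each classifier (e.g., its cross-validated error) rather than being uniform.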


Journal of Multivariate Analysis | 2014

A nonparametric two-sample test applicable to high dimensional data

Munmun Biswas; Anil K. Ghosh

The multivariate two-sample testing problem has been well investigated in the literature, and several parametric and nonparametric methods are available for it. However, most of these two-sample tests perform poorly for high dimensional data, and many of them are not applicable when the dimension of the data exceeds the sample size. In this article, we propose a multivariate two-sample test that can be conveniently used in the high dimension low sample size setup. Asymptotic results on the power properties of our proposed test are derived when the sample size remains fixed, and the dimension of the data grows to infinity. We investigate the performance of this test on several high-dimensional simulated and real data sets, and demonstrate its superiority over several other existing two-sample tests. We also study some theoretical properties of the proposed test for situations when the dimension of the data remains fixed and the sample size tends to infinity. In such cases, it turns out to be asymptotically distribution-free and consistent under general alternatives.


Bernoulli | 2011

Some intriguing properties of Tukey’s half-space depth

Subhajit Dutta; Anil K. Ghosh; Probal Chaudhuri

For multivariate data, Tukey's half-space depth is one of the most popular depth functions available in the literature. It is conceptually simple and satisfies several desirable properties of depth functions. The Tukey median, the multivariate median associated with the half-space depth, is also a well-known measure of center for multivariate data with several interesting properties. In this article, we derive and investigate some interesting properties of half-space depth and its associated multivariate median. These properties, some of which are counterintuitive, have important statistical consequences in multivariate analysis. We also investigate a natural extension of Tukey's half-space depth and the related median for probability distributions on any Banach space (which may be finite- or infinite-dimensional) and prove some results that demonstrate anomalous behavior of half-space depth in infinite-dimensional spaces.
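The empirical half-space depth itself is simple to approximate (a Monte Carlo sketch over random projection directions; exact algorithms exist in low dimensions, and the direction count here is an arbitrary assumption): the depth of a point x is the minimum, over directions u, of the fraction of data points in the closed half-space {z : u·(z − x) ≥ 0}:

```python
import numpy as np

def halfspace_depth(x, data, n_dirs=2000, seed=0):
    """Monte Carlo approximation of Tukey's half-space depth of x
    with respect to the empirical distribution of `data`."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    u = rng.normal(size=(n_dirs, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # unit directions
    proj = (data - x) @ u.T            # shape (n_points, n_dirs)
    frac = (proj >= 0).mean(axis=0)    # mass on one side, per direction
    return frac.min()

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 2))
deep = halfspace_depth(np.zeros(2), data)               # near the center
shallow = halfspace_depth(np.array([5.0, 5.0]), data)   # an outlying point
```

Points near the center of a symmetric cloud have depth close to 1/2, while outlying points have depth near 0, which is the monotonicity one expects of a depth function.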


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2009

Classification Based on Hybridization of Parametric and Nonparametric Classifiers

Probal Chaudhuri; Anil K. Ghosh; Hannu Oja

Parametric methods of classification assume specific parametric models for competing population densities (e.g., Gaussian population densities can lead to linear and quadratic discriminant analysis) and they work well when these model assumptions are valid. Violation of one or more of these parametric model assumptions often leads to a poor classifier. On the other hand, nonparametric classifiers (e.g., nearest-neighbor and kernel-based classifiers) are more flexible and free from parametric model assumptions. But the statistical instability of these classifiers may lead to poor performance when only a small number of training observations is available. Nonparametric methods, however, do not use any parametric structure of population densities. Therefore, even when one has some additional information about population densities, that important information is not used to modify the nonparametric classification rule. This paper makes an attempt to overcome these limitations of parametric and nonparametric approaches and combines their strengths to develop some hybrid classification methods. We use some simulated examples and benchmark data sets to examine the performance of these hybrid discriminant analysis tools. Asymptotic results on their misclassification rates have been derived under appropriate regularity conditions.


Systems, Man, and Cybernetics | 2006

Multiscale Classification Using Nearest Neighbor Density Estimates

Anil K. Ghosh; Probal Chaudhuri; C. A. Murthy

Density estimates based on k-nearest neighbors have useful applications in nonparametric discriminant analysis. In classification problems, optimal values of k are usually estimated by minimizing the cross-validated misclassification rates. However, these cross-validation techniques allow only one value of k for each population density estimate, while in a classification problem, the optimum value of k for a class may also depend on its competing population densities. Further, it is computationally difficult to minimize the cross-validated error rate when there are several competing populations. Moreover, in addition to depending on the entire training data set, a good choice of k should also depend on the specific observation to be classified. Therefore, instead of using a single value of k for each population density estimate, it is more useful in practice to consider the results for multiple values of k to arrive at the final decision. This paper presents one such approach along with a graphical device, which gives more information about classification results for various choices of k and the related statistical uncertainties present there. The utility of this proposed methodology has been illustrated using some benchmark data sets.


International Journal of Productivity and Performance Management | 2012

Performance appraisal based on a forced distribution system: its drawbacks and remedies

Rachana Chattopadhayay; Anil K. Ghosh

Purpose – Performance appraisal based on a forced distribution system (FDS) is widely used in large corporate sectors around the globe. Though many researchers have pointed out several drawbacks in FDS, due to the absence of any suitable alternative, it has been (and continues to be) adopted by many industries over a long period of time. The purpose of this paper is to point out some serious limitations of this system and propose a simple modification to overcome these limitations.

Design/methodology/approach – FDS determines the relative positions of the employees involved in similar work by comparing them against one another, and based on their performance, the employees receive different grades. Here the authors use Likert's scaling method to convert these grades into numerical scores, then these scores are used to estimate the average performance of each group of employees, which is referred to as the group index. Taking these group indices into consideration, the authors propose a modified perform...


Journal of Multivariate Analysis | 2015

On high dimensional two-sample tests based on nearest neighbors

Pronoy Kanti Mondal; Munmun Biswas; Anil K. Ghosh

In this article, we propose new multivariate two-sample tests based on nearest neighbor type coincidences. While several existing tests for the multivariate two-sample problem perform poorly for high dimensional data, and many of them are not applicable when the dimension exceeds the sample size, the proposed tests can be conveniently used in high dimension low sample size (HDLSS) situations. Unlike the nearest neighbor tests of Schilling (1986) and Henze (1988), under fairly general conditions, these new tests are found to be consistent in the HDLSS asymptotic regime, where the sample size remains fixed and the dimension grows to infinity. Several high dimensional simulated and real data sets are analyzed to compare their empirical performance with some popular two-sample tests available in the literature. We further investigate the behavior of these proposed tests in the classical asymptotic regime, where the dimension of the data remains fixed and the sample size tends to infinity. In such cases, they turn out to be asymptotically distribution-free and consistent under general alternatives.
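The nearest neighbor coincidence idea underlying the Schilling–Henze test, which this paper builds on, can be sketched as follows (a minimal NumPy illustration on synthetic data; the choice k = 3 is an assumption, and the permutation step needed for a p-value is omitted): pool the two samples and compute the proportion of (point, neighbor) pairs in which a point and its k nearest neighbors come from the same sample. For equal sample sizes the null level is roughly 1/2, and values well above it indicate a distributional difference:

```python
import numpy as np

def nn_coincidence_stat(X, Y, k=3):
    """Proportion of k-nearest-neighbor pairs in the pooled sample
    in which a point and its neighbor carry the same sample label."""
    Z = np.vstack([X, Y])
    lab = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    n = len(Z)
    same = 0
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        same += (lab[nbrs] == lab[i]).sum()
    return same / (n * k)

rng = np.random.default_rng(3)
X = rng.normal(0, 1, (40, 3))
Y = rng.normal(3, 1, (40, 3))     # well-separated alternative
t_sep = nn_coincidence_stat(X, Y)
Y0 = rng.normal(0, 1, (40, 3))    # null case: same distribution as X
t_null = nn_coincidence_stat(X, Y0)
```

Under the null, neighbors mix freely across the two samples and the statistic stays near 1/2; under a separated alternative, neighbors come almost entirely from the same sample and the statistic approaches 1.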


Pattern Recognition Letters | 2012

A probabilistic approach for semi-supervised nearest neighbor classification

Anil K. Ghosh

In supervised classification, we learn from a training set of labeled observations to form a decision rule for classifying all unlabeled test cases. But if the training sample is small, one may fail to extract sufficient information from that sample to develop a good classifier. Because of the statistical instability of nonparametric methods, this problem becomes more evident in the case of nonparametric classification. In such cases, if one can also extract useful information from the unlabeled test cases and use it to modify the classification rule, the performance of the resulting classifier can be improved substantially. In this article, we use a probabilistic framework to develop such methods for nearest neighbor classification. The resulting classifiers, called semi-supervised or transductive classifiers, usually perform better than supervised methods, especially when the training sample is small. Some benchmark data sets are analyzed to show the utility of these proposed methods.
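The paper's probabilistic framework is not reproduced here, but the general idea of borrowing strength from unlabeled cases can be illustrated by a simple self-training variant of nearest neighbor classification (a hedged sketch; the confidence threshold, k = 3, and the synthetic data are all assumptions, not the authors' method):

```python
import numpy as np

def nn_predict(x, X, y, k=3):
    """k-NN label for x plus the proportion of the winning vote."""
    d2 = ((X - x) ** 2).sum(1)
    nbrs = np.argsort(d2)[:k]
    counts = np.bincount(y[nbrs], minlength=2)
    return counts.argmax(), counts.max() / k

def self_training_nn(X_lab, y_lab, X_unlab, k=3, threshold=0.9, rounds=5):
    """Iteratively absorb confidently classified unlabeled points into
    the training set, then classify everything with the enlarged set."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = list(range(len(X_unlab)))
    for _ in range(rounds):
        added = []
        for i in pool:
            lbl, conf = nn_predict(X_unlab[i], X, y, k)
            if conf >= threshold:  # only near-unanimous votes are absorbed
                X = np.vstack([X, X_unlab[i]])
                y = np.append(y, lbl)
                added.append(i)
        pool = [i for i in pool if i not in added]
        if not added:
            break
    return np.array([nn_predict(x, X, y, k)[0] for x in X_unlab])

rng = np.random.default_rng(4)
X_lab = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(5, 1, (5, 2))])
y_lab = np.repeat([0, 1], 5)
X_unlab = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
true = np.repeat([0, 1], 20)
pred = self_training_nn(X_lab, y_lab, X_unlab)
accuracy = (pred == true).mean()
```

With only five labeled points per class, the enlarged training set produced by self-training typically classifies the unlabeled block far better than the original ten points would alone, which is the effect the article formalizes probabilistically.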

Collaboration


Dive into Anil K. Ghosh's collaboration.

Top Co-Authors

Probal Chaudhuri (Indian Statistical Institute)
Soham Sarkar (Indian Statistical Institute)
Munmun Biswas (Indian Statistical Institute)
Subhajit Dutta (Indian Statistical Institute)
Smarajit Bose (Indian Statistical Institute)
C. A. Murthy (Indian Statistical Institute)
Debasis Sengupta (Indian Statistical Institute)
Minerva Mukhopadhyay (Indian Statistical Institute)
Partha Sarathi Mandal (Indian Institute of Technology Guwahati)
Pronoy Kanti Mondal (Indian Statistical Institute)