Probal Chaudhuri
Indian Statistical Institute
Publications
Featured research published by Probal Chaudhuri.
Journal of the American Statistical Association | 1999
Probal Chaudhuri; J. S. Marron
Abstract In the use of smoothing methods in data analysis, an important question is which observed features are “really there,” as opposed to being spurious sampling artifacts. An approach is described based on scale-space ideas originally developed in the computer vision literature. Assessment of Significant ZERo crossings of derivatives results in the SiZer map, a graphical device for display of significance of features with respect to both location and scale. Here “scale” means “level of resolution”; that is, “bandwidth.”
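The SiZer idea of testing, at each location and each bandwidth, whether the smoothed trend is significantly increasing or decreasing can be illustrated with a short sketch. The code below is a minimal toy version, not the paper's exact construction: it uses a local-linear slope with Gaussian weights and a crude plus-or-minus two standard-error rule, and all function names are ours.

```python
import math

def local_linear_slope(x, y, x0, h):
    """Weighted least-squares slope of y on x near x0 (Gaussian kernel,
    bandwidth h), with a crude homoskedastic standard error for the slope."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    sigma2 = sum(wi * (yi - ybar - slope * (xi - xbar)) ** 2
                 for wi, xi, yi in zip(w, x, y)) / sw
    se = math.sqrt(sigma2 * sum(wi ** 2 * (xi - xbar) ** 2
                                for wi, xi in zip(w, x)) / sxx ** 2)
    return slope, se

def sizer_map(x, y, grid, bandwidths, z=2.0):
    """One string per bandwidth: '+' = significant increase, '-' =
    significant decrease, '0' = not significant, at each grid location."""
    rows = []
    for h in bandwidths:
        row = ""
        for x0 in grid:
            s, se = local_linear_slope(x, y, x0, h)
            row += "+" if s > z * se else ("-" if s < -z * se else "0")
        rows.append(row)
    return rows
```

Stacking the rows for a range of bandwidths gives the map: a feature flagged at many scales is "really there," while one that appears only at the smallest bandwidths is suspect.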
Journal of the American Statistical Association | 1996
Probal Chaudhuri
Abstract An extension of the concept of quantiles in multidimensions that uses the geometry of multivariate data clouds has been considered. The approach is based on blending as well as generalization of the key ideas used in the construction of spatial median and regression quantiles, both of which have been extensively studied in the literature. These geometric quantiles are potentially useful in constructing trimmed multivariate means as well as many other L estimates of multivariate location, and they lead to a directional notion of central and extreme points in a multidimensional setup. Such quantiles can be defined as meaningful and natural objects even in infinite-dimensional Hilbert and Banach spaces, and they yield an effective generalization of quantile regression in multiresponse linear model problems. Desirable equivariance properties are shown to hold for these multivariate quantiles, and issues related to their computation for data in finite-dimensional spaces are discussed. n^{1/2}-consistenc...
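The sample geometric quantile solves a convex minimization, and the case u = 0 reduces to the spatial median; a Weiszfeld-style fixed-point iteration is one standard way to compute it. A minimal 2-D sketch under that assumption (the function name and the convergence handling are ours, and the iteration count is fixed rather than adaptive):

```python
import math

def geometric_quantile(points, u=(0.0, 0.0), iters=200, eps=1e-9):
    """Weiszfeld-style iteration for the sample geometric quantile in 2-D.
    u = (0, 0) gives the spatial median; a direction u with norm < 1
    indexes how far out, and in which direction, the quantile lies."""
    n = len(points)
    qx = sum(p[0] for p in points) / n  # start at the mean
    qy = sum(p[1] for p in points) / n
    for _ in range(iters):
        sw = sx = sy = 0.0
        for px, py in points:
            r = math.hypot(px - qx, py - qy)
            if r < eps:          # sitting on a data point: clamp the distance
                r = eps
            sw += 1.0 / r
            sx += px / r
            sy += py / r
        qx, qy = (sx + n * u[0]) / sw, (sy + n * u[1]) / sw
    return qx, qy
```

For symmetric data the spatial median sits at the center of symmetry, while a nonzero u pulls the quantile outward through the data cloud in the direction of u.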
Journal of Multivariate Analysis | 1991
Probal Chaudhuri
Let (X, Y) be a random vector such that X is d-dimensional, Y is real valued, and θ(X) is the conditional αth quantile of Y given X, where α is a fixed number such that 0 < α < 1. Assume that θ is a smooth function with order of smoothness p > 0, and set r = (p - m)/(2p + d), where m is a nonnegative integer smaller than p. Let T(θ) denote a derivative of θ of order m. It is proved that there exists an estimate of T(θ), based on a set of i.i.d. observations (X1, Y1), ..., (Xn, Yn), that achieves the optimal nonparametric rate of convergence n^{-r} in L_q-norms (1 ≤ q < ∞).
IEEE Transactions on Image Processing | 2004
Rishi R. Rakesh; Probal Chaudhuri; C. A. Murthy
Many edge detectors are available in the image processing literature in which the choices of input parameters are left to the user. Most of the time, such choices are made on an ad hoc basis. In this article, an edge detector is proposed in which thresholding is performed using statistical principles. Thresholds are standardized locally for each individual pixel (local thresholding), depending on the statistical variability of the gradient vector at that pixel. This standardized statistic based on the gradient vector at each pixel is used to determine whether the pixel is eligible to be an edge pixel. The results obtained from the proposed method are found to be comparable to those from many well-known edge detectors. However, the values of the input parameters that yield appreciable results in the proposed detector are found to be more stable than those of other edge detectors, and they possess a statistical interpretation.
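A toy version of the idea, standardizing gradient magnitudes before thresholding, can be sketched as follows. This is only an illustration of the principle, not the paper's detector: it uses central differences for the gradient and, for brevity, a single global median-based scale where the paper standardizes locally per pixel.

```python
import math

def detect_edges(img, z=3.0):
    """Flag pixel (i, j) as an edge when its standardized gradient
    statistic exceeds z.  img is a list of rows of gray levels."""
    nr, nc = len(img), len(img[0])
    gx = [[0.0] * nc for _ in range(nr)]
    gy = [[0.0] * nc for _ in range(nr)]
    for i in range(1, nr - 1):
        for j in range(1, nc - 1):
            gx[i][j] = (img[i][j + 1] - img[i][j - 1]) / 2.0
            gy[i][j] = (img[i + 1][j] - img[i - 1][j]) / 2.0
    mags = [math.hypot(gx[i][j], gy[i][j])
            for i in range(1, nr - 1) for j in range(1, nc - 1)]
    scale = sorted(mags)[len(mags) // 2]  # robust global scale proxy
    if scale <= 0.0:
        scale = 1e-12                     # flat image: avoid dividing by zero
    edges = [[False] * nc for _ in range(nr)]
    for i in range(1, nr - 1):
        for j in range(1, nc - 1):
            if math.hypot(gx[i][j], gy[i][j]) / scale > z:
                edges[i][j] = True
    return edges
```

The point of the standardization is that the threshold z has a statistical reading (how many "noise scales" a gradient must exceed), rather than being a raw intensity cutoff tuned by eye.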
Journal of Computational and Graphical Statistics | 2002
Fred Godtliebsen; J. S. Marron; Probal Chaudhuri
An important problem in the use of density estimation for data analysis is whether or not observed features, such as bumps, are “really there” as opposed to being artifacts of the natural sampling variability. Here we propose a solution to this problem, in the challenging two-dimensional case, using the graphical technique of significance in scale space. Color and dynamic graphics form an important part of the visualization method.
Journal of the American Statistical Association | 1993
Probal Chaudhuri; Per A. Mykland
Abstract Nonlinear experiments involve response and regressors that are connected through a nonlinear regression-type structure. Examples of nonlinear models include standard nonlinear regression, logistic regression, probit regression, Poisson regression, gamma regression, inverse Gaussian regression, and so on. The Fisher information associated with a nonlinear experiment is typically a complex nonlinear function of the unknown parameter of interest. As a result, we face an awkward situation. Designing an efficient experiment will require knowledge of the parameter, but the purpose of the experiment is to generate data to yield parameter estimates! Our principal objective here is to investigate the proper design of nonlinear experiments that will let us construct efficient estimates of parameters. We focus our attention on a very general nonlinear setup that includes many models commonly encountered in practice. The experiments considered have two fundamental stages: a static design in the initial stage,...
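The parameter-dependence of the information can be made concrete in the simplest case. In the one-parameter logistic model p(x) = 1/(1 + exp(-βx)), a design point x contributes Fisher information x²p(x)(1 - p(x)), which depends on the unknown β. A two-stage sketch (assuming this toy model; the candidate grid and function names are ours) uses a stage-one estimate to pick the stage-two design point:

```python
import math

def info(x, beta):
    """Fisher information contributed by design point x in the
    one-parameter logistic model p(x) = 1 / (1 + exp(-beta * x))."""
    p = 1.0 / (1.0 + math.exp(-beta * x))
    return x * x * p * (1.0 - p)

def best_design_point(beta_hat, candidates):
    """Stage two: with the stage-one estimate beta_hat plugged in,
    choose the candidate design point with the largest estimated
    information."""
    return max(candidates, key=lambda x: info(x, beta_hat))
```

Solving 2/x = tanh(x/2) shows the information is maximized near x ≈ 2.4/β, so a poor stage-one estimate shifts the stage-two design away from the efficient point; this is exactly the circularity the paper's sequential designs address.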
Proceedings of the American Mathematical Society | 1996
Biman Chakraborty; Probal Chaudhuri
An affine equivariant version of multivariate median is introduced. The proposed median is easy to compute and has some appealing geometric features that are related to the configuration of a multivariate data cloud. The transformation and re-transformation approach used in the construction of the median has some fundamental connection with the data driven co-ordinate system considered by Chaudhuri and Sengupta (1993, Journal of the American Statistical Association). Large sample statistical properties of the median are discussed and finite sample performance is investigated using Monte Carlo simulations.
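The transformation-retransformation recipe can be sketched in 2-D: a few data points define a data-driven coordinate system, the remaining points are expressed in those coordinates, a coordinatewise median is taken, and the result is mapped back. The version below is a simplification for illustration (it anchors on the first three points and assumes they are not collinear; the actual construction selects the anchor points more carefully):

```python
def tr_median(points):
    """Transformation-retransformation coordinatewise median in 2-D.
    The first three points define the data-driven coordinate system
    (a simplified anchor choice; they must not be collinear)."""
    (ax, ay), (bx, by), (cx, cy) = points[0], points[1], points[2]
    m = [[bx - ax, cx - ax], [by - ay, cy - ay]]      # basis matrix
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[ m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det,  m[0][0] / det]]
    zs = []
    for px, py in points[3:]:                         # transform
        dx, dy = px - ax, py - ay
        zs.append((inv[0][0] * dx + inv[0][1] * dy,
                   inv[1][0] * dx + inv[1][1] * dy))
    def med(v):
        s = sorted(v)
        n = len(s)
        return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    mz = (med([z[0] for z in zs]), med([z[1] for z in zs]))
    return (ax + m[0][0] * mz[0] + m[0][1] * mz[1],   # retransform
            ay + m[1][0] * mz[0] + m[1][1] * mz[1])
```

Because the anchor points and the basis matrix transform along with the data, applying any invertible affine map to the sample moves the output by exactly that map, which is the affine equivariance the ordinary coordinatewise median lacks.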
Journal of the American Statistical Association | 1993
Probal Chaudhuri; Debapriya Sengupta
Abstract Multivariate sign tests attracted several statisticians in the past, and it is evident from the recent nonparametric literature that they continue to draw attention. One of the most important features of the univariate sign test is that it does not involve many technical assumptions or much complexity, and this makes it quite popular among users of statistics. In this article we present a new method for constructing multivariate sign tests that have reasonable statistical properties and can be used conveniently to solve one-sample location problems. Our principal strategy is to make judicious use of certain geometric structures in the constellation of data points for making inference about the location of their distribution. As we proceed with the development of a fairly broad and general methodology, we indicate its relationship with previous work done by others and sometimes attempt to unify some of the earlier ideas. In particular, we pick up some well-known tests for uniform dis...
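The flavor of a geometric multivariate sign test can be conveyed with a sketch: replace each observation by its spatial sign (the unit vector in its direction), and ask whether the average sign vector is larger than symmetry about the origin would allow. The sign-flip calibration below is a generic construction we use for illustration, not this paper's specific test; it is valid whenever the distribution is symmetric about the hypothesized center.

```python
import math
import random

def spatial_sign_stat(points):
    """Norm of the average spatial sign (unit) vector; large values
    suggest the points are not centered at the origin."""
    sx = sy = 0.0
    for px, py in points:
        r = math.hypot(px, py) or 1e-12   # guard against a point at 0
        sx += px / r
        sy += py / r
    n = len(points)
    return math.hypot(sx / n, sy / n)

def sign_flip_pvalue(points, reps=500, seed=0):
    """Monte Carlo sign-flip test: under symmetry about the origin,
    reflecting any subset of observations through 0 leaves the
    distribution of the statistic unchanged."""
    rng = random.Random(seed)
    t0 = spatial_sign_stat(points)
    hits = 0
    for _ in range(reps):
        flipped = [(-px, -py) if rng.random() < 0.5 else (px, py)
                   for px, py in points]
        if spatial_sign_stat(flipped) >= t0:
            hits += 1
    return (hits + 1) / (reps + 1)
```

Like the univariate sign test, this uses only directions, not magnitudes, so it needs no moment assumptions and is insensitive to heavy tails.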
Technometrics | 2006
Anil K. Ghosh; Probal Chaudhuri; Debasis Sengupta
The use of kernel density estimates in discriminant analysis is quite well known among scientists and engineers interested in statistical pattern recognition. Using a kernel density estimate involves properly selecting the scale of smoothing, namely the bandwidth parameter. The bandwidth that is optimum for the mean integrated square error of a class density estimator may not always be good for discriminant analysis, where the main emphasis is on the minimization of misclassification rates. On the other hand, cross-validation–based methods for bandwidth selection, which try to minimize estimated misclassification rates, may require huge computation when there are several competing populations. Besides, such methods usually allow only one bandwidth for each population density estimate, whereas in a classification problem, the optimum bandwidth for a class density estimate may vary significantly, depending on its competing class densities and their prior probabilities. Therefore, in a multiclass problem, it would be more meaningful to have different bandwidths for a class density when it is compared with different competing class densities. Moreover, good choice of bandwidths should also depend on the specific observation to be classified. Consequently, instead of concentrating on a single optimum bandwidth for each population density estimate, it is more useful in practice to look at the results for different scales of smoothing for the kernel density estimates. This article presents such a multiscale approach along with a graphical device leading to a more informative discriminant analysis than the usual approach based on a single optimum scale of smoothing for each class density estimate. 
When there are more than two competing classes, this method splits the problem into a number of two-class problems, which allows the flexibility of using different bandwidths for different pairs of competing classes and at the same time reduces the computational burden that one faces for usual cross-validation–based bandwidth selection in the presence of several competing populations. We present some benchmark examples to illustrate the usefulness of the proposed methodology.
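The multiscale viewpoint, looking at classifier performance across a whole grid of bandwidths rather than committing to one cross-validated value, can be sketched in one dimension. This toy version (our function names, Gaussian kernel, equal priors, training error rather than a proper estimate) computes the error profile across scales:

```python
import math

def kde(x, sample, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    c = 1.0 / (len(sample) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample)

def error_rate(class0, class1, h):
    """Training misclassification rate of the kernel Bayes rule
    (equal priors) at a single common bandwidth h."""
    wrong = sum(1 for x in class0 if kde(x, class0, h) < kde(x, class1, h))
    wrong += sum(1 for x in class1 if kde(x, class1, h) < kde(x, class0, h))
    return wrong / (len(class0) + len(class1))

def multiscale_profile(class0, class1, bandwidths):
    """Error at every scale -- the multiscale view of the problem,
    rather than one 'optimal' bandwidth per class density."""
    return [(h, error_rate(class0, class1, h)) for h in bandwidths]
```

Plotting such a profile, and its per-pair analogues in the multiclass case, is what makes the graphical device more informative than a single reported optimum.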
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2005
Anil K. Ghosh; Probal Chaudhuri; C. A. Murthy
Nearest neighbor classification is one of the simplest and most popular methods for statistical pattern recognition. A major issue in k-nearest neighbor classification is how to find an optimal value of the neighborhood parameter k. In practice, this value is generally estimated by the method of cross-validation. However, the ideal value of k in a classification problem not only depends on the entire data set, but also on the specific observation to be classified. Instead of using any single value of k, this paper studies results for a finite sequence of classifiers indexed by k. Along with the usual posterior probability estimates, a new measure, called the Bayesian measure of strength, is proposed and investigated in this paper as a measure of evidence for different classes. The results of these classifiers and their corresponding estimated misclassification probabilities are visually displayed using shaded strips. These plots provide an effective visualization of the evidence in favor of different classes when a given data point is to be classified. We also propose a simple weighted averaging technique that aggregates the results of different nearest neighbor classifiers to arrive at the final decision. Based on the analysis of several benchmark data sets, the proposed method is found to be better than using a single value of k.
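The aggregation step, combining the verdicts of nearest neighbor classifiers over a range of k instead of committing to one cross-validated k, can be sketched directly. The uniform default weights below are a simplification for illustration; the paper derives its weights from estimated misclassification probabilities.

```python
import math

def knn_vote(train, query, k):
    """Fraction of the k nearest training points labeled 1.
    train is a list of ((x, y), label) pairs with labels 0/1."""
    ranked = sorted(train, key=lambda p: math.hypot(p[0][0] - query[0],
                                                    p[0][1] - query[1]))
    return sum(label for _, label in ranked[:k]) / k

def aggregated_knn(train, query, ks, weights=None):
    """Weighted average of k-NN vote fractions over several values of k;
    the final decision thresholds the aggregated score at 1/2."""
    if weights is None:
        weights = [1.0] * len(ks)          # uniform weights for simplicity
    score = sum(w * knn_vote(train, query, k)
                for w, k in zip(weights, ks)) / sum(weights)
    return 1 if score > 0.5 else 0
```

The intermediate vote fractions across k are also what the shaded-strip display visualizes: agreement across many k is strong evidence for a class, while a verdict that flips as k grows signals a borderline point.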