Ranjan Maitra
Iowa State University
Publication
Featured research published by Ranjan Maitra.
Statistics Surveys | 2010
Volodymyr Melnykov; Ranjan Maitra
Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, for providing a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends in the area, as well as open problems, are also discussed.
Journal of Computational and Graphical Statistics | 2010
Ranjan Maitra; Volodymyr Melnykov
A new method is proposed to generate sample Gaussian mixture distributions according to prespecified overlap characteristics. Such methodology is useful in the context of evaluating performance of clustering algorithms. Our suggested approach involves derivation and calculation of the exact overlap between every cluster pair, measured in terms of their total probability of misclassification, and then guided simulation of Gaussian components satisfying prespecified overlap characteristics. The algorithm is illustrated in two and five dimensions using contour plots and parallel distribution plots, respectively, which we introduce and develop to display mixture distributions in higher dimensions. We also study properties of the algorithm and variability in the simulated mixtures. The utility of the suggested algorithm is demonstrated via a study of initialization strategies in Gaussian clustering. This article has supplementary material online.
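To make the idea of overlap as a total probability of misclassification concrete, here is a minimal sketch for the simplest special case only: two univariate Gaussian components with a common standard deviation. It is not the article's method, which handles general multivariate components; the function name `pairwise_overlap` and its parameters are hypothetical, chosen for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pairwise_overlap(mu1, mu2, sigma, pi1=0.5, pi2=0.5):
    """Sum of the two misclassification probabilities for two univariate
    Gaussian components with common standard deviation `sigma` and mixing
    proportions `pi1`, `pi2` (illustrative special case, assumed here)."""
    if mu1 == mu2:
        raise ValueError("means must differ for a single decision boundary")
    if sigma <= 0 or pi1 <= 0 or pi2 <= 0:
        raise ValueError("sigma and mixing proportions must be positive")
    # Order the components so that mu1 < mu2.
    if mu1 > mu2:
        mu1, mu2, pi1, pi2 = mu2, mu1, pi2, pi1
    # Point where the two weighted densities are equal (Bayes boundary).
    boundary = 0.5 * (mu1 + mu2) + sigma ** 2 * math.log(pi1 / pi2) / (mu2 - mu1)
    # Probability that a draw from component 1 falls on component 2's side.
    miss_1_to_2 = 1.0 - normal_cdf((boundary - mu1) / sigma)
    # Probability that a draw from component 2 falls on component 1's side.
    miss_2_to_1 = normal_cdf((boundary - mu2) / sigma)
    return miss_1_to_2 + miss_2_to_1


if __name__ == "__main__":
    for gap in (1.0, 2.0, 4.0):
        print(gap, round(pairwise_overlap(0.0, gap, 1.0), 4))
```

In this special case the overlap shrinks as the means move apart, which is the kind of knob a simulation could turn to reach a target overlap.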
IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2009
Ranjan Maitra
Clustering datasets is a challenging problem needed in a wide array of applications. Partition-optimization approaches, such as k-means or expectation-maximization (EM) algorithms, are sub-optimal and find solutions in the vicinity of their initialization. This paper proposes a staged approach to specifying initial values by finding a large number of local modes and then obtaining representatives from the most separated ones. Results on test experiments are excellent. We also provide a detailed comparative assessment of the suggested algorithm with many commonly-used initialization approaches in the literature. Finally, the methodology is applied to two datasets on diurnal microarray gene expressions and industrial releases of mercury.
Magnetic Resonance in Medicine | 2002
Ranjan Maitra; Steven R. Roys; Rao P. Gullapalli
Functional magnetic resonance imaging (fMRI) data are commonly used to construct activation maps for the human brain. It is important to quantify the reliability of such maps. We have developed statistical models to provide precise estimates for reliability from several runs of the same paradigm over time. Specifically, our method extends the premise of maximum likelihood (ML) developed by Genovese et al. (Magn Reson Med 1997;38:497–507) by incorporating spatial context into the estimation process. Experiments indicate that our methodology provides more conservative estimates of true positives compared to those obtained by Genovese et al. The reliability estimates can be used to obtain voxel‐specific reliability measures for activated as well as inactivated regions in future experiments. We derive statistical methodology to determine optimal thresholds for region‐ and context‐specific activations. Empirical guidelines are also provided on the number of repeat scans to acquire in order to arrive at accurate reliability estimates. We report the results from experiments involving a motor paradigm performed on a single subject several times over a period of 2 months. Magn Reson Med 48:62–70, 2002.
Technometrics | 2001
Ranjan Maitra
Clustering datasets is not an easy problem in general, and the difficulty is compounded for massive datasets. This article develops, under Gaussian assumptions, a multistage algorithm that clusters an initial sample, filters out observations that can be reasonably classified by these clusters, and iterates the preceding procedure on the remainder. A final step uses the estimated class probabilities and dispersions to classify each observation in the dataset. Results on test experiments indicate good performance. Application to datasets from software metrics and positron emission tomography required no more than five stages each, suggesting that the procedure is practical to implement.
NeuroImage | 2010
Ranjan Maitra
Functional Magnetic Resonance Imaging (fMRI) is a popular noninvasive modality to investigate activation in the human brain. The end result of most fMRI experiments is an activation map corresponding to the given paradigm. These maps can vary greatly from one study to the next, so quantifying the reliability of identified activation over several fMRI studies is important. The percent overlap of activation (Rombouts et al., 1998; Machielsen et al., 2000) is a global reliability measure between activation maps drawn from any two fMRI studies. A slightly modified but more intuitive measure is provided by the Jaccard (1901) coefficient of similarity, whose use we study in this paper. A generalization of these measures is also proposed to comprehensively summarize the reliability of multiple fMRI studies. Finally, a testing mechanism to flag potentially anomalous studies is developed. The methodology is illustrated on studies involving left- and right-hand motor task paradigms performed by a right-hand dominant male subject several times over a period of two months, with excellent results.
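As a small illustration of the two pairwise measures named above, the sketch below computes the Jaccard coefficient and a percent-overlap measure for two sets of activated voxel indices. The percent-overlap formula used here, twice the shared count divided by the sum of the two activated counts, is an assumption about the cited definition, and the function names and the convention for two empty maps are hypothetical.

```python
def jaccard(active_a, active_b):
    """Jaccard coefficient: |A and B| / |A or B| for two sets of voxel indices."""
    a, b = set(active_a), set(active_b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty maps agree perfectly
    return len(a & b) / len(union)


def percent_overlap(active_a, active_b):
    """Assumed form of percent overlap: 2 |A and B| / (|A| + |B|)."""
    a, b = set(active_a), set(active_b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # same assumed convention as above
    return 2.0 * len(a & b) / total


if __name__ == "__main__":
    study_1 = {1, 2, 3, 4}  # voxel indices flagged as activated in study 1
    study_2 = {3, 4, 5, 6}  # voxel indices flagged as activated in study 2
    print(jaccard(study_1, study_2))
    print(percent_overlap(study_1, study_2))
```

Under these definitions the two measures are monotone functions of each other (overlap = 2J / (1 + J)), so they rank pairs of studies identically.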
Journal of Computational and Graphical Statistics | 2010
Ranjan Maitra; Ivan Ramler
A k-means-type algorithm is proposed for efficiently clustering data constrained to lie on the surface of a p-dimensional unit sphere, or data that are mean-zero-unit-variance standardized observations such as those that occur when using Euclidean distance to cluster time series gene expression data using a correlation metric. We also provide methodology to initialize the algorithm and to estimate the number of clusters in the dataset. Results from a detailed series of experiments show excellent performance, even with very large datasets. The methodology is applied to the analysis of the mitotic cell division cycle of budding yeast dataset of Cho et al. [Molecular Cell (1998), 2, 65–73]. The entire dataset has not been analyzed previously, so our analysis provides an understanding for the complete set of genes acting in concert and differentially. We also use our methodology on the submitted abstracts of oral presentations made at the 2008 Joint Statistical Meetings (JSM) to identify similar topics. Our identified groups are both interpretable and distinct and the methodology provides a possible automated tool for efficient parallel scheduling of presentations at professional meetings. The supplemental materials described in the article are available in the online supplements.
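For readers unfamiliar with clustering on a sphere, the following is a generic spherical k-means sketch, not the algorithm proposed in the article: points are projected to unit length, assigned to the center with the largest inner product, and each center is updated to the normalized mean of its members. The function names, the user-supplied initial centers, and the stopping rule are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(points, init_centers, max_iter=100):
    """Generic k-means on the unit sphere (illustrative sketch).

    Returns (labels, centers); centers are unit vectors."""
    points = [normalize(p) for p in points]
    centers = [normalize(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: the largest inner product is the closest direction.
        new_labels = [
            max(range(len(centers)), key=lambda k: dot(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: normalized mean of the members of each cluster.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                continue  # keep the old center for an empty cluster
            mean = [sum(coord) / len(members) for coord in zip(*members)]
            if math.sqrt(sum(x * x for x in mean)) > 0.0:
                centers[k] = normalize(mean)
    return labels, centers


if __name__ == "__main__":
    data = [(1.0, 0.1), (1.0, -0.1), (0.1, 1.0), (-0.1, 1.0)]
    labels, centers = spherical_kmeans(data, [(1.0, 0.0), (0.0, 1.0)])
    print(labels, centers)
```

Because the result depends on the starting centers, a sketch like this would normally be paired with an initialization strategy and a rule for choosing the number of clusters.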
IEEE Transactions on Medical Imaging | 2009
Ranjan Maitra; David Faden
Estimating the noise parameter in magnitude magnetic resonance (MR) images is important in a wide range of applications. We propose an automatic noise estimation method that does not rely on a substantial proportion of voxels being from the background. Specifically, we model the magnitude of the observed signal as a mixture of Rice distributions with common noise parameter. The expectation-maximization (EM) algorithm is used to estimate all parameters, including the common noise parameter. The algorithm needs initializing values for which we provide some strategies that work well. The number of components in the mixture model also needs to be estimated en route to noise estimation and we provide a novel approach to doing so. Our methodology performs very well on a range of simulation experiments and physical phantom data. Finally, the methodology is demonstrated on four clinical datasets.
Journal of the American Statistical Association | 2012
Ranjan Maitra; Volodymyr Melnykov; Soumendra N. Lahiri
This article proposes a bootstrap approach for assessing significance in the clustering of multidimensional datasets. The procedure compares two models and declares the more complicated model a better candidate if there is significant evidence in its favor. The performance of the procedure is illustrated on two well-known classification datasets and comprehensively evaluated in terms of its ability to estimate the number of components via extensive simulation studies, with excellent results. The methodology is also applied to the problem of k-means color quantization of several standard images in the literature and is demonstrated to be a viable approach for determining the minimal and optimal numbers of colors needed to display an image without significant loss in resolution. Additional illustrations and performance evaluations are provided in the online supplementary material.
Biometrics | 2009
Ranjan Maitra; Ivan Ramler
A new methodology is proposed for clustering datasets in the presence of scattered observations. Scattered observations are defined as unlike any other, so traditional approaches that force them into groups can lead to erroneous conclusions. Our suggested approach is a scheme which, under the assumption of homogeneous spherical clusters, iteratively builds cores around their centers and groups points within each core while identifying points outside as scatter. In the absence of scatter, the algorithm reduces to k-means. We also provide methodology to initialize the algorithm and to estimate the number of clusters in the dataset. Results in experimental situations show excellent performance, especially when clusters are elliptically symmetric. The methodology is applied to the analysis of the United States Environmental Protection Agency's Toxic Release Inventory reports on industrial releases of mercury for the year 2000.