
Publication


Featured research published by J. Sunil Rao.


Annals of Statistics | 2005

Spike and slab variable selection: Frequentist and Bayesian strategies

Hemant Ishwaran; J. Sunil Rao

Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure's ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty.
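The connection to generalized ridge regression can be seen in a simplified, non-rescaled setting. Assuming y ~ N(Xβ, σ²I) and independent priors β_k ~ N(0, γ_k) with the hypervariances γ_k held fixed, the posterior mean of β is the generalized ridge estimate (XᵀX + σ²Γ⁻¹)⁻¹Xᵀy with Γ = diag(γ_1, ..., γ_p). The sketch below illustrates only that identity: the function names are hypothetical, and it omits the paper's rescaling and the priors placed on the γ_k. A near-zero "spike" variance shrinks a coefficient to zero, while a large "slab" variance leaves it close to least squares.

```python
import numpy as np


def generalized_ridge(X, y, penalties):
    """Generalized ridge estimate (X^T X + diag(penalties))^{-1} X^T y.

    One non-negative penalty per coefficient; a large penalty shrinks
    that coefficient strongly toward zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.diag(np.asarray(penalties, dtype=float))
    return np.linalg.solve(X.T @ X + lam, X.T @ y)


def conditional_posterior_mean(X, y, hypervariances, sigma2=1.0):
    """Posterior mean of beta for fixed hypervariances gamma_k.

    With y ~ N(X beta, sigma2 I) and beta_k ~ N(0, gamma_k) independently,
    the posterior mean is a generalized ridge estimate whose penalties
    are sigma2 / gamma_k.
    """
    gamma = np.asarray(hypervariances, dtype=float)
    return generalized_ridge(X, y, sigma2 / gamma)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=200)

# Slab variance for coefficients 0 and 2, spike variance for coefficient 1.
post_mean = conditional_posterior_mean(X, y, hypervariances=[100.0, 1e-6, 100.0])
print(np.round(post_mean, 2))
```

With zero penalties the estimate reduces to ordinary least squares, which is why the amount of shrinkage is governed entirely by the ratio σ²/γ_k.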


Journal of the American Statistical Association | 2003

Detecting Differentially Expressed Genes in Microarrays Using Bayesian Model Selection

Hemant Ishwaran; J. Sunil Rao

DNA microarrays open up a broad new horizon for investigators interested in studying the genetic determinants of disease. The high-throughput nature of these arrays, where differential expression for thousands of genes can be measured simultaneously, creates an enormous wealth of information, but also poses a challenge for data analysis because of the large multiple testing problem involved. The solution has generally been to focus on optimizing false-discovery rates while sacrificing power. The drawback of this approach is that more subtle expression differences will be missed that might give investigators more insight into the genetic environment necessary for a disease process to take hold. We introduce a new method for detecting differentially expressed genes based on a high-dimensional model selection technique, Bayesian ANOVA for microarrays (BAM), which strikes a balance between false rejections and false nonrejections. The basis of the new approach involves a weighted average of generalized ridge regression estimates that provides the benefits of using shrinkage estimation combined with model averaging. A simple graphical tool based on the amount of shrinkage is developed to visualize the trade-off between low false-discovery rates and finding more genes. Simulations are used to illustrate BAM's performance, and the method is applied to a large database of colon cancer gene expression data. Our working hypothesis in the colon cancer analysis is that large differential expressions may not be the only ones contributing to metastasis; in fact, moderate changes in expression of genes may be involved in modifying the genetic environment to a sufficient extent for metastasis to occur. A functional biological analysis of gene effects found by BAM, but not other false-discovery-based approaches, lends support to this hypothesis.


Journal of the American Statistical Association | 2005

Spike and Slab Gene Selection for Multigroup Microarray Data

Hemant Ishwaran; J. Sunil Rao

DNA microarrays can provide insight into genetic changes that characterize different stages of a disease process. Accurate identification of these changes has significant therapeutic and diagnostic implications. Statistical analysis for multistage (multigroup) data is challenging, however. ANOVA-based extensions of two-sample Z-tests, a popular method for detecting differentially expressed genes in two groups, do not work well in multigroup settings. False detection rates are high because of variability of the ordinary least squares estimators and because of regression to the mean induced by correlated parameter estimates. We develop a Bayesian rescaled spike and slab hierarchical model specifically designed for the multigroup gene detection problem. Data preprocessing steps are introduced to deal with unique features of microarray data and to enhance selection performance. We show theoretically that spike and slab models naturally encourage sparse solutions through a process called selective shrinkage. This translates into oracle-like gene selection risk performance compared with ordinary least squares estimates. The methodology is illustrated on a large microarray repository of samples from different clinical stages of metastatic colon cancer. Through a functional analysis of selected genes, we show that spike and slab models identify important biological signals while minimizing biologically implausible false detections.


Annals of Statistics | 2008

Fence methods for mixed model selection

Jiming Jiang; J. Sunil Rao; Zhonghua Gu; Thuan Nguyen

Many model search strategies involve trading off model fit with model complexity in a penalized goodness of fit measure. Asymptotic properties for these types of procedures in settings like linear regression and ARMA time series have been studied, but these do not naturally extend to nonstandard situations such as mixed effects models, where a simple definition of the sample size is not meaningful. This paper introduces a new class of strategies, known as fence methods, for mixed model selection, which includes linear and generalized linear mixed models. The idea involves a procedure to isolate a subgroup of what are known as correct models (of which the optimal model is a member). This is accomplished by constructing a statistical fence, or barrier, to carefully eliminate incorrect models. Once the fence is constructed, the optimal model is selected from among those within the fence according to a criterion which can be made flexible. In addition, we propose two variations of the fence. The first is a stepwise procedure to handle situations of many predictors; the second is an adaptive approach for choosing a tuning constant. We give sufficient conditions for consistency of the fence and its variations, a desirable property for a good model selection procedure. The methods are illustrated through simulation studies and real data analysis.
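A toy version of the fence idea for ordinary linear regression (not mixed models) is sketched below. It assumes that the lack-of-fit measure Q(M) is the residual sum of squares and that the full model plays the role of the best-fitting reference model M̃. It also replaces the paper's data-driven fence width c·σ̂(M, M̃) with a fixed, hypothetical `width` argument, and it does not attempt the stepwise or adaptive variants. Among the models inside the fence it returns the one of smallest dimension, breaking ties by smaller Q. All function names are illustrative.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of the least-squares fit on columns `cols`."""
    if not cols:
        return float(y @ y)
    Xs = X[:, list(cols)]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)


def fence_select(X, y, width):
    """Simplified fence for linear regression.

    Q(M) = RSS(M), and the full model is the reference M_tilde. Models with
    Q(M) <= Q(M_tilde) + width are "inside the fence"; among them, return
    the one with the fewest columns, breaking ties by smaller Q(M).
    `width` is a hypothetical fixed stand-in for c * sigma_hat(M, M_tilde).
    """
    p = X.shape[1]
    q_full = rss(X, y, tuple(range(p)))
    scored = [(len(c), rss(X, y, c), c)
              for k in range(p + 1) for c in combinations(range(p), k)]
    inside = [s for s in scored if s[1] <= q_full + width]
    # Tuples sort by dimension first, then by lack of fit.
    return min(inside)[2]


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=200)
print(fence_select(X, y, width=20.0))
```

Models that drop a truly active column inflate the RSS far beyond any moderate width and fall outside the fence, so the simplest remaining model tends to be the true one; a width of zero leaves only the full model inside.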


BMC Bioinformatics | 2006

BAMarray™: Java software for Bayesian analysis of variance for microarray data

Hemant Ishwaran; J. Sunil Rao; Udaya B. Kogalur

Background: DNA microarrays open up a new horizon for studying the genetic determinants of disease. The high-throughput nature of these arrays creates an enormous wealth of information, but also poses a challenge to data analysis. Inferential problems become even more pronounced as experimental designs used to collect data become more complex. An important example is multigroup data collected over different experimental groups, such as data collected from distinct stages of a disease process. We have developed a method, termed Bayesian ANOVA for microarrays (BAM), that specifically addresses these issues. The BAM approach uses a special inferential regularization known as spike-and-slab shrinkage that provides an optimal balance between total false detections and total false non-detections. This translates into more reproducible differential calls. Spike and slab shrinkage is a form of regularization achieved by using information across all genes and groups simultaneously.

Results: BAMarray™ is a graphically oriented Java-based software package that implements the BAM method for detecting differentially expressed genes in multigroup microarray experiments (up to 256 experimental groups can be analyzed). Drop-down menus allow the user to easily select between different models and to choose various run options. BAMarray™ can also be operated in a fully automated mode with preselected run options. Tuning parameters have been preset at theoretically optimal values, freeing the user from such specifications. BAMarray™ provides estimates for gene differential effects and automatically estimates data-adaptive, optimal cutoff values for classifying genes into biological patterns of differential activity across experimental groups. A graphical suite is a core feature of the product and includes diagnostic plots for assessing model assumptions and interactive plots that enable tracking of prespecified gene lists to study such things as biological pathway perturbations. The user can zoom in and lasso genes of interest that can then be saved for downstream analyses.

Conclusion: BAMarray™ is user-friendly, platform-independent software that effectively and efficiently implements the BAM methodology. Classifying patterns of differential activity is greatly facilitated by a data-adaptive cutoff rule and a graphical suite. BAMarray™ is licensed software freely available to academic institutions. More information can be found at http://www.bamarray.com.


Human Molecular Genetics | 2010

Functional interactions between the LRP6 WNT co-receptor and folate supplementation

Jason D. Gray; Ghunwa Nakouzi; Bozena Slowinska-Castaldo; Jean Eudes Dazard; J. Sunil Rao; Joseph H. Nadeau; M. Elizabeth Ross

Crooked tail (Cd) mice bear a gain-of-function mutation in Lrp6, a co-receptor for canonical WNT signaling, and are a model of neural tube defects (NTDs), preventable with dietary folic acid (FA) supplementation. Whether the FA response reflects a direct influence of FA on LRP6 function was tested with prenatal supplementation in LRP6-deficient embryos. The enriched FA (10 ppm) diet reduced the occurrence of birth defects among all litters compared with the control (2 ppm FA) diet, but did so by increasing early lethality of Lrp6(-/-) embryos while actually increasing NTDs among nulls alive at embryonic days 10-13 (E10-13). Proliferation in cranial neural folds was reduced in homozygous Lrp6(-/-) mutants versus wild-type embryos at E10, and FA supplementation increased proliferation in wild-type but not mutant neuroepithelia. Canonical WNT activity was reduced in LRP6-deficient midbrain-hindbrain at E9.5, demonstrated in vivo by a TCF/LEF-reporter transgene. FA levels in media modulated the canonical WNT response in NIH3T3 cells, suggesting that although FA was required for optimal WNT signaling, even modest FA elevations attenuated LRP5/6-dependent canonical WNT responses. Gene expression analysis in embryos and adults showed striking interactions between targeted Lrp6 deficiency and FA supplementation, especially for mitochondrial function, folate and methionine metabolism, WNT signaling and cytoskeletal regulation that together implicate relevant signaling and metabolic pathways supporting cell proliferation, morphology and differentiation. We propose that FA supplementation rescues Lrp6(Cd/Cd) fetuses by normalizing hyperactive WNT activity, whereas in LRP6-deficient embryos, added FA further attenuates reduced WNT activity, thereby compromising development.


Journal of the American Statistical Association | 2011

Best Predictive Small Area Estimation

Jiming Jiang; Thuan Nguyen; J. Sunil Rao

We derive the best predictive estimator (BPE) of the fixed parameters under two well-known small area models, the Fay–Herriot model and the nested-error regression model. This leads to a new prediction procedure, called observed best prediction (OBP), which is different from the empirical best linear unbiased prediction (EBLUP). We show that BPE is more reasonable than the traditional estimators derived from estimation considerations, such as maximum likelihood (ML) and restricted maximum likelihood (REML), if the main interest is estimation of small area means, which is a mixed-model prediction problem. We use both theoretical derivations and empirical studies to demonstrate that the OBP can significantly outperform EBLUP in terms of the mean squared prediction error (MSPE), if the underlying model is misspecified. On the other hand, when the underlying model is correctly specified, the overall predictive performance of the OBP is very similar to that of the EBLUP if the number of small areas is large. A general theory about OBP, including its exact MSPE comparison with the BLUP in the context of mixed-model prediction, and asymptotic behavior of the BPE, is developed. A real data example is considered. A supplementary appendix is available online.
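For reference, the classical BLUP on which the EBLUP is built can be written for the Fay–Herriot model y_i = x_iᵀβ + v_i + e_i, with v_i ~ N(0, A) and e_i ~ N(0, D_i), where the sampling variances D_i are known. For a given A it combines the GLS fit with the direct estimate as θ̂_i = x_iᵀβ̂ + A/(A + D_i)·(y_i − x_iᵀβ̂); the EBLUP plugs in an estimate of A. The sketch below is only that classical predictor. The observed best prediction (OBP) from the paper replaces the GLS β̂ with the best predictive estimator, which is not implemented here, and the function name is hypothetical.

```python
import numpy as np


def fay_herriot_blup(y, X, D, A):
    """BLUP of small-area means theta_i = x_i^T beta + v_i under the
    Fay-Herriot model, for a given area-effect variance A and known
    sampling variances D_i. Returns (predictions, beta_hat)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)
    w = 1.0 / (A + D)                 # GLS weights: inverse marginal variances
    XtW = X.T * w                     # X^T W with W = diag(w)
    beta_hat = np.linalg.solve(XtW @ X, XtW @ y)
    fitted = X @ beta_hat
    gamma = A / (A + D)               # weight given to each direct estimate
    return fitted + gamma * (y - fitted), beta_hat


# Toy example: three areas, intercept-only model, equal sampling variances.
y = [1.0, 2.0, 4.0]
X = np.ones((3, 1))
theta, beta_hat = fay_herriot_blup(y, X, D=[1.0, 1.0, 1.0], A=1.0)
print(beta_hat, theta)
```

With A = 1 and every D_i = 1, each prediction sits halfway between its direct estimate and the common mean 7/3; as A shrinks to 0, all predictions collapse to the regression-synthetic estimate.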


Journal of Computational and Graphical Statistics | 2010

Local sparse bump hunting

Jean Eudes Dazard; J. Sunil Rao

The search for structures in real datasets, for example, in the form of bumps, components, classes, or clusters, is important as these often reveal underlying phenomena leading to scientific discoveries. One of these tasks, known as bump hunting, is to locate domains of a multidimensional input space where the target function assumes local maxima without prespecifying their total number. A number of related methods already exist, yet are challenged in the context of high-dimensional data. We introduce a novel supervised and multivariate bump hunting strategy for exploring modes or classes of a target function of many continuous variables. This addresses the issues of correlation, interpretability, and high-dimensionality (p ≫ n case), while making minimal assumptions. The method is based upon a divide and conquer strategy, combining a tree-based method, a dimension reduction technique, and the Patient Rule Induction Method (PRIM). Important to this task, we show how to estimate the PRIM meta-parameters. Using accuracy evaluation procedures such as cross-validation and ROC analysis, we show empirically how the method outperforms a naive PRIM as well as competitive nonparametric supervised and unsupervised methods in the problem of class discovery. The method has practical application especially in the case of noisy high-throughput data. It is applied to a class discovery problem in a colon cancer microarray dataset aimed at identifying tumor subtypes in the metastatic stage. Supplemental Materials are available online.
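The PRIM component can be illustrated with its basic top-down peeling step. The sketch below implements plain PRIM in the spirit of Friedman and Fisher, not the paper's local sparse bump hunting. At each step it removes a fraction `alpha` of the points in the current box from the lower or upper edge of one variable, choosing the peel that gives the largest mean response in the remaining box, and it stops once the box holds at most a fraction `min_support` of the data. The tree-based partitioning, dimension reduction, pasting step, and meta-parameter estimation from the paper are omitted; the function name and the parameter defaults are arbitrary choices for illustration.

```python
import numpy as np


def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Top-down PRIM peeling toward a box with a high mean of y.

    Returns (lower, upper) bounds of the final box.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    lower = X.min(axis=0).copy()
    upper = X.max(axis=0).copy()
    inside = np.ones(n, dtype=bool)

    while inside.mean() > min_support:
        best = None  # (mean after peel, new mask, dim, side, cut)
        for j in range(p):
            xj = X[inside, j]
            lo_cut = np.quantile(xj, alpha)
            hi_cut = np.quantile(xj, 1.0 - alpha)
            for side, mask, cut in (
                ("lower", inside & (X[:, j] >= lo_cut), lo_cut),
                ("upper", inside & (X[:, j] <= hi_cut), hi_cut),
            ):
                # Skip peels that remove nothing or everything.
                if mask.sum() == 0 or mask.sum() == inside.sum():
                    continue
                m = y[mask].mean()
                if best is None or m > best[0]:
                    best = (m, mask, j, side, cut)
        if best is None:
            break
        _, inside, j, side, cut = best
        if side == "lower":
            lower[j] = cut
        else:
            upper[j] = cut
    return lower, upper


rng = np.random.default_rng(2)
X = rng.uniform(size=(1000, 2))
y = (X[:, 0] > 0.7).astype(float)   # response is 1 only when x0 > 0.7
lower, upper = prim_peel(X, y)
print(lower, upper)
```

The small peeling fraction is what makes the method "patient": each step discards only a little data, so an early poor choice costs less than a single greedy split would.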


Econometric Theory | 2008

An In-Depth Look at Highest Posterior Model Selection

Tanujit Dey; Hemant Ishwaran; J. Sunil Rao

We consider the properties of the highest posterior probability model in a linear regression setting. Under a spike and slab hierarchy we find that although highest posterior model selection is total risk consistent, it possesses hidden undesirable properties. One such property is a marked underfitting in finite samples, a phenomenon well noted for Bayesian information criterion (BIC) related procedures but not often associated with highest posterior model selection. Another concern is the substantial effect the prior has on model selection. We employ a rescaling of the hierarchy and show that the resulting rescaled spike and slab models mitigate the effects of underfitting because of a perfect cancellation of a BIC-like penalty term. Furthermore, by drawing upon an equivalence between the highest posterior model and the median model, we find that the effect of the prior is less influential on model selection, as long as the underlying true model is sparse. Nonsparse settings are, however, problematic. Using the posterior mean for variable selection instead of posterior inclusion probabilities avoids these issues.
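The two selection rules compared in the abstract can both be estimated from MCMC draws of the 0/1 inclusion indicators, taking the "median model" to be the median probability model (every variable whose posterior inclusion probability is at least 1/2). The draws in the example are made-up toy values chosen so that the two rules disagree, not output from any spike and slab sampler, and the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def highest_posterior_model(draws):
    """Most frequently visited model in MCMC draws of 0/1 inclusion
    indicators (rows = draws, columns = candidate variables)."""
    visits = Counter(tuple(int(v) for v in row) for row in np.asarray(draws))
    return visits.most_common(1)[0][0]


def median_probability_model(draws):
    """Model containing every variable whose estimated posterior
    inclusion probability is at least 1/2."""
    inclusion_prob = np.asarray(draws, dtype=float).mean(axis=0)
    return tuple(int(flag) for flag in inclusion_prob >= 0.5)


# Toy draws: the single most visited model differs from the model
# assembled from marginal inclusion probabilities (0.7, 0.6, 0.6).
draws = [[1, 0, 0]] * 4 + [[0, 1, 1]] * 3 + [[1, 1, 1]] * 3
print(highest_posterior_model(draws))   # (1, 0, 0)
print(median_probability_model(draws))  # (1, 1, 1)
```

The highest posterior model depends on the joint frequency of a whole model, whereas the median probability model only uses marginal inclusion frequencies, which is why the two can diverge when posterior mass is spread over several overlapping models.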


Journal of Statistical Planning and Inference | 2000

The GIC for model selection: a hypothesis testing approach

Jun Shao; J. Sunil Rao

We consider the model (subset) selection problem for linear regression. Although hypothesis testing and model selection are two different approaches, there are similarities between them. In this article we combine the two approaches and propose a particular choice of the penalty parameter in the generalized information criterion (GIC), which leads to a model selection procedure that inherits good properties from both: its overfitting and underfitting probabilities converge to 0 as the sample size n → ∞, and, when n is fixed, its overfitting probability is controlled to be approximately under a pre-assigned level of significance.
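A minimal all-subsets sketch of GIC selection, assuming the common form RSS(M) + λ·σ̂²·|M| with σ̂² estimated from the full model. Setting λ = 2 gives a Mallows' Cp-type rule and λ = log n a BIC-type rule. The specific hypothesis-testing-based choice of λ proposed in the paper is not reproduced, so λ is left as an argument, and the function name is hypothetical.

```python
from itertools import combinations

import numpy as np


def gic_select(X, y, lam):
    """All-subsets model selection minimizing
    GIC(M) = RSS(M) + lam * sigma2_hat * |M|,
    where sigma2_hat is the residual variance estimate of the full model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    def rss(cols):
        if not cols:
            return float(y @ y)
        Xs = X[:, list(cols)]
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ coef
        return float(resid @ resid)

    sigma2_hat = rss(tuple(range(p))) / (n - p)
    _, best_cols = min(
        (rss(c) + lam * sigma2_hat * len(c), c)
        for k in range(p + 1)
        for c in combinations(range(p), k)
    )
    return best_cols


rng = np.random.default_rng(3)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)
print(gic_select(X, y, lam=2.0))         # Cp-type penalty
print(gic_select(X, y, lam=np.log(n)))   # BIC-type penalty
```

A small λ tolerates overfitting (λ = 0 always returns the full model), while a λ that grows with n drives the overfitting probability toward zero; the paper's contribution is a choice of λ that also keeps that probability near a chosen significance level for fixed n.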

Collaboration



Top Co-Authors

Jean Eudes Dazard
Case Western Reserve University

Jiming Jiang
University of California

Theresa P. Pretlow
Case Western Reserve University

Thomas G. Pretlow
Case Western Reserve University

Xing Pei Hao
Case Western Reserve University

Joseph Willis
Case Western Reserve University

Michael Choe
Case Western Reserve University