Publication


Featured research published by Debashis Paul.


Journal of the American Statistical Association | 2006

Prediction by Supervised Principal Components

Eric Bair; Trevor Hastie; Debashis Paul; Robert Tibshirani

In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer.
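
The selection-then-PCA idea in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function name `supervised_pc`, the fixed `threshold`, and the simulated data are all assumptions made here, and in practice the threshold would normally be tuned (for example by cross-validation).

```python
import numpy as np

def supervised_pc(X, y, threshold, n_components=1):
    """Illustrative supervised principal components fit (hypothetical helper).

    X: (n, p) predictor matrix; y: (n,) quantitative outcome.
    """
    Xc = X - X.mean(axis=0)  # center each predictor
    yc = y - y.mean()
    # Univariate association score per predictor: x_j' y / ||x_j||
    scores = Xc.T @ yc / np.sqrt((Xc ** 2).sum(axis=0))
    # Keep only predictors whose association with the outcome is strong
    keep = np.flatnonzero(np.abs(scores) > threshold)
    if keep.size == 0:
        raise ValueError("threshold removed every predictor")
    # Ordinary PCA (via SVD) on the reduced predictor matrix
    U, s, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]
    # Least-squares regression of the outcome on the leading component(s)
    design = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return keep, Z, beta

# Simulated data (an assumption): one latent factor drives the first 10
# of 200 predictors and the outcome; n = 50 observations.
rng = np.random.default_rng(0)
n, p = 50, 200
u = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :10] += 3.0 * u[:, None]
y = 2.0 * u + 0.1 * rng.normal(size=n)

# Threshold equivalent to |sample correlation| > 0.7 (chosen for the demo)
threshold = 0.7 * np.linalg.norm(y - y.mean())
keep, Z, beta = supervised_pc(X, y, threshold)
```

Screening before the decomposition is what distinguishes this from ordinary principal components regression: the leading component is computed only from predictors that show an association with the outcome.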


IEEE Transactions on Information Theory | 2006

On the distribution of SINR for the MMSE MIMO receiver and performance analysis

Ping Li; Debashis Paul; Ravi Narasimhan; John M. Cioffi

This correspondence studies the statistical distribution of the signal-to-interference-plus-noise ratio (SINR) for the minimum mean-square error (MMSE) receiver in multiple-input multiple-output (MIMO) wireless communications. The channel model is assumed to be (transmit) correlated Rayleigh flat-fading with unequal powers. The SINR can be decomposed into two independent random variables: $\mathrm{SINR} = \mathrm{SINR}^{\mathrm{ZF}} + T$, where $\mathrm{SINR}^{\mathrm{ZF}}$ corresponds to the SINR for a zero-forcing (ZF) receiver and has an exact Gamma distribution. This correspondence focuses on characterizing the statistical properties of $T$ using results from random matrix theory. The first three asymptotic moments of $T$ are derived for uncorrelated channels and channels with equicorrelations. For general correlated channels, some limiting upper bounds for the first three moments are also provided. For uncorrelated channels and correlated channels satisfying certain conditions, it is proved that $T$ converges to a Normal random variable. A Gamma distribution and a generalized Gamma distribution are proposed as approximations to the finite sample distribution of $T$. Simulations suggest that these approximate distributions can be used to accurately estimate the probability of errors even for very small dimensions (e.g., two transmit antennas).
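
The decomposition of the MMSE SINR into a zero-forcing part plus a nonnegative remainder can be checked numerically. The sketch below is a simplified illustration under assumptions that are not taken from the abstract: an uncorrelated Rayleigh channel, unit transmit powers, and a hypothetical helper `sinr_mmse_and_zf` built from the standard closed-form per-stream SINR expressions.

```python
import numpy as np

def sinr_mmse_and_zf(H, noise_var):
    """Per-stream SINR of linear MMSE and ZF receivers (hypothetical helper).

    H: (n_rx, n_tx) complex channel matrix with n_rx >= n_tx;
    every stream is assumed to have unit transmit power.
    """
    G = H.conj().T @ H  # Gram matrix of the channel
    m = G.shape[0]
    # MMSE: SINR_k = 1 / [(I + G / sigma^2)^{-1}]_kk - 1
    mmse = 1.0 / np.real(np.diag(np.linalg.inv(np.eye(m) + G / noise_var))) - 1.0
    # ZF: SINR_k = 1 / (sigma^2 [G^{-1}]_kk)
    zf = 1.0 / (noise_var * np.real(np.diag(np.linalg.inv(G))))
    return mmse, zf

# One draw of an uncorrelated Rayleigh channel: 4 receive, 2 transmit antennas
rng = np.random.default_rng(0)
H = (rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))) / np.sqrt(2)
noise_var = 0.1
mmse, zf = sinr_mmse_and_zf(H, noise_var)
T = mmse - zf  # remainder term of the decomposition, one value per stream
```

Repeating the draw many times and collecting `T` gives an empirical distribution against which a Gamma or Normal approximation could be compared.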


Annals of Statistics | 2013

Minimax bounds for sparse PCA with noisy high-dimensional data

Aharon Birnbaum; Iain M. Johnstone; Boaz Nadler; Debashis Paul

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the $\ell_2$ loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.


Journal of Computational and Graphical Statistics | 2009

A Geometric Approach to Maximum Likelihood Estimation of the Functional Principal Components From Sparse Longitudinal Data

Jie Peng; Debashis Paul

In this article, we consider the problem of estimating the eigenvalues and eigenfunctions of the covariance kernel (i.e., the functional principal components) from sparse and irregularly observed longitudinal data. We exploit the smoothness of the eigenfunctions to reduce dimensionality by restricting them to a lower dimensional space of smooth functions. We then approach this problem through a restricted maximum likelihood method. The estimation scheme is based on a Newton–Raphson procedure on the Stiefel manifold using the fact that the basis coefficient matrix for representing the eigenfunctions has orthonormal columns. We also address the selection of the number of basis functions, as well as that of the dimension of the covariance kernel by a second-order approximation to the leave-one-curve-out cross-validation score that is computationally very efficient. The effectiveness of our procedure is demonstrated by simulation studies and an application to a CD4+ counts dataset. In the simulation studies, our method performs well on both estimation and model selection. It also outperforms two existing approaches: one based on a local polynomial smoothing, and another using an EM algorithm. Supplementary materials including technical details, the R package fpca, and data analyzed by this article are available online.


Journal of the American Statistical Association | 2011

A Regularized Hotelling’s $T^2$ Test for Pathway Analysis in Proteomic Studies

Lin Chen; Debashis Paul; Ross L. Prentice; Pei Wang

Recent proteomic studies have identified proteins related to specific phenotypes. In addition to marginal association analysis for individual proteins, analyzing pathways (functionally related sets of proteins) may yield additional valuable insights. Identifying pathways that differ between phenotypes can be conceptualized as a multivariate hypothesis testing problem: whether the mean vector $\mu$ of a $p$-dimensional random vector $X$ is $\mu_0$. Proteins within the same biological pathway may correlate with one another in a complicated way, and Type I error rates can be inflated if such correlations are incorrectly assumed to be absent. The inflation tends to be more pronounced when the sample size is very small or there is a large amount of missingness in the data, as is frequently the case in proteomic discovery studies. To tackle these challenges, we propose a regularized Hotelling’s $T^2$ (RHT) statistic together with a nonparametric testing procedure, which effectively controls the Type I error rate and maintains good power in the presence of complex correlation structures and missing data patterns. We investigate asymptotic properties of the RHT statistic under pertinent assumptions and compare the test performance with four existing methods through simulation examples. We apply the RHT test to a hormone therapy proteomics dataset, and identify several interesting biological pathways for which blood serum concentrations changed following hormone therapy initiation. This article has supplementary material online.
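
The abstract does not spell out the form of the regularization or of the nonparametric procedure. As a rough sketch only, the code below assumes a ridge-type statistic, $n(\bar{x}-\mu_0)^\top (S+\lambda I)^{-1}(\bar{x}-\mu_0)$, and a sign-flip reference distribution. Both choices, the function names, and the simulated data are assumptions for illustration and may differ from the published method.

```python
import numpy as np

def rht_statistic(X, mu0, lam):
    """Ridge-regularized Hotelling-type statistic (assumed form)."""
    n, p = X.shape
    diff = X.mean(axis=0) - mu0
    S = np.cov(X, rowvar=False)  # sample covariance, singular when p >= n
    # Adding lam * I makes the matrix invertible even for small n
    return float(n * diff @ np.linalg.solve(S + lam * np.eye(p), diff))

def sign_flip_pvalue(X, mu0, lam, n_draws=500, seed=0):
    """Monte Carlo p-value from random row-wise sign flips (assumed null:
    observations are symmetrically distributed around mu0)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    centered = X - mu0
    zero = np.zeros(p)
    observed = rht_statistic(centered, zero, lam)
    exceed = 0
    for _ in range(n_draws):
        signs = rng.choice([-1.0, 1.0], size=(n, 1))
        if rht_statistic(centered * signs, zero, lam) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_draws)

# Simulated example: 10 samples, 20 proteins, every mean shifted away from 0
rng = np.random.default_rng(1)
X_shift = rng.normal(size=(10, 20)) + 2.0
p_shift = sign_flip_pvalue(X_shift, np.zeros(20), lam=1.0)
```

With more proteins than samples the unregularized sample covariance cannot be inverted, which is the situation the regularization term is meant to handle.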


Annals of Statistics | 2009

Consistency of restricted maximum likelihood estimators of principal components

Debashis Paul; Jie Peng

In this paper we consider two closely related problems: estimation of eigenvalues and eigenfunctions of the covariance kernel of functional data based on (possibly) irregular measurements, and the problem of estimating the eigenvalues and eigenvectors of the covariance matrix for high-dimensional Gaussian vectors. In [23], a restricted maximum likelihood (REML) approach has been developed to deal with the first problem. In this paper, we establish consistency and derive the rate of convergence of the REML estimator for the functional data case, under appropriate smoothness conditions. Moreover, we prove that when the number of measurements per sample curve is bounded, under squared-error loss, the rate of convergence of the REML estimators of eigenfunctions is near-optimal. In the case of Gaussian vectors, asymptotic consistency and an efficient score representation of the estimators are obtained under the assumption that the effective dimension grows at a rate slower than the sample size. These results are derived through an explicit utilization of the intrinsic geometry of the parameter space, which is non-Euclidean. Moreover, the results derived in this paper suggest an asymptotic equivalence between the inference on functional data with dense measurements and that of the high-dimensional Gaussian vectors.


Annals of Statistics | 2015

On the Marčenko–Pastur law for linear time series

Haoyang Liu; Alexander Aue; Debashis Paul

This paper is concerned with extensions of the classical Marčenko–Pastur law to time series. Specifically, $p$-dimensional linear processes are considered which are built from innovation vectors with independent, identically distributed (real- or complex-valued) entries possessing zero mean, unit variance and finite fourth moments. The coefficient matrices of the linear process are assumed to be simultaneously diagonalizable. In this setting, the limiting behavior of the empirical spectral distribution of both sample covariance and symmetrized sample autocovariance matrices is determined in the high-dimensional setting $p/n \to c \in (0,\infty)$.


Computational Statistics & Data Analysis | 2011

Nonstationary covariance modeling for incomplete data: Monte Carlo EM approach

Tomoko Matsuo; Douglas W. Nychka; Debashis Paul


Electronic Journal of Statistics | 2011

Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach

Debashis Paul; Jie Peng


Bernoulli | 2009

Tie-respecting bootstrap methods for estimating distributions of sets and functions of eigenvalues

Peter Hall; Young Kyung Lee; Byeong U. Park; Debashis Paul

Collaboration

Top Co-Authors

Jie Peng, University of California

Alexander Aue, University of California

Prabir Burman, University of California

Lili Wang, University of California

Jun Chen, University of California

Pei Wang, Icahn School of Medicine at Mount Sinai