Sam Efromovich
University of Texas at Dallas
Publication
Featured research published by Sam Efromovich.
Methods in Enzymology | 2008
Sam Efromovich; David C. Grainger; Diane M. Bodenmiller; Stephen Spiro
NsrR is a nitric oxide-sensitive regulator of transcription. In Escherichia coli, NsrR is a repressor of the hmp gene encoding the flavohemoglobin that detoxifies nitric oxide. Three other transcription units (ytfE, ygbA, and hcp-hcr) are known to be subject to regulation by NsrR. This chapter describes experimental and statistical protocols used to identify NsrR-binding sites in the E. coli chromosome using chromatin immunoprecipitation and microarray analysis. The methods are applicable, with suitable modifications, to any regulatory protein and any organism.
Annals of Statistics | 2007
Sam Efromovich
Regression problems are traditionally analyzed via univariate characteristics like the regression function, scale function and marginal density of regression errors. These characteristics are useful and informative whenever the association between the predictor and the response is relatively simple. More detailed information about the association can be provided by the conditional density of the response given the predictor. For the first time in the literature, this article develops the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities. The study of fixed design regression is of special interest and novelty because the known literature is devoted to the case of random predictors. For the aforementioned models, the paper suggests a universal adaptive estimator which (i) matches performance of an oracle that knows both an underlying model and an estimated conditional density; (ii) is sharp minimax over a vast class of anisotropic conditional densities; (iii) is at least rate minimax when the response is independent of the predictor and thus a bivariate conditional density becomes a univariate density; (iv) is adaptive to an underlying design (fixed or random) of predictors.
Sequential Analysis | 2007
Sam Efromovich
The paper considers, for the first time in the literature, sharp minimax design of predictors and sharp minimax sequential estimation of regression functions in a classical heteroscedastic nonparametric regression. The suggested methodology of a sharp minimax design of predictors in controlled regression experiments with fixed-size samples is based on minimization of the coefficient of difficulty of an underlying regression model, which is defined as a factor in changing the sample size that makes the estimation problem comparable with estimation in a homoscedastic regression with unit-variance errors and uniform design. It is established that an optimal design density is proportional to an underlying scale function. This makes a sequential design of predictors, based on a sequential choice of design densities, a particularly attractive statistical strategy that allows the statistician to minimize the coefficient of difficulty during a controlled experiment. It is shown that a developed sequential design of predictors asymptotically matches the mean integrated squared error (MISE) of an oracle that knows the optimal design (the scale function). Another considered problem is a sequential estimation of an underlying regression function with a preassigned MISE and sharp minimax mean stopping time in a heteroscedastic regression setting where predictors are generated according to some (in general unknown) underlying design density. For this setting the theory of a sharp minimax sequential estimation of a regression function is developed and a sharp minimax sequential estimator is suggested. Finally, a problem of sequential estimation of a regression function based on a sequential design is considered. A procedure of sequential analysis of heteroscedastic regression is suggested that matches the performance of an oracle that knows the optimal design density and optimal stopping time, which implies a preassigned MISE of estimation of an underlying regression function. A discussion of possible extensions and further developments is also presented.
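As an illustration of the design result in the abstract above, the following Python sketch compares a uniform design with a design density proportional to the scale function. It assumes that the coefficient of difficulty has the form d = ∫ σ²(x)/h(x) dx over [0, 1], with scale function σ and design density h; this formula is an assumption, since the abstract does not state it, and the scale function `sigma` used here is hypothetical.

```python
def integrate(f, n=10_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    step = 1.0 / n
    return sum(f((i + 0.5) * step) for i in range(n)) * step


def coefficient_of_difficulty(sigma, h):
    """Assumed form d = integral of sigma(x)^2 / h(x) over [0, 1]."""
    return integrate(lambda x: sigma(x) ** 2 / h(x))


def sigma(x):
    """Hypothetical scale function, chosen only for illustration."""
    return 0.5 + x


# Normalizing constant that turns sigma into a density on [0, 1].
norm = integrate(sigma)


def h_uniform(x):
    """Uniform design density."""
    return 1.0


def h_proportional(x):
    """Design density proportional to the scale function."""
    return sigma(x) / norm


d_uniform = coefficient_of_difficulty(sigma, h_uniform)
d_proportional = coefficient_of_difficulty(sigma, h_proportional)

print(f"uniform design:      d = {d_uniform:.4f}")
print(f"proportional design: d = {d_proportional:.4f}")
```

Under the assumed form of d, the Cauchy-Schwarz inequality gives the lower bound (∫ σ(x) dx)², which is attained by the proportional design; the numerical values agree with that bound.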
Annals of Statistics | 2008
Sam Efromovich
The theory of adaptive estimation and oracle inequalities for the case of Gaussian-shift-finite-interval experiments has made significant progress in recent years. In particular, sharp-minimax adaptive estimators and exact exponential-type oracle inequalities have been suggested for a vast set of functions including analytic and Sobolev with any positive index as well as for Efromovich-Pinsker and Stein blockwise-shrinkage estimators. Is it possible to obtain similar results for a more interesting applied problem of density estimation and/or the dual problem of characteristic function estimation? The answer is yes. In particular, the obtained results include exact exponential-type oracle inequalities which make it possible to consider, for the first time in the literature, a simultaneous sharp-minimax estimation of Sobolev densities with any positive index (not necessarily larger than 1/2), infinitely differentiable densities (including analytic, entire and stable), as well as of not absolutely integrable characteristic functions. The same adaptive estimator is also rate minimax over a familiar class of distributions with bounded spectrum where the density and the characteristic function can be estimated with the parametric rate.
Statistical Applications in Genetics and Molecular Biology | 2008
Sam Efromovich; Laura Kubatko
The relationship between speciation times and the corresponding times of gene divergence is of interest in phylogenetic inference as a means of understanding the past evolutionary dynamics of populations and of estimating the timing of speciation events. It has long been recognized that gene divergence times might substantially pre-date speciation events. Although the distribution of the difference between these has previously been studied for the case of two populations, this distribution has not been explicitly computed for larger species phylogenies. Here we derive a simple method for computing this distribution for trees of arbitrary size. A two-stage procedure is proposed which (i) considers the probability distribution of the time from the speciation event at the root of the species tree to the gene coalescent time conditionally on the number of gene lineages available at the root; and (ii) calculates the probability mass function for the number of gene lineages at the root. This two-stage approach dramatically simplifies numerical analysis, because in the first step the conditional distribution does not depend on an underlying species tree, while in the second step the pattern of gene coalescence prior to the species tree root is irrelevant. In addition, the algorithm provides intuition concerning the properties of the distribution with respect to the various features of the underlying species tree. The methodology is complemented by developing probabilistic formulae and software, written in R. The method and software are tested on five-taxon species trees with varying levels of symmetry. The examples demonstrate that more symmetric species trees tend to have larger mean coalescent times and are more likely to have a unimodal gamma-like distribution with a long right tail, while asymmetric trees tend to have smaller mean coalescent times with an exponential-like distribution. In addition, species trees with longer branches generally have shorter mean coalescent times, with branches closest to the root of the tree being most influential.
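The two-stage idea in the abstract above can be sketched in Python as a mixture over the number of lineages at the root. This is only an illustration and not the authors' R software. It assumes a standard Kingman coalescent above the root with time measured in coalescent units, so that with k lineages the time to the common ancestor is a sum of independent exponential waiting times with rates C(j, 2) for j = k, ..., 2. The probability mass function `root_pmf` is hypothetical; in the paper it is computed from the species tree.

```python
import random
from math import comb


def mean_time_given_k(k):
    """Stage (i): mean time to coalescence of k lineages, equal to 2 * (1 - 1/k)."""
    return sum(1.0 / comb(j, 2) for j in range(2, k + 1))


def sample_time_given_k(k, rng):
    """Stage (i): one draw of the coalescence time of k lineages."""
    return sum(rng.expovariate(comb(j, 2)) for j in range(2, k + 1))


def mixture_mean(pmf):
    """Combine both stages: average the conditional means over the pmf of k."""
    return sum(p * mean_time_given_k(k) for k, p in pmf.items())


def sample_mixture(pmf, rng):
    """Draw k from the pmf (stage ii), then the time given k (stage i)."""
    ks = list(pmf)
    k = rng.choices(ks, weights=[pmf[j] for j in ks])[0]
    return sample_time_given_k(k, rng)


# Hypothetical pmf of the number of gene lineages at the species-tree root.
root_pmf = {2: 0.5, 3: 0.3, 4: 0.15, 5: 0.05}

rng = random.Random(0)
draws = [sample_mixture(root_pmf, rng) for _ in range(20_000)]
simulated_mean = sum(draws) / len(draws)

print(f"exact mixture mean: {mixture_mean(root_pmf):.4f}")
print(f"simulated mean:     {simulated_mean:.4f}")
```

Because the conditional distribution in stage (i) does not involve the species tree, only `root_pmf` would need to be recomputed for a different tree.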
Journal of Multivariate Analysis | 2011
Sam Efromovich
The problem of nonparametric estimation of the joint probability density of a vector of continuous and ordinal/nominal categorical random variables with bounded support is considered. There are numerous publications devoted to the cases of either continuous or categorical variables, and the curse of dimensionality and strong regularity assumptions are the two familiar issues in the literature. Mixed variables occur in practically all applications of statistical science and, nonetheless, the literature devoted to joint density estimation is almost nonexistent. This paper develops the theory of estimation of the density of mixed variables which is on par with results known for simpler settings. Specifically, a data-driven estimator is developed that adapts to unknown anisotropic smoothness of the joint density and, whenever the density depends on a smaller number of variables, performs a dimension reduction that implies the corresponding optimal rate of the mean integrated squared error (MISE) convergence. The results hold without the minimal regularity assumptions that are traditional in the density estimation literature, such as differentiability or continuity of the density. The procedure of estimation is based on mimicking an oracle-estimator that knows the underlying density, and the main theoretical result is the oracle inequality which relates the MISEs of the estimator and the oracle-estimator. The proof is based on a new exponential inequality for Sobolev statistics which is of interest on its own merits.
Journal of Biometrics & Biostatistics | 2013
Sam Efromovich; Ekaterina Smirnova
The paper describes the theory, methods and application of statistical analysis of large-p-small-n cross-correlation matrices arising in fMRI studies of neuroplasticity, which is the ability of the brain to reorganize neural pathways based on new experience and change in learning. Traditionally these studies are based on averaging images over large areas in right and left hemispheres and then finding a single cross-correlation function. It is proposed to conduct such an analysis on a voxel-to-voxel level, which immediately yields large cross-correlation matrices. Furthermore, the matrices have the interesting property of containing both sparse and dense rows and columns. The main steps in solving the problem are: (i) treat observations, available for a single voxel, as a nonparametric regression; (ii) use a wavelet transform and then work with empirical wavelet coefficients; (iii) develop the theory and methods of adaptive simultaneous confidence intervals and adaptive rate-minimax thresholding estimation for the matrices. The developed methods are illustrated via analysis of fMRI experiments, and the results allow us not only to conclude that during fMRI experiments there is a change in cross-correlation between left and right hemispheres (a fact well known in the literature), but also to enrich our understanding of how neural pathways are activated and then remain activated in time on a single voxel-to-voxel level.
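Steps (ii) and (iii) of the abstract above can be illustrated with a small Python sketch: an orthonormal Haar transform of each voxel series, a cross-correlation matrix computed from the detail coefficients, and hard thresholding of its entries. This is a simplified stand-in and not the paper's procedure. The Haar basis, the fixed threshold `LEVEL`, and the toy voxel series are all assumptions made for illustration; the adaptive thresholds and confidence intervals developed in the paper are not reproduced.

```python
from math import sqrt


def haar_dwt(signal):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Returns the scaling coefficient followed by detail coefficients, coarse to fine.
    """
    n = len(signal)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / sqrt(2) for a, b in pairs] + details
        approx = [(a + b) / sqrt(2) for a, b in pairs]
    return approx + details


def detail_correlation(u, v):
    """Correlation computed from Haar detail coefficients only.

    The scaling coefficient is dropped, which removes the mean of each series.
    """
    du, dv = haar_dwt(u)[1:], haar_dwt(v)[1:]
    dot = sum(a * b for a, b in zip(du, dv))
    return dot / sqrt(sum(a * a for a in du) * sum(b * b for b in dv))


def cross_correlation_matrix(left, right):
    """Entry (i, j) is the correlation between left voxel i and right voxel j."""
    return [[detail_correlation(u, v) for v in right] for u in left]


def hard_threshold(matrix, level):
    """Keep entries whose absolute value reaches the level; zero the rest."""
    return [[r if abs(r) >= level else 0.0 for r in row] for row in matrix]


# Toy voxel time series (hypothetical data, length 8).
left_voxels = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
]
right_voxels = [
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0],
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
]

LEVEL = 0.5  # hand-picked threshold, not the paper's adaptive choice

raw = cross_correlation_matrix(left_voxels, right_voxels)
sparse = hard_threshold(raw, LEVEL)
for row in sparse:
    print([round(r, 3) for r in row])
```

In this toy example the weak off-diagonal correlations are set to zero while the strong diagonal ones survive, which is the kind of sparsity that thresholding is meant to expose.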
Sequential Analysis | 2015
Sam Efromovich
Stein (1945) proposed a two-stage sequential methodology of inference that influenced numerous areas of statistics. In this article, Stein's methodology is used and extended to nonparametric estimation of a directional probability density. The aim is to propose a data-driven nonparametric sequential procedure that mimics the performance of an oracle that knows the smoothness of an estimated density and minimizes the mean stopping time given an assigned mean integrated squared error. For such a setting, using a sequential estimator is a must because the smoothness of the density is unknown. It is known that for a general random variable the stated problem has no solution because an estimator cannot perform as well as the minimax oracle. At the same time, this article shows that for the case of directional density, under a mild assumption, there exists a data-driven two-stage sequential procedure that is minimax and adapts to unknown smoothness of an underlying density. Furthermore, we are able to solve the same problem for a setting where some observations in a sample may be lost due to a stochastic missing mechanism. This is a practically important and theoretically interesting problem because for missing data even the above-mentioned oracle cannot solve it without knowing the missing mechanism.
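For readers unfamiliar with the two-stage idea, the Python sketch below shows it in the classical parametric setting of a fixed-width confidence interval for a normal mean, not in the directional density setting studied in the article. The function name, the pilot data, and the half-width are hypothetical, and the Student t critical value is supplied by the caller rather than computed.

```python
from math import floor
from statistics import variance


def stein_total_sample_size(pilot, half_width, t_crit):
    """Total sample size for a two-stage fixed-width interval for a normal mean.

    Stage 1: estimate the variance from the pilot sample.
    Stage 2: choose the total size so that the interval
    mean +/- half_width has the coverage implied by t_crit,
    never going below the pilot size.
    """
    n0 = len(pilot)
    s2 = variance(pilot)  # unbiased sample variance from the first stage
    needed = floor(t_crit ** 2 * s2 / half_width ** 2) + 1
    return max(n0, needed)


# Hypothetical first-stage (pilot) observations.
pilot = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Two-sided 95% Student t critical value with 9 degrees of freedom.
T_CRIT_9DF_95 = 2.262

n_narrow = stein_total_sample_size(pilot, half_width=1.0, t_crit=T_CRIT_9DF_95)
n_wide = stein_total_sample_size(pilot, half_width=5.0, t_crit=T_CRIT_9DF_95)

print("total sample size for half-width 1.0:", n_narrow)
print("total sample size for half-width 5.0:", n_wide)
```

When the requested interval is wide enough, the pilot sample already suffices and no second-stage observations are taken.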
Sequential Analysis | 2012
Sam Efromovich
Regression for data with randomly missed responses is a well-known and complicated statistical problem. This article, for the first time in the literature, explores the asymptotic theory of sharp minimax sequential estimation of a regression function for two classical settings. The former is when the design of predictors is random and an expected stopping time (or its moment) is bounded, and the latter is when the sample size is fixed and predictors can be chosen sequentially to attenuate effects of heteroscedasticity and missing responses. For the former setting it is shown that sequential estimation cannot outperform a design with a fixed sample size. This conclusion extends the famous single-parameter result of Anscombe (1952) to nonparametric regression (infinite-dimensional parameter) with missing data. For the latter setting a sequential design of predictors is proposed that allows the statistician to match the performance of a sharp minimax oracle-estimator that knows all nuisance functions and parameters, including the scale function, the conditional probability of missing the response given the predictor, and the smoothness of the estimated regression. A numerical study is presented.
Communications in Statistics - Theory and Methods | 2009
Sam Efromovich
Multiwavelets provide better spatial and temporal resolution than uniwavelets. Nonetheless, they are practically unknown to statisticians and practitioners, and the statistical literature on multiwavelets is almost nonexistent. This article is devoted to the theory and applications of multiwavelets in microarray analysis and functional magnetic resonance imaging.