Daniel Rubin
United States Department of Health and Human Services
Publications
Featured research published by Daniel Rubin.
The International Journal of Biostatistics | 2006
Mark J. van der Laan; Daniel Rubin
Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of a given likelihood-based density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and might therefore result in a poor estimator of a particular smooth functional of the density. In this article we propose a one-step (and, by iteration, k-th step) targeted maximum likelihood density estimator which involves 1) creating a hardest parametric submodel with parameter epsilon through the given density estimator with score equal to the efficient influence curve of the pathwise differentiable parameter at the density estimator, 2) estimating epsilon with the maximum likelihood estimator, and 3) defining a new density estimator as the corresponding update of the original density estimator. We show that iteration of this algorithm results in a targeted maximum likelihood density estimator which solves the efficient influence curve estimating equation and thereby yields a locally efficient estimator of the parameter of interest, under regularity conditions. In particular, we show that, if the parameter is linear and the model is convex, then the targeted maximum likelihood estimator is often achieved in the first step, and it results in a locally efficient estimator at an arbitrary (e.g., heavily misspecified) starting density. We also show that the targeted maximum likelihood estimators are now in full agreement with the locally efficient estimating function methodology as presented in Robins and Rotnitzky (1992) and van der Laan and Robins (2003), creating, in particular, an algebraic equivalence between the double robust locally efficient estimators that use targeted maximum likelihood estimators as estimates of their nuisance parameters and the targeted maximum likelihood estimators themselves. In addition, it is argued that the targeted MLE has various advantages relative to the current estimating-function-based approach. We proceed by providing data-driven methodologies to select the initial density estimator for the targeted MLE, thereby providing data-adaptive targeted maximum likelihood estimation methodology. We illustrate the method with various worked-out examples.
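The three-step recipe in the abstract is concrete enough to sketch in code. Below is a minimal, illustrative one-step targeted maximum likelihood update for the average treatment effect with a binary outcome, not the paper's general density-estimation algorithm: the simulated data, the logistic initial fits, and the specific clever-covariate fluctuation are assumptions chosen to keep the example short.

```python
# A minimal sketch of one-step TMLE for the average treatment effect (ATE)
# with a binary outcome. The data-generating model and variable names are
# illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
W = rng.normal(size=(n, 2))                           # baseline covariates
A = rng.binomial(1, expit(0.4 * W[:, 0] - 0.3 * W[:, 1]))   # treatment
Y = rng.binomial(1, expit(-0.5 + A + 0.8 * W[:, 0]))        # binary outcome

# 1) Initial (possibly misspecified) estimators of E[Y | A, W] and P(A=1 | W).
Q_fit = LogisticRegression().fit(np.column_stack([A, W]), Y)
g_fit = LogisticRegression().fit(W, A)
Q_AW = Q_fit.predict_proba(np.column_stack([A, W]))[:, 1]
Q_1W = Q_fit.predict_proba(np.column_stack([np.ones(n), W]))[:, 1]
Q_0W = Q_fit.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1]
g_W = g_fit.predict_proba(W)[:, 1]

# 2) Hardest parametric submodel through the initial fit: fluctuate on the
#    logit scale with the "clever covariate" H, whose score at epsilon = 0
#    is the efficient influence curve of the ATE. Estimate epsilon by MLE.
H = A / g_W - (1 - A) / (1 - g_W)

def neg_loglik(eps):
    p = expit(logit(np.clip(Q_AW, 1e-6, 1 - 1e-6)) + eps * H)
    return -np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p))

eps_hat = minimize_scalar(neg_loglik, bounds=(-10, 10), method="bounded").x

# 3) Update the initial fit and evaluate the substitution (plug-in) estimator.
Q1_star = expit(logit(np.clip(Q_1W, 1e-6, 1 - 1e-6)) + eps_hat / g_W)
Q0_star = expit(logit(np.clip(Q_0W, 1e-6, 1 - 1e-6)) - eps_hat / (1 - g_W))
print(f"targeted MLE of the ATE: {np.mean(Q1_star - Q0_star):.3f}")
```

Because the fluctuation submodel's score at epsilon = 0 is the efficient influence curve, the updated plug-in estimator solves the efficient influence curve estimating equation, which is what drives the local efficiency result described above.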
Statistical Applications in Genetics and Molecular Biology | 2006
Daniel Rubin; Sandrine Dudoit; Mark J. van der Laan
Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the most powerful method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true unknown data generating distribution, so could never be used in a practical setting. We instead consider splitting the sample so that the optimal cutoffs are estimated from a portion of the data, and then testing on the remaining data using these estimated cutoffs. When the null distributions for all test statistics are the same, the obvious way to control the expected number of false positives would be to use a common cutoff for all tests. In this work, we consider the common cutoff method as a benchmark multiple testing procedure. We show that in certain circumstances the use of estimated optimal cutoffs via sample splitting can dramatically outperform this benchmark method, resulting in increased true discoveries, while retaining Type-I error control. This paper is an updated version of the work presented in Rubin et al. (2005), later expanded upon by Wasserman and Roeder (2006).
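To make the sample-splitting idea concrete, here is a small simulation sketch under assumed Gaussian test statistics: half the sample is used to estimate per-test alternative means and the cutoffs that maximize the expected number of true positives subject to the expected-false-positive budget, and the other half is tested against both those cutoffs and the common-cutoff benchmark. The closed-form cutoff expression, the plug-in effect estimates, and the simulation settings are illustrative assumptions, not the paper's exact construction.

```python
# A sketch of sample splitting for controlling the expected number of false
# positives (at most k in expectation) with estimated per-test cutoffs.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(1)
m, n, k = 1000, 50, 5                          # tests, samples per test, E[FP] budget
mu = np.where(rng.random(m) < 0.1, 1.0, 0.0)   # 10% of null hypotheses are false
X = rng.normal(mu, 1.0, size=(n, m))

# Split the sample: estimate effects on the first half, test on the second.
X_train, X_test = X[: n // 2], X[n // 2:]
T_train = X_train.mean(axis=0) * np.sqrt(n // 2)       # approx N(delta_i, 1)
T_test = X_test.mean(axis=0) * np.sqrt(n - n // 2)

# Benchmark: a common cutoff c with m * P(N(0,1) > c) = k.
c_common = norm.isf(k / m)

# Estimated "optimal" cutoffs: with estimated alternative means delta_i, the
# cutoffs maximizing the expected number of true positives subject to the
# same null budget take the form c_i = log(lam)/delta_i + delta_i/2, with
# lam chosen so that sum_i P(N(0,1) > c_i) = k.
delta = np.clip(T_train, 0.1, None)            # crude plug-in estimate, floored

def budget(log_lam):
    c = log_lam / delta + delta / 2.0
    return norm.sf(c).sum() - k

log_lam = brentq(budget, -50.0, 50.0)
c_opt = log_lam / delta + delta / 2.0

print("rejections, common cutoff:    ", int((T_test > c_common).sum()))
print("rejections, estimated cutoffs:", int((T_test > c_opt).sum()))
```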
The International Journal of Biostatistics | 2012
Daniel Rubin; Mark J. van der Laan
We discuss using clinical trial data to construct and evaluate rules that use baseline covariates to assign different treatments to different patients. Given such a candidate personalization rule, we first note that its performance can often be evaluated without actually applying the rule to subjects, and we characterize a class of estimators from a statistical efficiency standpoint. We also point out a recently noted reduction of the rule construction problem to a classification task and extend results in this direction. Together these facts suggest a natural form of cross-validation in which a personalized medicine rule is constructed from clinical trial data using standard classification tools and then evaluated in a replicated trial. Because replication is often required by the FDA to provide evidence of safety and efficacy before pharmaceutical drugs can be marketed, there are abundant data with which to explore the potential benefits of more tailored therapy. We constructed and evaluated personalized medicine rules using simulations based on two active-controlled randomized clinical trials of antibacterial drugs for the treatment of skin and skin structure infections. Unfortunately, the results were negative and did not suggest a benefit from personalization. We discuss the implications of this finding and why statistical approaches to personalized medicine problems will often face difficult challenges.
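A minimal sketch of the two ideas in the abstract, under a toy simulation rather than the trial data used in the paper: a treatment rule is built from one simulated randomized trial via an outcome-weighted classification reduction, and its value is then estimated on a second simulated trial with an inverse-probability-weighted estimator, without assigning anyone treatment according to the rule. The classifier, weights, and data-generating model are assumptions for illustration.

```python
# Sketch: build a personalization rule by weighted classification on trial 1,
# evaluate its mean outcome on trial 2 by inverse probability weighting.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

def simulate_trial(n, p_treat=0.5):
    W = rng.normal(size=(n, 2))
    A = rng.binomial(1, p_treat, size=n)
    # Treatment helps when W[:, 0] > 0 and hurts otherwise (qualitative interaction).
    p_cure = 0.5 + 0.2 * np.sign(W[:, 0]) * (2 * A - 1)
    Y = rng.binomial(1, np.clip(p_cure, 0.01, 0.99))
    return W, A, Y

# Trial 1: construct the rule via an outcome-weighted classification reduction
# (treat A as the label and Y / P(A) as the sample weight).
W1, A1, Y1 = simulate_trial(2000)
weights = Y1 / 0.5                                  # known randomization probability
rule = DecisionTreeClassifier(max_depth=2).fit(W1, A1, sample_weight=weights)

# Trial 2: evaluate E[Y under the rule] with an IPW estimator, without
# re-randomizing patients according to the rule.
W2, A2, Y2 = simulate_trial(2000)
d = rule.predict(W2)
value_rule = np.mean(Y2 * (A2 == d) / 0.5)
value_treat_all = np.mean(Y2 * (A2 == 1) / 0.5)
print(f"estimated cure rate under the rule:   {value_rule:.3f}")
print(f"estimated cure rate treating everyone: {value_treat_all:.3f}")
```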
The International Journal of Biostatistics | 2016
Daniel Rubin
The Optimal Discovery Procedure (ODP) is a method for simultaneous hypothesis testing that attempts to gain power relative to more standard techniques by exploiting multivariate structure [1]. Specializing to the example of testing whether components of a Gaussian mean vector are zero, we compare the power of the ODP to a Bonferroni-style method and to the Benjamini-Hochberg method when the testing procedures aim to respectively control certain Type I error rate measures, such as the expected number of false positives or the false discovery rate. We show through theoretical results, numerical comparisons, and two microarray examples that when the rejection regions for the ODP test statistics are chosen such that the procedure is guaranteed to uniformly control a Type I error rate measure, the technique is generally less powerful than competing methods. We contrast and explain these results in light of previously proven optimality theory for the ODP. We also compare the ordering given by the ODP test statistics to the standard ranking based on sorting univariate p-values from smallest to largest. In the cases we considered, the standard ordering was superior, and the ODP rankings were adversely impacted by correlation.
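For intuition, here is a small simulation comparing the ordering induced by a plug-in ODP-style statistic with the standard ordering by univariate p-values in the Gaussian mean setting. The particular plug-in form of the statistic, the Benjamini-Hochberg comparison, and the simulation parameters are assumptions for illustration, not the estimator analyzed in the paper.

```python
# Sketch: contrast an ODP-style ranking with the univariate p-value ranking
# for testing whether components of a Gaussian mean vector are zero.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
m = 2000
mu = np.where(rng.random(m) < 0.1, 2.5, 0.0)      # 10% of means are nonzero
Z = rng.normal(mu, 1.0)                           # one N(mu_i, 1) statistic per test

# Univariate two-sided p-values and the Benjamini-Hochberg procedure at level q.
p = 2 * norm.sf(np.abs(Z))
q = 0.10
order = np.argsort(p)
below = p[order] <= q * np.arange(1, m + 1) / m
n_bh = 0 if not below.any() else below.nonzero()[0].max() + 1

# A plug-in ODP-style statistic: the summed estimated alternative densities
# divided by the summed null densities, evaluated at each observed Z_i.
S = np.array([norm.pdf(z, loc=Z, scale=1.0).sum() / (m * norm.pdf(z)) for z in Z])

# Compare the two rankings among the top n_bh candidates.
top_p = set(order[:n_bh])
top_odp = set(np.argsort(-S)[:n_bh])
print(f"BH rejections at q={q}: {n_bh}")
print(f"overlap between p-value and ODP orderings in the top {n_bh}: "
      f"{len(top_p & top_odp)}")
```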
Archive | 2005
Mark J. van der Laan; Daniel Rubin
Archive | 2007
Mark J. van der Laan; Daniel Rubin
Archive | 2008
Daniel Rubin; Mark J. van der Laan
Archive | 2007
Daniel Rubin; Mark J. van der Laan
The International Journal of Biostatistics | 2010
Daniel Rubin
Archive | 2006
Daniel Rubin; Mark J. van der Laan