Publication


Featured research published by Donald B. Rubin.


The American Statistician | 1985

Constructing a Control Group Using Multivariate Matched Sampling Methods That Incorporate the Propensity Score

Paul R. Rosenbaum; Donald B. Rubin

Matched sampling is a method for selecting units from a large reservoir of potential controls to produce a control group of modest size that is similar to a treated group with respect to the distribution of observed covariates. We illustrate the use of multivariate matching methods in an observational study of the effects of prenatal exposure to barbiturates on subsequent psychological development. A key idea is the use of the propensity score as a distinct matching variable.
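As a hedged illustration of the kind of procedure this abstract describes, the sketch below performs greedy Mahalanobis-metric matching within propensity-score calipers. The function name, data layout, and caliper width are assumptions for illustration, not the paper's implementation.

```python
# Sketch: greedy Mahalanobis-metric matching within propensity-score
# calipers. All names and the caliper value are illustrative assumptions.
import numpy as np

def match_controls(X_t, X_c, ps_t, ps_c, caliper=0.1):
    """Greedily pick one control per treated unit.

    X_t, X_c   -- covariate matrices for treated / control units
    ps_t, ps_c -- estimated propensity scores for each unit
    caliper    -- maximum allowed propensity score difference
    """
    VI = np.linalg.inv(np.cov(np.vstack([X_t, X_c]).T))  # pooled covariance
    available = set(range(len(X_c)))
    pairs = {}
    for i in np.argsort(ps_t)[::-1]:           # hardest-to-match first
        ok = [j for j in available if abs(ps_t[i] - ps_c[j]) <= caliper]
        if not ok:
            continue                           # no control inside the caliper
        d = X_c[ok] - X_t[i]
        dist = np.einsum('ij,jk,ik->i', d, VI, d)  # Mahalanobis distances
        j = ok[int(np.argmin(dist))]
        pairs[i] = j
        available.remove(j)                    # match without replacement
    return pairs
```

The caliper on the score keeps each pair comparable on the strongest dimension of group separation, while the Mahalanobis step tunes the match on the remaining covariates.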


Journal of the American Statistical Association | 1996

Identification of Causal Effects Using Instrumental Variables

Joshua D. Angrist; Guido W. Imbens; Donald B. Rubin

We outline a framework for causal inference in settings where assignment to a binary treatment is ignorable, but compliance with the assignment is not perfect so that the receipt of treatment is nonignorable. To address the problems associated with comparing subjects by the ignorable assignment—an “intention-to-treat analysis”—we make use of instrumental variables, which have long been used by economists in the context of regression models with constant treatment effects. We show that the instrumental variables (IV) estimand can be embedded within the Rubin Causal Model (RCM) and that under some simple and easily interpretable assumptions, the IV estimand is the average causal effect for a subgroup of units, the compliers. Without these assumptions, the IV estimand is simply the ratio of intention-to-treat causal estimands with no interpretation as an average causal effect. The advantages of embedding the IV approach in the RCM are that it clarifies the nature of critical assumptions needed for a...
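In common potential-outcomes notation (assignment Z, treatment received D, outcome Y), the ratio of intention-to-treat estimands the abstract refers to is the Wald estimand:

\[
\text{IV} \;=\; \frac{E[Y \mid Z=1] - E[Y \mid Z=0]}{E[D \mid Z=1] - E[D \mid Z=0]},
\]

which, under the paper's assumptions (ignorable assignment, the exclusion restriction, a nonzero effect of assignment on receipt, and no defiers), equals the average causal effect of D on Y for the compliers.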


Journal of the American Statistical Association | 1984

Reducing Bias in Observational Studies Using Subclassification on the Propensity Score

Paul R. Rosenbaum; Donald B. Rubin

The propensity score is the conditional probability of assignment to a particular treatment given a vector of observed covariates. Previous theoretical arguments have shown that subclassification on the propensity score will balance all observed covariates. Subclassification on an estimated propensity score is illustrated, using observational data on treatments for coronary artery disease. Five subclasses defined by the estimated propensity score are constructed that balance 74 covariates, and thereby provide estimates of treatment effects using direct adjustment. These subclasses are applied within sub-populations, and model-based adjustments are then used to provide estimates of treatment effects within these sub-populations. Two appendixes address theoretical issues related to the application: the effectiveness of subclassification on the propensity score in removing bias, and balancing properties of propensity scores with incomplete data.
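In standard notation, the propensity score and the direct-adjustment estimator over K subclasses can be written as

\[
e(x) = \Pr(Z = 1 \mid X = x), \qquad
\hat{\tau} = \sum_{k=1}^{K} \frac{n_k}{N}\,\bigl(\bar{y}_{1k} - \bar{y}_{0k}\bigr),
\]

where subclass k contains n_k of the N units and \(\bar{y}_{zk}\) is the mean outcome under treatment z in subclass k; the study described above uses K = 5.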


Annals of Internal Medicine | 1997

Estimating Causal Effects from Large Data Sets Using Propensity Scores

Donald B. Rubin

Many observational studies based on large databases attempt to estimate the causal effects of some new treatment or exposure relative to a control condition, such as the effect of smoking on mortality. In most such studies, it is necessary to control for naturally occurring systematic differences in background characteristics between the treatment group and the control group, such as age or sex distributions, that would not occur in the context of a randomized experiment. Typically, many background characteristics need to be controlled. Propensity score technology, introduced by Rosenbaum and Rubin [1], addresses this situation by reducing the entire collection of background characteristics to a single composite characteristic that appropriately summarizes the collection. This reduction from many characteristics to one composite characteristic allows the straightforward assessment of whether the treatment and control groups overlap enough with respect to background characteristics to allow a sensible estimation of treatment versus control effects from the data set. Moreover, when such overlap is present, the propensity score approach allows a straightforward estimation of treatment versus control effects that reflects adjustment for differences in all observed background characteristics.

Subclassification on One Confounding Variable

Before describing the use of propensity scores in the statistical analysis of observational studies with many confounding background characteristics, I begin with an example showing how subclassification adjusts for a single confounding covariate, such as age, in a study of smoking and mortality. I then show how propensity score methods generalize subclassification in the presence of many confounding covariates, such as age, region of the country, and sex.

The potential for a large database to suggest causal effects of treatments is indicated in Table 1, adapted from Cochran's work [2], which concerns mortality rates per 1000 person-years for nonsmokers, cigarette smokers, and cigar and pipe smokers drawn from three large databases in the United States, the United Kingdom, and Canada. The treatment factor here involves three levels of smoking. The unadjusted mortality rates in Table 1 make it seem that cigarette smoking is good for health, especially relative to cigar and pipe smoking; clearly, this result is contrary to current wisdom. A problem with this naive conclusion is exposed in Table 1, where the average ages of the subpopulations are given. Age correlates with both mortality rates and smoking behavior. In this example, age is a confounding covariate, and conclusions about the effects of smoking should be adjusted for its effects.

Table 1. Comparison of Mortality Rates for Three Smoking Groups in Three Databases

A straightforward way of adjusting for age is to 1) divide the population into age categories of approximately equal size (such as younger and older if two categories are appropriate; younger, middle-aged, and older if three are appropriate; and so on), 2) compare mortality rates within an age category (for example, compare mortality rates for the three treatment groups within the younger population and similarly for the older population), and 3) average the age-group-specific comparisons to obtain overall estimates of the age-adjusted mortality rates per 1000 person-years for each of the three groups.
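A minimal sketch of these three steps, assuming unit-record arrays and nine subclasses (the published analysis works from grouped mortality tables, so the names and layout here are illustrative):

```python
# Sketch of the three-step age adjustment described above.
# age, dead, group, baseline are assumed inputs, not Cochran's data.
import numpy as np

def age_adjusted_rate(age, dead, group, baseline, n_sub=9):
    """Mortality rate for one smoking group, standardized to the
    age distribution of the baseline (nonsmoker) group.

    age, dead -- age and death indicator for every person
    group     -- boolean mask selecting the smoking group of interest
    baseline  -- boolean mask selecting the nonsmokers
    """
    # 1) age categories holding equal numbers of baseline subjects
    cuts = np.quantile(age[baseline], np.linspace(0, 1, n_sub + 1))[1:-1]
    sub = np.digitize(age[group], cuts)
    # 2) mortality rate within each age category ...
    rates = [dead[group][sub == k].mean() for k in range(n_sub)]
    # 3) ... averaged with equal weights, i.e. standardized to the
    #    baseline group's equal subclass counts
    return float(np.mean(rates))
```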
Table 1 shows the results for different numbers of age categories, where the age boundaries of the subclasses were defined to have equal numbers of nonsmokers in each subclass. These results align better than the unadjusted mortality rates with our current understanding of the effects of smoking, especially when 9 to 11 subclasses are used. Incidentally, having approximately equal numbers of nonsmokers within each subclass is not necessary, but if the nonsmokers are considered the baseline group, it is a convenient and efficient choice, because then the overall estimated effect is the simple unweighted average of the subclass-specific results. That is, the mortality rates in all three groups are being standardized [3] to the age distribution of nonsmokers as defined by their subclass counts.

Cochran [2] calls this method subclassification and offers theoretical results showing that as long as the treatment and exposure groups overlap in their age distributions (that is, as long as a reasonable number of persons from each treatment group are in each subclass), comparisons using five or six subclasses will typically remove 90% or more of the bias present in the raw comparisons shown in Table 1. More than five subclasses were used for the adjusted mortality rates because the large size of the data sets made it possible to do so.

A particular statistical model, such as a linear regression (or a logistic regression model, or, in other settings, a hazard model), could have been used to adjust for age, but subclassification has two distinct advantages over such models, at least for offering initial trustworthy comparisons that are easy to communicate. First, if the treatment or exposure groups do not adequately overlap on the confounding covariate age, the investigator will see it immediately and be warned. Thus, if members of one group have ages outside the range of another group's ages, it will be obvious, because one or more age-specific subclasses will consist almost solely of members exposed to one treatment. In contrast, nothing in the standard output of any regression modeling software will display this critical fact; the reason is that models predict an outcome (such as death) from regressors (such as age and treatment indicators), and standard regression diagnostics do not include careful analysis of the joint distribution of the regressors (such as a comparison of the distributions of age across treatment groups). When the overlap on age is too limited, the database, no matter how large, cannot support any causal conclusions about the differential effects of the treatments. For example, comparing 5-year survival rates among 70-year-old smokers and 40-year-old nonsmokers gives essentially no information about the effect of smoking or nonsmoking for either 70-year-old or 40-year-old persons.

The second reason for preferring subclassification to models concerns situations such as that found in Table 1, in which the groups overlap enough on the confounding covariate to make a comparison possible. Subclassification does not rely on any particular functional form, such as linearity, for the relation between the outcome (death) and the covariate (age) within each treatment group, whereas models do.
If the groups have similar distributions of the covariate, specific assumptions such as linearity are usually harmless, but when the groups have different covariate distributions, model-based methods of adjustment depend on the specific form of the model (for example, linearity or log linearity), and their results are determined by untrustworthy extrapolations.

If standard models can be so dangerous, why are they commonly used for such adjustments when large databases are examined for estimates of causal effects? One reason is the ease with which automatic data analysis can be done using existing, pervasive software on plentiful, speedy hardware. A second reason is the seeming difficulty of using subclassification when many confounding covariates need adjustment, which is the common case. Standard modeling software can automatically handle many regressor variables and produce results, although those results can be remarkably misleading. With many confounding covariates, however, the issues of lack of adequate overlap and reliance on untrustworthy model-based extrapolations are even more serious than with only one confounding covariate. The reason is that small differences in many covariates can accumulate into a substantial overall difference. For example, if members of one treatment or exposure group are slightly older, have slightly higher cholesterol levels, and have slightly more familial history of cancer, that group may be substantially less healthy. Moreover, although standard comparisons between the groups of means (like those in Table 1) or of histograms for each confounding covariate are adequate with one covariate, they are inadequate with more than one. The groups may differ in a multivariate direction to an extent that cannot be discerned from separate analyses of each covariate. This multivariate direction is closely related to the statistical concept of the best linear discriminant and intuitively is the single combination of the covariates on which the treatment groups are farthest apart.

Subclassification techniques can be applied with many covariates with almost the same reliability as with only one covariate. The key idea is to use propensity score techniques, as developed by Rosenbaum and Rubin [1]. These methods can be viewed as important extensions of discriminant matching techniques, which calculate the best linear discriminant between the treatment groups and match on it [4]. Since their introduction approximately 15 years ago, propensity score methods have been used in various applied problems in medical and other research disciplines [5-23], but not nearly as frequently as they should have been relative to model-based methods.

Propensity Score Methods

Propensity score methods must be applied to groups two at a time. Therefore, an example with three treatment or exposure conditions will generally yield three distinct propensity scores, one for each comparison (for the example in Table 1, nonsmokers compared with cigarette smokers, nonsmokers compared with cigar and pipe smokers, and cigarette smokers compared with cigar and pipe smokers). To describe the way propensity scores work, I first assume two treatment conditions; cases with more than two treatment groups are considered later. The basic idea of propensity score methods is to replace the many confounding covariates with a single scalar summary of them, the propensity score.
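A minimal sketch of that basic step, assuming covariates X and a 0/1 group indicator z (illustrative names and model choice): fit Pr(z = 1 | X) by logistic regression, then subclassify on the fitted score, echoing Cochran's five-subclass guideline.

```python
# Sketch: estimate a propensity score for one pair of groups and
# subclassify on it. X and z are assumed inputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_subclasses(X, z, n_sub=5):
    e_hat = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
    cuts = np.quantile(e_hat, np.linspace(0, 1, n_sub + 1))[1:-1]
    sub = np.digitize(e_hat, cuts)
    # within each subclass, the two groups should be compared on every
    # covariate (overlap check) before any outcome is examined
    return e_hat, sub
```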


Journal of the American Statistical Association | 1996

Multiple Imputation after 18+ Years

Donald B. Rubin

Multiple imputation was designed to handle the problem of missing data in public-use data bases where the data-base constructor and the ultimate user are distinct entities. The objective is valid frequency inference for ultimate users who in general have access only to complete-data software and possess limited knowledge of specific reasons and models for nonresponse. For this situation and objective, I believe that multiple imputation by the data-base constructor is the method of choice. This article first provides a description of the assumed context and objectives, and second, reviews the multiple imputation framework and its standard results. These preliminary discussions are especially important because some recent commentaries on multiple imputation have reflected either misunderstandings of the practical objectives of multiple imputation or misunderstandings of fundamental theoretical results. Then, criticisms of multiple imputation are considered, and, finally, comparisons are made to alt...
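Among the standard results the article reviews are the combining rules: with m imputations yielding estimates \(\hat{Q}_j\) and within-imputation variances \(U_j\) (the squared standard errors), the combined point estimate and total variance are

\[
\bar{Q} = \frac{1}{m}\sum_{j=1}^{m}\hat{Q}_j,\qquad
\bar{U} = \frac{1}{m}\sum_{j=1}^{m}U_j,\qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{Q}_j-\bar{Q}\bigr)^2,\qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B,
\]

where B captures the extra uncertainty due to the missing data.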


Psychological Bulletin | 1992

Comparing correlated correlation coefficients

Xiao-Li Meng; Robert Rosenthal; Donald B. Rubin

The purpose of this article is to provide simple but accurate methods for comparing correlation coefficients between a dependent variable and a set of independent variables. The methods are simple extensions of Dunn & Clark's (1969) work using the Fisher z transformation and include a test and confidence interval for comparing two correlated correlations, a test for heterogeneity, and a test and confidence interval for a contrast among k (>2) correlated correlations. Also briefly discussed is why the traditional Hotelling's t test for comparing correlated correlations is generally not appropriate in practice.
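A sketch of the two-correlation test (up to details given in the article): with \(r_1\) and \(r_2\) the correlations of the dependent variable with each predictor, \(r_x\) the correlation between the two predictors, and N the sample size, apply the Fisher transformation \(z_r = \tfrac12 \ln\frac{1+r}{1-r}\) and refer

\[
Z = \bigl(z_{r_1} - z_{r_2}\bigr)\sqrt{\frac{N-3}{2(1-r_x)\,h}},\qquad
h = \frac{1 - f\bar{r}^{2}}{1-\bar{r}^{2}},\qquad
f = \min\!\Bigl(1,\ \frac{1-r_x}{2(1-\bar{r}^{2})}\Bigr),\qquad
\bar{r}^{2} = \frac{r_1^{2}+r_2^{2}}{2},
\]

to a standard normal distribution.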


Behavioral and Brain Sciences | 1978

Interpersonal expectancy effects: the first 345 studies

Robert Rosenthal; Donald B. Rubin

The research area of interpersonal expectancy effects originally derived from a general consideration of the effects of experimenters on the results of their research. One of these is the expectancy effect, the tendency for experimenters to obtain results they expect, not simply because they have correctly anticipated nature's response but rather because they have helped to shape that response through their expectations. When behavioral researchers expect certain results from their human (or animal) subjects they appear unwittingly to treat them in such a way as to increase the probability that they will respond as expected. In the first few years of research on this problem of the interpersonal (or interorganism) self-fulfilling prophecy, the “prophet” was always an experimenter and the affected phenomenon was always the behavior of an experimental subject. In more recent years, however, the research has been extended from experimenters to teachers, employers, and therapists whose expectations for their pupils, employees, and patients might also come to serve as interpersonal self-fulfilling prophecies. Our general purpose is to summarize the results of 345 experiments investigating interpersonal expectancy effects. These studies fall into eight broad categories of research: reaction time, inkblot tests, animal learning, laboratory interviews, psychophysical judgments, learning and ability, person perception, and everyday life situations. For the entire sample of studies, as well as for each specific research area, we (1) determine the overall probability that interpersonal expectancy effects do in fact occur, (2) estimate their average magnitude so as to evaluate their substantive and methodological importance, and (3) illustrate some methods that may be useful to others wishing to summarize quantitatively entire bodies of research (a practice that is, happily, on the increase).
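One standard device for step (1), stated here only as a sketch and not necessarily the exact procedure of this article, is the Stouffer method long associated with Rosenthal's meta-analytic work: convert each study's one-tailed p-value to a standard normal deviate \(z_i\) and refer

\[
Z = \frac{\sum_{i=1}^{k} z_i}{\sqrt{k}}
\]

to a standard normal distribution, where k is the number of studies combined.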


Biometrics | 1996

Matching Using Estimated Propensity Scores: Relating Theory to Practice

Donald B. Rubin; Neal Thomas

Matched sampling is a standard technique in the evaluation of treatments in observational studies. Matching on estimated propensity scores comprises an important class of procedures when there are numerous matching variables. Recent theoretical work (Rubin, D. B. and Thomas, N., 1992, The Annals of Statistics 20, 1079-1093) on affinely invariant matching methods with ellipsoidal distributions provides a general framework for evaluating the operating characteristics of such methods. Moreover, Rubin and Thomas (1992, Biometrika 79, 797-809) uses this framework to derive several analytic approximations under normality for the distribution of the first two moments of the matching variables in samples obtained by matching on estimated linear propensity scores. Here we provide a bridge between these theoretical approximations and actual practice. First, we complete and refine the normal-based analytic approximations, thereby making it possible to apply these results to practice. Second, we perform Monte Carlo evaluations of the analytic results under normal and nonnormal ellipsoidal distributions, which confirm the accuracy of the analytic approximations, and demonstrate the predictable ways in which the approximations deviate from simulation results when normal assumptions are violated within the ellipsoidal family. Third, we apply the analytic approximations to real data with clearly nonellipsoidal distributions, and show that the theoretical expressions, although derived under artificial distributional conditions, produce useful guidance for practice. Our results delineate the wide range of settings in which matching on estimated linear propensity scores performs well, thereby providing useful information for the design of matching studies. When matching with a particular data set, our theoretical approximations provide benchmarks for expected performance under favorable conditions, thereby identifying matching variables requiring special treatment. After matching is complete and data analysis is at hand, our results provide the variances required to compute valid standard errors for common estimators.
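A sketch, not the paper's implementation, of what matching on the estimated linear propensity score means in practice: match on the logit of the fitted probability, the scale on which the Rubin-Thomas approximations are stated. X and z are assumed inputs, and controls are assumed to outnumber treated units.

```python
# Sketch: greedy 1:1 matching on the estimated linear propensity score.
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_on_linear_score(X, z):
    p = LogisticRegression(max_iter=1000).fit(X, z).predict_proba(X)[:, 1]
    score = np.log(p / (1 - p))              # linear propensity score
    treated = np.where(z == 1)[0]
    controls = list(np.where(z == 0)[0])
    pairs = []
    for i in treated[np.argsort(score[treated])[::-1]]:  # high scores first
        j = min(controls, key=lambda c: abs(score[c] - score[i]))
        pairs.append((i, j))
        controls.remove(j)                   # match without replacement
    return pairs
```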


Health Services and Outcomes Research Methodology | 2001

Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation

Donald B. Rubin

Propensity score methodology can be used to help design observational studies in a way analogous to the way randomized experiments are designed: without seeing any answers involving outcome variables. The typical models used to analyze observational data (e.g., least squares regressions, difference of difference methods) involve outcomes, and so cannot be used for design in this sense. Because the propensity score is a function only of covariates, not outcomes, repeated analyses attempting to balance covariate distributions across treatment groups do not bias estimates of the treatment effect on outcome variables. This theme is the primary focus of this article: how to use the techniques of matching, subclassification and/or weighting to help design observational studies. The article also proposes a new diagnostic table to aid in this endeavor, which is especially useful when there are many covariates under consideration. The conclusion of the initial design phase may be that the treatment and control groups are too far apart to produce reliable effect estimates without heroic modeling assumptions. In such cases, it may be wisest to abandon the intended observational study, and search for a more acceptable data set where such heroic modeling assumptions are not necessary. The ideas and techniques will be illustrated using the initial design of an observational study for use in the tobacco litigation based on the NMES data set.
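A sketch of one ingredient of such a diagnostic (the article's actual table is richer): standardized differences in covariate means, computed from covariates only, never outcomes. X_t and X_c are assumed treated/control covariate matrices.

```python
# Sketch: standardized mean differences, one value per covariate.
import numpy as np

def standardized_differences(X_t, X_c):
    """Mean differences in units of the pooled within-group SD (x100)."""
    pooled_sd = np.sqrt((X_t.var(axis=0, ddof=1) +
                         X_c.var(axis=0, ddof=1)) / 2)
    return 100 * (X_t.mean(axis=0) - X_c.mean(axis=0)) / pooled_sd
```

Recomputing this table after each round of matching, subclassification, or weighting shows whether balance is improving, with no outcome data in sight.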


Sociological Methods & Research | 1989

The Analysis of Social Science Data with Missing Values

Roderick J. A. Little; Donald B. Rubin

Methods for handling missing data in social science data sets are reviewed. Limitations of common practical approaches, including complete-case analysis, available-case analysis and imputation, are illustrated on a simple missing-data problem with one complete and one incomplete variable. Two more principled approaches, namely maximum likelihood under a model for the data and missing-data mechanism and multiple imputation, are applied to the bivariate problem. General properties of these methods are outlined, and applications to more complex missing-data problems are discussed. The EM algorithm, a convenient method for computing maximum likelihood estimates in missing-data problems, is described and applied to two common models, the multivariate normal model for continuous data and the multinomial model for discrete data. Multiple imputation under explicit or implicit models is recommended as a method that retains the advantages of imputation and overcomes its limitations.
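A compact sketch of the EM algorithm for the bivariate case described above (y1 complete, y2 incomplete), assuming the data are missing at random; variable names and the fixed iteration count are illustrative.

```python
# Sketch: EM for a bivariate normal with y2 missing for some units.
import numpy as np

def em_bivariate_normal(y1, y2, n_iter=100):
    obs = ~np.isnan(y2)
    # start from complete cases
    mu = np.array([y1.mean(), y2[obs].mean()])
    S = np.cov(np.vstack([y1[obs], y2[obs]]))
    for _ in range(n_iter):
        # E-step: fill in E[y2 | y1] for missing units, keeping the
        # conditional variance for the second-moment statistics
        b = S[0, 1] / S[0, 0]
        y2_hat = np.where(obs, y2, mu[1] + b * (y1 - mu[0]))
        cvar = np.where(obs, 0.0, S[1, 1] - b * S[0, 1])
        # M-step: re-estimate the mean and covariance (MLE form)
        mu = np.array([y1.mean(), y2_hat.mean()])
        d1, d2 = y1 - mu[0], y2_hat - mu[1]
        S = np.array([[np.mean(d1 * d1), np.mean(d1 * d2)],
                      [np.mean(d1 * d2), np.mean(d2 * d2) + cvar.mean()]])
    return mu, S
```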

Collaboration


Dive into Donald B. Rubin's collaborations.

Top Co-Authors

Paul R. Rosenbaum

University of Pennsylvania
