Joseph L. Schafer
Pennsylvania State University
Publications
Featured research published by Joseph L. Schafer.
Statistical Science | 2007
Joseph Kang; Joseph L. Schafer
When outcomes are missing for reasons beyond an investigator's control, there are two different ways to adjust a parameter estimate for covariates that may be related both to the outcome and to missingness. One approach is to model the relationships between the covariates and the outcome and use those relationships to predict the missing values. Another is to model the probabilities of missingness given the covariates and incorporate them into a weighted or stratified estimate. Doubly robust (DR) procedures apply both types of model simultaneously and produce a consistent estimate of the parameter if either of the two models has been correctly specified. In this article, we show that DR estimates can be constructed in many ways. We compare the performance of various DR and non-DR estimates of a population mean in a simulated example where both models are incorrect but neither is grossly misspecified. Methods that use inverse probabilities as weights, whether they are DR or not, are sensitive to misspecification of the propensity model when some estimated propensities are small. Many DR methods perform better than simple inverse-probability weighting. None of the DR methods we tried, however, improved upon the performance of simple regression-based prediction of the missing values. This study does not represent every missing-data problem that will arise in practice. But it does demonstrate that, in at least some settings, two wrong models are not better than one.

We congratulate Drs. Kang and Schafer (KS henceforth) for a careful and thought-provoking contribution to the literature regarding the so-called “double robustness” property, a topic that still engenders some confusion and disagreement.
The authors’ approach of focusing on the simplest situation of estimation of the population mean μ of a response y when y is not observed on all subjects according to a missing at random (MAR) mechanism (equivalently, estimation of the mean of a potential outcome in a causal model under the assumption of no unmeasured confounders) is commendable, as the fundamental issues can be explored without the distractions of the messier notation and considerations required in more complicated settings. Indeed, as the article demonstrates, this simple setting is sufficient to highlight a number of key points. As noted eloquently by Molenberghs (2005), in regard to how such missing data/causal inference problems are best addressed, two “schools” may be identified: the “likelihood-oriented” school and the “weighting-based” school. As we have emphasized previously (Davidian, Tsiatis and Leon, 2005), we prefer to view inference from the vantage point of semi-parametric theory, focusing on the assumptions embedded in the statistical models leading to different “types” of estimators (i.e., “likelihood-oriented” or “weighting-based”) rather than on the forms of the estimators themselves. In this discussion, we hope to complement the presentation of the authors by elaborating on this point of view. Throughout, we use the same notation as in the paper.
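The abstract above contrasts three estimators of a population mean under MAR missingness: regression-based prediction, inverse-probability weighting, and a doubly robust combination of the two. As a minimal illustration (a hypothetical simulation in NumPy, not the authors' own simulation design), the following sketch computes all three on data where both working models happen to be correctly specified:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Simulated data: covariate x, outcome y (true mean 2.0), MAR missingness
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + x)))       # P(y observed | x)
r = rng.binomial(1, p_true)                 # r = 1 when y is observed

X = np.column_stack([np.ones(n), x])        # design matrix with intercept

# Outcome model: OLS of y on x among the observed, predictions for everyone
beta, *_ = np.linalg.lstsq(X[r == 1], y[r == 1], rcond=None)
m = X @ beta

# Propensity model: logistic regression of r on x via Newton-Raphson
g = np.zeros(2)
for _ in range(25):
    pi = 1 / (1 + np.exp(-X @ g))
    w = pi * (1 - pi)
    g += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (r - pi))
pi = 1 / (1 + np.exp(-X @ g))

mu_reg = m.mean()                                 # regression prediction
mu_ipw = np.sum(r * y / pi) / np.sum(r / pi)      # inverse-probability weighting
mu_dr = np.mean(m + r * (y - m) / pi)             # doubly robust (AIPW form)
```

Consistent with the paper's finding, `mu_ipw` is the estimator that degrades sharply when the propensity model is misspecified and some estimated propensities are small, while the regression prediction `mu_reg` can remain comparatively stable.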
Structural Equation Modeling | 2007
Stephanie T. Lanza; Linda M. Collins; David R. Lemmon; Joseph L. Schafer
Latent class analysis (LCA) is a statistical method used to identify a set of discrete, mutually exclusive latent classes of individuals based on their responses to a set of observed categorical variables. In multiple-group LCA, both the measurement part and structural part of the model can vary across groups, and measurement invariance across groups can be empirically tested. LCA with covariates extends the model to include predictors of class membership. In this article, we introduce PROC LCA, a new SAS procedure for conducting LCA, multiple-group LCA, and LCA with covariates. The procedure is demonstrated using data on alcohol use behavior in a national sample of high school seniors.
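The latent class model that PROC LCA fits can be illustrated outside SAS. Below is a hedged sketch (a hypothetical two-class, four-item example in Python, not the procedure itself) of the EM algorithm for an LCA with binary indicators: the E-step computes posterior class memberships, and the M-step updates class prevalences and item-response probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J, K = 5_000, 4, 2                 # subjects, binary items, latent classes

# Simulate two well-separated classes
true_w = np.array([0.6, 0.4])                    # class prevalences
true_rho = np.array([[0.9, 0.8, 0.85, 0.9],      # P(item = 1 | class)
                     [0.2, 0.1, 0.15, 0.2]])
z = rng.choice(K, size=n, p=true_w)
Y = rng.binomial(1, true_rho[z])                 # n x J response matrix

# EM for the latent class model
w = np.full(K, 1 / K)
rho = rng.uniform(0.3, 0.7, size=(K, J))
for _ in range(300):
    # E-step: posterior P(class | responses) for each subject
    logp = np.log(w) + Y @ np.log(rho.T) + (1 - Y) @ np.log(1 - rho.T)
    post = np.exp(logp - logp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: update prevalences and item-response probabilities
    w = post.mean(axis=0)
    rho = np.clip((post.T @ Y) / post.sum(axis=0)[:, None], 1e-6, 1 - 1e-6)
```

The recovered prevalences `w` and item-response profiles `rho` correspond (up to label switching) to the quantities PROC LCA reports; multiple-group LCA and covariates extend the M-step but leave this basic structure intact.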
Psychological Methods | 2008
Joseph L. Schafer; Joseph Kang
In a well-designed experiment, random assignment of participants to treatments makes causal inference straightforward. However, if participants are not randomized (as in observational studies, quasi-experiments, or nonequivalent control-group designs), group comparisons may be biased by confounders that influence both the outcome and the alleged cause. Traditional analysis of covariance, which includes confounders as predictors in a regression model, often fails to eliminate this bias. In this article, the authors review Rubin's definition of an average causal effect (ACE) as the average difference between potential outcomes under different treatments. The authors distinguish between an ACE and a regression coefficient. The authors review 9 strategies for estimating ACEs on the basis of regression, propensity scores, and doubly robust methods, providing formulas for standard errors not given elsewhere. To illustrate the methods, the authors simulate an observational study to assess the effects of dieting on emotional distress. Drawing repeated samples from a simulated population of adolescent girls, the authors assess each method in terms of bias, efficiency, and interval coverage. Throughout the article, the authors offer insights and practical guidance for researchers who attempt causal inference with observational data.
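The propensity-score strategies reviewed in the abstract can be sketched with a small simulated observational study (a hypothetical setup, not the dieting simulation from the article): a naive group comparison is biased by the confounder, while weighting by an estimated propensity score recovers the ACE.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Simulated observational study: x confounds both treatment t and outcome y
x = rng.normal(size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-x)))        # treated more often when x is high
y = 1.0 * t + 2.0 * x + rng.normal(size=n)       # true ACE is 1.0

# Naive group comparison is biased upward by confounding through x
naive = y[t == 1].mean() - y[t == 0].mean()

# Propensity model: logistic regression of t on x via Newton-Raphson
X = np.column_stack([np.ones(n), x])
g = np.zeros(2)
for _ in range(25):
    e = 1 / (1 + np.exp(-X @ g))
    w = e * (1 - e)
    g += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (t - e))
e = 1 / (1 + np.exp(-X @ g))

# Normalized inverse-probability-weighted estimate of the ACE
ate_ipw = (np.sum(t * y / e) / np.sum(t / e)
           - np.sum((1 - t) * y / (1 - e)) / np.sum((1 - t) / (1 - e)))
```

This shows only one of the strategies the article reviews; the regression-based and doubly robust estimators combine the propensity score `e` with an outcome model, as in the Kang and Schafer abstract above.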
arXiv: Methodology | 2006
Joseph Kang; Joseph L. Schafer
Statistics in Medicine | 2007
Coen Bernaards; Thomas R. Belin; Joseph L. Schafer
Journal of The Royal Statistical Society Series A-statistics in Society | 2006
Hwan Chung; Brian P. Flaherty; Joseph L. Schafer
Statistical Science | 1994
Joseph L. Schafer
Archive | 2008
Joseph L. Schafer; Joseph Kang
Poster presented at the Society for Prevention Research conference | 2007
H. Jung; Joseph L. Schafer; Ty A. Ridenour; Stephanie T. Lanza