Publication


Featured research published by Joseph L Schafer.


Statistical Methods in Medical Research | 1999

Multiple imputation: a primer

Joseph L Schafer

In recent years, multiple imputation has emerged as a convenient and flexible paradigm for analysing data with missing values. Essential features of multiple imputation are reviewed, with answers to frequently asked questions about using the method in practice.


Psychological Methods | 2001

A comparison of inclusive and restrictive strategies in modern missing data procedures.

Linda M. Collins; Joseph L Schafer; Chi-Ming Kam

Two classes of modern missing data procedures, maximum likelihood (ML) and multiple imputation (MI), tend to yield similar results when implemented in comparable ways. In either approach, it is possible to include auxiliary variables solely for the purpose of improving the missing data procedure. A simulation was presented to assess the potential costs and benefits of a restrictive strategy, which makes minimal use of auxiliary variables, versus an inclusive strategy, which makes liberal use of such variables. The simulation showed that the inclusive strategy is to be greatly preferred. With an inclusive strategy not only is there a reduced chance of inadvertently omitting an important cause of missingness, there is also the possibility of noticeable gains in terms of increased efficiency and reduced bias, with only minor costs. As implemented in currently available software, the ML approach tends to encourage the use of a restrictive strategy, whereas the MI approach makes it relatively simple to use an inclusive strategy.


Multivariate Behavioral Research | 1998

Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective

Joseph L Schafer; Maren K. Olsen

Analyses of multivariate data are frequently hampered by missing values. Until recently, the only missing-data methods available to most data analysts have been relatively ad hoc practices such as listwise deletion. Recent dramatic advances in theoretical and computational statistics, however, have produced a new generation of flexible procedures with a sound statistical basis. These procedures involve multiple imputation (Rubin, 1987), a simulation technique that replaces each missing datum with a set of m > 1 plausible values. The m versions of the complete data are analyzed by standard complete-data methods, and the results are combined using simple rules to yield estimates, standard errors, and p-values that formally incorporate missing-data uncertainty. New computational algorithms and software described in a recent book (Schafer, 1997a) allow us to create proper multiple imputations in complex multivariate settings. This article reviews the key ideas of multiple imputation, discusses the software programs currently available, and demonstrates their use on data from the Adolescent Alcohol Prevention Trial (Hansen & Graham, 1991).
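The "simple rules" for combining the m analyses are Rubin's (1987) rules. A minimal sketch for a scalar estimand follows; the function name `pool_rubin` is hypothetical, and the degrees of freedom use Rubin's original large-sample formula rather than any small-sample adjustment.

```python
import math
import statistics


def pool_rubin(estimates: list[float], variances: list[float]) -> dict[str, float]:
    """Combine m complete-data results for a scalar estimand with Rubin's rules.

    estimates[i] and variances[i] are the point estimate and squared standard
    error from the analysis of the i-th imputed dataset (variances assumed > 0).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m > 1 estimates with matching variances")

    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b           # total variance

    # Relative increase in variance due to nonresponse, then Rubin's (1987) df.
    r = (1 + 1 / m) * b / w_bar
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf

    return {"estimate": q_bar, "se": math.sqrt(t), "df": df}
```

For example, `pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])` gives a pooled estimate of 1.0, a total variance of about 0.0933, and 6.125 degrees of freedom; the pooled estimate and standard error then feed a t reference distribution for intervals and p-values.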


Journal of the American Statistical Association | 2001

A Two-Part Random-Effects Model for Semicontinuous Longitudinal Data

Maren K. Olsen; Joseph L Schafer

A semicontinuous variable has a portion of responses equal to a single value (typically 0) and a continuous, often skewed, distribution among the remaining values. In cross-sectional analyses, variables of this type may be described by a pair of regression models; for example, a logistic model for the probability of nonzero response and a conditional linear model for the mean response given that it is nonzero. We extend this two-part regression approach to longitudinal settings by introducing random coefficients into both the logistic and the linear stages. Fitting a two-part random-effects model poses computational challenges similar to those found with generalized linear mixed models. We obtain maximum likelihood estimates for the fixed coefficients and variance components by an approximate Fisher scoring procedure based on high-order Laplace approximations. To illustrate, we apply the technique to data from the Adolescent Alcohol Prevention Trial, examining reported recent alcohol use for students in grades 7–11 and its relationships to parental monitoring and rebelliousness.
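The cross-sectional two-part structure described above implies a simple formula for the overall mean: because the response is exactly 0 when it is not positive, E[Y | x] = P(Y > 0 | x) · E[Y | Y > 0, x]. The sketch below assumes fixed effects only (the random coefficients of the longitudinal model are omitted) and a conditional model on the original scale; the function names and example coefficients are hypothetical.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def two_part_mean(x: list[float], beta: list[float], gamma: list[float]) -> float:
    """Marginal mean E[Y | x] of a cross-sectional two-part model.

    Part 1 (logistic): logit P(Y > 0 | x) = x . beta
    Part 2 (linear):   E[Y | Y > 0, x]    = x . gamma
    """
    eta_logit = sum(xi * bi for xi, bi in zip(x, beta))
    mu_positive = sum(xi * gi for xi, gi in zip(x, gamma))
    return expit(eta_logit) * mu_positive
```

With an intercept and one covariate, `two_part_mean([1.0, 0.0], [0.0, 1.0], [2.0, 0.5])` combines a 0.5 probability of a nonzero response with a conditional mean of 2.0, giving 1.0.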


The science of prevention: Methodological advances from alcohol and substance abuse research | 1997

Analysis with missing data in prevention research

Stewart I. Donaldson; John W. Graham; Scott M. Hofer; David P. MacKinnon; Joseph L Schafer

Missing data problems have been a thorn in the side of prevention researchers for years. Although some solutions for these problems have been available in the statistical literature, these solutions have not found their way into mainstream prevention research. This chapter is meant to serve as an introduction to the systematic application of the missing data analysis solutions presented recently by Little and Rubin (1987) and others. The chapter does not describe a complete strategy, but it is relevant for (1) missing data analysis with continuous (but not categorical) data, (2) data that are reasonably normally distributed, and (3) solutions for missing data problems in analyses related to the general linear model, in particular analyses that use (or can use) a covariance matrix as input. The examples in the chapter come from drug prevention research. The chapter discusses (1) the problem of wanting to ask respondents more questions than most individuals can answer; (2) the problem of attrition and some solutions; and (3) the problem of special measurement procedures that are too expensive or time-consuming to obtain for all subjects.
The authors end with several conclusions: whenever possible, researchers should use the Expectation-Maximization (EM) algorithm or another maximum likelihood procedure (including the multiple-group structural equation modeling procedure) or, where appropriate, multiple imputation for analyses involving missing data (the chapter provides concrete examples); if researchers must use other analyses, they should keep in mind that these produce biased results and should not be relied upon for final analyses; when data are missing, appropriate missing data procedures do not generate something out of nothing but do make the most of the data available; when data are missing, researchers should work hard (especially when planning a study) to find the causes of missingness and include them in the analysis models; and researchers should, whenever possible, sample the cases originally missing and adjust the EM parameter estimates accordingly.
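To make the recommended EM approach concrete, here is a textbook EM sketch for the simplest continuous, normally distributed case: a bivariate normal (Y1, Y2) where Y1 is fully observed and Y2 is partly missing at random given Y1. This is an illustrative assumption, not the chapter's own procedure; the function name and arguments are hypothetical. The E-step fills in expected sufficient statistics for the missing rows, and the M-step re-estimates the mean and covariance from them.

```python
import statistics


def em_bivariate_normal(y1, y2, tol=1e-12, max_iter=10_000):
    """ML estimates of the mean and covariance of (Y1, Y2) via the EM algorithm.

    y1 is fully observed; y2 holds None where a value is missing. Missingness
    in Y2 is assumed to depend only on Y1 (missing at random).
    Returns (mu1, mu2, s11, s12, s22) with ML (divisor n) covariances.
    """
    n = len(y1)
    observed = [b for b in y2 if b is not None]
    # Y1 is complete, so its mean and variance are already at their ML values.
    mu1, s11 = statistics.fmean(y1), statistics.pvariance(y1)
    # Start Y2 from its complete-case moments and zero covariance.
    mu2, s22, s12 = statistics.fmean(observed), statistics.pvariance(observed), 0.0

    for _ in range(max_iter):
        # E-step: replace missing Y2 and Y2^2 by their conditional expectations given Y1.
        beta = s12 / s11
        resid_var = s22 - beta * s12  # Var(Y2 | Y1)
        t2 = t22 = t12 = 0.0
        for a, b in zip(y1, y2):
            if b is None:
                e2 = mu2 + beta * (a - mu1)
                e22 = e2 * e2 + resid_var
            else:
                e2, e22 = b, b * b
            t2 += e2
            t22 += e22
            t12 += a * e2
        # M-step: re-estimate the parameters from the expected sufficient statistics.
        new_mu2 = t2 / n
        new_s22 = t22 / n - new_mu2 ** 2
        new_s12 = t12 / n - mu1 * new_mu2
        change = max(abs(new_mu2 - mu2), abs(new_s22 - s22), abs(new_s12 - s12))
        mu2, s22, s12 = new_mu2, new_s22, new_s12
        if change < tol:
            break
    return mu1, mu2, s11, s12, s22
```

For this monotone pattern the EM answer can be checked against the closed-form ML solution built from the complete-case regression of Y2 on Y1; unlike listwise deletion, the estimate of the Y2 mean uses the information in every observed Y1.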


Journal of Computational and Graphical Statistics | 2002

Computational Strategies for Multivariate Linear Mixed-Effects Models With Missing Values

Joseph L Schafer; Recai Murat Yucel

This article presents new computational techniques for multivariate longitudinal or clustered data with missing values. Current methodology for linear mixed-effects models can accommodate imbalance or missing data in a single response variable, but it cannot handle missing values in multiple responses or additional covariates. Applying a multivariate extension of a popular linear mixed-effects model, we create multiple imputations of missing values for subsequent analyses by a straightforward and effective Markov chain Monte Carlo procedure. We also derive and implement a new EM algorithm for parameter estimation which converges more rapidly than traditional EM algorithms because it does not treat the random effects as “missing data,” but integrates them out of the likelihood function analytically. These techniques are illustrated on models for adolescent alcohol use in a large school-based prevention trial.


Statistica Neerlandica | 2003

Multiple Imputation in Multivariate Problems When the Imputation and Analysis Models Differ

Joseph L Schafer

Bayesian multiple imputation (MI) has become a highly useful paradigm for handling missing values in many settings. In this paper, I compare Bayesian MI with other methods, maximum likelihood in particular, and point out some of its unique features. One key aspect of MI, the separation of the imputation phase from the analysis phase, can be advantageous in settings where the models underlying the two phases do not agree.


Journal of the American Statistical Association | 2000

Inference with imputed conditional means

Joseph L Schafer; Nathaniel Schenker

In this article we present analytic techniques for inference from a dataset in which missing values have been replaced by predictive means derived from an imputation model. The derivations are based on asymptotic expansions of point estimators and their associated variance estimators, and the resulting formulas can be thought of as first-order approximations to standard multiple-imputation procedures with an infinite number of imputations for the missing values. Our method, where applicable, may require substantially less computational effort than creating and managing a multiply imputed database; moreover, the resulting inferences can be more precise than those derived from multiple imputation, because they do not rely on simulation. Our techniques use components of the standard complete-data analysis, along with two summary measures from the fitted imputation model. If the imputation and analysis phases are carried out by the same person or organization, then the method provides a quick assessment of the variability due to missing data. If a data producer is supplying the imputed data set to outside analysts, then the necessary summary measures could be supplied to the analysts, enabling them to apply the method themselves. We emphasize situations with iid samples, univariate missing data, and complete-data point estimators that are smooth functions of means, but also discuss extensions to more complicated situations. We illustrate properties of our methods in several examples, including an application to a large dataset on fatal accidents maintained by the National Highway Traffic Safety Administration.


Psychological Methods | 2005

Using data augmentation to obtain standard errors and conduct hypothesis tests in latent class and latent transition analysis

Stephanie T. Lanza; Linda M. Collins; Joseph L Schafer; Brian P. Flaherty

Latent class analysis (LCA) provides a means of identifying a mixture of subgroups in a population measured by multiple categorical indicators. Latent transition analysis (LTA) is a type of LCA that facilitates addressing research questions concerning stage-sequential change over time in longitudinal data. Both approaches have been used with increasing frequency in the social sciences. The objective of this article is to illustrate data augmentation (DA), a Markov chain Monte Carlo procedure that can be used to obtain parameter estimates and standard errors for LCA and LTA models. By use of DA it is possible to construct hypothesis tests concerning not only standard model parameters but also combinations of parameters, affording tremendous flexibility. DA is demonstrated with an example involving tests of ethnic differences, gender differences, and an Ethnicity × Gender interaction in the development of adolescent problem behavior.
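The basic DA cycle for LCA alternates two steps: impute each subject's latent class given the current parameters, then draw new parameters given the imputed classes. The sketch below assumes the simplest case, two classes, binary items, no covariates, and no transitions (so it does not cover LTA); uniform Beta(1, 1) priors, the starting values, and the iteration counts are illustrative assumptions, and label switching is not handled. The function name is hypothetical. The standard deviation of the retained draws serves as the standard error.

```python
import math
import random
import statistics


def lca_data_augmentation(y, n_iter=2000, burn_in=500, seed=0):
    """Data augmentation (a Gibbs sampler) for a two-class latent class model.

    y is a list of equal-length 0/1 response vectors. Every probability gets a
    uniform Beta(1, 1) prior. Returns the posterior mean and standard deviation
    of gamma = P(class 1); the standard deviation serves as a standard error.
    """
    rng = random.Random(seed)
    n, n_items = len(y), len(y[0])
    gamma = 0.5
    # rho[c][j] = P(item j = 1 | class c); starting values separate the classes.
    rho = [[0.3] * n_items, [0.7] * n_items]
    draws = []
    for it in range(n_iter):
        # Imputation step: draw each subject's latent class given current parameters.
        classes = []
        for row in y:
            log_w = [math.log(1.0 - gamma), math.log(gamma)]
            for c in (0, 1):
                for j, v in enumerate(row):
                    log_w[c] += math.log(rho[c][j] if v else 1.0 - rho[c][j])
            d = min(log_w[0] - log_w[1], 700.0)  # guard math.exp against overflow
            p1 = 1.0 / (1.0 + math.exp(d))
            classes.append(1 if rng.random() < p1 else 0)
        # Posterior step: draw parameters from their conjugate Beta posteriors.
        n1 = sum(classes)
        gamma = rng.betavariate(1 + n1, 1 + n - n1)
        for c in (0, 1):
            members = [row for row, k in zip(y, classes) if k == c]
            for j in range(n_items):
                ones = sum(row[j] for row in members)
                rho[c][j] = rng.betavariate(1 + ones, 1 + len(members) - ones)
        if it >= burn_in:
            draws.append(gamma)
    return statistics.fmean(draws), statistics.stdev(draws)
```

Because every retained draw is a full parameter vector, the same draws could also be used to summarize functions of several parameters, which is what makes DA attractive for the combined-parameter tests described above.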


Journal of the American Statistical Association | 2003

Multiple Edit/Multiple Imputation for Multivariate Continuous Data

Bonnie Ghosh-Dastidar; Joseph L Schafer

Multiple imputation replaces an incomplete dataset with m > 1 simulated complete versions that are analyzed separately by standard methods. We present a natural extension of multiple imputation for handling the dual problems of nonresponse and response error. This extension, which we call multiple edit/multiple imputation (MEMI), replaces an observed dataset containing missing values and errors with m > 1 simulated versions of the ideal dataset that is complete and error-free. These ideal datasets are analyzed separately, and the results are combined using the same rules as for multiple imputation. The resulting inferences simultaneously reflect uncertainty due to nonresponse and response error. MEMI may be an attractive alternative to deterministic or quasi-statistical edit and imputation procedures used by many data-collecting agencies. Producing MEMIs requires assumptions about the distribution of the ideal data, the nature of nonresponse, and a model for the response error mechanism. However, fitting such a model does not necessarily require data from a follow-up study. In this article we develop and implement MEMI for preliminary data from the Third National Health and Nutrition Examination Survey, Phase I (1988–1991). Raw body measurements for 1,345 children aged 2–3 years are imputed under a Bayesian model for intermittent or semicontinuous errors. The resulting population estimates are found to be quite insensitive to prior assumptions about the rates and magnitude of errors.

Collaboration


Top Co-Authors

Linda M. Collins

Pennsylvania State University

John W. Graham

Pennsylvania State University

Nathaniel Schenker

Centers for Disease Control and Prevention

Chi-Ming Kam

Pennsylvania State University

Hakan Demirtas

Pennsylvania State University

Joseph Kang

Northwestern University

Recai Murat Yucel

Pennsylvania State University