Publication


Featured research published by Nathaniel Schenker.


The American Statistician | 2001

On Judging the Significance of Differences by Examining the Overlap Between Confidence Intervals

Nathaniel Schenker; Jane F. Gentleman

To judge whether the difference between two point estimates is statistically significant, data analysts often examine the overlap between the two associated confidence intervals. We compare this technique to the standard method of testing significance under the common assumptions of consistency, asymptotic normality, and asymptotic independence of the estimates. Rejection of the null hypothesis by the method of examining overlap implies rejection by the standard method, whereas failure to reject by the method of examining overlap does not imply failure to reject by the standard method. As a consequence, the method of examining overlap is more conservative (i.e., rejects the null hypothesis less often) than the standard method when the null hypothesis is true, and it mistakenly fails to reject the null hypothesis more frequently than does the standard method when the null hypothesis is false. Although the method of examining overlap is simple and especially convenient when lists or graphs of confidence intervals have been presented, we conclude that it should not be used for formal significance testing unless the data analyst is aware of its deficiencies and unless the information needed to carry out a more appropriate procedure is unavailable.
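The asymmetry the abstract describes is easy to see numerically. The sketch below (function names are my own) compares the two rules for independent estimates with known standard errors: with equal SEs, the overlap rule effectively requires a gap of 2z standard errors, while the standard test requires only √2·z, so a difference can be significant by the standard test yet still show overlapping intervals.

```python
import math

def standard_test(est1, se1, est2, se2, z=1.96):
    """Standard z-test for the difference of two independent estimates."""
    se_diff = math.sqrt(se1**2 + se2**2)
    return abs(est1 - est2) / se_diff > z

def overlap_test(est1, se1, est2, se2, z=1.96):
    """Reject only if the two individual confidence intervals do not overlap."""
    lo1, hi1 = est1 - z * se1, est1 + z * se1
    lo2, hi2 = est2 - z * se2, est2 + z * se2
    return hi1 < lo2 or hi2 < lo1

# A gap of 3 SE units: significant by the standard test (3/sqrt(2) > 1.96),
# but the two 95% intervals still overlap, so the overlap rule fails to reject.
print(standard_test(0.0, 1.0, 3.0, 1.0))  # True
print(overlap_test(0.0, 1.0, 3.0, 1.0))   # False
```

Rejection by the overlap rule always implies rejection by the standard test, but not conversely, which is exactly the conservatism the paper analyzes.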


Journal of the American Statistical Association | 1986

Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse

Donald B. Rubin; Nathaniel Schenker

Abstract Several multiple imputation techniques are described for simple random samples with ignorable nonresponse on a scalar outcome variable. The methods are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean. Using m = 2 imputations per missing value gives accurate coverages in common cases and is clearly superior to single imputation (m = 1) in all cases. The performances of the methods for various m can be predicted well by linear interpolation in 1/(m − 1) between the results for m = 2 and m = ∞. As a rough guide, to assure coverages of interval estimates within 2% of the nominal level when using the preferred methods, the number of imputations per missing value should increase from 2 to 3 as the nonresponse rate increases from 10% to 60%.
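Interval estimation from m completed data sets uses Rubin's standard combining rules: pool the point estimates, then add the between-imputation variance (inflated by 1 + 1/m) to the average within-imputation variance. A minimal sketch, with illustrative numbers:

```python
import statistics

def combine(estimates, variances):
    """Rubin's combining rules for m multiply imputed analyses."""
    m = len(estimates)
    qbar = statistics.mean(estimates)       # pooled point estimate
    ubar = statistics.mean(variances)       # within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance
    t = ubar + (1 + 1 / m) * b              # total variance for the interval
    return qbar, t

# Three completed-data analyses of the same (hypothetical) mean:
est, var = combine([10.1, 9.8, 10.4], [0.25, 0.30, 0.27])
```

The (1 + 1/m) factor is what makes small m workable: it charges the interval for the extra simulation noise of using finitely many imputations, which shrinks toward zero as m grows.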


The American Journal of Clinical Nutrition | 2009

Comparisons of percentage body fat, body mass index, waist circumference, and waist-stature ratio in adults

Katherine M. Flegal; John A. Shepherd; Anne C. Looker; Barry I. Graubard; Lori G. Borrud; Cynthia L. Ogden; Tamara B. Harris; James E. Everhart; Nathaniel Schenker

BACKGROUND Body mass index (BMI), waist circumference (WC), and the waist-stature ratio (WSR) are considered to be possible proxies for adiposity. OBJECTIVE The objective was to investigate the relations between BMI, WC, WSR, and percentage body fat (measured by dual-energy X-ray absorptiometry) in adults in a large nationally representative US population sample from the National Health and Nutrition Examination Survey (NHANES). DESIGN BMI, WC, and WSR were compared with percentage body fat in a sample of 12,901 adults. RESULTS WC, WSR, and BMI were significantly more correlated with each other than with percentage body fat (P < 0.0001 for all sex-age groups). Percentage body fat tended to be significantly more correlated with WC than with BMI in men but significantly more correlated with BMI than with WC in women (P < 0.0001 except in the oldest age group). WSR tended to be slightly more correlated with percentage body fat than was WC. Percentile values of BMI, WC, and WSR are shown that correspond to percentiles of percentage body fat increments of 5 percentage points. More than 90% of the sample could be categorized to within one category of percentage body fat by each measure. CONCLUSIONS BMI, WC, and WSR perform similarly as indicators of body fatness and are more closely related to each other than with percentage body fat. These variables may be an inaccurate measure of percentage body fat for an individual, but they correspond fairly well overall with percentage body fat within sex-age groups and distinguish categories of percentage body fat.


Computational Statistics & Data Analysis | 1996

Partially parametric techniques for multiple imputation

Nathaniel Schenker; Jeremy M. G. Taylor

Abstract Multiple imputation is a technique for handling data sets with missing values. The method fills in the missing values several times, creating several completed data sets for analysis. Each data set is analyzed separately using techniques designed for complete data, and the results are then combined in such a way that the variability due to imputation may be incorporated. Methods of imputing the missing values can vary from fully parametric to nonparametric. In this paper, we compare partially parametric and fully parametric regression-based multiple-imputation methods. The fully parametric method that we consider imputes missing regression outcomes by drawing them from their predictive distribution under the regression model, whereas the partially parametric methods are based on imputing outcomes or residuals for incomplete cases using values drawn from the complete cases. For the partially parametric methods, we suggest a new approach to choosing complete cases from which to draw values. In a Monte Carlo study in the regression setting, we investigate the robustness of the multiple-imputation schemes to misspecification of the underlying model for the data. Sources of model misspecification considered include incorrect modeling of the mean structure as well as incorrect specification of the error distribution with regard to heaviness of the tails and heteroscedasticity. The methods are compared with respect to the bias and efficiency of point estimates and the coverage rates of confidence intervals for the marginal mean and distribution function of the outcome. We find that when the mean structure is specified correctly, all of the methods perform well, even if the error distribution is misspecified. The fully parametric approach, however, produces slightly more efficient estimates of the marginal distribution function of the outcome than do the partially parametric approaches. 
When the mean structure is misspecified, all of the methods still perform well for estimating the marginal mean, although the fully parametric method shows slight increases in bias and variance. For estimating the marginal distribution function, however, the fully parametric method breaks down in several situations, whereas the partially parametric methods maintain their good performance. In an application to AIDS research in a setting that is similar to although slightly more complicated than that of the Monte Carlo study, we examine how estimates for the distribution of the time from infection with HIV to the onset of AIDS vary with the method used to impute the residual time to AIDS for subjects with right-censored data. The fully parametric and partially parametric techniques produce similar results, suggesting that the model selection used for fully parametric imputation was adequate. Our application provides an example of how multiple imputation can be used to combine information from two cohorts to estimate quantities that cannot be estimated directly from either one of the cohorts separately.
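A much-simplified illustration of the partially parametric idea: fit a regression to the complete cases, then impute each missing outcome as its fitted mean plus a residual drawn from the observed residuals. This is only a sketch of the general approach; the paper's methods select the complete cases from which values are drawn more carefully, and all names below are my own.

```python
import random

def partially_parametric_impute(x, y, seed=0):
    """Impute missing y-values (None) as fitted mean + a randomly drawn
    observed residual from a simple linear regression on complete cases."""
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    xbar = sum(xi for xi, _ in obs) / n
    ybar = sum(yi for _, yi in obs) / n
    sxx = sum((xi - xbar) ** 2 for xi, _ in obs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in obs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    residuals = [yi - (intercept + slope * xi) for xi, yi in obs]
    return [yi if yi is not None
            else intercept + slope * xi + rng.choice(residuals)
            for xi, yi in zip(x, y)]
```

Because the imputed residuals come from the data rather than from an assumed error distribution, misspecifying the error distribution matters less, which is the robustness property the Monte Carlo study examines.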


Journal of the American Statistical Association | 1991

Multiple Imputation of Industry and Occupation Codes in Census Public-use Samples Using Bayesian Logistic Regression

Clifford C. Clogg; Donald B. Rubin; Nathaniel Schenker; Bradley D. Schultz; Lynn Weidman

Abstract We describe methods used to create a new Census data base that can be used to study comparability of industry and occupation classification systems. This project represents the most extensive application of multiple imputation to date, and the modeling effort was considerable as well—hundreds of logistic regressions were estimated. One goal of this article is to summarize the strategies used in the project so that researchers can better understand how the new data bases were created. Another goal is to show how modifications of maximum likelihood methods were made for the modeling and imputation phases of the project. To multiply-impute 1980 census-comparable codes for industries and occupations in two 1970 census public-use samples, logistic regression models were estimated with flattening constants. For many of the regression models considered, the data were too sparse to support conventional maximum likelihood analysis, so some alternative had to be employed. These methods solve existence and ...


Journal of the American Statistical Association | 2006

Multiple Imputation of Missing Income Data in the National Health Interview Survey

Nathaniel Schenker; Trivellore E. Raghunathan; Pei Lu Chiu; Diane M. Makuc; Guangyu Zhang; Alan J. Cohen

The National Health Interview Survey (NHIS) provides a rich source of data for studying relationships between income and health and for monitoring health and health care for persons at different income levels. However, the nonresponse rates are high for two key items, total family income in the previous calendar year and personal earnings from employment in the previous calendar year. To handle the missing data on family income and personal earnings in the NHIS, multiple imputation of these items, along with employment status and ratio of family income to the federal poverty threshold (derived from the imputed values of family income), has been performed for the survey years 1997–2004. (There are plans to continue this work for years beyond 2004 as well.) Files of the imputed values, as well as documentation, are available at the NHIS website (http://www.cdc.gov/nchs/nhis.htm). This article describes the approach used in the multiple-imputation project and evaluates the methods through analyses of the multiply imputed data. The analyses suggest that imputation corrects for biases that occur in estimates based on the data without imputation, and that multiple imputation results in gains in efficiency as well.


Journal of the American Statistical Association | 1985

Qualms about Bootstrap Confidence Intervals

Nathaniel Schenker

Abstract The percentile method and bias-corrected percentile method of Efron (1981, 1982) are discussed. When these methods are used to construct nonparametric confidence intervals for the variance of a normal distribution, the coverage probabilities are substantially below the nominal level for small to moderate samples. This is due to the inapplicability of assumptions underlying the methods. These assumptions are difficult or impossible to check in the complicated situations for which the bootstrap is intended. Therefore, bootstrap confidence intervals should be used with caution in complex problems.
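For reference, the percentile method under discussion is simple to state: resample the data with replacement, recompute the statistic, and take empirical quantiles of the bootstrap replicates as interval endpoints. A minimal sketch (names are my own):

```python
import random
import statistics

def percentile_bootstrap_ci(data, stat, level=0.90, n_boot=2000, seed=1):
    """Efron's percentile method: empirical quantiles of bootstrap
    replicates of the statistic serve as the interval endpoints."""
    rng = random.Random(seed)
    boots = sorted(stat([rng.choice(data) for _ in data])
                   for _ in range(n_boot))
    alpha = (1 - level) / 2
    lo = boots[int(alpha * n_boot)]
    hi = boots[int((1 - alpha) * n_boot) - 1]
    return lo, hi
```

Applied to the variance of a small normal sample, as in the paper, intervals of this form tend to fall short of their nominal coverage, which is the basis for the paper's caution.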


Journal of the American Statistical Association | 2007

Combining Information From Two Surveys to Estimate County-Level Prevalence Rates of Cancer Risk Factors and Screening

Trivellore E. Raghunathan; Dawei Xie; Nathaniel Schenker; Van L. Parsons; William W. Davis; Kevin W. Dodd; Eric J. Feuer

Cancer surveillance research requires estimates of the prevalence of cancer risk factors and screening for small areas such as counties. Two popular data sources are the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey conducted by state agencies, and the National Health Interview Survey (NHIS), an area probability sample survey conducted through face-to-face interviews. Both data sources have advantages and disadvantages. The BRFSS is a larger survey and almost every county is included in the survey, but it has lower response rates as is typical with telephone surveys and it does not include subjects who live in households with no telephones. On the other hand, the NHIS is a smaller survey, with the majority of counties not included; but it includes both telephone and nontelephone households, and has higher response rates. A preliminary analysis shows that the distributions of cancer screening and risk factors are different for telephone and nontelephone households. Thus, information from the two surveys may be combined to address both nonresponse and noncoverage errors. A hierarchical Bayesian approach that combines information from both surveys is used to construct county-level estimates. The proposed model incorporates potential noncoverage and nonresponse biases in the BRFSS as well as complex sample design features of both surveys. A Markov chain Monte Carlo method is used to simulate draws from the joint posterior distribution of unknown quantities in the model that uses design-based direct estimates and county-level covariates. Yearly prevalence estimates at the county level for 49 states, as well as for the entire state of Alaska and the District of Columbia, are developed for six outcomes using BRFSS and NHIS data from the years 1997–2000. The outcomes include smoking and use of common cancer screening procedures. The NHIS/BRFSS combined county-level estimates are substantially different from those based on the BRFSS alone.


Journal of the American Statistical Association | 2000

Inference with imputed conditional means

Joseph L Schafer; Nathaniel Schenker

Abstract In this article we present analytic techniques for inference from a dataset in which missing values have been replaced by predictive means derived from an imputation model. The derivations are based on asymptotic expansions of point estimators and their associated variance estimators, and the resulting formulas can be thought of as first-order approximations to standard multiple-imputation procedures with an infinite number of imputations for the missing values. Our method, where applicable, may require substantially less computational effort than creating and managing a multiply imputed database; moreover, the resulting inferences can be more precise than those derived from multiple imputation, because they do not rely on simulation. Our techniques use components of the standard complete-data analysis, along with two summary measures from the fitted imputation model. If the imputation and analysis phases are carried out by the same person or organization, then the method provides a quick assessment of the variability due to missing data. If a data producer is supplying the imputed data set to outside analysts, then the necessary summary measures could be supplied to the analysts, enabling them to apply the method themselves. We emphasize situations with iid samples, univariate missing data, and complete-data point estimators that are smooth functions of means, but also discuss extensions to more complicated situations. We illustrate properties of our methods in several examples, including an application to a large dataset on fatal accidents maintained by the National Highway Traffic Safety Administration.


Statistics in Medicine | 2011

Multiple imputation of missing dual-energy X-ray absorptiometry data in the National Health and Nutrition Examination Survey†

Nathaniel Schenker; Lori G. Borrud; Vicki L. Burt; Lester R. Curtin; Katherine M. Flegal; Jeffery P. Hughes; Clifford L. Johnson; Anne C. Looker; Lisa B. Mirel

In 1999, dual-energy x-ray absorptiometry (DXA) scans were added to the National Health and Nutrition Examination Survey (NHANES) to provide information on soft tissue composition and bone mineral content. However, in 1999-2004, DXA data were missing in whole or in part for about 21 per cent of the NHANES participants eligible for the DXA examination; and the missingness is associated with important characteristics such as body mass index and age. To handle this missing-data problem, multiple imputation of the missing DXA data was performed. Several features made the project interesting and challenging statistically, including the relationship between missingness on the DXA measures and the values of other variables; the highly multivariate nature of the variables being imputed; the need to transform the DXA variables during the imputation process; the desire to use a large number of non-DXA predictors, many of which had small amounts of missing data themselves, in the imputation models; the use of lower bounds in the imputation procedure; and relationships between the DXA variables and other variables, which helped both in creating and evaluating the imputations. This paper describes the imputation models, methods, and evaluations for this publicly available data resource and demonstrates properties of the imputations via examples of analyses of the data. The analyses suggest that imputation helps to correct biases that occur in estimates based on the data without imputation, and that it helps to increase the precision of estimates as well. Moreover, multiple imputation usually yields larger estimated standard errors than those obtained with single imputation.

Collaboration


Nathaniel Schenker's top co-authors include:

Jennifer D. Parker, Centers for Disease Control and Prevention
Van L. Parsons, Centers for Disease Control and Prevention
Joseph L Schafer, Pennsylvania State University
Alan J. Cohen, Centers for Disease Control and Prevention
Anne C. Looker, Centers for Disease Control and Prevention