Publication


Featured research published by Paul T. von Hippel.


Sociological Methodology | 2007

REGRESSION WITH MISSING YS: AN IMPROVED STRATEGY FOR ANALYZING MULTIPLY IMPUTED DATA

Paul T. von Hippel

When fitting a generalized linear model—such as linear regression, logistic regression, or hierarchical linear modeling—analysts often wonder how to handle missing values of the dependent variable Y. If missing values have been filled in using multiple imputation, the usual advice is to use the imputed Y values in analysis. We show, however, that using imputed Ys can add needless noise to the estimates. Better estimates can usually be obtained using a modified strategy that we call multiple imputation, then deletion (MID). Under MID, all cases are used for imputation but, following imputation, cases with imputed Y values are excluded from the analysis. When there is something wrong with the imputed Y values, MID protects the estimates from the problematic imputations. And when the imputed Y values are acceptable, MID usually offers somewhat more efficient estimates than an ordinary MI strategy.
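
The MID recipe is easy to sketch in code. The toy example below is illustrative rather than taken from the paper: a single regression imputation with added noise stands in for a full multiple-imputation procedure, and because only Y is missing here, MID reduces to complete-case analysis (its advantages emerge when regressors and auxiliary variables are incomplete as well).

```python
import random
import statistics

random.seed(0)

# Toy data: y = 2 + 3*x + noise, with roughly 30% of y missing at random.
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [2 + 3 * xi + random.gauss(0, 1) for xi in x]
missing = [random.random() < 0.3 for _ in range(n)]

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

# Imputation step: fit a model to the observed cases, then fill in
# missing y with predictions plus residual noise.  (A stand-in for a
# proper multiple-imputation draw.)
obs_x = [xi for xi, m in zip(x, missing) if not m]
obs_y = [yi for yi, m in zip(y, missing) if not m]
b = ols_slope(obs_x, obs_y)
a = statistics.mean(obs_y) - b * statistics.mean(obs_x)
resid_sd = statistics.stdev([yi - (a + b * xi) for xi, yi in zip(obs_x, obs_y)])
y_imp = [yi if not m else a + b * xi + random.gauss(0, resid_sd)
         for xi, yi, m in zip(x, y, missing)]

# Ordinary MI analysis: keep the imputed y values.
slope_mi = ols_slope(x, y_imp)

# MID: after imputation, drop the cases whose y was imputed.
slope_mid = ols_slope(obs_x, obs_y)

print(slope_mi, slope_mid)  # both near the true slope of 3
```

The imputed y values carry no information beyond what the observed cases already contain, which is why dropping them (MID) loses nothing here and avoids any noise the imputations add.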


American Sociological Review | 2004

Are Schools the Great Equalizer? Cognitive Inequality during the Summer Months and the School Year

Douglas B. Downey; Paul T. von Hippel; Beckett A. Broh

How does schooling affect inequality in cognitive skills? Reproductionist theorists have argued that schooling plays an important role in reproducing and even exacerbating existing disparities. But seasonal comparison research has shown that gaps in reading and math skills grow primarily during summer vacation, suggesting that non-school factors (e.g., family and neighborhood) are the main source of inequality. Using the Early Childhood Longitudinal Study—Kindergarten Cohort of 1998–99, this article improves upon past seasonal estimates of school and non-school effects on cognitive skill gains. Like past research, this study considers how socioeconomic and racial/ethnic gaps in skills change when school is in session versus when it is not. This study goes beyond past research, however, by examining the considerable inequality in learning that is not associated with socioeconomic status and race. This “unexplained” inequality is more than 90 percent of the total inequality in learning rates, and it is much smaller during school than during summer. The results suggest, therefore, that schools serve as important equalizers: nearly every gap grows faster during summer than during school. The black/white gap, however, represents a conspicuous exception.


American Journal of Public Health | 2007

The Effect of School on Overweight in Childhood: Gain in Body Mass Index During the School Year and During Summer Vacation

Paul T. von Hippel; Brian Powell; Douglas B. Downey; Nicholas J. Rowland

OBJECTIVES To determine whether school or nonschool environments contribute more to childhood overweight, we compared children's gains in body mass index (BMI) when school is in session (during the kindergarten and first-grade school years) with their gains in BMI when school is out (during summer vacation). METHODS The BMIs of 5380 children in 310 schools were measured as part of the Early Childhood Longitudinal Study, Kindergarten Cohort. We used these measurements to estimate BMI gain rates during kindergarten, summer, and first grade. RESULTS Growth in BMI was typically faster and more variable during summer vacation than during the kindergarten and first-grade school years. The difference between school and summer gain rates was especially large for 3 at-risk subgroups: Black children, Hispanic children, and children who were already overweight at the beginning of kindergarten. CONCLUSIONS Although a school's diet and exercise policies may be less than ideal, it appears that early school environments contribute less to overweight than do nonschool environments.


Sociological Methodology | 2009

HOW TO IMPUTE INTERACTIONS, SQUARES, AND OTHER TRANSFORMED VARIABLES

Paul T. von Hippel

Researchers often carry out regression analysis using data that have missing values. Missing values can be filled in using multiple imputation, but imputation is tricky if the regression includes interactions, squares, or other transformations of the regressors. In this paper, we examine different approaches to imputing transformed variables, and we find one simple method that works well across a variety of circumstances. Our recommendation is to transform, then impute—i.e., calculate the interactions or squares in the incomplete data and then impute these transformations like any other variable. The transform-then-impute method yields good regression estimates, even though the imputed values are often inconsistent with one another. It is tempting to try to “fix” the inconsistencies in the imputed values, but methods that do so lead to biased regression estimates. Such biased methods include the passive imputation strategy implemented by the popular ice command for Stata.
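
The transform-then-impute ordering can be shown in a few lines. This sketch is illustrative only: mean imputation stands in for a proper multiple-imputation draw, and the point is the order of operations, not the imputation model.

```python
import statistics

# Incomplete data: x is missing (None) for some cases.
x = [1.0, 2.0, None, 4.0, None, 6.0]

# Transform, THEN impute: compute the square on the incomplete data...
x_sq = [xi ** 2 if xi is not None else None for xi in x]

# ...then impute x and x**2 each like any other variable.
def mean_impute(col):
    m = statistics.mean(v for v in col if v is not None)
    return [v if v is not None else m for v in col]

x_imp = mean_impute(x)        # missing x    -> 3.25
x_sq_imp = mean_impute(x_sq)  # missing x**2 -> 14.25

# The imputed values are deliberately inconsistent: for imputed rows,
# x_sq_imp != x_imp ** 2.  Von Hippel's finding is that this is fine
# for regression estimates, while "fixing" it (passive imputation,
# i.e., squaring the imputed x) introduces bias.
print(x_imp[2], x_imp[2] ** 2, x_sq_imp[2])
```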


Sociology Of Education | 2008

Are “Failing” Schools Really Failing? Using Seasonal Comparison to Evaluate School Effectiveness

Douglas B. Downey; Paul T. von Hippel; Melanie M. Hughes

To many, it seems obvious which schools are failing—schools whose students perform poorly on achievement tests. But since evaluating schools on achievement mixes the effects of school and nonschool influences, achievement-based evaluation likely underestimates the effectiveness of schools that serve disadvantaged populations. In this article, the authors discuss school-evaluation methods that more effectively separate school effects from nonschool effects. Specifically, the authors evaluate schools using 12-month (calendar-year) learning rates, 9-month (school-year) learning rates, and a provocative new measure, “impact”—which is the difference between the school-year learning rate and the summer learning rate. Using data from the Early Childhood Longitudinal Study of 1998–99, the authors show that learning- or impact-based evaluation methods substantially change conclusions about which schools are failing. In particular, among schools with failing (i.e., bottom-quintile) achievement levels, less than half are failing with respect to learning or impact. In addition, schools that serve disadvantaged students are much more likely to have low achievement levels than they are to have low levels of learning or impact. The implications of these findings are discussed in relation to market-based educational reform.


Journal of Statistics Education | 2005

Mean, Median, and Skew: Correcting a Textbook Rule.

Paul T. von Hippel

Many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and left of the median under left skew. This rule fails with surprising frequency. It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew, they also contradict the textbook interpretation of the median. We discuss ways to correct ideas about mean, median, and skew, while enhancing the desired intuition.
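
A small counterexample of the kind the paper describes is easy to construct (the data set below is invented for illustration, not taken from the paper): a discrete distribution whose skewness coefficient is positive even though its mean lies to the left of its median.

```python
import statistics

# Three 1s, seven 2s, one 3: the lone 3 gives a slightly positive
# skewness coefficient, yet the mean falls below the median.
data = [1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3]

mean = statistics.mean(data)      # 20/11, about 1.82
median = statistics.median(data)  # 2

# Skewness as the third standardized moment (population version).
sd = statistics.pstdev(data)
skew = sum((v - mean) ** 3 for v in data) / (len(data) * sd ** 3)

print(mean, median, skew)  # mean < median, yet skew > 0
```

The rule of thumb fails here because the areas to the left and right of the median are unequal, exactly the discrete case the paper identifies as the most common failure mode.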


The American Statistician | 2004

Biases in SPSS 12.0 Missing Value Analysis

Paul T. von Hippel

In addition to SPSS Base software, SPSS Inc. sells a number of add-on packages, including a package called Missing Value Analysis (MVA). In version 12.0, MVA offers four general methods for analyzing data with missing values. Unfortunately, none of these methods is wholly satisfactory when values are missing at random. The first two methods, listwise and pairwise deletion, are well known to be biased. The third method, regression imputation, uses a regression model to impute missing values, but the regression parameters are biased because they are derived using pairwise deletion. The final method, expectation maximization (EM), produces asymptotically unbiased estimates, but EM's implementation in MVA is limited to point estimates (without standard errors) of means, variances, and covariances. MVA can also impute values using the EM algorithm, but values are imputed without residual variation, so analyses that use the imputed values can be biased.


BMC Medical Research Methodology | 2015

The heterogeneity statistic I² can be biased in small meta-analyses

Paul T. von Hippel

Background: Estimated effects vary across studies, partly because of random sampling error and partly because of heterogeneity. In meta-analysis, the fraction of variance that is due to heterogeneity is estimated by the statistic I². We calculate the bias of I², focusing on the situation where the number of studies in the meta-analysis is small. Small meta-analyses are common; in the Cochrane Library, the median number of studies per meta-analysis is 7 or fewer.
Methods: We use Mathematica software to calculate the expectation and bias of I².
Results: I² has a substantial bias when the number of studies is small. The bias is positive when the true fraction of heterogeneity is small, but the bias is typically negative when the true fraction of heterogeneity is large. For example, with 7 studies and no true heterogeneity, I² will overestimate heterogeneity by an average of 12 percentage points, but with 7 studies and 80 percent true heterogeneity, I² can underestimate heterogeneity by an average of 28 percentage points. Biases of 12–28 percentage points are not trivial when one considers that, in the Cochrane Library, the median I² estimate is 21 percent.
Conclusions: The point estimate I² should be interpreted cautiously when a meta-analysis has few studies. In small meta-analyses, confidence intervals should supplement or replace the biased point estimate I².
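
The positive bias at small numbers of studies is easy to reproduce by simulation. The sketch below is not the paper's Mathematica calculation: it simulates many 7-study meta-analyses with no true heterogeneity, assuming equal, known sampling variances, and recovers an average I² well above its true value of zero.

```python
import random

random.seed(1)

def i_squared(effects, variances):
    """Higgins' I^2 computed from Cochran's Q with inverse-variance weights."""
    w = [1 / v for v in variances]
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q)

# 5000 simulated meta-analyses of k = 7 studies, each study estimating
# the same true effect (0) with sampling variance 1: no heterogeneity.
k, reps = 7, 5000
estimates = [i_squared([random.gauss(0, 1) for _ in range(k)], [1.0] * k)
             for _ in range(reps)]
mean_i2 = sum(estimates) / reps

print(mean_i2)  # positive, even though the true I^2 is 0
```

Because I² is truncated at zero, its sampling distribution under homogeneity piles up mass above zero, which is the mechanism behind the positive bias the paper quantifies.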


Structural Equation Modeling | 2005

TEACHER'S CORNER: How Many Imputations Are Needed? A Comment on Hershberger and Fisher (2003)

Paul T. von Hippel

Multiple imputation is an increasingly popular strategy for analyzing data with missing values (Allison, 2002; Rubin, 1987). In multiple imputation, the analyst creates several different versions of a data set, replacing missing values with plausible random values, or imputations. The imputed data sets are analyzed separately, and the results are combined in a way that accounts for variation in the imputed values. Researchers often wish to know how many imputations are needed for each missing value. Recently Hershberger and Fisher (2003) argued that several hundred imputations are often required. The established advice, however, is that 2 to 10 imputations suffice under most realistic circumstances (Rubin, 1987). In this note we review evidence for the established advice, and explain why it is unnecessary and sometimes impractical to use hundreds of imputations. Hershberger and Fisher (2003, 649) asserted that the usual guidance is based exclusively on Monte Carlo simulations. Rubin (1987, chap. 4), however, presented rigorous theoretical calculations. The key value in Rubin's calculations is the fraction of missing information γ. In a univariate sample with values missing at random, γ is approximately the fraction of cases with missing values. In a multivariate sample, γ is more complicated because different variables and cases contribute different amounts of information about different parameters. Together with the number of imputations m, the fraction of missing information γ governs quantities such as the relative efficiency of parameter estimates. Compared to an estimate based on infinite imputations, the relative efficiency of an estimate based on m imputations is approximately (1 + γ/m)^(-1/2) in standard error units.
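
Rubin's relative-efficiency formula makes the point concrete. A small worked example (the parameter values are chosen for illustration):

```python
# Rubin's relative efficiency, in standard-error units, of an estimate
# based on m imputations compared with infinitely many imputations,
# where gamma is the fraction of missing information:
#     RE = (1 + gamma / m) ** -0.5

def relative_efficiency(gamma, m):
    return (1 + gamma / m) ** -0.5

# Even with half the information missing (gamma = 0.5), 5 imputations
# recover about 95% of the efficiency of infinite imputations, and the
# gain from going to 100 imputations is tiny.
for m in (2, 5, 10, 100):
    print(m, round(relative_efficiency(0.5, m), 4))
```

This is why the note argues that 2 to 10 imputations suffice in most realistic settings: the efficiency curve flattens quickly in m.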


Sociological Methods & Research | 2013

Should a Normal Imputation Model be Modified to Impute Skewed Variables?

Paul T. von Hippel

Researchers often impute continuous variables under an assumption of normality–yet many incomplete variables are skewed. We find that imputing skewed continuous variables under a normal model can lead to bias. The bias is usually mild for popular estimands such as means, standard deviations, and linear regression coefficients, but the bias can be severe for more shape-dependent estimands such as percentiles or the coefficient of skewness. We test several methods for adapting a normal imputation model to accommodate skewness, including methods that transform, truncate, or censor (round) normally imputed values as well as methods that impute values from a quadratic or truncated regression. None of these modifications reliably reduces the biases of the normal model, and some modifications can make the biases much worse. We conclude that, if one has to impute a skewed variable under a normal model, it is usually safest to do so without modifications–unless you are more interested in estimating percentiles and shape than in estimating means, variances, and regressions. In the conclusion, we briefly discuss promising developments in the area of continuous imputation models that do not assume normality.
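
The shape distortion is easy to see in a toy example (invented for illustration, not the paper's simulation): imputing a strictly positive, right-skewed variable from a normal model fitted to the observed values.

```python
import random
import statistics

random.seed(2)

# A strictly positive, right-skewed variable (lognormal), with roughly
# 30% of values missing completely at random.
n = 2000
full = [random.lognormvariate(0, 1) for _ in range(n)]
observed = [v for v in full if random.random() > 0.3]

# Impute the missing values under a normal model fitted to the observed
# data (a stand-in for normal-model multiple imputation).
mu = statistics.mean(observed)
sd = statistics.stdev(observed)
n_missing = n - len(observed)
imputed = [random.gauss(mu, sd) for _ in range(n_missing)]

# The normal model happily imputes impossible negative values for a
# strictly positive variable, distorting shape-dependent estimands such
# as low percentiles, even though the overall mean is roughly preserved.
print(min(imputed), statistics.mean(observed + imputed))
```

As the abstract warns, such imputations are tolerable if the target is a mean or a regression coefficient, but not if the target is a percentile or the shape of the distribution.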

Collaboration

Top co-authors of Paul T. von Hippel:

Igor Holas, University of Texas at Austin
Rebecca Benson, University College London
Joseph Workman, University of Missouri–Kansas City