Ben B. Hansen
University of Michigan
Publications
Featured research published by Ben B. Hansen.
Journal of the American Statistical Association | 2004
Ben B. Hansen
Among matching techniques for observational studies, full matching is in principle the best, in the sense that its alignment of comparable treated and control subjects is as good as that of any alternate method, and potentially much better. This article evaluates the practical performance of full matching for the first time, modifying it in order to minimize variance as well as bias and then using it to compare coached and uncoached takers of the SAT. In this new version, with restrictions on the ratio of treated subjects to controls within matched sets, full matching makes use of many more observations than does pair matching, but achieves far closer matches than does matching with k ≥ 2 controls. Prior to matching, the coached and uncoached groups are separated on the propensity score by 1.1 SDs. Full matching reduces this separation to 1% or 2% of an SD. In older literature comparing matching and regression, Cochran expressed doubts that any method of adjustment could substantially reduce observed bias of this magnitude. To accommodate missing data, regression-based analyses by ETS researchers rejected a subset of the available sample that differed significantly from the subsample they analyzed. Full matching on the propensity score handles the same problem simply and without rejecting observations. In addition, it eases the detection and handling of nonconstancy of treatment effects, which the regression-based analyses had obscured, and it makes fuller use of covariate information. It estimates a somewhat larger effect of coaching on the math score than did ETS's methods.
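The 1.1-SD separation figure can be made concrete with a small sketch: the standardized difference of a propensity-score index between treated and control groups, measured in pooled-SD units. The simulated data, the logistic selection model, and the `standardized_gap` helper are all illustrative assumptions, not the SAT coaching data or the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative observational data (not the SAT study): a covariate x
# drives selection into "coaching" through a logistic model.
n = 2000
x = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-1.5 * x))

def standardized_gap(score, z):
    # Treated-minus-control mean difference in pooled-SD units,
    # the scale on which the abstract reports the 1.1 SD separation.
    pooled_sd = np.sqrt((score[z].var(ddof=1) + score[~z].var(ddof=1)) / 2)
    return float((score[z].mean() - score[~z].mean()) / pooled_sd)

gap = standardized_gap(1.5 * x, treated)  # the true propensity-score index
print(f"pre-matching separation: {gap:.2f} SDs")
```

A successful full match would bring the within-matched-set version of this gap down toward the 1%–2% of an SD reported in the abstract.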
Statistical Science | 2008
Ben B. Hansen; Jake Bowers
In randomized experiments, treatment and control groups should be roughly the same—balanced—in their distributions of pretreatment variables. But how nearly so? Can descriptive comparisons meaningfully be paired with significance tests? If so, should there be several such tests, one for each pretreatment variable, or should there be a single, omnibus test? Could such a test be engineered to give easily computed p-values that are reliable in samples of moderate size, or would simulation be needed for reliable calibration? What new concerns are introduced by random assignment of clusters? Which tests of balance would be optimal? To address these questions, Fisher's randomization inference is applied to the question of balance. Its application suggests the reversal of published conclusions about two studies, one clinical and the other a field experiment in political participation.
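An omnibus balance check in the randomization-inference spirit can be sketched as a permutation test on a Mahalanobis-type statistic that combines all covariate mean differences. This toy version (simulated data and a hypothetical `omnibus_stat` helper) only illustrates the idea; the paper develops analytically calibrated tests rather than relying on simulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy randomized experiment: 3 pretreatment covariates, n = 200.
n, k = 200, 3
X = rng.normal(size=(n, k))
z = rng.permutation(np.repeat([True, False], n // 2))

def omnibus_stat(X, z):
    # Mahalanobis-type combination of covariate mean differences
    # between treatment groups.
    d = X[z].mean(axis=0) - X[~z].mean(axis=0)
    S = np.cov(X, rowvar=False) * (1 / z.sum() + 1 / (~z).sum())
    return float(d @ np.linalg.solve(S, d))

obs = omnibus_stat(X, z)

# Randomization reference distribution: re-randomize the assignment,
# holding the covariates fixed, and recompute the statistic.
draws = [omnibus_stat(X, rng.permutation(z)) for _ in range(999)]
p = (1 + sum(s >= obs for s in draws)) / (999 + 1)
print(f"omnibus balance p-value: {p:.3f}")
```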
Probability Surveys | 2007
Alexander Gnedin; Ben B. Hansen; Jim Pitman
This paper collects facts about the number of occupied boxes in the classical balls-in-boxes occupancy scheme with infinitely many positive frequencies: equivalently, about the number of species represented in samples from populations with infinitely many species. We present moments of this random variable, discuss asymptotic relations among them and with related random variables, and draw connections with regular variation, which appears in various manifestations.
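The basic object can be illustrated by simulation. Under frequencies p_j, the expected number of occupied boxes after n throws is E[K_n] = Σ_j (1 − (1 − p_j)^n); the sketch below checks this against Monte Carlo draws for geometric frequencies, an assumed example truncated far enough out that the truncation is immaterial.

```python
import numpy as np

rng = np.random.default_rng(2)

# Geometric frequencies p_j = (1 - q) q^(j-1): infinitely many boxes,
# truncated at J = 200, far beyond any index n = 1000 balls will reach.
q, J, n = 0.5, 200, 1000
p = (1 - q) * q ** np.arange(J)

# Exact expectation of K_n, the number of occupied boxes:
#   E[K_n] = sum_j (1 - (1 - p_j)^n)
EK = float(np.sum(1 - (1 - p) ** n))

# Monte Carlo check by direct sampling of the occupancy scheme.
Ks = [len(np.unique(rng.choice(J, size=n, p=p))) for _ in range(200)]
print(f"E[K_n] = {EK:.2f}, simulated mean = {np.mean(Ks):.2f}")
```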
The Annals of Applied Statistics | 2010
Carrie Hosman; Ben B. Hansen; Paul W. Holland
Omitted variable bias can affect treatment effect estimates obtained from observational data due to the lack of random assignment to treatment groups. Sensitivity analyses adjust these estimates to quantify the impact of potential omitted variables. This paper presents methods of sensitivity analysis to adjust interval estimates of treatment effect—both the point estimate and standard error—obtained using multiple linear regression. Central to our approach is what we term benchmarking, the use of data to establish reference points for speculation about omitted confounders. The method adapts to treatment effects that may differ by subgroup, to scenarios involving omission of multiple variables, and to combinations of covariance adjustment with propensity score stratification. We illustrate it using data from an influential study of health outcomes of patients admitted to critical care.
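The mechanism being quantified can be seen in a minimal simulation: omitting a confounder from a linear regression shifts the estimated treatment coefficient. The generic data-generating process and the `ols_coef` helper below are assumptions for illustration only, not the paper's benchmarking formulas.

```python
import numpy as np

rng = np.random.default_rng(3)

# A single confounder w affects both treatment uptake and the outcome;
# the true treatment effect is 2.0.
n = 5000
w = rng.normal(size=n)                      # confounder
z = (w + rng.normal(size=n)) > 0            # treatment, selected on w
y = 2.0 * z + 1.5 * w + rng.normal(size=n)  # outcome

def ols_coef(y, *cols):
    # Ordinary least squares with an intercept; returns all coefficients.
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols_coef(y, z, w)[1]      # w included: near the true 2.0
omitted = ols_coef(y, z)[1]      # w omitted: biased upward
print(f"adjusted: {full:.2f}, omitting w: {omitted:.2f}")
```

A sensitivity analysis in this setting asks how large the confounding would have to be to move the adjusted estimate by a given amount; benchmarking grounds that speculation in the observed covariates.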
Journal of the American Statistical Association | 2009
Ben B. Hansen; Jake Bowers
Early in the twentieth century, Fisher and Neyman demonstrated how to infer effects of agricultural interventions using only the very weakest of assumptions, by randomly varying which plots were to be manipulated. Although the methods permitted uncontrolled variation between experimental units, they required strict control over assignment of interventions; this hindered their application to field studies with human subjects, who ordinarily could not be compelled to comply with experimenters’ instructions. In 1996, however, Angrist, Imbens, and Rubin showed that inferences from randomized studies could accommodate noncompliance without significant strengthening of assumptions. Political scientists A. Gerber and D. Green responded quickly, fielding a randomized study of voter turnout campaigns in the November 1998 general election. Noncontacts and refusals were frequent, but Gerber and Green analyzed their data in the style of Angrist et al., avoiding the need to model nonresponse. They did use models for other purposes: to address complexities of the randomization scheme; to permit heterogeneity among voters and campaigners; to account for deviations from experimental protocol; and to take advantage of highly informative covariates. Although the added assumptions seemed straightforward and unassailable, a later analysis by Imai found them to be at odds with Gerber and Green’s data. Using a different model, he reaches the very opposite of Gerber and Green’s central conclusion about getting out the vote. This article shows that neither model is necessary, addressing all of the complications of Gerber and Green’s study using methods in the tradition of Fisher and Neyman. To do this, it merges recent developments in randomization-based inference for comparative studies with somewhat older developments in design-based analysis of sample surveys.
The method involves regression, but large-sample analysis and simulations demonstrate its lack of dependence on regression assumptions. Its substantive results have consequences both for the design of campaigns to increase voter participation and for theories of political behavior more generally.
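The randomization-based mode of inference the article builds on can be sketched as a Fisher-style permutation test of the sharp null of no effect, using toy turnout-like data. This omits the clustering, noncompliance, and covariate adjustment that the article actually handles, and none of the numbers here come from the Gerber and Green study.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy completely randomized experiment with a binary outcome
# (illustrative; not the voter-turnout data).
n = 500
z = rng.permutation(np.repeat([1, 0], n // 2))
y = rng.binomial(1, 0.30 + 0.10 * z)   # treatment lifts turnout by 10 points

obs = y[z == 1].mean() - y[z == 0].mean()

# Under the sharp null of no effect, outcomes are fixed and only the
# assignment varies; permuting z traces out the null distribution.
draws = []
for _ in range(2000):
    zz = rng.permutation(z)
    draws.append(y[zz == 1].mean() - y[zz == 0].mean())
p = (1 + sum(d >= obs for d in draws)) / (2000 + 1)
print(f"difference in means: {obs:.3f}, one-sided p: {p:.4f}")
```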
Statistics & Probability Letters | 2000
Ben B. Hansen; Jim Pitman
Suppose an exchangeable sequence with values in a nice measurable space S admits a prediction rule of the following form: given the first n terms of the sequence, the next term equals the jth distinct value observed so far with probability p_{j,n}, for j = 1, 2, ..., and otherwise is a new value with distribution ν, for some probability measure ν on S with no atoms. Then the p_{j,n} depend only on the partition of the first n integers induced by the first n values of the sequence. All possible distributions for such an exchangeable sequence are characterized in terms of constraints on the p_{j,n} and in terms of their de Finetti representations.
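The best-known prediction rule of this form is the Chinese restaurant process, where a value already seen n_j times recurs with probability n_j / (n + a) and a new value (drawn from the atomless ν) appears with probability a / (n + a). A minimal simulation, with the concentration parameter a an assumed choice:

```python
import random

random.seed(5)

def crp(n, a=1.0):
    # Chinese restaurant process: returns the multiset of counts n_j
    # (the induced partition of {1, ..., n}).
    counts = []
    for i in range(n):                   # i = number of terms seen so far
        r = random.random() * (i + a)
        if r < a:
            counts.append(1)             # new value, probability a / (i + a)
        else:
            r -= a                       # existing value j, prob. n_j / (i + a)
            for j, c in enumerate(counts):
                if r < c:
                    counts[j] += 1
                    break
                r -= c
    return counts

parts = crp(1000)
print(f"distinct values among 1000 draws: {len(parts)}")
```

Consistent with the abstract's characterization, these p_{j,n} depend on the data only through the counts n_j, that is, through the induced partition.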
Statistics in Medicine | 2008
Ben B. Hansen
Peter Austin has made an exacting, timely and eye-opening review of uses of propensity-score matching in medical research. Its Section 2.1 argues that the reports of propensity-matched analyses should include descriptive assessments of matched treatment-control differences on baseline variables. When propensity matching on covariates including BMI, for example, one should report the difference between matched cohorts’ mean BMIs, perhaps after inverse scaling by the pooled s.d. of BMIs prior to matching. The recommendation is a good one: matched differences on prognostic variables, and on variables that track selection into treatment, speak to the credibility of subsequent matched outcome analyses; and although the basic promise of propensity matching is that it should lessen such differences, the extent of the reduction varies greatly from case to case. Furthermore, since successful propensity matches or subclassifications enable comparisons similar to those which randomization would have given—in terms of observed covariates, at least [1], and should those covariates jointly suffice to remove confounding, then also in terms of outcomes [2]—it follows that balance is the basic mark of success of a propensity adjustment. Austin’s review also makes a negative recommendation: When appraising balance, avoid significance tests. Having compared means of BMI and other variables, Austin would not have us go on to calculate either paired/two-sample t-tests or other tests for a treatment-control difference on BMI. His pessimism about the state of reporting in the medical propensity-matching literature stems in large part from this opinion; only 2 of 47 papers reported balance properly, Austin reports, but it turns out that another 33 were disqualified on the basis of having tested balance, rather than reporting it using purely descriptive measures. This dim view of balance testing is driven by two complaints, complaints Austin shares with Imai, King and Stuart [3]:
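The descriptive measure in question, the matched-cohort mean difference scaled by the pooled pre-matching standard deviation, is easy to compute. The sketch below uses invented BMI-like data and a hypothetical post-matching control group; it illustrates the reporting convention, not any particular matching algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)

# Invented pre-matching cohorts: treated patients have higher BMI.
bmi_treated = rng.normal(29, 5, 400)
bmi_control = rng.normal(26, 5, 2000)

def std_diff(a, b, scale_a, scale_b):
    # Mean difference, inverse-scaled by the pooled SD of the
    # *pre-matching* groups, as the abstract describes.
    pooled_sd = np.sqrt((scale_a.var(ddof=1) + scale_b.var(ddof=1)) / 2)
    return float((a.mean() - b.mean()) / pooled_sd)

before = std_diff(bmi_treated, bmi_control, bmi_treated, bmi_control)

# Hypothetical matched controls, closer to the treated distribution.
bmi_matched = rng.normal(28.8, 5, 400)
after = std_diff(bmi_treated, bmi_matched, bmi_treated, bmi_control)
print(f"standardized difference: before {before:.2f}, after {after:.2f}")
```

Reporting `before` and `after` for each covariate is the purely descriptive assessment Austin asks for; the disputed question is whether to supplement such figures with significance tests.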
Archive | 2011
Ben B. Hansen
During the 1995–1996 academic year, investigators from the College Board surveyed a random sample of high school junior and senior SAT® takers to probe how they had prepared for the SAT. Among other questions, students were asked whether they had taken extracurricular test-preparation classes. Some 12% of respondents said that they had; the comparison of these students’ SAT scores to those of the remaining 88% comprised the observational study reported by Powers and Rock (1999).
Journal of Computational and Graphical Statistics | 2006
Ben B. Hansen; Stephanie O. Klopfer
Social Science & Medicine | 2007
Jeffrey D. Morenoff; James S. House; Ben B. Hansen; David R. Williams; George A. Kaplan; Haslyn E. R. Hunte