Kelly Hallberg
University of Chicago
Publications
Featured research published by Kelly Hallberg.
American Journal of Evaluation | 2014
Travis St. Clair; Thomas D. Cook; Kelly Hallberg
Although evaluators often use an interrupted time series (ITS) design to test hypotheses about program effects, there are few empirical tests of the design’s validity. We take a randomized experiment on an educational topic and compare its effects to those from a comparative ITS (CITS) design that uses the same treatment group as the experiment but a nonequivalent comparison group that is assessed at six time points before treatment. We estimate program effects with and without matching of the comparison schools, and we also systematically vary the number of pretest time points in the analysis. CITS designs produce impact estimates that are extremely close to the experimental benchmarks and, as implemented here, do so equally well with and without matching. Adding time points provides an advantage so long as the pretest trend differences in the treatment and comparison groups are correctly modeled. Otherwise, more time points can increase bias.
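The CITS logic described in this abstract can be illustrated with a small simulation. The sketch below is purely hypothetical — the data-generating values, the six-wave design, and the single post period are invented for illustration and are not the study's data or code. The impact estimate is the coefficient on the treatment-by-post interaction, which captures the treated group's deviation from its own extrapolated pretest trend relative to the comparison group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical panel: 6 pretest waves plus 1 post-treatment wave
n_units, t_pre = 40, 6
times = np.arange(t_pre + 1)                 # t = 0..5 pretest, t = 6 post
treat = np.repeat([0, 1], n_units // 2)      # half the units are treated

rows = []
for u in range(n_units):
    for t in times:
        post = int(t == t_pre)
        y = (1.0 + 0.5 * t                   # common level and trend
             + 0.3 * treat[u]                # pre-existing group difference
             + 2.0 * treat[u] * post         # true treatment effect
             + rng.normal(0, 0.5))
        rows.append([1, t, treat[u], post, treat[u] * t, treat[u] * post, y])

data = np.array(rows)
X, y = data[:, :6], data[:, 6]

# OLS fit; the treat*post coefficient (index 5) is the CITS impact estimate
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
impact = beta[5]
print(round(impact, 2))  # should be close to the true effect of 2.0
```

Note the abstract's caveat: the model above includes a `treat * t` term for differential pretest trends; if the true trend difference were nonlinear and this term misspecified it, extra pretest waves could increase rather than decrease bias.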
Evaluation Review | 2017
Elizabeth Tipton; Kelly Hallberg; Larry V. Hedges; Wendy Chan
Background: Policy makers and researchers are frequently interested in understanding how effective a particular intervention may be for a specific population. One approach is to assess the degree of similarity between the sample in an experiment and the population. Another approach is to combine information from the experiment and the population to estimate the population average treatment effect (PATE). Method: Several methods for assessing the similarity between a sample and population currently exist, as well as methods for estimating the PATE. In this article, we investigate properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10–70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. Result: In small random samples, large differences between the sample and population can arise simply by chance, and many of the statistics commonly used in generalization are a function of both sample size and the number of covariates being compared. The rules of thumb developed in observational studies (which are commonly applied in generalization) are much too conservative given the small sample sizes found in generalization. Conclusion: These results imply that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.
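The chance-imbalance point can be demonstrated with a quick simulation. This is an illustrative sketch, not the article's analysis: it invents a population of sites with five standard-normal covariates and repeatedly draws true random samples of 20 sites, checking how often the largest absolute standardized mean difference (SMD) exceeds the 0.25 cutoff commonly borrowed from observational studies.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "population" of 10,000 sites with 5 covariates
pop = rng.normal(0, 1, size=(10_000, 5))

def max_abs_smd(sample, population):
    """Largest absolute standardized mean difference across covariates."""
    d = (sample.mean(axis=0) - population.mean(axis=0)) / population.std(axis=0)
    return np.abs(d).max()

# Even genuinely random samples of 20 sites routinely exceed the
# 0.25 SMD rule of thumb on at least one covariate, purely by chance
exceed = 0
n_draws = 1_000
for _ in range(n_draws):
    idx = rng.choice(len(pop), size=20, replace=False)
    exceed += max_abs_smd(pop[idx], pop) > 0.25
print(exceed / n_draws)
```

Because the sampling here is exactly random, any flagged "dissimilarity" is pure sampling noise — which is the article's argument for why fixed cutoffs calibrated on large observational samples are too conservative at 10–70 sites.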
Prevention Science | 2018
Kelly Hallberg; Thomas D. Cook; Peter M. Steiner; M. H. Clark
This paper examines how pretest measures of a study outcome reduce selection bias in observational studies in education. The theoretical rationale for privileging pretests in bias control is that they are often highly correlated with the outcome, and in many contexts, they are also highly correlated with the selection process. To examine the pretest’s role in bias reduction, we use the data from two within-study comparisons and an especially strong quasi-experiment, each with an educational intervention that seeks to improve achievement. In each study, the pretest measures are consistently highly correlated with post-intervention measures of themselves, but the studies vary in the correlation between the pretest and the process of selection into treatment. Across the three datasets with two outcomes each, there are three cases where this correlation is low and three where it is high. A single pretest wave reduces bias in all six instances examined, and it eliminates bias in three of them. Adding a second pretest wave eliminates bias in two more instances. However, the pattern of bias elimination does not follow the predicted pattern—that more bias reduction ensues as a function of how highly the pretest is correlated with selection. The findings show that bias is more complexly related to the pretest’s correlation with selection than we hypothesized, and we seek to explain why.
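The mechanism the abstract describes — a pretest absorbing selection bias because it proxies the confounder driving both selection and the outcome — can be sketched in a few lines. This simulation is a hypothetical illustration under assumed parameter values, not the paper's data: a latent ability variable drives the pretest, the posttest, and selection into treatment, so the naive treated-vs-untreated difference is biased, and adjusting for the pretest removes most (though, as in the paper, not all) of that bias.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

# Latent ability drives the pretest, the outcome, AND selection
ability = rng.normal(0, 1, n)
pretest = ability + rng.normal(0, 0.5, n)                   # noisy measure
treat = (ability + rng.normal(0, 1, n) > 0).astype(float)   # selection on ability
true_effect = 1.0
posttest = true_effect * treat + ability + rng.normal(0, 0.5, n)

# Naive group comparison is biased upward by selection on ability
naive = posttest[treat == 1].mean() - posttest[treat == 0].mean()

# Regression adjustment for the pretest removes most of the bias,
# because the pretest is highly correlated with both the outcome
# and the selection process; measurement error leaves some residual bias
X = np.column_stack([np.ones(n), treat, pretest])
beta, *_ = np.linalg.lstsq(X, posttest, rcond=None)
adjusted = beta[1]

print(round(naive, 2), round(adjusted, 2))
```

In this setup the pretest's imperfect reliability means a single wave shrinks but does not eliminate the bias — consistent with the paper's finding that a second pretest wave, or other covariates, can be needed to close the remaining gap.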
Journal of Educational and Behavioral Statistics | 2016
Travis St. Clair; Kelly Hallberg; Thomas D. Cook
We explore the conditions under which short, comparative interrupted time-series (CITS) designs represent valid alternatives to randomized experiments in educational evaluations. To do so, we conduct three within-study comparisons, each of which uses a unique data set to test the validity of the CITS design by comparing its causal estimates to those from a randomized controlled trial (RCT) that shares the same treatment group. The degree of correspondence between RCT and CITS estimates depends on the observed pretest time trend differences and how they are modeled. Where the trend differences are clear and can be easily modeled, no bias results; where the trend differences are more volatile and cannot be easily modeled, the degree of correspondence is more mixed, and the best results come from matching comparison units on both pretest and demographic covariates.
International Encyclopedia of the Social & Behavioral Sciences (Second Edition) | 2015
Kelly Hallberg; Jared Eno
This article is a revision of the previous edition article by W. R. Shadish, volume 18, pp. 12655–12659.
Office of Planning, Evaluation and Policy Development, US Department of Education | 2010
Kathleen Magill; Kelly Hallberg; Trisha Hinojosa; Cynthia Reeves
Society for Research on Educational Effectiveness | 2013
Vivian C. Wong; Kelly Hallberg; Thomas D. Cook
Society for Research on Educational Effectiveness | 2011
Kelly Hallberg; Peter M. Steiner; Thomas D. Cook
Educational Researcher | 2018
Kelly Hallberg; Ryan T. Williams; Andrew Swanlund; Jared Eno
2015 Fall Conference: The Golden Age of Evidence-Based Policy | 2015
Kelly Hallberg