Yajuan Si
University of Wisconsin-Madison
Publications
Featured research published by Yajuan Si.
Journal of Educational and Behavioral Statistics | 2013
Yajuan Si
In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian, joint modeling approach to multiple imputation for categorical data based on Dirichlet process mixtures of multinomial distributions. The approach automatically models complex dependencies while being computationally expedient. The Dirichlet process prior distributions enable analysts to avoid fixing the number of mixture components at an arbitrary number. We illustrate repeated sampling properties of the approach using simulated data. We apply the methodology to impute missing background data in the 2007 Trends in International Mathematics and Science Study.
Statistical Science | 2013
Yiting Deng; D. Sunshine Hillygus; Yajuan Si; Siyu Zheng
Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples—new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel—offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007-2008 Associated Press-Yahoo! News Election Poll.
Bayesian Analysis | 2015
Yajuan Si; Natesh S. Pillai; Andrew Gelman
Survey weighting adjusts for known or expected differences between sample and population. Weights are constructed on design or benchmarking variables that are predictors of inclusion probability. In this paper, we assume that the only information we have about the weighting procedure is the values of the weights in the sample. We propose a hierarchical Bayesian approach in which we model the weights of the nonsampled units in the population and simultaneously include them as predictors in a nonparametric Gaussian process regression to yield valid inference for the underlying finite population and capture the uncertainty induced by sampling and the unobserved outcomes. We use simulation studies to evaluate the performance of our procedure and compare it to the classical design-based estimator. We apply our method to the Fragile Family Child Wellbeing Study. Our studies find the Bayesian nonparametric finite population estimator to be more robust than the classical design-based estimator without loss in efficiency.
Journal of statistical theory and practice | 2011
Yajuan Si
Multiple imputation is a common approach for handling missing data. It is also used by government agencies to protect confidential information in public use data files. One reason for the popularity of multiple imputation approaches is ease of use: analysts make inferences by combining point and variance estimates with simple rules. These combining rules are based on method of moments approximations to full Bayesian inference. With modern computing, however, it is as easy to perform the full Bayesian inference as it is to combine point and variance estimates. This begs the question: is there any advantage of using full Bayesian inference over multiple imputation combining rules? We use simulation studies to investigate this question. We find that, in general, the full Bayesian inference is not preferable to using the combining rules in multiple imputation for missing data. The full Bayesian inference can have advantages over the combining rules when using multiple imputation to protect confidential information.
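For readers unfamiliar with the combining rules mentioned in this abstract, the following is a minimal sketch of the standard rules for multiply imputed data with missing values (commonly called Rubin's rules). It is an illustration only, not code from the paper; the function name `combine_estimates` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def combine_estimates(point_estimates, variance_estimates):
    """Combine results from m completed datasets (hypothetical helper).

    point_estimates: the m point estimates, one per imputed dataset.
    variance_estimates: the m within-imputation variance estimates.
    Returns (combined point estimate, total variance).
    """
    m = len(point_estimates)
    if m < 2 or m != len(variance_estimates):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(point_estimates)     # combined point estimate
    u_bar = mean(variance_estimates)  # average within-imputation variance
    b = variance(point_estimates)     # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b   # total variance for the missing-data setting
    return q_bar, total


# Made-up inputs from three imputed datasets, for illustration only.
estimate, total_variance = combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, total_variance)
```

The sketch covers only the missing-data case; the variance formulas used when multiple imputation protects confidential information differ and are not shown here.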
Journal of Research on Educational Effectiveness | 2016
Diane M. Early; Juliette Berg; Stacey Alicea; Yajuan Si; J. Lawrence Aber; Richard M. Ryan
Every Classroom, Every Day (ECED) is a set of instructional improvement interventions designed to increase student achievement in math and English/language arts (ELA). ECED includes three primary components: (a) systematic classroom observations by school leaders, (b) intensive professional development and support for math teachers and instructional leaders to reorganize math instruction, assessment, and grading around mastery of benchmarks, and (c) a structured literacy curriculum that supplements traditional English courses, with accompanying professional development and support for teachers surrounding its use. The present study is a two-year trial, conducted by independent researchers, which employed a school-randomized design and included 20 high schools (10 treatment; 10 control) in five districts in four states. The students were ethnically diverse and most were eligible for free or reduced-price lunch. Results provided evidence that ECED improved scores on standardized tests of math achievement, but not standardized tests of ELA achievement. Findings are discussed in terms of differences between math and ELA and of implications for future large-scale school-randomized trials.
The Annals of Applied Statistics | 2016
Yajuan Si; D. Sunshine Hillygus
Many panel studies collect refreshment samples: new, randomly sampled respondents who complete the questionnaire at the same time as a subsequent wave of the panel. With appropriate modeling, these samples can be leveraged to correct inferences for biases caused by non-ignorable attrition. We present such a model when the panel includes many categorical survey variables. The model relies on a Bayesian latent pattern mixture model, in which an indicator for attrition and the survey variables are modeled jointly via a latent class model. We allow the multinomial probabilities within classes to depend on the attrition indicator, which offers additional flexibility over standard applications of latent class models. We present results of simulation studies that illustrate the benefits of this flexibility. We apply the model to correct attrition bias in an analysis of data from the 2007-2008 Associated Press/Yahoo News election panel study.
Political Analysis | 2015
Yajuan Si; D. Sunshine Hillygus
Archive | 2012
Yajuan Si
arXiv: Methodology | 2017
Yajuan Si; Rob Trangucci; Jonah Gabry; Andrew Gelman
Archive | 2015
Susanna Makela; Yajuan Si; Andrew Gelman