Yu-Sung Su
Tsinghua University
Publication
Featured research published by Yu-Sung Su.
The Annals of Applied Statistics | 2008
Andrew Gelman; Aleks Jakulin; Maria Grazia Pittau; Yu-Sung Su
We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors. We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation. We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set.
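The recipe in this abstract (rescale nonbinary inputs to mean 0 and standard deviation 0.5, then place a Cauchy prior with center 0 and scale 2.5 on the coefficient) can be illustrated with a minimal sketch. This is not the authors' R implementation: the paper describes an approximate EM algorithm inside iteratively weighted least squares, whereas the sketch below swaps in a crude grid search for the posterior mode and assumes a one-predictor model without an intercept. The function names `rescale`, `cauchy_log_prior`, `log_posterior`, and `map_estimate` are hypothetical.

```python
import math


def rescale(values):
    """Center a nonbinary input at 0 and scale it to standard deviation 0.5."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / (2.0 * sd) for v in values]


def cauchy_log_prior(beta, scale=2.5):
    """Log density of a Cauchy(0, scale) prior, up to an additive constant."""
    return -math.log(1.0 + (beta / scale) ** 2)


def log_posterior(beta, x, y, scale=2.5):
    """Logistic log-likelihood (one predictor, no intercept) plus the log prior."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta * xi
        # y*eta - log(1 + exp(eta)), written in a numerically stable form
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total + cauchy_log_prior(beta, scale)


def map_estimate(x, y, scale=2.5, lo=-50.0, hi=50.0, steps=2001):
    """Grid search for the posterior mode (an assumption, not the paper's EM/IWLS)."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda b: log_posterior(b, x, y, scale))


# Completely separated data: every y = 1 has a larger x than every y = 0,
# so the maximum-likelihood coefficient is infinite.
x = rescale([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = [0, 0, 0, 1, 1, 1]
beta_hat = map_estimate(x, y)
print(beta_hat)
```

With these separated data the likelihood alone keeps increasing as the coefficient grows, but the prior pulls the posterior mode back to a finite value strictly inside the search grid, which is the "always giving answers" property the abstract mentions.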
Computational Statistics & Data Analysis | 2008
Yu-Sung Su
The purpose of default settings in a graphic tool is to make it easy to produce good graphics that accord with the principles of statistical graphics, e.g., [Tufte, E.R., 1990. Envisioning Information. Graphics Press, Cheshire, Conn.; Tufte, E.R., 1997. Visual Explanations: Images and Quantities, Evidence and Narrative, 2nd Edition. Graphics Press, Cheshire, Conn.; Cleveland, W.S., 1993. Visualizing Data. Hobart Press, N.J.; Cleveland, W.S., 1994. The Elements of Graphing Data, rev. edition. AT&T Bell Laboratories, Murray Hill, N.J.; Wainer, H., 1997. Visual Revelations: Graphical Tales of Fate and Deception from Napoleon to Ross Perot. Copernicus, New York; Spence, R., 2001. Information Visualization. ACM Press & Addison-Wesley, New York; and Few, S., 2004. Show Me the Numbers. Analytic Press, Hillsdale, N.J.]. If the defaults do not embody these principles, then the only way to produce good graphics is to be sufficiently familiar with the principles of statistical graphics. This paper shows that Excel graphics defaults do not embody the appropriate principles. Users who want to use Excel are advised to know the principles of good graphics well enough so that they can choose the appropriate options to override the defaults. Microsoft® should overhaul the Excel graphics engine so that its defaults embody the principles of statistical graphics and make it easy for non-experts to produce good graphs.
Archive | 2007
Andrew Gelman; Aleks Jakulin; Yu-Sung Su; M. Grazia Pittau
We propose a new prior distribution for classical (non-hierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. We implement a procedure to fit generalized linear models in R with this prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several examples, including a series of logistic regressions predicting voting preferences, an imputation model for a public health data set, and a hierarchical logistic regression in epidemiology. We recommend this default prior distribution for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small) and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.
The Annals of Applied Statistics | 2013
Jennifer Hill; Yu-Sung Su
Causal inference in observational studies typically requires making comparisons between groups that are dissimilar. For instance, researchers investigating the role of a prolonged duration of breastfeeding on child outcomes may be forced to make comparisons between women with substantially different characteristics on average. In the extreme there may exist neighborhoods of the covariate space where there are not sufficient numbers of both groups of women (those who breastfed for prolonged periods and those who did not) to make inferences about those women. This is referred to as lack of common support. Problems can arise when we try to estimate causal effects for units that lack common support, thus we may want to avoid inference for such units. If ignorability is satisfied with respect to a set of potential confounders, then identifying whether, or for which units, the common support assumption holds is an empirical question. However, in the high-dimensional covariate space often required to satisfy ignorability such identification may not be trivial. Existing methods used to address this problem often require reliance on parametric assumptions and most, if not all, ignore the information embedded in the response variable. We distinguish between the concepts of “common support” and “common causal support.” We propose a new approach for identifying common causal support that addresses some of the shortcomings of existing methods. We motivate and illustrate the approach using data from the National Longitudinal Survey of Youth to estimate the effect of breastfeeding at least nine months on reading and math achievement scores at age five or six. We also evaluate the comparative performance of this method in hypothetical examples and simulations where the true treatment effect is known.
Archive | 2006
Andrew Gelman; Yu-Sung Su
Journal of Statistical Software | 2011
Yu-Sung Su; Andrew Gelman; Jennifer Hill; Masanao Yajima
Research Policy | 2013
Wei Hong; Yu-Sung Su
Biometrika | 2014
Jingchen Liu; Andrew Gelman; Jennifer Hill; Yu-Sung Su; Jonathan Kropko
Archive | 2015
Yu-Sung Su; Ben Goodrich; Jon Kropko
Archive | 2010
Yu-Sung Su; Andrew Gelman