Xiangrong Yin
University of Georgia
Publications
Featured research published by Xiangrong Yin.
Australian & New Zealand Journal of Statistics | 2001
R. Dennis Cook; Xiangrong Yin
This paper discusses visualization methods for discriminant analysis. It does not address numerical methods for classification per se, but rather focuses on graphical methods that can be viewed as pre-processors, aiding the analyst's understanding of the data and the choice of a final classifier. The methods are adaptations of recent results in dimension reduction for regression, including sliced inverse regression and sliced average variance estimation. A permutation test is suggested as a means of determining dimension, and examples are given throughout the discussion.
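The following is a minimal Python sketch of sliced inverse regression with a categorical response, where each slice is simply a class; it is a generic textbook version, not the authors' implementation, and it assumes a nonsingular predictor covariance. The permutation test suggested in the paper would, in this setting, compare the eigenvalues of M against their distribution under randomly permuted class labels.

```python
import numpy as np

def sir_directions(X, labels, n_dirs=2):
    """Sliced inverse regression with a class label as the response.

    Each "slice" is a class, so SIR reduces to an eigen-analysis of the
    weighted covariance of the per-class means of the standardized
    predictors. Assumes Cov(X) is nonsingular.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    # Standardize predictors: Z = (X - mean) Sigma^{-1/2}
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Weighted outer products of the per-class means of Z
    M = np.zeros((p, p))
    for g in np.unique(labels):
        Zg = Z[labels == g]
        m = Zg.mean(axis=0)
        M += (len(Zg) / n) * np.outer(m, m)
    # Leading eigenvectors, mapped back to the original predictor scale
    w, V = np.linalg.eigh(M)
    order = np.argsort(w)[::-1][:n_dirs]
    return inv_sqrt @ V[:, order]  # columns span the estimated subspace
```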
Journal of The Royal Statistical Society Series B-statistical Methodology | 2002
Xiangrong Yin; R. Dennis Cook
The idea of dimension reduction without loss of information can be quite helpful for guiding the construction of summary plots in regression without requiring a prespecified model. Central subspaces are designed to capture all the information for the regression and to provide a population structure for dimension reduction. Here, we introduce the central kth-moment subspace to capture information from the mean, variance and so on up to the kth conditional moment of the regression. New methods are studied for estimating these subspaces. Connections with sliced inverse regression are established, and examples illustrating the theory are presented.
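To make the object concrete, here is one informal statement consistent with the abstract; the notation is ours, and the paper's formal definition may differ in detail:

```latex
% A subspace spanned by the columns of a matrix B captures the first
% k conditional moments of Y given X when
\[
  \mathrm{E}\!\left(Y^{j}\mid X\right)
  = \mathrm{E}\!\left(Y^{j}\mid B^{\mathsf T}X\right),
  \qquad j = 1,\dots,k.
\]
% The central kth-moment subspace is the smallest such subspace (the
% intersection of all of them, when that intersection also satisfies
% the condition); for k = 1 it reduces to the central mean subspace.
```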
Computational Statistics & Data Analysis | 2008
Qin Wang; Xiangrong Yin
Traditional variable selection methods are model based and may suffer from possible model misspecification. On the other hand, sufficient dimension reduction provides us with a way to find sufficient dimensions without a parametric model. However, the drawback is that each reduced variable is a linear combination of all the original variables, which may be difficult to interpret. In this paper, focusing on the sufficient dimensions in the regression mean function, we combine the ideas of sufficient dimension reduction and variable selection to propose a shrinkage estimation method, sparse MAVE. The sparse MAVE can exhaustively estimate dimensions in the mean function, while selecting informative covariates simultaneously without assuming any particular model or particular distribution on the predictor variables. Furthermore, we propose a modified BIC criterion for effectively estimating the dimension of the mean function. The efficacy of sparse MAVE is verified through simulation studies and via analysis of a real data set.
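For orientation, the flavor of the criterion can be written as the MAVE local-linear objective plus an L1 shrinkage term. This is a simplified sketch; the exact penalty and weighting used in the paper may differ:

```latex
\[
  \min_{B,\,\{a_j,\,b_j\}}\;
  \sum_{j=1}^{n}\sum_{i=1}^{n} w_{ij}
  \left[\, y_i - a_j - b_j^{\mathsf T} B^{\mathsf T}(x_i - x_j) \right]^2
  \;+\; \lambda \sum_{l,m}\left|B_{lm}\right|,
\]
% where the kernel weights w_{ij} localize each linear fit around x_j,
% and the L1 term shrinks entries of B to zero, dropping a covariate
% whenever its entire row of B vanishes.
```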
Journal of Multivariate Analysis | 2004
Xiangrong Yin
In this article, we propose a new canonical correlation method based on information theory. The method examines potential nonlinear relationships between a p × 1 random vector Y and a q × 1 random vector X. It finds canonical coefficient vectors a and b by maximizing a more general measure, the mutual information, between aᵀX and bᵀY. We use a permutation test to determine the pairs of new canonical variates; the test requires no specific distributions for X and Y as long as the densities of aᵀX and bᵀY can be estimated nonparametrically. Examples illustrating the new method are presented.
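A minimal Python sketch of the idea, not the paper's algorithm: estimate the densities by Gaussian kernel density estimation, plug them into the mutual information, and search over the projection directions. The multi-start Nelder-Mead search and the default KDE bandwidths are our simplifications.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import minimize

def mutual_information(u, v):
    """Plug-in MI estimate between two 1-D samples via Gaussian KDE."""
    fu, fv = gaussian_kde(u), gaussian_kde(v)
    fuv = gaussian_kde(np.vstack([u, v]))
    ratio = fuv(np.vstack([u, v])) / (fu(u) * fv(v))
    return np.mean(np.log(np.clip(ratio, 1e-12, None)))

def mi_canonical_pair(X, Y, n_starts=5, seed=0):
    """Search for unit vectors a, b maximizing MI(a'X, b'Y)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    q, p = X.shape[1], Y.shape[1]
    rng = np.random.default_rng(seed)

    def neg_mi(theta):
        a = theta[:q] / np.linalg.norm(theta[:q])  # scale is irrelevant to MI
        b = theta[q:] / np.linalg.norm(theta[q:])
        return -mutual_information(X @ a, Y @ b)

    best = None
    for _ in range(n_starts):  # multi-start: the MI surface is multimodal
        res = minimize(neg_mi, rng.standard_normal(q + p), method="Nelder-Mead")
        if best is None or res.fun < best.fun:
            best = res
    a, b = best.x[:q], best.x[q:]
    return a / np.linalg.norm(a), b / np.linalg.norm(b)
```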
Computational Statistics & Data Analysis | 2002
Douglas M. Hawkins; Xiangrong Yin
Regression data sets typically have many more cases than variables, but this is not always the case. Some current problems in chemometrics--for example, fitting quantitative structure-activity relationships--may involve fitting linear models to data sets in which the number of predictors far exceeds the number of cases. Ridge regression is an approach that has some theoretical foundation and has performed well in comparison with alternatives such as PLS and subset regression. Direct implementation of the regression formulation leads to an O(np² + p³) calculation, which is substantial if p is large. We show that ridge regression may be performed in an O(np²) computation--a potentially large saving when p is larger than n. The algorithm lends itself to the use of case weights, to robust bounded-influence fitting, and to cross-validation. The method is illustrated with a chemometric data set with 255 predictors but only 18 cases, a ratio not unusual in QSAR problems.
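One standard route to a saving of this kind is the dual form of the ridge solution, which replaces the p × p linear solve with an n × n one. The sketch below illustrates that identity; it is a generic derivation, not necessarily the paper's algorithm step for step.

```python
import numpy as np

def ridge_dual(X, y, lam):
    """Ridge coefficients via the n x n (dual) system.

    Uses the exact identity
        (X'X + lam*I_p)^{-1} X' y = X' (XX' + lam*I_n)^{-1} y,
    so the solve is n x n rather than p x p -- a large saving when
    p >> n, as in a 255-predictor, 18-case QSAR problem.
    """
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha

# Quick check against the primal formula on a small p >> n problem
rng = np.random.default_rng(1)
X = rng.standard_normal((18, 255))
y = rng.standard_normal(18)
beta_dual = ridge_dual(X, y, lam=0.5)
beta_primal = np.linalg.solve(X.T @ X + 0.5 * np.eye(255), X.T @ y)
assert np.allclose(beta_dual, beta_primal)
```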
Annals of Statistics | 2011
Xiangrong Yin; Bing Li
We introduce a class of dimension reduction estimators based on an ensemble of the minimum average variance estimates of functions that characterize the central subspace, such as the characteristic functions, the Box-Cox transformations and wavelet bases. The ensemble estimators exhaustively estimate the central subspace without imposing restrictive conditions on the predictors, and have the same convergence rate as the minimum average variance estimates. They are flexible and easy to implement, and allow repeated use of the available sample, which enhances accuracy. They are applicable to both univariate and multivariate responses in a unified form. We establish the consistency and convergence rate of these estimators, and the consistency of a cross-validation criterion for order determination. We compare the ensemble estimators with other estimators in a wide variety of models, and demonstrate their competitive performance.
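The full method builds on minimum average variance estimation, which is too involved for a short sketch. The toy version below keeps only the ensemble-and-aggregate structure: an OLS slope stands in for the MAVE estimate of each transformed response, and cos/sin transforms play the role of characteristic-function ensemble members. It illustrates the aggregation step only, not the estimator the paper studies.

```python
import numpy as np

def ensemble_directions(X, y, n_dirs=1, omegas=(0.5, 1.0, 2.0)):
    """Ensemble idea: estimate one mean-subspace direction per
    transformed response f_t(y), then aggregate and eigen-decompose.

    Each per-transform estimate here is an OLS slope, a crude stand-in
    for the minimum average variance estimate used in the paper.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xc = X - X.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))
    M = np.zeros((X.shape[1], X.shape[1]))
    for omega in omegas:
        # Real and imaginary parts of characteristic-function transforms
        for f in (np.cos(omega * y), np.sin(omega * y)):
            beta = Sigma_inv @ (Xc.T @ (f - f.mean())) / len(y)
            M += np.outer(beta, beta)  # aggregate candidate directions
    w, V = np.linalg.eigh(M)
    return V[:, np.argsort(w)[::-1][:n_dirs]]
```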
Technometrics | 2007
Iain Pardoe; Xiangrong Yin; R. Dennis Cook
Sufficient dimension-reduction methods provide effective ways to visualize discriminant analysis problems. For example, Cook and Yin showed that the dimension-reduction method of sliced average variance estimation (SAVE) identifies variates that are equivalent to a quadratic discriminant analysis (QDA) solution. This article makes this connection explicit to motivate the use of SAVE variates in exploratory graphics for discriminant analysis. Classification can then be based on the SAVE variates using a suitable distance measure. If the chosen measure is Mahalanobis distance, then classification is identical to QDA using the original variables. Just as canonical variates provide a useful way to visualize linear discriminant analysis (LDA), so do SAVE variates help visualize QDA. This would appear to be particularly useful given the lack of graphical tools for QDA in current software. Furthermore, whereas LDA and QDA can be sensitive to nonnormality, SAVE is more robust.
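For concreteness, here is a generic Python sketch of SAVE with the class label defining the slices; it is a textbook version, not the article's code. Per the article, classifying by Mahalanobis distance on the SAVE variates reproduces QDA, while plots of the leading variates give the exploratory graphics.

```python
import numpy as np

def save_directions(X, labels, n_dirs=2):
    """Sliced average variance estimation with classes as slices.

    SAVE eigen-analyzes M = sum_g f_g (I - Cov(Z | class g))^2 on the
    standardized scale Z; the leading eigenvectors are the SAVE
    variates used to visualize the discriminant structure.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, p = X.shape
    Sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    M = np.zeros((p, p))
    for g in np.unique(labels):
        Zg = Z[labels == g]
        D = np.eye(p) - np.cov(Zg, rowvar=False)
        M += (len(Zg) / n) * D @ D
    w, V = np.linalg.eigh(M)
    order = np.argsort(w)[::-1][:n_dirs]
    return inv_sqrt @ V[:, order]  # back-transform to original predictors
```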
Journal of Multivariate Analysis | 2013
Wenhui Sheng; Xiangrong Yin
We introduce a new method for estimating the direction in single-index models via distance covariance. Our method retains the model-free advantage of a dimension reduction approach. In addition, no smoothing technique is needed, which enables our method to work efficiently when many predictors are discrete or categorical. Under regularity conditions, we show that our estimator is root-n consistent and asymptotically normal. We compare the performance of our method with some dimension reduction methods and the single-index estimation method by simulations, and show that our method is very competitive and robust across a number of models. Finally, we analyze the UCI Adult Data Set to demonstrate the efficacy of our method.
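A compact Python sketch of the idea, with our own generic numerical choices: compute the squared sample distance covariance from double-centered distance matrices, then maximize it over unit vectors. Note that no smoothing or bandwidth appears anywhere.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.optimize import minimize

def dcov2(x, y):
    """Squared sample distance covariance between two 1-D samples."""
    def centered(v):
        D = squareform(pdist(v.reshape(-1, 1)))  # pairwise |v_i - v_j|
        return D - D.mean(axis=0) - D.mean(axis=1)[:, None] + D.mean()
    A, B = centered(np.asarray(x, float)), centered(np.asarray(y, float))
    return (A * B).mean()

def single_index_direction(X, y, seed=0):
    """Estimate the single-index direction by maximizing dCov(b'X, y)."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)

    def objective(b):
        b = b / np.linalg.norm(b)  # the direction is identified up to scale
        return -dcov2(X @ b, y)

    res = minimize(objective, rng.standard_normal(X.shape[1]),
                   method="Nelder-Mead")
    return res.x / np.linalg.norm(res.x)
```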
Journal of Computational and Graphical Statistics | 2004
Xiangrong Yin; R. Dennis Cook
The conditional mean of the response given the predictors is often of interest in regression problems. The central mean subspace, recently introduced by Cook and Li, allows inference about aspects of the mean function in a largely nonparametric context. We propose a marginal fourth moments method for estimating directions in the central mean subspace that might be missed by existing methods such as ordinary least squares (OLS) and principal Hessian directions (pHd). Our method, targeting higher-order trends, particularly cubics, complements OLS and pHd because none of these methods subsumes the others. Theory, estimation and inference, as well as illustrative examples, are presented.
Biometrics | 2010
Ross Iaci; T.N. Sriram; Xiangrong Yin
In this article, we propose a new generalized index to recover relationships between two sets of random vectors by finding the vector projections that minimize an L2 distance between each projected vector and an unknown function of the other. The unknown functions are estimated using the Nadaraya-Watson smoother. Extensions to multiple sets and groups of multiple sets are also discussed, and a bootstrap procedure is developed to detect the number of significant relationships. All the proposed methods are assessed through extensive simulations and real data analyses. In particular, for environmental data from Los Angeles County, we apply our multiple-set methodology to study relationships between mortality, weather, and pollutant vectors. Here, we detect the existence of both linear and nonlinear relationships between the dimension-reduced vectors, which are then used to build nonlinear time-series regression models for the dimension-reduced mortality vector. These findings also illustrate the potential use of our method in many other applications. A comprehensive assessment of our methodologies, along with their theoretical properties, is given in a Web Appendix.
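One plausible reading of the index, sketched for the one-sided, single-direction case: the Gaussian kernel, fixed bandwidth, and leave-in fitting are our simplifications, and the paper's index, its multi-set extensions, and the bootstrap are more involved. Minimizing this quantity over unit vectors a and b (with any generic optimizer) would yield the projections.

```python
import numpy as np

def nw_smooth(t, s, values, h=0.5):
    """Nadaraya-Watson estimate of E[values | s], evaluated at points t."""
    K = np.exp(-0.5 * ((t[:, None] - s[None, :]) / h) ** 2)
    return (K * values[None, :]).sum(axis=1) / K.sum(axis=1)

def l2_index(a, b, X, Y, h=0.5):
    """L2 distance between a'X and an unknown (NW-estimated) function
    of b'Y; smaller values indicate a stronger recovered relationship."""
    u, v = X @ a, Y @ b
    return np.mean((u - nw_smooth(v, v, u, h)) ** 2)
```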