Wenyang Zhang
University of York
Publication
Featured research published by Wenyang Zhang.
Journal of the American Statistical Association | 2009
Ming-Yen Cheng; Wenyang Zhang; Lu-Hung Chen
Multiparameter likelihood models (MLMs) with multiple covariates have a wide range of applications; however, they encounter the “curse of dimensionality” problem when the dimension of the covariates is large. We develop a generalized multiparameter likelihood model that copes with multiple covariates and adapts to dynamic structural changes well. It includes some popular models, such as the partially linear and varying-coefficient models, as special cases. We present a simple, effective two-step method to estimate both the parametric and the nonparametric components when the model is fixed. The proposed estimator of the parametric component has the n^{-1/2} convergence rate, and the estimator of the nonparametric component enjoys an adaptivity property. We suggest a data-driven procedure for selecting the bandwidths, and propose an initial estimator in profile likelihood estimation of the parametric part to ensure stability of the approach in general settings. We further develop an automatic procedure to identify constant parameters in the underlying model. We provide a simulation study and an application to infant mortality data of China to demonstrate the performance of our proposed method.
Annals of Statistics | 2009
Wenyang Zhang; Jianqing Fan; Yan Sun
In the analysis of cluster data the regression coefficients are frequently assumed to be the same across all clusters. This hampers the ability to study the varying impacts of factors on each cluster. In this paper, a semiparametric model is introduced to account for varying impacts of factors over clusters by using cluster-level covariates. It achieves parsimony of parametrization and allows the exploration of nonlinear interactions. The random effect in the semiparametric model also accounts for within-cluster correlation. A local linear estimation procedure is proposed for estimating the functional coefficients, the residual variance, and the within-cluster correlation matrix. The asymptotic properties of the proposed estimators are established, and a method for constructing simultaneous confidence bands is proposed and studied. In addition, relevant hypothesis testing problems are addressed. Simulation studies are carried out to demonstrate the methodological power of the proposed methods in finite samples. The proposed model and methods are used to analyse the second birth interval in Bangladesh, leading to some interesting findings.
Annals of Statistics | 2007
Yan Sun; Wenyang Zhang; Howell Tong
Longitudinal studies are often conducted to explore the cohort and age effects in many scientific areas. The within cluster correlation structure plays a very important role in longitudinal data analysis. This is because not only can an estimator be improved by incorporating the within cluster correlation structure into the estimation procedure, but also the within cluster correlation structure can sometimes provide valuable insights in practical problems. For example, it can reveal the correlation strengths among the impacts of various factors. Motivated by data typified by a set from Bangladesh pertinent to the use of contraceptives, we propose a random effect varying-coefficient model, and an estimation procedure for the within cluster correlation structure of the proposed model. The estimation procedure is optimization-free and the proposed estimators enjoy asymptotic normality under mild conditions. Simulations suggest that the proposed estimation is practicable for finite samples and resistant to mild forms of model misspecification. Finally, we analyze the data mentioned above with the new random effect varying-coefficient model together with the proposed estimation procedure, which reveals some interesting sociological dynamics.
Annals of Statistics | 2014
Yan Sun; Hongjia Yan; Wenyang Zhang; Zudi Lu
Stimulated by the Boston house price data, in this paper, we propose a semiparametric spatial dynamic model, which extends the ordinary spatial autoregressive models to accommodate the effects of some covariates associated with the house price. A profile likelihood based estimation procedure is proposed. The asymptotic normality of the proposed estimators is derived. We also investigate how to identify the parametric/nonparametric components in the proposed semiparametric model. We show how many unknown parameters an unknown bivariate function amounts to, and propose a nonparametric version of AIC/BIC for model selection. Simulation studies are conducted to examine the performance of the proposed methods. The simulation results show our methods work very well. We finally apply the proposed methods to analyze the Boston house price data, which leads to some interesting findings.
Journal of the American Statistical Association | 2011
Jialiang Li; Wenyang Zhang
Motivated by an investigation of the relationship between blood pressure change and progression of microalbuminuria (MA) among individuals with type I diabetes, we propose a new semiparametric threshold model for censored longitudinal data analysis. We also study a new semiparametric Bayes information criterion-type criterion for identifying the parametric component of the proposed model. Cluster effects in the model are implemented as unknown fixed effects. Asymptotic properties are established for the proposed estimators. A quadratic approximation used to implement the estimation procedure makes the method very easy to implement by avoiding the computation of multiple integrals and the need for iterative algorithms. Simulation studies show that the proposed methods work well in practice. An illustration using the Wisconsin Diabetes Registry dataset suggests some interesting findings.
Journal of Multivariate Analysis | 2010
Wenyang Zhang; Heng Peng
Generalised varying-coefficient models (GVC) are very important models. There is a considerable literature addressing these models; however, most of it is devoted to the estimation procedure. In this paper, we systematically investigate statistical inference for GVC, which includes confidence bands as well as hypothesis tests. We establish the asymptotic distribution of the maximum discrepancy between the estimated functional coefficient and the true functional coefficient. We compare different approaches for the construction of confidence bands and hypothesis tests. Finally, the proposed statistical inference methods are used to analyse data from China about contraceptive use there, which leads to some interesting findings.
Journal of Nonparametric Statistics | 2011
Jialiang Li; Wenyang Zhang; Zhengxiao Wu
We study the general problem of bandwidth selection in semiparametric regression. By expanding the higher-order terms in the Taylor series for the asymptotic mean-squared error, we provide a theoretical justification for the earlier empirical observations of an optimal zone of bandwidths in the literature. Based on the idea of cross-validating parametrical estimates, we further introduce a novel bandwidth selector for semiparametric models. The method is demonstrated by numerical studies to be able to preserve the selected bandwidth within the optimal zone. This data-driven cross-validation method may also be applicable for model diagnosis and longitudinal data settings. Examples from two clinical trials are provided to illustrate the applications.
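As a rough illustration of data-driven bandwidth selection, the sketch below uses ordinary leave-one-out cross-validation for a Nadaraya-Watson kernel smoother. This is a swapped-in, simpler technique, not the selector from the paper, which cross-validates the parametric estimates of a semiparametric model. The function names (`simulate`, `loo_cv_score`, `select_bandwidth`), the Gaussian kernel, the simulated sine signal, and the candidate grid are all assumptions made for illustration.

```python
import math
import random

def simulate(n, seed=0):
    """Hypothetical toy data: y = sin(2*pi*x) + Gaussian noise (sd 0.3)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [math.sin(2.0 * math.pi * xi) + rng.gauss(0.0, 0.3) for xi in x]
    return x, y

def loo_cv_score(x, y, h):
    """Mean squared leave-one-out prediction error of a
    Nadaraya-Watson smoother with Gaussian kernel and bandwidth h."""
    total = 0.0
    for i in range(len(x)):
        num = den = 0.0
        for j in range(len(x)):
            if j != i:  # leave observation i out of its own fit
                w = math.exp(-0.5 * ((x[j] - x[i]) / h) ** 2)
                num += w * y[j]
                den += w
        if den == 0.0:  # all weights underflowed: bandwidth unusable
            return math.inf
        total += (y[i] - num / den) ** 2
    return total / len(x)

def select_bandwidth(x, y, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda h: loo_cv_score(x, y, h))
```

A call such as `select_bandwidth(x, y, [0.005, 0.02, 0.05, 0.1, 0.5, 2.0])` on the simulated data picks the candidate that best balances bias against variance. Very large bandwidths are penalised because they flatten the sine signal.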
Annals of Statistics | 2015
Degui Li; Yuan Ke; Wenyang Zhang
In this paper, we study the model selection and structure specification for the generalised semi-varying coefficient models (GSVCMs), where the number of potential covariates is allowed to be larger than the sample size. We first propose a penalised likelihood method with the LASSO penalty function to obtain the preliminary estimates of the functional coefficients. Then, using the quadratic approximation for the local log-likelihood function and the adaptive group LASSO penalty (or the local linear approximation of the group SCAD penalty) with the help of the preliminary estimation of the functional coefficients, we introduce a novel penalised weighted least squares procedure to select the significant covariates and identify the constant coefficients among the coefficients of the selected covariates, which could thus specify the semiparametric modelling structure. The developed model selection and structure specification approach not only inherits many nice statistical properties from the local maximum likelihood estimation and nonconcave penalised likelihood method, but is also computationally attractive thanks to the computational algorithm that is proposed to implement our method. Under some mild conditions, we establish the asymptotic properties for the proposed model selection and estimation procedure such as the sparsity and oracle property. We also conduct simulation studies to examine the finite sample performance of the proposed method, and finally apply the method to analyse a real data set, which leads to some interesting findings.
Lifetime Data Analysis | 2016
Xiaochao Xia; Binyan Jiang; Jialiang Li; Wenyang Zhang
High-throughput profiling is now common in biomedical research. In this paper we consider the layout of an etiology study composed of a failure time response and gene expression measurements. In current practice, a widely adopted approach is to select genes according to a preliminary marginal screening and a follow-up penalized regression for model building. Confounders, including for example clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. We propose covariate-adjusted screening and variable selection procedures under the accelerated failure time model. While penalizing the high-dimensional coefficients to achieve parsimonious model forms, our procedure also properly adjusts for the low-dimensional confounder effects to achieve more accurate estimation of regression coefficients. We establish the asymptotic properties of our proposed methods and carry out simulation studies to assess the finite sample performance. Our methods are illustrated with a real gene expression data analysis where proper adjustment of confounders produces more meaningful results.
British Journal of Mathematical and Statistical Psychology | 2002
Sik-Yum Lee; Wenyang Zhang; Xin-Yuan Song
This paper describes a two-step procedure for estimating the covariance function and its eigenvalues and eigenfunctions in situations where the data are curves or functions. The first step produces initial estimates of eigenfunctions using a standard principal components analysis. At the second step, these initial estimates are smoothed via local polynomial fitting, with the bandwidth in the kernel function being selected by a data-driven procedure. The results of a simulation study and three real examples are presented to illustrate the performance of the proposed methodology.
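The two-step idea described above can be sketched as follows, under stated assumptions: curves are observed on a common grid, only the leading eigenfunction is estimated (by power iteration on the sample covariance matrix, one standard way to carry out the PCA step), and the smoothing step is a local linear fit with a Gaussian kernel and a fixed bandwidth `h` rather than the data-driven bandwidth the abstract mentions. All function names and the simulated data are hypothetical.

```python
import math
import random

def simulate_curves(n, grid, seed=0):
    """Hypothetical toy data: random multiples of sqrt(2)*sin(pi*t)
    plus independent Gaussian noise at every grid point."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n):
        score = rng.gauss(0.0, 2.0)
        curves.append([score * math.sqrt(2.0) * math.sin(math.pi * t)
                       + rng.gauss(0.0, 1.0) for t in grid])
    return curves

def sample_covariance(curves):
    """Sample covariance matrix of the curves across grid points."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    return [[sum((c[i] - mean[i]) * (c[j] - mean[j]) for c in curves) / (n - 1)
             for j in range(m)] for i in range(m)]

def leading_eigenvector(cov, iters=100):
    """Step 1: leading eigenvector of a symmetric matrix by power iteration."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def local_linear(t, y, h):
    """Step 2: local linear smoother (Gaussian kernel, bandwidth h),
    evaluated at every point of the grid t."""
    out = []
    for t0 in t:
        w = [math.exp(-0.5 * ((ti - t0) / h) ** 2) for ti in t]
        s0 = sum(w)
        s1 = sum(wi * (ti - t0) for wi, ti in zip(w, t))
        s2 = sum(wi * (ti - t0) ** 2 for wi, ti in zip(w, t))
        r0 = sum(wi * yi for wi, yi in zip(w, y))
        r1 = sum(wi * (ti - t0) * yi for wi, ti, yi in zip(w, t, y))
        # Intercept of the weighted least squares line centred at t0.
        out.append((s2 * r0 - s1 * r1) / (s0 * s2 - s1 * s1))
    return out
```

For example, with `grid = [j / 29 for j in range(30)]`, calling `local_linear(grid, leading_eigenvector(sample_covariance(simulate_curves(100, grid))), h=0.05)` returns a smoothed version of the initial eigenvector estimate. Whether smoothing reduces the estimation error depends on the bandwidth and the noise level, which is why a data-driven bandwidth choice matters in practice.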