
Publication


Featured research published by Guohua Zou.


Journal of the American Statistical Association | 2011

Optimal Weight Choice for Frequentist Model Average Estimators

Hua Liang; Guohua Zou; Alan T.K. Wan; Xinyu Zhang

There has been increasing interest recently in model averaging within the frequentist paradigm. The main benefit of model averaging over model selection is that it incorporates rather than ignores the uncertainty inherent in the model selection process. One of the most important, yet challenging, aspects of model averaging is how to optimally combine estimates from different models. In this work, we suggest a procedure of weight choice for frequentist model average estimators that exhibits optimality properties with respect to the estimator’s mean squared error (MSE). As a basis for demonstrating our idea, we consider averaging over a sequence of linear regression models. Building on this base, we develop a model weighting mechanism that involves minimizing the trace of an unbiased estimator of the model average estimator’s MSE. We further obtain results that reflect the finite sample as well as asymptotic optimality of the proposed mechanism. A Monte Carlo study based on simulated and real data evaluates and compares the finite sample properties of this mechanism with those of existing methods. The extension of the proposed weight selection scheme to general likelihood models is also considered. This article has supplementary material online.
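
The weight-choice step can be sketched numerically. The snippet below averages over a sequence of nested linear regression models and picks weights by minimizing a Mallows-type unbiased risk estimate; this criterion, and all function and variable names, are illustrative assumptions standing in for the paper's exact MSE-trace criterion.

```python
# A minimal sketch of frequentist model averaging over nested linear models.
# The weights minimize a Mallows-type unbiased risk estimate, used here as a
# stand-in for the paper's exact criterion (an assumption, not the authors'
# implementation).
import numpy as np
from scipy.optimize import minimize

def fma_weights(y, X, model_sizes, sigma2):
    """Return weights over nested candidate models X[:, :k] for k in model_sizes."""
    fits = []
    for k in model_sizes:
        Xk = X[:, :k]
        beta_k = np.linalg.lstsq(Xk, y, rcond=None)[0]
        fits.append(Xk @ beta_k)
    F = np.column_stack(fits)                      # n x M matrix of fitted values

    def criterion(w):
        resid = y - F @ w
        return resid @ resid + 2.0 * sigma2 * np.dot(w, model_sizes)

    M = F.shape[1]
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(criterion, np.full(M, 1.0 / M),
                   bounds=[(0.0, 1.0)] * M, constraints=cons)
    return res.x

# Example: average over nested models with 1..5 regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, :3] @ np.array([1.0, 0.5, 0.25]) + rng.normal(size=200)
sigma2_hat = np.var(y - X @ np.linalg.lstsq(X, y, rcond=None)[0], ddof=5)
print(fma_weights(y, X, model_sizes=np.arange(1, 6), sigma2=sigma2_hat))
```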


Genetics | 2006

Two-stage designs in case-control association analysis

Yijun Zuo; Guohua Zou; Hongyu Zhao

DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on its power. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are not less than 0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genome-wide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.
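
As a rough illustration of the first-stage screening calculation, the sketch below evaluates a pooled allele-frequency test with measurement error and the probability that a truly associated marker survives when only the top few percent of markers are carried forward. The variance formula (binomial sampling variance plus 2*eps^2 for the two pools) and the one-sided selection rule are simplifying assumptions, not the paper's exact derivation.

```python
# A minimal sketch of the first-stage screening calculation under simplifying
# assumptions about the pooled test's variance.
import numpy as np
from scipy.stats import norm

def pooled_test_sd(p, n_case, n_control, eps):
    sampling = p * (1 - p) * (1.0 / (2 * n_case) + 1.0 / (2 * n_control))
    return np.sqrt(sampling + 2 * eps ** 2)

n_case = n_control = 1000
p, diff, eps = 0.30, 0.05, 0.005
sd = pooled_test_sd(p, n_case, n_control, eps)

top_fraction = 0.03                        # carry ~3% of markers to stage 2
z_cut = norm.isf(top_fraction)             # two-sided selection ignored for brevity
power_stage1 = norm.sf(z_cut - diff / sd)  # P(pass | true difference = 0.05)
print(f"stage-1 selection power ≈ {power_stage1:.2f}")
```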


Journal of Systems Science & Complexity | 2009

Frequentist model averaging estimation: a review

Haiying Wang; Xinyu Zhang; Guohua Zou

In applications, the traditional estimation procedure generally begins with model selection. Once a specific model is selected, subsequent estimation is conducted under the selected model without consideration of the uncertainty from the selection process. This often leads to underreporting of variability and overly optimistic confidence sets. Model averaging estimation is an alternative to this procedure that incorporates model uncertainty into the estimation process. In recent years, there has been rising interest in model averaging from the frequentist perspective, and important progress has been made. In this paper, the theory and methods of frequentist model averaging estimation are surveyed. Some future research topics are also discussed.


Computational Statistics & Data Analysis | 2008

Improved AIC selection strategy for survival analysis

Hua Liang; Guohua Zou

In survival analysis, it is of interest to appropriately select significant predictors. In this paper, we extend the AICc selection procedure of Hurvich and Tsai to survival models to improve the traditional AIC for small sample sizes. A theoretical verification under a special case of the exponential distribution is provided. Simulation studies illustrate that the proposed method substantially outperforms its counterpart, AIC, in small samples and is competitive with it in moderate and large samples. Two real data sets are also analyzed.
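
For reference, the small-sample correction in question follows Hurvich and Tsai's generic AICc formula; the sketch below shows that generic formula rather than the survival-specific version derived in the paper.

```python
# Small-sample corrected AIC (Hurvich and Tsai's AICc).
def aicc(loglik, k, n):
    """AICc = -2*logL + 2k + 2k(k+1)/(n-k-1); reduces to AIC as n grows."""
    aic = -2.0 * loglik + 2.0 * k
    return aic + 2.0 * k * (k + 1.0) / (n - k - 1.0)

# With n = 30 observations the correction noticeably penalizes a 6-parameter
# model relative to plain AIC; with n = 3000 the two criteria nearly coincide.
print(aicc(loglik=-52.3, k=6, n=30), aicc(loglik=-52.3, k=6, n=3000))
```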


Annals of Human Genetics | 2008

Optimal Two-Stage Design for Case-Control Association Analysis Incorporating Genotyping Errors

Yijun Zuo; Guohua Zou; Jiexun Wang; Hongyu Zhao; Hua Liang

The two-stage design is a cost-effective approach for identifying disease genes in genetic studies, and it has received much attention recently. In general, there are two types of two-stage designs, which differ in the methods and samples used to measure allele frequencies in the first stage: (1) individual genotyping is used in the first stage; (2) DNA pooling is used in the first stage. In this paper, we focus on the latter. Zuo et al. (2006) investigated the statistical power of such a design, among other things, but the cost of the study was not taken into account. The purpose of this paper is to study the optimal design under a given overall cost. We investigate how to allocate the resources to the two stages. Note that in addition to the measurement errors associated with DNA pooling, genotyping errors are also unavoidable with individual genotyping. Therefore, we discuss the optimal design taking into account the genotyping errors associated with individual genotyping. The joint statistical distributions of the test statistics in the first and second stages are derived. For a fixed cost, our results show that the optimal design requires no additional samples in the second stage but only that the samples in the first stage be re-used. When the second stage uses an entirely independent sample, however, the optimal design under a given cost depends on the population allele frequency and the allele frequency difference between the case and control groups. For current genotyping costs, roughly 1/3 to 1/2 of the total sample size can be allocated to the first stage for screening.
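
The allocation question can be illustrated with a crude grid search for the independent-sample version of the design: split a fixed number of case-control pairs between a pooled first stage and an individually genotyped second stage and compare approximate overall power. The power approximation, the per-stage significance levels, and the treatment of the two stages as independent are all simplifying assumptions made for this sketch.

```python
# A rough sketch of the stage-1 / stage-2 sample allocation question under a
# fixed total sample size, using a normal approximation to each stage's power.
import numpy as np
from scipy.stats import norm

def stage_power(diff, p, n, eps, alpha):
    sd = np.sqrt(p * (1 - p) / n + 2 * eps ** 2)
    return norm.sf(norm.isf(alpha) - diff / sd)

p, diff, n_total = 0.30, 0.05, 2000
for frac in (0.25, 1/3, 0.5, 2/3):
    n1 = int(frac * n_total)
    n2 = n_total - n1
    power = (stage_power(diff, p, n1, eps=0.01, alpha=0.03) *   # pooled screen
             stage_power(diff, p, n2, eps=0.0,  alpha=0.001))   # individual typing
    print(f"stage-1 fraction {frac:.2f}: overall power ≈ {power:.2f}")
```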


Electronic Journal of Statistics | 2012

Model averaging for varying-coefficient partially linear measurement error models

Haiying Wang; Guohua Zou; Alan T.K. Wan

In a 2003 paper, Hjort and Claeskens proposed a framework for studying the limiting distributions and asymptotic risk properties of model average estimators under parametric models. They also suggested a simple method for constructing confidence intervals for the parameters of interest estimated by model averaging. The purpose of this paper is to broaden the scope of the aforementioned study to a semi-parametric varying-coefficient partially linear measurement error model. Within this context, we develop a model averaging scheme for the unknowns, derive the model average estimator's asymptotic distribution, and develop a confidence interval procedure for the unknowns with an actual coverage probability that tends toward the nominal level in large samples. We further show that confidence intervals constructed from the model average estimators are asymptotically the same as those obtained under the full model. A simulation study examines the finite sample performance of the model average estimators, and a real data analysis illustrates the application of the method in practice. AMS 2000 subject classifications: Primary 62E20; secondary 62F10, 62F12.
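
The practical consequence of the asymptotic equivalence result can be sketched in a simple linear-model setting: center the interval at the model-average estimate but use the full-model standard error. The linear setup, the fixed weight, and the variable names below are illustrative assumptions, not the semiparametric estimator studied in the paper.

```python
# A minimal sketch: model-average point estimate with a full-model standard
# error, motivated by the asymptotic equivalence of the two intervals.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, 0.1]) + rng.normal(size=n)

def ols(Xm):
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    s2 = np.sum((y - Xm @ b) ** 2) / (n - Xm.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(Xm.T @ Xm)[1, 1])   # s.e. of the slope on x1
    return b[1], se

b_full, se_full = ols(X)            # full model
b_sub, _ = ols(X[:, :2])            # submodel dropping the last regressor
w = 0.6                             # illustrative fixed weight
b_avg = w * b_sub + (1 - w) * b_full
z = norm.ppf(0.975)
print((b_avg - z * se_full, b_avg + z * se_full))
```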


Journal of Econometrics | 2003

Optimal critical values of pre-tests when estimating the regression error variance: analytical findings under a general loss structure

Alan T.K. Wan; Guohua Zou

This paper re-visits the problem of estimating the regression error variance in a linear multiple regression model after preliminary hypothesis tests for either linear restrictions on the coefficients or homogeneity of variances. There is an extensive literature that discusses these problems, particularly in terms of the sampling properties of the pre-test estimators using various loss functions as the basis for risk analysis. In this paper, a unified framework for analysing the risk properties of these estimators is developed under a general class of loss structures that incorporates virtually all first-order differentiable losses. Particular consideration is given to the choice of critical values for the pre-tests. Analytical results indicate that an α-level substantially higher than those normally used may be appropriate for optimal risk properties under a wide range of loss functions. The paper also generalizes some known analytical results in the pre-test literature and proves other results only previously shown numerically.
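
The object under study, a pre-test estimator of the error variance, can be written down directly: test the linear restrictions at level α and use the restricted residual variance if the test does not reject. The sketch below is a generic version with an illustrative α of 0.25 standing in for the "higher than usual" levels the analysis points to; it is not the paper's risk calculation.

```python
# A minimal sketch of a pre-test estimator of the regression error variance.
import numpy as np
from scipy.stats import f as f_dist

def pretest_sigma2(y, X, R, r, alpha=0.25):
    """Estimate sigma^2 after a pre-test of the restrictions R @ beta = r."""
    n, k = X.shape
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    rss_u = np.sum((y - X @ beta) ** 2)
    # restricted least squares estimate
    A = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
    beta_r = beta - A @ (R @ beta - r)
    rss_r = np.sum((y - X @ beta_r) ** 2)
    F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    if F <= f_dist.ppf(1 - alpha, q, n - k):
        return rss_r / (n - k + q)     # restrictions retained
    return rss_u / (n - k)             # restrictions rejected
```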


Genetics | 2005

On the Sample Size Requirement in Genetic Association Tests When the Proportion of False Positives Is Controlled

Guohua Zou; Yijun Zuo

With respect to the multiple-testing problem, an increasing amount of attention has recently been paid to controlling the false discovery rate (FDR), the positive false discovery rate (pFDR), and the proportion of false positives (PFP). The new approaches are generally believed to be more powerful than the classical Bonferroni one. This article focuses on the PFP approach. It demonstrates via examples in genetic association studies that the Bonferroni procedure can be more powerful than the PFP-controlling one and also shows the intrinsic connection between controlling the PFP and controlling the overall type I error rate. Since controlling the PFP does not necessarily lead to a desired power level, this article addresses the design issue and recommends the sample sizes that can attain the desired power levels when the PFP is controlled. The results in this article also provide rough guidance for the sample sizes needed to achieve the desired power levels when the FDR, and especially the pFDR, are controlled.
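
A back-of-the-envelope version of the PFP argument: with m0 true nulls and m1 associated markers, PFP is approximately m0*alpha / (m0*alpha + m1*power), so fixing the PFP alone does not determine the power, and a sample-size calculation is still needed. The two-proportion normal approximation and the numbers below are assumptions used only for illustration.

```python
# Rough PFP-to-alpha conversion and a matching sample-size calculation.
from scipy.stats import norm

def per_test_alpha(pfp, m0, m1, power):
    """Per-test alpha giving PFP = pfp when each associated marker has this power."""
    return pfp * m1 * power / (m0 * (1.0 - pfp))

def sample_size(p1, p2, alpha, power):
    """Alleles per group to detect p1 vs p2 with a one-sided z-test."""
    z = norm.isf(alpha) + norm.isf(1.0 - power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

alpha = per_test_alpha(pfp=0.05, m0=99990, m1=10, power=0.8)
print(f"per-test alpha ≈ {alpha:.1e}, n per group ≈ {sample_size(0.30, 0.35, alpha, 0.8):.0f}")
```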


Statistics & Probability Letters | 2000

Minimax and Γ-minimax estimation for the Poisson distribution under LINEX loss when the parameter space is restricted

Alan T.K. Wan; Guohua Zou; Andy H. Lee

This paper considers the problems of minimax and Γ-minimax estimation under the LINEX loss function when the parameter space is restricted. A general property of the risk of the Bayes estimator with respect to the two-point prior is presented. Minimax and Γ-minimax estimators of the parameter of the Poisson distribution are obtained when the parameter of interest is known to lie in a small parameter space.
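
For concreteness, the LINEX loss and the Bayes estimator under a two-point prior on the Poisson mean (the building block of the minimax argument) can be written as follows; the prior points and weight are arbitrary examples, not values from the paper.

```python
# LINEX loss and the corresponding Bayes estimator under a two-point prior.
import numpy as np
from scipy.stats import poisson

def linex_loss(d, theta, a=1.0):
    """Asymmetric LINEX loss: exp(a*(d-theta)) - a*(d-theta) - 1."""
    delta = a * (d - theta)
    return np.exp(delta) - delta - 1.0

def bayes_estimate_two_point(x, theta1, theta2, w1, a=1.0):
    """Bayes estimator under LINEX: -(1/a) * log E[exp(-a*theta) | x]."""
    post1 = w1 * poisson.pmf(x, theta1)
    post2 = (1 - w1) * poisson.pmf(x, theta2)
    p1 = post1 / (post1 + post2)
    return -np.log(p1 * np.exp(-a * theta1) + (1 - p1) * np.exp(-a * theta2)) / a

# Parameter space restricted to [0.5, 1.5]: prior mass on the two endpoints.
print(bayes_estimate_two_point(x=1, theta1=0.5, theta2=1.5, w1=0.5))
```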


Econometrics Journal | 2007

On the sensitivity of the restricted least squares estimators to covariance misspecification

Alan T.K. Wan; Guohua Zou; Huaizhen Qin

Traditional econometrics has long stressed the serious consequences of non-spherical disturbances for estimation and testing procedures developed under the spherical disturbance setting: the procedures become invalid and can give rise to misleading results. In practice, however, it is not unusual to find that the parameter estimates do not change much after fitting the more general structure. This suggests that the usual procedures may well be robust to covariance misspecification. Banerjee and Magnus (1999) proposed sensitivity statistics to decide whether the ordinary least squares estimators of the coefficients and the disturbance variance are sensitive to deviations from the spherical error assumption. This paper extends their work by investigating the sensitivity of the restricted least squares estimator to covariance misspecification, where the restrictions may or may not be correct. Large sample results providing analytical support for some of the numerical findings reported in Banerjee and Magnus (1999) are also obtained.
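
The question the paper studies can be illustrated numerically by comparing restricted least squares under spherical errors with restricted generalized least squares under a hypothesized AR(1) covariance. The comparison below is a crude before/after check, not the Banerjee-Magnus sensitivity statistic, and all names and parameter values are assumptions.

```python
# How much do restricted least squares estimates move when the spherical-error
# assumption is replaced by an AR(1) error covariance?
import numpy as np

def restricted_gls(y, X, R, r, Omega):
    """Restricted GLS estimate of beta subject to R @ beta = r."""
    W = np.linalg.inv(Omega)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta = XtWX_inv @ X.T @ W @ y
    A = XtWX_inv @ R.T @ np.linalg.inv(R @ XtWX_inv @ R.T)
    return beta - A @ (R @ beta - r)

n = 100
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(size=n)
R, r = np.array([[0.0, 1.0, -1.0]]), np.array([0.0])   # restriction: b1 = b2

rho = 0.5                                              # hypothesized AR(1) errors
Omega_ar1 = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
b_iid = restricted_gls(y, X, R, r, np.eye(n))
b_ar1 = restricted_gls(y, X, R, r, Omega_ar1)
print(np.linalg.norm(b_iid - b_ar1))                   # crude sensitivity measure
```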

Collaboration


Dive into Guohua Zou's collaborations.

Top Co-Authors

Alan T.K. Wan (City University of Hong Kong)
Xinyu Zhang (Chinese Academy of Sciences)
Hua Liang (George Washington University)
Rong Zhu (Chinese Academy of Sciences)
Huaizhen Qin (Michigan Technological University)
Yijun Zuo (Michigan State University)
Jiexun Wang (Chinese Academy of Sciences)
Haiying Wang (University of New Hampshire)