Publication


Featured research published by Zhaohai Li.


Human Heredity | 2002

Trend Tests for Case-Control Studies of Genetic Markers: Power, Sample Size and Robustness

Boris Freidlin; Gang Zheng; Zhaohai Li; Joseph L. Gastwirth

The Cochran-Armitage trend test is commonly used as a genotype-based test for candidate gene association. Corresponding to each underlying genetic model there is a particular set of scores assigned to the genotypes that maximizes its power. When the variance of the test statistic is known, the formulas for approximate power and associated sample size are readily obtained. In practice, however, the variance of the test statistic needs to be estimated. We present formulas for the required sample size to achieve a prespecified power that account for the need to estimate the variance of the test statistic. When the underlying genetic model is unknown, one can incur a substantial loss of power when a test suitable for one mode of inheritance is used where another mode is the true one. Thus, tests having good power properties relative to the optimal tests for each model are useful. These tests are called efficiency robust, and we study two of them: the maximin efficiency robust test, a linear combination of the standardized optimal tests that has high efficiency, and the MAX test, the maximum of the standardized optimal tests. Simulation results on the robustness of these two tests indicate that the more computationally involved MAX test is preferable.
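As a rough illustration of the trend test this abstract describes, the sketch below computes the Cochran-Armitage statistic from a 2×3 table of genotype counts. The function name `cochran_armitage` and the score convention (0, x, 1), with x = 0, 0.5, 1 for recessive, additive, and dominant models, are assumptions taken from common usage, not details stated in the abstract. The p-value uses the standard normal approximation.

```python
import math

def cochran_armitage(cases, controls, scores=(0.0, 0.5, 1.0)):
    """Cochran-Armitage trend test for a 2x3 genotype table.

    cases, controls: counts for genotypes with 0, 1, 2 copies of the risk allele.
    scores: genotype scores; (0, 0.5, 1) is the usual additive choice (assumed here).
    Returns (z, two_sided_p) under the asymptotic normal approximation.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    if N == 0:
        raise ValueError("empty table")
    totals = [r + s for r, s in zip(cases, controls)]
    # Score-weighted difference between case and control genotype counts.
    u = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    # Null variance of u with the genotype margins treated as fixed.
    spread = (N * sum(x * x * n for x, n in zip(scores, totals))
              - sum(x * n for x, n in zip(scores, totals)) ** 2)
    variance = R * S * spread / N
    if variance <= 0:
        raise ValueError("scores do not vary across the observed genotypes")
    z = u / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p
```

Calling `cochran_armitage(cases, controls, scores=(0, 0, 1))` gives the recessive version and `scores=(0, 1, 1)` the dominant one, which is how the choice of scores ties the test to a genetic model.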


Annals of Human Genetics | 2008

Efficient Approximation of P‐value of the Maximum of Correlated Tests, with Applications to Genome‐Wide Association Studies

Qizhai Li; Gang Zheng; Zhaohai Li; Kai Yu

Genome‐wide association study (GWAS), typically involving 100,000 to 500,000 single‐nucleotide polymorphisms (SNPs), is a powerful approach to identify disease susceptibility loci. In a GWAS, single‐marker analysis, which tests one SNP at a time, is usually used as the first stage to screen SNPs across the genome in order to identify a small fraction of promising SNPs with relatively low p‐values for further and more focused studies. For single‐marker analysis, the trend test derived for an additive genetic model is often used. This may not be robust when the additive assumption is not appropriate for the true underlying disease model. A robust test, MAX, based on the maximum of three trend test statistics derived for recessive, additive, and dominant models, has been proposed recently for GWAS. But its p‐value has to be evaluated through a resampling‐based procedure, which is computationally challenging for the analysis of GWAS. Obtaining the p‐value for MAX with adjustment for the covariates can be even more time‐consuming. In this article, we provide a simple approximation for the p‐value of the MAX test with or without adjusting for the covariates. The new method avoids resampling steps and thus makes the MAX test readily applicable to GWAS. We use simulation studies as well as real datasets on 17 confirmed disease‐associated SNPs to assess the accuracy of the proposed method. We also apply the method to the GWAS of coronary artery disease.
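To make the computational burden mentioned above concrete, here is a minimal sketch of the MAX statistic together with a plain permutation p-value, i.e. the resampling baseline that the abstract says is costly, not the approximation the paper proposes. The helper names (`trend_z`, `max3`, `max3_permutation_pvalue`), the 0/1/2 genotype coding, and the scores per model are illustrative assumptions.

```python
import math
import random

# Assumed genotype scores for the three genetic models.
SCORES = {"recessive": (0, 0, 1), "additive": (0, 0.5, 1), "dominant": (0, 1, 1)}

def trend_z(cases, controls, scores):
    """Trend-test Z statistic; returns 0.0 when the table carries no information."""
    R, S = sum(cases), sum(controls)
    N = R + S
    if N == 0:
        return 0.0
    totals = [r + s for r, s in zip(cases, controls)]
    u = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    spread = (N * sum(x * x * n for x, n in zip(scores, totals))
              - sum(x * n for x, n in zip(scores, totals)) ** 2)
    variance = R * S * spread / N
    return u / math.sqrt(variance) if variance > 0 else 0.0

def max3(genotypes, status):
    """Maximum absolute trend statistic over recessive, additive, dominant scores."""
    cases, controls = [0, 0, 0], [0, 0, 0]
    for g, d in zip(genotypes, status):
        (cases if d else controls)[g] += 1
    return max(abs(trend_z(cases, controls, x)) for x in SCORES.values())

def max3_permutation_pvalue(genotypes, status, n_perm=1000, seed=0):
    """Permutation p-value: shuffle case/control labels and recompute MAX each time."""
    rng = random.Random(seed)
    observed = max3(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max3(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Each SNP needs `n_perm` recomputations of three trend statistics, so repeating this over hundreds of thousands of SNPs is what motivates a closed-form approximation.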


Genetic Epidemiology | 2013

Genetic Association With Multiple Traits in the Presence of Population Stratification

Ting Yan; Qizhai Li; Yuanzhang Li; Zhaohai Li; Gang Zheng

Testing association between a genetic marker and multiple dependent traits is a challenging task when both binary and quantitative traits are involved. The inverted regression model is a convenient method, in which the traits are treated as predictors although the genetic marker is an ordinal response. It is known that population stratification (PS) often affects population‐based association studies. However, how it would affect the inverted regression for pleiotropic association, especially with the mixed types of traits (binary and quantitative), has not been examined, and the performance of existing methods to correct for PS using the inverted regression analysis is unknown. In this paper, we focus on the methods based on genomic control and principal component analysis, and investigate type I error of pleiotropic association using the inverted regression model in the presence of PS with allele frequencies and the distributions (or disease prevalences) of multiple traits varying across the subpopulations. We focus on common alleles but simulation results for a rare variant are also reported. An application to the HapMap data is used for illustration.
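For readers unfamiliar with genomic control, the following sketch shows the usual form of the correction for 1-df chi-square statistics: an inflation factor is estimated from the median of statistics at presumed null markers and every statistic is divided by it. This is the generic textbook adjustment, not the inverted-regression procedure studied in the paper; the function name `genomic_control` and the flooring of the factor at 1 are assumptions made for the illustration.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Estimate the inflation factor and rescale 1-df chi-square statistics.

    chi2_stats: association statistics from markers assumed to be mostly null.
    Returns (inflation_factor, adjusted_statistics).
    """
    # Floor at 1 so that statistics are never inflated by the correction.
    lam = max(1.0, statistics.median(chi2_stats) / CHI2_1_MEDIAN)
    return lam, [x / lam for x in chi2_stats]
```

An inflation factor well above 1 suggests that stratification (or another systematic effect) is inflating the statistics across the genome.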


Genetics | 2005

A Logistic Regression Mixture Model for Interval Mapping of Genetic Trait Loci Affecting Binary Phenotypes

Weiping Deng; Hanfeng Chen; Zhaohai Li

Often in genetic research, presence or absence of a disease is affected by not only the trait locus genotypes but also some covariates. The finite logistic regression mixture models and the methods under the models are developed for detection of a binary trait locus (BTL) through an interval-mapping procedure. The maximum-likelihood estimates (MLEs) of the logistic regression parameters are asymptotically unbiased. The null asymptotic distributions of the likelihood-ratio test (LRT) statistics for detection of a BTL are found to be given by the supremum of a χ²-process. The limiting null distributions are free of the null model parameters and are determined explicitly through only four (backcross case) or nine (intercross case) independent standard normal random variables. Therefore a threshold for detecting a BTL in a flanking marker interval can be approximated easily by using a Monte Carlo method. It is pointed out that use of a threshold incorrectly determined by reading off a χ²-probability table can result in an excessive false BTL detection rate much more severely than many researchers might anticipate. Simulation results show that the BTL detection procedures based on the thresholds determined by the limiting distributions perform quite well when the sample sizes are moderately large.


Annals of Human Genetics | 2003

Tests for Candidate‐Gene Association Using Case‐Parents Design

Gang Zheng; Z. Chen; Zhaohai Li

In the case‐parents design for testing candidate‐gene association, the conditional likelihood method based on genotype relative risks has been developed recently. A specific relation of the genotype relative risks is referred to as a genetic model. The efficient score tests have been used when the genetic model is correctly specified under the alternative hypothesis. In practice, however, it is usually not possible to specify the genetic model correctly. In the latter situation, tests such as the likelihood ratio test (LRT) and the MAX3 (the maximum of the three score statistics for dominant, additive, and recessive models) have been used. In this paper, we consider the restricted likelihood ratio test (RLRT). For a specific genetic model, simulation results demonstrate that RLRT is asymptotically equivalent to the score test, and both are more powerful than the LRT. When the genetic model cannot be correctly specified, the simulation results show that RLRT is most robust and powerful in the situations we studied. MAX3 is the next most robust and powerful test. The TDT is the easiest statistic to compute, compared to MAX3 and RLRT. When the recessive model can be eliminated, the TDT is also as robust and powerful as RLRT for other genetic models.


Biometrics | 2010

Impact of population substructure on trend tests for genetic case-control association studies

Gang Zheng; Zhaohai Li; Mitchell H. Gail; Joseph L. Gastwirth

Hidden population substructure in case-control data has the potential to distort the performance of Cochran-Armitage trend tests (CATTs) for genetic associations. Three possible scenarios that may arise are investigated here: (i) heterogeneity of genotype frequencies across unidentified subpopulations (PSI), (ii) heterogeneity of genotype frequencies and disease risk across unidentified subpopulations (PSII), and (iii) cryptic correlations within unidentified subpopulations. A unified approach is presented for deriving the bias and variance distortion under the three scenarios for any CATT in a general family. Using these analytical formulas, we evaluate the excess type I errors of the CATTs numerically in the presence of population substructure. Our results provide insight into the properties of some proposed corrections for bias and variance distortion and show why they may not fully correct for the effects of population substructure.


Annals of Human Genetics | 2008

Two-stage group sequential robust tests in family-based association studies: controlling type I error

Lihan K. Yan; Gang Zheng; Zhaohai Li

In family‐based association studies, an optimal test statistic with asymptotic normal distribution is available when the underlying genetic model is known (e.g., recessive, additive, multiplicative, or dominant). In practice, however, genetic models for many complex diseases are usually unknown. Using a single test statistic optimal for one genetic model may lose substantial power when the model is mis‐specified. When a family of genetic models is scientifically plausible, the maximum of several tests, each optimal for a specific genetic model, is robust against the model mis‐specification. This robust test is preferred over a single optimal test. Recently, cost‐effective group sequential approaches have been introduced to genetic studies. The group sequential approach allows interim analyses and has been applied to many test statistics, but not to the maximum statistic. When the group sequential method is applied, type I error should be controlled. We propose and compare several approaches of controlling type I error rates when group sequential analysis is conducted with the maximum test for family‐based candidate‐gene association studies. For a two‐stage group sequential robust procedure with a single interim analysis, two critical values for the maximum tests are provided based on a given alpha spending function to control the desired overall type I error.


Annals of Human Genetics | 2005

Power and Related Statistical Properties of Conditional Likelihood Score Tests for Association Studies in Nuclear Families with Parental Genotypes

Zhaohai Li; Joseph L. Gastwirth; Mitchell H. Gail

Both population‐based and family‐based case‐control studies are used to test whether particular genotypes are associated with disease. While population‐based studies have more power, cryptic population stratification can produce false‐positive results. Family‐based methods have been introduced to control for this problem. This paper presents the full likelihood function for family‐based association studies for nuclear families ascertained on the basis of their number of affected and unaffected children. The likelihood of a family factors into the probability of parental mating type, conditional on offspring phenotypes, times the probability of offspring genotypes given their phenotypes and the parental mating type. The first factor can be influenced by population stratification, whereas the latter factor, called the conditional likelihood, is not. The conditional likelihood is used to obtain score tests with proper size in the presence of population stratification (see also Clayton (1999) and Whittemore & Tu (2000)). Under either the additive or multiplicative model, the TDT is known to be the optimal score test when the family has only one affected child. Thus, the class of score tests explored can be considered as a general family of TDT‐like procedures. The relative informativeness of the various mating types is assessed using the Fisher information, which depends on the number of affected and unaffected offspring and the penetrances. When the additive model is true, families with parental mating type Aa×Aa are most informative. Under the dominant (recessive) model, however, a family with mating type Aa×aa (AA×Aa) is more informative than a family with doubly heterozygous (Aa×Aa) parents. Because we derive explicit formulae for all components of the likelihood, we are able to present tables giving required sample sizes for dominant, additive and recessive inheritance models.
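Since the abstract treats the TDT as the reference point for its family of score tests, a short sketch of the classical TDT may help. It uses the standard McNemar-type form for families with one affected child: among heterozygous parents, count how often the candidate allele is transmitted to the affected child and how often it is not. The function name `tdt` and its argument names are hypothetical, and this is the basic TDT only, not the general conditional likelihood score tests derived in the paper.

```python
import math

def tdt(transmitted, untransmitted):
    """Classical transmission disequilibrium test.

    transmitted: heterozygous parents who passed the candidate allele to the affected child.
    untransmitted: heterozygous parents who passed the other allele.
    Returns (statistic, p_value) using the chi-square approximation with 1 df.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df, written via the normal tail.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

Because only transmissions within families enter the statistic, differences in allele frequency between subpopulations do not affect its null distribution, which is the robustness to stratification the abstract refers to.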


Computational Statistics & Data Analysis | 2009

Excess false positive rate caused by population stratification and disease rate heterogeneity in case-control association studies

Zhaohai Li; Hong Zhang; Gang Zheng; Joseph L. Gastwirth; Mitchell H. Gail

Case-control association studies using unrelated cases and controls may suffer from potential confounding due to population stratification. Bias and variance distortion caused by population stratification in the commonly used allele-based tests can considerably inflate the Type I error rate. It is shown that the bias vanishes in the absence of disease rate heterogeneity. If only population stratification exists, a proper estimate of the variance of the allele-based test statistic is developed. Using this estimated variance yields a valid Type I error. However, when the frequencies of the allele under study and the disease rates differ among the subpopulations, it is difficult to correct for this bias. Explicit expressions for the excess false positive rate (EFPR) of the test due to bias and variance distortion are derived. It turns out that the bias created when both population stratification and disease rate heterogeneity are present usually has a greater effect on the EFPR than variance distortion. Comprehensive simulation studies strongly support these results.


Journal of Multivariate Analysis | 2013

Empirical and weighted conditional likelihoods for matched case-control studies with missing covariates

Tianqing Liu; Xiaohui Yuan; Zhaohai Li; Yuanzhang Li

In clinical and epidemiological studies, matched case-control designs have been used extensively to investigate the relationships between disease/response and exposure/covariate. Due to the retrospective nature of the study, some covariates may not be observed for all study subjects and missing covariate information may create bias and reduce the efficiency of the parameter estimates. We explore the use of profile empirical likelihood (EL) to cope with this situation by combining unbiased estimating equations when the number of estimating equations is greater than the number of unknown parameters. For high dimensional covariates, we propose a weighted conditional likelihood (WCL) method to solve the computational problem of the profile EL method. The proposed EL and WCL methods can achieve semiparametric efficiency if the probability of missingness is correctly specified. Based on the EL and WCL functions, we also develop Wilks’ type tests and corresponding confidence regions for the model parameters. A simulation study is conducted to assess the performance of the proposed methods in terms of robustness and efficiency.

Collaboration


Zhaohai Li's most frequent collaborators.

Top Co-Authors

Gang Zheng
National Institutes of Health

Qizhai Li
Chinese Academy of Sciences

Joseph L. Gastwirth
George Washington University

Aiyi Liu
National Institutes of Health

Boris Freidlin
National Institutes of Health

Kai Yu
National Institutes of Health

Mitchell H. Gail
National Institutes of Health

Yuanzhang Li
Walter Reed Army Institute of Research

Hong Qin
Central China Normal University