Publication


Featured research published by Zhaogong Zhang.


Genetic Epidemiology | 2008

An ensemble learning approach jointly modeling main and interaction effects in genetic association studies.

Zhaogong Zhang; Shuanglin Zhang; Man Yu Wong; Nicholas J. Wareham; Qiuying Sha

Complex diseases are presumed to be the results of interactions of several genes and environmental factors, with each gene only having a small effect on the disease. Thus, the methods that can account for gene‐gene interactions to search for a set of marker loci in different genes or across genome and to analyze these loci jointly are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for “base learners” and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to analyze a data set, we can get a final model, an overall P‐value of the association test between the set of loci involved in the final model and the trait, and an importance measure for each base learner and each marker involved in the final model. The final model is a linear combination of some base learners. We know which base learner represents a main effect and which one represents an interaction effect. The importance measure of each base learner or marker can tell us the relative importance of the base learner or marker in the final model. We used intensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single‐marker test in all the simulation scenarios. The ELA also outperformed the other three existing multi‐locus methods in almost all cases. 
In an application to a large‐scale case‐control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi‐locus effect (P‐value=0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two‐locus combinations showed significant two‐locus interaction effects. Genet. Epidemiol.


Genetic Epidemiology | 2011

An improved score test for genetic association studies.

Qiuying Sha; Zhaogong Zhang; Shuanglin Zhang

Large‐scale genome‐wide association studies (GWAS) have become feasible recently because of the development of bead and chip technology. However, the success of GWAS partially depends on the statistical methods that are able to manage and analyze this sort of large‐scale data. Currently, the commonly used tests for GWAS include the Cochran–Armitage trend test, the allelic χ² test, the genotypic χ² test, the haplotypic χ² test, and the multi‐marker genotypic χ² test among others. From a methodological point of view, it is a great challenge to improve the power of commonly used tests, since these tests are commonly used precisely because they are already among the most powerful tests. In this article, we propose an improved score test that is uniformly more powerful than the score test based on the generalized linear model. Since the score test based on the generalized linear model includes the aforementioned commonly used tests as its special cases, our proposed improved score test is thus uniformly more powerful than these commonly used tests. We evaluate the performance of the improved score test by simulation studies and application to a real data set. Our results show that the power increases of the improved score test over the score test cannot be neglected in most cases. Genet. Epidemiol. 2011.


BMC Genetics | 2007

A multi-marker test based on family data in genome-wide association study

Zhaogong Zhang; Shuanglin Zhang; Qiuying Sha

Background: Complex diseases are believed to be the results of many genes and environmental factors. Hence, multi-marker methods that can use the information of markers from different genes are appropriate for mapping complex disease genes. Several multi-marker methods have already been proposed for case-control studies. In this article, we propose a multi-marker test called a Multi-marker Pedigree Disequilibrium Test (MPDT) to analyze family data from genome-wide association studies. If the parental phenotypes are available, we also propose a two-stage test in which a genomic screening test is used to select SNPs, and then the MPDT is used to test the association of the selected SNPs.

Results: We use simulation studies to evaluate the performance of the MPDT and the two-stage approach. The results show that the MPDT consistently outperforms the single marker transmission/disequilibrium test (TDT) [1]. Whether the two-stage approach or the one-stage approach is more powerful depends on the prevalence: when the prevalence is no less than 10%, the two-stage approach may be more powerful than the one-stage approach. Otherwise, the one-stage approach is more powerful.

Conclusion: The proposed MPDT is more powerful than the single marker TDT. When the parental phenotypes are available and the prevalence is no less than 10%, the proposed two-stage approach is more powerful than the one-stage approach.
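For readers unfamiliar with the baseline, the single-marker TDT is usually computed as a McNemar-type chi-square over heterozygous parents. The sketch below illustrates that textbook statistic only; it is not the MPDT and is not taken from the paper. The function name and the example counts are assumptions made for illustration.

```python
def tdt_statistic(transmitted: int, untransmitted: int) -> float:
    """Textbook single-marker TDT statistic (chi-square, 1 degree of freedom).

    transmitted:   times heterozygous parents passed the test allele
                   to an affected child
    untransmitted: times heterozygous parents did not pass it on
    Both counts are hypothetical inputs for this sketch.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    # McNemar-type form: squared difference over the number of informative transmissions
    return (transmitted - untransmitted) ** 2 / total


# Illustrative counts only, not data from the study
example_stat = tdt_statistic(60, 40)
```

Under the null hypothesis of no linkage or association, the statistic is compared against a chi-square distribution with one degree of freedom.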


PLOS ONE | 2011

Joint Analysis for Genome-Wide Association Studies in Family-Based Designs

Qiuying Sha; Zhaogong Zhang; Shuanglin Zhang

In family-based data, association information can be partitioned into the between-family information and the within-family information. Based on this observation, Steen et al. (Nature Genetics. 2005, 683–691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs which performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, we learn from the results of case-control studies (Skol et al. Nature Genetics. 2006, 209–213) that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification. This new screening test is used in the first stage to select markers. Then, a joint test that combines the between-family information and within-family information is used in the second stage to test association at the selected markers. By extensive simulation studies, we demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.


BMC Proceedings | 2007

Genome-wide association tests by two-stage approaches with unified analysis of families and unrelated individuals.

Xuexia Wang; Zhaogong Zhang; Shuanglin Zhang; Qiuying Sha

Multiple testing is a problem in genome-wide or region-wide association studies. In this report, we consider a study design given by the Genetic Analysis Workshop 15 (GAW15) Problem 3 – nuclear families (parents with their affected children) and unrelated controls. Based on this design, we propose three two-stage approaches to deal with the problem of multiple testing. The tests in the first stage, statistically independent of the association test used in the second stage, are used to screen or select single-nucleotide polymorphisms (SNPs). Then, in the second stage, a family-based association test is performed on a much smaller set of selected SNPs. Thus, the problem of multiple testing is much less severe. Our simulation studies and application to the dense SNP data of chromosome 6 in the GAW15 Problem 3 show that the two-stage methods are more powerful than the one-stage method (using the family-based association test only).
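To make the multiple-testing argument concrete, the sketch below compares per-test significance thresholds when every SNP is tested against when only a screened subset is tested. It assumes a Bonferroni correction and hypothetical SNP counts; the abstract does not say which correction the authors used, so treat this as an illustration of the general idea rather than their procedure.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction (assumed here)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical numbers: 100,000 SNPs genotyped, 100 retained after screening
one_stage = bonferroni_threshold(0.05, 100_000)  # test every SNP
two_stage = bonferroni_threshold(0.05, 100)      # test only the selected SNPs
```

Correcting only for the selected SNPs relies on the condition stated in the abstract: the first-stage screening tests must be statistically independent of the second-stage association test.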


Genetic Epidemiology | 2011

Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini‐exome data

Joan E. Bailey-Wilson; Jennifer S. Brennan; Shelley B. Bull; Robert Culverhouse; Yoonhee Kim; Yuan Jiang; Jeesun Jung; Qing Li; Claudia Lamina; Ying Liu; Reedik Mägi; Yue S. Niu; Claire L. Simpson; Libo Wang; Yildiz E. Yilmaz; Heping Zhang; Zhaogong Zhang

Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus‐specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population‐specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow‐up in the presence of extreme locus heterogeneity and large numbers of potential predictors. Genet. Epidemiol. 35:S92–S100, 2011.


BMC Proceedings | 2011

Detection of rare variant effects in association studies: extreme values, iterative regression, and a hybrid approach

Zhaogong Zhang; Qiuying Sha; Xinli Wang; Shuanglin Zhang

We develop statistical methods for detecting rare variants that are associated with quantitative traits. We propose two strategies and their combination for this purpose: the iterative regression strategy and the extreme values strategy. In the iterative regression strategy, we use iterative regression on residuals and a multimarker association test to identify a group of significant variants. In the extreme values strategy, we use individuals with extreme trait values to select candidate genes and then test only these candidate genes. These two strategies are integrated into a hybrid approach through a weighting technology. We apply the proposed methods to analyze the Genetic Analysis Workshop 17 data set. The results show that the hybrid approach is the most powerful approach. Using the hybrid approach, the average power to detect causal genes for Q1 is about 40% and the powers to detect FLT1 and KDR are 100% and 68% for Q1, respectively. The powers to detect VNN3 and BCHE are 34% and 30% for Q2, respectively.
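The first step of the extreme values strategy, picking individuals from both tails of the trait distribution, can be sketched as follows. This is a minimal illustration under assumptions: the function name, the tail size `k`, and the toy trait values are hypothetical, and the paper's candidate-gene selection and weighting steps are not reproduced.

```python
def extreme_trait_indices(traits: list[float], k: int) -> tuple[list[int], list[int]]:
    """Return indices of the k lowest and k highest trait values.

    k is an assumed tuning parameter; the abstract does not state
    how many individuals are treated as extreme.
    """
    if k < 0 or 2 * k > len(traits):
        raise ValueError("k must satisfy 0 <= 2k <= number of individuals")
    # Rank individuals by trait value, lowest first
    order = sorted(range(len(traits)), key=lambda i: traits[i])
    return order[:k], order[len(traits) - k:]


# Toy quantitative trait values, not workshop data
toy_traits = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]
low_tail, high_tail = extreme_trait_indices(toy_traits, 2)
```

Variant frequencies in the two tails could then be compared gene by gene to choose candidates for the follow-up test.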


Annals of Human Genetics | 2010

Identification of Interacting Genes in Genome‐Wide Association Studies Using a Model‐Based Two‐Stage Approach

Zhaogong Zhang; Adan Niu; Qiuying Sha

In this paper, we propose a two‐stage approach based on 17 biologically plausible models to search for two‐locus combinations that have significant joint effects on the disease status in genome‐wide association (GWA) studies. In the two‐stage analyses, we only test two‐locus joint effects of SNPs that show modest marginal effects. We use simulation studies to compare the power of our two‐stage analysis with a single‐marker analysis and a two‐stage analysis by using a full model. We find that for most plausible interaction effects, our two‐stage analysis can dramatically increase the power to identify two‐locus joint effects compared to a single‐marker analysis and a two‐stage analysis based on the full model. We also compare two‐stage methods with one‐stage methods. Our simulation results indicate that two‐stage methods are more powerful than one‐stage methods. We applied our two‐stage approach to a GWA study for identifying genetic factors that might be relevant in the pathogenesis of sporadic Amyotrophic Lateral Sclerosis (ALS). Our proposed two‐stage approach found that two SNPs have a significant joint effect on sporadic ALS, while the single‐marker analysis and the two‐stage analysis based on the full model did not find any significant results.


BMC Proceedings | 2009

Application of seventeen two-locus models in genome-wide association studies by two-stage strategy

Adan Niu; Zhaogong Zhang; Qiuying Sha

The goal of this paper is to search for two-locus combinations that are jointly associated with rheumatoid arthritis using the data set of Genetic Analysis Workshop 16 Problem 1. We use a two-stage strategy to reduce the computational burden associated with performing an exhaustive two-locus search across the genome. In the first stage, the full set of 531,689 single-nucleotide polymorphisms was screened using univariate testing. In the second stage, all pairs made from the 500 single-nucleotide polymorphisms with the lowest p-values from the first stage were evaluated under each of 17 two-locus models. Our analyses identified a two-locus combination - rs6939589 and rs11634386 - that proved to be significantly associated with rheumatoid arthritis under a Rec × Rec model (p-value = 0.045 after adjusting for multiple tests and multiple models).
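The two-stage screening described above can be sketched as: rank SNPs by first-stage p-value, keep the top candidates, and enumerate all pairs among them for second-stage testing. The function name, the dictionary layout, and the toy p-values below are assumptions for illustration; the 17 two-locus models themselves are not implemented here.

```python
from itertools import combinations
from math import comb


def second_stage_pairs(p_values: dict[str, float], top_k: int) -> list[tuple[str, str]]:
    """Keep the top_k SNPs with the smallest first-stage p-values
    and return every unordered pair among them."""
    ranked = sorted(p_values, key=p_values.get)[:top_k]
    return list(combinations(ranked, 2))


# Toy first-stage p-values with hypothetical SNP labels
toy_p = {"snpA": 0.2, "snpB": 0.001, "snpC": 0.03, "snpD": 0.5}
toy_pairs = second_stage_pairs(toy_p, 3)

# Pair counts for the sizes quoted in the abstract
pairs_after_screening = comb(500, 2)
pairs_exhaustive = comb(531_689, 2)
```

With 500 retained SNPs there are 124,750 pairs to evaluate under each model, compared with more than 10^11 pairs for an exhaustive search over all 531,689 SNPs, which is the computational saving the abstract refers to.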



Collaboration


Dive into Zhaogong Zhang's collaborations.

Top Co-Authors

Qiuying Sha
Michigan Technological University

Shuanglin Zhang
Michigan Technological University

Adan Niu
Michigan Technological University

Claire L. Simpson
National Institutes of Health

Jeesun Jung
National Institutes of Health

Jennifer C. Schymick
National Institutes of Health

Joan E. Bailey-Wilson
National Institutes of Health