Yun Zhu

University of Texas at Austin

Publication

Featured research published by Yun Zhu.

European Journal of Human Genetics | 2010

Gene and pathway-based second-wave analysis of genome-wide association studies

Gang Peng; Li Luo; Hoicheong Siu; Yun Zhu; Pengfei Hu; Shengjun Hong; Jinying Zhao; Xiaodong Zhou; John D. Reveille; Li Jin; Christopher I. Amos; Momiao Xiong

Despite the great success of genome-wide association studies (GWAS) in identification of the common genetic variants associated with complex diseases, the current GWAS have focused on single-SNP analysis. However, single-SNP analysis often identifies only a few of the most significant SNPs that account for a small proportion of the genetic variants and offers only a limited understanding of complex diseases. To overcome these limitations, we propose gene and pathway-based association analysis as a new paradigm for GWAS. As a proof of concept, we performed a comprehensive gene and pathway-based association analysis of 13 published GWAS. Our results showed that the proposed new paradigm for GWAS not only identified the genes that include significant SNPs found by single-SNP analysis, but also detected new genes in which each single SNP conferred a small disease risk; however, their joint actions were implicated in the development of diseases. The results also showed that the new paradigm for GWAS was able to identify biologically meaningful pathways associated with the diseases, which were confirmed by a gene-set-rich analysis using gene expression data.

European Journal of Human Genetics | 2010

Genome-wide gene and pathway analysis.

Li Luo; Gang Peng; Yun Zhu; Hua Dong; Christopher I. Amos; Momiao Xiong

Current GWAS have primarily focused on testing association of single SNPs. To only test for association of single SNPs has limited utility and is insufficient to dissect the complex genetic structure of many common diseases. To meet conceptual and technical challenges raised by GWAS, we suggest gene and pathway-based GWAS as complementary to the current single SNP-based GWAS. This publication develops three statistics for testing association of genes and pathways with disease: linear combination test, quadratic test and decorrelation test, which take correlations among SNPs within a gene or genes within a pathway into account. The null distribution of the suggested statistics is examined and the statistics are applied to GWAS of rheumatoid arthritis in the Wellcome Trust Case–Control Consortium and the North American Rheumatoid Arthritis Consortium studies. The preliminary results show that the suggested gene and pathway-based GWAS offer several remarkable features. First, not only can they identify the genes that have large genetic effects, but also they can detect new genes in which each single SNP conferred a small amount of disease risk, and their joint actions can be implicated in the development of diseases. Second, gene and pathway-based analysis can allow the formation of the core of pathway definition of complex diseases and unravel the functional bases of an association finding. Third, replication of association findings at the gene or pathway level is much easier than replication at the individual SNP level.
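To make the idea of a correlation-aware gene-level test concrete, here is a minimal sketch of a linear combination statistic built from per-SNP z-scores. It is an illustration under stated assumptions, not necessarily the statistic defined in the paper: the function name, the equal weighting of SNPs, and the inputs (a list of z-scores `z` and a SNP correlation matrix `corr`) are all hypothetical.

```python
import math


def linear_combination_test(z, corr):
    """Combine per-SNP z-scores into one gene-level statistic (sketch).

    Assumption (not from the paper): under the null, z is multivariate
    normal with mean 0 and covariance `corr`, so the sum of the z-scores
    has variance equal to the sum of all entries of `corr`.
    """
    k = len(z)
    if len(corr) != k or any(len(row) != k for row in corr):
        raise ValueError("corr must be a k x k matrix matching len(z)")

    # Variance of sum(z) under the null: 1' R 1, the sum of all entries.
    variance = sum(sum(row) for row in corr)
    if variance <= 0:
        raise ValueError("correlation matrix gives a non-positive variance")

    # Standardized statistic, approximately N(0, 1) under the null.
    t = sum(z) / math.sqrt(variance)

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p
```

Ignoring the correlation (treating `corr` as the identity) would overstate the evidence when SNPs are in linkage disequilibrium, which is why the variance term uses the full matrix.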

PLOS Genetics | 2010

A Novel Statistic for Genome-Wide Interaction Analysis

Xuesen Wu; Hua Dong; Li Luo; Yun Zhu; Gang Peng; John D. Reveille; Momiao Xiong

Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interactional analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001<FDR<0.003, respectively, which were seen in two independent studies of psoriasis. These included five interacting pairs of SNPs in genes LST1/NCR3, CXCR5/BCL9L, and GLS2, some of which were located in the target sites of miR-324-3p, miR-433, and miR-382, as well as 15 pairs of interacting SNPs that had nonsynonymous substitutions. Our results demonstrated that genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search significant interaction between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

American Journal of Human Genetics | 2012

Family-Based Association Studies for Next-Generation Sequencing

Yun Zhu; Momiao Xiong

An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure and a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T², combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect than other population-based or family-based association analysis methods. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA has a much smaller p value than other statistics.
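The collapsing step behind CMC-style methods can be sketched briefly. This is a simplified, population-style illustration and not the family-based procedure from the paper: it assumes genotypes are coded as minor-allele counts (0, 1, 2), and the function name and the default frequency threshold are hypothetical.

```python
def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse rare variants into one carrier indicator per individual.

    genotypes: list of rows, one per individual; each row holds
    minor-allele counts (0, 1 or 2) for every variant (assumed coding).
    Returns (indicator, common_columns): `indicator[i]` is 1 if individual
    i carries at least one rare allele, and `common_columns[i]` keeps the
    counts of the variants that were not collapsed.
    """
    n = len(genotypes)
    if n == 0:
        return [], []
    m = len(genotypes[0])

    # Sample allele frequency of each variant: allele count / (2 * n).
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]

    rare = [j for j in range(m) if freqs[j] <= maf_threshold]
    common = [j for j in range(m) if freqs[j] > maf_threshold]

    # 1 if the individual carries any rare allele, else 0.
    indicator = [1 if any(row[j] > 0 for j in rare) else 0 for row in genotypes]
    common_columns = [[row[j] for j in common] for row in genotypes]
    return indicator, common_columns
```

In a full CMC analysis the indicator and the common-variant columns would then be tested jointly with a multivariate statistic; that step is omitted here.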

Journal of Medical Genetics | 2012

Quantitative trait locus analysis for next-generation sequencing with the functional linear models

Li Luo; Yun Zhu; Momiao Xiong

Background: Although in the past few years we have witnessed the rapid development of novel statistical methods for association studies of qualitative traits using next generation sequencing (NGS) data, only a few statistics are proposed for testing the association of rare variants with quantitative traits. The quantitative trait locus (QTL) analysis of rare variants remains challenging. Analysis from low dimensional data to high dimensional genomic data demands changes in statistical methods from multivariate data analysis to functional data analysis. Methods: We propose a functional linear model (FLM) as a general principle for developing novel and powerful QTL analysis methods designed for resequencing data. By simulations we calculated the type I error rates and evaluated the power of the FLM and eight other existing statistical methods, even in the presence of both positive and negative signs of effects. Results: Since the FLM retains all of the genetic information in the data and explores the merits of both variant-by-variant and collective analysis and overcomes their limitation, the FLM has a much higher power than other existing statistics in all the scenarios considered. To evaluate its performance further, the FLM was applied to association analysis of six quantitative traits in the Dallas Heart Study, and RNA-seq eQTL analysis with genetic variation in the low coverage resequencing data of the 1000 Genomes Project. Real data analysis showed that the FLM had much smaller p values to identify significantly associated variants than other existing methods. Conclusions: The FLM is expected to open a new route for QTL analysis.

BMC Genomics | 2011

Implication of next-generation sequencing on association studies

Hoicheong Siu; Yun Zhu; Li Jin; Momiao Xiong

Background: Next-generation sequencing technologies can effectively detect the entire spectrum of genomic variation and provide a powerful tool for systematic exploration of the universe of common, low frequency and rare variants in the entire genome. However, the current paradigm for genome-wide association studies (GWAS) is to catalogue and genotype common variants (5% < MAF). The methods and study design for testing the association of low frequency (0.5% < MAF ≤ 5%) and rare variation (MAF ≤ 0.5%) have not been thoroughly investigated. The 1000 Genomes Project represents one such endeavour to characterize the human genetic variation pattern at the MAF = 1% level as a foundation for association studies. In this report, we explore different strategies and study designs for the near future GWAS in the post-era, based on both low coverage pilot data and exon pilot data in 1000 Genomes Project. Results: We investigated the linkage disequilibrium (LD) pattern among common and low frequency SNPs and its implication for association studies. We found that the LD between low frequency alleles and low frequency alleles, and low frequency alleles and common alleles are much weaker than the LD between common and common alleles. We examined various tagging designs with and without statistical imputation approaches and compared their power against de novo resequencing in mapping causal variants under various disease models. We used the low coverage pilot data which contain ~14 M SNPs as a hypothetical genotype-array platform (Pilot 14 M) to interrogate its impact on the selection of tag SNPs, mapping coverage and power of association tests. We found that even after imputation we still observed 45.4% of low frequency SNPs which were untaggable and only 67.7% of the low frequency variation was covered by the Pilot 14 M array. Conclusions: This suggested GWAS based on SNP arrays would be ill-suited for association studies of low frequency variation.
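The three frequency classes used in this abstract (common: MAF > 5%; low frequency: 0.5% < MAF ≤ 5%; rare: MAF ≤ 0.5%) translate directly into a small helper. The function name is hypothetical, and folding an allele frequency above 0.5 onto its minor-allele frequency is an assumption added for convenience.

```python
def classify_variant(allele_frequency):
    """Label a variant as 'common', 'low frequency' or 'rare'.

    Thresholds follow the abstract: common if MAF > 5%, low frequency if
    0.5% < MAF <= 5%, rare if MAF <= 0.5%.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")

    # Fold onto the minor allele so either allele's frequency can be passed.
    maf = min(allele_frequency, 1.0 - allele_frequency)

    if maf > 0.05:
        return "common"
    if maf > 0.005:
        return "low frequency"
    return "rare"
```

For example, `classify_variant(0.3)` falls in the common class, while a variant seen at a frequency of 0.002 is labelled rare.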

European Journal of Human Genetics | 2013

Smoothed functional principal component analysis for testing association of the entire allelic spectrum of genetic variation

Li Luo; Yun Zhu; Momiao Xiong

Faster and cheaper next-generation sequencing technologies will generate unprecedentedly massive and highly dimensional genetic variation data that allow nearly complete evaluation of genetic variation including both common and rare variants. There are two types of association tests: variant-by-variant test and group test. The variant-by-variant test is designed to test the association of common variants, while the group test is suitable to collectively test the association of multiple rare variants. We propose here a smoothed functional principal component analysis (SFPCA) statistic as a general approach for testing association of the entire allelic spectrum of genetic variation (both common and rare variants), which utilizes the merits of both variant-by-variant analysis and group tests. By intensive simulations, we demonstrate that the SFPCA statistic has the correct type 1 error rates and much higher power than the existing methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants and (4) variants with opposite directions of effects. To further evaluate its performance, the SFPCA statistic is applied to ANGPTL4 sequence and six continuous phenotypes data from the Dallas Heart Study as an example for testing association of rare variants and a GWAS of schizophrenia data as an example for testing association of common variants. The results show that the SFPCA statistic has much smaller P-values than many existing statistics in both real data analysis examples.

European Journal of Human Genetics | 2015

Pathway analysis with next-generation sequencing data

Jinying Zhao; Yun Zhu; Eric Boerwinkle; Momiao Xiong

Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
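The Bonferroni correction mentioned above can be illustrated in a few lines. This is a generic sketch of the correction itself, not the paper's analysis pipeline; the function name and the pathway labels in the usage note are hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the names that pass it.

    p_values: dict mapping a test name (e.g. a pathway) to its p-value.
    A test is declared significant when p <= alpha / m, where m is the
    number of tests performed.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one p-value is required")

    threshold = alpha / m

    # Keep the tests that pass, ordered from smallest to largest p-value.
    hits = sorted(
        (name for name, p in p_values.items() if p <= threshold),
        key=lambda name: p_values[name],
    )
    return threshold, hits
```

Called with three made-up p-values such as `{"pathway_A": 0.0001, "pathway_B": 0.02, "pathway_C": 0.004}`, the threshold is 0.05 / 3 and only `pathway_A` and `pathway_C` pass.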

BMC Genomics | 2012

Weighted pedigree-based statistics for testing the association of rare variants

Yin Yao Shugart; Yun Zhu; Wei Guo; Momiao Xiong

Background: With the advent of next-generation sequencing (NGS) technologies, researchers are now generating a deluge of data on high dimensional genomic variations, whose analysis is likely to reveal rare variants involved in the complex etiology of disease. Standing in the way of such discoveries, however, is the fact that statistics for rare variants are currently designed for use with population-based data. In this paper, we introduce a pedigree-based statistic specifically designed to test for rare variants in family-based data. The additional power of pedigree-based statistics stems from the fact that while rare variants related to diseases or traits of interest occur only infrequently in populations, in families with multiple affected individuals, such variants are enriched. Note that while the proposed statistic can be applied with and without statistical weighting, our simulations show that its power increases when weighting schemes (WSS and VT) are applied. Results: Our working hypothesis was that, since rare variants are concentrated in families with multiple affected individuals, pedigree-based statistics should detect rare variants more powerfully than population-based statistics. To evaluate how well our new pedigree-based statistics perform in association studies, we develop a general framework for sequence-based association studies capable of handling data from pedigrees of various types and also from unrelated individuals. In short, we developed a procedure for transforming population-based statistics into tests for family-based associations. Furthermore, we modify two existing tests, the weighted sum-square test and the variable-threshold test, and apply both to our family-based collapsing methods. We demonstrate that the new family-based tests are more powerful than the corresponding population-based tests and that they generate a reasonable type I error rate. To demonstrate feasibility, we apply the newly developed tests to a pedigree-based GWAS data set from the Framingham Heart Study (FHS). FHS-GWAS data contain approximately 5000 uncommon variants with frequencies less than 0.05. Potential association findings in these data demonstrate the feasibility of the software PB-STAR (note, PB-STAR is now freely available to the public). Conclusion: Our tests show that when analyzing for rare variants, a pedigree-based design is more powerful than a population-based case–control design. We further demonstrate that a pedigree-based statistic’s power to detect rare variants increases in direct relation to the proportion of affected individuals within the pedigree.

European Journal of Human Genetics | 2016

Genome-wide gene–gene interaction analysis for next-generation sequencing

Jinying Zhao; Yun Zhu; Momiao Xiong

The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of its prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demand for a paradigm change in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI’s Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacting genes in the FHS were replicated in the independent WTCCC study, and we found 24 pairs of significantly interacting genes after applying Bonferroni correction in the EOMI study.
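A quick count shows why moving from SNP pairs to gene pairs eases the multiple-testing burden. The numbers of SNPs and genes in the usage note are round, hypothetical figures chosen only for illustration; they are not taken from the study.

```python
import math


def pairwise_test_count(n_units):
    """Number of unordered pairs among n_units (SNPs, genes or regions)."""
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    # n choose 2 = n * (n - 1) / 2
    return math.comb(n_units, 2)
```

With a hypothetical 1,000,000 SNPs, `pairwise_test_count(1_000_000)` gives 499,999,500,000 pairwise tests, whereas a hypothetical 20,000 genes give `pairwise_test_count(20_000)`, that is 199,990,000 tests, roughly 2,500 times fewer.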