Publication


Featured research published by Yongtao Guan.


American Journal of Human Genetics | 2008

Polymorphisms of the HNF1A Gene Encoding Hepatocyte Nuclear Factor-1α are Associated with C-Reactive Protein

Alex P. Reiner; Mathew Barber; Yongtao Guan; Paul M. Ridker; Leslie A. Lange; Daniel I. Chasman; Jeremy D. Walston; Gregory M. Cooper; Nancy S. Jenny; Mark J. Rieder; J. Peter Durda; Joshua D. Smith; John Novembre; Russell P. Tracy; Jerome I. Rotter; Matthew Stephens; Deborah A. Nickerson; Ronald M. Krauss

Data from the Pharmacogenomics and Risk of Cardiovascular Disease (PARC) study and the Cardiovascular Health Study (CHS) provide independent and confirmatory evidence for association between common polymorphisms of the HNF1A gene encoding hepatocyte nuclear factor-1 alpha and plasma C-reactive protein (CRP) concentration. Analyses with the use of imputation-based methods to combine genotype data from both studies and to test untyped SNPs from the HapMap database identified several SNPs within a 5 kb region of HNF1A intron 1 with the strongest evidence of association with CRP phenotype.


PLOS Genetics | 2008

Practical Issues in Imputation-Based Association Mapping

Yongtao Guan; Matthew Stephens

Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption—specifically, that difficult-to-impute SNPs tend to have larger effects—and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate—their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
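The point-estimate idea in this abstract, replacing each unknown genotype with its posterior mean, can be illustrated with a minimal sketch. This is not BIMBAM's code: the function names and the toy probabilities are hypothetical, and an ordinary least-squares slope stands in for the Bayes factor computation, which the abstract does not spell out.

```python
import numpy as np

def posterior_mean_genotype(probs):
    """Collapse imputation posteriors into one expected genotype per individual.

    probs: array of shape (n_individuals, 3) holding P(g=0), P(g=1), P(g=2)
    at one SNP. Returns the posterior mean genotype, a value in [0, 2].
    """
    probs = np.asarray(probs, dtype=float)
    return probs @ np.array([0.0, 1.0, 2.0])

def slope_on_dosage(dosage, phenotype):
    """Least-squares slope of phenotype on the mean genotype.

    Assumption: used here only as a simple stand-in for an association
    statistic computed from the point estimates.
    """
    x = dosage - dosage.mean()
    y = phenotype - phenotype.mean()
    return float(x @ y / (x @ x))

# Hypothetical imputation output for four individuals at one untyped SNP.
probs = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.3, 0.7],
    [1.0, 0.0, 0.0],
])
dosage = posterior_mean_genotype(probs)

# Toy phenotype constructed to depend linearly on the mean genotype.
phenotype = 2.0 * dosage + 1.0
effect = slope_on_dosage(dosage, phenotype)
```

Because each SNP is reduced to a single number per individual, the downstream test costs the same as it would for a typed SNP.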


PLOS ONE | 2011

Variation in Human Recombination Rates and Its Genetic Determinants

Adi Fledel-Alon; Ellen M. Leffler; Yongtao Guan; Matthew Stephens; Graham Coop; Molly Przeworski

Background: Despite the fundamental role of crossing-over in the pairing and segregation of chromosomes during human meiosis, the rates and placements of events vary markedly among individuals. Characterizing this variation and identifying its determinants are essential steps in our understanding of the human recombination process and its evolution. Study Design/Results: Using three large sets of European-American pedigrees, we examined variation in five recombination phenotypes that capture distinct aspects of crossing-over patterns. We found that the mean recombination rate in males and females and the historical hotspot usage are significantly heritable and are uncorrelated with one another. We then conducted a genome-wide association study in order to identify loci that influence them. We replicated associations of RNF212 with the mean rate in males and in females as well as the association of Inversion 17q21.31 with the female mean rate. We also replicated the association of PRDM9 with historical hotspot usage, finding that it explains most of the genetic variance in this phenotype. In addition, we identified a set of new candidate regions for further validation. Significance: These findings suggest that variation at broad and fine scales is largely separable and that, beyond three known loci, there is no evidence for common variation with large effects on recombination phenotypes.


Genetics | 2014

Detecting Structure of Haplotypes and Local Ancestry

Yongtao Guan

We present a two-layer hidden Markov model to detect the structure of haplotypes for unrelated individuals. This allows us to model two scales of linkage disequilibrium (one within a group of haplotypes and one between groups), thereby taking advantage of rich haplotype information to infer local ancestry of admixed individuals. Our method outperforms competing state-of-the-art methods, particularly for regions of small ancestral track lengths. Applying our method to Mexican samples in HapMap3, we found two regions on chromosomes 6 and 8 that show significant departure of local ancestry from the genome-wide average. A software package implementing the methods described in this article is freely available at http://bcm.edu/cnrc/mcmcmc.


Annals of Applied Probability | 2007

Small-world MCMC and convergence to multi-modal distributions: From slow mixing to fast mixing

Yongtao Guan; Stephen M. Krone

We compare convergence rates of Metropolis‐Hastings chains to multi-modal target distributions when the proposal distributions can be of “local” and “small world” type. In particular, we show that by adding occasional long-range jumps to a given local proposal distribution, one can turn a chain that is “slowly mixing” (in the complexity of the problem) into a chain that is “rapidly mixing.” To do this, we obtain spectral gap estimates via a new state decomposition theorem and apply an isoperimetric inequality for log-concave probability measures. We discuss potential applicability of our result to Metropolis-coupled Markov chain Monte Carlo schemes.
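A minimal sketch of the "small world" idea, under stated assumptions: the target is a hypothetical one-dimensional mixture of two well-separated unit-variance normals, and the proposal is a random walk that takes a wide step with probability `p_long` and a narrow step otherwise. The function names, step sizes, and chain length are illustrative choices, not the construction analysed in the paper.

```python
import math
import random

def log_target(x):
    """Unnormalized log density of an equal mixture of N(-10, 1) and N(10, 1)."""
    a = -0.5 * (x + 10.0) ** 2
    b = -0.5 * (x - 10.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def metropolis(n_steps, p_long, seed=0, local_sd=1.0, long_sd=20.0):
    """Random-walk Metropolis with occasional long-range jumps.

    With probability p_long the step is drawn with standard deviation
    long_sd, otherwise with local_sd. The mixture of two centred normals
    is symmetric, so the plain Metropolis acceptance ratio applies.
    """
    rng = random.Random(seed)
    x = -10.0  # start in the left mode
    samples = []
    for _ in range(n_steps):
        sd = long_sd if rng.random() < p_long else local_sd
        y = x + rng.gauss(0.0, sd)
        # 1 - random() lies in (0, 1], which keeps the logarithm finite.
        if math.log(1.0 - rng.random()) < log_target(y) - log_target(x):
            x = y
        samples.append(x)
    return samples

def fraction_right(samples):
    """Share of samples in the right-hand mode (x > 0); the target value is 0.5."""
    return sum(s > 0 for s in samples) / len(samples)

local_only = fraction_right(metropolis(50_000, p_long=0.0))
small_world = fraction_right(metropolis(50_000, p_long=0.1))
```

In this toy setting the purely local chain stays trapped in the mode where it started, while the chain with occasional long jumps visits both modes.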


PLOS Genetics | 2016

Strong Selection at MHC in Mexicans since Admixture

Quan Zhou; Liang Zhao; Yongtao Guan

Mexicans are a recent admixture of Amerindians, Europeans, and Africans. We performed local ancestry analysis of Mexican samples from two genome-wide association studies obtained from dbGaP, and discovered that at the MHC region Mexicans have excessive African ancestral alleles compared to the rest of the genome, which is the hallmark of recent selection for admixed samples. The estimated selection coefficients are 0.05 and 0.07 for two datasets, which put our finding among the strongest known selections observed in humans, namely, lactase selection in northern Europeans and sickle-cell trait in Africans. Using inaccurate Amerindian training samples was a major concern for the credibility of previously reported selection signals in Latinos. Taking advantage of the flexibility of our statistical model, we devised a model fitting technique that can learn Amerindian ancestral haplotype from the admixed samples, which allows us to infer local ancestries for Mexicans using only European and African training samples. The strong selection signal at the MHC remains without Amerindian training samples. Finally, we note that medical history studies suggest such a strong selection at MHC is plausible in Mexicans.


Genetics | 2014

Detecting Local Haplotype Sharing and Haplotype Association

Hanli Xu; Yongtao Guan

A novel haplotype association method is presented, and its power is demonstrated. Relying on a statistical model for linkage disequilibrium (LD), the method first infers ancestral haplotypes and their loadings at each marker for each individual. The loadings are then used to quantify local haplotype sharing between individuals at each marker. A statistical model was developed to link the local haplotype sharing and phenotypes to test for association. We devised a novel method to fit the LD model, reducing the complexity from putatively quadratic to linear (in the number of ancestral haplotypes). Therefore, the LD model can be fitted to all study samples simultaneously, and, consequently, our method is applicable to big data sets. Compared to existing haplotype association methods, our method integrated out phase uncertainty, avoided arbitrariness in specifying haplotypes, and had the same number of tests as the single-SNP analysis. We applied our method to data from the Wellcome Trust Case Control Consortium and discovered eight novel associations between seven gene regions and five disease phenotypes. Among these, GRIK4, which encodes a protein that belongs to the glutamate-gated ionic channel family, is strongly associated with both coronary artery disease and rheumatoid arthritis. A software package implementing methods described in this article is freely available at http://www.haplotype.org.


Journal of the American Statistical Association | 2018

On the Null Distribution of Bayes Factors in Linear Regression

Quan Zhou; Yongtao Guan

We show that under the null, the 2 log(Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and the normal prior. Our results have three immediate impacts. First, we can compute analytically a p-value associated with a Bayes factor without the need for permutation. We provide a software package that can evaluate the p-value associated with a Bayes factor efficiently and accurately. Second, the null distribution illuminates some intrinsic properties of the Bayes factor, namely, how the Bayes factor quantitatively depends on the prior and the genesis of Bartlett’s paradox. Third, enlightened by the null distribution of the Bayes factor, we formulate a novel scaled Bayes factor that depends less on the prior and is immune to Bartlett’s paradox. When two tests have an identical p-value, the test with a larger power tends to have a larger scaled Bayes factor, a desirable property that is missing for the (unscaled) Bayes factor. Supplementary materials for this article are available online.


Genetics in Medicine | 2018

Informative priors on fetal fraction increase power of the noninvasive prenatal screen

Hanli Xu; Shaowei Wang; Lin-Lin Ma; Shuai Huang; Lin Liang; Qian Liu; Yang-Yang Liu; Ke-Di Liu; Ze-Min Tan; Hao Ban; Yongtao Guan; Zuhong Lu

Purpose: Noninvasive prenatal screening (NIPS) sequences a mixture of the maternal and fetal cell-free DNA. Fetal trisomy can be detected by examining chromosomal dosages estimated from sequencing reads. The traditional method uses the Z-test, which compares a subject against a set of euploid controls, where the information of fetal fraction is not fully utilized. Here we present a Bayesian method that leverages informative priors on the fetal fraction. Method: Our Bayesian method combines the Z-test likelihood and informative priors of the fetal fraction, which are learned from the sex chromosomes, to compute Bayes factors. Bayesian framework can account for nongenetic risk factors through the prior odds, and our method can report individual positive/negative predictive values. Results: Our Bayesian method has more power than the Z-test method. We analyzed 3,405 NIPS samples and spotted at least 9 (of 51) possible Z-test false positives. Conclusion: Bayesian NIPS is more powerful than the Z-test method, is able to account for nongenetic risk factors through prior odds, and can report individual positive/negative predictive values.
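Two ingredients named in this abstract can be sketched generically: the Z-test against euploid controls, and the step from a Bayes factor and prior odds to an individual posterior probability. The function names and all numbers below are hypothetical, and the sketch does not reproduce the authors' fetal-fraction prior or their Bayes factor.

```python
import statistics

def z_score(sample_dosage, control_dosages):
    """Standardize a subject's chromosomal dosage against euploid controls."""
    mu = statistics.mean(control_dosages)
    sd = statistics.stdev(control_dosages)  # sample standard deviation
    return (sample_dosage - mu) / sd

def posterior_probability(bayes_factor, prior_odds):
    """Posterior probability of trisomy from a Bayes factor and prior odds.

    Uses the identity: posterior odds = Bayes factor * prior odds.
    """
    odds = bayes_factor * prior_odds
    return odds / (1.0 + odds)

# Hypothetical fractions of reads mapping to chromosome 21 in euploid controls.
controls = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

z = z_score(0.0136, controls)

# Hypothetical Bayes factor of 10 combined with prior odds of 1 to 100.
ppv = posterior_probability(10.0, 0.01)
```

The second function shows why the prior odds matter: the same Bayes factor gives different individual predictive values for subjects with different baseline risk.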


Bayesian Analysis | 2018

Fast Model-Fitting of Bayesian Variable Selection Regression Using the Iterative Complex Factorization Algorithm

Quan Zhou; Yongtao Guan

Bayesian variable selection regression (BVSR) is able to jointly analyze genome-wide genetic datasets, but slow computation via Markov chain Monte Carlo (MCMC) has hampered its widespread use. Here we present a novel iterative method to solve a special class of linear systems, which can increase the speed of the BVSR model-fitting tenfold. The iterative method hinges on the complex factorization of the sum of two matrices, and the solution path resides in the complex domain (instead of the real domain). Compared to the Gauss-Seidel method, the complex factorization converges almost instantaneously and its error is several orders of magnitude smaller than that of the Gauss-Seidel method. More importantly, the error is always within the pre-specified precision, while that of the Gauss-Seidel method is not. For large problems with thousands of covariates, the complex factorization is 10-100 times faster than either the Gauss-Seidel method or the direct method via the Cholesky decomposition. In BVSR, one needs to repeatedly solve large penalized regression systems whose design matrices change only slightly between adjacent MCMC steps. This slight change in the design matrix enables the adaptation of the iterative complex factorization method. The computational innovation will facilitate the widespread use of BVSR in reanalyzing genome-wide association datasets.
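For orientation, here is a minimal sketch of the Gauss-Seidel baseline that the abstract compares against, applied to a small penalized regression system. It does not implement the iterative complex factorization. The ridge-type form (X'X + lambda I) beta = X'y, the function name, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, max_iter=10_000):
    """Solve A x = b by Gauss-Seidel sweeps.

    Convergence is guaranteed when A is symmetric positive definite.
    Returns the solution estimate and the number of sweeps performed.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for sweep in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(len(b)):
            # Entries before i are already updated in this sweep;
            # entries after i still come from the previous sweep.
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.max(np.abs(x - x_old)) < tol:
            return x, sweep
    return x, max_iter

# Simulated data: 50 observations, 5 covariates (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
lam = 1.0  # hypothetical penalty

A = X.T @ X + lam * np.eye(5)  # symmetric positive definite
b = X.T @ y

beta, sweeps = gauss_seidel(A, b)
beta_direct = np.linalg.solve(A, b)  # direct solve, for comparison
```

The stopping rule here checks the change between successive sweeps rather than the true error, which is one reason an iterative solver's final error may exceed the requested tolerance.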

Collaboration


Dive into Yongtao Guan's collaborations.

Top Co-Authors

Quan Zhou

Baylor College of Medicine

Hanli Xu

Southeast University

Alex P. Reiner

University of Washington

Daniel I. Chasman

Brigham and Women's Hospital

Graham Coop

University of California
