Chuong B. Do
Stanford University
Publication
Featured research published by Chuong B. Do.
Nature Genetics | 2014
Michael A. Nalls; Nathan Pankratz; Christina M. Lill; Chuong B. Do; Dena Hernandez; Mohamad Saad; Anita L. DeStefano; Eleanna Kara; Jose Bras; Manu Sharma; Claudia Schulte; Margaux F. Keller; Sampath Arepalli; Christopher Letson; Connor Edsall; Hreinn Stefansson; Xinmin Liu; Hannah Pliner; Joseph H. Lee; Rong Cheng; M. Arfan Ikram; John P. A. Ioannidis; Georgios M. Hadjigeorgiou; Joshua C. Bis; Maria Martinez; Joel S. Perlmutter; Alison Goate; Karen Marder; Brian K. Fiske; Margaret Sutherland
We conducted a meta-analysis of Parkinson's disease genome-wide association studies using a common set of 7,893,274 variants across 13,708 cases and 95,282 controls. Twenty-six loci were identified as having genome-wide significant association; these and 6 additional previously reported loci were then tested in an independent set of 5,353 cases and 5,551 controls. Of the 32 tested SNPs, 24 replicated, including 6 newly identified loci. Conditional analyses within loci showed that four loci, including GBA, GAK-DGKQ, SNCA and the HLA region, contain a secondary independent risk variant. In total, we identified and replicated 28 independent risk variants for Parkinson's disease across 24 loci. Although the effect of each individual locus was small, risk profile analysis showed substantial cumulative risk in a comparison of the highest and lowest quintiles of genetic risk (odds ratio (OR) = 3.31, 95% confidence interval (CI) = 2.55–4.30; P = 2 × 10^−16). We also show six risk loci associated with proximal gene expression or DNA methylation.
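As a reading aid for the odds ratio and confidence interval quoted in this abstract: a 95% interval for an odds ratio is conventionally symmetric on the log scale, so the geometric mean of its bounds should land on the point estimate, and its log-width implies a standard error. The sketch below is an illustration only, assuming a normal approximation with z = 1.96. The function name `log_or_se_from_ci` is hypothetical, and the abstract does not state how the interval was actually computed.

```python
import math

def log_or_se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) implied by a CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Values quoted in the abstract (highest vs. lowest quintile of genetic risk).
odds_ratio, ci_low, ci_high = 3.31, 2.55, 4.30

# On the log scale the point estimate sits midway between the bounds,
# so the geometric mean of the bounds should reproduce the reported OR.
implied_or = math.sqrt(ci_low * ci_high)
se = log_or_se_from_ci(ci_low, ci_high)

print(f"implied OR = {implied_or:.2f}, SE(log OR) = {se:.3f}")
```

The agreement between the implied and reported odds ratio is a quick consistency check a reader can apply to any published OR with a confidence interval.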
PLOS Genetics | 2012
Christina M. Lill; Johannes T. Roehr; Matthew B. McQueen; Fotini K. Kavvoura; Sachin Bagade; Brit-Maren M. Schjeide; Leif Schjeide; Esther Meissner; Ute Zauft; Nicole C. Allen; Tian-Jing Liu; Marcel Schilling; Kari J. Anderson; Gary W. Beecham; Daniela Berg; Joanna M. Biernacka; Alexis Brice; Anita L. DeStefano; Chuong B. Do; Nicholas Eriksson; Stewart A. Factor; Matthew J. Farrer; Tatiana Foroud; Thomas Gasser; Taye H. Hamza; John Hardy; Peter Heutink; Erin M. Hill-Burns; Christine Klein; Jeanne C. Latourelle
More than 800 published genetic association studies have implicated dozens of potential risk loci in Parkinson's disease (PD). To facilitate the interpretation of these findings, we have created a dedicated online resource, PDGene, that comprehensively collects and meta-analyzes all published studies in the field. A systematic literature screen of ∼27,000 articles yielded 828 eligible articles from which relevant data were extracted. In addition, individual-level data from three publicly available genome-wide association studies (GWAS) were obtained and subjected to genotype imputation and analysis. Overall, we performed meta-analyses on more than seven million polymorphisms originating from GWAS datasets and/or from smaller scale PD association studies. Meta-analyses on 147 SNPs were supplemented by unpublished GWAS data from up to 16,452 PD cases and 48,810 controls. Eleven loci showed genome-wide significant (P < 5×10^−8) association with disease risk: BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25. In addition, we identified novel evidence for genome-wide significant association with a polymorphism in ITGA8 (rs7077361, OR 0.88, P = 1.3×10^−8). All meta-analysis results are freely available on a dedicated online database (www.pdgene.org), which is cross-linked with a customized track on the UCSC Genome Browser. Our study provides an exhaustive and up-to-date summary of the status of PD genetics research that can be readily scaled to include the results of future large-scale genetics projects, including next-generation sequencing studies.
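To make the idea of meta-analyzing per-study association results concrete, here is a minimal sketch of one common approach, fixed-effect inverse-variance pooling of log odds ratios. This is an assumption chosen for illustration: the abstract does not say which pooling model PDGene uses, and the function name `fixed_effect_meta` and the input numbers are hypothetical.

```python
import math

def fixed_effect_meta(log_ors, std_errs):
    """Pool per-study log odds ratios with inverse-variance weights.

    Returns (pooled log OR, its standard error, z statistic, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled, pooled_se, z, p_value

# Hypothetical per-study estimates for a single SNP (not taken from PDGene).
study_log_ors = [math.log(0.85), math.log(0.90), math.log(0.88)]
study_ses = [0.05, 0.04, 0.03]

pooled, pooled_se, z, p_value = fixed_effect_meta(study_log_ors, study_ses)
print(f"pooled OR = {math.exp(pooled):.3f}, z = {z:.2f}, p = {p_value:.2e}")
```

Studies with smaller standard errors receive larger weights, which is why pooling many modest studies can push a SNP past a genome-wide significance threshold that no single study reaches.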
PLOS Genetics | 2011
Chuong B. Do; Joyce Y. Tung; Elizabeth Dorfman; Amy K. Kiefer; Emily M. Drabant; Uta Francke; Joanna L. Mountain; Samuel M. Goldman; Caroline M. Tanner; J. William Langston; Anne Wojcicki; Nicholas Eriksson
Although the causes of Parkinson's disease (PD) are thought to be primarily environmental, recent studies suggest that a number of genes influence susceptibility. Using targeted case recruitment and online survey instruments, we conducted the largest case-control genome-wide association study (GWAS) of PD based on a single collection of individuals to date (3,426 cases and 29,624 controls). We discovered two novel, genome-wide significant associations with PD, rs6812193 near SCARB2 and rs11868035 near SREBF1/RAI1, both of which replicated in an independent cohort. We also replicated 20 previously discovered genetic associations (including LRRK2, GBA, SNCA, MAPT, GAK, and the HLA region), providing support for our novel study design. Relying on a recently proposed method based on genome-wide sharing estimates between distantly related individuals, we estimated the heritability of PD to be at least 0.27. Finally, using sparse regression techniques, we constructed predictive models that account for 6%–7% of the total variance in liability and that suggest the presence of true associations just beyond genome-wide significance, as confirmed through both internal and external cross-validation. These results indicate a substantial, but by no means total, contribution of genetics underlying susceptibility to both early-onset and late-onset PD, suggesting that, despite the novel associations discovered here and elsewhere, the majority of the genetic component for Parkinson's disease remains to be discovered.
Intelligent Systems in Molecular Biology | 2006
Chuong B. Do; Daniel A. Woods; Serafim Batzoglou
MOTIVATION: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, SCFGs use fully-automated statistical learning algorithms to derive model parameters. Despite this advantage, however, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, as the accuracies of the best current SCFGs have yet to match those of the best physics-based models. RESULTS: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction. AVAILABILITY: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.
Nature Biotechnology | 2008
Chuong B. Do; Serafim Batzoglou
The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?
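A small worked example may help make "how does it work?" concrete. The sketch below runs expectation maximization on a toy mixture of two biased coins, where each set of tosses comes from one of the coins but the identity of the coin is hidden. The data, starting values, function names, and the assumption that both coins are equally likely to be chosen are all illustrative choices, not a reproduction of the article's code.

```python
import math

def em_two_coins(heads, n_tosses, theta_a, theta_b, n_iter=10):
    """Estimate the head probabilities of two coins when the coin used for
    each set of tosses is unobserved (equal prior on either coin)."""
    for _ in range(n_iter):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = n_tosses - h
            # E-step: posterior probability that this set came from coin A.
            # The binomial coefficient is shared by both terms and cancels.
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            resp_a = like_a / (like_a + like_b)
            resp_b = 1.0 - resp_a
            # Accumulate expected head and tail counts for each coin.
            heads_a += resp_a * h
            tails_a += resp_a * t
            heads_b += resp_b * h
            tails_b += resp_b * t
        # M-step: re-estimate each bias from its expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b

def log_likelihood(heads, n_tosses, theta_a, theta_b):
    """Observed-data log-likelihood, up to an additive constant."""
    total = 0.0
    for h in heads:
        t = n_tosses - h
        total += math.log(0.5 * theta_a ** h * (1 - theta_a) ** t
                          + 0.5 * theta_b ** h * (1 - theta_b) ** t)
    return total

# Toy data: number of heads in five sets of ten tosses.
heads = [5, 9, 8, 4, 7]
theta_a, theta_b = em_two_coins(heads, 10, theta_a=0.6, theta_b=0.5)
print(f"theta_A = {theta_a:.2f}, theta_B = {theta_b:.2f}")
```

Each iteration cannot decrease the observed-data likelihood, which is the property that makes the alternation between soft assignment (E-step) and re-estimation (M-step) a sound optimization procedure. The result still depends on the starting values, so in practice several initializations are usually tried.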
Nature Genetics | 2013
David A. Hinds; George McMahon; Amy K. Kiefer; Chuong B. Do; Nicholas Eriksson; David Evans; Beate St Pourcain; Susan M. Ring; Joanna L. Mountain; Uta Francke; George Davey-Smith; Nicholas J. Timpson; Joyce Y. Tung
Allergic disease is very common and carries substantial public-health burdens. We conducted a meta-analysis of genome-wide associations with self-reported cat, dust-mite and pollen allergies in 53,862 individuals. We used generalized estimating equations to model shared and allergy-specific genetic effects. We identified 16 shared susceptibility loci with association P < 5 × 10^−8, including 8 loci previously associated with asthma, as well as 4p14 near TLR1, TLR6 and TLR10 (rs2101521, P = 5.3 × 10^−21); 6p21.33 near HLA-C and MICA (rs9266772, P = 3.2 × 10^−12); 5p13.1 near PTGER4 (rs7720838, P = 8.2 × 10^−11); 2q33.1 in PLCL1 (rs10497813, P = 6.1 × 10^−10); 3q28 in LPP (rs9860547, P = 1.2 × 10^−9); 20q13.2 in NFATC2 (rs6021270, P = 6.9 × 10^−9); 4q27 in ADAD1 (rs17388568, P = 3.9 × 10^−8); and 14q21.1 near FOXA1 and TTC6 (rs1998359, P = 4.8 × 10^−8). We identified one locus with substantial evidence of differences in effects across allergies at 6p21.32 in the class II human leukocyte antigen (HLA) region (rs17533090, P = 1.7 × 10^−12), which was strongly associated with cat allergy. Our study sheds new light on the shared etiology of immune and autoimmune disease.
PLOS Genetics | 2013
Amy K. Kiefer; Joyce Y. Tung; Chuong B. Do; David A. Hinds; Joanna L. Mountain; Uta Francke; Nicholas Eriksson
Myopia, or nearsightedness, is the most common eye disorder, resulting primarily from excess elongation of the eye. The etiology of myopia, although known to be complex, is poorly understood. Here we report the largest ever genome-wide association study (45,771 participants) on myopia in Europeans. We performed a survival analysis on age of myopia onset and identified 22 significant associations, two of which are replications of earlier associations with refractive error. Ten of the 20 novel associations identified replicate in a separate cohort of 8,323 participants who reported whether they had developed myopia before age 10. These 22 associations in total explain 2.9% of the variance in myopia age of onset and point toward a number of different mechanisms behind the development of myopia. One association is in the gene PRSS56, which has previously been linked to abnormally small eyes; one is in a gene that forms part of the extracellular matrix (LAMA2); two are in or near genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are near genes known to be involved in the growth and guidance of retinal ganglion cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal signaling or development. These novel findings point toward multiple genetic factors involved in the development of myopia and suggest that complex interactions between extracellular matrix remodeling, neuronal development, and visual signals from the retina may underlie the development of myopia in humans.
Research in Computational Molecular Biology | 2008
Jason Flannick; Antal F. Novak; Chuong B. Do; Balaji S. Srinivasan; Serafim Batzoglou
We developed Graemlin 2.0, a new multiple network aligner with (1) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions, protein duplications, protein mutations, and interaction losses; (2) a parameter learning algorithm that uses a training set of known network alignments to learn parameters for our scoring function and thereby adapt it to any set of networks; and (3) an algorithm that uses our scoring function to find approximate multiple network alignments in linear time. We tested Graemlin 2.0's accuracy on protein interaction networks from IntAct, DIP, and the Stanford Network Database. We show that, on each of these datasets, Graemlin 2.0 has higher sensitivity and specificity than existing network aligners. Graemlin 2.0 is available under the GNU public license at http://graemlin.stanford.edu.
PLOS ONE | 2012
Nicholas Eriksson; Joyce Y. Tung; Amy K. Kiefer; David A. Hinds; Uta Francke; Joanna L. Mountain; Chuong B. Do
Hypothyroidism is the most common thyroid disorder, affecting about 5% of the general population. Here we present the largest genome-wide association study of hypothyroidism to date, in 3,736 cases and 35,546 controls. Hypothyroidism was assessed via web-based questionnaires. We identify five genome-wide significant associations, three of which are well known to be involved in a large spectrum of autoimmune diseases: rs6679677 near PTPN22, rs3184504 in SH2B3, and rs2517532 in the HLA class I region. We also report associations with rs4915077 near VAV3 and rs925489 near FOXE1. VAV3 is involved in immune function, and FOXE1 and PTPN22 have previously been associated with hypothyroidism. Although the HLA class I region and SH2B3 have previously been linked with a number of autoimmune diseases, this is the first report of their association with thyroid disease. The VAV3 association is also novel. We also show suggestive evidence of association for hypothyroidism with a SNP in the HLA class II region (independent of the other HLA association) as well as SNPs in CAPZB, PDE8B, and CTLA4. CAPZB and PDE8B have been linked to TSH levels and CTLA4 to a variety of autoimmune diseases. These results suggest heterogeneity in the genetic etiology of hypothyroidism, implicating genes involved in both autoimmune disorders and thyroid function. Using a genetic risk profile score based on the top association from each of the five genome-wide significant regions in our study, the relative risk between the highest and lowest deciles of genetic risk is 2.0.
Genome Biology | 2007
Samuel S. Gross; Chuong B. Do; Marina Sirota; Serafim Batzoglou
We describe CONTRAST, a gene predictor which directly incorporates information from multiple alignments rather than employing phylogenetic models. This is accomplished through the use of discriminative machine learning techniques, including a novel training algorithm. We use a two-stage approach, in which a set of binary classifiers designed to recognize coding region boundaries is combined with a global model of gene structure. CONTRAST predicts exact coding region structures for 65% more human genes than the previous state-of-the-art method, misses 46% fewer exons and displays comparable gains in specificity.