Chuong B. Do | Researchain

Publication

Featured research published by Chuong B. Do.

Nature Genetics | 2014

Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease

Michael A. Nalls; Nathan Pankratz; Christina M. Lill; Chuong B. Do; Dena Hernandez; Mohamad Saad; Anita L. DeStefano; Eleanna Kara; Jose Bras; Manu Sharma; Claudia Schulte; Margaux F. Keller; Sampath Arepalli; Christopher Letson; Connor Edsall; Hreinn Stefansson; Xinmin Liu; Hannah Pliner; Joseph H. Lee; Rong Cheng; M. Arfan Ikram; John P. A. Ioannidis; Georgios M. Hadjigeorgiou; Joshua C. Bis; Maria Martinez; Joel S. Perlmutter; Alison Goate; Karen Marder; Brian K. Fiske; Margaret Sutherland

We conducted a meta-analysis of Parkinson's disease genome-wide association studies using a common set of 7,893,274 variants across 13,708 cases and 95,282 controls. Twenty-six loci were identified as having genome-wide significant association; these and 6 additional previously reported loci were then tested in an independent set of 5,353 cases and 5,551 controls. Of the 32 tested SNPs, 24 replicated, including 6 newly identified loci. Conditional analyses within loci showed that four loci, including GBA, GAK-DGKQ, SNCA and the HLA region, contain a secondary independent risk variant. In total, we identified and replicated 28 independent risk variants for Parkinson's disease across 24 loci. Although the effect of each individual locus was small, risk profile analysis showed substantial cumulative risk in a comparison of the highest and lowest quintiles of genetic risk (odds ratio (OR) = 3.31, 95% confidence interval (CI) = 2.55–4.30; P = 2 × 10⁻¹⁶). We also show six risk loci associated with proximal gene expression or DNA methylation.
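As a reading aid for the odds ratio reported above, the sketch below checks that the point estimate and the 95% confidence interval are consistent with each other. It assumes the interval is a Wald-type interval that is symmetric on the log-odds scale, which the abstract does not state; the function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the implied point estimate and log-scale standard error
    from a confidence interval for an odds ratio.

    Assumption (not stated in the abstract): the interval is
    exp(log(OR) +/- z * SE), i.e. symmetric on the log scale.
    """
    log_lower = math.log(lower)
    log_upper = math.log(upper)
    # The midpoint of the log-interval is the log of the point estimate.
    center = math.exp((log_lower + log_upper) / 2.0)
    # Half the width of the log-interval equals z * SE.
    se = (log_upper - log_lower) / (2.0 * z)
    return center, se


# Interval bounds taken from the abstract: 95% CI = 2.55 to 4.30.
center, se = log_scale_summary(2.55, 4.30)
print(f"implied OR = {center:.2f}, log-scale SE = {se:.3f}")
```

Under that assumption the implied point estimate agrees with the reported OR of 3.31 to two decimal places.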

PLOS Genetics | 2012

Comprehensive research synopsis and systematic meta-analyses in Parkinson's disease genetics: The PDGene database

Christina M. Lill; Johannes T. Roehr; Matthew B. McQueen; Fotini K. Kavvoura; Sachin Bagade; Brit-Maren M. Schjeide; Leif Schjeide; Esther Meissner; Ute Zauft; Nicole C. Allen; Tian-Jing Liu; Marcel Schilling; Kari J. Anderson; Gary W. Beecham; Daniela Berg; Joanna M. Biernacka; Alexis Brice; Anita L. DeStefano; Chuong B. Do; Nicholas Eriksson; Stewart A. Factor; Matthew J. Farrer; Tatiana Foroud; Thomas Gasser; Taye H. Hamza; John Hardy; Peter Heutink; Erin M. Hill-Burns; Christine Klein; Jeanne C. Latourelle

More than 800 published genetic association studies have implicated dozens of potential risk loci in Parkinson's disease (PD). To facilitate the interpretation of these findings, we have created a dedicated online resource, PDGene, that comprehensively collects and meta-analyzes all published studies in the field. A systematic literature screen of ∼27,000 articles yielded 828 eligible articles from which relevant data were extracted. In addition, individual-level data from three publicly available genome-wide association studies (GWAS) were obtained and subjected to genotype imputation and analysis. Overall, we performed meta-analyses on more than seven million polymorphisms originating from GWAS datasets and/or from smaller-scale PD association studies. Meta-analyses on 147 SNPs were supplemented by unpublished GWAS data from up to 16,452 PD cases and 48,810 controls. Eleven loci showed genome-wide significant (P < 5 × 10⁻⁸) association with disease risk: BST1, CCDC62/HIP1R, DGKQ/GAK, GBA, LRRK2, MAPT, MCCC1/LAMP3, PARK16, SNCA, STK39, and SYT11/RAB25. In addition, we identified novel evidence for genome-wide significant association with a polymorphism in ITGA8 (rs7077361, OR 0.88, P = 1.3 × 10⁻⁸). All meta-analysis results are freely available on a dedicated online database (www.pdgene.org), which is cross-linked with a customized track on the UCSC Genome Browser. Our study provides an exhaustive and up-to-date summary of the status of PD genetics research that can be readily scaled to include the results of future large-scale genetics projects, including next-generation sequencing studies.

PLOS Genetics | 2011

Web-based genome-wide association study identifies two novel loci and a substantial genetic component for Parkinson's disease.

Chuong B. Do; Joyce Y. Tung; Elizabeth Dorfman; Amy K. Kiefer; Emily M. Drabant; Uta Francke; Joanna L. Mountain; Samuel M. Goldman; Caroline M. Tanner; J. William Langston; Anne Wojcicki; Nicholas Eriksson

Although the causes of Parkinson's disease (PD) are thought to be primarily environmental, recent studies suggest that a number of genes influence susceptibility. Using targeted case recruitment and online survey instruments, we conducted the largest case-control genome-wide association study (GWAS) of PD based on a single collection of individuals to date (3,426 cases and 29,624 controls). We discovered two novel, genome-wide significant associations with PD, rs6812193 near SCARB2 and rs11868035 near SREBF1/RAI1, both replicated in an independent cohort. We also replicated 20 previously discovered genetic associations (including LRRK2, GBA, SNCA, MAPT, GAK, and the HLA region), providing support for our novel study design. Relying on a recently proposed method based on genome-wide sharing estimates between distantly related individuals, we estimated the heritability of PD to be at least 0.27. Finally, using sparse regression techniques, we constructed predictive models that account for 6%–7% of the total variance in liability and that suggest the presence of true associations just beyond genome-wide significance, as confirmed through both internal and external cross-validation. These results indicate a substantial, but by no means total, contribution of genetics underlying susceptibility to both early-onset and late-onset PD, suggesting that, despite the novel associations discovered here and elsewhere, the majority of the genetic component for Parkinson's disease remains to be discovered.

Intelligent Systems in Molecular Biology | 2006

CONTRAfold: RNA secondary structure prediction without physics-based models

Chuong B. Do; Daniel A. Woods; Serafim Batzoglou

Motivation: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, SCFGs use fully automated statistical learning algorithms to derive model parameters. Despite this advantage, however, probabilistic methods have not replaced free energy minimization methods as the tool of choice for secondary structure prediction, as the accuracies of the best current SCFGs have yet to match those of the best physics-based models. Results: In this paper, we present CONTRAfold, a novel secondary structure prediction method based on conditional log-linear models (CLLMs), a flexible class of probabilistic models which generalize upon SCFGs by using discriminative training and feature-rich scoring. In a series of cross-validation experiments, we show that grammar-based secondary structure prediction methods formulated as CLLMs consistently outperform their SCFG analogs. Furthermore, CONTRAfold, a CLLM incorporating most of the features found in typical thermodynamic models, achieves the highest single sequence prediction accuracies to date, outperforming currently available probabilistic and physics-based techniques. Our result thus closes the gap between probabilistic and thermodynamic models, demonstrating that statistical learning procedures provide an effective alternative to empirical measurement of thermodynamic parameters for RNA secondary structure prediction. Availability: Source code for CONTRAfold is available at http://contra.stanford.edu/contrafold/.

Nature Biotechnology | 2008

What is the expectation maximization algorithm?

Chuong B. Do; Serafim Batzoglou

The expectation maximization algorithm arises in many computational biology applications that involve probabilistic models. What is it good for, and how does it work?
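To make the "how does it work?" question concrete, here is a minimal sketch of expectation maximization on a two-coin mixture: each set of tosses comes from one of two coins with unknown biases, and the identity of the coin is hidden. The data, starting values, equal prior over the two coins, and the function name `em_two_coins` are all illustrative assumptions, not details taken from this listing.

```python
def em_two_coins(heads, tosses, theta_a, theta_b, iterations):
    """Estimate the heads probabilities of two coins by EM.

    heads   -- number of heads observed in each set of tosses
    tosses  -- number of tosses per set (same for every set)
    theta_a, theta_b -- initial guesses for the two biases
    Assumes each set was produced by coin A or coin B with equal
    prior probability (an assumption of this sketch).
    """
    for _ in range(iterations):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = tosses - h
            # E-step: posterior probability that this set came from coin A.
            # The binomial coefficient is identical for both coins and cancels.
            like_a = theta_a ** h * (1.0 - theta_a) ** t
            like_b = theta_b ** h * (1.0 - theta_b) ** t
            w_a = like_a / (like_a + like_b)
            w_b = 1.0 - w_a
            # Accumulate expected head and tail counts for each coin.
            heads_a += w_a * h
            tails_a += w_a * t
            heads_b += w_b * h
            tails_b += w_b * t
        # M-step: re-estimate each bias from its expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


# Illustrative data: five sets of ten tosses each.
observed_heads = [5, 9, 8, 4, 7]
estimate_a, estimate_b = em_two_coins(observed_heads, 10, 0.6, 0.5, 10)
print(f"theta_A = {estimate_a:.2f}, theta_B = {estimate_b:.2f}")
```

The E-step replaces the unknown coin labels with soft assignments, and the M-step is then an ordinary weighted maximum-likelihood update. EM converges to a local optimum, so different starting values can give different answers.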

Nature Genetics | 2013

A genome-wide association meta-analysis of self-reported allergy identifies shared and allergy-specific susceptibility loci

David A. Hinds; George McMahon; Amy K. Kiefer; Chuong B. Do; Nicholas Eriksson; David Evans; Beate St Pourcain; Susan M. Ring; Joanna L. Mountain; Uta Francke; George Davey-Smith; Nicholas J. Timpson; Joyce Y. Tung

Allergic disease is very common and carries substantial public-health burdens. We conducted a meta-analysis of genome-wide associations with self-reported cat, dust-mite and pollen allergies in 53,862 individuals. We used generalized estimating equations to model shared and allergy-specific genetic effects. We identified 16 shared susceptibility loci with association P < 5 × 10⁻⁸, including 8 loci previously associated with asthma, as well as 4p14 near TLR1, TLR6 and TLR10 (rs2101521, P = 5.3 × 10⁻²¹); 6p21.33 near HLA-C and MICA (rs9266772, P = 3.2 × 10⁻¹²); 5p13.1 near PTGER4 (rs7720838, P = 8.2 × 10⁻¹¹); 2q33.1 in PLCL1 (rs10497813, P = 6.1 × 10⁻¹⁰); 3q28 in LPP (rs9860547, P = 1.2 × 10⁻⁹); 20q13.2 in NFATC2 (rs6021270, P = 6.9 × 10⁻⁹); 4q27 in ADAD1 (rs17388568, P = 3.9 × 10⁻⁸); and 14q21.1 near FOXA1 and TTC6 (rs1998359, P = 4.8 × 10⁻⁸). We identified one locus with substantial evidence of differences in effects across allergies at 6p21.32 in the class II human leukocyte antigen (HLA) region (rs17533090, P = 1.7 × 10⁻¹²), which was strongly associated with cat allergy. Our study sheds new light on the shared etiology of immune and autoimmune disease.

PLOS Genetics | 2013

Genome-wide analysis points to roles for extracellular matrix remodeling, the visual cycle, and neuronal development in myopia.

Amy K. Kiefer; Joyce Y. Tung; Chuong B. Do; David A. Hinds; Joanna L. Mountain; Uta Francke; Nicholas Eriksson

Myopia, or nearsightedness, is the most common eye disorder, resulting primarily from excess elongation of the eye. The etiology of myopia, although known to be complex, is poorly understood. Here we report the largest ever genome-wide association study (45,771 participants) on myopia in Europeans. We performed a survival analysis on age of myopia onset and identified 22 significant associations, two of which are replications of earlier associations with refractive error. Ten of the 20 novel associations identified replicate in a separate cohort of 8,323 participants who reported if they had developed myopia before age 10. These 22 associations in total explain 2.9% of the variance in myopia age of onset and point toward a number of different mechanisms behind the development of myopia. One association is in the gene PRSS56, which has previously been linked to abnormally small eyes; one is in a gene that forms part of the extracellular matrix (LAMA2); two are in or near genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are near genes known to be involved in the growth and guidance of retinal ganglion cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal signaling or development. These novel findings point toward multiple genetic factors involved in the development of myopia and suggest that complex interactions between extracellular matrix remodeling, neuronal development, and visual signals from the retina may underlie the development of myopia in humans.

Research in Computational Molecular Biology | 2008

Automatic parameter learning for multiple network alignment

Jason Flannick; Antal F. Novak; Chuong B. Do; Balaji S. Srinivasan; Serafim Batzoglou

We developed Graemlin 2.0, a new multiple network aligner with (1) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions, protein duplications, protein mutations, and interaction losses; (2) a parameter learning algorithm that uses a training set of known network alignments to learn parameters for our scoring function and thereby adapt it to any set of networks; and (3) an algorithm that uses our scoring function to find approximate multiple network alignments in linear time. We tested Graemlin 2.0's accuracy on protein interaction networks from IntAct, DIP, and the Stanford Network Database. We show that, on each of these datasets, Graemlin 2.0 has higher sensitivity and specificity than existing network aligners. Graemlin 2.0 is available under the GNU public license at http://graemlin.stanford.edu.

PLOS ONE | 2012

Novel Associations for Hypothyroidism Include Known Autoimmune Risk Loci

Nicholas Eriksson; Joyce Y. Tung; Amy K. Kiefer; David A. Hinds; Uta Francke; Joanna L. Mountain; Chuong B. Do

Hypothyroidism is the most common thyroid disorder, affecting about 5% of the general population. Here we present the current largest genome-wide association study of hypothyroidism, in 3,736 cases and 35,546 controls. Hypothyroidism was assessed via web-based questionnaires. We identify five genome-wide significant associations, three of which are well known to be involved in a large spectrum of autoimmune diseases: rs6679677 near PTPN22, rs3184504 in SH2B3, and rs2517532 in the HLA class I region. We also report associations with rs4915077 near VAV3 and rs925489 near FOXE1. VAV3 is involved in immune function, and FOXE1 and PTPN22 have previously been associated with hypothyroidism. Although the HLA class I region and SH2B3 have previously been linked with a number of autoimmune diseases, this is the first report of their association with thyroid disease. The VAV3 association is also novel. We also show suggestive evidence of association for hypothyroidism with a SNP in the HLA class II region (independent of the other HLA association) as well as SNPs in CAPZB, PDE8B, and CTLA4. CAPZB and PDE8B have been linked to TSH levels and CTLA4 to a variety of autoimmune diseases. These results suggest heterogeneity in the genetic etiology of hypothyroidism, implicating genes involved in both autoimmune disorders and thyroid function. Using a genetic risk profile score based on the top association from each of the five genome-wide significant regions in our study, the relative risk between the highest and lowest deciles of genetic risk is 2.0.

Genome Biology | 2007

CONTRAST: a discriminative, phylogeny-free approach to multiple informant de novo gene prediction

Samuel S. Gross; Chuong B. Do; Marina Sirota; Serafim Batzoglou

We describe CONTRAST, a gene predictor which directly incorporates information from multiple alignments rather than employing phylogenetic models. This is accomplished through the use of discriminative machine learning techniques, including a novel training algorithm. We use a two-stage approach, in which a set of binary classifiers designed to recognize coding region boundaries is combined with a global model of gene structure. CONTRAST predicts exact coding region structures for 65% more human genes than the previous state-of-the-art method, misses 46% fewer exons and displays comparable gains in specificity.
