Publication


Featured research published by Yiyang Wu.


Genome Medicine | 2013

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Jason O'Rawe; Tao Jiang; Guangqing Sun; Yiyang Wu; Wei Min Wang; Jingchu Hu; Paul Bodily; Lifeng Tian; Hakon Hakonarson; W. Evan Johnson; Zhi Wei; Kai Wang; Gholson J. Lyon

Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.

Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.

Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.

Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.


Nature Methods | 2014

Accurate de novo and transmitted indel detection in exome-capture data using microassembly

Giuseppe Narzisi; Jason O'Rawe; Ivan Iossifov; Han Fang; Yoon-ha Lee; Zihua Wang; Yiyang Wu; Gholson J. Lyon; Michael Wigler; Michael C. Schatz

We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.


Genome Medicine | 2014

Reducing INDEL calling errors in whole genome and exome sequencing data

Han Fang; Yiyang Wu; Giuseppe Narzisi; Jason O'Rawe; Laura Jimenez Barron; Julie Rosenbaum; Michael Ronemus; Ivan Iossifov; Michael C. Schatz; Gholson J. Lyon

Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts.

Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%).

Results: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data.

Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.


Human Molecular Genetics | 2015

Biochemical and cellular analysis of Ogden syndrome reveals downstream Nt-acetylation defects

Line M. Myklebust; Petra Van Damme; Svein Isungset Støve; Max J. Dörfel; Angèle Abboud; Thomas Vikestad Kalvik; Cédric Grauffel; Veronique Jonckheere; Yiyang Wu; Jeffrey Swensen; Hanna Kaasa; Glen Liszczak; Ronen Marmorstein; Nathalie Reuter; Gholson J. Lyon; Kris Gevaert; Thomas Arnesen

The X-linked lethal Ogden syndrome was the first reported human genetic disorder associated with a mutation in an N-terminal acetyltransferase (NAT) gene. The affected males harbor an Ser37Pro (S37P) mutation in the gene encoding Naa10, the catalytic subunit of NatA, the major human NAT involved in the co-translational acetylation of proteins. Structural models and molecular dynamics simulations of the human NatA and its S37P mutant highlight differences in regions involved in catalysis and at the interface between Naa10 and the auxiliary subunit hNaa15. Biochemical data further demonstrate a reduced catalytic capacity and an impaired interaction between hNaa10 S37P and Naa15 as well as Naa50 (NatE), another interactor of the NatA complex. N-Terminal acetylome analyses revealed a decreased acetylation of a subset of NatA and NatE substrates in Ogden syndrome cells, supporting the genetic findings and our hypothesis regarding reduced Nt-acetylation of a subset of NatA/NatE-type substrates as one etiology for Ogden syndrome. Furthermore, Ogden syndrome fibroblasts display abnormal cell migration and proliferation capacity, possibly linked to a perturbed retinoblastoma pathway. N-Terminal acetylation clearly plays a role in Ogden syndrome, thus revealing the in vivo importance of N-terminal acetylation in human physiology and disease.


American Journal of Human Genetics | 2015

TAF1 Variants Are Associated with Dysmorphic Features, Intellectual Disability, and Neurological Manifestations.

Jason A. O’Rawe; Yiyang Wu; Max J. Dörfel; Alan F. Rope; P.Y. Billie Au; Jillian S. Parboosingh; Sungjin Moon; Maria Kousi; Konstantina Kosma; Christopher Smith; Maria Tzetis; Jane L. Schuette; Robert B. Hufnagel; Carlos E. Prada; Francisco Venegas Martínez; Carmen Orellana; Jonathan Crain; Alfonso Caro-Llopis; Silvestre Oltra; Sandra Monfort; Laura T. Jiménez-Barrón; Jeffrey Swensen; Sara Ellingwood; Rosemarie Smith; Han Fang; Sandra Ospina; Sander Stegmann; Nicolette S. den Hollander; David Mittelman; Gareth Highnam

We describe an X-linked genetic syndrome associated with mutations in TAF1 and manifesting with global developmental delay, intellectual disability (ID), characteristic facial dysmorphology, generalized hypotonia, and variable neurologic features, all in male individuals. Simultaneous studies using diverse strategies led to the identification of nine families with overlapping clinical presentations and affected by de novo or maternally inherited single-nucleotide changes. Two additional families harboring large duplications involving TAF1 were also found to share phenotypic overlap with the probands harboring single-nucleotide changes, but they also demonstrated a severe neurodegeneration phenotype. Functional analysis with RNA-seq for one of the families suggested that the phenotype is associated with downregulation of a set of genes notably enriched with genes regulated by E-box proteins. In addition, knockdown and mutant studies of this gene in zebrafish have shown a quantifiable, albeit small, effect on a neuronal phenotype. Our results suggest that mutations in TAF1 play a critical role in the development of this X-linked ID syndrome.


Nature Protocols | 2016

Indel variant analysis of short-read sequencing data with Scalpel

Han Fang; Ewa A. Bergmann; Kanika Arora; Vladimir Vacic; Michael C. Zody; Ivan Iossifov; Jason O'Rawe; Yiyang Wu; Laura Jimenez Barron; Julie Rosenbaum; Michael Ronemus; Yoon-ha Lee; Zihua Wang; Esra Dikoglu; Vaidehi Jobanputra; Gholson J. Lyon; Michael Wigler; Michael C. Schatz; Giuseppe Narzisi

As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ∼5 h after read mapping.


bioRxiv | 2014

Reducing INDEL errors in whole-genome and exome sequencing

Han Fang; Giuseppe Narzisi; Jason O'Rawe; Yiyang Wu; Julie Rosenbaum; Michael Ronemus; Ivan Iossifov; Michael C. Schatz; Gholson J. Lyon

Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts.

Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low quality INDELs (7% vs. 51%).

Results: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (52%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (85% vs. 54%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data.

Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (e.g. capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.


bioRxiv | 2015

Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline

Laura T. Jiménez-Barrón; Jason O'Rawe; Yiyang Wu; Margaret Yoon; Han Fang; Ivan Iossifov; Gholson J. Lyon

Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations.


Experimental and Molecular Medicine | 2018

NAA10-related syndrome

Yiyang Wu; Gholson J. Lyon

NAA10-related syndrome is an X-linked condition with a broad spectrum of findings ranging from a severe phenotype in males with p.Ser37Pro in NAA10, originally described as Ogden syndrome, to the milder NAA10-related intellectual disability found with different variants in both males and females. Although developmental impairments/intellectual disability may be the presenting feature (and in some cases the only finding), many individuals have additional cardiovascular, growth, and dysmorphic findings that vary in type and severity. Therefore, this set of disorders has substantial phenotypic variability and, as such, should be referred to more broadly as NAA10-related syndrome. NAA10 encodes an enzyme NAA10 that is certainly involved in the amino-terminal acetylation of proteins, alongside other proposed functions for this same protein. The mechanistic basis for how variants in NAA10 lead to the various phenotypes in humans is an active area of investigation, some of which will be reviewed herein.

Developmental disorders: Finding mutations associated with a rare syndrome

A detailed overview of a rare X-linked hereditary disorder gives clinicians a resource for making an informed diagnosis based on genetic data and developmental abnormalities. Around 80% of all human proteins are modified on their amino terminus via tagging with an acetyl group, and the NAA10 enzyme plays a major role in this process. Mutations in the gene encoding NAA10 produce severe neurological and cardiovascular effects. Yiyang Wu and Gholson Lyon at the Cold Spring Harbor Laboratory, Woodbury, USA, have reviewed current research to facilitate accurate identification of ‘NAA10-related syndrome’. Since this gene resides on the X chromosome, mutations strongly affect males, although some female carriers also show symptoms. NAA10-related syndrome is exceedingly rare, with only 26 cases reported to date, and the researchers describe both known causative mutations and unrelated disorders that produce similar developmental defects.


Frontiers in Optics | 2016

Disease Modeling in Human Induced Pluripotent Stem Cell Derived Cardiomyocytes Using High-Throughput All-Optical Dynamic Cardiac Electrophysiology

Aleksandra Klimas; Yiyang Wu; Christina M. Ambrosi; Jinzhu Yu; John C. Williams; Harold Bien; Gholson J. Lyon; Emilia Entcheva

We present an all-optical high-throughput system for phenotyping and monitoring iPSC-CMs, with capabilities for performing personalized cardiotoxicity screening. We demonstrate the system’s utility for characterizing a new disease model in iPSC-CMs.

Collaboration


Dive into Yiyang Wu's collaborations.

Top Co-Authors

Gholson J. Lyon

Cold Spring Harbor Laboratory

Jason O'Rawe

Cold Spring Harbor Laboratory

Han Fang

Cold Spring Harbor Laboratory

David Mittelman

Virginia Bioinformatics Institute

Ivan Iossifov

Cold Spring Harbor Laboratory

Giuseppe Narzisi

Cold Spring Harbor Laboratory

Julie Rosenbaum

Cold Spring Harbor Laboratory
