Publication


Featured research published by Jason O'Rawe.


Genome Medicine | 2013

Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing

Jason O'Rawe; Tao Jiang; Guangqing Sun; Yiyang Wu; Wei Min Wang; Jingchu Hu; Paul Bodily; Lifeng Tian; Hakon Hakonarson; W. Evan Johnson; Zhi Wei; Kai Wang; Gholson J. Lyon

Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.
Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.
Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.
Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
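
To make the concordance figures in this abstract concrete, the following is a minimal Python sketch of how agreement between call sets could be measured. It assumes concordance is the share of all distinct calls (the union) that every pipeline reports, and that a variant is identified by a hypothetical `(chrom, pos, ref, alt)` tuple; the abstract does not spell out the exact definition used in the paper, so both are illustrative assumptions.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of all distinct calls that are reported by every pipeline."""
    sets = list(calls.values())
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fraction(calls: Dict[str, Set[Variant]], name: str) -> float:
    """Fraction of all distinct calls that only the named pipeline reports."""
    union = set().union(*calls.values())
    if not union:
        return 0.0
    others = set().union(*(s for n, s in calls.items() if n != name))
    return len(calls[name] - others) / len(union)


# Toy call sets for three made-up pipelines (not data from the paper).
v1, v2, v3, v4, v5 = (("chr1", p, "A", "G") for p in (100, 200, 300, 400, 500))
toy_calls = {
    "pipeline_a": {v1, v2, v3},
    "pipeline_b": {v1, v2, v4},
    "pipeline_c": {v1, v5},
}
print(concordance(toy_calls))
print(unique_fraction(toy_calls, "pipeline_a"))
```

Exact tuple matching like this suits SNVs; for indels, the abstract notes that coordinates were left-normalized and matched within 20 base pairs, which this sketch does not attempt.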


Nature Methods | 2014

Accurate de novo and transmitted indel detection in exome-capture data using microassembly

Giuseppe Narzisi; Jason O'Rawe; Ivan Iossifov; Han Fang; Yoon-ha Lee; Zihua Wang; Yiyang Wu; Gholson J. Lyon; Michael Wigler; Michael C. Schatz

We present an open-source algorithm, Scalpel (http://scalpel.sourceforge.net/), which combines mapping and assembly for sensitive and specific discovery of insertions and deletions (indels) in exome-capture data. A detailed repeat analysis coupled with a self-tuning k-mer strategy allows Scalpel to outperform other state-of-the-art approaches for indel discovery, particularly in regions containing near-perfect repeats. We analyzed 593 families from the Simons Simplex Collection and demonstrated Scalpel's power to detect long (≥30 bp) transmitted events and enrichment for de novo likely gene-disrupting indels in autistic children.
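
The idea of a self-tuning k-mer size can be illustrated with a small Python sketch. It is not taken from Scalpel's source code: it simply assumes that k is increased until no k-mer occurs twice in a sequence window, and it checks only exact repeats, whereas the abstract refers to near-perfect repeats as well. The function names and the default `k_min` are hypothetical.

```python
from typing import Optional


def has_repeated_kmer(seq: str, k: int) -> bool:
    """Return True if any k-mer occurs more than once in seq."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


def smallest_repeat_free_k(seq: str, k_min: int = 3,
                           k_max: Optional[int] = None) -> Optional[int]:
    """Smallest k in [k_min, k_max] for which seq has no repeated k-mer.

    Returns None if every k in the range still contains an exact repeat.
    """
    if k_max is None:
        k_max = len(seq)
    for k in range(k_min, k_max + 1):
        if not has_repeated_kmer(seq, k):
            return k
    return None


# "ACG" occurs twice at k=3, so k has to grow before the window is repeat-free.
print(smallest_repeat_free_k("ACGACGT", k_min=3))
```

A larger k resolves repeats but needs longer error-free overlaps between reads, which is the trade-off a tuning loop of this kind has to balance.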


Genome Medicine | 2014

Reducing INDEL calling errors in whole genome and exome sequencing data

Han Fang; Yiyang Wu; Giuseppe Narzisi; Jason O'Rawe; Laura Jimenez Barron; Julie Rosenbaum; Michael Ronemus; Ivan Iossifov; Michael C. Schatz; Gholson J. Lyon

Background: INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts.
Methods: We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low-quality INDELs (7% vs. 51%).
Results: Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (53%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (84% vs. 57%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data.
Conclusions: Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (for example, capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.
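
Because the abstract singles out homopolymer A/T INDELs as a major source of low-quality calls, a short Python sketch of how such calls might be flagged is given here. It is an illustration only and not the classification scheme from the paper: the function name, the 0-based `pos` convention, and the `min_run` threshold of 6 are all assumptions.

```python
def is_homopolymer_at_indel(ref: str, pos: int, indel_seq: str,
                            min_run: int = 6) -> bool:
    """Flag an indel made only of A (or only of T) inside a run of that base.

    ref       -- reference sequence around the call
    pos       -- 0-based index in ref where the inserted/deleted bases start
    indel_seq -- the inserted or deleted bases
    min_run   -- assumed minimum reference run length to count as a homopolymer
    """
    if not indel_seq or not 0 <= pos < len(ref):
        return False
    base = indel_seq[0]
    if base not in "AT" or set(indel_seq) != {base}:
        return False
    # Extend the run of `base` to the left and right of pos in the reference.
    left = pos
    while left > 0 and ref[left - 1] == base:
        left -= 1
    right = pos
    while right < len(ref) and ref[right] == base:
        right += 1
    return right - left >= min_run


toy_ref = "GCAAAAAAAGC"  # made-up reference: a run of seven A bases
print(is_homopolymer_at_indel(toy_ref, 4, "AA"))
```

A flag of this kind could feed into a quality ranking alongside coverage, though the weights and thresholds used in the study are not given in the abstract.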


Nature Protocols | 2016

Indel variant analysis of short-read sequencing data with Scalpel

Han Fang; Ewa A. Bergmann; Kanika Arora; Vladimir Vacic; Michael C. Zody; Ivan Iossifov; Jason O'Rawe; Yiyang Wu; Laura Jimenez Barron; Julie Rosenbaum; Michael Ronemus; Yoon-ha Lee; Zihua Wang; Esra Dikoglu; Vaidehi Jobanputra; Gholson J. Lyon; Michael Wigler; Michael C. Schatz; Giuseppe Narzisi

As the second most common type of variation in the human genome, insertions and deletions (indels) have been linked to many diseases, but the discovery of indels of more than a few bases in size from short-read sequencing data remains challenging. Scalpel (http://scalpel.sourceforge.net) is an open-source software for reliable indel detection based on the microassembly technique. It has been successfully used to discover mutations in novel candidate genes for autism, and it is extensively used in other large-scale studies of human diseases. This protocol gives an overview of the algorithm and describes how to use Scalpel to perform highly accurate indel calling from whole-genome and whole-exome sequencing data. We provide detailed instructions for an exemplary family-based de novo study, but we also characterize the other two supported modes of operation: single-sample and somatic analysis. Indel normalization, visualization and annotation of the mutations are also illustrated. Using a standard server, indel discovery and characterization in the exonic regions of the example sequencing data can be completed in ∼5 h after read mapping.


Journal of Medical Genetics | 2015

SeqHBase: a big data toolset for family based sequencing data analysis

Min He; Thomas N. Person; Scott J. Hebbring; Ethan Heinzen; Zhan Ye; Steven J. Schrodi; Elizabeth McPherson; Simon M. Lin; Peggy L. Peissig; Murray H. Brilliant; Jason O'Rawe; Reid J. Robison; Gholson J. Lyon; Kai Wang

Background: Whole-genome sequencing (WGS) and whole-exome sequencing (WES) technologies are increasingly used to identify disease-contributing mutations in human genomic studies. It can be a significant challenge to process such data, especially when a large family or cohort is sequenced. Our objective was to develop a big data toolset to efficiently manipulate genome-wide variants, functional annotations and coverage, together with conducting family based sequencing data analysis.
Methods: Hadoop is a framework for reliable, scalable, distributed processing of large data sets using MapReduce programming models. Based on Hadoop and HBase, we developed SeqHBase, a big data-based toolset for analysing family based sequencing data to detect de novo, inherited homozygous, or compound heterozygous mutations that may contribute to disease manifestations. SeqHBase takes as input BAM files (for coverage at every site), variant call format (VCF) files (for variant calls) and functional annotations (for variant prioritisation).
Results: We applied SeqHBase to a 5-member nuclear family and a 10-member 3-generation family with WGS data, as well as a 4-member nuclear family with WES data. Analysis times were almost linearly scalable with number of data nodes. With 20 data nodes, SeqHBase took about 5 secs to analyse WES familial data and approximately 1 min to analyse WGS familial data.
Conclusions: These results demonstrate SeqHBase's high efficiency and scalability, which is necessary as WGS and WES are rapidly becoming standard methods to study the genetics of familial disorders.
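
The three inheritance patterns named in this abstract (de novo, inherited homozygous, compound heterozygous) can be sketched for a single parent-child trio as follows. This is a plain-Python illustration, not SeqHBase's Hadoop/HBase implementation: the `TrioCall` record and function names are hypothetical, genotypes are assumed to be alternate-allele counts (0, 1 or 2), and coverage checks, sex chromosomes and larger pedigrees are ignored.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class TrioCall(NamedTuple):
    gene: str
    variant: str
    child: int   # alternate-allele count in the child (0, 1 or 2)
    father: int
    mother: int


def classify_single(call: TrioCall) -> Optional[str]:
    """Classify one variant as de novo or inherited homozygous, else None."""
    if call.child >= 1 and call.father == 0 and call.mother == 0:
        return "de_novo"
    if call.child == 2 and call.father == 1 and call.mother == 1:
        return "inherited_homozygous"
    return None


def compound_heterozygous(
    calls: List[TrioCall],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Genes where the child is heterozygous for at least one variant seen
    only in the father and at least one variant seen only in the mother."""
    paternal = defaultdict(list)
    maternal = defaultdict(list)
    for c in calls:
        if c.child != 1:
            continue
        if c.father >= 1 and c.mother == 0:
            paternal[c.gene].append(c.variant)
        elif c.mother >= 1 and c.father == 0:
            maternal[c.gene].append(c.variant)
    return {g: (paternal[g], maternal[g]) for g in paternal if g in maternal}


# Made-up genotypes for illustration only.
toy_trio = [
    TrioCall("GENE1", "v1", 1, 0, 0),
    TrioCall("GENE2", "v2", 2, 1, 1),
    TrioCall("GENE3", "v3", 1, 1, 0),
    TrioCall("GENE3", "v4", 1, 0, 1),
    TrioCall("GENE4", "v5", 1, 1, 0),
]
print([classify_single(c) for c in toy_trio])
print(compound_heterozygous(toy_trio))
```

In practice a de novo call would also need adequate coverage in both parents at the site, which is presumably why the toolset takes BAM files as input in addition to VCF files.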


PeerJ | 2013

Integrating precision medicine in the study and clinical treatment of a severely mentally ill person

Jason O'Rawe; Han Fang; Shawn Rynearson; Reid J. Robison; Edward S. Kiruluta; Gerald Higgins; Karen Eilbeck; Martin G. Reese; Gholson J. Lyon

Background. In recent years, there has been an explosion in the number of technical and medical diagnostic platforms being developed. This has greatly improved our ability to more accurately, and more comprehensively, explore and characterize human biological systems on the individual level. Large quantities of biomedical data are now being generated and archived in many separate research and clinical activities, but there exists a paucity of studies that integrate the areas of clinical neuropsychiatry, personal genomics and brain-machine interfaces. Methods. A single person with severe mental illness was implanted with the Medtronic Reclaim® Deep Brain Stimulation (DBS) Therapy device for Obsessive Compulsive Disorder (OCD), targeting his nucleus accumbens/anterior limb of the internal capsule. Programming of the device and psychiatric assessments occurred in an outpatient setting for over two years. His genome was sequenced and variants were detected in the Illumina Whole Genome Sequencing Clinical Laboratory Improvement Amendments (CLIA)-certified laboratory. Results. We report here the detailed phenotypic characterization, clinical-grade whole genome sequencing (WGS), and two-year outcome of a man with severe OCD treated with DBS. Since implantation, this man has reported steady improvement, highlighted by a steady decline in his Yale-Brown Obsessive Compulsive Scale (YBOCS) score from ∼38 to a score of ∼25. A rechargeable Activa RC neurostimulator battery has been of major benefit in terms of facilitating a degree of stability and control over the stimulation. His psychiatric symptoms reliably worsen within hours of the battery becoming depleted, thus providing confirmatory evidence for the efficacy of DBS for OCD in this person. WGS revealed that he is a heterozygote for the p.Val66Met variant in BDNF, encoding a member of the nerve growth factor family, and which has been found to predispose carriers to various psychiatric illnesses. 
He carries the p.Glu429Ala allele in methylenetetrahydrofolate reductase (MTHFR) and the p.Asp7Asn allele in ChAT, encoding choline O-acetyltransferase, with both alleles having been shown to confer an elevated susceptibility to psychoses. We have found thousands of other variants in his genome, including pharmacogenetic and copy number variants. This information has been archived and offered to this person alongside the clinical sequencing data, so that he and others can re-analyze his genome for years to come. Conclusions. To our knowledge, this is the first study in the clinical neurosciences that integrates detailed neuropsychiatric phenotyping, deep brain stimulation for OCD and clinical-grade WGS with management of genetic results in the medical treatment of one person with severe mental illness. We offer this as an example of precision medicine in neuropsychiatry including brain-implantable devices and genomics-guided preventive health care.


bioRxiv | 2014

Reducing INDEL errors in whole-genome and exome sequencing

Han Fang; Giuseppe Narzisi; Jason O'Rawe; Yiyang Wu; Julie Rosenbaum; Michael Ronemus; Ivan Iossifov; Michael C. Schatz; Gholson J. Lyon

Background INDELs, especially those disrupting protein-coding regions of the genome, have been strongly associated with human diseases. However, there are still many errors with INDEL variant calling, driven by library preparation, sequencing biases, and algorithm artifacts. Methods We characterized whole genome sequencing (WGS), whole exome sequencing (WES), and PCR-free sequencing data from the same samples to investigate the sources of INDEL errors. We also developed a classification scheme based on the coverage and composition to rank high and low quality INDEL calls. We performed a large-scale validation experiment on 600 loci, and find high-quality INDELs to have a substantially lower error rate than low quality INDELs (7% vs. 51%). Results Simulation and experimental data show that assembly based callers are significantly more sensitive and robust for detecting large INDELs (>5 bp) than alignment based callers, consistent with published data. The concordance of INDEL detection between WGS and WES is low (52%), and WGS data uniquely identifies 10.8-fold more high-quality INDELs. The validation rate for WGS-specific INDELs is also much higher than that for WES-specific INDELs (85% vs. 54%), and WES misses many large INDELs. In addition, the concordance for INDEL detection between standard WGS and PCR-free sequencing is 71%, and standard WGS data uniquely identifies 6.3-fold more low-quality INDELs. Furthermore, accurate detection with Scalpel of heterozygous INDELs requires 1.2-fold higher coverage than that for homozygous INDELs. Lastly, homopolymer A/T INDELs are a major source of low-quality INDEL calls, and they are highly enriched in the WES data. Conclusions Overall, we show that accuracy of INDEL detection with WGS is much greater than WES even in the targeted region. We calculated that 60X WGS depth of coverage from the HiSeq platform is needed to recover 95% of INDELs detected by Scalpel. 
While this is higher than current sequencing practice, the deeper coverage may save total project costs because of the greater accuracy and sensitivity. Finally, we investigate sources of INDEL errors (e.g. capture deficiency, PCR amplification, homopolymers) with various data that will serve as a guideline to effectively reduce INDEL errors in genome sequencing.


bioRxiv | 2015

Genome-wide variant analysis of simplex autism families with an integrative clinical-bioinformatics pipeline

Laura T. Jiménez-Barrón; Jason O'Rawe; Yiyang Wu; Margaret Yoon; Han Fang; Ivan Iossifov; Gholson J. Lyon

Autism spectrum disorders (ASDs) are a group of developmental disabilities that affect social interaction and communication and are characterized by repetitive behaviors. There is now a large body of evidence that suggests a complex role of genetics in ASDs, in which many different loci are involved. Although many current population-scale genomic studies have been demonstrably fruitful, these studies generally focus on analyzing a limited part of the genome or use a limited set of bioinformatics tools. These limitations preclude the analysis of genome-wide perturbations that may contribute to the development and severity of ASD-related phenotypes. To overcome these limitations, we have developed and utilized an integrative clinical and bioinformatics pipeline for generating a more complete and reliable set of genomic variants for downstream analyses. Our study focuses on the analysis of three simplex autism families consisting of one affected child, unaffected parents, and one unaffected sibling. All members were clinically evaluated and widely phenotyped. Genotyping arrays and whole-genome sequencing were performed on each member, and the resulting sequencing data were analyzed using a variety of available bioinformatics tools. We searched for rare variants of putative functional impact that were found to be segregating according to de novo, autosomal recessive, X-linked, mitochondrial, and compound heterozygote transmission models. The resulting candidate variants included three small heterozygous copy-number variations (CNVs), a rare heterozygous de novo nonsense mutation in MYBBP1A located within exon 1, and a novel de novo missense variant in LAMB3. Our work demonstrates how more comprehensive analyses that include rich clinical data and whole-genome sequencing data can generate reliable results for use in downstream investigations.


bioRxiv | 2015

Whole genome analysis of an extended pedigree with Prader–Willi Syndrome, hereditary hemochromatosis, and dysautonomia-like symptoms

Han Fang; Yiyang Wu; Margaret Yoon; Laura Jimenez-Barron; Jason O'Rawe; Gareth Highnam; David Mittelman; Gholson J. Lyon

This report includes the discovery and analysis of a pedigree with Prader–Willi Syndrome (PWS), hereditary hemochromatosis (HH), and dysautonomia-like symptoms. Nine members of the family participated in whole genome sequencing (WGS), which enabled a wide scope of variant calling from single-nucleotide polymorphisms to copy number variations. First, a 5.5 Mb de novo deletion is identified in the chromosome region 15q11.2 to 15q13.1 in the boy with PWS. Second, a female individual with HH is homozygous for the p.C282Y variant in HFE, a mutation known to be associated with HH. Her brother is homozygous for the same variant, although he has yet to be clinically diagnosed with HH. Third, none of the people with dysautonomia-like symptoms carry any reported or novel rare variants in IKBKAP that are implicated in familial dysautonomia (FD - HSAN III). Although two people with dysautonomia-like symptoms carry two heterozygous variants in NTRK1, a gene that has been shown to contribute to HSAN IV (congenital insensitivity to pain with anhidrosis, a disease that closely resembles FD), this variant is not present in the third proband. Fourth, WGS revealed pharmacogenetic variants influencing the metabolism of warfarin and simvastatin, which are being routinely prescribed to the proband. Finally, reports of the phenotypes were standardized with the Human Phenotype Ontology annotation, which may facilitate the search for other families with similar phenotypes. Due to the extreme heterogeneity and insufficient knowledge of human diseases, it is of crucial importance that both phenotypic data and genomic data are standardized and shared.


bioRxiv | 2015

Human genetics and clinical aspects of neurodevelopmental disorders

Gholson J. Lyon; Jason O'Rawe

Collaboration


Top Co-Authors

Gholson J. Lyon, Cold Spring Harbor Laboratory
Han Fang, Cold Spring Harbor Laboratory
Martin G. Reese, Lawrence Berkeley National Laboratory
Yiyang Wu, Cold Spring Harbor Laboratory
Ivan Iossifov, Cold Spring Harbor Laboratory
Giuseppe Narzisi, Cold Spring Harbor Laboratory