Aaron R. Quinlan | Researchain

Publication

Featured research published by Aaron R. Quinlan.

Bioinformatics | 2010

BEDTools: a flexible suite of utilities for comparing genomic features

Aaron R. Quinlan; Ira M. Hall

Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner. Results: This article introduces a new software suite for the comparison, manipulation and annotation of genomic features in Browser Extensible Data (BED) and General Feature Format (GFF) formats. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g. next-generation sequencing data) with both public and custom genome annotation tracks. The BEDTools utilities can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets. Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools Contact: [email protected]; [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
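
The overlap search described in this abstract can be illustrated with a minimal Python sketch. It is not the BEDTools implementation (which is written in C++); the names `Interval`, `parse_bed` and `intersect` are hypothetical, and the sketch assumes BED's 0-based, half-open coordinates and compares every pair of features on the same chromosome.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Read the first three tab-separated BED columns, skipping blanks and headers."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        intervals.append(Interval(chrom, int(start), int(end)))
    return intervals


def intersect(a: Iterable[Interval], b: Iterable[Interval]) -> Iterator[Tuple[Interval, Interval]]:
    """Yield every (a, b) pair sharing at least one base on the same chromosome."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for feature in b:
        by_chrom[feature.chrom].append(feature)
    for x in a:
        for y in by_chrom.get(x.chrom, ()):
            # Half-open intervals overlap when each one starts before the other ends.
            if x.start < y.end and y.start < x.end:
                yield x, y


# Hypothetical example data: three annotated features and three query regions.
genes = parse_bed(["chr1\t100\t200\tgeneA", "chr1\t500\t600\tgeneB", "chr2\t100\t200\tgeneC"])
peaks = parse_bed(["chr1\t150\t250", "chr1\t200\t300", "chr2\t50\t100"])
pairs = list(intersect(genes, peaks))
```

The pairwise scan is quadratic per chromosome, which is acceptable for a sketch but not for the dataset sizes the abstract mentions; an index or a sorted sweep would be needed at that scale.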

Nature Methods | 2008

Whole-genome sequencing and variant discovery in C. elegans

LaDeana W. Hillier; Gabor T. Marth; Aaron R. Quinlan; David J. Dooling; Ginger Fewell; Derek Barnett; Paul Fox; Jarret Glasscock; Matthew Hickenbotham; Weichun Huang; Vincent Magrini; Ryan Richt; Sacha Sander; Donald A Stewart; Michael Stromberg; Eric F. Tsung; Todd Wylie; Tim Schedl; Richard Wilson; Elaine R. Mardis

Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage and representation. Massively parallel sequencing facilitates strain-to-reference comparison for genome-wide sequence variant discovery. Owing to the short-read-length sequences produced, we developed a revised approach to determine the regions of the genome to which short reads could be uniquely mapped. We then aligned Solexa reads from C. elegans strain CB4858 to the reference, and screened for single-nucleotide polymorphisms (SNPs) and small indels. This study demonstrates the utility of massively parallel short read sequencing for whole genome resequencing and for accurate discovery of genome-wide polymorphisms.

Genome Research | 2012

Copy number variation detection and genotyping from exome sequence data

Niklas Krumm; Peter H. Sudmant; Arthur Ko; Brian J. O'Roak; Maika Malig; Bradley P. Coe; Aaron R. Quinlan; Deborah A. Nickerson; Evan E. Eichler

While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and nonuniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method using singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) as well as genotype copy number polymorphic (CNP) loci with high sensitivity and specificity from exome sequencing data. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that this method can be used to reliably predict (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably be used to discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.

Bioinformatics | 2017

cyvcf2: fast, flexible variant analysis with Python

Brent S. Pedersen; Aaron R. Quinlan

Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files. Results: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files and illustrate its speed, simplicity and utility. Contact: [email protected] or [email protected] Availability and Implementation: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/
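
To make concrete what "parsing and querying" a VCF record involves, here is a minimal standard-library sketch. It deliberately does not use or reproduce the cyvcf2 API; `Variant` and `parse_vcf_line` are hypothetical names, and the sketch assumes a well-formed, tab-separated VCF data line with the eight fixed columns, optionally followed by FORMAT and per-sample columns.

```python
from typing import Dict, List, NamedTuple, Union


class Variant(NamedTuple):
    chrom: str
    pos: int                           # 1-based position, as written in the file
    ref: str
    alts: List[str]                    # empty when ALT is "."
    info: Dict[str, Union[str, bool]]  # flag-style INFO keys map to True
    genotypes: List[str]               # raw GT string per sample, e.g. "0/1"


def parse_vcf_line(line: str) -> Variant:
    """Split one VCF data line into its fixed fields, INFO pairs and GT strings."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info_str = fields[:8]

    info: Dict[str, Union[str, bool]] = {}
    if info_str != ".":
        for item in info_str.split(";"):
            key, sep, value = item.partition("=")
            info[key] = value if sep else True  # "DB" alone is a flag

    genotypes: List[str] = []
    if len(fields) > 9:
        format_keys = fields[8].split(":")
        if "GT" in format_keys:
            gt_index = format_keys.index("GT")
            genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]

    alts = [] if alt == "." else alt.split(",")
    return Variant(chrom, int(pos), ref, alts, info, genotypes)


# Hypothetical record with two samples.
record = parse_vcf_line("chr1\t12345\trs1\tA\tG,T\t50\tPASS\tDP=30;DB\tGT:DP\t0/1:14\t1/2:16")
```

INFO values are left as strings here; a real parser would consult the header to convert them to the declared types, which is part of the complexity the abstract refers to.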

Genome Biology | 2014

LUMPY: a probabilistic framework for structural variant discovery

Ryan M. Layer; Colby Chiang; Aaron R. Quinlan; Ira M. Hall

Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. LUMPY is available at https://github.com/arq5x/lumpy-sv.

Genome Research | 2008

Rapid whole-genome mutational profiling using next-generation sequencing technologies

Douglas R. Smith; Aaron R. Quinlan; Heather E. Peckham; Kathryn Makowsky; Wei Tao; Betty Woolf; Lei Shen; William F. Donahue; Nadeem Tusneem; Michael Stromberg; Donald A Stewart; Lu Zhang; Swati Ranade; Jason Warner; Clarence Lee; Brittney E. Coleman; Zheng Zhang; Stephen F. McLaughlin; Joel A. Malek; Jon M. Sorenson; Alan Blanchard; Jarrod Chapman; David Hillman; Feng Chen; Daniel S. Rokhsar; Kevin McKernan; Thomas W. Jeffries; Gabor T. Marth; Paul M. Richardson

Forward genetic mutational studies, adaptive evolution, and phenotypic screening are powerful tools for creating new variant organisms with desirable traits. However, mutations generated in the process cannot be easily identified with traditional genetic tools. We show that new high-throughput, massively parallel sequencing technologies can completely and accurately characterize a mutant genome relative to a previously sequenced parental (reference) strain. We studied a mutant strain of Pichia stipitis, a yeast capable of converting xylose to ethanol. This unusually efficient mutant strain was developed through repeated rounds of chemical mutagenesis, strain selection, transformation, and genetic manipulation over a period of seven years. We resequenced this strain on three different sequencing platforms. Surprisingly, we found fewer than a dozen mutations in open reading frames. All three sequencing technologies were able to identify each single nucleotide mutation given at least 10-15-fold nominal sequence coverage. Our results show that detecting mutations in evolved and engineered organisms is rapid and cost-effective at the whole-genome level using new sequencing technologies. Identification of specific mutations in strains with altered phenotypes will add insight into specific gene functions and guide further metabolic engineering efforts.

Current Protocols in Bioinformatics | 2014

BEDTools: the Swiss-army tool for genome feature analysis

Aaron R. Quinlan

Technological advances have enabled the use of DNA sequencing as a flexible tool to characterize genetic variation and to measure the activity of diverse cellular phenomena such as gene isoform expression and transcription factor binding. Extracting biological insight from the experiments enabled by these advances demands the analysis of large, multi‐dimensional datasets. This unit describes the use of the BEDTools toolkit for the exploration of high‐throughput genomics datasets. Several protocols are presented for common genomic analyses, demonstrating how simple BEDTools operations may be combined to create bespoke pipelines addressing complex questions. Curr. Protoc. Bioinform. 47:11.12.1‐11.12.34.
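
The idea of chaining simple interval operations into a small pipeline can be sketched in Python as follows. This is an illustration under stated assumptions, not code from the unit or from BEDTools: `merge` and `covered_bases` are hypothetical helpers, intervals are `(chrom, start, end)` tuples in 0-based, half-open coordinates, and the sketch chooses to fuse adjacent (book-ended) intervals as well as overlapping ones.

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Collapse overlapping or adjacent intervals on each chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bases(intervals: Iterable[Interval]) -> Dict[str, int]:
    """Second pipeline step: count the bases covered per chromosome after merging."""
    totals: Dict[str, int] = {}
    for chrom, start, end in merge(intervals):
        totals[chrom] = totals.get(chrom, 0) + (end - start)
    return totals


# Hypothetical input: four features on chr1 and one on chr2.
features = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr1", 500, 600),
    ("chr2", 10, 20),
]
merged = merge(features)
coverage = covered_bases(features)
```

Merging before counting keeps bases shared by several features from being counted more than once, which is why the second step is built on the first rather than summing interval lengths directly.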

Bioinformatics | 2011

BamTools: a C++ API and toolkit for analyzing and managing BAM files

Derek Barnett; Erik K. Garrison; Aaron R. Quinlan; Michael Stromberg; Gabor T. Marth

Motivation: Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging and sorting). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. Results: We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. Availability: BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.org/pezmaster31/bamtools.

Nature Methods | 2008

Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Aaron R. Quinlan; Donald A Stewart; Michael Stromberg; Gabor T. Marth

Previously reported applications of the 454 Life Sciences pyrosequencing technology have relied on deep sequence coverage for accurate polymorphism discovery because of frequent insertion and deletion sequence errors. Here we report a new base calling program, Pyrobayes, for pyrosequencing reads. Pyrobayes permits accurate single-nucleotide polymorphism (SNP) calling in resequencing applications, even in shallow read coverage, primarily because it produces more confident base calls than the native base calling program.

Genome Research | 2010

Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome

Aaron R. Quinlan; Royden A. Clark; Svetlana Sokolova; Mitchell L. Leibowitz; Yujun Zhang; Joshua Chang Mell; Ira M. Hall

Structural variation (SV) is a rich source of genetic diversity in mammals, but due to the challenges associated with mapping SV in complex genomes, basic questions regarding their genomic distribution and mechanistic origins remain unanswered. We have developed an algorithm (HYDRA) to localize SV breakpoints by paired-end mapping, and a general approach for the genome-wide assembly and interpretation of breakpoint sequences. We applied these methods to two inbred mouse strains: C57BL/6J and DBA/2J. We demonstrate that HYDRA accurately maps diverse classes of SV, including those involving repetitive elements such as transposons and segmental duplications; however, our analysis of the C57BL/6J reference strain shows that incomplete reference genome assemblies are a major source of noise. We report 7196 SVs between the two strains, more than two-thirds of which are due to transposon insertions. Of the remainder, 59% are deletions (relative to the reference), 26% are insertions of unlinked DNA, 9% are tandem duplications, and 6% are inversions. To investigate the origins of SV, we characterized 3316 breakpoint sequences at single-nucleotide resolution. We find that approximately 16% of non-transposon SVs have complex breakpoint patterns consistent with template switching during DNA replication or repair, and that this process appears to preferentially generate certain classes of complex variants. Moreover, we find that SVs are significantly enriched in regions of segmental duplication, but that this effect is largely independent of DNA sequence homology and thus cannot be explained by non-allelic homologous recombination (NAHR) alone. This result suggests that the genetic instability of such regions is often the cause rather than the consequence of duplicated genomic architecture.
