Brent S. Pedersen | Researchain

Featured research published by Brent S. Pedersen.

Bioinformatics | 2017

cyvcf2: fast, flexible variant analysis with Python

Brent S. Pedersen; Aaron R. Quinlan

Motivation: Variant call format (VCF) files document the genetic variation observed after DNA sequencing, alignment and variant calling of a sample cohort. Given the complexity of the VCF format as well as the diverse variant annotations and genotype metadata, there is a need for fast, flexible methods enabling intuitive analysis of the variant data within VCF and BCF files. Results: We introduce cyvcf2, a Python library and software package for fast parsing and querying of VCF and BCF files, and illustrate its speed, simplicity and utility. Availability and Implementation: cyvcf2 is available from https://github.com/brentp/cyvcf2 under the MIT license and from common Python package managers. Detailed documentation is available at http://brentp.github.io/cyvcf2/
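
To make "parsing and querying" a VCF record concrete, here is a minimal pure-Python sketch of what decoding one tab-delimited VCF data line involves: splitting the fixed columns, turning the INFO column into a key/value map, and converting each sample's GT field into an alternate-allele count. It is not cyvcf2's API (cyvcf2 wraps htslib and is far faster); the helper name `parse_vcf_line` and the returned dictionary layout are hypothetical.

```python
def parse_vcf_line(line):
    """Parse one VCF data line (hypothetical helper for illustration, not cyvcf2)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info = fields[:8]
    # INFO is semicolon-separated; flags such as "DB" carry no value and map to True.
    info_map = {}
    for item in info.split(";"):
        if item == ".":
            continue
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True
    # Column 9 (FORMAT) names the colon-separated per-sample keys; samples follow.
    fmt = fields[8].split(":") if len(fields) > 8 else []
    samples = [dict(zip(fmt, s.split(":"))) for s in fields[9:]]
    # Convert GT strings such as "0/1" or "1|1" to alt-allele counts (None if missing).
    alt_counts = []
    for sample in samples:
        alleles = sample.get("GT", ".").replace("|", "/").split("/")
        if "." in alleles:
            alt_counts.append(None)
        else:
            alt_counts.append(sum(a != "0" for a in alleles))
    return {
        "CHROM": chrom, "POS": int(pos), "ID": vid, "REF": ref,
        "ALT": alt.split(","), "QUAL": qual, "FILTER": filt,
        "INFO": info_map, "alt_counts": alt_counts,
    }
```

With cyvcf2 installed, one would instead iterate over a `cyvcf2.VCF(path)` object and read attributes from each variant; its documentation lists the actual attribute names.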

Bioinformatics | 2011

Pybedtools: a flexible Python library for manipulating genomic datasets and annotations

Ryan K. Dale; Brent S. Pedersen; Aaron R. Quinlan

Summary: pybedtools is a flexible Python software library for manipulating and exploring genomic datasets in many common formats. It provides an intuitive Python interface that extends the popular BEDTools genome arithmetic tools. The library is well documented and efficient, and allows researchers to quickly develop simple yet powerful scripts that enable complex genomic analyses. Availability: pybedtools is maintained under the GPL license. Stable versions of pybedtools as well as documentation are available on the Python Package Index at http://pypi.python.org/pypi/pybedtools. Supplementary Information: Supplementary data are available at Bioinformatics online.
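
As a rough illustration of the "genome arithmetic" that BEDTools performs, the sketch below merges overlapping or touching (book-ended) intervals per chromosome, similar in spirit to `bedtools merge`. It is plain Python, not the pybedtools API; the `(chrom, start, end)` tuple layout with 0-based, half-open coordinates is an assumption that follows BED conventions, and `merge_intervals` is a hypothetical name.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Coordinates are assumed 0-based and half-open, as in BED files.
    """
    merged = []
    # Sorting groups intervals by chromosome, then by start position.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or abuts the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

In pybedtools itself the same operation would typically be a method call on a `BedTool` object (e.g. `BedTool(...).merge()`), which delegates the work to BEDTools.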

Nature Biotechnology | 2018

Nanopore sequencing and assembly of a human genome with ultra-long reads

Miten Jain; Sergey Koren; Karen H. Miga; Josh Quick; Arthur C. Rand; Thomas A. Sasani; John R. Tyson; Andrew D. Beggs; Alexander Dilthey; Ian T. Fiddes; Sunir Malla; Hannah Marriott; Tom Nieto; Justin O'Grady; Hugh E. Olsen; Brent S. Pedersen; Arang Rhie; Hollian Richardson; Aaron R. Quinlan; Terrance P. Snutch; Louise Tee; Benedict Paten; Adam M. Phillippy; Jared T. Simpson; Nicholas J. Loman; Matthew Loose

We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
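
The contiguity figures above use standard statistics: N50 is the length L such that contigs (or reads) of length at least L together cover at least half of the total assembled length, and NG50 uses half of an estimated genome size as the target instead. A minimal sketch, assuming the caller supplies the genome size for NG50; the function name `nx50` is hypothetical.

```python
def nx50(lengths, genome_size=None):
    """Return N50 (genome_size=None) or NG50 (genome_size given).

    Walk contigs from longest to shortest and return the length at which the
    running total first reaches half of the target size.
    """
    target = (genome_size if genome_size is not None else sum(lengths)) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # NG50 is undefined if the assembly covers < 50% of the genome
```

NG50 is the more comparable metric across assemblies because its target does not shrink when an assembly is incomplete.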

Genome Biology | 2016

Vcfanno: fast, flexible annotation of genetic variants

Brent S. Pedersen; Ryan M. Layer; Aaron R. Quinlan

The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits. However, comprehensive variant annotation with diverse file formats is difficult with existing methods. Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file. By leveraging a parallel “chromosome sweeping” algorithm, we demonstrate substantial performance gains by annotating ~85,000 variants per second with 50 attributes from 17 commonly used genome annotation resources. Vcfanno is available at https://github.com/brentp/vcfanno under the MIT license.
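
The general "chromosome sweeping" idea can be sketched for a single chromosome: because both variants and annotations are sorted, one forward pass suffices, keeping only the annotations that could still overlap the current or later variants. This is a minimal sketch under stated assumptions (sorted 0-based variant positions, annotation intervals as half-open `(start, end, value)` tuples sorted by start); it is not vcfanno's code, omits its parallelism and file handling, and `sweep_annotate` is a hypothetical name.

```python
def sweep_annotate(variants, annotations):
    """Return (position, [overlapping annotation values]) for each variant.

    variants: sorted 0-based positions on one chromosome.
    annotations: (start, end, value) half-open intervals sorted by start.
    """
    results = []
    active = []  # annotations that may overlap the current or a later variant
    i = 0
    for pos in variants:
        # Admit every annotation that starts at or before this position.
        while i < len(annotations) and annotations[i][0] <= pos:
            active.append(annotations[i])
            i += 1
        # Evict annotations that end at or before this position; since variants
        # are sorted, they cannot overlap any later variant either.
        active = [a for a in active if a[1] > pos]
        results.append((pos, [a[2] for a in active]))
    return results
```

A summary step, such as taking the maximum or mean of numeric values per variant, could then produce the value written into the INFO column.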

American Journal of Human Genetics | 2017

Who’s Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy

Brent S. Pedersen; Aaron R. Quinlan

The potential for genetic discovery in human DNA sequencing studies is greatly diminished if DNA samples from a cohort are mislabeled, swapped, or contaminated or if they include unintended individuals. Unfortunately, the potential for such errors is significant since DNA samples are often manipulated by several protocols, labs, or scientists in the process of sequencing. We have developed a software package, peddy, to identify and facilitate the remediation of such errors via interactive visualizations and reports comparing the stated sex, relatedness, and ancestry to what is inferred from the individual genotypes derived from whole-genome (WGS) or whole-exome (WES) sequencing. Peddy predicts a sample’s ancestry using a machine learning model trained on individuals of diverse ancestries from the 1000 Genomes Project reference panel. Peddy facilitates both automated and interactive, visual detection of sample swaps, poor sequencing quality, and other indicators of sample problems that, if left undetected, would inhibit discovery.
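
One genotype-based signal commonly used in relatedness checks is identity-by-state (IBS). IBS0 sites are those where two samples share no allele (one homozygous reference, the other homozygous alternate); IBS2 sites are those where both alleles match. A true parent-child pair shares an allele at every site, so apart from genotyping errors it should show almost no IBS0 sites. The sketch below counts both from alternate-allele dosages (0, 1, 2). It illustrates the idea only and is not peddy's relatedness estimator; `ibs_counts` is a hypothetical name.

```python
def ibs_counts(dosages_a, dosages_b):
    """Count IBS0 and IBS2 sites between two samples.

    Dosages are alt-allele counts (0, 1 or 2); None marks a missing call,
    and sites missing in either sample are skipped.
    """
    ibs0 = ibs2 = compared = 0
    for a, b in zip(dosages_a, dosages_b):
        if a is None or b is None:
            continue
        compared += 1
        if abs(a - b) == 2:   # opposite homozygotes share no allele
            ibs0 += 1
        elif a == b:          # identical genotypes share both alleles
            ibs2 += 1
    return {"ibs0": ibs0, "ibs2": ibs2, "n": compared}
```

Dividing each count by `n` makes pairs with different numbers of called sites comparable.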

Nature Communications | 2017

Combating subclonal evolution of resistant cancer phenotypes

Samuel W. Brady; Jasmine A. McQuerry; Yi Qiao; Stephen R. Piccolo; Gajendra Shrestha; David Jenkins; Ryan M. Layer; Brent S. Pedersen; Ryan H. Miller; Amanda Esch; Sara R. Selitsky; Joel S. Parker; Layla A. Anderson; Brian Dalley; Rachel E. Factor; Chakravarthy Reddy; Jonathan Boltax; Dean Y. Li; Philip J. Moos; Joe W. Gray; Laura M. Heiser; Saundra S. Buys; Adam L. Cohen; W. Evan Johnson; Aaron R. Quinlan; Gabor T. Marth; Theresa L. Werner; Andrea Bild

Metastatic breast cancer remains challenging to treat, and most patients ultimately progress on therapy. This acquired drug resistance is largely due to drug-refractory sub-populations (subclones) within heterogeneous tumors. Here, we track the genetic and phenotypic subclonal evolution of four breast cancers through years of treatment to better understand how breast cancers become drug-resistant. Recurrently appearing post-chemotherapy mutations are rare. However, bulk and single-cell RNA sequencing reveal acquisition of malignant phenotypes after treatment, including enhanced mesenchymal and growth factor signaling, which may promote drug resistance, and decreased antigen presentation and TNF-α signaling, which may enable immune system avoidance. Some of these phenotypes pre-exist in pre-treatment subclones that become dominant after chemotherapy, indicating selection for resistance phenotypes. Post-chemotherapy cancer cells are effectively treated with drugs targeting acquired phenotypes. These findings highlight cancer’s ability to evolve phenotypically and suggest a phenotype-targeted treatment strategy that adapts to cancer as it evolves.

In metastatic breast cancer, subclonal evolution can drive drug resistance. Here, the authors genetically and transcriptionally follow the evolution of four breast cancers over time and treatment, and suggest a phenotype-targeted treatment strategy to adapt to cancer as it evolves.

PeerJ | 2015

Efficient "pythonic" access to FASTA files using pyfaidx

Matthew D. Shirley; Zhaorong Ma; Brent S. Pedersen; Sarah J. Wheelan

The pyfaidx Python module provides memory- and time-efficient indexing, subsetting, and in-place modification of subsequences of FASTA files. pyfaidx provides Python classes that expose a dictionary interface where sequences from an indexed FASTA can be accessed by their header name and then sliced by position without reading the full file into memory. pyfaidx includes an extensive test suite to ensure correct and reproducible behavior. A command-line program (faidx) is also provided as an alternative interface, with significant enhancements to functionality, while maintaining full index file compatibility with samtools. The pyfaidx module is installable from PyPI (https://pypi.python.org/pypi/pyfaidx), and development versions can be found on GitHub (https://github.com/mdshw5/pyfaidx).
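
Slicing a sequence "without reading the full file into memory" rests on the samtools `.fai` index, whose five columns record each sequence's name, length, byte offset of its first base, bases per line, and bytes per line (including the newline). Given those numbers, the byte position of any base can be computed and read with a single `seek`. The sketch below reproduces that idea in plain Python; it is not pyfaidx's implementation, assumes uniform line lengths within each record (as samtools also requires), and `build_fai`/`fetch` are hypothetical names.

```python
import io


def build_fai(handle):
    """Index a FASTA in a binary file object.

    Returns {name: (length, offset, linebases, linewidth)}, mirroring the
    columns of a samtools .fai file.
    """
    index = {}
    name = None
    pos = 0  # byte offset of the line being read
    handle.seek(0)
    for line in handle:
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            index[name] = [0, pos + len(line), None, None]
        elif name is not None:
            entry = index[name]
            bases = len(line.rstrip(b"\r\n"))
            if entry[2] is None:  # first sequence line fixes the line geometry
                entry[2], entry[3] = bases, len(line)
            entry[0] += bases
        pos += len(line)
    return {k: tuple(v) for k, v in index.items()}


def fetch(handle, index, name, start, end):
    """Return bases [start, end) (0-based, half-open) of sequence `name`."""
    length, offset, linebases, linewidth = index[name]
    start, end = max(0, start), min(end, length)
    if start >= end:
        return ""
    # Byte position of a base: skip whole lines, then step within the line.
    first = offset + (start // linebases) * linewidth + start % linebases
    last = offset + ((end - 1) // linebases) * linewidth + (end - 1) % linebases
    handle.seek(first)
    raw = handle.read(last - first + 1)
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()
```

pyfaidx wraps this kind of lookup behind a dictionary-like `Fasta` object, so a slice such as `Fasta(path)[name][3:9]` reads only the bytes it needs.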

Nature Methods | 2018

GIGGLE: a search engine for large-scale integrated genome analysis

Ryan M. Layer; Brent S. Pedersen; Tonya DiSera; Gabor T. Marth; Jason Gertz; Aaron R. Quinlan

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
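
GIGGLE uses its own purpose-built index and also scores the significance of shared loci. As a much simpler stand-in, the sketch below answers only the counting part of the question (how many intervals in each of many files overlap a query region) using two sorted arrays per file and binary search. The dictionary-of-files layout and the names `build_index`/`count_overlaps` are hypothetical, and intervals are assumed non-empty, half-open and on one chromosome.

```python
from bisect import bisect_left, bisect_right


def build_index(files):
    """files: {file_name: [(start, end), ...]} of half-open intervals."""
    return {
        name: (sorted(s for s, _ in ivs), sorted(e for _, e in ivs))
        for name, ivs in files.items()
    }


def count_overlaps(index, q_start, q_end):
    """Per-file count of intervals overlapping the query [q_start, q_end)."""
    counts = {}
    for name, (starts, ends) in index.items():
        # [s, e) overlaps the query iff s < q_end and e > q_start. Every
        # interval with e <= q_start also has s < q_end, so subtracting those
        # from the number of starts before q_end leaves exactly the overlaps.
        counts[name] = bisect_left(starts, q_end) - bisect_right(ends, q_start)
    return counts
```

Each query costs two binary searches per file, which keeps the example short; scaling to thousands of files and billions of intervals is what a dedicated index like GIGGLE's is designed for.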

Bioinformatics | 2018

Mosdepth: Quick coverage calculation for genomes and exomes

Brent S. Pedersen; Aaron R. Quinlan

Summary: Mosdepth is a new command-line tool for rapidly calculating genome-wide sequencing coverage. It measures depth from BAM or CRAM files either at each nucleotide position in a genome or for sets of genomic regions. Genomic regions may be specified either as a BED file, to evaluate coverage across capture regions, or as fixed-size windows, as required for copy-number calling. Mosdepth uses a simple algorithm that is computationally efficient and enables it to quickly produce coverage summaries. We demonstrate that mosdepth is faster than existing tools and provides flexibility in the types of coverage profiles produced. Availability and implementation: mosdepth is available from https://github.com/brentp/mosdepth under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.
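
A minimal sketch of one simple, efficient way to compute per-base depth, which to our understanding is the idea mosdepth builds on: for each alignment, add 1 at its start and subtract 1 at its end in an array the length of the chromosome, then take a running sum. Fixed-size window summaries then follow directly. This toy version treats each alignment as one contiguous half-open `(start, end)` span (no CIGAR handling, mate-overlap correction or quality filtering), and the function names are hypothetical.

```python
from itertools import accumulate


def per_base_depth(alignments, chrom_length):
    """Depth at every 0-based position from half-open (start, end) spans."""
    delta = [0] * (chrom_length + 1)  # extra slot so end == chrom_length is valid
    for start, end in alignments:
        delta[start] += 1  # coverage rises where a read begins
        delta[end] -= 1    # and falls at its (exclusive) end
    # The running sum of the +1/-1 marks is the depth at each position.
    return list(accumulate(delta[:chrom_length]))


def window_means(depth, window):
    """Mean depth in consecutive fixed-size windows (the last may be shorter)."""
    means = []
    for i in range(0, len(depth), window):
        chunk = depth[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means
```

The cost is one array update per alignment plus a single linear pass, independent of read length, which is why this approach scales well to deep whole-genome data.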

Scientific Reports | 2018

GOATOOLS: A Python library for Gene Ontology analyses

D. V. Klopfenstein; Liangsheng Zhang; Brent S. Pedersen; Fidel Ramírez; Alex Warwick Vesztrocy; Aurélien Naldi; Christopher J. Mungall; Jeffrey M. Yunes; Olga Botvinnik; Mark Weigel; Will Dampier; Christophe Dessimoz; Patrick Flick; Haibao Tang

The biological interpretation of gene lists with interesting shared properties, such as up- or down-regulation in a particular experiment, is typically accomplished using gene ontology enrichment analysis tools. Given a list of genes, a gene ontology (GO) enrichment analysis may return hundreds of statistically significant GO results in a “flat” list, which can be challenging to summarize. It can also be difficult to keep pace with rapidly expanding biological knowledge, which often results in daily changes to any of the over 47,000 gene ontologies that describe biological knowledge. GOATOOLS, a Python-based library, makes it more efficient to stay current with the latest ontologies and annotations, perform gene ontology enrichment analyses to determine over- and under-represented terms, and organize results for greater clarity and easier interpretation using a novel GOATOOLS GO grouping method. We performed functional analyses on both stochastic simulation data and real data from a published RNA-seq study to compare the enrichment results from GOATOOLS to two other popular tools: DAVID and GOstats. GOATOOLS is freely available through GitHub: https://github.com/tanghaibao/goatools.
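
At the core of a GO enrichment test is a 2×2 comparison: of the N background genes, K are annotated to a term; of the n genes in the study list, k are annotated to it. The one-sided hypergeometric upper tail P(X ≥ k), which is equivalent to a one-sided Fisher's exact test, measures over-representation. The sketch below computes it with exact integer binomials. GOATOOLS's own default test, its handling of under-representation, and its multiple-testing corrections may differ; `enrichment_p` is a hypothetical name.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: background genes, K: background genes annotated to the GO term,
    n: genes in the study list, k: study genes annotated to the term.
    """
    total = comb(N, n)
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible configurations contribute nothing to the sum.
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total
```

Because hundreds of GO terms are typically tested at once, these per-term p-values would normally be adjusted for multiple testing before any term is reported as significant.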
