Ben Langmead | Researchain

Publication

Featured research published by Ben Langmead.

Genome Biology | 2009

Ultrafast and memory-efficient alignment of short DNA sequences to the human genome

Ben Langmead; Cole Trapnell; Mihai Pop

Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source and available from http://bowtie.cbcb.umd.edu.
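To make the Burrows-Wheeler indexing idea concrete, here is a minimal Python sketch of exact-match counting by backward search over a toy FM index. It is an illustration only, not Bowtie's implementation: the function names `bwt_index` and `count_matches` are hypothetical, the index is built by sorting all rotations (impractical for a real genome), and the quality-aware backtracking that permits mismatches is not shown.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy FM index (C table, occurrence table, length) for text."""
    text += "$"  # sentinel: assumed absent from text and sorting before A/C/G/T
    # Sorting every rotation is fine for a toy example, not for a genome.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    bwt = "".join(rot[-1] for rot in rotations)
    # C[c]: number of characters in text that are strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return C, occ, len(bwt)


def count_matches(pattern, index):
    """Count exact occurrences of pattern using backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of sorted-rotation rows that still match
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = bwt_index("GATTACAGATTACA")
print(count_matches("GATTACA", index))
```

Each pattern character narrows the row range with two table lookups, so the query cost depends on the read length rather than the genome length, which is the property that makes this family of indexes attractive for short-read alignment.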

Nature Methods | 2012

Fast gapped-read alignment with Bowtie 2

Ben Langmead

As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute-space (FM) index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the FM index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.

Nature Methods | 2015

HISAT: a fast spliced aligner with low memory requirements

Daehwan Kim; Ben Langmead

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
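As a quick sanity check on the figures in this abstract, the short Python sketch below multiplies the number of local indexes by the approximate region size. The product is about 3.07 billion bp, roughly the size of the human genome. The helper `local_index_for` is a hypothetical simplification that assumes fixed-size, non-overlapping regions; the abstract does not say how HISAT actually lays out or overlaps its local regions.

```python
LOCAL_INDEX_COUNT = 48_000  # local FM indexes, from the abstract
REGION_SIZE_BP = 64_000     # approximate bases per local index, from the abstract


def covered_bases(count=LOCAL_INDEX_COUNT, size=REGION_SIZE_BP):
    """Total bases spanned if the regions are laid end to end."""
    return count * size


def local_index_for(position, region_size=REGION_SIZE_BP):
    """Hypothetical lookup: which fixed-size region holds a 0-based position.

    Assumes non-overlapping regions, which is a simplification.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // region_size


print(covered_bases())
print(local_index_for(1_000_000))
```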

Genome Biology | 2009

Searching for SNPs with cloud computing.

Ben Langmead; Michael C. Schatz; Jimmy J. Lin; Mihai Pop

As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/.

Current protocols in human genetics | 2010

Aligning Short Sequencing Reads with Bowtie

Ben Langmead

This unit shows how to use the Bowtie package to align short sequencing reads, such as those output by second‐generation sequencing instruments. It also includes protocols for building a genome index and calling consensus sequences from Bowtie alignments using SAMtools. Curr. Protoc. Bioinform. 32:11.7.1‐11.7.14.

Genome Biology | 2010

Cloud-scale RNA-sequencing differential expression analysis with Myrna

Ben Langmead; Kasper D. Hansen; Jeffrey T. Leek

As sequencing throughput approaches dozens of gigabases per day, there is a growing need for efficient software for analysis of transcriptome sequencing (RNA-Seq) data. Myrna is a cloud-computing pipeline for calculating differential gene expression in large RNA-Seq datasets. We apply Myrna to the analysis of publicly available data sets and assess the goodness of fit of standard statistical models. Myrna is available from http://bowtie-bio.sf.net/myrna.

Nature Biotechnology | 2010

Cloud computing and the DNA data race

Michael C. Schatz; Ben Langmead

Given the accumulation of DNA sequence data sets at ever-faster rates, what are the key factors you should consider when using distributed and multicore computing systems for analysis?

Nature Neuroscience | 2012

Reversible switching between epigenetic states in honeybee behavioral subcastes

Brian Herb; Florian Wolschin; Kasper D. Hansen; Martin J. Aryee; Ben Langmead; Rafael A. Irizarry; Gro V. Amdam; Andrew P. Feinberg

In honeybee societies, distinct caste phenotypes are created from the same genotype, suggesting a role for epigenetics in deriving these behaviorally different phenotypes. We found no differences in DNA methylation between irreversible worker and queen castes, but substantial differences between nurses and forager subcastes. Reverting foragers back to nurses reestablished methylation levels for a majority of genes and provides, to the best of our knowledge, the first evidence in any organism of reversible epigenetic changes associated with behavior.

BMC Bioinformatics | 2011

ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets

Ben Langmead; Jeffrey T. Leek

Background: RNA sequencing is a flexible and powerful new approach for measuring gene, exon, or isoform expression. To maximize the utility of RNA sequencing data, new statistical methods are needed for clustering, differential expression, and other analyses. A major barrier to the development of new statistical methods is the lack of RNA sequencing datasets that can be easily obtained and analyzed in common statistical software packages such as R. To speed up the development process, we have created a resource of analysis-ready RNA-sequencing datasets. Description: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. Tables were built from raw RNA sequencing data from 18 different published studies comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values. Conclusions: By combining datasets from many studies and providing data that has already been processed from .fastq format into ready-to-use .RData and .txt files, ReCount facilitates analysis and methods development for RNA-seq count data. We anticipate that ReCount will also be useful for investigators who wish to consider cross-study comparisons and alternative normalization strategies for RNA-seq.

Genome Biology | 2014

Lighter: fast and memory-efficient sequencing error correction without counting

Li Song; Liliana Florea; Ben Langmead
