Ben Langmead
Johns Hopkins University
Publications
Featured research published by Ben Langmead.
Genome Biology | 2009
Ben Langmead; Cole Trapnell; Mihai Pop
Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source and available at http://bowtie.cbcb.umd.edu.
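As a rough illustration of the Burrows-Wheeler indexing idea mentioned in this abstract, the following is a minimal Python sketch of exact matching by backward search over a toy FM index. It is not Bowtie's implementation: it omits the quality-aware backtracking that permits mismatches, it builds the suffix array naively, and the function names `bwt_index` and `backward_search` are hypothetical.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy FM index for `text` (assumed to contain no '$')."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[c]: number of characters in text strictly smaller than c.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # extend the match one character leftwards
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    index = bwt_index("ACGTACGTTAG")
    print(backward_search("ACG", *index))
```

Each pattern character narrows the range of matching suffix-array rows using only the count tables, so the work per read depends on the read length rather than on the genome length.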
Nature Methods | 2012
Ben Langmead
As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
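To make the contrast between index-only matching and gapped alignment concrete, here is a minimal Python sketch of local alignment scoring by dynamic programming (Smith-Waterman style). It is an illustration under stated assumptions, not Bowtie 2's algorithm: it uses a simple linear gap penalty, has no hardware acceleration, and the function name `local_align_score` and the scoring values are hypothetical.

```python
def local_align_score(read, ref, match=2, mismatch=-3, gap=-4):
    """Best local alignment score of `read` against `ref`.

    Scoring values are illustrative assumptions; each gap position
    costs `gap` (linear penalty).
    """
    prev = [0] * (len(ref) + 1)  # previous DP row
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            # Diagonal move: align read[i-1] with ref[j-1].
            diag = prev[j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Vertical/horizontal moves open a gap; 0 restarts the local alignment.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # The reference carries one extra T relative to the read.
    print(local_align_score("ACGTACGT", "ACGTTACGT"))
```

With gaps allowed, the whole read can align across the extra base; if gaps are made prohibitively expensive, only part of the read aligns, which is the limitation of ungapped approaches that the abstract alludes to.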
Nature Methods | 2015
Daehwan Kim; Ben Langmead
HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of ∼64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
Genome Biology | 2009
Ben Langmead; Michael C. Schatz; Jimmy J. Lin; Mihai Pop
As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/.
Current protocols in human genetics | 2010
Ben Langmead
This unit shows how to use the Bowtie package to align short sequencing reads, such as those output by second-generation sequencing instruments. It also includes protocols for building a genome index and calling consensus sequences from Bowtie alignments using SAMtools. Curr. Protoc. Bioinform. 32:11.7.1-11.7.14.
Genome Biology | 2010
Ben Langmead; Kasper D. Hansen; Jeffrey T. Leek
As sequencing throughput approaches dozens of gigabases per day, there is a growing need for efficient software for analysis of transcriptome sequencing (RNA-Seq) data. Myrna is a cloud-computing pipeline for calculating differential gene expression in large RNA-Seq datasets. We apply Myrna to the analysis of publicly available data sets and assess the goodness of fit of standard statistical models. Myrna is available from http://bowtie-bio.sf.net/myrna.
Nature Biotechnology | 2010
Michael C. Schatz; Ben Langmead
Given the accumulation of DNA sequence data sets at ever-faster rates, what are the key factors you should consider when using distributed and multicore computing systems for analysis?
Nature Neuroscience | 2012
Brian Herb; Florian Wolschin; Kasper D. Hansen; Martin J. Aryee; Ben Langmead; Rafael A. Irizarry; Gro V. Amdam; Andrew P. Feinberg
In honeybee societies, distinct caste phenotypes are created from the same genotype, suggesting a role for epigenetics in deriving these behaviorally different phenotypes. We found no differences in DNA methylation between irreversible worker and queen castes, but substantial differences between nurses and forager subcastes. Reverting foragers back to nurses reestablished methylation levels for a majority of genes and provides, to the best of our knowledge, the first evidence in any organism of reversible epigenetic changes associated with behavior.
BMC Bioinformatics | 2011
Ben Langmead; Jeffrey T. Leek
Background: RNA sequencing is a flexible and powerful new approach for measuring gene, exon, or isoform expression. To maximize the utility of RNA sequencing data, new statistical methods are needed for clustering, differential expression, and other analyses. A major barrier to the development of new statistical methods is the lack of RNA sequencing datasets that can be easily obtained and analyzed in common statistical software packages such as R. To speed up the development process, we have created a resource of analysis-ready RNA-sequencing datasets. Description: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. Tables were built from raw RNA sequencing data from 18 different published studies comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values. Conclusions: By combining datasets from many studies and providing data that has already been processed from .fastq format into ready-to-use .RData and .txt files, ReCount facilitates analysis and methods development for RNA-seq count data. We anticipate that ReCount will also be useful for investigators who wish to consider cross-study comparisons and alternative normalization strategies for RNA-seq.
Genome Biology | 2014
Li Song; Liliana Florea; Ben Langmead