Publication


Featured research published by Jared T. Simpson.


Genome Research | 2009

ABySS: A parallel assembler for short read sequence data

Jared T. Simpson; Kim Wong; Shaun D. Jackman; Jacqueline E. Schein; Steven J.M. Jones; Inanc Birol

Widespread adoption of massively parallel deoxyribonucleic acid (DNA) sequencing instruments has prompted the recent development of de novo short read assembly algorithms. A common shortcoming of the available tools is their inability to efficiently assemble vast amounts of data generated from large-scale sequencing projects, such as the sequencing of individual human genomes to catalog natural genetic variation. To address this limitation, we developed ABySS (Assembly By Short Sequences), a parallelized sequence assembler. As a demonstration of the capability of our software, we assembled 3.5 billion paired-end reads from the genome of an African male publicly released by Illumina, Inc. Approximately 2.76 million contigs ≥100 base pairs (bp) in length were created with an N50 size of 1499 bp, representing 68% of the reference human genome. Analysis of these contigs identified polymorphic and novel sequences not present in the human reference assembly, which were validated by alignment to alternate human assemblies and to other primate genomes.
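
The N50 statistic quoted in this abstract can be illustrated with a short Python sketch. It uses the common definition (the contig length L such that contigs of length L or longer hold at least half of the assembled bases); the abstract does not state exactly how its N50 was computed, so the definition, the `min_length` parameter, and the sample lengths below are assumptions for illustration only.

```python
def n50(lengths, min_length=0):
    """Return the N50 of a collection of contig lengths.

    Contigs shorter than min_length are ignored (a hypothetical filter,
    mirroring the idea of reporting only contigs above a size cutoff).
    """
    ordered = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not ordered:
        raise ValueError("no contig lengths to summarise")
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half:
            return length


# Made-up contig lengths, not data from the paper.
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))  # prints 400
```

Because N50 depends on which contigs are counted, the size cutoff should always be reported alongside the value.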


Nature | 2009

Complex landscapes of somatic rearrangement in human breast cancer genomes.

Philip Stephens; David J. McBride; Meng-Lay Lin; Ignacio Varela; Erin Pleasance; Jared T. Simpson; Lucy Stebbings; Catherine Leroy; Sarah Edkins; Laura Mudie; Christopher Greenman; Mingming Jia; Calli Latimer; Jon Teague; King Wai Lau; John Burton; Michael A. Quail; Harold Swerdlow; Carol Churcher; Rachael Natrajan; Anieta M. Sieuwerts; John W.M. Martens; Daniel P. Silver; Anita Langerød; Hege G. Russnes; John A. Foekens; Jorge S. Reis-Filho; Laura J. van 't Veer; Andrea L. Richardson; Anne Lise Børresen-Dale

Multiple somatic rearrangements are often found in cancer genomes; however, the underlying processes of rearrangement and their contribution to cancer development are poorly characterized. Here we use a paired-end sequencing strategy to identify somatic rearrangements in breast cancer genomes. There are more rearrangements in some breast cancers than previously appreciated. Rearrangements are more frequent over gene footprints and most are intrachromosomal. Multiple rearrangement architectures are present, but tandem duplications are particularly common in some cancers, perhaps reflecting a specific defect in DNA maintenance. Short overlapping sequences at most rearrangement junctions indicate that these have been mediated by non-homologous end-joining DNA repair, although varying sequence patterns indicate that multiple processes of this type are operative. Several expressed in-frame fusion genes were identified but none was recurrent. The study provides a new perspective on cancer genomes, highlighting the diversity of somatic rearrangements and their potential contribution to cancer development.


Genome Research | 2012

Efficient de novo assembly of large genomes using compressed data structures

Jared T. Simpson; Richard Durbin

De novo genome sequence assembly is important both to generate new sequence assemblies for previously uncharacterized genomes and to identify the genome sequence of individuals in a reference-unbiased way. We present memory efficient data structures and algorithms for assembly using the FM-index derived from the compressed Burrows-Wheeler transform, and a new assembler based on these called SGA (String Graph Assembler). We describe algorithms to error-correct, assemble, and scaffold large sets of sequence data. SGA uses the overlap-based string graph model of assembly, unlike most de novo assemblers that rely on de Bruijn graphs, and is simply parallelizable. We demonstrate the error correction and assembly performance of SGA on 1.2 billion sequence reads from a human genome, which we are able to assemble using 54 GB of memory. The resulting contigs are highly accurate and contiguous, while covering 95% of the reference genome (excluding contigs <200 bp in length). Because of the low memory requirements and parallelization without requiring inter-process communication, SGA provides the first practical assembler to our knowledge for a mammalian-sized genome on a low-end computing cluster.
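
As a rough illustration of the FM-index mentioned in this abstract, the following Python sketch builds an uncompressed index over a single short string and counts pattern occurrences by backward search. It is not SGA's implementation: SGA indexes a large set of reads using compressed structures, whereas the class `FMIndex`, the helper `bwt`, and the example string here are hypothetical and only suitable for tiny inputs.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (tiny inputs only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


class FMIndex:
    def __init__(self, text):
        self.bwt = bwt(text)
        counts = Counter(self.bwt)
        # C[c]: number of characters in the text strictly smaller than c.
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c in counts:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def count(self, pattern):
        """Count occurrences of pattern using backward search."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


index = FMIndex("GATTACAGATTACA")
print(index.count("ATTA"))  # prints 2
```

The occurrence table here stores one counter per position, which is exactly the memory cost that sampled or compressed representations are designed to avoid.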


Nature | 2012

Insights into hominid evolution from the gorilla genome sequence.

Aylwyn Scally; Julien Y. Dutheil; LaDeana W. Hillier; Gregory Jordan; Ian Goodhead; Javier Herrero; Asger Hobolth; Tuuli Lappalainen; Thomas Mailund; Tomas Marques-Bonet; Shane McCarthy; Stephen H. Montgomery; Petra C. Schwalie; Y. Amy Tang; Michelle C. Ward; Yali Xue; Bryndis Yngvadottir; Can Alkan; Lars Nørvang Andersen; Qasim Ayub; Edward V. Ball; Kathryn Beal; Brenda J. Bradley; Yuan Chen; Chris Clee; Stephen Fitzgerald; Tina Graves; Yong Gu; Paul Heath; Andreas Heger

Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Genome Research | 2011

Assemblathon 1: A competitive assessment of de novo short read assembly methods

Dent Earl; Keith Bradnam; John St. John; Aaron E. Darling; Dawei Lin; Joseph Fass; Hung On Ken Yu; Vince Buffalo; Daniel R. Zerbino; Mark Diekhans; Ngan Nguyen; Pramila Ariyaratne; Wing-Kin Sung; Zemin Ning; Matthias Haimel; Jared T. Simpson; Nuno A. Fonseca; Inanc Birol; T. Roderick Docking; Isaac Ho; Daniel S. Rokhsar; Rayan Chikhi; Dominique Lavenier; Guillaume Chapuis; Delphine Naquin; Nicolas Maillet; Michael C. Schatz; David R. Kelley; Adam M. Phillippy; Sergey Koren

Low-cost short read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark: (1) It is possible to assemble the genome to a high level of coverage and accuracy, and that (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies is now public and freely available from http://www.assemblathon.org/.


GigaScience | 2013

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

Keith Bradnam; Joseph Fass; Anton Alexandrov; Paul Baranay; Michael Bechner; Inanc Birol; Sébastien Boisvert; Jarrod Chapman; Guillaume Chapuis; Rayan Chikhi; Hamidreza Chitsaz; Wen Chi Chou; Jacques Corbeil; Cristian Del Fabbro; Roderick R. Docking; Richard Durbin; Dent Earl; Scott J. Emrich; Pavel Fedotov; Nuno A. Fonseca; Ganeshkumar Ganapathy; Richard A. Gibbs; Sante Gnerre; Élénie Godzaridis; Steve Goldstein; Matthias Haimel; Giles Hall; David Haussler; Joseph Hiatt; Isaac Ho

Background: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.


Nature Methods | 2015

A complete bacterial genome assembled de novo using only nanopore sequencing data

Nicholas J. Loman; Joshua Quick; Jared T. Simpson

We have assembled de novo the Escherichia coli K-12 MG1655 chromosome in a single 4.6-Mb contig using only nanopore data. Our method has three stages: (i) overlaps are detected between reads and then corrected by a multiple-alignment process; (ii) corrected reads are assembled using the Celera Assembler; and (iii) the assembly is polished using a probabilistic model of the signal-level data. The assembly reconstructs gene order and has 99.5% nucleotide identity.


PLOS Genetics | 2011

Trait Variation in Yeast Is Defined by Population History

Jonas Warringer; Enikö Zörgö; Francisco A. Cubillos; Amin Zia; Arne B. Gjuvsland; Jared T. Simpson; Annabelle Forsmark; Richard Durbin; Stig W. Omholt; Edward J. Louis; Gianni Liti; Alan M. Moses; Anders Blomberg

A fundamental goal in biology is to achieve a mechanistic understanding of how and to what extent ecological variation imposes selection for distinct traits and favors the fixation of specific genetic variants. Key to such an understanding is the detailed mapping of the natural genomic and phenomic space and a bridging of the gap that separates these worlds. Here we chart a high-resolution map of natural trait variation in one of the most important genetic model organisms, the budding yeast Saccharomyces cerevisiae, and its closest wild relatives and trace the genetic basis and timing of major phenotype changing events in its recent history. We show that natural trait variation in S. cerevisiae exceeds that of its relatives, despite limited genetic variation, and follows the population history rather than the source environment. In particular, the West African population is phenotypically unique, with an extreme abundance of low-performance alleles, notably a premature translational termination signal in GAL3 that causes inability to utilize galactose. Our observations suggest that many S. cerevisiae traits may be the consequence of genetic drift rather than selection, in line with the assumption that natural yeast lineages are remnants of recent population bottlenecks. Disconcertingly, the universal type strain S288C was found to be highly atypical, highlighting the danger of extrapolating gene-trait connections obtained in mosaic, lab-domesticated lineages to the species as a whole. Overall, this study represents a step towards an in-depth understanding of the causal relationship between co-variation in ecology, selection pressure, natural traits, molecular mechanism, and alleles in a key model organism.


Genome Research | 2011

Revealing the genetic structure of a trait by sequencing a population under selection

Leopold Parts; Francisco A. Cubillos; Jonas Warringer; Kanika Jain; Francisco Salinas; Suzannah Bumpstead; Mikael Molin; Amin Zia; Jared T. Simpson; Michael A. Quail; Alan M. Moses; Edward J. Louis; Richard Durbin; Gianni Liti

One approach to understanding the genetic basis of traits is to study their pattern of inheritance among offspring of phenotypically different parents. Previously, such analysis has been limited by low mapping resolution, high labor costs, and large sample size requirements for detecting modest effects. Here, we present a novel approach to map trait loci using artificial selection. First, we generated populations of 10-100 million haploid and diploid segregants by crossing two budding yeast strains of different heat tolerance for up to 12 generations. We then subjected these large segregant pools to heat stress for up to 12 d, enriching for beneficial alleles. Finally, we sequenced total DNA from the pools before and during selection to measure the changes in parental allele frequency. We mapped 21 intervals with significant changes in genetic background in response to selection, which is several times more than found with traditional linkage methods. Nine of these regions contained two or fewer genes, yielding much higher resolution than previous genomic linkage studies. Multiple members of the RAS/cAMP signaling pathway were implicated, along with genes previously not annotated with heat stress response function. Surprisingly, at most selected loci, allele frequencies stopped changing before the end of the selection experiment, but alleles did not become fixed. Furthermore, we were able to detect the same set of trait loci in a population of diploid individuals with similar power and resolution, and observed primarily additive effects, similar to what is seen for complex trait genetics in other diploid organisms such as humans.
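
The core measurement described in this abstract, the change in parental allele frequency between pools sequenced before and during selection, can be sketched in Python as follows. This is only a minimal arithmetic illustration: the function names, the locus labels, and the read counts are hypothetical, and the sketch omits the statistical modelling a real analysis would need to decide which shifts are significant.

```python
def allele_frequency(parent_a_reads, parent_b_reads):
    """Fraction of reads at a locus that carry the parent A allele."""
    total = parent_a_reads + parent_b_reads
    if total == 0:
        raise ValueError("locus has no read coverage")
    return parent_a_reads / total


def frequency_shift(before, after):
    """Change in parent A allele frequency between two sequenced pools.

    before and after are (parent_a_reads, parent_b_reads) tuples.
    """
    return allele_frequency(*after) - allele_frequency(*before)


# Hypothetical read counts per locus: (parent A reads, parent B reads).
pool_before = {"locus_1": (50, 50), "locus_2": (48, 52)}
pool_during = {"locus_1": (80, 20), "locus_2": (50, 50)}

for locus in pool_before:
    shift = frequency_shift(pool_before[locus], pool_during[locus])
    print(locus, round(shift, 3))
```

A locus whose frequency moves strongly in one direction under selection is a candidate for carrying a beneficial allele, while loci near zero shift are consistent with no effect.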


Bioinformatics | 2010

Efficient construction of an assembly string graph using the FM-index

Jared T. Simpson; Richard Durbin

Motivation: Sequence assembly is a difficult problem whose importance has grown again recently as the cost of sequencing has dramatically dropped. Most new sequence assembly software has started by building a de Bruijn graph, avoiding the overlap-based methods used previously because of the computational cost and complexity of these with very large numbers of short reads. Here, we show how to use suffix array-based methods that have formed the basis of recent very fast sequence mapping algorithms to find overlaps and generate assembly string graphs asymptotically faster than previously described algorithms. Results: Standard overlap assembly methods have time complexity O(N²), where N is the sum of the lengths of the reads. We use the Ferragina–Manzini index (FM-index) derived from the Burrows–Wheeler transform to find overlaps of length at least τ among a set of reads. As well as an approach that finds all overlaps then implements transitive reduction to produce a string graph, we show how to output directly only the irreducible overlaps, significantly shrinking memory requirements and reducing compute time to O(N), independent of depth. Overlap-based assembly methods naturally handle mixed length read sets, including capillary reads or long reads promised by the third generation sequencing technologies. The algorithms we present here pave the way for overlap-based assembly approaches to be developed that scale to whole vertebrate genome de novo assembly. Contact: js18@sanger.ac.uk
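
To make the terms "overlap of length at least τ" and "irreducible overlap" concrete, here is a deliberately naive Python sketch of the all-pairs approach that the abstract describes as the slower baseline: it compares every pair of reads and then removes transitive edges. It does not use the FM-index and is not the paper's algorithm. The function names, the simplified transitivity test (an edge is dropped if any two-step path connects the same reads), and the three example reads are assumptions for illustration; reverse complements and contained reads are ignored.

```python
def overlap(a, b, tau):
    """Longest proper suffix of a equal to a prefix of b, or 0 if shorter than tau."""
    if tau < 1:
        raise ValueError("tau must be at least 1")
    for k in range(min(len(a), len(b)) - 1, tau - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, tau):
    """All-pairs overlap edges {(i, j): overlap_length}; quadratic in the number of reads."""
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap(a, b, tau)
                if k:
                    edges[(i, j)] = k
    return edges


def irreducible_edges(edges):
    """Drop edge i->j when some read m gives a two-step path i->m->j."""
    successors = {}
    for i, j in edges:
        successors.setdefault(i, set()).add(j)
    kept = {}
    for (i, j), k in edges.items():
        transitive = any(
            j in successors.get(m, ()) for m in successors[i] if m != j
        )
        if not transitive:
            kept[(i, j)] = k
    return kept


# Three reads of length 6 taken at offsets 0, 2, 4 from "ATGCGTACCTGA".
reads = ["ATGCGT", "GCGTAC", "GTACCT"]
graph = overlap_graph(reads, tau=2)
print(graph)                     # {(0, 1): 4, (0, 2): 2, (1, 2): 4}
print(irreducible_edges(graph))  # {(0, 1): 4, (1, 2): 4}
```

In the example, the edge from read 0 to read 2 is implied by the path through read 1, so only the two irreducible overlaps remain.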

Collaboration


Dive into Jared T. Simpson's collaboration.

Top Co-Authors

Richard Durbin

Wellcome Trust Sanger Institute

Rayan Chikhi

Pennsylvania State University

Inanc Birol

University of British Columbia

Matthew Loose

University of Nottingham

Matei David

Ontario Institute for Cancer Research

Paul C. Boutros

Ontario Institute for Cancer Research

Nuno A. Fonseca

European Bioinformatics Institute

Guillaume Chapuis

École normale supérieure de Cachan
