Publication


Featured research published by Anthony J. Cox.


Nature | 2010

A comprehensive catalogue of somatic mutations from a human cancer genome

Erin Pleasance; R. Keira Cheetham; Philip Stephens; David J. McBride; Sean Humphray; Christopher Greenman; Ignacio Varela; Meng-Lay Lin; Gonzalo R. Ordóñez; Graham R. Bignell; Kai Ye; Julie A Alipaz; Markus J. Bauer; David Beare; Adam Butler; Richard J. Carter; Lina Chen; Anthony J. Cox; Sarah Edkins; Paula Kokko-Gonzales; Niall Anthony Gormley; Russell Grocock; Christian D. Haudenschild; Matthew M. Hims; Terena James; Mingming Jia; Zoya Kingsbury; Catherine Leroy; John Marshall; Andrew Menzies

All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.


Cell | 2012

Genome Sequencing and Analysis of the Tasmanian Devil and Its Transmissible Cancer

Elizabeth P. Murchison; Ole Schulz-Trieglaff; Zemin Ning; Ludmil B. Alexandrov; Markus J. Bauer; Beiyuan Fu; Matthew M. Hims; Zhihao Ding; Sergii Ivakhno; Caitlin Stewart; Bee Ling Ng; Wendy Wong; Bronwen Aken; Simon White; Amber E. Alsop; Jennifer Becq; Graham R. Bignell; R. Keira Cheetham; William Cheng; Thomas Richard Connor; Anthony J. Cox; Zhi-Ping Feng; Yong Gu; Russell Grocock; Simon R. Harris; Irina Khrebtukova; Zoya Kingsbury; Mark Kowarsky; Alexandre Kreiss; Shujun Luo

The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.


PLOS ONE | 2009

Genomic Diversity among Drug Sensitive and Multidrug Resistant Isolates of Mycobacterium tuberculosis with Identical DNA Fingerprints

Stefan Niemann; Claudio U. Köser; Sebastien Gagneux; Claudia Plinke; Helen Rachel Bignell; Richard J. Carter; R. Keira Cheetham; Anthony J. Cox; Niall Anthony Gormley; Paula Kokko-Gonzales; Lisa Murray; Roberto Rigatti; Vincent Peter Smith; Felix P. M. Arends; Helen S. Cox; Geoff Smith; John A. C. Archer

Background: Mycobacterium tuberculosis complex (MTBC), the causative agent of tuberculosis (TB), is characterized by low sequence diversity, making this bacterium one of the classical examples of a genetically monomorphic pathogen. Because of this limited DNA sequence variation, routine genotyping of clinical MTBC isolates for epidemiological purposes relies on highly discriminatory DNA fingerprinting methods based on mobile and repetitive genetic elements. According to the standard view, isolates exhibiting the same fingerprinting pattern are considered direct progeny of the same bacterial clone, and most likely reflect ongoing transmission or disease relapse within individual patients.

Methodology/Principal Findings: Here we further investigated this assumption and used massively parallel whole-genome sequencing to compare one drug-susceptible (K-1) and one multidrug-resistant (MDR) isolate (K-2) of a rapidly spreading M. tuberculosis Beijing genotype clone from a high-incidence region (Karakalpakstan, Uzbekistan). Both isolates shared the same IS6110 RFLP pattern and the same allele at 23 out of 24 MIRU-VNTR loci. We generated 23.9 million (K-1) and 33.0 million (K-2) paired 50 bp purity-filtered reads, corresponding to a mean coverage of 483.5-fold and 656.1-fold, respectively. Compared with the laboratory strain H37Rv, both Beijing isolates shared 1,209 SNPs. The two Beijing isolates differed by 130 SNPs and one large deletion. The susceptible isolate had 55 specific SNPs, while the MDR variant had 75 specific SNPs, including the five known resistance-conferring mutations.

Conclusions: Our results suggest that M. tuberculosis isolates exhibiting identical DNA fingerprinting patterns can harbour substantial genomic diversity. Because this heterogeneity is not captured by traditional genotyping of MTBC, some aspects of the transmission dynamics of tuberculosis could be missed or misinterpreted. Furthermore, a valid differentiation between disease relapse and exogenous reinfection might be impossible using standard genotyping tools if the overall diversity of circulating clones is limited. These findings have important implications for clinical trials of new anti-tuberculosis drugs.
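
The SNP arithmetic in the findings (1,209 shared SNPs; 55 K-1-specific and 75 K-2-specific, hence 130 pairwise differences) can be made concrete with a small set-based sketch. This is an illustrative toy, not the study's pipeline; the SNP positions and variable names below are invented for demonstration.

    # Toy illustration of how isolate-specific SNPs can be derived from
    # per-isolate SNP calls made against a common reference (H37Rv).
    # The positions below are invented for demonstration only.

    k1_snps = {("gyrA", 7570), ("rpoB", 761100), ("katG", 2155200)}   # hypothetical calls for K-1
    k2_snps = {("gyrA", 7570), ("rpoB", 761155), ("embB", 4247430)}   # hypothetical calls for K-2

    shared  = k1_snps & k2_snps          # present in both isolates
    k1_only = k1_snps - k2_snps          # specific to the drug-susceptible isolate
    k2_only = k2_snps - k1_snps          # specific to the MDR isolate

    # In the study: 1,209 shared SNPs, 55 K-1-specific and 75 K-2-specific,
    # so the two isolates differ by 55 + 75 = 130 SNPs.
    print(f"shared: {len(shared)}, K-1 only: {len(k1_only)}, K-2 only: {len(k2_only)}")
    print(f"pairwise difference: {len(k1_only) + len(k2_only)} SNPs")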


Bioinformatics | 2016

Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications

Xiaoyu Chen; Ole Schulz-Trieglaff; Richard Shaw; Bret Barnes; Felix Schlesinger; Morten Källberg; Anthony J. Cox; Semyon Kruglyak; Christopher T. Saunders

We describe Manta, a method to discover structural variants and indels from next-generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example, NA12878 at 50× genomic coverage is analyzed in less than 20 min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios.

Availability and implementation: Manta is released under the open-source GPLv3 license. Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
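
Manta reports its calls in standard VCF, so downstream summaries need only standard VCF parsing. The snippet below is a hedged sketch of that step: it tallies calls by their SVTYPE INFO field using only the Python standard library. The file path in the commented example is an assumption, not something specified in the paper.

    # Minimal sketch: count structural variant calls by SVTYPE in a VCF
    # such as the ones Manta emits. Any VCF whose INFO column carries an
    # SVTYPE field will work.
    import gzip
    from collections import Counter

    def svtype_counts(vcf_path):
        opener = gzip.open if vcf_path.endswith(".gz") else open
        counts = Counter()
        with opener(vcf_path, "rt") as vcf:
            for line in vcf:
                if line.startswith("#"):          # skip header lines
                    continue
                info = line.rstrip("\n").split("\t")[7]   # INFO is the 8th VCF column
                for field in info.split(";"):
                    if field.startswith("SVTYPE="):
                        counts[field.split("=", 1)[1]] += 1
        return counts

    # Example usage (the path is hypothetical):
    # print(svtype_counts("results/variants/diploidSV.vcf.gz"))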


Bioinformatics | 2010

CNAseg—a novel framework for identification of copy number changes in cancer from second-generation sequencing data

Sergii Ivakhno; Tom Royce; Anthony J. Cox; Dirk Evers; R. Keira Cheetham; Simon Tavaré

Motivation: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue.

Results: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.

Availability: The CNAseg package and test data are available at http://www.compbio.group.cam.ac.uk/software.html.
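
The depth-of-coverage idea behind this class of methods can be sketched in a few lines. The toy below bins read start positions for a tumour and a matched normal sample and reports the per-bin log2 ratio, which is where copy number gains and losses show up; it is only an illustration of the general principle, not CNAseg's HMM segmentation or its flowcell-variability model, and the simulated positions are invented.

    # Toy depth-of-coverage comparison between tumour and normal samples.
    # Read start positions are binned into fixed-size windows and the
    # per-window log2(tumour/normal) ratio is reported; sustained shifts
    # away from 0 suggest copy number gains or losses.
    import math
    from collections import Counter

    def bin_depth(read_starts, bin_size=10_000):
        depth = Counter()
        for pos in read_starts:
            depth[pos // bin_size] += 1
        return depth

    def log2_ratios(tumour_starts, normal_starts, bin_size=10_000):
        t, n = bin_depth(tumour_starts, bin_size), bin_depth(normal_starts, bin_size)
        ratios = {}
        for b in sorted(set(t) & set(n)):
            if t[b] > 0 and n[b] > 0:
                ratios[b] = math.log2(t[b] / n[b])
        return ratios

    # Hypothetical usage with simulated positions: a gain covering bins 5-9.
    normal = [i % 100_000 for i in range(0, 2_000_000, 20)]
    tumour = normal + [p for p in normal if 50_000 <= p < 100_000]
    for b, r in log2_ratios(tumour, normal).items():
        print(b, round(r, 2))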


Theoretical Computer Science | 2013

Lightweight algorithms for constructing and inverting the BWT of string collections

Markus J. Bauer; Anthony J. Cox; Giovanna Rosone

Recent progress in the field of DNA sequencing motivates us to consider the problem of computing the Burrows-Wheeler transform (BWT) of a collection of strings. A human genome sequencing experiment might yield a billion or more sequences, each 100 characters in length. Such a dataset can now be generated in just a few days on a single sequencing machine. Many algorithms and data structures for compression and indexing of text have the BWT at their heart, and it would be of great interest to explore their applications to sequence collections such as these. However, computing the BWT for 100 billion characters or more of data remains a computational challenge. In this work we address this obstacle by presenting a methodology for computing the BWT of a string collection in a lightweight fashion. A first implementation of our algorithm needs O(m log m) bits of memory to process m strings, while a second variant makes additional use of external memory to achieve RAM usage that is constant with respect to m and negligible in size for a small alphabet such as DNA. The algorithms work on any number of strings, of any length. We evaluate our algorithms on collections of up to 1 billion strings and compare their performance to other approaches on smaller datasets. We take further steps toward making the BWT a practical tool for processing string collections on this scale. First, we give two algorithms for recovering the strings in a collection from its BWT. Second, we show that if sequences are added to or removed from the collection, then the BWT of the original collection can be efficiently updated to obtain the BWT of the revised collection.
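
For intuition about what is being computed, the BWT of a small string collection can be built naively in memory by sorting all suffixes, with each string carrying its own end marker. The sketch below is only that naive definition on an invented two-string collection; the paper's contribution is precisely to avoid this in-RAM suffix sorting via lightweight and external-memory algorithms, and it also covers inversion and updates, which are not reproduced here.

    def multi_string_bwt(strings):
        # Naive BWT of a string collection: each string implicitly ends with
        # its own terminator '$', terminators ordered by string index and
        # smaller than every other character. All suffixes are sorted and
        # the character preceding each suffix is emitted.
        suffixes = []
        for i, s in enumerate(strings):
            for j in range(len(s) + 1):            # include the terminator-only suffix
                suffixes.append((s[j:], i, j))
        suffixes.sort(key=lambda t: (t[0], t[1]))  # ties between strings broken by index
        bwt = []
        for text, i, j in suffixes:
            bwt.append(strings[i][j - 1] if j > 0 else "$")
        return "".join(bwt)

    # Invented example collection of two short DNA strings.
    print(multi_string_bwt(["ACG", "AAC"]))        # prints "GC$A$AAC"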


Combinatorial Pattern Matching | 2011

Lightweight BWT construction for very large string collections

Markus J. Bauer; Anthony J. Cox; Giovanna Rosone

A modern DNA sequencing machine can generate a billion or more sequence fragments in a matter of days. The many uses of the BWT in compression and indexing are well known, but the computational demands of creating the BWT of datasets this large have prevented its applications from being widely explored in this context. We address this obstacle by presenting two algorithms capable of computing the BWT of very large string collections. The algorithms are lightweight in that the first needs O(m log m) bits of memory to process m strings and the memory requirements of the second are constant with respect to m. We evaluate our algorithms on collections of up to 1 billion strings and compare their performance to other approaches on smaller datasets. Although our tests were on collections of DNA sequences of uniform length, the algorithms themselves apply to any string collection over any alphabet.


Bioinformatics | 2014

Adaptive reference-free compression of sequence quality scores

Lilian Janin; Giovanna Rosone; Anthony J. Cox

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and, hence, are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full resolution. As our approach relies directly on redundancy present in the reads, it does not need a reference sequence and is, therefore, applicable to data from metagenomics and de novo experiments as well as to re-sequencing data.

Results: We show that a conservative smoothing strategy affecting 75% of the quality scores above Q2 leads to an overall quality score compression of 1 bit per value with a negligible effect on variant calling. A compression of 0.68 bits per quality value is achieved using a more aggressive smoothing strategy, again with a very small effect on variant calling.

Availability: Code to construct the BWT and LCP array on large genomic datasets is part of the BEETL library, available as a github repository at git@github.com:BEETL/BEETL.git.
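
The headline figures (bits per quality value) come from the entropy of the smoothed score stream. The sketch below shows that bookkeeping on invented scores with an invented "predictable" flag per position; the real method derives those flags from a compressed index of the reads, which is not reproduced here.

    # Sketch of the effect of quality-score smoothing on compressibility.
    # Scores flagged as "predictable" are replaced by a single representative
    # value; the zero-order Shannon entropy (bits per value) then drops.
    import math
    from collections import Counter

    def bits_per_value(scores):
        counts = Counter(scores)
        total = len(scores)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def smooth(scores, predictable, replacement=30, floor=2):
        # Replace scores above the floor (Q2) at predictable positions.
        return [replacement if p and q > floor else q
                for q, p in zip(scores, predictable)]

    # Invented example data: 12 quality scores, 9 of them marked predictable.
    scores      = [35, 37, 30, 12, 38, 36, 2, 33, 35, 40, 37, 31]
    predictable = [True, True, True, False, True, True, False,
                   True, True, True, True, False]

    print(round(bits_per_value(scores), 2), "bits/value before smoothing")
    print(round(bits_per_value(smooth(scores, predictable)), 2), "bits/value after smoothing")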


Workshop on Algorithms in Bioinformatics | 2012

Comparing DNA sequence collections by direct comparison of compressed text indexes

Anthony J. Cox; Tobias Jakobi; Giovanna Rosone; Ole Schulz-Trieglaff

Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However, the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have high overlap with results from more standard reference-based methods. Code to construct and compare the BWT of large genomic data sets is available at http://beetl.github.com/BEETL/ as part of the BEETL library.
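
As a toy stand-in for the all-against-all comparison, the sketch below simply intersects the k-mer sets of two invented read collections: a junction-spanning "transcriptomic" read shares k-mers with both genomic flanks. The actual method achieves this kind of comparison by traversing the two BWTs in sorted-suffix order while streaming them from disk, which the sketch does not attempt.

    # Toy comparison of two read collections: report k-mers (length-k
    # substrings) that occur in both. The reads below are invented.
    def shared_kmers(collection_a, collection_b, k):
        def kmers(reads):
            return {read[i:i + k] for read in reads
                    for i in range(len(read) - k + 1)}
        return kmers(collection_a) & kmers(collection_b)

    # Hypothetical reads: a "transcriptomic" read spanning a junction shares
    # k-mers with both genomic flanks.
    genomic        = ["TTACGGA", "GGTTCCA"]
    transcriptomic = ["ACGGAGGTT"]
    print(sorted(shared_kmers(genomic, transcriptomic, 4)))   # ['ACGG', 'CGGA', 'GGTT']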


BMC Bioinformatics | 2013

metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences

Christina Ander; Ole Schulz-Trieglaff; Jens Stoye; Anthony J. Cox

Environmental shotgun sequencing (ESS) has potential to give greater insight into microbial communities than targeted sequencing of 16S regions, but requires much higher sequence coverage. The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate ESS data on a large scale, but computationally efficient methods for analysing such data sets are needed.

Here we present metaBEETL, a fast taxonomic classifier for environmental shotgun sequences. It uses a Burrows-Wheeler Transform (BWT) index of the sequencing reads and an indexed database of microbial reference sequences. Unlike other BWT-based tools, our method has no upper limit on the number or the total size of the reference sequences in its database. By capturing sequence relationships between strains, our reference index also allows us to classify reads which are not unique to an individual strain but are nevertheless specific to some higher phylogenetic order.

Tested on datasets with known taxonomic composition, metaBEETL gave results that are competitive with existing similarity-based tools: due to normalization steps which other classifiers lack, the taxonomic profile computed by metaBEETL closely matched the true environmental profile. At the same time, its moderate running time and low memory footprint allow metaBEETL to scale well to large data sets.

Code to construct the BWT indexed database and for the taxonomic classification is part of the BEETL library, available as a github repository at git@github.com:BEETL/BEETL.git.
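
Classifying a read "to some higher phylogenetic order" when it is not strain-specific amounts to assigning it to the lowest common ancestor of the taxa it matches. The sketch below shows only that LCA step, on an invented miniature taxonomy with invented per-read match results; it does not reproduce metaBEETL's BWT machinery or its normalization.

    # Sketch of lowest-common-ancestor (LCA) read assignment: if a read
    # matches several reference strains, it is assigned to the deepest
    # taxon shared by all of them. Taxonomy and matches are invented.
    def lca(lineages):
        # Each lineage is a root-to-strain path of taxon names.
        common = []
        for ranks in zip(*lineages):
            if len(set(ranks)) == 1:
                common.append(ranks[0])
            else:
                break
        return common[-1] if common else "unclassified"

    lineage = {
        "E. coli K-12": ("Bacteria", "Proteobacteria", "Escherichia", "E. coli", "E. coli K-12"),
        "E. coli O157": ("Bacteria", "Proteobacteria", "Escherichia", "E. coli", "E. coli O157"),
        "S. enterica":  ("Bacteria", "Proteobacteria", "Salmonella", "S. enterica", "S. enterica LT2"),
    }

    # Invented per-read match results against the reference database.
    read_matches = {
        "read1": ["E. coli K-12"],                    # strain-specific
        "read2": ["E. coli K-12", "E. coli O157"],    # resolves to species level
        "read3": ["E. coli K-12", "S. enterica"],     # resolves only to phylum level here
    }

    for read, strains in read_matches.items():
        print(read, "->", lca([lineage[s] for s in strains]))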

Collaboration


Dive into Anthony J. Cox's collaborations.

Top Co-Authors

Zemin Ning (Wellcome Trust Sanger Institute)
