Geoffrey Paul Smith
Illumina
Publications
Featured research published by Geoffrey Paul Smith.
Nature Biotechnology | 2015
Vicki S. Chambers; Giovanni Marsico; Jonathan Mark Boutell; Marco Di Antonio; Geoffrey Paul Smith; Shankar Balasubramanian
G-quadruplexes (G4s) are nucleic acid secondary structures that form within guanine-rich DNA or RNA sequences. G4 formation can affect chromatin architecture and gene regulation and has been associated with genomic instability, genetic diseases and cancer progression. Here we present a high-resolution sequencing–based method to detect G4s in the human genome. We identified 716,310 distinct G4 structures, 451,646 of which were not predicted by computational methods. These included previously uncharacterized noncanonical long loop and bulged structures. We observed a high G4 density in functional regions, such as 5′ untranslated regions and splicing sites, as well as in genes previously not predicted to contain these structures (such as BRCA2). G4 formation was significantly associated with oncogenes, tumor suppressors and somatic copy number alterations related to cancer development. The G4s identified in this study may therefore represent promising targets for cancer intervention.
Cell | 2012
Elizabeth P. Murchison; Ole Schulz-Trieglaff; Zemin Ning; Ludmil B. Alexandrov; Markus J. Bauer; Beiyuan Fu; Matthew M. Hims; Zhihao Ding; Sergii Ivakhno; Caitlin Stewart; Bee Ling Ng; Wendy Wong; Bronwen Aken; Simon White; Amber E. Alsop; Jennifer Becq; Graham R. Bignell; R. Keira Cheetham; William Cheng; Thomas Richard Connor; Anthony J. Cox; Zhi-Ping Feng; Yong Gu; Russell Grocock; Simon R. Harris; Irina Khrebtukova; Zoya Kingsbury; Mark Kowarsky; Alexandre Kreiss; Shujun Luo
Summary The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.
JAMA | 2013
Nicholas J. Loman; Chrystala Constantinidou; Martin Christner; Holger Rohde; Jacqueline Chan; Joshua Quick; Jacqueline C. Weir; Christopher Quince; Geoffrey Paul Smith; Jason Richard Betley; Martin Aepfelbacher; Mark J. Pallen
IMPORTANCE Identification of the bacterium responsible for an outbreak can aid in disease management. However, traditional culture-based diagnosis can be difficult, particularly if no specific diagnostic test is available for an outbreak strain.
OBJECTIVE To explore the potential of metagenomics, which is the direct sequencing of DNA extracted from microbiologically complex samples, as an open-ended clinical discovery platform capable of identifying and characterizing bacterial strains from an outbreak without laboratory culture.
DESIGN, SETTING, AND PATIENTS In a retrospective investigation, 45 samples were selected from fecal specimens obtained from patients with diarrhea during the 2011 outbreak of Shiga-toxigenic Escherichia coli (STEC) O104:H4 in Germany. Samples were subjected to high-throughput sequencing (August-September 2012), followed by a 3-phase analysis (November 2012-February 2013). In phase 1, a de novo assembly approach was developed to obtain a draft genome of the outbreak strain. In phase 2, the depth of coverage of the outbreak strain genome was determined in each sample. In phase 3, sequences from each sample were compared with sequences from known bacteria to identify pathogens other than the outbreak strain.
MAIN OUTCOMES AND MEASURES The recovery of genome sequence data for the purposes of identification and characterization of the outbreak strain and other pathogens from fecal samples.
RESULTS During phase 1, a draft genome of the STEC outbreak strain was obtained. During phase 2, the outbreak strain genome was recovered from 10 samples at greater than 10-fold coverage and from 26 samples at greater than 1-fold coverage. Sequences from the Shiga-toxin genes were detected in 27 of 40 STEC-positive samples (67%). In phase 3, sequences from Clostridium difficile, Campylobacter jejuni, Campylobacter concisus, and Salmonella enterica were recovered.
CONCLUSIONS AND RELEVANCE These results suggest the potential of metagenomics as a culture-independent approach for the identification of bacterial pathogens during an outbreak of diarrheal disease. Challenges include improving diagnostic sensitivity, speeding up and simplifying workflows, and reducing costs.
Nature Biotechnology | 2011
Hamidreza Chitsaz; Joyclyn Yee-Greenbaum; Glenn Tesler; Mary-Jane Lombardo; Christopher L. Dupont; Jonathan H. Badger; Mark Novotny; Douglas B. Rusch; Louise Fraser; Niall Anthony Gormley; Ole Schulz-Trieglaff; Geoffrey Paul Smith; Dirk Evers; Pavel A. Pevzner; Roger S. Lasken
Whole genome amplification by the multiple displacement amplification (MDA) method allows sequencing of DNA from single cells of bacteria that cannot be cultured. Assembling a genome is challenging, however, because MDA generates highly nonuniform coverage of the genome. Here we describe an algorithm tailored for short-read data from single cells that improves assembly through the use of a progressively increasing coverage cutoff. Assembly of reads from single Escherichia coli and Staphylococcus aureus cells captures >91% of genes within contigs, approaching the 95% captured from an assembly based on many E. coli cells. We apply this method to assemble a genome from a single cell of an uncultivated SAR324 clade of Deltaproteobacteria, a cosmopolitan bacterial lineage in the global ocean. Metabolic reconstruction suggests that SAR324 is aerobic, motile and chemotaxic. Our approach enables acquisition of genome assemblies for individual uncultivated bacteria using only short reads, providing cell-specific genetic information absent from metagenomic studies.
The New England Journal of Medicine | 2013
Claudio U. Köser; Josephine M. Bryant; Jennifer Becq; M. Estée Török; Matthew J. Ellington; Marc A. Marti-Renom; Andrew J. Carmichael; Julian Parkhill; Geoffrey Paul Smith; Sharon J. Peacock
As reported here, whole-genome sequencing has the potential to rapidly facilitate the determination of antimicrobial susceptibility, especially for slower-growing pathogens, such as Mycobacterium tuberculosis.
JAMA Internal Medicine | 2013
Sandra Reuter; Matthew J. Ellington; Edward J. P. Cartwright; Claudio U. Köser; M. Estée Török; Theodore Gouliouris; Simon R. Harris; Nick Brown; Matthew T. G. Holden; Michael A. Quail; Julian Parkhill; Geoffrey Paul Smith; Stephen D. Bentley; Sharon J. Peacock
IMPORTANCE The latest generation of benchtop DNA sequencing platforms can provide an accurate whole-genome sequence (WGS) for a broad range of bacteria in less than a day. These could be used to more effectively contain the spread of multidrug-resistant pathogens.
OBJECTIVE To compare WGS with standard clinical microbiology practice for the investigation of nosocomial outbreaks caused by multidrug-resistant bacteria, the identification of genetic determinants of antimicrobial resistance, and typing of other clinically important pathogens.
DESIGN, SETTING, AND PARTICIPANTS A laboratory-based study of hospital inpatients with a range of bacterial infections at Cambridge University Hospitals NHS Foundation Trust, a secondary and tertiary referral center in England, comparing WGS with standard diagnostic microbiology using stored bacterial isolates and clinical information.
MAIN OUTCOMES AND MEASURES Specimens were taken and processed as part of routine clinical care, and cultured isolates stored and referred for additional reference laboratory testing as necessary. Isolates underwent DNA extraction and library preparation prior to sequencing on the Illumina MiSeq platform. Bioinformatic analyses were performed by persons blinded to the clinical, epidemiologic, and antimicrobial susceptibility data.
RESULTS We investigated 2 putative nosocomial outbreaks, one caused by vancomycin-resistant Enterococcus faecium and the other by carbapenem-resistant Enterobacter cloacae; WGS accurately discriminated between outbreak and nonoutbreak isolates and was superior to conventional typing methods. We compared WGS with standard methods for the identification of the mechanism of carbapenem resistance in a range of gram-negative bacteria (Acinetobacter baumannii, E cloacae, Escherichia coli, and Klebsiella pneumoniae). This demonstrated concordance between phenotypic and genotypic results, and the ability to determine whether resistance was attributable to the presence of carbapenemases or other resistance mechanisms. Whole-genome sequencing was used to recapitulate reference laboratory typing of clinical isolates of Neisseria meningitidis and to provide extended phylogenetic analyses of these.
CONCLUSIONS AND RELEVANCE The speed, accuracy, and depth of information provided by WGS platforms to confirm or refute outbreaks in hospitals and the community, and to accurately define transmission of multidrug-resistant and other organisms, represent an important advance.
BMJ Open | 2013
Sandra Reuter; Timothy G. Harrison; Claudio U. Köser; Matthew J. Ellington; Geoffrey Paul Smith; Julian Parkhill; Sharon J. Peacock; Stephen D. Bentley; M. Estée Török
Objectives Epidemiological investigations of Legionnaires’ disease outbreaks rely on the rapid identification and typing of clinical and environmental Legionella isolates in order to identify and control the source of infection. Rapid bacterial whole-genome sequencing (WGS) is an emerging technology that has the potential to rapidly discriminate outbreak from non-outbreak isolates in a clinically relevant time frame.
Methods We performed a pilot study to determine the feasibility of using bacterial WGS to differentiate outbreak from non-outbreak isolates collected during an outbreak of Legionnaires’ disease. Seven Legionella isolates (three clinical and four environmental) were obtained from the reference laboratory and sequenced using the Illumina MiSeq platform at Addenbrooke's Hospital, Cambridge. Bioinformatic analysis was performed blinded to the epidemiological data at the Wellcome Trust Sanger Institute.
Results We were able to distinguish outbreak from non-outbreak isolates using bacterial WGS, and to confirm the probable environmental source. Our analysis also highlighted constraints, which were the small number of Legionella pneumophila isolates available for sequencing, and the limited number of published genomes for comparison.
Conclusions We have demonstrated the feasibility of using rapid WGS to investigate an outbreak of Legionnaires’ disease. Future work includes building larger genomic databases of L pneumophila from both clinical and environmental sources, developing automated data interpretation software, and conducting a cost–benefit analysis of WGS versus current typing methods.
Journal of Clinical Microbiology | 2013
Mili Estée Török; Sandra Reuter; Josephine M. Bryant; Claudio U. Köser; S. V. Stinchcombe; B. Nazareth; Matthew J. Ellington; Stephen D. Bentley; Geoffrey Paul Smith; Julian Parkhill; Sharon J. Peacock
ABSTRACT Two Southeast Asian students attending the same school in the United Kingdom presented with pulmonary tuberculosis. An epidemiological investigation failed to link the two cases, and drug resistance profiles of the Mycobacterium tuberculosis isolates were discrepant. Whole-genome sequencing of the isolates found them to be genetically identical, suggesting a missed transmission event.
Journal of Bacteriology | 2012
Jade C. S. Chung; Jennifer Becq; Louise Fraser; Ole Schulz-Trieglaff; Nicholas J. Bond; Juliet Foweraker; Kenneth D. Bruce; Geoffrey Paul Smith; Martin Welch
The airways of individuals with cystic fibrosis (CF) often become chronically infected with unique strains of the opportunistic pathogen Pseudomonas aeruginosa. Several lines of evidence suggest that the infecting P. aeruginosa lineage diversifies in the CF lung niche, yet so far this contemporary diversity has not been investigated at a genomic level. In this work, we sequenced the genomes of pairs of randomly selected contemporary isolates sampled from the expectorated sputum of three chronically infected adult CF patients. Each patient was infected by a distinct strain of P. aeruginosa. Single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) were identified in the DNA common to the paired isolates from different patients. The paired isolates from one patient differed by just 1 SNP and 8 indels. The paired isolates from a second patient differed by 54 SNPs and 38 indels. The pair of isolates from the third patient both contained a mutS mutation, which conferred a hypermutator phenotype; these isolates cumulatively differed by 344 SNPs and 93 indels. In two of the pairs of isolates, a different accessory genome composition, specifically integrated prophage, was identified in one but not the other isolate of each pair. We conclude that contemporary isolates from a single sputum sample can differ at the SNP, indel, and accessory genome levels and that the cross-sectional genomic variation among coeval pairs of P. aeruginosa CF isolates can be comparable to the variation previously reported to differentiate between paired longitudinally sampled isolates.
Cancer Letters | 2012
Olaf Hardt; Stefan Wild; Ilka Oerlecke; Kay Hofmann; Shujun Luo; Yvonne Wiencek; Eva Kantelhardt; Christoph Vess; Geoffrey Paul Smith; Gary P. Schroth; Andreas Bosio; Jürgen Dittmer
We performed next-generation sequencing- and microarray-based gene expression profiling of CD44(+)/CD24(-)/CD45(-) breast CSCs (cancer stem cells) isolated from primary ERα-positive breast cancer. By combining semi-automated dissociation of human tumor tissue, magnetic cell sorting and cDNA amplification, fewer than 500 CSCs were required for transcriptome analyses. Besides overexpressing genes involved in maintenance of stemness, the CSCs showed higher levels of genes that drive the PI3K pathway, including EGFR, HB-EGF, PDGFRA/B, PDGF, MET, PIK3CA, PIK3R1 and PIK3R2. This suggests that, in CSCs of ERα-positive breast cancer, the PI3K pathway, which is involved in endocrine resistance, is hyperactive.