
Paul Davis

Wellcome Trust Sanger Institute



Publication

Featured research published by Paul Davis.

Nature | 2005

The genome of the social amoeba Dictyostelium discoideum

Ludwig Eichinger; J. A. Pachebat; G. Glöckner; Marie-Adele Rajandream; Richard Sucgang; Matthew Berriman; J. Song; Rolf Olsen; Karol Szafranski; Qikai Xu; Budi Tunggal; Sarah K. Kummerfeld; B. A. Konfortov; Francisco Rivero; Alan Thomas Bankier; R. Lehmann; N. Hamlin; Robert Davies; Pascale Gaudet; Petra Fey; Karen E Pilcher; Guokai Chen; David L. Saunders; Erica Sodergren; Paul Davis; Arnaud Kerhornou; X. Nie; Neil Hall; Christophe Anjard; Lisa Hemphill

The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal–fungal lineage after the plant–animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.

Nature Genetics | 2006

The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome.

Mohammed Sebaihia; Brendan W. Wren; Peter Mullany; Neil Fairweather; Nigel P. Minton; Richard A. Stabler; Nicholas R. Thomson; Adam P. Roberts; Ana Cerdeño-Tárraga; Hongmei Wang; Matthew T. G. Holden; Anne Wright; Carol Churcher; Michael A. Quail; Stephen Baker; Nathalie Bason; Karen Brooks; Tracey Chillingworth; Ann Cronin; Paul Davis; Linda Dowd; Audrey Fraser; Theresa Feltwell; Zahra Hance; S. Holroyd; Kay Jagels; Sharon Moule; Karen Mungall; Claire Price; Ester Rabbinowitsch

We determined the complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain. Our analysis indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons. These mobile elements are putatively responsible for the acquisition by C. difficile of an extensive array of genes involved in antimicrobial resistance, virulence, host interaction and the production of surface structures. The metabolic capabilities encoded in the genome show multiple adaptations for survival and growth within the gut environment. The extreme genome variability was confirmed by whole-genome microarray analysis; it may reflect the organism's niche in the gut and should provide information on the evolution of virulence in this organism.

Genome Biology | 2006

The genome of Rhizobium leguminosarum has recognizable core and accessory components

J. Peter W. Young; Lisa Crossman; Andrew W. B. Johnston; Nicholas R. Thomson; Zara F. Ghazoui; Katherine H Hull; Margaret Wexler; Andrew R. J. Curson; Jonathan D. Todd; Philip S. Poole; Tim H. Mauchline; Alison K. East; Michael A. Quail; Carol Churcher; Claire Arrowsmith; Inna Cherevach; Tracey Chillingworth; Kay Clarke; Ann Cronin; Paul Davis; Audrey Fraser; Zahra Hance; Heidi Hauser; Kay Jagels; Sharon Moule; Karen Mungall; Halina Norbertczak; Ester Rabbinowitsch; Mandy Sanders; Mark Simmonds

Background: Rhizobium leguminosarum is an α-proteobacterial N2-fixing symbiont of legumes that has been the subject of more than a thousand publications. Genes for the symbiotic interaction with plants are well studied, but the adaptations that allow survival and growth in the soil environment are poorly understood. We have sequenced the genome of R. leguminosarum biovar viciae strain 3841.

Results: The 7.75 Mb genome comprises a circular chromosome and six circular plasmids, with 61% G+C overall. All three rRNA operons and 52 tRNA genes are on the chromosome; essential protein-encoding genes are largely chromosomal, but most functional classes occur on plasmids as well. Of the 7,263 protein-encoding genes, 2,056 had orthologs in each of three related genomes (Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti), and these genes were over-represented in the chromosome and had above average G+C. Most supported the rRNA-based phylogeny, confirming A. tumefaciens to be the closest among these relatives, but 347 genes were incompatible with this phylogeny; these were scattered throughout the genome but were over-represented on the plasmids. An unexpectedly large number of genes were shared by all three rhizobia but were missing from A. tumefaciens.

Conclusion: Overall, the genome can be considered to have two main components: a core, which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an accessory component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands. The accessory genome has a different nucleotide composition from the core despite a long history of coexistence.

Genome Research | 2008

Insights from the complete genome sequence of Mycobacterium marinum on the evolution of Mycobacterium tuberculosis

Timothy P. Stinear; Torsten Seemann; Paul F. Harrison; Grant A. Jenkin; John K. Davies; Paul D. R. Johnson; Zahra Abdellah; Claire Arrowsmith; Tracey Chillingworth; Carol Churcher; Kay Clarke; Ann Cronin; Paul Davis; Ian Goodhead; Nancy Holroyd; Kay Jagels; Angela Lord; Sharon Moule; Karen Mungall; Halina Norbertczak; Michael A. Quail; Ester Rabbinowitsch; Danielle Walker; Brian R. White; Sally Whitehead; Pamela L. C. Small; Roland Brosch; Lalita Ramakrishnan; Michael A. Fischbach; Julian Parkhill

Mycobacterium marinum, a ubiquitous pathogen of fish and amphibia, is a near relative of Mycobacterium tuberculosis, the etiologic agent of tuberculosis in humans. The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid. Prominent features are the very large number of genes (57) encoding polyketide synthases (PKSs) and nonribosomal peptide synthases (NRPSs) and the most extensive repertoire yet reported of the mycobacteria-restricted PE and PPE proteins, and related-ESX secretion systems. Some of the NRPS genes comprise a novel family and seem to have been acquired horizontally. M. marinum is used widely as a model organism to study M. tuberculosis pathogenesis, and genome comparisons confirmed the close genetic relationship between these two species, as they share 3000 orthologs with an average amino acid identity of 85%. Comparisons with the more distantly related Mycobacterium avium subspecies paratuberculosis and Mycobacterium smegmatis reveal how an ancestral generalist mycobacterium evolved into M. tuberculosis and M. marinum. M. tuberculosis has undergone genome downsizing and extensive lateral gene transfer to become a specialized pathogen of humans and other primates without retaining an environmental niche. M. marinum has maintained a large genome so as to retain the capacity for environmental survival while becoming a broad host range pathogen that produces disease strikingly similar to M. tuberculosis. The work described herein provides a foundation for using M. marinum to better understand the determinants of pathogenesis of tuberculosis.

Nature | 2002

Sequence of Plasmodium falciparum chromosomes 1, 3–9 and 13

Neil Hall; Arnab Pain; Matthew Berriman; Carol Churcher; Barbara Harris; David Harris; Karen Mungall; Sharen Bowman; Rebecca Atkin; Stephen Baker; Andy Barron; Karen Brooks; Caroline O. Buckee; C. Burrows; Inna Cherevach; Tracey Chillingworth; Z. Christodoulou; Louise Clark; Richard Clark; Craig Corton; Ann Cronin; Robert Davies; Paul Davis; P. Dear; F. Dearden; Jonathon Doggett; Theresa Feltwell; Arlette Goble; Ian Goodhead; R. Gwilliam

Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3–9 and 13 of P. falciparum clone 3D7—these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.

BMC Bioinformatics | 2012

Automatic categorization of diverse experimental information in the bioscience literature

Ruihua Fang; Gary Schindelman; Kimberly Van Auken; Jolene S. Fernandes; Wen Chen; Xiaodong Wang; Paul Davis; Mary Ann Tuli; Steven J. Marygold; Gillian Millburn; Beverley B. Matthews; Haiyan Zhang; Nicholas H. Brown; William M. Gelbart; Paul W. Sternberg

Background: Curation of information from bioscience literature into biological knowledge databases is a crucial way of capturing experimental information in a computable form. During the biocuration process, a critical first step is to identify, from all published literature, the papers that contain results for a specific data type the curator is interested in annotating. This step normally requires curators to manually examine many papers to ascertain which few contain information of interest, and is thus usually time-consuming. We developed an automatic method for identifying papers containing these curation data types among a large pool of published scientific papers, based on the machine learning method Support Vector Machine (SVM). This classification system is completely automatic and can be readily applied to diverse experimental data types. It has been in use in production for automatic categorization of 10 different experimental data types in the biocuration process at WormBase for the past two years, and it is in the process of being adopted in the biocuration process at FlyBase and the Saccharomyces Genome Database (SGD). We anticipate that this method can be readily adopted by various databases in the biocuration community and thereby greatly reduce the time spent on an otherwise laborious and demanding task. We also developed a simple, readily automated procedure to utilize training papers of similar data types from different bodies of literature, such as C. elegans and D. melanogaster, to identify papers with any of these data types for a single database. This approach has great significance because for some data types, especially those of low occurrence, a single corpus often does not have enough training papers to achieve satisfactory performance.

Results: We successfully tested the method on ten data types from WormBase, fifteen data types from FlyBase and three data types from Mouse Genome Informatics (MGI). It is being used in the curation workflow at WormBase for automatic association of newly published papers with ten data types including RNAi, antibody, phenotype, gene regulation, mutant allele sequence, gene expression, gene product interaction, overexpression phenotype, gene interaction, and gene structure correction.

Conclusions: Our methods are applicable to a variety of data types with training sets containing several hundred to a few thousand documents. The approach is completely automatic and thus can be readily incorporated into different workflows at different literature-based databases. We believe that the work presented here can contribute greatly to the tremendous task of automating the important yet labor-intensive biocuration effort.

Archive | 2012