Glenn Proctor
European Bioinformatics Institute
Publication
Featured research published by Glenn Proctor.
Database | 2011
Rhoda Kinsella; Andreas Kähäri; Syed Haider; Jorge Zamora; Glenn Proctor; Giulietta Spudich; Jeff Almeida-King; Daniel M. Staines; Paul S. Derwent; Arnaud Kerhornou; Paul Kersey; Paul Flicek
For a number of years the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The launch of the Ensembl Genomes project in 2009 complemented the Ensembl project by utilizing the same visualization, interactive and programming tools to provide users with a means for accessing genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. The Ensembl and Ensembl Genomes BioMarts provide a point of access to the high-quality gene annotation, variation data, functional and regulatory annotation and evolutionary relationships from genomes spanning the taxonomic space. This article aims to give a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts as well as some useful examples and a description of current data content and future objectives. Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/
Nucleic Acids Research | 2010
Paul Flicek; Bronwen Aken; Benoit Ballester; Kathryn Beal; Eugene Bragin; Simon Brent; Yuan Chen; Peter Clapham; Guy Coates; Susan Fairley; Stephen Fitzgerald; Julio Fernandez-Banet; Leo Gordon; Stefan Gräf; Syed Haider; Martin Hammond; Kerstin Howe; Andrew M. Jenkinson; Nathan Johnson; Andreas Kähäri; Damian Keefe; Stephen Keenan; Rhoda Kinsella; Felix Kokocinski; Gautier Koscielny; Eugene Kulesha; Daniel Lawson; Ian Longden; Tim Massingham; William M. McLaren
Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.
Nucleic Acids Research | 2010
Paul J. Kersey; Daniel John Lawson; Ewan Birney; Paul S. Derwent; Matthias Haimel; Javier Herrero; Stephen Keenan; Arnaud Kerhornou; Gautier Koscielny; Andreas Kähäri; Rhoda Kinsella; Eugene Kulesha; Uma Maheswari; Karine Megy; Michael Nuhn; Glenn Proctor; Daniel M. Staines; Franck Valentin; Albert J. Vilella; Andy Yates
Ensembl Genomes (http://www.ensemblgenomes.org) is a new portal offering integrated access to genome-scale data from non-vertebrate species of scientific interest, developed using the Ensembl genome annotation and visualisation platform. Ensembl Genomes consists of five sub-portals (for bacteria, protists, fungi, plants and invertebrate metazoa) designed to complement the availability of vertebrate genomes in Ensembl. Many of the databases supporting the portal have been built in close collaboration with the scientific community, which we consider as essential for maintaining the accuracy and usefulness of the resource. A common set of user interfaces (which include a graphical genome browser, FTP, BLAST search, a query optimised data warehouse, programmatic access, and a Perl API) is provided for all domains. Data types incorporated include annotation of (protein and non-protein coding) genes, cross references to external resources, and high throughput experimental data (e.g. data from large scale studies of gene expression and polymorphism visualised in their genomic context). Additionally, extensive comparative analysis has been performed, both within defined clades and across the wider taxonomy, and sequence alignments and gene trees resulting from this can be accessed through the site.
Genome Medicine | 2010
Raymond Dalgleish; Paul Flicek; Fiona Cunningham; Alex Astashyn; Raymond E. Tully; Glenn Proctor; Yuan Chen; William M. McLaren; Pontus Larsson; Brendan Vaughan; Christophe Béroud; Glen Dobson; Heikki Lehväslaiho; Peter E.M. Taschner; Johan T. den Dunnen; Andrew Devereau; Ewan Birney; Anthony J. Brookes; Donna Maglott
As our knowledge of the complexity of gene architecture grows, and we increase our understanding of the subtleties of gene expression, the process of accurately describing disease-causing gene variants has become increasingly problematic. In part, this is due to current reference DNA sequence formats that do not fully meet present needs. Here we present the Locus Reference Genomic (LRG) sequence format, which has been designed for the specific purpose of gene variant reporting. The format builds on the successful National Center for Biotechnology Information (NCBI) RefSeqGene project and provides a single-file record containing a uniquely stable reference DNA sequence along with all relevant transcript and protein sequences essential to the description of gene variants. In principle, LRGs can be created for any organism, not just human. In addition, we recognize the need to respect legacy numbering systems for exons and amino acids and the LRG format takes account of these. We hope that widespread adoption of LRGs - which will be created and maintained by the NCBI and the European Bioinformatics Institute (EBI) - along with consistent use of the Human Genome Variation Society (HGVS)-approved variant nomenclature will reduce errors in the reporting of variants in the literature and improve communication about variants affecting human health. Further information can be found on the LRG web site: http://www.lrg-sequence.org.
Mammalian Genome | 2007
John M. Hancock; Niels C. Adams; Vassilis Aidinis; Andrew Blake; Molly Bogue; Steve D.M. Brown; Elissa J. Chesler; Duncan Davidson; Christopher Duran; Janan T. Eppig; Valérie Gailus-Durner; Hilary Gates; Georgios V. Gkoutos; Simon Greenaway; Martin Hrabé de Angelis; George Kollias; Sophie Leblanc; Kirsty Lee; Christoph Lengger; Holger Maier; Ann-Marie Mallon; Hiroshi Masuya; David Melvin; Werner Müller; Helen Parkinson; Glenn Proctor; Eli Reuveni; Paul N. Schofield; Aadya Shukla; Cynthia L. Smith
Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this it will be essential to be able to characterize the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first-line phenotyping data on novel mutations, data on the normal features of inbred lines) at many sites worldwide. For the most efficient use of these data sets, we have initiated a process to develop standards for the description of phenotypes (using ontologies) and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing and needs to be supported by the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.
Briefings in Bioinformatics | 2008
Damian Smedley; Morris A. Swertz; Katy Wolstencroft; Glenn Proctor; Michael Zouberakis; Jonathan Bard; John M. Hancock; Paul N. Schofield
The torrent of data emerging from the application of new technologies to functional genomics and systems biology can no longer be contained within the traditional modes of data sharing and publication with the consequence that data is being deposited in, distributed across and disseminated through an increasing number of databases. The resulting fragmentation poses serious problems for the model organism community which increasingly rely on data mining and computational approaches that require gathering of data from a range of sources. In the light of these problems, the European Commission has funded a coordination action, CASIMIR (coordination and sustainability of international mouse informatics resources), with a remit to assess the technical and social aspects of database interoperability that currently prevent the full realization of the potential of data integration in mouse functional genomics. In this article, we assess the current problems with interoperability, with particular reference to mouse functional genomics, and critically review the technologies that can be deployed to overcome them. We describe a typical use-case where an investigator wishes to gather data on variation, genomic context and metabolic pathway involvement for genes discovered in a genome-wide screen. We go on to develop an automated approach involving an in silico experimental workflow tool, Taverna, using web services, BioMart and MOLGENIS technologies for data retrieval. Finally, we focus on the current impediments to adopting such an approach in a wider context, and strategies to overcome them.
BMC Genomics | 2010
Benoit Ballester; Nathan Johnson; Glenn Proctor; Paul Flicek
Background: Gene expression arrays are valuable and widely used tools for biomedical research. Today's commercial arrays attempt to measure the expression level of all of the genes in the genome. Effectively translating the results from the microarray into a biological interpretation requires an accurate mapping between the probesets on the array and the genes that they are targeting. Although major array manufacturers provide annotations of their gene expression arrays, the methods used by various manufacturers are different and the annotations are difficult to keep up to date in the rapidly changing world of biological sequence databases. Results: We have created a consistent microarray annotation protocol applicable to all of the major array manufacturers. We constantly keep our annotations updated with the latest Ensembl Gene predictions, and thus cross-referenced with a large number of external biomedical sequence database identifiers. We show that these annotations are accurate and address in detail reasons for the minority of probesets that cannot be annotated. Annotations are publicly accessible through the Ensembl Genome Browser and programmatically through the Ensembl Application Programming Interface. They are also seamlessly integrated into the BioMart data-mining tool and the biomaRt package of BioConductor. Conclusions: Consistent, accurate and updated gene expression array annotations remain critical for biological research. Our annotations facilitate accurate biological interpretation of gene expression profiles.
Mammalian Genome | 2008
John M. Hancock; Niels C. Adams; Vassilis Aidinis; Andrew Blake; Judith A. Blake; Molly Bogue; Steve D.M. Brown; Elissa J. Chesler; Duncan Davidson; Christopher Duran; Janan T. Eppig; Valérie Gailus-Durner; Hilary Gates; Georgios V. Gkoutos; Simon Greenaway; Martin Hrabé de Angelis; George Kollias; Sophie Leblanc; Kirsty Lee; Christoph Lengger; Holger Maier; Ann-Marie Mallon; Hiroshi Masuya; David Melvin; Werner Müller; Helen Parkinson; Glenn Proctor; Eli Reuveni; Paul N. Schofield; Aadya Shukla
Understanding the functions encoded in the mouse genome will be central to an understanding of the genetic basis of human disease. To achieve this it will be essential to be able to characterise the phenotypic consequences of variation and alterations in individual genes. Data on the phenotypes of mouse strains are currently held in a number of different forms (detailed descriptions of mouse lines, first line phenotyping data on novel mutations, data on the normal features of inbred lines, etc.) at many sites worldwide. For the most efficient use of these data sets, we have set in train a process to develop standards for the description of phenotypes (using ontologies), and file formats for the description of phenotyping protocols and phenotype data sets. This process is ongoing, and needs to be supported by the wider mouse genetics and phenotyping communities to succeed. We invite interested parties to contact us as we develop this process further.
Database | 2017
Magali Ruffier; Andreas Kähäri; Monika Komorowska; Stephen Keenan; Matthew R. Laird; Ian Longden; Glenn Proctor; Steve Searle; Daniel M. Staines; Kieron Taylor; Alessandro Vullo; Andrew Yates; Daniel R. Zerbino; Paul Flicek
The Ensembl software resources are a stable infrastructure to store, access and manipulate genome assemblies and their functional annotations. The Ensembl ‘Core’ database and Application Programming Interface (API) was our first major piece of software infrastructure and remains at the centre of all of our genome resources. Since its initial design more than fifteen years ago, the number of publicly available genomic, transcriptomic and proteomic datasets has grown enormously, accelerated by continuous advances in DNA-sequencing technology. Initially intended to provide annotation for the reference human genome, we have extended our framework to support the genomes of all species as well as richer assembly models. Cross-referenced links to other informatics resources facilitate searching our database with a variety of popular identifiers such as UniProt and RefSeq. Our comprehensive and robust framework storing a large diversity of genome annotations in one location serves as a platform for other groups to generate and maintain their own tailored annotation. We welcome reuse and contributions: our databases and APIs are publicly available, all of our source code is released with a permissive Apache v2.0 licence at http://github.com/Ensembl and we have an active developer mailing list (http://www.ensembl.org/info/about/contact/index.html). Database URL: http://www.ensembl.org
Genome Research | 2004
Ewan Birney; T. Daniel Andrews; Paul Bevan; Mario Caccamo; Yuan Chen; Laura Clarke; Guy Coates; James Cuff; Val Curwen; Tim Cutts; Thomas A. Down; Eduardo Eyras; Xosé M. Fernández-Suárez; Paul Gane; Brian Gibbins; James Gilbert; Martin Hammond; Hans-Rudolf Hotz; Vivek Iyer; Kerstin Jekosch; Andreas Kähäri; Arek Kasprzyk; Damian Keefe; Stephen Keenan; Heikki Lehväslaiho; Graham McVicker; Craig Melsopp; Patrick Meidl; Emmanuel Mongin; Roger Pettett