Publication


Featured research published by Corin Yeats.


Nucleic Acids Research | 2009

InterPro: the integrative protein signature database

Sarah Hunter; Rolf Apweiler; Teresa K. Attwood; Amos Marc Bairoch; Alex Bateman; David Binns; Peer Bork; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D. Finn; Julian Gough; Daniel H. Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Aurélie Laugraud; Ivica Letunic; David M. Lonsdale; Rodrigo Lopez; John Maslen; Craig McAnulla; Jennifer McDowall; Jaina Mistry; Alex L. Mitchell; Nicola Mulder; Darren A. Natale; Christine A. Orengo; Antony F. Quinn; Jeremy D. Selengut

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).


Nucleic Acids Research | 2012

InterPro in 2011: new developments in the family and domain prediction database

Sarah Hunter; P. D. Jones; Alex L. Mitchell; Rolf Apweiler; Teresa K. Attwood; Alex Bateman; Thomas Bernard; David Binns; Peer Bork; Sarah W. Burge; Edouard de Castro; Penny Coggill; Matthew Corbett; Ujjwal Das; Louise Daugherty; Lauranne Duquenne; Robert D. Finn; Matthew Fraser; Julian Gough; Daniel H. Haft; Nicolas Hulo; Daniel Kahn; Elizabeth Kelly; Ivica Letunic; David M. Lonsdale; Rodrigo Lopez; John Maslen; Craig McAnulla; Jennifer McDowall; Conor McMenamin

InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.


Nucleic Acids Research | 2007

New developments in the InterPro database

Nicola Mulder; Rolf Apweiler; Teresa K. Attwood; Amos Marc Bairoch; Alex Bateman; David Binns; Peer Bork; Virginie Buillard; Lorenzo Cerutti; Richard R. Copley; Emmanuel Courcelle; Ujjwal Das; Louise Daugherty; Mark Dibley; Robert D. Finn; Wolfgang Fleischmann; Julian Gough; Daniel H. Haft; Nicolas Hulo; Sarah Hunter; Daniel Kahn; Alexander Kanapin; Anish Kejariwal; Alberto Labarga; Petra S. Langendijk-Genevaux; David M. Lonsdale; Rodrigo Lopez; Ivica Letunic; John Maslen; Craig McAnulla

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.


Nucleic Acids Research | 2007

The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

Lesley H. Greene; Tony E. Lewis; Sarah Addou; Alison L. Cuff; Timothy Dallman; Mark Dibley; Oliver Redfern; Frances M. G. Pearl; Rekha Nambudiry; Adam J. Reid; Ian Sillitoe; Corin Yeats; Janet M. Thornton; Christine A. Orengo

We report the latest release (version 3.0) of the CATH protein domain database. There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps that have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt.


Proceedings of the National Academy of Sciences of the United States of America | 2007

The implications of alternative splicing in the ENCODE protein complement.

Michael L. Tress; Pier Luigi Martelli; Adam Frankish; Gabrielle A. Reeves; Jan Jaap Wesselink; Corin Yeats; Páll Ísólfur Ólason; Mario Albrecht; Hedi Hegyi; Alejandro Giorgetti; Domenico Raimondo; Julien Lagarde; Roman A. Laskowski; Gonzalo López; Michael I. Sadowski; James D. Watson; Piero Fariselli; Ivan Rossi; Alinda Nagy; Wang Kai; Zenia M Størling; Massimiliano Orsini; Yassen Assenov; Hagen Blankenburg; Carola Huthmacher; Fidel Ramírez; Andreas Schlicker; P. D. Jones; Samuel Kerrien; Sandra Orchard

Alternative premessenger RNA splicing enables genes to generate more than one gene product. Splicing events that occur within protein coding regions have the potential to alter the biological function of the expressed protein and even to create new protein functions. Alternative splicing has been suggested as one explanation for the discrepancy between the number of human genes and functional complexity. Here, we carry out a detailed study of the alternatively spliced gene products annotated in the ENCODE pilot project. We find that alternative splicing in human genes is more frequent than has commonly been suggested, and we demonstrate that many of the potential alternative gene products will have markedly different structure and function from their constitutively spliced counterparts. For the vast majority of these alternative isoforms, little evidence exists to suggest they have a role as functional proteins, and it seems unlikely that the spectrum of conventional enzymatic or structural functions can be substantially extended through alternative splicing.


The Lancet | 2003

Sequencing and analysis of the genome of the Whipple's disease bacterium Tropheryma whipplei

Stephen D. Bentley; Matthias Maiwald; Lee Murphy; Mark J. Pallen; Corin Yeats; Lynn G. Dover; Halina Norbertczak; Gurdyal S. Besra; Michael A. Quail; David Harris; Axel von Herbay; Arlette Goble; Simon Rutter; R. Squares; Stephen Squares; Bart Barrell; Julian Parkhill; David A. Relman

BACKGROUND Whipple's disease is a rare multisystem chronic infection, involving the intestinal tract as well as various other organs. The causative agent, Tropheryma whipplei, is a Gram-positive bacterium about which little is known. Our aim was to investigate the biology of this organism by generating and analysing the complete DNA sequence of its genome. METHODS We isolated and propagated T whipplei strain TW08/27 from the cerebrospinal fluid of a patient diagnosed with Whipple's disease. We generated the complete sequence of the genome by the whole genome shotgun method, and analysed it with a combination of automatic and manual bioinformatic techniques. FINDINGS Sequencing revealed a condensed 925938 bp genome with a lack of key biosynthetic pathways and a reduced capacity for energy metabolism. A family of large surface proteins was identified, some associated with large amounts of non-coding repetitive DNA, and an unexpected degree of sequence variation. INTERPRETATION The genome reduction and lack of metabolic capabilities point to a host-restricted lifestyle for the organism. The sequence variation indicates both known and novel mechanisms for the elaboration and variation of surface structures, and suggests that immune evasion and host interaction play an important part in the lifestyle of this persistent bacterial pathogen.


Trends in Biochemical Sciences | 2002

The PASTA domain: a β-lactam-binding domain

Corin Yeats; Robert D. Finn; Alex Bateman

The PASTA domain (for penicillin-binding protein and serine/threonine kinase associated domain) is found in the high molecular weight penicillin-binding proteins and eukaryotic-like serine/threonine kinases of a range of pathogens. We describe this previously uncharacterized domain and infer that it binds β-lactam antibiotics and their peptidoglycan analogues. We postulate that PknB-like kinases are key regulators of cell-wall biosynthesis. The essential function of these enzymes suggests an additional pathway for the action of β-lactam antibiotics.


Nucleic Acids Research | 2012

New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures

Ian Sillitoe; Alison L. Cuff; Benoit H. Dessailly; Natalie L. Dawson; Nicholas Furnham; David A. Lee; Jonathan G. Lees; Tony E. Lewis; Romain A. Studer; Robert Rentzsch; Corin Yeats; Janet M. Thornton; Christine A. Orengo

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.


BMC Microbiology | 2003

New Knowledge from Old: In silico discovery of novel protein domains in Streptomyces coelicolor

Corin Yeats; Stephen D. Bentley; Alex Bateman

Background: Streptomyces coelicolor has long been considered a remarkable bacterium with a complex life-cycle, ubiquitous environmental distribution, linear chromosomes and plasmids, and a huge range of pharmaceutically useful secondary metabolites. Completion of the genome sequence demonstrated that this diversity carried through to the genetic level, with over 7000 genes identified. We sought to expand our understanding of this organism at the molecular level through identification and annotation of novel protein domains. Protein domains are the evolutionary conserved units from which proteins are formed. Results: Two automated methods were employed to rapidly generate an optimised set of targets, which were subsequently analysed manually. A final set of 37 domains or structural repeats, represented 204 times in the genome, was developed. Using these families enabled us to correlate items of information from many different resources. Several immediately enhance our understanding both of S. coelicolor and also general bacterial molecular mechanisms, including cell wall biosynthesis regulation and streptomycete telomere maintenance. Discussion: Delineation of protein domain families enables detailed analysis of protein function, as well as identification of likely regions or residues of particular interest. Hence this kind of prior approach can increase the rate of discovery in the laboratory. Furthermore we demonstrate that using this type of in silico method it is possible to fairly rapidly generate new biological information from previously uncorrelated data.


Nucleic Acids Research | 2007

Gene3D: comprehensive structural and functional annotation of genomes

Corin Yeats; Jonathan G. Lees; Adam James Reid; Paul Kellam; Nigel J. Martin; Xinhui Liu; Christine A. Orengo

Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk/

Collaboration


Dive into Corin Yeats's collaborations.

Top Co-Authors

Alex Bateman

European Bioinformatics Institute

Ian Sillitoe

University College London

Stephen D. Bentley

Wellcome Trust Sanger Institute

Adam J. Reid

Wellcome Trust Sanger Institute

David A. Lee

Queen Mary University of London

Alison L. Cuff

University College London
