Alison L. Cuff
University College London
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Alison L. Cuff.
Nucleic Acids Research | 2007
Lesley H. Greene; Tony E. Lewis; Sarah Addou; Alison L. Cuff; Timothy Dallman; Mark Dibley; Oliver Redfern; Frances M. G. Pearl; Rekha Nambudiry; Adam J. Reid; Ian Sillitoe; Corin Yeats; Janet M. Thornton; Christine A. Orengo
We report the latest release (version 3.0) of the CATH protein domain database (). There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps, that have been made easier by the provision of information rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt.
Nucleic Acids Research | 2015
Ian Sillitoe; Tony E. Lewis; Alison L. Cuff; Sayoni Das; Paul Ashford; Natalie L. Dawson; Nicholas Furnham; Roman A. Laskowski; David A. Lee; Jonathan G. Lees; Sonja Lehtinen; Romain A. Studer; Janet M. Thornton; Christine A. Orengo
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235 000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our ‘current’ putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.
Nucleic Acids Research | 2009
Alison L. Cuff; Ian Sillitoe; Tony E. Lewis; Oliver Redfern; Richard C. Garratt; Janet M. Thornton; Christine A. Orengo
The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 1 14 215 domains, 2178 Homologous superfamilies and 1110 fold groups. We have assigned 20 330 new domains, 87 new homologous superfamilies and 26 new folds since CATH release version 3.1. A total of 28 064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a ‘Protein Chart’ and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, ∼4% of superfamilies are very diverse and these are the superfamilies that are most highly populated in both the PDB and in the genomes. Information on the degree of structural diversity in each superfamily and structural overlaps between superfamilies can now be downloaded from the CATH website.
Nucleic Acids Research | 2012
Ian Sillitoe; Alison L. Cuff; Benoit H. Dessailly; Natalie L. Dawson; Nicholas Furnham; David A. Lee; Jonathan G. Lees; Tony E. Lewis; Romain A. Studer; Robert Rentzsch; Corin Yeats; Janet M. Thornton; Christine A. Orengo
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
Nucleic Acids Research | 2011
Alison L. Cuff; Ian Sillitoe; Tony E. Lewis; Andrew B. Clegg; Robert Rentzsch; Nicholas Furnham; Marialuisa Pellegrini-Calace; David Jones; Janet M. Thornton; Christine A. Orengo
CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.
PLOS Computational Biology | 2012
Nicholas Furnham; Ian Sillitoe; Gemma L. Holliday; Alison L. Cuff; Roman A. Laskowski; Christine A. Orengo; Janet M. Thornton
In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.
Nucleic Acids Research | 2012
Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Alison L. Cuff; Jose M. Dana; Ioannis Filippis; Julian Gough; Sarah Hunter; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Alex L. Mitchell; Alexey G. Murzin; Bernardo Ochoa-Montaño; Owen J. L. Rackham; James C. Smith; Michael J. E. Sternberg; Sameer Velankar; Corin Yeats; Christine A. Orengo
Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker’s yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Structure | 2009
Alison L. Cuff; Oliver Redfern; Lesley H. Greene; Ian Sillitoe; Tony E. Lewis; Mark Dibley; Adam J. Reid; Frances M. G. Pearl; Tim Dallman; Annabel E. Todd; Richard C. Garratt; Janet M. Thornton; Christine A. Orengo
Summary This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionary conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.
Current Opinion in Structural Biology | 2009
Benoit H. Dessailly; Oliver Redfern; Alison L. Cuff; Christine A. Orengo
The ability to assign function to proteins has become a major bottleneck for comprehensively understanding cellular mechanisms at the molecular level. Here we discuss the extent to which structural domain classifications can help in deciphering the complex relationship between the functions of proteins and their sequences and structures. Structural classifications are particularly helpful in understanding the mosaic manner in which new proteins and functions emerge through evolution. This is partly because they provide reliable and concrete domain definitions and enable the detection of very remote structural similarities and homologies. It is also because structural data can illuminate more clearly the mechanisms by which a broader functional repertoire can emerge during evolution.
Nucleic Acids Research | 2012
Nicholas Furnham; Ian Sillitoe; Gemma L. Holliday; Alison L. Cuff; Syed Asad Rahman; Roman A. Laskowski; Christine A. Orengo; Janet M. Thornton
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzymes sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ∼1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such the catalytic site atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree.