Publication


Featured research published by Tony E. Lewis.


Nucleic Acids Research | 2007

The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution

Lesley H. Greene; Tony E. Lewis; Sarah Addou; Alison L. Cuff; Timothy Dallman; Mark Dibley; Oliver Redfern; Frances M. G. Pearl; Rekha Nambudiry; Adam J. Reid; Ian Sillitoe; Corin Yeats; Janet M. Thornton; Christine A. Orengo

We report the latest release (version 3.0) of the CATH protein domain database. There has been a 20% increase in the number of structural domains classified in CATH, up to 86 151 domains. Release 3.0 comprises 1110 fold groups and 2147 homologous superfamilies. To cope with the increases in diverse structural homologues being determined by the structural genomics initiatives, more sensitive methods have been developed for identifying boundaries in multi-domain proteins and for recognising homologues. The CATH classification update is now being driven by an integrated pipeline that links these automated procedures with validation steps that have been made easier by the provision of information-rich web pages summarising comparison scores and relevant links to external sites for each domain being classified. An analysis of the population of domains in the CATH hierarchy and several domain characteristics are presented for version 3.0. We also report an update of the CATH Dictionary of homologous structures (CATH-DHS) which now contains multiple structural alignments, consensus information and functional annotations for 1459 well-populated superfamilies in CATH. CATH is directly linked to the Gene3D database which is a projection of CATH structural data onto ∼2 million sequences in completed genomes and UniProt.


Nucleic Acids Research | 2015

CATH: comprehensive structural and functional annotations for genome sequences

Ian Sillitoe; Tony E. Lewis; Alison L. Cuff; Sayoni Das; Paul Ashford; Natalie L. Dawson; Nicholas Furnham; Roman A. Laskowski; David A. Lee; Jonathan G. Lees; Sonja Lehtinen; Romain A. Studer; Janet M. Thornton; Christine A. Orengo

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235 000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our ‘current’ putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.


Nucleic Acids Research | 2004

The CATH Domain Structure Database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis

Frances M. G. Pearl; Annabel E. Todd; Ian Sillitoe; Mark Dibley; Oliver Redfern; Tony E. Lewis; Christopher G. Bennett; Russell L. Marsden; Alastair Grant; David A. Lee; Adrian Akpor; Michael Maibaum; Andrew P. Harrison; Timothy Dallman; Gabrielle A. Reeves; Ilhem Diboun; Sarah Addou; Stefano Lise; Caroline E. Johnston; Antonio Sillero; Janet M. Thornton; Christine A. Orengo

The CATH database of protein domain structures (http://www.biochem.ucl.ac.uk/bsm/cath/) currently contains 43 229 domains classified into 1467 superfamilies and 5107 sequence families. Each structural family is expanded with sequence relatives from GenBank and completed genomes, using a variety of efficient sequence search protocols and reliable thresholds. This extended CATH protein family database contains 616 470 domain sequences classified into 23 876 sequence families. This results in the significant expansion of the CATH HMM model library to include models built from the CATH sequence relatives, giving a 10% increase in coverage for detecting remote homologues. An improved Dictionary of Homologous superfamilies (DHS) (http://www.biochem.ucl.ac.uk/bsm/dhs/) containing specific sequence, structural and functional information for each superfamily in CATH considerably assists manual validation of homologues. Information on sequence relatives in CATH superfamilies, GenBank and completed genomes is presented in the CATH associated DHS and Gene3D resources. Domain partnership information can be obtained from Gene3D (http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/). A new CATH server has been implemented (http://www.biochem.ucl.ac.uk/cgi-bin/cath/CathServer.pl) providing automatic classification of newly determined sequences and structures using a suite of rapid sequence and structure comparison methods. The statistical significance of matches is assessed and links are provided to the putative superfamily or fold group to which the query sequence or structure is assigned.


Nucleic Acids Research | 2009

The CATH classification revisited—architectures reviewed and new ways to characterize structural divergence in superfamilies

Alison L. Cuff; Ian Sillitoe; Tony E. Lewis; Oliver Redfern; Richard C. Garratt; Janet M. Thornton; Christine A. Orengo

The latest version of CATH (class, architecture, topology, homology) (version 3.2), released in July 2008 (http://www.cathdb.info), contains 114 215 domains, 2178 homologous superfamilies and 1110 fold groups. We have assigned 20 330 new domains, 87 new homologous superfamilies and 26 new folds since CATH release version 3.1. A total of 28 064 new domains have been assigned since our NAR 2007 database publication (CATH version 3.0). The CATH website has been completely redesigned and includes more comprehensive documentation. We have revisited the CATH architecture level as part of the development of a ‘Protein Chart’ and present information on the population of each architecture. The CATHEDRAL structure comparison algorithm has been improved and used to characterize structural diversity in CATH superfamilies and structural overlaps between superfamilies. Although the majority of superfamilies in CATH are not structurally diverse and do not overlap significantly with other superfamilies, ∼4% of superfamilies are very diverse and these are the superfamilies that are most highly populated in both the PDB and in the genomes. Information on the degree of structural diversity in each superfamily and structural overlaps between superfamilies can now be downloaded from the CATH website.


Nucleic Acids Research | 2012

New functional families (FunFams) in CATH to improve the mapping of conserved functional sites to 3D structures

Ian Sillitoe; Alison L. Cuff; Benoit H. Dessailly; Natalie L. Dawson; Nicholas Furnham; David A. Lee; Jonathan G. Lees; Tony E. Lewis; Romain A. Studer; Robert Rentzsch; Corin Yeats; Janet M. Thornton; Christine A. Orengo

CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.


Nucleic Acids Research | 2011

Extending CATH: increasing coverage of the protein structure universe and linking structure with function

Alison L. Cuff; Ian Sillitoe; Tony E. Lewis; Andrew B. Clegg; Robert Rentzsch; Nicholas Furnham; Marialuisa Pellegrini-Calace; David Jones; Janet M. Thornton; Christine A. Orengo

CATH version 3.3 (class, architecture, topology, homology) contains 128 688 domains, 2386 homologous superfamilies and 1233 fold groups, and reflects a major focus on classifying structural genomics (SG) structures and transmembrane proteins, both of which are likely to add structural novelty to the database and therefore increase the coverage of protein fold space within CATH. For CATH version 3.4 we have significantly improved the presentation of sequence information and associated functional information for CATH superfamilies. The CATH superfamily pages now reflect both the functional and structural diversity within the superfamily and include structural alignments of close and distant relatives within the superfamily, annotated with functional information and details of conserved residues. A significantly more efficient search function for CATH has been established by implementing the search server Solr (http://lucene.apache.org/solr/). The CATH v3.4 webpages have been built using the Catalyst web framework.


Nucleic Acids Research | 2012

Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains

Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Alison L. Cuff; Jose M. Dana; Ioannis Filippis; Julian Gough; Sarah Hunter; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Alex L. Mitchell; Alexey G. Murzin; Bernardo Ochoa-Montaño; Owen J. L. Rackham; James C. Smith; Michael J. E. Sternberg; Sameer Velankar; Corin Yeats; Christine A. Orengo

Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker’s yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).


Structure | 2009

The CATH Hierarchy Revisited—Structural Divergence in Domain Superfamilies and the Continuity of Fold Space

Alison L. Cuff; Oliver Redfern; Lesley H. Greene; Ian Sillitoe; Tony E. Lewis; Mark Dibley; Adam J. Reid; Frances M. G. Pearl; Tim Dallman; Annabel E. Todd; Richard C. Garratt; Janet M. Thornton; Christine A. Orengo

This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.


Nucleic Acids Research | 2017

CATH: an expanded resource to predict protein function through structure and sequence

Natalie L. Dawson; Tony E. Lewis; Sayoni Das; Jonathan G. Lees; David A. Lee; Paul Ashford; Christine A. Orengo; Ian Sillitoe

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.


Nucleic Acids Research | 2015

Genome3D: exploiting structure to help users understand their sequences

Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Domenico Cozzetto; Jose M. Dana; Ioannis Filippis; Julian Gough; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Jaina Mistry; Alexey G. Murzin; Bernardo Ochoa-Montaño; Matt E. Oates; Marco Punta; Owen J. L. Rackham; Jonathan Stahlhacke; Michael J. E. Sternberg; Sameer Velankar; Christine A. Orengo

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.

Collaboration


Top Co-Authors

Ian Sillitoe, University College London

Alison L. Cuff, University College London

Janet M. Thornton, European Bioinformatics Institute

David A. Lee, Queen Mary University of London

David Jones, University College London

Oliver Redfern, University College London

Alexey G. Murzin, Laboratory of Molecular Biology