Natalie L. Dawson
University College London
Publications
Featured research published by Natalie L. Dawson.
Nucleic Acids Research | 2015
Ian Sillitoe; Tony E. Lewis; Alison L. Cuff; Sayoni Das; Paul Ashford; Natalie L. Dawson; Nicholas Furnham; Roman A. Laskowski; David A. Lee; Jonathan G. Lees; Sonja Lehtinen; Romain A. Studer; Janet M. Thornton; Christine A. Orengo
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235 000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our ‘current’ putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.
Nucleic Acids Research | 2012
Ian Sillitoe; Alison L. Cuff; Benoit H. Dessailly; Natalie L. Dawson; Nicholas Furnham; David A. Lee; Jonathan G. Lees; Tony E. Lewis; Romain A. Studer; Robert Rentzsch; Corin Yeats; Janet M. Thornton; Christine A. Orengo
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
Nucleic Acids Research | 2014
Jonathan G. Lees; David A. Lee; Romain A. Studer; Natalie L. Dawson; Ian Sillitoe; Sayoni Das; Corin Yeats; Benoit H. Dessailly; Robert Rentzsch; Christine A. Orengo
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.
Journal of Molecular Biology | 2016
Nicholas Furnham; Natalie L. Dawson; Syed Asad Rahman; Janet M. Thornton; Christine A. Orengo
Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemistry studies, combined with the large volumes of sequence and protein structural data now available, mean that we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources, we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution to changes in reaction chemistry. Many superfamilies show changes in function at some level, although one function often dominates one superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and to exemplify these by detailed examples. Additionally, we use structural information on the enzymes' active sites to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others, large changes of enzyme function, in terms of both overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterised sequences and the design of new synthetic enzymes.
Nucleic Acids Research | 2017
Natalie L. Dawson; Tony E. Lewis; Sayoni Das; Jonathan G. Lees; David A. Lee; Paul Ashford; Christine A. Orengo; Ian Sillitoe
The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource over the last two years since the publication in 2015, including: significant increases to our structural and sequence coverage; expansion of the functional families in CATH; building a support vector machine (SVM) to automatically assign domains to superfamilies; improved search facilities to return alignments of query sequences against multiple sequence alignments; the redesign of the web pages and download site.
Nucleic Acids Research | 2016
Su Datt Lam; Natalie L. Dawson; Sayoni Das; Ian Sillitoe; Paul Ashford; David A. Lee; Sonja Lehtinen; Christine A. Orengo; Jonathan G. Lees
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20 000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline of predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools in the Gene3D website.
Biochimica et Biophysica Acta | 2013
Benoit H. Dessailly; Natalie L. Dawson; Kenji Mizuguchi; Christine A. Orengo
We present, to our knowledge, the first quantitative analysis of functional site diversity in homologous domain superfamilies. Different types of functional sites are considered separately. Our results show that most diverse superfamilies are very plastic in terms of the spatial location of their functional sites. This is especially true for protein–protein interfaces. In contrast, we confirm that catalytic sites typically occupy only a very small number of topological locations. Small-ligand binding sites are more diverse than expected, although in a more limited manner than protein–protein interfaces. In spite of the observed diversity, our results also confirm the previously reported preferential location of functional sites. We identify a subset of homologous domain superfamilies where diversity is particularly extreme, and discuss possible reasons for such plasticity, i.e. structural diversity. Our results do not contradict previous reports of preferential co-location of sites among homologues, but rather point at the importance of not ignoring other sites, especially in large and diverse superfamilies. Data on sites exploited by different relatives, within each well annotated domain superfamily, have been made accessible from the CATH website in order to highlight versatile superfamilies or superfamilies with highly preferential sites. This information is valuable for systems biology, and knowledge of any constraints on protein interactions could help in understanding the dynamic control of networks in which these proteins participate.
The novelty of our work lies in the comprehensive nature of the analysis – we have used a significantly larger dataset than previous studies – and the fact that in many superfamilies we show that different parts of the domain surface are exploited by different relatives for ligand/protein interactions, particularly in superfamilies which are diverse in sequence and structure, an observation not previously reported on such a large scale. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
Nucleic Acids Research | 2015
Sayoni Das; Ian Sillitoe; David A. Lee; Jonathan G. Lees; Natalie L. Dawson; John M. Ward; Christine A. Orengo
The widening function annotation gap in protein databases and the increasing number and diversity of the proteins being sequenced present new challenges to protein function prediction methods. Multidomain proteins complicate the protein sequence–structure–function relationship further as new combinations of domains can expand the functional repertoire, creating new proteins and functions. Here, we present the FunFHMMer web server, which provides Gene Ontology (GO) annotations for query protein sequences based on the functional classification of the domain-based CATH-Gene3D resource. Our server also provides valuable information for the prediction of functional sites. The predictive power of FunFHMMer has been validated on a set of 95 proteins where FunFHMMer performs better than BLAST, Pfam and CDD. Recent validation by an independent international competition ranks FunFHMMer as one of the top function prediction methods in predicting GO annotations for both the Biological Process and Molecular Function Ontology. The FunFHMMer web server is available at http://www.cathdb.info/search/by_funfhmmer.
Current Opinion in Genetics & Development | 2015
Sayoni Das; Natalie L. Dawson; Christine A. Orengo
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces which expands the repertoire of molecules they interact with and actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a ‘domain grammar of function’ whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modify function.
Molecular Pharmaceutics | 2013
Katharina Welser; Frederick Campbell; Laila Kudsiova; Atefeh Mohammadi; Natalie L. Dawson; Stephen L. Hart; David Barlow; Helen C. Hailes; M. Jayne Lawrence; Alethea B. Tabor
Cationic peptide sequences, whether linear, branched, or dendritic, are widely used to condense and protect DNA in both polyplex and lipopolyplex gene delivery vectors. How these peptides behave within these particles and the consequences this has on transfection efficiency remain poorly understood. We have compared, in parallel, a complete series of cationic peptides, both branched and linear, coformulated with plasmid DNA to give polyplexes, or with plasmid DNA and the cationic lipid, DOTMA, mixed with 50% of the neutral helper lipid, DOPE, to give lipopolyplexes, and correlated the transfection efficiencies of these complexes to their biophysical properties. Lipopolyplexes formulated from branched Arg-rich peptides, or linear Lys-rich peptides, show the best transfection efficiencies in an alveolar epithelial cell line, with His-rich peptides being relatively ineffective. The majority of the biophysical studies (circular dichroism, dynamic light scattering, zeta potential, small angle neutron scattering, and gel band shift assay) indicated that all of the formulations were similar in size, surface charge, and lipid bilayer structure, and longer cationic sequences, in general, gave better transfection efficiencies. Whereas lipopolyplexes formulated from branched Arg-containing peptides were more effective than those formulated from linear Arg-containing sequences, the reverse was true for Lys-containing sequences, which may be related to differences in DNA condensation between Arg-rich and Lys-rich peptides observed in the CD studies.