Jonathan G. Lees
University College London
Publication
Featured research published by Jonathan G. Lees.
Bioinformatics | 2006
Jonathan G. Lees; Andrew J. Miles; Frank Wien; B. A. Wallace
Motivation: Circular dichroism (CD) spectroscopy is a long-established technique for studying protein secondary structures in solution. Empirical analyses of CD data rely on the availability of reference datasets composed of far-UV CD spectra of proteins whose crystal structures have been determined. This article reports on the creation of a new reference dataset which effectively covers both secondary structure and fold space, and uses the higher information content available in synchrotron radiation circular dichroism (SRCD) spectra to more accurately predict secondary structure than has been possible with existing reference datasets. It also examines the effects of wavelength range, structural redundancy and different means of categorizing secondary structures on the accuracy of the analyses. In addition, it describes a novel use of hierarchical cluster analyses to identify protein relatedness based on spectral properties alone. The databases are shown to be applicable in both conventional CD and SRCD spectroscopic analyses of proteins. Hence, by combining new bioinformatics and biophysical methods, a database has been produced that should have wide applicability as a tool for structural molecular biology.
Nucleic Acids Research | 2015
Ian Sillitoe; Tony E. Lewis; Alison L. Cuff; Sayoni Das; Paul Ashford; Natalie L. Dawson; Nicholas Furnham; Roman A. Laskowski; David A. Lee; Jonathan G. Lees; Sonja Lehtinen; Romain A. Studer; Janet M. Thornton; Christine A. Orengo
The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235 000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our ‘current’ putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.
Nucleic Acids Research | 2012
Ian Sillitoe; Alison L. Cuff; Benoit H. Dessailly; Natalie L. Dawson; Nicholas Furnham; David A. Lee; Jonathan G. Lees; Tony E. Lewis; Romain A. Studer; Robert Rentzsch; Corin Yeats; Janet M. Thornton; Christine A. Orengo
CATH version 3.5 (Class, Architecture, Topology, Homology, available at http://www.cathdb.info/) contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups. When focusing on structural genomics (SG) structures, we observe that the number of new folds for CATH v3.5 is slightly less than for previous releases, and this observation suggests that we may now know the majority of folds that are easily accessible to structure determination. We have improved the accuracy of our functional family (FunFams) sub-classification method and the CATH sequence domain search facility has been extended to provide FunFam annotations for each domain. The CATH website has been redesigned. We have improved the display of functional data and of conserved sequence features associated with FunFams within each CATH superfamily.
Protein Science | 2003
B. A. Wallace; Jonathan G. Lees; A.J.W. Orry; A. Lobley; Robert W. Janes
Circular dichroism (CD) spectroscopy is a valuable technique for the determination of protein secondary structures. Many linear and nonlinear algorithms have been developed for the empirical analysis of CD data, using reference databases derived from proteins of known structures. To date, the reference databases used by the various algorithms have all been derived from the spectra of soluble proteins. When applied to the analysis of soluble protein spectra, these methods generally produce calculated secondary structures that correspond well with crystallographic structures. In this study, however, it was shown that when applied to membrane protein spectra, the resulting calculations produce considerably poorer results. One source of this discrepancy may be the altered spectral peak positions (wavelength shifts) of membrane proteins due to the different dielectric of the membrane environment relative to that of water. These results have important consequences for studies that seek to use the existing soluble protein reference databases for the analyses of membrane proteins.
Spectroscopy | 2003
Andrew J. Miles; Frank Wien; Jonathan G. Lees; A. Rodger; Robert W. Janes; B. A. Wallace
Synchrotron radiation circular dichroism (SRCD) is an emerging technique in structural biology with particular value in protein secondary structure analyses since it permits the collection of data down to much lower wavelengths than conventional circular dichroism (cCD) instruments. Reference database spectra collected on different SRCD instruments in the future as well as current reference datasets derived from cCD spectra must be compatible. Therefore, there is a need for standardization of calibration methods to ensure quality control. In this study, magnitude and optical rotation measurements on four cCD and three SRCD instruments were compared at 192.5, 219, 290 and 490 nm. At high wavelengths, all gave comparable results; at the lower wavelengths, however, some variations were observable. The consequences of these differences on the spectrum, and the calculated secondary structure, of a representative protein (myoglobin) are demonstrated. A method is proposed for standardising spectra obtained on any CD instrument, conventional or synchrotron-based, with respect to existing and future databases.
Nucleic Acids Research | 2007
Corin Yeats; Jonathan G. Lees; Adam James Reid; Paul Kellam; Nigel J. Martin; Xinhui Liu; Christine A. Orengo
Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (e.g. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently, Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk/
Nucleic Acids Research | 2012
Jonathan G. Lees; Corin Yeats; James R. Perkins; Ian Sillitoe; Robert Rentzsch; Benoit H. Dessailly; Christine A. Orengo
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a comprehensive database of protein domain assignments for sequences from the major sequence databases. Domains are directly mapped from structures in the CATH database or predicted using a library of representative profile HMMs derived from CATH superfamilies. As previously described, Gene3D integrates many other protein family and function databases. These facilitate complex associations of molecular function, structure and evolution. Gene3D now includes a domain functional family (FunFam) level below the homologous superfamily level assignments. Additions have also been made to the interaction data. More significantly, to help with the visualization and interpretation of multi-genome scale data sets, we have developed a new, revamped website. Searching has been simplified with more sophisticated filtering of results, along with new tools based on Cytoscape Web for visualizing protein–protein interaction networks, differences in domain composition between genomes and the taxonomic distribution of individual superfamilies.
Nucleic Acids Research | 2010
Jonathan G. Lees; Corin Yeats; Oliver Redfern; Andrew B. Clegg; Christine A. Orengo
Over the last 2 years the Gene3D resource has been significantly improved: it is now more accurate and offers a much richer interactive display via the Gene3D website (http://gene3d.biochem.ucl.ac.uk/). Gene3D provides accurate structural domain family assignments for over 1100 genomes and nearly 10 000 000 proteins. A hidden Markov model library, constructed from the manually curated CATH structural domain hierarchy, is used to search UniProt, RefSeq and Ensembl protein sequences. The resulting matches are refined into simple multi-domain architectures using a recently developed in-house algorithm, DomainFinder 3 (available at: ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/DomainFinder3/). The domain assignments are integrated with multiple external protein function descriptions (e.g. Gene Ontology and KEGG), structural annotations (e.g. coiled coils, disordered regions and sequence polymorphisms) and family resources (e.g. Pfam and eggNog) and displayed on the Gene3D website. The website allows users to view descriptions for single proteins and genes as well as for large protein sets, such as superfamilies or genomes. Subsets can then be selected for detailed investigation, or associated functions and interactions can be used to expand explorations to new proteins. Gene3D also provides a set of services, including an interactive genome coverage graph visualizer, DAS annotation resources, sequence search facilities and SOAP services.
Pain | 2016
Andreas C. Themistocleous; Juan D. Ramirez; Pallai Shillo; Jonathan G. Lees; Dinesh Selvarajah; Christine A. Orengo; Solomon Tesfaye; Andrew S.C. Rice; David L. H. Bennett
Disabling neuropathic pain (NeuP) is a common sequel of diabetic peripheral neuropathy (DPN). We aimed to characterise the sensory phenotype of patients with and without NeuP, assess screening tools for NeuP, and relate DPN severity to NeuP. The Pain in Neuropathy Study (PiNS) is an observational cross-sectional multicentre study. A total of 191 patients with DPN underwent neurological examination, quantitative sensory testing, nerve conduction studies, and skin biopsy for intraepidermal nerve fibre density assessment. A set of questionnaires assessed the presence of pain, pain intensity, pain distribution, and the psychological and functional impact of pain. Patients were divided according to the presence of DPN, and thereafter according to the presence and severity of NeuP. The DN4 questionnaire demonstrated excellent sensitivity (88%) and specificity (93%) in screening for NeuP. There was a positive correlation between greater neuropathy severity (r = 0.39, P < 0.01), higher HbA1c (r = 0.21, P < 0.01), and the presence (and severity) of NeuP. Diabetic peripheral neuropathy sensory phenotype is characterised by hyposensitivity to applied stimuli that was more marked in the moderate/severe NeuP group than in the mild NeuP or no NeuP groups. Brush-evoked allodynia was present in only those with NeuP (15%); the paradoxical heat sensation did not discriminate between those with (40%) and without (41.3%) NeuP. The “irritable nociceptor” subgroup could only be applied to a minority of patients (6.3%) with NeuP. This study provides a firm basis to rationalise further phenotyping of painful DPN, for instance, stratification of patients with DPN for analgesic drug trials.
Nucleic Acids Research | 2014
Jonathan G. Lees; David A. Lee; Romain A. Studer; Natalie L. Dawson; Ian Sillitoe; Sayoni Das; Corin Yeats; Benoit H. Dessailly; Robert Rentzsch; Christine A. Orengo
Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.