Daniel W. A. Buchan
University College London
Publications
Featured research published by Daniel W. A. Buchan.
Nucleic Acids Research | 2013
Daniel W. A. Buchan; Federico Minneci; Tim Nugent; Kevin Bryson; David Jones
Here, we present the new UCL Bioinformatics Group’s PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.
Bioinformatics | 2012
David Jones; Daniel W. A. Buchan; Domenico Cozzetto; Massimiliano Pontil
MOTIVATION The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts on both structure and function prediction, to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. RESULTS PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. AVAILABILITY The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
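The abstract contrasts PSICOV with the mutual information (MI) baseline and reports results as top-L/5 precision for long-range contacts. The following standard-library Python sketch illustrates that baseline and the metric under simplifying assumptions: it computes raw MI between alignment columns with no sequence weighting, pseudocounts, or entropic/phylogenetic correction, and it is not PSICOV's sparse inverse covariance estimator. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i: str, col_j: str) -> float:
    """Raw MI (in nats) between two MSA columns of equal length."""
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def rank_pairs_by_mi(msa: list[str]) -> list[tuple[tuple[int, int], float]]:
    """Score every column pair (i, j), i < j, sorted from highest to lowest MI."""
    length = len(msa[0])
    columns = ["".join(seq[k] for seq in msa) for k in range(length)]
    scored = [((i, j), mutual_information(columns[i], columns[j]))
              for i, j in combinations(range(length), 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def top_l5_precision(ranked_pairs, true_contacts, length, min_sep=24):
    """Precision of the top L/5 long-range predictions (sequence separation > 23)."""
    long_range = [pair for pair, _ in ranked_pairs if pair[1] - pair[0] >= min_sep]
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    return sum(1 for pair in top if pair in true_contacts) / len(top)
```

Ranking by raw MI is where indirect couplings inflate scores: if column A covaries with B and B with C, the pair A, C can also score highly without being in contact, which is the kind of noise the abstract says PSICOV is designed to separate from direct couplings.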
Nucleic Acids Research | 2010
Daniel W. A. Buchan; S. M. Ward; Anna E. Lobley; Timothy Nugent; Kevin Bryson; David Jones
The UCL Bioinformatics Group web portal offers several high quality protein structure prediction and function annotation algorithms including PSIPRED, pGenTHREADER, pDomTHREADER, MEMSAT, MetSite, DISOPRED2, DomPred and FFPred for the prediction of secondary structure, protein fold, protein structural domain, transmembrane helix topology, metal binding sites, regions of protein disorder, protein domain boundaries and protein function, respectively. We also now offer a fully automated 3D modelling pipeline: BioSerf, which performed well in CASP8; its fragment-assembly approach placed it in the top five servers in the de novo modelling category. The servers are available via the group web site at http://bioinf.cs.ucl.ac.uk/.
Proteomics | 2002
Christine A. Orengo; James E. Bray; Daniel W. A. Buchan; Andrew P. Harrison; David A. Lee; Frances M. G. Pearl; Ian Sillitoe; Annabel E. Todd; Janet M. Thornton
Over the last decade, there have been huge increases in the numbers of protein sequences and structures determined. In parallel, many methods have been developed for recognising similarities between these proteins, arising from their common evolutionary background, and for clustering such relatives into protein families. Here we review some of the protein family resources available to the biologist and describe how these can be used to provide structural and functional annotations for newly determined sequences. In particular we describe recent developments to the CATH domain database of protein structural families which have facilitated genome annotation and which have also revealed important caveats that must be considered when transferring functional data between homologous proteins.
Nucleic Acids Research | 2001
Frances M. G. Pearl; Nigel J. Martin; James E. Bray; Daniel W. A. Buchan; Andrew P. Harrison; David A. Lee; Gabrielle A. Reeves; Adrian J. Shepherd; Ian Sillitoe; Annabel E. Todd; Janet M. Thornton; Christine A. Orengo
In order to support the structural genomics initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25,320 structural domains and a further 160,000 sequence relatives, has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.
Protein Science | 2009
Frances M. G. Pearl; David A. Lee; James E. Bray; Daniel W. A. Buchan; Adrian J. Shepherd; Christine A. Orengo
An automatic sequence search and analysis protocol (DomainFinder) based on PSI‐BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI‐BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH‐PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut‐offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH‐PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI‐BLAST profiles have been generated for each of the sequence families in the expanded CATH‐PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
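As described in the abstract, DomainFinder accepts a superfamily when a PSI-BLAST hit to at least one PDB sequence in it passes a conservative threshold, while relationships found only with less stringent cut-offs are kept in neighbour tables. A minimal Python sketch of that bookkeeping follows. The E-value cut-offs, the `Hit` record and `assign_superfamilies` are hypothetical, and the sketch ignores domain boundaries within multidomain genes.

```python
from dataclasses import dataclass

# Hypothetical E-value cut-offs chosen for illustration only;
# they are NOT the thresholds used by DomainFinder.
STRICT_EVALUE = 1e-10
RELAXED_EVALUE = 1e-3


@dataclass(frozen=True)
class Hit:
    """One PSI-BLAST-style hit from a gene sequence to a classified PDB chain."""
    query_id: str
    pdb_chain: str
    superfamily: str  # CATH superfamily code of the matched PDB domain
    evalue: float


def assign_superfamilies(hits):
    """Return (assignments, neighbour_table), each mapping query_id -> set of superfamilies."""
    assigned: dict[str, set[str]] = {}
    tentative: dict[str, set[str]] = {}
    for hit in hits:
        if hit.evalue <= STRICT_EVALUE:
            # One clear relationship to any PDB member is enough to assign.
            assigned.setdefault(hit.query_id, set()).add(hit.superfamily)
        elif hit.evalue <= RELAXED_EVALUE:
            tentative.setdefault(hit.query_id, set()).add(hit.superfamily)
    # Keep weaker links only where no strict hit already confirmed the superfamily.
    neighbours: dict[str, set[str]] = {}
    for query_id, families in tentative.items():
        remaining = families - assigned.get(query_id, set())
        if remaining:
            neighbours[query_id] = remaining
    return assigned, neighbours
```

Keeping the two tiers separate mirrors the abstract's distinction between confident assignments and putative relationships held "pending further evidence".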
Nucleic Acids Research | 2012
Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Alison L. Cuff; Jose M. Dana; Ioannis Filippis; Julian Gough; Sarah Hunter; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Alex L. Mitchell; Alexey G. Murzin; Bernardo Ochoa-Montaño; Owen J. L. Rackham; James C. Smith; Michael J. E. Sternberg; Sameer Velankar; Corin Yeats; Christine A. Orengo
Genome3D, available at http://www.genome3d.eu, is a new collaborative project that integrates UK-based structural resources to provide a unique perspective on sequence–structure–function relationships. Leading structure prediction resources (DomSerf, FUGUE, Gene3D, pDomTHREADER, Phyre and SUPERFAMILY) provide annotations for UniProt sequences to indicate the locations of structural domains (structural annotations) and their 3D structures (structural models). Structural annotations and 3D model predictions are currently available for three model genomes (Homo sapiens, E. coli and baker’s yeast), and the project will extend to other genomes in the near future. As these resources exploit different strategies for predicting structures, the main aim of Genome3D is to enable comparisons between all the resources so that biologists can see where predictions agree and are therefore more trusted. Furthermore, as these methods differ in whether they build their predictions using CATH or SCOP, Genome3D also contains the first official mapping between these two databases. This has identified pairs of similar superfamilies from the two resources at various degrees of consensus (532 bronze pairs, 527 silver pairs and 370 gold pairs).
Current Opinion in Structural Biology | 2003
David A. Lee; Alastair Grant; Daniel W. A. Buchan; Christine A. Orengo
Protein translations of over 100 complete genomes are now available. About half of these sequences can be provided with structural annotation, thereby enabling some profound insights into protein and pathway evolution. Whereas the major domain structure families are common to all kingdoms of life, these are combined in different ways in multidomain proteins to give various domain architectures that are specific to kingdoms or individual genomes, and contribute to the diverse phenotypes observed. These data argue for more targets in structural genomics initiatives and particularly for the selection of different domain architectures to gain better insights into protein functions.
Nucleic Acids Research | 2015
Tony E. Lewis; Ian Sillitoe; Antonina Andreeva; Tom L. Blundell; Daniel W. A. Buchan; Cyrus Chothia; Domenico Cozzetto; Jose M. Dana; Ioannis Filippis; Julian Gough; David Jones; Lawrence A. Kelley; Gerard J. Kleywegt; Federico Minneci; Jaina Mistry; Alexey G. Murzin; Bernardo Ochoa-Montaño; Matt E. Oates; Marco Punta; Owen J. L. Rackham; Jonathan Stahlhacke; Michael J. E. Sternberg; Sameer Velankar; Christine A. Orengo
Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a resource new to Genome3D. We have analysed and updated Genome3D's SCOP/CATH mapping. Finally, we have improved the superposition tools, which now give users a more powerful interface for investigating similarities and differences between structural models.
Bioinformatics | 2017
Daniel W. A. Buchan; David Jones
Motivation: When appropriate, evolutionarily‐related structural templates can be identified, protein fold recognition is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of inter‐residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve their principal eigenvectors, which can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al‐Eigen approach of Di Lena et al. (2010), but with improvements to both speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. Results: EigenTHREADER outperforms well‐established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template‐based modelling to be extended to many new protein families that were previously intractable to homology‐based fold recognition methods. Availability and implementation: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/.
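The search strategy described above (contact map, principal eigenvector, dynamic-programming alignment) can be sketched in standard-library Python. This is a simplified illustration under stated assumptions, not the published EigenTHREADER or Al-Eigen code: it uses only the first eigenvector (found by power iteration), scores aligned positions by the product of eigenvector components, and applies a hypothetical linear gap penalty of -0.05. All function names are hypothetical.

```python
import math


def contact_map(length, contacts):
    """Symmetric 0/1 contact matrix from a list of (i, j) residue pairs."""
    mat = [[0.0] * length for _ in range(length)]
    for i, j in contacts:
        mat[i][j] = mat[j][i] = 1.0
    return mat


def principal_eigenvector(matrix, iterations=1000, tol=1e-12):
    """Unit-length leading eigenvector of a symmetric non-negative matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # Power iteration on (A + I): the shift leaves eigenvectors unchanged but
        # keeps the leading eigenvalue dominant even when -lambda is also present.
        nxt = [vec[i] + sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        nxt = [x / norm for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, vec)) < tol:
            return nxt
        vec = nxt
    return vec


def align_eigenvectors(u, v, gap=-0.05):
    """Needleman-Wunsch global alignment score of two eigenvector profiles."""
    n, m = len(u), len(v)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Hypothetical match score: product of the aligned components.
            match = score[i - 1][j - 1] + u[i - 1] * v[j - 1]
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[n][m]


def rank_templates(query_map, library):
    """Rank a {name: contact_map} library by eigenvector alignment score."""
    q = principal_eigenvector(query_map)
    scores = {name: align_eigenvectors(q, principal_eigenvector(tmap))
              for name, tmap in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because both vectors are unit length and non-negative, the Cauchy-Schwarz inequality caps any alignment score at 1.0, so a template whose contact map matches the query exactly reaches that maximum, and no step of the comparison uses the amino acid sequence itself.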