David R. Westhead | Researchain

Publication

Featured research published by David R. Westhead.

Bioinformatics | 2005

Improved prediction of protein-protein binding sites using a support vector machines approach

James R. Bradford; David R. Westhead

MOTIVATION Structural genomics projects are beginning to produce protein structures with unknown function; accurate, automated predictors of protein function are therefore required if all these structures are to be properly annotated in reasonable time. Identifying the interface between two interacting proteins provides important clues to the function of a protein and can reduce the search space required by docking algorithms to predict the structures of complexes. RESULTS We have combined a support vector machine (SVM) approach with surface patch analysis to predict protein-protein binding sites. Using a leave-one-out cross-validation procedure, we were able to successfully predict the location of the binding site on 76% of our dataset made up of proteins with both transient and obligate interfaces. With heterogeneous cross-validation, where we trained the SVM on transient complexes to predict on obligate complexes (and vice versa), we still achieved comparable success rates to the leave-one-out cross-validation, suggesting that sufficient properties are shared between transient and obligate interfaces. AVAILABILITY A web application based on the method can be found at http://www.bioinformatics.leeds.ac.uk/ppi_pred. The dataset of 180 proteins used in this study is also available via the same web site. SUPPLEMENTARY INFORMATION http://www.bioinformatics.leeds.ac.uk/ppi-pred/supp-material.

Proteins | 1998

Flexible docking using Tabu search and an empirical estimate of binding affinity.

Carol A. Baxter; Christopher W. Murray; David E. Clark; David R. Westhead; Matthew D. Eldridge

This article describes the implementation of a new docking approach. The method uses a Tabu search methodology to flexibly dock ligand molecules into rigid receptor structures. It uses an empirical objective function with a small number of physically based terms derived from fitting experimental binding affinities for crystallographic complexes. This means that docking energies produced by the searching algorithm provide direct estimates of the binding affinities of the ligands. The method has been tested on 50 ligand‐receptor complexes for which the experimental binding affinity and binding geometry are known. All water molecules are removed from the structures and ligand molecules are minimized in vacuo before docking. The lowest energy geometry produced by the docking protocol is within 1.5 Å root‐mean square of the experimental binding mode for 86% of the complexes. The lowest energies produced by the docking are in fair agreement with the known free energies of binding for the ligands. Proteins 33:367–382, 1998.

PLOS Computational Biology | 2007

A primer on learning in Bayesian networks for computational biology

Chris J. Needham; James R. Bradford; Andrew J. Bulpitt; David R. Westhead

Bayesian networks (BNs) provide a neat and compact representation for expressing joint probability distributions (JPDs) and for inference. They are becoming increasingly important in the biological sciences for the tasks of inferring cellular networks [1], modelling protein signalling pathways [2], systems biology, data integration [3], classification [4], and genetic data analysis [5]. The representation and use of probability theory makes BNs suitable for combining domain knowledge and data, expressing causal relationships, avoiding overfitting a model to training data, and learning from incomplete datasets. The probabilistic formalism provides a natural treatment for the stochastic nature of biological systems and measurements. This primer aims to introduce BNs to the computational biologist, focusing on the concepts behind methods for learning the parameters and structure of models, at a time when they are becoming the machine learning method of choice. There are many applications in biology where we wish to classify data; for example, gene function prediction. To solve such problems, a set of rules are required that can be used for prediction, but often such knowledge is unavailable, or in practice there turn out to be many exceptions to the rules or so many rules that this approach produces poor results. Machine learning approaches often produce better results, where a large number of examples (the training set) is used to adapt the parameters of a model that can then be used for performing predictions or classifications on data. There are many different types of models that may be required and many different approaches to training the models, each with its pros and cons. An excellent overview of the topic can be found in [6] and [7]. 
Neural networks, for example, are often able to learn a model from training data, but it is often difficult to extract information about the model, which with other methods can provide valuable insights into the data or problem being solved. A common problem in machine learning is overfitting, where the learned model is too complex and generalises poorly to unseen data. Increasing the size of the training dataset may reduce this; however, this assumes more training data is readily available, which is often not the case. In addition, often it is important to determine the uncertainty in the learned model parameters or even in the choice of model. This primer focuses on the use of BNs, which offer a solution to these issues. The use of Bayesian probability theory provides mechanisms for describing uncertainty and for adapting the number of parameters to the size of the data. Using a graphical representation provides a simple way to visualise the structure of a model. Inspection of models can provide valuable insights into the properties of the data and allow new models to be produced.

Current Opinion in Structural Biology | 2003

Ligand binding: functional site location, similarity and docking

Stephen J Campbell; Nicola D. Gold; Richard M. Jackson; David R. Westhead

Computational methods for the detection and characterisation of protein ligand-binding sites have increasingly become an area of interest now that large amounts of protein structural information are becoming available prior to any knowledge of protein function. There have been particularly interesting recent developments in the following areas: first, functional site detection, whereby protein evolutionary information has been used to locate binding sites on the protein surface; second, functional site similarity, whereby structural similarity and three-dimensional templates can be used to compare and classify and potentially locate new binding sites; and third, ligand docking, which is being used to find and validate functional sites, in addition to having more conventional uses in small-molecule lead discovery.

Nucleic Acids Research | 2006

Arabidopsis Co-expression Tool (ACT): web server tools for microarray-based gene expression analysis.

Iain W. Manfield; Chih-Hung Jen; John W. Pinney; Ioannis Michalopoulos; James R. Bradford; Philip M. Gilmartin; David R. Westhead

The Arabidopsis Co-expression Tool, ACT, ranks the genes across a large microarray dataset according to how closely their expression follows the expression of a query gene. A database stores pre-calculated co-expression results for ∼21 800 genes based on data from over 300 arrays. These results can be corroborated by calculation of co-expression results for user-defined sub-sets of arrays or experiments from the NASC/GARNet array dataset. Clique Finder (CF) identifies groups of genes which are consistently co-expressed with each other across a user-defined co-expression list. The parameters can be altered easily to adjust cluster size and the output examined for optimal inclusion of genes with known biological roles. Alternatively, a Scatter Plot tool displays the correlation coefficients for all genes against two user-selected queries on a scatter plot which can be useful for visual identification of clusters of genes with similar r-values. User-input groups of genes can be highlighted on the scatter plots. Inclusion of genes with known biology in sets of genes identified using CF and Scatter Plot tools allows inferences to be made about the roles of the other genes in the set and both tools can therefore be used to generate short lists of genes for further characterization. ACT is freely available at .
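The ranking idea in the ACT abstract above can be illustrated with a minimal sketch. It assumes that co-expression is scored with the Pearson correlation coefficient (the abstract speaks of correlation coefficients and r-values but does not give the formula), and the gene names and expression values are invented toy data, not taken from the NASC/GARNet dataset. The functions `pearson_r` and `rank_coexpressed` are hypothetical names, not part of ACT.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_coexpressed(expression, query):
    """Rank every other gene by its correlation with the query gene."""
    query_profile = expression[query]
    scores = [
        (gene, pearson_r(query_profile, profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    # Highest r first: the genes that follow the query most closely.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy expression matrix: one list of values per gene, one value per array.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

ranking = rank_coexpressed(expression, "geneA")
for gene, r in ranking:
    print(f"{gene}\t{r:.2f}")
```

Restricting each profile to a chosen subset of array indices before calling `rank_coexpressed` would mimic, in spirit, the user-defined subsets of arrays mentioned in the abstract.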

Bioinformatics | 2003

A comparative study of machine-learning methods to predict the effects of single nucleotide polymorphisms on protein function

Vidhya Gomathi Krishnan; David R. Westhead

MOTIVATION The large volume of single nucleotide polymorphism data now available motivates the development of methods for distinguishing neutral changes from those which have real biological effects. Here, two different machine-learning methods, decision trees and support vector machines (SVMs), are applied for the first time to this problem. In common with most other methods, only non-synonymous changes in protein coding regions of the genome are considered. RESULTS In detailed cross-validation analysis, both learning methods are shown to compete well with existing methods, and to out-perform them in some key tests. SVMs show better generalization performance, but decision trees have the advantage of generating interpretable rules with robust estimates of prediction confidence. It is shown that the inclusion of protein structure information produces more accurate methods, in agreement with other recent studies, and the effect of using predicted rather than actual structure is evaluated. AVAILABILITY Software is available on request from the authors.

Genome Biology | 2005

Natural antisense transcripts with coding capacity in Arabidopsis may have a regulatory role that is not linked to double-stranded RNA degradation

Chih-Hung Jen; Ioannis Michalopoulos; David R. Westhead; Peter Meyer

BACKGROUND Overlapping transcripts in antisense orientation have the potential to form double-stranded RNA (dsRNA), a substrate for a number of different RNA-modification pathways. One prominent route for dsRNA is its breakdown by Dicer enzyme complexes into small RNAs, a pathway that is widely exploited by RNA interference technology to inactivate defined genes in transgenic lines. The significance of this pathway for endogenous gene regulation remains unclear. RESULTS We have examined transcription data for overlapping gene pairs in Arabidopsis thaliana. On the basis of an analysis of transcripts with coding regions, we find the majority of overlapping gene pairs to be convergently overlapping pairs (COPs), with the potential for dsRNA formation. In all tissues, COP transcripts are present at a higher frequency compared to the overall gene pool. The probability that both the sense and antisense copy of a COP are co-transcribed matches the theoretical value for coexpression under the assumption that the expression of one partner does not affect the expression of the other. Among COPs, we observe an over-representation of spliced (intron-containing) genes (90%) and of genes with alternatively spliced transcripts. For loci where antisense transcripts overlap with sense transcript introns, we also find a significant bias in favor of alternative splicing and variation of polyadenylation. CONCLUSION The results argue against a predominant RNA degradation effect induced by dsRNA formation. Instead, our data support alternative roles for dsRNAs. They suggest that at least for a subgroup of COPs, antisense expression may induce alternative splicing or polyadenylation.

Nucleic Acids Research | 2006

Identification of the REST regulon reveals extensive transposable element-mediated binding site duplication

Rory Johnson; Richard J. Gamblin; Lezanne Ooi; Alexander W. Bruce; Ian J. Donaldson; David R. Westhead; Ian C. Wood; Richard M. Jackson; Noel J. Buckley

The genome-wide mapping of gene-regulatory motifs remains a major goal that will facilitate the modelling of gene-regulatory networks and their evolution. The repressor element 1 (RE1) is a long, conserved transcription factor-binding site which recruits the transcriptional repressor REST to numerous neuron-specific target genes. REST plays important roles in multiple biological processes and disease states. To map RE1 sites and target genes, we created a position specific scoring matrix representing the RE1 and used it to search the human and mouse genomes. We identified 1301 and 997 RE1s in human and mouse genomes, respectively, of which >40% are novel. By employing an ontological analysis we show that REST target genes are significantly enriched in a number of functional classes. Taking the novel REST target gene CACNA1A as an experimental model, we show that it can be regulated by multiple RE1s of different binding affinities, which are only partially conserved between human and mouse. A novel BLAST methodology indicated that many RE1s belong to closely related families. Most of these sequences are associated with transposable elements, leading us to propose that transposon-mediated duplication and insertion of RE1s has led to the acquisition of novel target genes by REST during evolution.
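As an illustration of the general technique named in the abstract above (searching a sequence with a position specific scoring matrix), the following is a minimal sketch only. The aligned sites are an invented 5-base toy motif, not the real RE1; the log-odds scoring with a pseudocount, the uniform background of 0.25, the threshold, and the function names `build_pssm` and `scan` are all assumptions made for the example and are not taken from the paper.

```python
from math import log2

BASES = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from aligned sites."""
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(BASES)
        pssm.append({
            base: log2(((column.count(base) + pseudocount) / total) / background)
            for base in BASES
        })
    return pssm


def scan(sequence, pssm, threshold):
    """Slide the PSSM along the sequence and report windows scoring >= threshold."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned binding sites (toy data, not RE1 sequences).
toy_sites = ["TCAGC", "TCAGC", "TCACC", "TTAGC"]
pssm = build_pssm(toy_sites)

hits = scan("GGTCAGCAATTTCACCGG", pssm, threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

A genome-scale search would also scan the reverse-complement strand and choose the threshold from a score distribution; both are omitted here for brevity.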

The EMBO Journal | 2012

RUNX1 reshapes the epigenetic landscape at the onset of haematopoiesis.

Monika Lichtinger; Richard Ingram; Rebecca Hannah; Dorothee Müller; Deborah Clarke; Salam A. Assi; Michael Lie-A-Ling; Laura Noailles; M. S. Vijayabaskar; Mengchu Wu; Daniel G. Tenen; David R. Westhead; Valerie Kouskoff; Georges Lacaud; Berthold Göttgens; Constanze Bonifer

Cell fate decisions during haematopoiesis are governed by lineage‐specific transcription factors, such as RUNX1, SCL/TAL1, FLI1 and C/EBP family members. To gain insight into how these transcription factors regulate the activation of haematopoietic genes during embryonic development, we measured the genome‐wide dynamics of transcription factor assembly on their target genes during the RUNX1‐dependent transition from haemogenic endothelium (HE) to haematopoietic progenitors. Using a Runx1−/− embryonic stem cell differentiation model expressing an inducible Runx1 gene, we show that in the absence of RUNX1, haematopoietic genes bind SCL/TAL1, FLI1 and C/EBPβ and that this early priming is required for correct temporal expression of the myeloid master regulator PU.1 and its downstream targets. After induction, RUNX1 binds to numerous de novo sites, initiating a local increase in histone acetylation and rapid global alterations in the binding patterns of SCL/TAL1 and FLI1. The acquisition of haematopoietic fate controlled by Runx1 therefore does not represent the establishment of a new regulatory layer on top of a pre‐existing HE program but instead entails global reorganization of lineage‐specific transcription factor assemblies.

Bioinformatics | 1999

Motif-based searching in TOPS protein topology databases.

David R. Gilbert; David R. Westhead; Nozomi Nagano; Janet M. Thornton

MOTIVATION TOPS cartoons are a schematic abstraction of protein three-dimensional structures in two dimensions, and are used for understanding and manual comparison of protein folds. Recently, an algorithm that produces the cartoons automatically from protein structures has been devised and cartoons have been generated to represent all the structures in the structural databank. There is now a need to be able to define target topological patterns and to search the database for matching domains. RESULTS We have devised a formal language for describing TOPS diagrams and patterns, and have designed an efficient algorithm to match a pattern to a set of diagrams. A pattern-matching system has been implemented, and tested on a database derived from all the current entries in the Protein Data Bank (15,000 domains). Users can search on patterns selected from a library of motifs or, alternatively, they can define their own search patterns. AVAILABILITY The system is accessible over the Web at http://tops.ebi.ac.uk/tops
