Jonathan D. Hirst | Researchain

Publication

Featured research published by Jonathan D. Hirst.

BMC Bioinformatics | 2008

Prediction of glycosylation sites using random forests

Stephen E Hamby; Jonathan D. Hirst

Background: Post-translational modifications (PTMs) occur in the vast majority of proteins and are essential for function. Prediction of the sequence location of PTMs enhances the functional characterisation of proteins. Glycosylation is one type of PTM, and is implicated in protein folding, transport and function. Results: We use the random forest algorithm and pairwise patterns to predict glycosylation sites. We identify pairwise patterns surrounding glycosylation sites and use an odds ratio to weight their propensity of association with modified residues. Our prediction program, GPP (glycosylation prediction program), predicts glycosylation sites with an accuracy of 90.8% for Ser sites, 92.0% for Thr sites and 92.8% for Asn sites. This is significantly better than current glycosylation predictors. We use the trepan algorithm to extract a set of comprehensible rules from GPP, which provide biological insight into all three major glycosylation types. Conclusion: We have created an accurate predictor of glycosylation sites and used this to extract comprehensible rules about the glycosylation process. GPP is available online at http://comp.chem.nottingham.ac.uk/glyco/.
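
The odds-ratio weighting mentioned in the abstract can be illustrated with a minimal sketch. This is not the GPP implementation: the window encoding, the function names, and the 0.5 pseudocount (a common correction that avoids division by zero) are all assumptions made for illustration.

```python
def has_pair(window, i, res_i, j, res_j):
    """True if the window has residue res_i at index i and res_j at index j."""
    return window[i] == res_i and window[j] == res_j


def pattern_odds_ratio(pos_windows, neg_windows, i, res_i, j, res_j, pseudo=0.5):
    """Odds ratio of a pairwise pattern in modified vs. unmodified windows.

    pos_windows: sequence windows around modified (e.g. glycosylated) sites
    neg_windows: sequence windows around unmodified sites
    pseudo:      hypothetical pseudocount added to each cell of the 2x2 table
    """
    pos_with = sum(has_pair(w, i, res_i, j, res_j) for w in pos_windows)
    neg_with = sum(has_pair(w, i, res_i, j, res_j) for w in neg_windows)
    a = pos_with + pseudo                       # modified, pattern present
    b = len(pos_windows) - pos_with + pseudo    # modified, pattern absent
    c = neg_with + pseudo                       # unmodified, pattern present
    d = len(neg_windows) - neg_with + pseudo    # unmodified, pattern absent
    return (a * d) / (b * c)
```

A value above 1 indicates that the pattern is found more often around modified residues than around unmodified ones; a value below 1 indicates the opposite.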

Journal of Computational Chemistry | 1998

Assessing Energy Functions for Flexible Docking

Michal Vieth; Jonathan D. Hirst; Andrzej Kolinski; Charles L. Brooks

A good docking algorithm requires an energy function that is selective, in that it clearly differentiates correctly docked structures from misdocked ones, and that is efficient, meaning that a correctly docked structure can be identified quickly. We assess the selectivity and efficiency of a broad spectrum of energy functions, derived from systematic modifications of the CHARMM param19/toph19 energy function. In particular, we examine the effects of the dielectric constant, the solvation model, the scaling of surface charges, reduction of van der Waals repulsion, and nonbonded cutoffs. Based on an assessment of the energy functions for the docking of five different ligand–receptor complexes, we find that selective energy functions include a variety of distance‐dependent dielectric models together with truncation of the nonbonded interactions at 8 Å. We evaluate the docking efficiency, the mean number of docked structures per unit of time, of the more selective energy functions, using a simulated annealing molecular dynamics protocol. The largest improvements in efficiency come from a reduction of van der Waals repulsion and a reduction of surface charges. We note that the most selective potential is quite inefficient, although a hierarchical approach can be employed to take advantage of both selective and efficient energy functions. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1612–1622, 1998
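
To make the idea of a distance-dependent dielectric with an 8 Å truncation concrete, here is a minimal sketch of a pairwise electrostatic term. It assumes the simple form ε(r) = scale · r and a hard cutoff; the function name, the `scale` parameter, and the approximate conversion constant are illustrative assumptions, and production force fields typically smooth the truncation with switching or shifting functions rather than cutting abruptly.

```python
# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
# Individual packages use slightly different values; this one is illustrative.
COULOMB_CONSTANT = 332.06


def rdie_coulomb(q_i, q_j, r, cutoff=8.0, scale=1.0):
    """Electrostatic energy of two point charges with eps(r) = scale * r.

    q_i, q_j: partial charges in units of the elementary charge
    r:        interatomic distance in Angstrom
    cutoff:   hard truncation distance in Angstrom (assumed, no smoothing)
    """
    if r <= 0.0:
        raise ValueError("distance must be positive")
    if r > cutoff:
        return 0.0  # interaction is ignored beyond the cutoff
    eps = scale * r  # the dielectric grows linearly with distance
    return COULOMB_CONSTANT * q_i * q_j / (eps * r)
```

Because ε grows with r, the energy falls off as 1/r² rather than 1/r, which damps long-range electrostatics as a crude stand-in for solvent screening.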

Journal of Computational Chemistry | 1998

Assessing Search Strategies for Flexible Docking

Michal Vieth; Jonathan D. Hirst; Brian N. Dominy; Heidi Daigler; Charles L. Brooks

We assess the efficiency of molecular dynamics (MD), Monte Carlo (MC), and genetic algorithms (GA) for docking five representative ligand–receptor complexes. All three algorithms employ a modified CHARMM‐based energy function. The algorithms are also compared with an established docking algorithm, AutoDock. The receptors are kept rigid while flexibility of ligands is permitted. To test the efficiency of the algorithms, two search spaces are used: an 11‐Å‐radius sphere and a 2.5‐Å‐radius sphere, both centered on the active site. We find MD is most efficient in the case of the large search space, and GA outperforms the other methods in the small search space. We also find that MD provides structures that are, on average, lower in energy and closer to the crystallographic conformation. The GA obtains good solutions over the course of the fewest energy evaluations. However, due to the nature of the nonbonded interaction calculations, the GA requires the longest time for a single energy evaluation, which results in a decreased efficiency. The GA and MC search algorithms are implemented in the CHARMM macromolecular package. © 1998 John Wiley & Sons, Inc. J Comput Chem 19: 1623–1631, 1998
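
The abstract does not state which acceptance rule the Monte Carlo search uses. The Metropolis criterion is the usual choice for this kind of search, so the sketch below assumes it; the function names and the `kT` parameter are illustrative, not taken from CHARMM.

```python
import math
import random


def acceptance_probability(delta_e, kT):
    """Metropolis probability of accepting a move that changes energy by delta_e."""
    if kT <= 0.0:
        raise ValueError("kT must be positive")
    if delta_e <= 0.0:
        return 1.0  # downhill (or neutral) moves are always accepted
    return math.exp(-delta_e / kT)  # uphill moves are accepted with Boltzmann weight


def metropolis_accept(delta_e, kT, rng):
    """Decide whether to accept a trial move, drawing from the supplied generator."""
    return rng.random() < acceptance_probability(delta_e, kT)
```

A trial ligand move would be generated, its energy change evaluated, and `metropolis_accept(delta_e, kT, random.Random())` used to decide whether to keep it. Lowering `kT` over time turns this into a simulated annealing schedule.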

Combinatorial Chemistry & High Throughput Screening | 2009

Machine Learning in Virtual Screening

James L. Melville; Edmund K. Burke; Jonathan D. Hirst

In this review, we highlight recent applications of machine learning to virtual screening, focusing on the use of supervised techniques to train statistical learning algorithms to prioritize databases of molecules as active against a particular protein target. Both ligand-based similarity searching and structure-based docking have benefited from machine learning algorithms, including naïve Bayesian classifiers, support vector machines, neural networks, and decision trees, as well as more traditional regression techniques. Effective application of these methodologies requires an appreciation of data preparation, validation, optimization, and search methodologies, and we also survey developments in these areas.

Journal of Chemical Information and Modeling | 2007

Contemporary QSAR Classifiers Compared

Craig L. Bruce; James L. Melville; Stephen Pickett; Jonathan D. Hirst

We present a comparative assessment of several state-of-the-art machine learning tools for mining drug data, including support vector machines (SVMs) and the ensemble decision tree methods boosting, bagging, and random forest, using eight data sets and two sets of descriptors. We demonstrate, by rigorous multiple comparison statistical tests, that these techniques can provide consistent improvements in predictive performance over single decision trees. However, within these methods, there is no clearly best-performing algorithm. This motivates a more in-depth investigation into the properties of random forests. We identify a set of parameters for the random forest that provide optimal performance across all the studied data sets. Additionally, the tree ensemble structure of the forest may provide an interpretable model, a considerable advantage over SVMs. We test this possibility and compare it with standard decision tree models.

Journal of Computer-aided Molecular Design | 1994

Quantitative structure-activity relationships by neural networks and inductive logic programming. I. The inhibition of dihydrofolate reductase by pyrimidines.

Jonathan D. Hirst; Ross D. King; Michael J. E. Sternberg

Summary: Neural networks and inductive logic programming (ILP) have been compared to linear regression for modelling the QSAR of the inhibition of E. coli dihydrofolate reductase (DHFR) by 2,4-diamino-5-(substituted benzyl)pyrimidines, and, in the subsequent paper [Hirst, J.D., King, R.D. and Sternberg, M.J.E., J. Comput.-Aided Mol. Design, 8 (1994) 421], the inhibition of rodent DHFR by 2,4-diamino-6,6-dimethyl-5-phenyl-dihydrotriazines. Cross-validation trials provide a statistically rigorous assessment of the predictive capabilities of the methods, with training and testing data selected randomly and all the methods developed using identical training data. For the ILP analysis, molecules are represented by attributes other than Hansch parameters. Neural networks and ILP perform better than linear regression using the attribute representation, but the difference is not statistically significant. The major benefit from the ILP analysis is the formulation of understandable rules relating the activity of the inhibitors to their chemical structure.

Bioinformatics | 2009

DichroCalc—circular and linear dichroism online

Benjamin M. Bulheller; Jonathan D. Hirst

Motivation: Circular dichroism (CD) is widely used in studies of protein folding. The CD spectrum of a protein can be estimated from its structure alone, using the well-established matrix method. In the last decade, a related spectroscopy, linear dichroism (LD), has been increasingly applied to study the orientation of proteins in solution. However, matrix method calculations of LD spectra have not been presented before. DichroCalc makes both CD and LD calculations available in an easy-to-use fashion. Results: DichroCalc can be used without registration and calculates CD and LD spectra using a variety of matrix method parameters. PDB files can be uploaded as input or retrieved via their PDB code and a Perl-based parser is offered for easy handling of PDB files. Availability: http://comp.chem.nottingham.ac.uk/dichrocalc and http://comp.chem.nottingham.ac.uk/parsepdb.

Proteins | 2005

Protein secondary structure prediction with dihedral angles.

Matthew J. Wood; Jonathan D. Hirst

We present DESTRUCT, a new method of protein secondary structure prediction, which achieves a three‐state accuracy (Q3) of 79.4% in a cross‐validated trial on a nonredundant set of 513 proteins. An iterative set of cascade–correlation neural networks is used to predict both secondary structure and ψ dihedral angles, with predicted values enhancing the subsequent iteration. Predictive accuracies of 80.7% and 81.7% are achieved on the CASP4 and CASP5 targets, respectively. Our approach is significantly more accurate than other contemporary methods, due to feedback and a novel combination of structural representations. Proteins 2005.
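
For readers unfamiliar with the three-state accuracy (Q3) quoted above, it is the fraction of residues whose predicted state (helix, strand, or coil) matches the observed one. A minimal sketch follows, assuming both assignments are given as equal-length strings over the letters H, E and C; the function name is illustrative.

```python
def q3(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state.

    Both arguments are strings over 'H' (helix), 'E' (strand) and 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

Multiplying the result by 100 gives the percentage form used in the abstract.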

BMC Bioinformatics | 2007

ProCKSI: a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information

Daniel Barthel; Jonathan D. Hirst; Jacek Blazewicz; Edmund K. Burke; Natalio Krasnogor

Background: We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures. Results: We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. Furthermore, combining different similarity measures is usually more robust than relying on one single, unique measure. Conclusion: Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface. ProCKSI is publicly available at http://www.procksi.net for academic and non-commercial use.

Computational Biology and Chemistry | 2009

Supervised machine learning algorithms for protein structure classification

Pooja Jain; Jonathan M. Garibaldi; Jonathan D. Hirst

We explore automation of protein structural classification using supervised machine learning methods on a set of 11,360 pairs of protein domains (up to 35% sequence identity) consisting of three secondary structure elements. Fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn, for a pair of protein domains, the deepest common structural level within the SCOP hierarchy, given a one-dimensional representation of the domain structures. This representation encapsulates evolutionary information in terms of sequence identity and structural information characterising the secondary structure elements and lengths of the respective domains. The evaluation is performed in two steps, first selecting the best performing base learners and subsequently evaluating boosted and bagged meta learners. The boosted random forest, a collection of decision trees, is found to be the most accurate, with a cross-validated accuracy of 97.0% and F-measures of 0.97, 0.85, 0.93 and 0.98 for classification of proteins to the Class, Fold, Super-Family and Family levels in the SCOP hierarchy. The meta learning regime, especially boosting, improved performance by more accurately classifying the instances from less populated classes.
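
The per-level F-measures reported above combine precision and recall into a single score. The sketch below assumes the balanced form (F1, the harmonic mean of precision and recall) computed from per-class counts; the abstract does not say which variant was used, and the function name is illustrative.

```python
def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure (F1) for one class from its confusion counts.

    Equivalent to 2 * precision * recall / (precision + recall).
    Returns 0.0 when the class has no true positives.
    """
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2.0 * precision * recall / (precision + recall)
```

Because the score is computed per class, it reveals poor performance on sparsely populated classes that an overall accuracy figure can hide.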
