Publication


Featured research published by John B. O. Mitchell.


Structure | 1998

Protein folds and functions

Andrew C. R. Martin; Christine A. Orengo; E. Gail Hutchinson; Susan Jones; Maria Karmirantzou; Roman A. Laskowski; John B. O. Mitchell; Chiara Taroni; Janet M. Thornton

BACKGROUND The recent rapid increase in the number of available three-dimensional protein structures has further highlighted the necessity to understand the relationship between biological function and structure. Using structural classification schemes such as SCOP, CATH and DALI, it is now possible to explore global relationships between protein fold and function, something which was previously impractical. RESULTS Using a relational database of CATH data we have generated fold distributions for arbitrary selections of proteins automatically. These distributions have been examined in the light of protein function and bound ligand. Different enzyme classes are not clearly reflected in distributions of protein class and architecture, whereas the type of bound ligand has a much more dramatic effect. CONCLUSIONS The availability of structural classification data has enabled this novel overview analysis. We conclude that function at the top level of the EC number enzyme classification is not related to fold, as only a very few specific residues are actually responsible for enzyme activity. Conversely, the fold is much more closely related to ligand type.


Bioinformatics | 2010

A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking

Pedro J. Ballester; John B. O. Mitchell

MOTIVATION Accurately predicting the binding affinities of large sets of diverse protein-ligand complexes is an extremely challenging task. The scoring functions that attempt such computational prediction are essential for analysing the outputs of molecular docking, which in turn is an important technique for drug discovery, chemical biology and structural biology. Each scoring function assumes a predetermined theory-inspired functional form for the relationship between the variables that characterize the complex, which also include parameters fitted to experimental or simulation data, and its predicted binding affinity. The inherent problem of this rigid approach is that it leads to poor predictivity for those complexes that do not conform to the modelling assumptions. Moreover, resampling strategies, such as cross-validation or bootstrapping, are still not systematically used to guard against the overfitting of calibration data in parameter estimation for scoring functions. RESULTS We propose a novel scoring function (RF-Score) that circumvents the need for problematic modelling assumptions via non-parametric machine learning. In particular, Random Forest was used to implicitly capture binding effects that are hard to model explicitly. RF-Score is compared with the state of the art on the demanding PDBbind benchmark. Results show that RF-Score is a very competitive scoring function. Importantly, RF-Score's performance was shown to improve dramatically with training set size and hence the future availability of more high-quality structural and interaction data is expected to lead to improved versions of RF-Score. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.


Journal of Computational Chemistry | 1999

BLEEP—potential of mean force describing protein–ligand interactions: I. Generating potential

John B. O. Mitchell; Roman A. Laskowski; Alexander Alex; Janet M. Thornton

We have developed BLEEP (biomolecular ligand energy evaluation protocol), an atomic level potential of mean force (PMF) describing protein–ligand interactions. The pair potentials for BLEEP have been derived from high‐resolution X‐ray structures of protein–ligand complexes in the Brookhaven Protein Data Bank (PDB), with a careful treatment of homology. The use of a broad variety of protein–ligand structures in the derivation phase gives BLEEP more general applicability than previous potentials, which have been based on limited classes of complexes, and thus represents a significant step forward. We calculate the distance distributions in protein–ligand interactions for all 820 possible pairs that can be chosen from our set of 40 different atom types, including polar hydrogen. We then use a reverse Boltzmann methodology to convert these into energy‐like pair potential functions. Two versions of BLEEP are calculated, one including and one excluding interactions between protein and water. The pair potentials are found to have the expected forms; polar and hydrogen bonding interactions show minima at short range, around 3.0 Å, whereas a typical hydrophobic interaction is repulsive at this distance, with values above 4.0 Å being preferred. ©1999 John Wiley & Sons, Inc. J Comput Chem 20: 1165–1176, 1999
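The two quantitative steps in this abstract can be illustrated with a minimal sketch. It first checks that 40 atom types give 820 unordered pairs (same-type pairs included), then applies a generic reverse Boltzmann conversion to a distance histogram. The reference distribution, the normalisation, the value of kT and the function names are all assumptions made for illustration; the abstract does not specify BLEEP's actual reference state.

```python
import math
from itertools import combinations_with_replacement

# kT in kcal/mol at roughly 298 K; an assumed temperature, not from the paper.
K_B_T = 0.593


def count_unordered_pairs(n_types):
    """Count unordered atom-type pairs, including same-type pairs."""
    return sum(1 for _ in combinations_with_replacement(range(n_types), 2))


def reverse_boltzmann(observed, reference, kt=K_B_T):
    """Turn binned distance counts into energy-like values per bin.

    E(bin) = -kT * ln(p_observed / p_reference). Bins with no observed
    or no reference counts are returned as None rather than guessed.
    """
    obs_total = sum(observed)
    ref_total = sum(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        if obs == 0 or ref == 0:
            energies.append(None)
            continue
        ratio = (obs / obs_total) / (ref / ref_total)
        energies.append(-kt * math.log(ratio))
    return energies
```

A bin that is populated more often than the reference predicts gets a negative (favourable) value, and an under-populated bin gets a positive one.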


Journal of Chemical Information and Modeling | 2007

Random forest models to predict aqueous solubility

David S. Palmer; Noel M. O'Boyle; Robert C. Glen; John B. O. Mitchell

Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 °C gave an r² = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets is compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.


Journal of Chemical Information and Modeling | 2008

Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and Log P

Laura D. Hughes; David S. Palmer; Florian Nigsch; John B. O. Mitchell

This paper attempts to elucidate differences in QSPR models of aqueous solubility (Log S), melting point (Tm), and octanol-water partition coefficient (Log P), three properties of pharmaceutical interest. For all three properties, Support Vector Machine models using 2D and 3D descriptors calculated in the Molecular Operating Environment were the best models. Octanol-water partition coefficient was the easiest property to predict, as indicated by the RMSE of the external test set and the coefficient of determination (RMSE = 0.73, r² = 0.87). Melting point prediction, on the other hand, was the most difficult (RMSE = 52.8 °C, r² = 0.46), and Log S statistics were intermediate between melting point and Log P prediction (RMSE = 0.900, r² = 0.79). The data imply that for all three properties the lack of measured values at the extremes is a significant source of error. This source, however, does not entirely explain the poor melting point prediction, and we suggest that deficiencies in descriptors used in melting point prediction contribute significantly to the prediction errors.
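The two statistics quoted above can be computed as in the following sketch. It assumes r² is the coefficient of determination in its 1 - SS_res/SS_tot form; the abstract does not say which exact formula was used, and a squared Pearson correlation would give different values for biased predictions. The function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same units as the property."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSE carries the units of the property (log units or °C), so r², which is unitless, is the more direct way to compare difficulty across the three properties.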


Wiley Interdisciplinary Reviews: Computational Molecular Science | 2014

Machine learning methods in chemoinformatics

John B. O. Mitchell

Machine learning algorithms are generally developed in computer science or adjacent disciplines and find their way into chemical modeling by a process of diffusion. Though particular machine learning methods are popular in chemoinformatics and quantitative structure–activity relationships (QSAR), many others exist in the technical literature. This discussion is methods‐based and focused on some algorithms that chemoinformatics researchers frequently use. It makes no claim to be exhaustive. We concentrate on methods for supervised learning, predicting the unknown property values of a test set of instances, usually molecules, based on the known values for a training set. Particularly relevant approaches include Artificial Neural Networks, Random Forest, Support Vector Machine, k‐Nearest Neighbors and naïve Bayes classifiers. WIREs Comput Mol Sci 2014, 4:468–481.


Bioinformatics | 2003

Protein Ligand Database (PLD): additional understanding of the nature and specificity of protein–ligand complexes

Dushyanthan Puvanendrampillai; John B. O. Mitchell

SUMMARY The Protein Ligand Database (PLD) is a publicly available web-based database that aims to provide further understanding of protein-ligand interactions. The PLD contains biomolecular data including calculated binding energies, Tanimoto ligand similarity scores and protein percentage sequence similarities. The database has potential for application as a tool in molecular design. AVAILABILITY http://www-mitchell.ch.cam.ac.uk/pld/
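As an illustration of the Tanimoto ligand similarity scores mentioned above, here is a minimal sketch that treats each ligand as a set of fingerprint feature identifiers. The set representation, the function name and the handling of two empty fingerprints are assumptions; the summary does not describe how the PLD encodes ligands.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by total distinct features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / len(a | b)
```

For example, `tanimoto({1, 2, 3}, {2, 3, 4})` shares two features out of four distinct ones and returns 0.5.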


Journal of Computational Chemistry | 1999

BLEEP-potential of mean force describing protein-ligand interactions: II. Calculation of binding energies and comparison with experimental data

John B. O. Mitchell; Roman A. Laskowski; Alexander Alex; Mark J. Forster; Janet M. Thornton

We have developed BLEEP (biomolecular ligand energy evaluation protocol), an atomic level potential of mean force (PMF) describing protein–ligand interactions. Here, we present four tests designed to assess different attributes of BLEEP. Calculating the energy of a small hydrogen‐bonded complex allows us to compare BLEEP's description of this system with a quantum‐chemical description. The results suggest that BLEEP gives an adequate description of hydrogen bonding. A study of the relative energies of various heparin binding geometries for human basic fibroblast growth factor (bFGF) demonstrates that BLEEP performs excellently in identifying low‐energy binding modes from decoy conformations for a given protein–ligand complex. We also calculate binding energies for a set of 90 protein–ligand complexes, obtaining a correlation coefficient of 0.74 when compared with experiment. This shows that BLEEP can perform well in the difficult area of ranking the interaction energies of diverse complexes. We also study a set of nine serine proteinase–inhibitor complexes; BLEEP's good performance here illustrates its ability to determine the relative energies of a series of similar complexes. We find that a protocol for incorporating solvation does not improve correlation with experiment. ©1999 John Wiley & Sons, Inc. J Comput Chem 20: 1177–1185, 1999


Journal of Chemical Information and Modeling | 2006

Melting point prediction employing k-nearest neighbor algorithms and genetic parameter optimization

Florian Nigsch; Andreas Bender; Bernd van Buuren; Jos Tissen; Eduard A. Nigsch; John B. O. Mitchell

We have applied the k-nearest neighbor (kNN) modeling technique to the prediction of melting points. A data set of 4119 diverse organic molecules (data set 1) and an additional set of 277 drugs (data set 2) were used to compare performance in different regions of chemical space, and we investigated the influence of the number of nearest neighbors using different types of molecular descriptors. To compute the prediction on the basis of the melting temperatures of the nearest neighbors, we used four different methods (arithmetic and geometric average, inverse distance weighting, and exponential weighting), of which the exponential weighting scheme yielded the best results. We assessed our model via a 25-fold Monte Carlo cross-validation (with approximately 30% of the total data as a test set) and optimized it using a genetic algorithm. Predictions for drugs based on drugs (separate training and test sets each taken from data set 2) were found to be considerably better [root-mean-squared error (RMSE) = 46.3 °C, r² = 0.30] than those based on nondrugs (prediction of data set 2 based on the training set from data set 1, RMSE = 50.3 °C, r² = 0.20). The optimized model yields an average RMSE as low as 46.2 °C (r² = 0.49) for data set 1, and an average RMSE of 42.2 °C (r² = 0.42) for data set 2. It is shown that the kNN method inherently introduces a systematic error in melting point prediction. Much of the remaining error can be attributed to the lack of information about interactions in the liquid state, which are not well-captured by molecular descriptors.
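The prediction and validation scheme described above can be sketched as follows. The weighting form exp(-decay * distance), the Euclidean distance, the `decay` parameter and the function names are assumptions chosen for illustration; the abstract names exponential weighting and 25-fold Monte Carlo cross-validation but does not give their exact definitions, and the genetic optimisation step is not shown.

```python
import math
import random


def knn_predict(query, train_x, train_y, k=3, decay=1.0):
    """Predict a value as an exponentially weighted mean of the k nearest neighbours."""
    nearest = sorted(
        (math.dist(query, x), y) for x, y in zip(train_x, train_y)
    )[:k]
    d_min = nearest[0][0]
    # Shifting by the smallest distance leaves the normalised weights
    # unchanged and avoids all weights underflowing to zero.
    weights = [math.exp(-decay * (d - d_min)) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def monte_carlo_splits(n_items, n_repeats=25, test_fraction=0.3, seed=0):
    """Yield (train_indices, test_indices) for repeated random splits."""
    rng = random.Random(seed)
    n_test = round(n_items * test_fraction)
    for _ in range(n_repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]
```

Unlike k-fold cross-validation, the test sets from repeated random splits may overlap between repeats, so each molecule can be tested several times or not at all.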


Journal of Molecular Biology | 2009

Understanding the functional roles of amino acid residues in enzyme catalysis

Gemma L. Holliday; John B. O. Mitchell; Janet M. Thornton

The MACiE database contains 223 distinct step-wise enzyme reaction mechanisms and holds representatives from each EC sub-subclass where there is a crystal structure and sufficient evidence in the literature to support a mechanism. Each catalytic step of every reaction sequence in MACiE is fully annotated so that it includes the function of the catalytic residues involved in the reaction and the mechanism by which substrates are transformed into products. Using MACiE as a knowledge base, we have seen that the top 10 most catalytic residues are histidine, aspartate, glutamate, lysine, cysteine, arginine, serine, threonine, tyrosine and tryptophan. Of these only seven (cysteine, histidine, aspartate, lysine, serine, threonine and tyrosine) dominate catalysis and provide essentially five functional roles that are essential. Stabilisation is the most common and essential role for all classes of enzyme, followed by general acid/base (proton acceptor and proton donor) functionality, with nucleophilic addition following closely behind (nucleophile and nucleofuge). We investigated the occurrence of these residues in MACiE and the Catalytic Site Atlas and found that, as expected, certain residue types are associated with each functional role, with some residue types able to perform diverse roles. In addition, it was seen that different EC classes of enzyme have a tendency to employ different residues for catalysis. Further, we show that whilst the differences between EC classes in catalytic residue composition are not immediately obvious from the general classes of Ingold mechanisms, there is some weak correlation between the mechanisms involved in a given EC class and the functions that the catalytic amino acid residues are performing. 
The analysis presented here provides a valuable insight into the functional roles of catalytic amino acid residues, which may have applications in many aspects of enzymology, from the design of novel enzymes to the prediction and validation of enzyme reaction mechanisms.

Collaboration


Top Co-Authors

Janet M. Thornton

European Bioinformatics Institute

Sarah L. Price

University College London

David S. Palmer

University of Strathclyde

Gemma L. Holliday

European Bioinformatics Institute

Lazaros Mavridis

Queen Mary University of London