Publication


Featured research published by Robert P. Sheridan.


Drug Discovery Today | 2002

Why do we need so many chemical similarity search methods?

Robert P. Sheridan; Simon K. Kearsley

Computational tools to search chemical structure databases are essential to finding leads early in a drug discovery project. Similarity methods are among the most diverse and most useful. We will present some lessons we have gathered over many years of experience with in-house methods on several therapeutic problems. The effectiveness of any similarity method can vary greatly from one biological activity to another in a way that is difficult to predict. Also, any two methods tend to select different subsets of actives from a database, so it is advisable to use several search methods where possible.


Journal of Chemical Information and Computer Sciences | 1996

Chemical Similarity Using Physiochemical Property Descriptors

Simon K. Kearsley; Susan Sallamack; Eugene M. Fluder; Joseph D. Andose; Ralph T. Mosley; Robert P. Sheridan

Similarity searches using topological descriptors have proved extremely useful in aiding large-scale screening. We describe alternative forms of the atom pair (Carhart et al. J. Chem. Inf. Comput. Sci. 1985, 25, 64−73.) and topological torsion (Nilakantan et al. J. Chem. Inf. Comput. Sci. 1987, 27, 82−85.) descriptors that use physiochemical atom types. These types are based on binding property class, atomic log P contribution, and partial atomic charges. The new descriptors are meant to be more “fuzzy” than the original descriptors. We propose objective criteria for determining how effective one descriptor is versus another in selecting active compounds from large databases. Using these criteria, we run similarity searches over the Derwent Standard Drug File with ten typical druglike probes. The new descriptors are not as good as the original descriptors in selecting actives if one considers the average over all probes, but the new descriptors do better for several individual probes. Generally we find th...
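The idea of atom pair descriptors with exact versus "fuzzy" atom types can be illustrated with a minimal sketch. It is not the authors' implementation: the atom type labels are plain strings supplied by the caller, the hypothetical "polar" class stands in for the property-based types described above, distance is counted in bonds, and the Dice coefficient on counts is assumed as the similarity measure.

```python
from collections import Counter, deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from start to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def atom_pairs(atom_types, bonds):
    """Count (type, distance, type) triples over all pairs of connected atoms."""
    adjacency = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    counts = Counter()
    for i in range(len(atom_types)):
        dist = shortest_path_lengths(adjacency, i)
        for j in range(i + 1, len(atom_types)):
            if j in dist:
                # Sort the two types so the descriptor is order-independent.
                t1, t2 = sorted((atom_types[i], atom_types[j]))
                counts[(t1, dist[j], t2)] += 1
    return counts


def dice_similarity(c1, c2):
    """Dice coefficient on descriptor counts (an assumed choice of measure)."""
    shared = sum(min(c1[k], c2[k]) for k in c1.keys() & c2.keys())
    total = sum(c1.values()) + sum(c2.values())
    return 2.0 * shared / total if total else 0.0


# Heavy-atom graphs of two three-atom chains: C-C-O and C-C-N.
bonds = [(0, 1), (1, 2)]
exact = dice_similarity(atom_pairs(["C", "C", "O"], bonds),
                        atom_pairs(["C", "C", "N"], bonds))
# Hypothetical fuzzy typing: O and N both map to one "polar" class.
fuzzy = dice_similarity(atom_pairs(["C", "C", "polar"], bonds),
                        atom_pairs(["C", "C", "polar"], bonds))
print(exact, fuzzy)
```

With exact element types the two chains share only the C-C pair, so their similarity is 1/3; under the fuzzy typing they become identical and the similarity is 1.0, which is the sense in which coarser types make a descriptor more permissive.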


Journal of Computer-aided Molecular Design | 1994

FLOG: A system to select ‘quasi-flexible’ ligands complementary to a receptor of known three-dimensional structure

Michael D. Miller; Simon K. Kearsley; Dennis J. Underwood; Robert P. Sheridan

Summary: We present a system, FLOG (Flexible Ligands Oriented on Grid), that searches a database of 3D coordinates to find molecules complementary to a macromolecular receptor of known 3D structure. The philosophy of FLOG is similar to that reported for DOCK [Shoichet, B.K. et al., J. Comput. Chem., 13 (1992) 380]. In common with that system, we use a match center representation of the volume of the binding cavity and we use a clique-finding algorithm to generate trial orientations of each candidate ligand in the binding site. Also we use a grid representation of the receptor to assess the fit of each orientation. We have introduced a number of novel features within this paradigm. First, we address ligand flexibility by including up to 25 explicit conformations of each structure in our databases. Nonhydrogen atoms in each database entry are assigned one of seven atom types (anion, cation, donor, acceptor, polar, hydrophobic and other) based on their local bonded chemical environments. Second, we have devised a new grid-based scoring function compatible with this ‘heavy atom’ representation of the ligands. This includes several potentials (electrostatic, hydrogen bonding, hydrophobic and van der Waals) calculated from the location of the receptor atoms. Third, we have improved the fitting stage of the search. Initial dockings are generated with a more efficient clique-finding algorithm. This new algorithm includes the concept of ‘essential points’, match centers that must be paired with a ligand atom. Also, we introduce the use of a rapid simplex-based rigid-body optimizer to refine the orientations. We demonstrate, using dihydrofolate reductase as a sample receptor, that the FLOG system can select known inhibitors from a large database of drug-like compounds.


Journal of Chemical Information and Computer Sciences | 2002

The Most Common Chemical Replacements in Drug-Like Compounds

Robert P. Sheridan

We have written a method that extracts one-to-one replacements of chemical groups in pairs of drug-like molecules with the same biological activity and counts the frequency of the replacements in a large collection of such molecules. There are two variations on the method that differ in their treatment of replacements in rings. This method is one possible approach to systematically identify candidate bioisosteres. Here we look at the MDDR database because it has a large diversity of drug-like compounds in a large number of therapeutic areas. The most frequent replacements in MDDR seem generally consistent with medicinal chemistry intuition about what chemical groups are equivalent or with groups that are easily converted by synthetic or metabolic pathways. This method can be applied to any set of molecules wherein the molecules can be paired by similar biological activity.


Journal of Chemical Information and Modeling | 2015

Deep Neural Nets as a Method for Quantitative Structure–Activity Relationships

Junshui Ma; Robert P. Sheridan; Andy Liaw; George E. Dahl; Vladimir Svetnik

Neural networks were widely used for quantitative structure-activity relationships (QSAR) in the 1990s. Because of various practical issues (e.g., slow on large problems, difficult to train, prone to overfitting, etc.), they were superseded by more robust methods like support vector machine (SVM) and random forest (RF), which arose in the early 2000s. The last 10 years have witnessed a revival of neural networks in the machine learning community thanks to new methods for preventing overfitting, more efficient training algorithms, and advancements in computer hardware. In particular, deep neural nets (DNNs), i.e., neural nets with more than one hidden layer, have found great success in many applications, such as computer vision and natural language processing. Here we show that DNNs can routinely make better prospective predictions than RF on a set of large diverse QSAR data sets that are taken from Merck's drug discovery effort. The number of adjustable parameters needed for DNNs is fairly large, but our results show that it is not necessary to optimize them for individual data sets, and a single set of recommended parameters can achieve better performance than RF for most of the data sets we studied. The usefulness of the parameters is demonstrated on additional data sets not used in the calibration. Although training DNNs is still computationally intensive, using graphical processing units (GPUs) can make this issue manageable.


Journal of Chemical Information and Computer Sciences | 1996

Chemical Similarity Using Geometric Atom Pair Descriptors

Robert P. Sheridan; Michael D. Miller; Dennis J. Underwood; Simon K. Kearsley

Similarity searches using topological descriptors have proved extremely useful in aiding large-scale screening. In this paper we describe the geometric atom pair, the 3D analog of the topological atom pair descriptor (Carhart et al. J. Chem. Inf. Comput. Sci. 1985, 25, 64−73). We show the results of geometric similarity searches using the CONCORD-build structures of typical small druglike molecules as probes. The database to be searched is a 3D version of the Derwent Standard Drug File that contains an average of 10 explicit conformations per compound. Using objective criteria for determining how good a descriptor is in selecting active compounds from large databases, we compare the results using the geometric versus the topological atom pair. We find that geometric and topological atom pairs are about equally effective in selecting active compounds from large databases. How the two types of descriptors rank active compounds is generally about the same as well, but occasionally active compounds will be se...


Journal of Chemical Information and Modeling | 2005

Boosting: an ensemble learning tool for compound classification and QSAR modeling.

Vladimir Svetnik; Ting Wang; Christopher Tong; Andy Liaw; Robert P. Sheridan; Qinghua Song

A classification and regression tool, J. H. Friedman's Stochastic Gradient Boosting (SGB), is applied to predicting a compound's quantitative or categorical biological activity based on a quantitative description of the compound's molecular structure. Stochastic Gradient Boosting is a procedure for building a sequence of models, for instance regression trees (as in this paper), whose outputs are combined to form a predicted quantity, either an estimate of the biological activity, or a class label to which a molecule belongs. In particular, the SGB procedure builds a model in a stage-wise manner by fitting each tree to the gradient of a loss function: e.g., squared error for regression and binomial log-likelihood for classification. The values of the gradient are computed for each sample in the training set, but only a random sample of these gradients is used at each stage. (Friedman showed that the well-known boosting algorithm, AdaBoost of Freund and Schapire, could be considered as a particular case of SGB.) The SGB method is used to analyze 10 cheminformatics data sets, most of which are publicly available. The results show that SGB's performance is comparable to that of Random Forest, another ensemble learning method, and is generally competitive with or superior to that of other QSAR methods. The use of SGB's variable importance with partial dependence plots for model interpretation is also illustrated.
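The stage-wise procedure described in this abstract can be sketched for the squared-error case, where the negative gradient is simply the residual. This is a minimal illustration and not the authors' code: it uses depth-1 regression trees (stumps), and the parameter names `n_stages`, `learning_rate`, and `subsample` and their default values are assumptions chosen for the example.

```python
import random


def fit_stump(X, r):
    """Fit a depth-1 regression tree to targets r by least squares."""
    n, d = len(X), len(X[0])
    mean_all = sum(r) / n
    # (sse, feature, threshold, left_value, right_value); no split by default.
    best = (float("inf"), None, None, mean_all, mean_all)
    for f in range(d):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[f] <= thr]
            right = [ri for row, ri in zip(X, r) if row[f] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, f, thr, ml, mr)
    return best[1:]


def stump_predict(stump, row):
    f, thr, ml, mr = stump
    if f is None:  # the stump found no usable split
        return ml
    return ml if row[f] <= thr else mr


def sgb_fit(X, y, n_stages=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss."""
    rng = random.Random(seed)
    n = len(X)
    f0 = sum(y) / n  # initial constant model
    pred = [f0] * n
    stumps = []
    k = min(n, max(1, int(subsample * n)))
    for _ in range(n_stages):
        # Gradient values are computed for every training sample...
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        # ...but only a random subset is used to fit this stage's tree.
        idx = rng.sample(range(n), k)
        stump = fit_stump([X[i] for i in idx], [residuals[i] for i in idx])
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, row)
                for p, row in zip(pred, X)]
    return f0, learning_rate, stumps


def sgb_predict(model, row):
    f0, lr, stumps = model
    return f0 + lr * sum(stump_predict(s, row) for s in stumps)


# Toy data: a step function of a single descriptor.
X = [[float(x)] for x in range(20)]
y = [0.0 if x < 10 else 1.0 for x in range(20)]
model = sgb_fit(X, y)
print(sgb_predict(model, [2.0]), sgb_predict(model, [15.0]))
```

For classification, the same loop would fit each tree to the gradient of the binomial log-likelihood instead of the plain residual; that variant is omitted here to keep the sketch short.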


Journal of Computer-aided Molecular Design | 1994

Flexibases: A way to enhance the use of molecular docking methods

Simon K. Kearsley; Dennis J. Underwood; Robert P. Sheridan; Michael D. Miller

Summary: Specially expanded databases containing three-dimensional structures are created to enhance the utility of docking methods to find new leads, i.e., active compounds of pharmacological interest. The expansion is based on the automatic generation of a set of maximally dissimilar conformations. The ligand receptor system of methotrexate and dihydrofolate reductase is used to demonstrate the feasibility of creating flexibases and their utility in docking studies.


Journal of Chemical Information and Modeling | 2013

Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction.

Robert P. Sheridan

Cross-validation is a common method to validate a QSAR model. In cross-validation, some compounds are held out as a test set, while the remaining compounds form a training set. A model is built from the training set, and the test set compounds are predicted on that model. The agreement of the predicted and observed activity values of the test set (measured by, say, R²) is an estimate of the self-consistency of the model and is sometimes taken as an indication of the predictivity of the model. This estimate of predictivity can be optimistic or pessimistic compared to true prospective prediction, depending on how compounds in the test set are selected. Here, we show that time-split selection gives an R² that is more like that of true prospective prediction than the R² from random selection (too optimistic) or from our analog of leave-class-out selection (too pessimistic). Time-split selection should be used in addition to random selection as a standard for cross-validation in QSAR model building.
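The difference between time-split and random selection of the test set can be sketched as follows. This is only an illustration under stated assumptions: each compound is a dictionary with a hypothetical `"date"` field (for example, the date it was first tested), the `test_fraction` value is arbitrary, and R² is taken here to be the squared Pearson correlation between observed and predicted values.

```python
import random
from datetime import date


def time_split(records, test_fraction=0.25):
    """Earliest compounds form the training set, the latest form the test set."""
    ordered = sorted(records, key=lambda rec: rec["date"])
    n_test = max(1, int(round(test_fraction * len(ordered))))
    return ordered[:-n_test], ordered[-n_test:]


def random_split(records, test_fraction=0.25, seed=0):
    """Test compounds are drawn at random, regardless of date."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(test_fraction * len(shuffled))))
    return shuffled[:-n_test], shuffled[-n_test:]


def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of R^2)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0
    return cov * cov / (var_o * var_p)


# Hypothetical compounds, one registered per month.
records = [{"id": i, "date": date(2010, i + 1, 1), "activity": float(i)}
           for i in range(8)]
train, test = time_split(records)
print([rec["id"] for rec in train], [rec["id"] for rec in test])
```

A model would be built on `train` and scored on `test` with `r_squared` under each splitting scheme; in the time-split case every test compound is newer than every training compound, which mimics prospective use.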


Journal of Chemical Information and Modeling | 2012

Three Useful Dimensions for Domain Applicability in QSAR Models Using Random Forest

Robert P. Sheridan

One popular metric for estimating the accuracy of prospective quantitative structure-activity relationship (QSAR) predictions is based on the similarity of the compound being predicted to compounds in the training set from which the QSAR model was built. More recent work in the field has indicated that other parameters might be equally or more important than similarity. Here we make use of two additional parameters: the variation of prediction among random forest trees (less variation among trees indicates more accurate prediction) and the prediction itself (certain ranges of activity are intrinsically easier to predict than others). The accuracy of prediction for a QSAR model, as measured by the root-mean-square error, can be estimated by cross-validation on the training set at the time of model-building and stored as a three-dimensional array of bins. This is an obvious extension of the one-dimensional array of bins we previously proposed for similarity to the training set [Sheridan et al. J. Chem. Inf. Comput. Sci. 2004, 44, 1912-1928]. We show that using these three parameters simultaneously adds much more discrimination in prediction accuracy than any single parameter. This approach can be applied to any QSAR method that produces an ensemble of models. We also show that the root-mean-square errors produced by cross-validation are predictive of root-mean-square errors of compounds tested after the model was built.
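The three-dimensional array of bins described above can be sketched as a small lookup structure. This is a simplified illustration and not the published procedure: the class name `ErrorBins`, the bin edges, and the hard (unsmoothed) binning are all assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_index(value, edges):
    """Index of the bin holding value: the number of edges it meets or exceeds."""
    return sum(1 for e in edges if value >= e)


class ErrorBins:
    """RMSE from cross-validation, binned by (similarity, tree spread, prediction)."""

    def __init__(self, sim_edges, sd_edges, pred_edges):
        self.edges = (sim_edges, sd_edges, pred_edges)
        self.sq_err = defaultdict(float)
        self.count = defaultdict(int)

    def _key(self, similarity, tree_sd, prediction):
        values = (similarity, tree_sd, prediction)
        return tuple(bin_index(v, e) for v, e in zip(values, self.edges))

    def add(self, similarity, tree_sd, prediction, observed):
        """Record one cross-validated prediction and its observed activity."""
        key = self._key(similarity, tree_sd, prediction)
        self.sq_err[key] += (prediction - observed) ** 2
        self.count[key] += 1

    def rmse(self, similarity, tree_sd, prediction):
        """Expected error for a new compound, or None if its bin is empty."""
        key = self._key(similarity, tree_sd, prediction)
        n = self.count.get(key, 0)
        if n == 0:
            return None
        return math.sqrt(self.sq_err[key] / n)


# Hypothetical bin edges for each of the three dimensions.
bins = ErrorBins(sim_edges=[0.3, 0.6, 0.8],
                 sd_edges=[0.2, 0.5, 1.0],
                 pred_edges=[4.0, 6.0, 8.0])
bins.add(similarity=0.90, tree_sd=0.10, prediction=5.0, observed=8.0)
bins.add(similarity=0.85, tree_sd=0.15, prediction=5.5, observed=1.5)
print(bins.rmse(similarity=0.88, tree_sd=0.12, prediction=5.2))
```

At model-building time, `add` would be called once per cross-validated training compound; at prediction time, `rmse` returns the error estimate stored for the bin that the new compound falls into.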
