Jed Zaretzki

Rensselaer Polytechnic Institute

Publication

Featured research published by Jed Zaretzki.

ACS Medicinal Chemistry Letters | 2010

SMARTCyp: A 2D Method for Prediction of Cytochrome P450-Mediated Drug Metabolism

Patrik Rydberg; David E. Gloriam; Jed Zaretzki; Curt M. Breneman; Lars Olsen

SMARTCyp is an in silico method that predicts the sites of cytochrome P450-mediated metabolism of druglike molecules. The method is foremost a reactivity model, and as such, it shows a preference for predicting sites that are metabolized by the cytochrome P450 3A4 isoform. SMARTCyp predicts the site of metabolism directly from the 2D structure of a molecule, without requiring calculation of electronic properties or generation of 3D structures. This is a major advantage, because it makes SMARTCyp very fast. Other advantages are that experimental data are not a prerequisite to create the model, and it can easily be integrated with other methods to create models for other cytochrome P450 isoforms. Benchmarking tests on a database of 394 3A4 substrates show that SMARTCyp successfully identifies at least one metabolic site in the top two ranked positions 76% of the time. SMARTCyp is available for download at http://www.farma.ku.dk/p450.
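The 76% figure is a Top-2 success rate: a molecule counts as a hit when at least one experimentally observed site appears among the two highest-ranked atoms. The following is a minimal sketch of that metric. It assumes per-atom scores where a higher score means "more likely to be metabolized" (invert the sort for a score where lower is better). The function names and the toy data are hypothetical and are not part of SMARTCyp.

```python
from typing import Dict, List, Set


def top_k_success(scores: List[float], observed_sites: Set[int], k: int = 2) -> bool:
    """True if any observed site is among the k highest-scoring atoms.

    Assumes higher score = more likely site; ties keep atom-index order
    (Python's sort is stable), which is a simplifying assumption.
    """
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return any(atom in observed_sites for atom in ranked[:k])


def top_k_rate(dataset: List[Dict], k: int = 2) -> float:
    """Fraction of molecules with at least one observed site in the top k."""
    hits = sum(top_k_success(m["scores"], m["sites"], k) for m in dataset)
    return hits / len(dataset)


# Hypothetical toy data: per-atom scores and indices of known metabolized atoms.
toy = [
    {"scores": [0.1, 0.9, 0.4, 0.2], "sites": {1}},      # site ranked 1st: hit
    {"scores": [0.8, 0.3, 0.7, 0.1], "sites": {2}},      # site ranked 2nd: hit
    {"scores": [0.6, 0.5, 0.2, 0.9], "sites": {2}},      # site ranked 4th: miss
    {"scores": [0.3, 0.2, 0.95, 0.4], "sites": {0, 3}},  # atom 3 ranked 2nd: hit
]
print(top_k_rate(toy))  # 0.75
```

Counting a molecule once, however many of its sites land in the top two, matches the "at least one metabolic site" wording of the abstract.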

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012

Fast Bundle Algorithm for Multiple-Instance Learning

Charles Bergeron; Gregory M. Moore; Jed Zaretzki; Curt M. Breneman; Kristin P. Bennett

We present a bundle algorithm for multiple-instance classification and ranking. These frameworks yield improved models on many problems possessing special structure. Multiple-instance loss functions are typically nonsmooth and nonconvex, and current algorithms convert these to smooth nonconvex optimization problems that are solved iteratively. Inspired by the latest linear-time subgradient-based methods for support vector machines, we optimize the objective directly using a nonconvex bundle method. Computational results show this method is linearly scalable, while not sacrificing generalization accuracy, permitting modeling on new and larger data sets in computational chemistry and other applications. This new implementation facilitates modeling with kernels.
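The abstract notes that multiple-instance losses are nonsmooth and nonconvex. The sketch below makes that concrete with one common multiple-instance hinge formulation. This formulation is an assumption, and the paper's exact loss may differ. The sketch returns a subgradient, which is the ingredient a bundle method consumes. For brevity, the optimizer is plain subgradient descent with a diminishing step, not the authors' bundle algorithm. The function name, the regularization weight `lam`, and the toy bags are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mi_objective_and_subgradient(w, bags, labels, lam=0.1):
    """Regularized multiple-instance hinge loss and one subgradient at w.

    Assumed formulation (illustrative; the paper's exact loss may differ):
      bag score  f(B) = max over instances x in B of w . x
      objective  lam/2 * ||w||^2 + mean over bags of max(0, 1 - y_B * f(B))
    The max makes the objective nonsmooth, and for positive bags the term
    1 - max(...) is concave in w, so the objective is nonconvex.
    """
    n = len(bags)
    obj = 0.5 * lam * dot(w, w)
    grad = [lam * wj for wj in w]
    for bag, y in zip(bags, labels):
        scores = [dot(w, x) for x in bag]
        i = max(range(len(bag)), key=scores.__getitem__)  # witness instance
        margin = 1.0 - y * scores[i]
        if margin > 0:  # hinge term is active
            obj += margin / n
            grad = [g - y * xj / n for g, xj in zip(grad, bag[i])]
    return obj, grad


# Hypothetical toy data: each bag is a list of 2-feature instances.
rng = random.Random(0)
pos = [[[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(3)] + [[3.0, 3.0]] for _ in range(5)]
neg = [[[rng.gauss(-1, 0.5), rng.gauss(-1, 0.5)] for _ in range(4)] for _ in range(5)]
bags, labels = pos + neg, [1] * 5 + [-1] * 5

# Plain subgradient descent: a simple stand-in for a bundle method, which would
# instead build a piecewise-linear model of the objective from past subgradients.
w = [0.0, 0.0]
start, _ = mi_objective_and_subgradient(w, bags, labels)
for step in range(200):
    _, g = mi_objective_and_subgradient(w, bags, labels)
    w = [wj - 0.1 / math.sqrt(step + 1) * gj for wj, gj in zip(w, g)]
end, _ = mi_objective_and_subgradient(w, bags, labels)
print(round(start, 3), round(end, 3))
```

Each evaluation costs one pass over all instances. That per-iteration linear cost is what makes subgradient-based approaches attractive for the scalability goal the abstract describes.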

Journal of Chemical Information and Modeling | 2012

RS-Predictor models augmented with SMARTCyp reactivities: Robust metabolic regioselectivity predictions for nine CYP isozymes

Jed Zaretzki; Patrik Rydberg; Charles Bergeron; Kristin P. Bennett; Lars Olsen; Curt M. Breneman

RS-Predictor is a tool for creating pathway-independent, isozyme-specific, site of metabolism (SOM) prediction models using any set of known cytochrome P450 (CYP) substrates and metabolites. Until now, the RS-Predictor method was only trained and validated on CYP 3A4 data, but in the present study, we report on the versatility of the RS-Predictor modeling paradigm by creating and testing regioselectivity models for substrates of the nine most important CYP isozymes. Through curation of source literature, we have assembled 680 substrates distributed among CYPs 1A2, 2A6, 2B6, 2C19, 2C8, 2C9, 2D6, 2E1, and 3A4, the largest publicly accessible collection of P450 ligands and metabolites released to date. A comprehensive investigation into the importance of different descriptor classes for identifying the regioselectivity mediated by each isozyme is made through the generation of multiple independent RS-Predictor models for each set of isozyme substrates. Two of these models include a density functional theory (DFT) reactivity descriptor derived from SMARTCyp. Optimal combinations of RS-Predictor and SMARTCyp are shown to have stronger performance than either method alone, while also exceeding the accuracy of the commercial regioselectivity prediction methods distributed by Optibrium and Schrödinger, correctly identifying a large proportion of the metabolites in each substrate set within the top two rank-positions: 1A2 (83.0%), 2A6 (85.7%), 2B6 (82.1%), 2C19 (86.2%), 2C8 (83.8%), 2C9 (84.5%), 2D6 (85.9%), 2E1 (82.8%), 3A4 (82.3%), and merged (86.0%). Comprehensive datamining of each substrate set and careful statistical analyses of the predictions made by the different models revealed new insights into molecular features that control metabolic regioselectivity and enable accurate prospective prediction of likely SOMs.

International Conference on Machine Learning | 2008

Multiple instance ranking

Charles Bergeron; Jed Zaretzki; Curt M. Breneman; Kristin P. Bennett

This paper introduces a novel machine learning model called multiple instance ranking (MIRank) that enables ranking to be performed in a multiple instance learning setting. The motivation for MIRank stems from the hydrogen abstraction problem in computational chemistry, that of predicting the group of hydrogen atoms from which a hydrogen is abstracted (removed) during metabolism. The model predicts the preferred hydrogen group within a molecule by ranking the groups, with the ambiguity of not knowing which hydrogen atom within the preferred group is actually abstracted. This paper formulates MIRank in its general context and proposes an algorithm for solving MIRank problems using successive linear programming. The method outperforms multiple instance classification models on several real and synthetic datasets.
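The following is a minimal sketch of how a MIRank-style model could score and rank the hydrogen groups of a molecule. It makes several assumptions. It uses a linear score per hydrogen. It takes each group's score to be its best hydrogen's score, which encodes "some hydrogen in this group is abstracted" without saying which one. It uses a pairwise hinge penalty. The sketch does not reproduce the successive linear programming solver described in the paper. All names and the toy molecule are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def group_scores(w, groups):
    """Score each hydrogen group (bag) by its best-scoring hydrogen (instance).

    Taking the max over instances is an assumption of this sketch.
    """
    return [max(dot(w, h) for h in group) for group in groups]


def mirank_hinge(w, molecules, preferred):
    """Hinge penalties asking the preferred group of each molecule to outscore
    every other group of the same molecule by a margin of 1."""
    loss = 0.0
    for groups, p in zip(molecules, preferred):
        s = group_scores(w, groups)
        loss += sum(max(0.0, 1.0 - (s[p] - s[j])) for j in range(len(s)) if j != p)
    return loss


def predict_preferred(w, groups):
    """Index of the top-ranked hydrogen group."""
    s = group_scores(w, groups)
    return max(range(len(s)), key=s.__getitem__)


# Hypothetical molecule: three hydrogen groups, each hydrogen described by 2 features.
groups = [
    [[0.2, 1.0], [0.5, 0.0]],
    [[2.0, 0.0]],
    [[0.1, 0.3], [0.4, 0.2], [0.3, 0.9]],
]
w = [1.0, 0.0]
print(predict_preferred(w, groups))    # 1
print(mirank_hinge(w, [groups], [1]))  # 0.0
```

Constraints compare groups only within the same molecule. Scores of hydrogens in different molecules never need to be on a common scale, which is what distinguishes ranking from multiple-instance classification here.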

Journal of Chemical Information and Modeling | 2013

DR-Predictor: Incorporating Flexible Docking with Specialized Electronic Reactivity and Machine Learning Techniques to Predict CYP-Mediated Sites of Metabolism

Tao-wei Huang; Jed Zaretzki; Charles Bergeron; Kristin P. Bennett; Curt M. Breneman

Computational methods that can identify CYP-mediated sites of metabolism (SOMs) of drug-like compounds have become required tools for early-stage lead optimization. In recent years, methods that combine CYP binding site features with CYP/ligand binding information have been sought in order to increase the prediction accuracy of such hybrid models over those that use only one representation. Two challenges that any hybrid ligand/structure-based method must overcome are (1) identification of the best binding pose for a specific ligand with a given CYP and (2) appropriately incorporating the results of docking with ligand reactivity. To address these challenges, we have created Docking-Regioselectivity-Predictor (DR-Predictor), a method that incorporates flexible docking-derived information with specialized electronic reactivity and multiple-instance-learning methods to predict CYP-mediated SOMs. In this study, the hybrid ligand-structure-based DR-Predictor method was tested on substrate sets for CYP 1A2 and CYP 2A6. For these data, the DR-Predictor model was found to identify the experimentally observed SOM within the top two predicted rank-positions for 86% of the 261 1A2 substrates and 83% of the 100 2A6 substrates. Given the accuracy and extendibility of the DR-Predictor method, we anticipate that it will further facilitate the prediction of CYP metabolism liabilities and aid in the in silico ADMET assessment of novel structures.

Bioinformatics | 2013

RS-WebPredictor

Jed Zaretzki; Charles Bergeron; Tao-wei Huang; Patrik Rydberg; S. Joshua Swamidass; Curt M. Breneman

SUMMARY Regioselectivity-WebPredictor (RS-WebPredictor) is a server that predicts isozyme-specific cytochrome P450 (CYP)-mediated sites of metabolism (SOMs) on drug-like molecules. Predictions may be made for the promiscuous 2C9, 2D6 and 3A4 CYP isozymes, as well as CYPs 1A2, 2A6, 2B6, 2C8, 2C19 and 2E1. RS-WebPredictor is the first freely accessible server that predicts the regioselectivity of the last six isozymes. Server execution time is fast, taking on average 2 s to encode a submitted molecule and 1 s to apply a given model, allowing for high-throughput use in lead optimization projects. AVAILABILITY RS-WebPredictor is accessible for free use at http://reccr.chem.rpi.edu/Software/RS-WebPredictor/
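As a rough planning aid, the average timings quoted above can be turned into a throughput estimate. The sketch makes three assumptions: processing is sequential, each molecule is encoded once before any models are applied, and the quoted averages hold for every molecule. `estimated_runtime_s` is a hypothetical helper, not part of the server.

```python
def estimated_runtime_s(n_molecules, n_models, encode_s=2.0, model_s=1.0):
    """Sequential estimate: encode each molecule once, then apply every requested model.

    encode_s and model_s default to the average timings quoted for the server;
    no batching or parallelism is assumed.
    """
    return n_molecules * (encode_s + n_models * model_s)


print(estimated_runtime_s(1, 9))            # 11.0
print(estimated_runtime_s(1000, 9) / 3600)  # hours for 1,000 molecules, all nine models
```

Under these assumptions, requesting all nine isozyme models for 1,000 molecules would take about 11,000 s, a little over three hours. Encoding is a fixed per-molecule cost, so adding models grows the runtime by only 1 s per molecule each.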

Bioinformatics | 2013

Scaffold Network Generator: A Tool for Mining Molecular Structures

Matthew Matlock; Jed Zaretzki; S. Joshua Swamidass

SUMMARY Scaffold network generator (SNG) is an open-source command-line utility that computes the hierarchical network of scaffolds that define a large set of input molecules. Scaffold networks are useful for visualizing, analysing and understanding the chemical data that is increasingly available through large public repositories like PubChem. For example, some groups have used scaffold networks to identify missed-actives in high-throughput screens of small molecules with bioassays. Substantially improving on existing software, SNG is robust enough to work on millions of molecules at a time with a simple command-line interface. AVAILABILITY AND IMPLEMENTATION SNG is accessible at http://swami.wustl.edu/sng

Bioinformatics | 2015

Extending P450 site-of-metabolism models with region-resolution data

Jed Zaretzki; Michael R. Browning; Tyler B. Hughes; S. Joshua Swamidass

MOTIVATION Cytochrome P450s are a family of enzymes responsible for the metabolism of approximately 90% of FDA-approved drugs. Medicinal chemists often want to know which atoms of a molecule (its metabolized sites) are oxidized by Cytochrome P450s in order to modify their metabolism. Consequently, there are several methods that use literature-derived, atom-resolution data to train models that can predict a molecule's sites of metabolism. There is, however, much more data available at a lower resolution, where the exact site of metabolism is not known, but the region of the molecule that is oxidized is known. Until now, no site-of-metabolism models made use of region-resolution data. RESULTS Here, we describe XenoSite-Region, the first reported method for training site-of-metabolism models with region-resolution data. Our approach uses the Expectation Maximization algorithm to train a site-of-metabolism model. Region-resolution metabolism data was simulated from a large site-of-metabolism dataset, containing 2000 molecules with 3400 metabolized and 30 000 un-metabolized sites and covering nine Cytochrome P450 isozymes. When training on the same molecules (but with only region-level information), we find that this approach yields models almost as accurate as models trained with atom-resolution data. Moreover, we find that atom-resolution trained models are more accurate when also trained with region-resolution data from additional molecules. Our approach, therefore, opens up a way to extend the applicable domain of site-of-metabolism models into larger regions of chemical space. This meets a critical need in drug development by tapping into underutilized data commonly available in most large drug companies. AVAILABILITY AND IMPLEMENTATION The algorithm, data and a web server are available at http://swami.wustl.edu/xregion.
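The abstract names Expectation Maximization as the training procedure. The following is a minimal sketch of how EM can use region-resolution labels. It rests on assumptions that belong to this sketch, not to the paper. A logistic-regression atom model stands in for whatever site-of-metabolism model the authors train. Atoms are treated as independent. Each labeled region is assumed to contain exactly one metabolized atom. Under those assumptions, the E-step posterior that atom i is the region's site is proportional to p_i / (1 - p_i), which is a softmax of the model's logits over the region. All function names and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logits(w, b, X):
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def fit_logistic(X, y, lr=0.5, steps=500):
    """Logistic regression by batch gradient descent on soft labels y in [0, 1]."""
    n = len(X)
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        r = [sigmoid(z) - yi for z, yi in zip(logits(w, b, X), y)]  # dLoss/dz
        w = [wj - lr * sum(ri * x[j] for ri, x in zip(r, X)) / n for j, wj in enumerate(w)]
        b -= lr * sum(r) / n
    return w, b


def em_with_regions(X, labels, regions, n_iter=10):
    """Hypothetical EM loop mixing atom- and region-resolution labels.

    X       : atom descriptors pooled over all molecules
    labels  : 1.0 / 0.0 for atoms whose status is known, None for atoms inside a region
    regions : lists of atom indices; each is assumed to hold exactly one metabolized atom
    """
    y = [0.0 if lab is None else lab for lab in labels]
    for idx in regions:  # start by spreading each region's site uniformly
        for i in idx:
            y[i] = 1.0 / len(idx)
    for _ in range(n_iter):
        w, b = fit_logistic(X, y)  # M-step: refit on the current soft labels
        z = logits(w, b, X)
        for idx in regions:  # E-step: softmax of logits within each region
            m = max(z[i] for i in idx)
            e = {i: math.exp(z[i] - m) for i in idx}
            total = sum(e.values())
            for i in idx:
                y[i] = e[i] / total
    return w, b, y


# Hypothetical 1-feature atoms: six with atom-resolution labels, three in one region.
X = [[2.0], [-1.0], [-1.5], [1.8], [-0.5], [-2.0], [2.2], [-0.8], [-1.2]]
labels = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, None, None, None]
w, b, y = em_with_regions(X, labels, [[6, 7, 8]])
print([round(v, 2) for v in y[6:]])
```

Atoms with atom-resolution labels anchor the fit. Each region's single unit of "metabolized" mass is reassigned toward the atoms the current model rates most likely, and region atoms are never hard-labeled.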

Journal of Chemical Information and Modeling | 2015

Improved Prediction of CYP-Mediated Metabolism with Chemical Fingerprints

Jed Zaretzki; Kevin M. Boehm; S. Joshua Swamidass

Molecule and atom fingerprints, similar to path-based Daylight fingerprints, can substantially improve the accuracy of P450 site-of-metabolism prediction models. Only two chemical fingerprints have been used in metabolism prediction, so little is known about the effect of fingerprint parameters on site of metabolism predictions. It is possible that different fingerprints might yield more accurate models. Here, we study whether tuning fingerprints to specific site of metabolism data sets can lead to improved models. We measure the impact of 484 specific chemical fingerprints on the accuracy of P450 site-of-metabolism prediction models on nine P450 isoform site of metabolism data sets. Using a range of search depths, we study path, circular, and subgraph fingerprints. Two different labelings are also considered: standard SMILES labels and a labeling that marks ring bonds differently from nonring bonds, enabling ortho, para, and meta positioning of substituents to be encoded more clearly. Optimal fingerprint models chosen by cross-validation performance on the full training data are, on average, 3.8% (Top-2; percent of molecules with a site of metabolism in the top two predictions) and 1.4% (AUC; area under the ROC curve) more accurate than base fingerprint models. These gains represent, respectively, a 25.6% and 16.7% reduction in error. A more rigorous assessment selects fingerprints within each cross-validation fold, sometimes selecting different fingerprints for different folds, but yielding a more reliable estimate of generalization error. In this assessment, averaging the scores from the top few fingerprints yields performance improvements of, on average, 3.0% (Top-2) and 0.7% (AUC). These gains are statistically significant and represent, respectively, a 20.1% and 8.8% reduction in error.
Few consistencies were observed among the top-performing fingerprints across isoforms, with different fingerprints working best for different isoforms. These results suggest that there are important gains achievable in site of metabolism modeling by including and optimizing atom and molecule fingerprints. The optimal site of metabolism models determined by this approach are available for use at http://swami.wustl.edu/.
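The abstract reports each gain both in percentage points and as a relative reduction in error. Assuming error is measured as 100 minus the score (both Top-2 and AUC in percent), the two figures are linked by reduction = gain / (100 - baseline). That relation lets a reader back out the implied average baseline. The sketch below does this arithmetic. The quoted figures are rounded averages over nine isoforms, and an average of ratios is not a ratio of averages, so the implied baselines are only approximate. The function names are hypothetical.

```python
def relative_error_reduction(base, new):
    """Share of the remaining error removed when a percentage score rises from base to new.

    Assumes error = 100 - score, with scores in percent.
    """
    return (new - base) / (100.0 - base)


def implied_baseline(gain, reduction):
    """Baseline score implied by an absolute gain (points) and a relative error reduction."""
    return 100.0 - gain / reduction


# Figures quoted in the abstract: (gain in points, relative error reduction).
quoted = {
    "Top-2, full-data selection": (3.8, 0.256),
    "AUC, full-data selection": (1.4, 0.167),
    "Top-2, per-fold selection": (3.0, 0.201),
    "AUC, per-fold selection": (0.7, 0.088),
}
for name, (gain, red) in quoted.items():
    print(f"{name}: implied baseline ~{implied_baseline(gain, red):.1f}%")
```

Under this reading, the base fingerprint models average roughly 85% Top-2 and 92% AUC in both assessments. This is why gains of a few points translate into error reductions of 9% to 26%.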

Journal of Chemical Information and Modeling | 2011