Charles Bergeron
Rensselaer Polytechnic Institute
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Charles Bergeron.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2012
Charles Bergeron; Gregory M. Moore; Jed Zaretzki; Curt M. Breneman; Kristin P. Bennett
We present a bundle algorithm for multiple-instance classification and ranking. These frameworks yield improved models on many problems possessing special structure. Multiple-instance loss functions are typically nonsmooth and nonconvex, and current algorithms convert these to smooth nonconvex optimization problems that are solved iteratively. Inspired by the latest linear-time subgradient-based methods for support vector machines, we optimize the objective directly using a nonconvex bundle method. Computational results show this method is linearly scalable, while not sacrificing generalization accuracy, permitting modeling on new and larger data sets in computational chemistry and other applications. This new implementation facilitates modeling with kernels.
Journal of Chemical Information and Modeling | 2012
Jed Zaretzki; Patrik Rydberg; Charles Bergeron; Kristin P. Bennett; Lars Olsen; Curt M. Breneman
RS-Predictor is a tool for creating pathway-independent, isozyme-specific, site of metabolism (SOM) prediction models using any set of known cytochrome P450 (CYP) substrates and metabolites. Until now, the RS-Predictor method was only trained and validated on CYP 3A4 data, but in the present study, we report on the versatility the RS-Predictor modeling paradigm by creating and testing regioselectivity models for substrates of the nine most important CYP isozymes. Through curation of source literature, we have assembled 680 substrates distributed among CYPs 1A2, 2A6, 2B6, 2C19, 2C8, 2C9, 2D6, 2E1, and 3A4, the largest publicly accessible collection of P450 ligands and metabolites released to date. A comprehensive investigation into the importance of different descriptor classes for identifying the regioselectivity mediated by each isozyme is made through the generation of multiple independent RS-Predictor models for each set of isozyme substrates. Two of these models include a density functional theory (DFT) reactivity descriptor derived from SMARTCyp. Optimal combinations of RS-Predictor and SMARTCyp are shown to have stronger performance than either method alone, while also exceeding the accuracy of the commercial regioselectivity prediction methods distributed by Optibrium and Schrödinger, correctly identifying a large proportion of the metabolites in each substrate set within the top two rank-positions: 1A2 (83.0%), 2A6 (85.7%), 2B6 (82.1%), 2C19 (86.2%), 2C8 (83.8%), 2C9 (84.5%), 2D6 (85.9%), 2E1 (82.8%), 3A4 (82.3%), and merged (86.0%). Comprehensive datamining of each substrate set and careful statistical analyses of the predictions made by the different models revealed new insights into molecular features that control metabolic regioselectivity and enable accurate prospective prediction of likely SOMs.
BMC Genomics | 2007
Kristin P. Bennett; Charles Bergeron; Evrim Acar; Robert F. Klees; Scott L. Vandenberg; Bülent Yener; George E. Plopper
BackgroundRecently, we demonstrated that human mesenchymal stem cells (hMSC) stimulated with dexamethazone undergo gene focusing during osteogenic differentiation (Stem Cells Dev 14(6): 1608–20, 2005). Here, we examine the protein expression profiles of three additional populations of hMSC stimulated to undergo osteogenic differentiation via either contact with pro-osteogenic extracellular matrix (ECM) proteins (collagen I, vitronectin, or laminin-5) or osteogenic media supplements (OS media). Specifically, we annotate these four protein expression profiles, as well as profiles from naïve hMSC and differentiated human osteoblasts (hOST), with known gene ontologies and analyze them as a tensor with modes for the expressed proteins, gene ontologies, and stimulants.ResultsDirect component analysis in the gene ontology space identifies three components that account for 90% of the variance between hMSC, osteoblasts, and the four stimulated hMSC populations. The directed component maps the differentiation stages of the stimulated stem cell populations along the differentiation axis created by the difference in the expression profiles of hMSC and hOST. Surprisingly, hMSC treated with ECM proteins lie closer to osteoblasts than do hMSC treated with OS media. Additionally, the second component demonstrates that proteomic profiles of collagen I- and vitronectin-stimulated hMSC are distinct from those of OS-stimulated cells. A three-mode tensor analysis reveals additional focus proteins critical for characterizing the phenotypic variations between naïve hMSC, partially differentiated hMSC, and hOST.ConclusionThe differences between the proteomic profiles of OS-stimulated hMSC and ECM-hMSC characterize different transitional phenotypes en route to becoming osteoblasts. This conclusion is arrived at via a three-mode tensor analysis validated using hMSC plated on laminin-5.
international conference on machine learning | 2008
Charles Bergeron; Jed Zaretzki; Curt M. Breneman; Kristin P. Bennett
This paper introduces a novel machine learning model called multiple instance ranking (MIRank) that enables ranking to be performed in a multiple instance learning setting. The motivation for MIRank stems from the hydrogen abstraction problem in computational chemistry, that of predicting the group of hydrogen atoms from which a hydrogen is abstracted (removed) during metabolism. The model predicts the preferred hydrogen group within a molecule by ranking the groups, with the ambiguity of not knowing which hydrogen atom within the preferred group is actually abstracted. This paper formulates MIRank in its general context and proposes an algorithm for solving MIRank problems using successive linear programming. The method outperforms multiple instance classification models on several real and synthetic datasets.
Machine Learning | 2011
Gregory M. Moore; Charles Bergeron; Kristin P. Bennett
This paper introduces two types of nonsmooth optimization methods for selecting model hyperparameters in primal SVM models based on cross-validation. Unlike common grid search approaches for model selection, these approaches are scalable both in the number of hyperparameters and number of data points. Taking inspiration from linear-time primal SVM algorithms, scalability in model selection is achieved by directly working with the primal variables without introducing any dual variables. The proposed implicit primal gradient descent (ImpGrad) method can utilize existing SVM solvers. Unlike prior methods for gradient descent in hyperparameters space, all work is done in the primal space so no inversion of the kernel matrix is required. The proposed explicit penalized bilevel programming (PBP) approach optimizes both the hyperparameters and parameters simultaneously. It solves the original cross-validation problem by solving a series of least squares regression problems with simple constraints in both the hyperparameter and parameter space. Computational results on least squares support vector regression problems with multiple hyperparameters establish that both the implicit and explicit methods perform quite well in terms of generalization and computational time. These methods are directly applicable to other learning tasks with differentiable loss functions and regularization functions. Both the implicit and explicit algorithms investigated represent powerful new approaches to solving large bilevel programs involving nonsmooth loss functions.
Journal of Chemical Information and Modeling | 2013
Tao-wei Huang; Jed Zaretzki; Charles Bergeron; Kristin P. Bennett; Curt M. Breneman
Computational methods that can identify CYP-mediated sites of metabolism (SOMs) of drug-like compounds have become required tools for early stage lead optimization. In recent years, methods that combine CYP binding site features with CYP/ligand binding information have been sought in order to increase the prediction accuracy of such hybrid models over those that use only one representation. Two challenges that any hybrid ligand/structure-based method must overcome are (1) identification of the best binding pose for a specific ligand with a given CYP and (2) appropriately incorporating the results of docking with ligand reactivity. To address these challenges we have created Docking-Regioselectivity-Predictor (DR-Predictor)--a method that incorporates flexible docking-derived information with specialized electronic reactivity and multiple-instance-learning methods to predict CYP-mediated SOMs. In this study, the hybrid ligand-structure-based DR-Predictor method was tested on substrate sets for CYP 1A2 and CYP 2A6. For these data, the DR-Predictor model was found to identify the experimentally observed SOM within the top two predicted rank-positions for 86% of the 261 1A2 substrates and 83% of the 100 2A6 substrates. Given the accuracy and extendibility of the DR-Predictor method, we anticipate that it will further facilitate the prediction of CYP metabolism liabilities and aid in in-silico ADMET assessment of novel structures.
Journal of Chemical Information and Modeling | 2011
Charles Bergeron; Gregory M. Moore; Michael P. Krein; Curt M. Breneman; Kristin P. Bennett
Least-squares fitting of the Hill equation to quantitative high-throughput screening (qHTS) assays results in frequent unsatisfactory fits. We learn and exploit prior knowledge to improve the Hill fitting in a nonlinear regression method called domain knowledge fitter (DK-fitter). This paper formulates and solves DK-fitter for 44 public qHTS data sets. This new Hill parameter estimation technique is validated using three unbiased approaches, including a novel method that involves generating simulated samples. This paper fosters the extraction of higher quality information from screens for improved potency evaluation.
Molecular Informatics | 2011
Charles Bergeron; Michael P. Krein; Gregory M. Moore; Curt M. Breneman; Kristin P. Bennett
Making suitable modeling choices is crucial for successful in silico drug design, and one of the most important of these is the proper extraction and curation of data from qHTS screens, and the use of optimized statistical learning methods to obtain valid models. More specifically, we aim to learn the top‐1 % most potent compounds against a variety of targets in a procedure we call virtual screening hit identification (VISHID). To do so, we exploit quantitative high‐throughput screens (qHTS) obtained from PubChem, descriptors derived from molecular structures, and support vector machines (SVM) for model generation. Our results illustrate how an appreciation of subtle issues underlying qHTS data extraction and the resulting SVM models created using these data can enhance the effectiveness of solutions and, in doing so, accelerate drug discovery.
international conference on data mining | 2009
Gregory M. Moore; Charles Bergeron; Kristin P. Bennett
We propose a nonsmooth bilevel programming method for training linear learning models with hyperparameters optimized via
Journal of Chemical Information and Modeling | 2011
Jed Zaretzki; Charles Bergeron; Patrik Rydberg; Tao-wei Huang; Kristin P. Bennett; Curt M. Breneman
T