Timon Schroeter
Technical University of Berlin
Publication
Featured research published by Timon Schroeter.
Journal of Chemical Information and Modeling | 2009
Katja Hansen; Sebastian Mika; Timon Schroeter; Andreas Sutter; Antonius Ter Laak; Thomas Steger-Hartmann; Nikolaus Heinrich; Klaus-Robert Müller
Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.
Journal of Computer-aided Molecular Design | 2007
Timon Schroeter; Anton Schwaighofer; Sebastian Mika; Antonius Ter Laak; Detlev Suelzle; Ursula Ganzer; Nikolaus Heinrich; Klaus-Robert Müller
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble-based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in terms of how faithfully the individual error bars represent the actual prediction error.
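One simple way to check whether error bars "faithfully represent" the actual prediction error is to count how often the true value falls within k predicted standard deviations and compare that with the fraction expected for Gaussian errors. The sketch below illustrates this idea only; the function names and the Gaussian assumption are ours, and the abstract does not state which criterion the authors used.

```python
import math


def fraction_within(y_true, y_pred, y_std, k=1.0):
    """Fraction of compounds whose absolute error is at most k predicted std devs."""
    if not (len(y_true) == len(y_pred) == len(y_std)):
        raise ValueError("inputs must have equal length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    hits = sum(abs(t - p) <= k * s for t, p, s in zip(y_true, y_pred, y_std))
    return hits / len(y_true)


def expected_fraction(k=1.0):
    """Fraction expected within k std devs if errors are Gaussian (assumption)."""
    return math.erf(k / math.sqrt(2.0))
```

If the observed fraction is well below the expected one, the error bars are too narrow (overconfident); if it is well above, they are too wide.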
ChemMedChem | 2010
Matthias Rupp; Timon Schroeter; Ramona Steri; Heiko Zettl; Ewgenij Proschak; Katja Hansen; Oliver Rau; Oliver Schwarz; Lutz Müller-Kuhrt; Manfred Schubert-Zsilavecz; Klaus-Robert Müller; Gisbert Schneider
Peroxisome proliferator-activated receptors (PPARs) are nuclear proteins that act as transcription factors. They represent a validated drug target class involved in lipid and glucose metabolism as well as inflammatory response regulation. We combined state-of-the-art machine learning methods including Gaussian process (GP) regression, multiple kernel learning, the ISOAK molecular graph kernel, and a novel loss function to virtually screen a large compound collection for potential PPAR activators; 15 compounds were tested in a cellular reporter gene assay. The most potent PPARγ-selective hit (EC50 = 10 ± 0.2 μM) is a derivative of the natural product truxillic acid. Truxillic acid derivatives are known to be anti-inflammatory agents, potentially due to PPARγ activation. Our study underscores the usefulness of modern machine learning algorithms for finding potent bioactive compounds and presents an example of scaffold-hopping from synthetic compounds to natural products. We thus motivate virtual screening of natural product collections as a source of novel lead compounds. The results of our study suggest that pharmacophoric patterns of synthetic bioactive compounds can be traced back to natural products, and this will be useful for “de-orphanizing” the natural bioactive agent. PPARs are present in three known isoforms: PPARα, PPARβ (δ), and PPARγ, with different expression patterns according to their function. PPAR activation leads to an increased expression of key enzymes and proteins involved in the uptake and metabolism of lipids and glucose. Unsaturated fatty acids and eicosanoids such as linoleic acid and arachidonic acid are physiological PPAR activators. Owing to their central role in glucose and lipid homeostasis, PPARs represent attractive drug targets for the treatment of diabetes and dyslipidemia.
Glitazones (thiazolidinediones) such as pioglitazone and rosiglitazone act as selective activators of PPARγ and are used as therapeutics for diabetes mellitus type 2. In addition to synthetic activators, herbs are traditionally used for treatment of metabolic disorders, and some herbal ingredients have been identified as PPARγ activators, for example, carnosol and carnosic acid, as well as several terpenoids and flavonoids. We used several machine learning methods, with synthetic PPAR agonists as input, to find common pharmacophoric patterns for virtual screening in both synthetic and natural product derived substances. We focused on GP models, which originate from Bayesian statistics. Their original applications in cheminformatics were aimed at predicting aqueous solubility, blood–brain barrier penetration, hERG (human ether-à-go-go-related gene) inhibition, and metabolic stability. A particular advantage of GPs is that they provide error estimates with their predictions. In GP modeling of molecular properties, one defines a positive definite kernel function to model molecular similarity. Compound information enters GP models only via this function, so relevant (context-dependent) physicochemical properties must be captured. This is done by computing molecular descriptors (physicochemical property vectors), or by graph kernels that are defined directly on the molecular graph. From a family of functions that are potentially able to model the underlying structure–activity relationship (“prior”), only functions that agree with the data are retained (Figure 1). The weighted average of the retained functions (“posterior”) acts as predictor, and its variance as an estimate of the confidence in the predic-
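The prior/posterior description above corresponds to standard GP regression, where the posterior mean serves as the prediction and the posterior variance as the error estimate. The following is a minimal sketch under our own assumptions: a squared-exponential kernel on descriptor vectors (not the ISOAK graph kernel used in the study), a zero-mean prior, and illustrative values for `noise_var` and `length_scale`.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_test, noise_var=1e-2, length_scale=1.0):
    """Posterior mean and variance of a zero-mean GP at the rows of X_test."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, length_scale)
    L = np.linalg.cholesky(K)
    # alpha = (K + noise_var * I)^-1 y, computed via the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # The prior variance k(x, x) equals 1 for this kernel.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, var
```

Test points far from all training data get a variance close to the prior variance of 1, which is what makes the variance usable as a confidence estimate.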
Molecular Informatics | 2011
Katja Hansen; David Baehrens; Timon Schroeter; Matthias Rupp; Klaus-Robert Müller
Statistical models are frequently used to estimate molecular properties, e.g., to establish quantitative structure‐activity and structure‐property relationships. For such models, interpretability, knowledge of the domain of applicability, and an estimate of confidence in the predictions are essential. We develop and validate a method for the interpretation of kernel‐based prediction models. As a consequence of interpretability, the method helps to assess the domain of applicability of a model, to judge the reliability of a prediction, and to determine relevant molecular features. Increased interpretability also facilitates the acceptance of such models. Our method is based on visualization: For each prediction, the most contributing training samples are computed and visualized. We quantitatively show the effectiveness of our approach by conducting a questionnaire study with 71 participants, resulting in significant improvements of the participants’ ability to distinguish between correct and incorrect predictions of a Gaussian process model for Ames mutagenicity.
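For a kernel model whose prediction has the form f(x) = Σ_i α_i k(x_i, x), one plausible reading of "most contributing training samples" is to rank training compounds by the magnitude of their term α_i k(x_i, x). The sketch below follows that reading; the abstract does not give the exact definition, and the kernel, the coefficients `alphas`, and the function names are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential similarity between two descriptor vectors."""
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-0.5 * sq / length_scale**2)


def top_contributors(x, train_X, alphas, n=3, kernel=rbf):
    """Return (index, contribution) pairs with the largest |alpha_i * k(x_i, x)|."""
    contributions = [a * kernel(xi, x) for xi, a in zip(train_X, alphas)]
    ranked = sorted(
        enumerate(contributions), key=lambda pair: abs(pair[1]), reverse=True
    )
    return ranked[:n]
```

The returned indices identify the training compounds one would display next to a prediction; the sum of all contributions equals the model output for x.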
Journal of Chemical Information and Modeling | 2009
Katja Hansen; Fabian Rathke; Timon Schroeter; Georg Rast; Thomas Fox; Jan M. Kriegl; Sebastian Mika
In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 +/- 0.01 to 0.57 +/- 0.01 using a local bias correction strategy.
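The idea behind clustered cross-validation is that structurally similar compounds never end up on both sides of a train/test split. A minimal sketch, assuming cluster labels are already available (the abstract does not say how clusters were computed) and using a greedy size-balancing rule of our own choosing:

```python
from collections import defaultdict


def clustered_folds(cluster_ids, n_folds=5):
    """Assign sample indices to folds so that each cluster stays in one fold."""
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Place the largest clusters first, always into the currently smallest fold.
    for cid in sorted(members, key=lambda c: len(members[c]), reverse=True):
        smallest = min(folds, key=len)
        smallest.extend(members[cid])
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Because close analogues of a test compound are excluded from training, performance estimates from such splits tend to be less optimistic than those from random splits, which matches the observation reported in the abstract.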
Combinatorial Chemistry & High Throughput Screening | 2009
Anton Schwaighofer; Timon Schroeter; Sebastian Mika; Gilles Blanchard
A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we will also discuss how confidence estimates (error bars) can be obtained from these methods. We continue with important aspects for model building and evaluation, such as methodologies for model selection, evaluation, performance criteria, and how the quality of error bar estimates can be verified. Besides an introduction to the respective methods, we will also point to available implementations, and discuss important issues for the practical application.
ChemMedChem | 2007
Timon Schroeter; Anton Schwaighofer; Sebastian Mika; Antonius Ter Laak; Detlev Suelzle; Ursula Ganzer; Nikolaus Heinrich; Klaus-Robert Müller
Many drug failures are due to an unfavorable ADMET profile (Absorption, Distribution, Metabolism, Excretion & Toxicity). Lipophilicity is intimately connected with ADMET, and in today’s drug discovery process, the octanol-water partition coefficient log P and its pH-dependent counterpart log D have to be taken into account early on in lead discovery. Commercial tools available for ‘in silico’ prediction of ADMET or lipophilicity parameters have usually been trained on relatively small and mostly neutral molecules; therefore, their accuracy on industrial in-house data leaves room for considerable improvement (see Bruneau et al. and references therein). Using modern kernel-based machine learning algorithms, so-called Gaussian Processes (GP), this study constructs different log P and log D7 models that exhibit excellent predictions which compare favorably to state-of-the-art tools on both benchmark and in-house data sets.
Bioorganic & Medicinal Chemistry Letters | 2010
Ramona Steri; Matthias Rupp; Ewgenij Proschak; Timon Schroeter; Heiko Zettl; Katja Hansen; Oliver Schwarz; Lutz Müller-Kuhrt; Klaus-Robert Müller; Gisbert Schneider; Manfred Schubert-Zsilavecz
In previous studies, we identified a truxillic acid derivative as a selective activator of the peroxisome proliferator-activated receptor gamma, which is a member of the nuclear receptor family and acts as a ligand-activated transcription factor of genes involved in glucose metabolism. Herein we present the structure-activity relationships of 16 truxillic acid derivatives, investigated by a cell-based reporter gene assay guided by molecular docking analysis.
Chemistry Central Journal | 2009
Timon Schroeter; Matthias Rupp; Katja Hansen; Klaus-Robert Müller; Gisbert Schneider
For a virtual screening study, we introduce a combination of machine learning techniques, employing a graph kernel, Gaussian process regression and clustered cross-validation. The aim was to find ligands of peroxisome proliferator-activated receptor gamma (PPAR-γ). The receptors in the PPAR family belong to the steroid-thyroid-retinoid superfamily of nuclear receptors and act as transcription factors. They play a role in the regulation of lipid and glucose metabolism in vertebrates and are linked to various human processes and diseases. For this study, we used a dataset of 176 PPAR-γ agonists published by Ruecker et al. ...
COMPLIFE 2007: The Third International Symposium on Computational Life Science | 2007
Timon Schroeter; Anton Schwaighofer; Sebastian Mika; Antonius Ter Laak; Detlev Suelzle; Ursula Ganzer; Nikolaus Heinrich; Klaus-Robert Müller
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from recent months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and by ensemble- and distance-based techniques for the other modelling approaches.