Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Katja Hansen is active.

Publication


Featured research published by Katja Hansen.


Journal of Chemical Information and Modeling | 2009

Benchmark data set for in silico prediction of Ames mutagenicity.

Katja Hansen; Sebastian Mika; Timon Schroeter; Andreas Sutter; Antonius Ter Laak; Thomas Steger-Hartmann; Nikolaus Heinrich; Klaus-Robert Müller

Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.
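
A hypothetical sketch of such a comparison, using random stand-in "fingerprints" in place of real molecular descriptors and a plain k-nearest-neighbors classifier (one of the four noncommercial methods compared); none of the benchmark data is reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random 0/1 "fingerprints" stand in for chemistry-derived descriptors.
X = rng.integers(0, 2, size=(200, 32)).astype(float)
y = (X[:, :6].sum(axis=1) > 3).astype(int)  # synthetic stand-in activity label
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

def knn_predict(X_train, y_train, X_test, k=5):
    """Majority vote over the k nearest training fingerprints (Euclidean)."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return (y_train[idx].mean(axis=1) > 0.5).astype(int)

pred = knn_predict(X_train, y_train, X_test)
accuracy = (pred == y_test).mean()
```

The other learners in the study (SVMs, Random Forests, Gaussian Processes) slot into the same train/evaluate loop; only the predictor changes.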


Journal of Chemical Theory and Computation | 2013

Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies

Katja Hansen; Grégoire Montavon; Franziska Biegler; Siamac Fazli; Matthias Rupp; Matthias Scheffler; O. Anatole von Lilienfeld; Alexandre Tkatchenko; Klaus-Robert Müller

The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods' performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.
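
The molecular representation studied in the cited Rupp et al. work is the Coulomb matrix; a minimal sketch, using the sorted eigenvalue spectrum as one common atom-order-invariant descriptor variant, with a purely illustrative water-like geometry:

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i*Z_j / |R_i - R_j|."""
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

def descriptor(Z, R):
    """Sorted eigenvalues: invariant to atom ordering, rotation, translation."""
    return np.sort(np.linalg.eigvalsh(coulomb_matrix(Z, R)))[::-1]

# Illustrative water-like geometry (Angstrom); not data from the paper.
Z = np.array([8.0, 1.0, 1.0])
R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
d = descriptor(Z, R)
```

A kernel regressor trained on such descriptors is the kind of model the paper benchmarks; the choice of invariances baked into the representation is exactly what drives the performance differences discussed above.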


Physical Review Letters | 2012

Finding Density Functionals with Machine Learning

John C. Snyder; Matthias Rupp; Katja Hansen; Klaus-Robert Müller; Kieron Burke

Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of noninteracting fermions in 1D, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer than 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. The challenges for application of our method to real electronic structure problems are discussed.


New Journal of Physics | 2013

Machine learning of molecular electronic properties in chemical compound space

Grégoire Montavon; Matthias Rupp; Vivekanand V. Gobre; Alvaro Vazquez-Mayagoitia; Katja Hansen; Alexandre Tkatchenko; Klaus-Robert Müller; O. Anatole von Lilienfeld

The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure–property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a "quantum machine" is similar, and sometimes superior, to modern quantum-chemical methods, at negligible computational cost.
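
A minimal sketch of the multi-task idea, reduced to one shared hidden layer, two output heads, and synthetic linear targets (the paper's network is deep and trained on ab initio reference data; everything below is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins: 8 input features and two correlated target "properties".
X = rng.normal(size=(300, 8))
w_true = 0.3 * rng.normal(size=8)
Y = np.stack([X @ w_true, 0.5 * X @ w_true + X[:, 0]], axis=1)

# One shared hidden layer feeding two output heads: the hidden representation
# is learned jointly, so correlations between the properties are exploited.
W1 = 0.1 * rng.normal(size=(8, 16))
W2 = 0.1 * rng.normal(size=(16, 2))
lr = 0.02
for _ in range(3000):
    H = np.tanh(X @ W1)
    G = (H @ W2 - Y) / len(X)                     # mean-squared-error gradient
    W1 -= lr * X.T @ ((G @ W2.T) * (1 - H ** 2))  # backprop through tanh
    W2 -= lr * H.T @ G

mse = ((np.tanh(X @ W1) @ W2 - Y) ** 2).mean()
```

Both heads share the same learned features, which is the mechanism by which a multi-task network can beat separate single-property models.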


Journal of Physical Chemistry Letters | 2015

Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space

Katja Hansen; Franziska Biegler; Raghunathan Ramakrishnan; Wiktor Pronobis; O. Anatole von Lilienfeld; Klaus-Robert Müller; Alexandre Tkatchenko

Simultaneously accurate and efficient prediction of molecular properties throughout chemical compound space is a critical ingredient toward rational compound design in chemical and pharmaceutical industries. Aiming toward this goal, we develop and apply a systematic hierarchy of efficient empirical methods to estimate atomization and total energies of molecules. These methods range from a simple sum over atoms, to addition of bond energies, to pairwise interatomic force fields, reaching to the more sophisticated machine learning approaches that are capable of describing collective interactions between many atoms or bonds. In the case of equilibrium molecular geometries, even simple pairwise force fields demonstrate prediction accuracy comparable to benchmark energies calculated using density functional theory with hybrid exchange-correlation functionals; however, accounting for the collective many-body interactions proves to be essential for approaching the “holy grail” of chemical accuracy of 1 kcal/mol for both equilibrium and out-of-equilibrium geometries. This remarkable accuracy is achieved by a vectorized representation of molecules (so-called Bag of Bonds model) that exhibits strong nonlocality in chemical space. In addition, the same representation allows us to predict accurate electronic properties of molecules, such as their polarizability and molecular frontier orbital energies.
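
The Bag of Bonds vectorization mentioned above can be sketched as follows; the bag composition, padding size, and water-like geometry are illustrative choices, not the paper's setup.

```python
import numpy as np
from itertools import combinations

def bag_of_bonds(Z, R, bags, bag_size):
    """Bag of Bonds: for each element pair, collect the Coulomb-style terms
    Z_i*Z_j/|R_i-R_j|, sort each bag in decreasing order, and zero-pad to a
    fixed length so all molecules map to vectors of equal dimension."""
    terms = {b: [] for b in bags}
    for i, j in combinations(range(len(Z)), 2):
        key = tuple(sorted((Z[i], Z[j])))
        terms[key].append(Z[i] * Z[j] / np.linalg.norm(R[i] - R[j]))
    vec = []
    for b in bags:
        bag = sorted(terms[b], reverse=True)
        vec.extend(bag + [0.0] * (bag_size - len(bag)))
    return np.array(vec)

# Illustrative water-like molecule; bags for (O,H) and (H,H) pairs, size 3.
Z = np.array([8, 1, 1])
R = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
v = bag_of_bonds(Z, R, bags=[(1, 8), (1, 1)], bag_size=3)
```

Because every molecule lands in the same fixed-length vector space, standard kernel regressors can be trained directly on these vectors.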


Journal of Chemical Physics | 2013

Orbital-free bond breaking via machine learning

John C. Snyder; Matthias Rupp; Katja Hansen; Leo Blooston; Klaus-Robert Müller; Kieron Burke

Using a one-dimensional model, we explore the ability of machine learning to approximate the non-interacting kinetic energy density functional of diatomics. This nonlinear interpolation between Kohn-Sham reference calculations can (i) accurately dissociate a diatomic, (ii) be systematically improved with increased reference data and (iii) generate accurate self-consistent densities via a projection method that avoids directions with no data. With relatively few densities, the error due to the interpolation is smaller than typical errors in standard exchange-correlation functionals.
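
The projection step that "avoids directions with no data" can be sketched as a PCA projection onto the subspace spanned by the training densities; the Gaussian toy densities below are stand-ins, not Kohn-Sham reference data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
# Toy "training densities": Gaussians with varying centers on a 1D grid.
train = np.stack([np.exp(-(x - c) ** 2 / 0.01)
                  for c in np.linspace(0.3, 0.7, 20)])

# PCA of the training densities: keep the leading components and project any
# new density onto that subspace, discarding directions with no data.
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
P = Vt[:5]                     # top 5 principal directions (rows orthonormal)

def project(rho):
    """Project a density onto the data manifold's local linear subspace."""
    return mean + P.T @ (P @ (rho - mean))

noisy = np.exp(-(x - 0.5) ** 2 / 0.01) + 0.1 * rng.normal(size=x.size)
clean = project(noisy)
```

Components orthogonal to the training set, where the learned functional's derivative is unreliable, are simply discarded by the projection.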


Journal of Chemical Physics | 2012

Optimizing transition states via kernel-based machine learning

Zachary D. Pozun; Katja Hansen; Daniel Sheppard; Matthias Rupp; Klaus-Robert Müller; Graeme Henkelman

We present a method for optimizing transition state theory dividing surfaces with support vector machines. The resulting dividing surfaces require no a priori information or intuition about reaction mechanisms. To generate optimal dividing surfaces, we apply a cycle of machine-learning and refinement of the surface by molecular dynamics sampling. We demonstrate that the machine-learned surfaces contain the relevant low-energy saddle points. The mechanisms of reactions may be extracted from the machine-learned surfaces in order to identify unexpected chemically relevant processes. Furthermore, we show that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface.
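
A heavily simplified stand-in for the idea of learning a dividing surface from labeled basin samples: a plain perceptron (not the paper's support vector machine, and without the MD refinement cycle) separates two synthetic 2D basins, and its zero level set plays the role of the dividing surface.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy samples from two basins (reactant/product); the paper instead refines
# a support vector machine with molecular dynamics sampling.
reactant = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(100, 2))
product = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(100, 2))
X = np.vstack([reactant, product])
y = np.array([-1] * 100 + [1] * 100)

# Perceptron training: the learned dividing surface is the set {x : w.x + b = 0}.
w, b = np.zeros(2), 0.0
for _ in range(50):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:   # misclassified: move surface toward xi
            w = w + yi * xi
            b = b + yi
```

In the paper the surface is nonlinear (kernel SVM) and is iteratively refined with new MD samples near the current surface; the linear zero level set here only illustrates the basic "classifier boundary as dividing surface" idea.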


ChemMedChem | 2010

From Machine Learning to Natural Product Derivatives that Selectively Activate Transcription Factor PPARγ

Matthias Rupp; Timon Schroeter; Ramona Steri; Heiko Zettl; Ewgenij Proschak; Katja Hansen; Oliver Rau; Oliver Schwarz; Lutz Müller-Kuhrt; Manfred Schubert-Zsilavecz; Klaus-Robert Müller; Gisbert Schneider

Peroxisome proliferator-activated receptors (PPARs) are nuclear proteins that act as transcription factors. They represent a validated drug target class involved in lipid and glucose metabolism as well as inflammatory response regulation. We combined state-of-the-art machine learning methods including Gaussian process (GP) regression, multiple kernel learning, the ISOAK molecular graph kernel, and a novel loss function to virtually screen a large compound collection for potential PPAR activators; 15 compounds were tested in a cellular reporter gene assay. The most potent PPARγ-selective hit (EC50 = 10 ± 0.2 μM) is a derivative of the natural product truxillic acid. Truxillic acid derivatives are known to be anti-inflammatory agents, potentially due to PPARγ activation. Our study underscores the usefulness of modern machine learning algorithms for finding potent bioactive compounds and presents an example of scaffold-hopping from synthetic compounds to natural products. We thus motivate virtual screening of natural product collections as a source of novel lead compounds. The results of our study suggest that pharmacophoric patterns of synthetic bioactive compounds can be traced back to natural products, and this will be useful for "de-orphanizing" the natural bioactive agent. PPARs are present in three known isoforms: PPARα, PPARβ(δ), and PPARγ, with different expression patterns according to their function. PPAR activation leads to an increased expression of key enzymes and proteins involved in the uptake and metabolism of lipids and glucose. Unsaturated fatty acids and eicosanoids such as linoleic acid and arachidonic acid are physiological PPAR activators. Owing to their central role in glucose and lipid homeostasis, PPARs represent attractive drug targets for the treatment of diabetes and dyslipidemia.
Glitazones (thiazolidinediones) such as pioglitazone and rosiglitazone act as selective activators of PPARγ and are used as therapeutics for diabetes mellitus type 2. In addition to synthetic activators, herbs are traditionally used for treatment of metabolic disorders, and some herbal ingredients have been identified as PPARγ activators, for example, carnosol and carnosic acid, as well as several terpenoids and flavonoids. We used several machine learning methods, with synthetic PPAR agonists as input, to find common pharmacophoric patterns for virtual screening in both synthetic and natural product derived substances. We focused on GP models, which originate from Bayesian statistics. Their original applications in cheminformatics were aimed at predicting aqueous solubility, blood–brain barrier penetration, hERG (human ether-à-go-go-related gene) inhibition, and metabolic stability. A particular advantage of GPs is that they provide error estimates with their predictions. In GP modeling of molecular properties, one defines a positive definite kernel function to model molecular similarity. Compound information enters GP models only via this function, so relevant (context-dependent) physicochemical properties must be captured. This is done by computing molecular descriptors (physicochemical property vectors), or by graph kernels that are defined directly on the molecular graph. From a family of functions that are potentially able to model the underlying structure–activity relationship ("prior"), only functions that agree with the data are retained (Figure 1). The weighted average of the retained functions ("posterior") acts as predictor, and its variance as an estimate of the confidence in the prediction.
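
The GP advantage described above, predictions accompanied by error estimates, can be sketched in a few lines of standard GP regression; the 1D toy data below stand in for molecular descriptors and measured activities.

```python
import numpy as np

def rbf(A, B, ell=0.5):
    """Squared-exponential kernel between row-wise point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * ell ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 1))          # stand-in 1D "descriptors"
y = np.sin(2.0 * X[:, 0]) + 0.05 * rng.normal(size=30)

noise = 0.05 ** 2
K = rbf(X, X) + noise * np.eye(len(X))
alpha = np.linalg.solve(K, y)

# Posterior mean and variance at two query points: one inside the data
# region, one far outside, where the GP should report low confidence.
X_star = np.array([[0.0], [5.0]])
K_star = rbf(X_star, X)
mean = K_star @ alpha
var = rbf(X_star, X_star).diagonal() - np.einsum(
    "ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
```

The growth of the posterior variance away from the training data is exactly what makes GP predictions self-flagging: screening hits with large predicted variance can be deprioritized.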


Molecular Informatics | 2011

Visual Interpretation of Kernel-Based Prediction Models.

Katja Hansen; David Baehrens; Timon Schroeter; Matthias Rupp; Klaus-Robert Müller

Statistical models are frequently used to estimate molecular properties, e.g., to establish quantitative structure‐activity and structure‐property relationships. For such models, interpretability, knowledge of the domain of applicability, and an estimate of confidence in the predictions are essential. We develop and validate a method for the interpretation of kernel‐based prediction models. As a consequence of interpretability, the method helps to assess the domain of applicability of a model, to judge the reliability of a prediction, and to determine relevant molecular features. Increased interpretability also facilitates the acceptance of such models. Our method is based on visualization: For each prediction, the most contributing training samples are computed and visualized. We quantitatively show the effectiveness of our approach by conducting a questionnaire study with 71 participants, resulting in significant improvements of the participants’ ability to distinguish between correct and incorrect predictions of a Gaussian process model for Ames mutagenicity.
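
The visualization method relies on the fact that a kernel model's prediction decomposes into per-training-sample contributions; a minimal sketch with kernel ridge regression on synthetic stand-in data:

```python
import numpy as np

def rbf(A, B, ell=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * ell ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))                      # stand-in descriptors
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=40)      # stand-in property

# Kernel ridge regression: f(x) = sum_i alpha_i * k(x_i, x).
alpha = np.linalg.solve(rbf(X, X) + 0.1 * np.eye(40), y)

# Because the prediction is a weighted sum over training samples, the
# contribution alpha_i * k(x_i, x) of each sample is directly available,
# and the most contributing samples can be shown to the user.
x_new = rng.normal(size=(1, 3))
contrib = alpha * rbf(X, x_new)[:, 0]
top = np.argsort(np.abs(contrib))[::-1][:5]       # 5 most contributing samples
```

In the paper these top contributors are rendered as molecular structures next to the query compound, which is what lets a chemist judge whether a prediction rests on genuinely similar training examples.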


Journal of Chemical Information and Modeling | 2009

Bias-correction of regression models: a case study on hERG inhibition.

Katja Hansen; Fabian Rathke; Timon Schroeter; Georg Rast; Thomas Fox; Jan M. Kriegl; Sebastian Mika

In the present work we develop a predictive QSAR model for the blockade of the hERG channel. Additionally, this specific end point is used as a test scenario to develop and evaluate several techniques for fusing predictions from multiple regression models. hERG inhibition models which are presented here are based on a combined data set of roughly 550 proprietary and 110 public domain compounds. Models are built using various statistical learning techniques and different sets of molecular descriptors. Single Support Vector Regression, Gaussian Process, or Random Forest models achieve root mean-squared errors of roughly 0.6 log units as determined from leave-group-out cross-validation. An analysis of the evaluation strategy on the performance estimates shows that standard leave-group-out cross-validation yields overly optimistic results. As an alternative, a clustered cross-validation scheme is introduced to obtain a more realistic estimate of the model performance. The evaluation of several techniques to combine multiple prediction models shows that the root mean squared error as determined from clustered cross-validation can be reduced from 0.73 +/- 0.01 to 0.57 +/- 0.01 using a local bias correction strategy.
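
The clustered cross-validation scheme can be sketched as fold construction that holds out whole clusters at a time; the four synthetic "chemical series" below are stand-ins for real compound clusters.

```python
import numpy as np

# Stand-in: 100 compounds grouped into 4 "chemical series" (clusters).
cluster = np.repeat(np.arange(4), 25)

# Clustered cross-validation: each fold holds out an entire cluster, so close
# analogues never end up split between training and test set. Random
# leave-group-out, by contrast, lets analogues leak across the split, which
# is why it yields overly optimistic error estimates.
def clustered_folds(cluster_ids):
    for c in np.unique(cluster_ids):
        test = np.flatnonzero(cluster_ids == c)
        train = np.flatnonzero(cluster_ids != c)
        yield train, test

folds = list(clustered_folds(cluster))
```

In practice the cluster labels would come from a structural clustering of the compounds rather than being given, but the fold logic is the same.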

Collaboration


Dive into Katja Hansen's collaborations.

Top Co-Authors

Klaus-Robert Müller
Technical University of Berlin

Timon Schroeter
Technical University of Berlin

Sebastian Mika
Technical University of Berlin

Gisbert Schneider
École Polytechnique Fédérale de Lausanne

Ewgenij Proschak
Goethe University Frankfurt