Publications


Featured research published by Knut Baumann.


Trends in Analytical Chemistry | 2003

Cross-validation as the objective function for variable-selection techniques

Knut Baumann

Different methods of cross-validation are studied for their suitability to guide variable-selection algorithms to yield highly predictive models. It is shown that the commonly applied leave-one-out cross-validation has a strong tendency to overfitting, underestimates the true prediction error, and should not be used without further constraints or further validation. Alternatives to leave-one-out cross-validation and other validation methods are presented.
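The difference between leave-one-out and leave-multiple-out cross-validation can be illustrated with a minimal Python sketch. It assumes a one-variable least-squares model and implements leave-multiple-out as repeated random hold-out of `n_out` objects (Monte Carlo cross-validation). The function names, `n_out`, and `n_repeats` are illustrative choices, not taken from the paper, and the variable-selection search that the paper studies is omitted.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def cv_rmse(xs, ys, n_out, n_repeats=200, seed=0):
    """Cross-validated root-mean-square prediction error.

    n_out == 1 runs exhaustive leave-one-out; n_out > 1 runs a Monte Carlo
    leave-multiple-out scheme that repeatedly holds out n_out random objects.
    """
    n = len(xs)
    if n_out == 1:
        splits = [[i] for i in range(n)]
    else:
        rng = random.Random(seed)
        splits = [rng.sample(range(n), n_out) for _ in range(n_repeats)]
    sq_errors = []
    for test in splits:
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Only the held-out objects contribute to the prediction error.
        sq_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5
```

Changing `n_out` from 1 to a larger value is the only change needed to compare the two schemes on the same data.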


Journal of Chemical Information and Modeling | 2009

Maximum Unbiased Validation (MUV) Data Sets for Virtual Screening Based on PubChem Bioactivity Data

Sebastian G. Rohrer; Knut Baumann

Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It is a technique from the field of spatial statistics that provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that removes unselective hits from data sets of compounds active against pharmaceutically relevant targets. Topological optimization using experimental design strategies, monitored by refined nearest neighbor analysis functions, is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
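To make the spatial-statistics idea concrete, here is a minimal Python sketch of two empirical nearest-neighbour distribution functions of the kind used in such analyses: G(t), the fraction of actives whose nearest other active lies within distance t, and F(t), the fraction of decoys whose nearest active lies within t. Treating actives as events and decoys as reference points, using Euclidean distance in descriptor space, and the function names are all assumptions for illustration; the exact statistics and refinements used for MUV are defined in the paper and may differ.

```python
import math

def nn_distances(points, reference):
    """For each point, the distance to its nearest neighbour in `reference`,
    skipping the point itself when `points` and `reference` are the same list."""
    out = []
    for i, p in enumerate(points):
        best = math.inf
        for j, q in enumerate(reference):
            if reference is points and i == j:
                continue
            best = min(best, math.dist(p, q))
        out.append(best)
    return out

def ecdf(values, t):
    """Empirical cumulative distribution: fraction of values <= t."""
    return sum(v <= t for v in values) / len(values)

def g_and_f(actives, decoys, thresholds):
    """Return (t, G(t), F(t)) for each threshold t.

    G uses active-to-nearest-active distances,
    F uses decoy-to-nearest-active distances.
    """
    g_dist = nn_distances(actives, actives)
    f_dist = nn_distances(decoys, actives)
    return [(t, ecdf(g_dist, t), ecdf(f_dist, t)) for t in thresholds]
```

If G(t) rises much faster than F(t), actives lie closer to each other than decoys lie to them, which is the kind of clustering that an unbiased benchmark design tries to avoid.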


Journal of Chemical Information and Computer Sciences | 2002

An Alignment-Independent Versatile Structure Descriptor for QSAR and QSPR Based on the Distribution of Molecular Features

Knut Baumann

A molecular descriptor based on a count statistic of the topological distance matrix is described and evaluated for use in QSAR studies. A molecule is encoded by computing many selective count statistics (histograms) that reflect the distribution of different atom types and bond types in the molecule. The descriptor was also extended to incorporate geometric features of molecules by weighting the topological distance counts with the geometric distance. It is invariant to both translation and rotation and therefore does not require alignment of the structures under study. The method was applied to several QSAR data sets and performed as well as or better than CoMFA and the EVA descriptor, while being computationally simpler than either.
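A minimal Python sketch of such a count statistic is shown below. It assumes a molecule given as a list of atom-type labels plus a bond list, computes topological (bond-count) distances by breadth-first search, and counts atom-type pairs per distance; the optional geometric variant adds the Euclidean distance of each pair instead of 1. The input format, the function names, and this particular weighting are illustrative assumptions, not the paper's exact definitions, and bond-type histograms are omitted.

```python
from collections import Counter, deque
import math

def topological_distances(n_atoms, bonds):
    """All-pairs shortest path lengths (in bonds) via BFS; -1 marks unreachable pairs."""
    adj = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for start in range(n_atoms):
        d = [-1] * n_atoms
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist

def distance_histograms(atom_types, bonds, coords=None):
    """Count statistic keyed by (type_i, type_j, topological distance).

    Without coords each atom pair adds 1; with coords it adds the pair's
    Euclidean distance (a hypothetical stand-in for geometric weighting).
    """
    dist = topological_distances(len(atom_types), bonds)
    hist = Counter()
    for i in range(len(atom_types)):
        for j in range(i + 1, len(atom_types)):
            if dist[i][j] < 0:
                continue  # atoms in disconnected fragments
            pair = tuple(sorted((atom_types[i], atom_types[j])))
            weight = math.dist(coords[i], coords[j]) if coords else 1.0
            hist[(*pair, dist[i][j])] += weight
    return hist
```

For the heavy atoms of ethanol, `distance_histograms(["C", "C", "O"], [(0, 1), (1, 2)])` yields one C–C pair at distance 1 and C–O pairs at distances 1 and 2. Because only bond paths and interatomic distances enter the counts, rotating or translating `coords` leaves the result unchanged, which is why no alignment is needed.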


Journal of Computer-aided Molecular Design | 2004

Validation tools for variable subset regression.

Knut Baumann; Nikolaus Stiefl

Variable selection is frequently applied in QSAR research. Since the selection process influences the characteristics of the finally chosen model, thorough validation of the selection technique is very important. Here, a validation protocol is presented briefly, and two of its tools are introduced in more detail. The first tool, based on permutation testing, allows the inflation of internal figures of merit (such as the cross-validated prediction error) to be assessed. The second tool, based on noise addition, can be used to determine the complexity, and with it the stability, of models generated by variable selection. The resulting statistical information is important in deciding whether to trust the predictive ability of a specific model. The graphical output of the validation tools is easy to interpret and provides a reliable impression of model performance. Among other applications, the tools were used to study the influence of leave-one-out and leave-multiple-out cross-validation on model characteristics, confirming that leave-multiple-out cross-validation yields more stable models. To study the performance of the entire validation protocol, it was applied with default settings to eight different QSAR data sets. In all cases, internal and external model performance was good, indicating that the protocol serves its purpose well.
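The idea behind the permutation-testing tool can be sketched in a few lines of Python. The sketch assumes a deliberately simple selection procedure (pick the single descriptor column with the lowest leave-one-out error) and repeats that entire procedure on randomly permuted responses; the spread of the scores obtained on permuted data shows how good a selected model can look by chance. The selection rule, the function names, and `n_perm` are hypothetical simplifications, not the protocol from the paper.

```python
import random
import statistics

def loo_rmse(x, y):
    """Leave-one-out RMSE of a one-variable least-squares line."""
    errs = []
    for k in range(len(x)):
        xs, ys = x[:k] + x[k + 1:], y[:k] + y[k + 1:]
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        b = sum((u - mx) * (v - my) for u, v in zip(xs, ys)) / sum((u - mx) ** 2 for u in xs)
        errs.append((y[k] - (my + b * (x[k] - mx))) ** 2)
    return (sum(errs) / len(errs)) ** 0.5

def select_best(X, y):
    """Toy variable selection over a list of columns: lowest LOO RMSE wins."""
    return min(loo_rmse(col, y) for col in X)

def permutation_test(X, y, n_perm=50, seed=0):
    """Repeat the whole selection on permuted responses to expose score inflation."""
    rng = random.Random(seed)
    observed = select_best(X, y)
    null = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(select_best(X, y_perm))
    # Fraction of permutations scoring at least as well as the real data.
    p_value = (1 + sum(s <= observed for s in null)) / (n_perm + 1)
    return observed, null, p_value
```

The key design choice is that the selection step is inside the permutation loop: permuting `y` after selection would hide exactly the inflation the test is meant to reveal.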


ChemMedChem | 2006

Aziridide-based inhibitors of cathepsin L: synthesis, inhibition activity, and docking studies.

Radim Vicik; Matthias Busemann; Christoph Gelhaus; Nikolaus Stiefl; Josef Scheiber; Werner Schmitz; Franziska Schulz; Milena Mladenovic; Bernd Engels; Matthias Leippe; Knut Baumann; Tanja Schirmeister

A comprehensive screening of N‐acylated aziridine (aziridide) based cysteine protease inhibitors containing either Boc‐Leu‐Caa (Caa=cyclic amino acid), Boc‐Gly‐Caa, or Boc‐Phe‐Ala attached to the aziridine nitrogen atom revealed Boc‐(S)‐Leu‐(S)‐Azy‐(S,S)‐Azi(OBn)2 (18 a) as a highly potent cathepsin L (CL) inhibitor (Ki=13 nM) (Azy=aziridine‐2‐carboxylate, Azi=aziridine‐2,3‐dicarboxylate). Docking studies, which also accounted for the unusual bonding situations (the flexibility and hybridization of the aziridides), predict that the inhibitor adopts a Y shape and spans the entire active site cleft, binding into both the nonprimed and primed sites of CL.


Journal of Chromatography A | 1995

Appropriate calibration functions for capillary electrophoresis II. Heteroscedasticity and its consequences

Knut Baumann; Hermann Wätzig

If ordinary least squares regression is to be used, the standard deviation of the signal should not depend on the sample concentration, but this is not the case in CE. Results indicate that the signal standard deviation is approximately proportional to the sample concentration. Therefore, weighted least squares regression must be used if the standard deviation within the concentration range differs by more than a factor of 50. Its use is advised down to a factor of 5, where the difference from ordinary least squares calculations is still significant. This is demonstrated by comparing experimental and simulated data. These considerations also apply to other analytical techniques whose calibration and variance functions have similar characteristics.
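A minimal Python sketch of weighted least squares for a straight-line calibration follows. It assumes, as reported above, that the signal standard deviation is proportional to concentration, so weights of 1/c² are used (the usual inverse-variance choice). `regression_advice` encodes the factor-50 and factor-5 thresholds from the abstract under that same proportionality assumption; all function names are hypothetical.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x, minimising sum w_i (y_i - a - b*x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def calibration_weights(conc):
    """Inverse-variance weights, assuming the signal s.d. is proportional to concentration."""
    return [1.0 / c ** 2 for c in conc]

def regression_advice(conc):
    """Rule of thumb from the abstract; with s.d. proportional to c,
    the s.d. ratio over the range equals max(conc) / min(conc)."""
    ratio = max(conc) / min(conc)
    if ratio > 50:
        return "weighted (required)"
    if ratio >= 5:
        return "weighted (advised)"
    return "ordinary"
```

Passing equal weights, for example `[1.0] * len(conc)`, reduces `weighted_line` to ordinary least squares, so both fits can be compared on the same calibration data.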


Journal of Cheminformatics | 2014

Reliable estimation of prediction errors for QSAR models under model uncertainty using double cross-validation.

Désirée Baumann; Knut Baumann

Background: Generally, QSAR modelling requires both model selection and validation, since there is no a priori knowledge about the optimal QSAR model. Prediction errors (PE) are frequently used to select and to assess the models under study. Reliable estimation of prediction errors is challenging – especially under model uncertainty – and requires independent test objects. These test objects must be involved in neither model building nor model selection. Double cross-validation, sometimes also termed nested cross-validation, offers an attractive way to generate test data and to select QSAR models, since it uses the data very efficiently. Nevertheless, the literature is divided on the reliability of double cross-validation under model uncertainty. Moreover, systematic studies of how to parameterize double cross-validation adequately are still missing. Here, the cross-validation design in the inner loop and the influence of the test set size in the outer loop are systematically studied for regression models in combination with variable selection.
Methods: Simulated and real data are analysed with double cross-validation to identify the factors that matter for the resulting model quality. For the simulated data, a bias-variance decomposition is provided.
Results: The prediction errors of QSAR/QSPR regression models in combination with variable selection depend to a large degree on the parameterization of double cross-validation. While the parameters for the inner loop of double cross-validation mainly influence the bias and variance of the resulting models, the parameters for the outer loop mainly influence the variability of the resulting prediction error estimate.
Conclusions: Double cross-validation reliably estimates prediction errors without bias under model uncertainty for regression models. Compared with a single test set, double cross-validation provided a more realistic picture of model quality and should be preferred.
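The structure of double cross-validation can be shown with a minimal Python sketch. The outer loop splits off test objects that are used only for estimating the prediction error; the inner loop, run on the outer training objects alone, performs model selection. As a simplifying assumption, "variable selection" here means choosing the single best descriptor column for a one-variable least-squares model; the fold counts, the function names, and this selection rule are illustrative and not the parameterizations studied in the paper.

```python
import random
import statistics

def fit(x, y):
    """Least-squares line y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return my - b * mx, b

def make_folds(indices, k, rng):
    """Shuffle the indices and deal them into k disjoint folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def inner_cv_mse(X, y, var, train, inner_folds):
    """Inner-loop CV error of the one-variable model on column `var`."""
    sq = []
    for fold in inner_folds:
        held = set(fold)
        fit_idx = [i for i in train if i not in held]
        a, b = fit([X[var][i] for i in fit_idx], [y[i] for i in fit_idx])
        sq.extend((y[i] - a - b * X[var][i]) ** 2 for i in fold)
    return sum(sq) / len(sq)

def double_cv(X, y, k_outer=5, k_inner=5, seed=0):
    """Return the outer-loop RMSE and the column chosen in each outer fold.

    X is a list of descriptor columns; each candidate model uses one column.
    """
    rng = random.Random(seed)
    n = len(y)
    outer_sq, chosen = [], []
    for test in make_folds(range(n), k_outer, rng):
        held = set(test)
        train = [i for i in range(n) if i not in held]
        # Model selection sees only the outer training objects; all candidates
        # are compared on the same inner folds.
        inner_folds = make_folds(train, k_inner, rng)
        best = min(range(len(X)), key=lambda v: inner_cv_mse(X, y, v, train, inner_folds))
        a, b = fit([X[best][i] for i in train], [y[i] for i in train])
        # Outer test objects were used neither for fitting nor for selection.
        outer_sq.extend((y[i] - a - b * X[best][i]) ** 2 for i in test)
        chosen.append(best)
    return (sum(outer_sq) / len(outer_sq)) ** 0.5, chosen
```

Returning the chosen column per outer fold also exposes model uncertainty: if different folds pick different variables, the selected model is not stable.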


Analytica Chimica Acta | 1997

Computer-assisted IR spectra prediction — linked similarity searches for structures and spectra

Knut Baumann; J.T. Clerc

The prediction of IR spectra (2250 to 550 cm⁻¹) of organic compounds containing carbon, nitrogen, oxygen, and halogen atoms, based on a spectroscopic database, is outlined. Structure similarity searches are performed to determine appropriate reference molecules, whose spectra are then used to predict the spectrum of the query molecule. The performance and reliability of the prediction system were extensively tested with a leave-one-out procedure.
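The two linked steps, a structure similarity search followed by prediction from the reference spectra, can be sketched in Python. This is a hypothetical illustration only: it represents each structure as a set of fragment labels, ranks references by Tanimoto similarity, and predicts the query spectrum as a similarity-weighted average of the k best reference spectra on a shared wavenumber grid. The representation, the averaging rule, and the function names are assumptions, not the method of the paper.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fragment sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def predict_spectrum(query, database, k=3):
    """Similarity-weighted average of the k most similar reference spectra.

    `database` holds (fragment_set, spectrum) pairs; spectra are equal-length
    absorbance lists sampled on the same wavenumber grid.
    """
    ranked = sorted(database, key=lambda entry: tanimoto(query, entry[0]), reverse=True)[:k]
    weights = [tanimoto(query, frags) for frags, _ in ranked]
    total = sum(weights)
    if total == 0:
        raise ValueError("no structurally similar reference found")
    n_points = len(ranked[0][1])
    return [
        sum(w * spectrum[p] for w, (_, spectrum) in zip(weights, ranked)) / total
        for p in range(n_points)
    ]
```

A leave-one-out test of such a system removes each compound from `database` in turn and compares its predicted spectrum with the measured one.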


Molecular Informatics | 2016

Chemoinformatic Classification Methods and their Applicability Domain

Miriam Mathea; Waldemar Klingspohn; Knut Baumann

Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables that encode the respective molecular structures (i.e. molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as the applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the “normal” objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects, or as detecting outliers. Currently, two different types of measures are in use. The first defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second defines the applicability domain in terms of the expected reliability of the predictions, which is referred to as confidence estimation. Both types are systematically differentiated here, and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built‐in confidence scores. Since confidence estimation uses the class-label information to compute the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which uses only the information in the explanatory variables.
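The distinction between the two kinds of measure can be illustrated with a k-nearest-neighbour classifier in Python. The novelty score below uses only descriptor-space distances to the training objects, while the confidence score is the fraction of neighbours that agree with the predicted class and therefore uses the class labels. The choice of k-NN, Euclidean distance, k=3, and the function names are assumptions for illustration, not the specific measures reviewed in the paper.

```python
import math
from collections import Counter

def k_nearest(x, train_X, k):
    """Indices of the k training objects closest to x (Euclidean distance)."""
    return sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]

def novelty_score(x, train_X, k=3):
    """Novelty detection: mean distance to the k nearest training objects.
    Uses descriptors only; larger values mean further from the training data."""
    nn = k_nearest(x, train_X, k)
    return sum(math.dist(x, train_X[i]) for i in nn) / k

def knn_predict_with_confidence(x, train_X, train_y, k=3):
    """Confidence estimation: k-NN majority vote, with the fraction of agreeing
    neighbours as the classifier's built-in confidence score."""
    nn = k_nearest(x, train_X, k)
    label, votes = Counter(train_y[i] for i in nn).most_common(1)[0]
    return label, votes / k
```

A query far from all training objects receives a high novelty score even if its neighbours happen to agree, whereas a query between two classes may be close to the training data yet receive low confidence; this is why the two measures can restrict the applicability domain differently.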


Journal of Medicinal Chemistry | 2015

10-Iodo-11H-indolo[3,2-c]quinoline-6-carboxylic Acids Are Selective Inhibitors of DYRK1A.

Hannes Falke; A. Chaikuad; Anja Becker; Nadège Loaëc; Olivier Lozach; Samira Abu Jhaisha; Walter Becker; Peter G. Jones; Lutz Preu; Knut Baumann; Stefan Knapp; Laurent Meijer; Conrad Kunick

The protein kinase DYRK1A has been suggested to act as one of the intracellular regulators contributing to neurological alterations found in individuals with Down syndrome. For an assessment of the role of DYRK1A, selective synthetic inhibitors are valuable pharmacological tools. However, the DYRK1A inhibitors described in the literature so far either are not sufficiently selective or have not been tested against closely related kinases from the DYRK and the CLK protein kinase families. The aim of this study was the identification of DYRK1A inhibitors exhibiting selectivity versus the structurally and functionally closely related DYRK and CLK isoforms. Structure modification of the screening hit 11H-indolo[3,2-c]quinoline-6-carboxylic acid revealed structure–activity relationships for kinase inhibition and enabled the design of 10-iodo-substituted derivatives as very potent DYRK1A inhibitors with considerable selectivity against CLKs. X-ray structure determination of three 11H-indolo[3,2-c]quinoline-6-carboxylic acids cocrystallized with DYRK1A confirmed the predicted binding mode within the ATP binding site.

Collaboration


Top Co-Authors

Gisbert Schneider

École Polytechnique Fédérale de Lausanne

Jürgen Popp

Leibniz Institute of Photonic Technology

Jan Dreher

Braunschweig University of Technology

Magnus Matz

Braunschweig University of Technology

U. Schmid

Braunschweig University of Technology
