Anne Mai Wassermann
University of Bonn
Publication
Featured research published by Anne Mai Wassermann.
Journal of Chemical Information and Modeling | 2009
Anne Mai Wassermann; Hanna Geppert; Jürgen Bajorath
Support vector machine (SVM) calculations combining protein and small molecule information have been applied to identify ligands for simulated orphan targets (i.e., targets for which no ligands were available). The combination of protein and ligand information was facilitated through the design of target-ligand kernel functions that account for pairwise ligand and target similarity. The design and biological information content of such kernel functions was expected to play a major role for target-directed ligand prediction. Therefore, a variety of target-ligand kernels were implemented to capture different types of target information including sequence, secondary structure, tertiary structure, biophysical properties, ontologies, or structural taxonomy. These kernels were tested in ligand predictions for simulated orphan targets in two target protein systems characterized by the presence of different intertarget relationships. Surprisingly, although there were target- and set-specific differences in prediction rates for alternative target-ligand kernels, the performance of these kernels was overall similar and also similar to SVM linear combinations. Test calculations designed to better understand possible reasons for these observations revealed that ligand information provided by nearest neighbors of orphan targets significantly influenced SVM performance, much more so than the inclusion of protein information. As long as ligands of closely related neighbors of orphan targets were available for SVM learning, orphan target ligands could be well predicted, regardless of the type and sophistication of the kernel function that was used. These findings suggest simplified strategies for SVM-based ligand prediction for orphan targets.
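A target-ligand kernel that accounts for pairwise ligand and target similarity can be sketched as a product of two similarity terms. The sketch below is only an illustration under stated assumptions: the abstract does not specify how the two terms are combined, so the product form, the use of Tanimoto similarity for ligands, and the names `tanimoto`, `target_ligand_kernel`, and `target_sim` are all hypothetical choices, not the authors' implementation.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def target_ligand_kernel(pair_a, pair_b, target_sim) -> float:
    """Similarity of two (target, ligand) pairs.

    Assumption: the pair kernel is the product of a precomputed target
    similarity (e.g. derived from sequence identity) and a ligand similarity.
    """
    (target_a, ligand_a), (target_b, ligand_b) = pair_a, pair_b
    return target_sim[target_a][target_b] * tanimoto(ligand_a, ligand_b)


# Hypothetical toy data: two targets and two ligands.
target_sim = {
    "T1": {"T1": 1.0, "T2": 0.5},
    "T2": {"T1": 0.5, "T2": 1.0},
}
pair_a = ("T1", frozenset({1, 2, 3}))
pair_b = ("T2", frozenset({2, 3, 4}))
kernel_value = target_ligand_kernel(pair_a, pair_b, target_sim)
```

Swapping the entries of `target_sim` for values derived from a different type of target information (sequence, structure, ontology) changes only the first factor, which is one way to picture how alternative target-ligand kernels could be compared.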
Journal of Chemical Information and Modeling | 2009
Anne Mai Wassermann; Hanna Geppert; Jürgen Bajorath
The identification of small chemical compounds that are selective for a target protein over one or more closely related members of the same family is of high relevance for applications in chemical biology. Conventional 2D similarity searching using known selective molecules as templates has recently been found to preferentially detect selective over non-selective and inactive database compounds. To improve the initially observed search performance, we have attempted to use 2D fingerprints as descriptors for support vector machine (SVM)-based selectivity searching. In contrast to the binary SVM compound classification that is typically applied, SVM analysis has been adapted here for multiclass predictions and compound ranking to distinguish between selective, active but non-selective, and inactive compounds. In systematic database search calculations, we tested combinations of four alternative SVM ranking schemes, four different kernel functions, and four fingerprints and were able to further improve selectivity search performance by effectively removing non-selective molecules from high-ranking positions while retaining high recall of selective compounds.
Journal of Chemical Information and Modeling | 2012
Anne Mai Wassermann; Peter Haebel; Nils Weskamp; Jürgen Bajorath
We introduce the SAR matrix data structure that is designed to elucidate SAR patterns produced by groups of structurally related active compounds, which are extracted from large data sets. SAR matrices are systematically generated and sorted on the basis of SAR information content. Matrix generation is computationally efficient and enables processing of large compound sets. The matrix format is reminiscent of SAR tables, and SAR patterns revealed by different categories of matrices are easily interpretable. The structural organization underlying matrix formation is more flexible than standard R-group decomposition schemes. Hence, the resulting matrices capture SAR information in a comprehensive manner.
ChemMedChem | 2010
Anne Mai Wassermann; Lisa Peltason; Jürgen Bajorath
For series of compounds with activity against multiple targets, the resulting multi‐target structure–activity relationships (mtSARs) are usually difficult to analyze. However, rationalizing mtSARs is of great importance for the development of compounds that are selective for one target over closely related ones. Herein we present a methodological framework for the study of mtSARs and identification of substitution sites in analogue series that are selectivity determinants. Active analogues are subjected to uniform R‐group decomposition, compared on the basis of pharmacophore feature edit distances, and organized in previously reported tree‐like structures that we adapted for mtSAR analysis. These data structures represent a substitution site hierarchy, capture potency variations, and reflect patterns of SAR discontinuity. Generating this data structure for multiple targets makes it possible to determine preference orders for chemical modifications to improve target selectivity. Accordingly, high emphasis is put on the derivation of simple rules to design substitutions that are likely to yield target‐selective compounds. Furthermore, the analysis is applicable to identify both additive and non‐additive effects on compound activity and selectivity as a consequence of multi‐site substitutions.
Drug Development Research | 2012
Anne Mai Wassermann; Dilyana Dimova; Preeti Iyer; Jürgen Bajorath
Preclinical Research
Journal of Chemical Information and Modeling | 2012
Disha Gupta-Ostermann; Mathias Wawer; Anne Mai Wassermann; Jürgen Bajorath
The transfer of SAR information from one analog series to another is a difficult, yet highly attractive task in medicinal chemistry. At present, the evaluation of SAR transfer potential from a data mining perspective is still in its infancy. Only recently, a first computational approach has been introduced to evaluate SAR transfer events. Here, a substructure relationship-based molecular network representation has been used as a starting point to systematically identify SAR transfer series in large compound data sets. For this purpose, a methodology is introduced that consists of two stages. For graph mining, an algorithm has been designed that extracts all parallel series from compound data sets. A parallel series is formed by two series of analogs with different core structures but pairwise corresponding substitution patterns. The SAR transfer potential of identified parallel series is then evaluated using a scoring function that emphasizes corresponding potency progression over many analog pairs and large potency ranges. The substructure relationship-based molecular network in combination with the graph mining algorithm currently represents the only generally applicable approach to systematically detect SAR transfer events in large compound data sets. The combined approach has been evaluated on a large number of compound data sets and shown to systematically identify SAR transfer series.
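The definition of a parallel series given above (two analog series with different cores but pairwise corresponding substitution patterns) can be illustrated with a small sketch. It assumes each compound has already been reduced to a hypothetical `(core, substituent, potency)` triple. It does not reproduce the substructure relationship-based network, the graph mining algorithm, or the scoring function of the paper, and `find_parallel_series` and `min_pairs` are illustrative names.

```python
from collections import defaultdict
from itertools import combinations


def find_parallel_series(compounds, min_pairs=3):
    """Return pairs of cores that share at least `min_pairs` substituents.

    `compounds` is an iterable of (core, substituent, potency) triples.
    Each result is (core_a, core_b, [(substituent, potency_a, potency_b), ...]).
    """
    by_core = defaultdict(dict)
    for core, substituent, potency in compounds:
        by_core[core][substituent] = potency

    series = []
    for core_a, core_b in combinations(sorted(by_core), 2):
        shared = sorted(set(by_core[core_a]) & set(by_core[core_b]))
        if len(shared) >= min_pairs:
            pairs = [(s, by_core[core_a][s], by_core[core_b][s]) for s in shared]
            series.append((core_a, core_b, pairs))
    return series


# Hypothetical toy data set (potency values are made up for illustration).
toy_compounds = [
    ("A", "H", 5.0), ("A", "Me", 6.0), ("A", "Cl", 7.0),
    ("B", "H", 5.5), ("B", "Me", 6.4), ("B", "Cl", 7.2),
    ("C", "H", 6.0),
]
parallel = find_parallel_series(toy_compounds)
```

In this toy set only cores A and B share three substituents, so they form the single parallel series. Whether potency progresses in a corresponding way along the paired analogs would then be judged by a separate scoring step, which is not sketched here.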
Information-an International Interdisciplinary Journal | 2010
Martin Vogt; Anne Mai Wassermann; Jürgen Bajorath
The use of computational methodologies for chemical database mining and molecular similarity searching or structure-activity relationship analysis has become an integral part of modern chemical and pharmaceutical research. These types of computational studies fall into the chemoinformatics spectrum and usually have large-scale character. Concepts from information theory such as Shannon entropy and Kullback-Leibler divergence have also been adopted for chemoinformatics applications. In this review, we introduce these concepts, describe their adaptations, and discuss exemplary applications of information theory to a variety of relevant problems. These include, among others, chemical feature (or descriptor) selection, database profiling, and compound recall rate predictions.
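For readers unfamiliar with the two concepts named above, the following minimal sketch computes Shannon entropy and Kullback-Leibler divergence from histogram counts, for example the binned values of a molecular descriptor in two compound databases. The standard textbook definitions are used; the binning, the function names, and the example counts are assumptions made for illustration and are not taken from the review.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a histogram given as a list of bin counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def kl_divergence(p_counts, q_counts):
    """Kullback-Leibler divergence D(P || Q) in bits for two histograms
    defined over the same bins."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    p_total, q_total = sum(p_counts), sum(q_counts)
    divergence = 0.0
    for p_c, q_c in zip(p_counts, q_counts):
        if p_c == 0:
            continue  # bins that are empty in P contribute nothing
        if q_c == 0:
            raise ValueError("Q has an empty bin where P is populated")
        p, q = p_c / p_total, q_c / q_total
        divergence += p * math.log2(p / q)
    return divergence


# Hypothetical descriptor histograms with four bins each.
uniform_hist = [5, 5, 5, 5]   # values spread evenly over all bins
peaked_hist = [10, 0, 0, 0]   # all values in a single bin
```

A descriptor whose values spread evenly over all bins reaches the maximum entropy for that number of bins, while a descriptor concentrated in one bin has zero entropy. The divergence is zero only when the two normalized histograms are identical.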
Methods of Molecular Biology | 2011
Anne Mai Wassermann; Hanna Geppert; Jürgen Bajorath
Support vector machine (SVM)-based selectivity searching has recently been introduced to identify compounds in virtual screening libraries that are not only active for a target protein, but also selective for this target over a closely related member of the same protein family. In simulated virtual screening calculations, SVM-based strategies termed preference ranking and one-versus-all ranking were successfully applied to rank a database and enrich high-ranking positions with selective compounds while removing nonselective molecules from high ranks. In contrast to the original SVM approach developed for binary classification, these strategies enable learning from more than two classes, considering that distinguishing between selective, promiscuously active, and inactive compounds gives rise to a three-class prediction problem. In this chapter, we describe the extension of the one-versus-all strategy to four training classes. Furthermore, we present an adaptation of the preference ranking strategy that leads to higher recall of selective compounds than previously investigated approaches and is applicable in situations where the removal of nonselective compounds from high-ranking positions is not required.
Methods of Molecular Biology | 2012
Anne Mai Wassermann; Britta Nisius; Martin Vogt; Jürgen Bajorath
The identification of molecular descriptors that are able to distinguish between different compound classes is of paramount importance in chemoinformatics. To aid in the identification of such discriminatory descriptors, concepts from information theory have been adapted. In an earlier study, an approach termed Differential Shannon Entropy (DSE) has been introduced for descriptor profiling to detect and quantify compound database-dependent differences in the information content and value range distribution of descriptors. Because the DSE approach was intrinsically limited in its ability to select compound class-specific descriptors by comparing data sets of very different size, this approach has recently been extended to Mutual Information-DSE (MI-DSE). Herein, DSE, MI-DSE, and the Shannon entropy concept underlying both information theoretic approaches are introduced and compared, and differences between their application areas are discussed.
Molecular Informatics | 2010
Anne Mai Wassermann; Martin Vogt; Jürgen Bajorath
We introduce an entropy‐based methodology, Iterative Shannon entropy (ISE), to quantify the information contained in molecular descriptors and compound selectivity data sets taking data spread directly into account. The method is applicable to determine the information content of any value range dependent data distribution. An analysis of descriptor information content has been carried out to explore alternative binning schemes for entropy calculation. Using this entropic measure we have profiled 153 compound selectivity data sets for combinations of 68 target proteins belonging to 10 target families. With the ISE measure, we aim to assign high information content to compound data sets that span a wide range of selectivity values and different selectivity relationships and hence correspond to more than one biological phenotype. Target families with high average entropy scores are identified. For members of these families, active compounds display highly differentiated selectivity profiles.