Jürgen Bajorath
University of Bonn
Publication
Featured research published by Jürgen Bajorath.
Journal of Medicinal Chemistry | 2010
Anne Mai Wassermann; Mathias Wawer; Jürgen Bajorath
The study of compound structure-activity relationships (SARs) is one of the central themes in medicinal chemistry. SAR information is analyzed in different contexts, from screening and hit-to-lead to lead optimization projects. For the exploration of SARs, the concept of an activity landscape, which integrates molecular similarity and potency information, is of high relevance. The computational study of activity landscapes is still an evolving field. Activity landscape models are designed to rationalize SAR features of compound data sets and select key compounds for chemical exploration. The choice of molecular representations and the way molecular similarity is assessed are critically important factors for landscape generation and analysis. Graphical representation of SAR features is a major focal point of landscape modeling. Although complex activity landscapes are generally difficult to analyze, much progress has recently been made in extracting SAR information from various landscape views. This Perspective aims to provide an overview of the state of the art in activity landscape analysis and a discussion of its potential for medicinal chemistry applications.
Understanding how structural modifications affect the biological activity of compounds or deriving a pharmacophore hypothesis from diverse active chemical entities present challenges that can be tackled using medicinal chemistry experience and intuition, computational tools, or both. SAR analysis is by no means a priori dependent on computational methods. Rather, it is often carried out on paper or whiteboards by comparing molecular graphs of active compounds, consistent with the way chemists are traditionally trained. It has been pointed out that the judgments of medicinal chemists are naturally subjective and often inconsistent. This is of course not specific to medicinal chemistry but rather a consequence of how we as individuals subjectively access and evaluate data sets of any kind.
Likely inconsistencies in individual judgments about chemical and biological data might well be taken as an argument for the use of computational methods in SAR analysis. However, it would be rather careless to assume that computational analysis is per se objective. In fact, computational objectivity does not exist. We typically apply models with underlying assumptions and inherent approximations that are often useful only within relatively narrow applicability domains and whose results are generally difficult to evaluate. In this context, it is often overlooked that we cannot model phenomena whose physicochemical or biological foundations we do not understand. Of course, calculations that are carried out and reported should at least be reproducible (one would hope), but reproducibility does not mean objectivity. There is, however, a rather simple factor that generally favors computational approaches to SAR analysis: data set size. As long as one investigates one compound series at a time, knowledge of chemical graphs and activity data might be readily sufficient to deduce and predict SAR behavior. However, as molecular data sets grow, we quickly reach the limits of our ability to access and compare structures and associated biological properties, so that computational data processing and analysis often become essential. Many compound data sets that have accumulated in pharmaceutical settings go far beyond the capacity of medicinal chemistry-centric SAR analysis and require specialized computational tools for data handling and modeling. Again, given the model-based nature of computational SAR analysis schemes, this does not necessarily make SAR analysis more objective than individual assessments, but it makes it feasible. Currently available computational approaches to SAR analysis are multifaceted and of rather different methodological complexity.
A general distinction can be made between methodologies that primarily help to access and visualize SAR data obtained from screening or chemical optimization campaigns and those that ultimately predict biological activities. Predictive methods include, for example, approaches to model linear and nonlinear structure-activity relationships, in particular those based on the classical QSAR paradigm, pharmacophore techniques, and various machine learning approaches. Activity landscape methods, as introduced in the following, add to this methodological spectrum a strong focus on data-driven, descriptive, and large-scale SAR analysis schemes.
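One widely used way to put a number on the similarity/potency interplay that defines an activity landscape is the structure-activity landscape index (SALI) of Guha and Van Drie: the potency difference of a compound pair divided by its dissimilarity, so that highly similar compounds with very different potencies (activity cliffs) score highest. The Perspective is not claimed to prescribe this index; the sketch below is an illustrative assumption in which fingerprints are Python sets of "on" bit positions and potencies are pKi values, with hypothetical toy data.

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def sali(potency_a: float, potency_b: float, similarity: float, eps: float = 1e-6) -> float:
    """SALI: absolute potency difference divided by (1 - similarity).

    eps avoids division by zero for identical fingerprints (similarity == 1).
    """
    return abs(potency_a - potency_b) / max(1.0 - similarity, eps)


def rank_pairs_by_sali(compounds: dict[str, tuple[set[int], float]]):
    """Score all compound pairs; high SALI values flag activity-cliff candidates."""
    scored = []
    for (name_a, (fp_a, pot_a)), (name_b, (fp_b, pot_b)) in combinations(compounds.items(), 2):
        sim = tanimoto(fp_a, fp_b)
        scored.append((sali(pot_a, pot_b, sim), name_a, name_b, sim))
    return sorted(scored, reverse=True)


# Hypothetical toy data: name -> (fingerprint bits, pKi)
compounds = {
    "A": ({1, 2, 3, 4}, 8.0),
    "B": ({1, 2, 3, 5}, 5.0),
    "C": ({6, 7, 8}, 5.5),
}
print(rank_pairs_by_sali(compounds)[0][1:3])  # → ('A', 'B')
```

A and B share three of five bits (similarity 0.6) but differ by three log units, so they dominate the ranking; the structurally unrelated compound C never forms a cliff.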
Journal of Medicinal Chemistry | 2008
Mathias Wawer; Lisa Peltason; Nils Weskamp; Andreas Teckentrup; Jürgen Bajorath
The study of structure-activity relationships (SARs) of small molecules is of fundamental importance in medicinal chemistry and drug design. Here, we introduce an approach that combines the analysis of similarity-based molecular networks and SAR index distributions to identify multiple SAR components present within sets of active compounds. Different compound classes produce molecular networks of distinct topology. Subsets of compounds related by different local SARs are often organized in small communities in networks annotated with potency information. Many local SAR communities are not isolated but connected by chemical bridges, i.e., similar molecules occurring in different local SAR contexts. The analysis makes it possible to relate local and global SAR features to each other and identify key compounds that are major determinants of SAR characteristics. In many instances, such compounds represent start and end points of chemical optimization pathways and aid in the selection of other candidates from their communities.
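The network idea in this abstract can be illustrated with a small similarity graph. The sketch below is a simplified stand-in, not the published method: it connects compounds whose Tanimoto similarity reaches a hypothetical threshold of 0.55, uses plain connected components in place of the community analysis, and reports each group's potency range as a crude proxy for local SAR character (the paper itself relies on SAR index distributions, which are not reproduced here).

```python
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def build_network(fps: dict[str, set[int]], threshold: float = 0.55) -> dict[str, set[str]]:
    """Undirected similarity network: an edge joins two compounds with Tanimoto >= threshold."""
    graph: dict[str, set[str]] = {name: set() for name in fps}
    for a, b in combinations(fps, 2):
        if tanimoto(fps[a], fps[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clusters(graph: dict[str, set[str]]) -> list[set[str]]:
    """Connected components found by iterative depth-first search."""
    seen: set[str] = set()
    result = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(component)
    return result


def potency_spread(component: set[str], potency: dict[str, float]) -> float:
    """Potency range within a group; a large spread hints at a locally discontinuous SAR."""
    values = [potency[name] for name in component]
    return max(values) - min(values)
```

Replacing `clusters` with a proper community detection algorithm would be needed to separate densely connected local SAR communities that are joined by a few "chemical bridge" compounds.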
Journal of Chemical Information and Modeling | 2010
Eugen Lounkine; Mathias Wawer; Anne Mai Wassermann; Jürgen Bajorath
We introduce SARANEA, an open-source Java application for interactive exploration of structure-activity relationship (SAR) and structure-selectivity relationship (SSR) information in compound sets of any source. SARANEA integrates various SAR and SSR analysis functions and utilizes a network-like similarity graph data structure for visualization. The program enables the systematic detection of activity and selectivity cliffs and corresponding key compounds across multiple targets. Advanced SAR analysis functions implemented in SARANEA include, among others, layered chemical neighborhood graphs, cliff indices, selectivity trees, editing functions for molecular networks and pathways, bioactivity summaries of key compounds, and markers for bioactive compounds having potential side effects. We report the application of SARANEA to identify SAR and SSR determinants in different sets of serine protease inhibitors. It is found that key compounds can influence SARs and SSRs in rather different ways. Such compounds and their SAR/SSR characteristics can be systematically identified and explored using SARANEA. The program and source code are made freely available under the GNU General Public License.
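The selectivity cliff concept used by SARANEA can be sketched in a few lines. This is not SARANEA's Java code; it assumes that selectivity is the pKi difference between two targets, and the similarity threshold (0.8) and minimum selectivity change (2 log units) are hypothetical choices. Compounds involved in many cliffs are reported as candidate key compounds.

```python
from collections import Counter
from itertools import combinations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def selectivity_cliffs(compounds, sim_threshold=0.8, min_delta=2.0):
    """Pairs of similar compounds whose selectivity differs by at least min_delta.

    compounds: name -> (fingerprint bits, pKi on target 1, pKi on target 2)
    Selectivity is assumed to be pKi(target 1) - pKi(target 2).
    """
    cliffs = []
    for (name_a, (fp_a, a1, a2)), (name_b, (fp_b, b1, b2)) in combinations(compounds.items(), 2):
        delta_selectivity = abs((a1 - a2) - (b1 - b2))
        if tanimoto(fp_a, fp_b) >= sim_threshold and delta_selectivity >= min_delta:
            cliffs.append((name_a, name_b))
    return cliffs


def key_compounds(cliffs):
    """Rank compounds by the number of selectivity cliffs they participate in."""
    return Counter(name for pair in cliffs for name in pair).most_common()
```

The same loop detects ordinary activity cliffs if the selectivity difference is replaced by the potency difference on a single target.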
Journal of Medicinal Chemistry | 2010
Martin Vogt; Dagmar Stumpfe; Hanna Geppert; Jürgen Bajorath
The scaffold hopping potential of popular 2D fingerprints has been thoroughly investigated. We have found that these types of fingerprints have at least limited scaffold hopping ability, including early enrichment of small numbers of active scaffolds at high database ranks. However, it has not been possible to derive Tanimoto coefficient value ranges for individual fingerprints that are generally preferred for scaffold hopping. For selected fingerprints, similarity threshold values have been identified that yield small database selection sets with a high probability of containing a few active scaffolds. Furthermore, essentially all tested fingerprints have shown the ability to enrich scaffold hops in approximately 1% of a screening database. For the test cases reported herein, selecting 0.5-1% of the screening database yields approximately 25% of the available scaffolds. On the basis of our findings, practical guidelines for virtual screening using different types of 2D fingerprints have been formulated.
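The evaluation behind statements such as "selecting 0.5-1% of the screening database yields approximately 25% of the available scaffolds" can be sketched as follows: rank the database by Tanimoto similarity to a reference compound, keep the top fraction, and count how many distinct active scaffolds appear in that selection set. Fingerprints as bit sets, the function name, and the argument layout are assumptions for illustration, not the study's code.

```python
import math


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def scaffold_recovery(reference, database, scaffold_of, fraction=0.01):
    """Share of distinct active scaffolds found in the top-ranked `fraction` of a database.

    reference:   fingerprint of the query compound (set of bits)
    database:    name -> fingerprint for every screening compound
    scaffold_of: name -> scaffold identifier, for the active compounds only
    """
    ranked = sorted(database, key=lambda name: tanimoto(reference, database[name]), reverse=True)
    n_select = max(1, math.ceil(fraction * len(ranked)))
    found = {scaffold_of[name] for name in ranked[:n_select] if name in scaffold_of}
    all_scaffolds = set(scaffold_of.values())
    return len(found) / len(all_scaffolds) if all_scaffolds else 0.0
```

Counting scaffolds rather than compounds is what separates a scaffold hopping benchmark from a plain compound recovery benchmark: ten analogs sharing one core count only once.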
Journal of Chemical Information and Modeling | 2010
Anne Mai Wassermann; Jürgen Bajorath
Applying the concept of matched molecular pairs, we have systematically analyzed the ability of defined chemical changes to introduce activity cliffs. Public domain compound data were systematically screened for matched molecular pairs that were then organized according to the chemical transformations they represent and the associated potency changes. From the vast available chemical transformation space, including both R-group and core substructure changes, approximately 250 nonredundant substitutions were identified that displayed a general tendency to form activity cliffs. These substitutions introduced activity cliffs in the structural context of diverse scaffolds and in compounds active against many different targets. Activity cliff-forming transformations were often rather simple, including replacements of small functional groups. Moreover, in many instances, chemically very similar transformations were identified that had a much lower propensity to form activity cliffs or no detectable cliff potential. Thus, clear preferences emerged for specific transformations. A compendium of substitutions with general activity cliff-forming potential is provided to aid in compound optimization efforts.
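Organizing matched molecular pairs by transformation and measuring how often each transformation produces a cliff can be sketched as below. Assumptions, all hypothetical: a transformation is encoded as a `"fragment>>fragment"` string, the two directions of a substitution are merged so that each is counted once (nonredundant), and a cliff is a potency difference of at least 100-fold (2 log units).

```python
from collections import defaultdict


def canonical(transformation: str) -> str:
    """Merge 'X>>Y' and 'Y>>X' into one nonredundant key by sorting the fragments."""
    left, right = transformation.split(">>")
    return ">>".join(sorted((left, right)))


def cliff_propensity(mmps, cliff_delta=2.0, min_pairs=1):
    """Fraction of matched pairs per transformation that form an activity cliff.

    mmps: iterable of (transformation, potency_a, potency_b), potencies in log units.
    Transformations seen in fewer than min_pairs pairs are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [number of pairs, number of cliffs]
    for transformation, potency_a, potency_b in mmps:
        entry = counts[canonical(transformation)]
        entry[0] += 1
        if abs(potency_a - potency_b) >= cliff_delta:
            entry[1] += 1
    return {key: n_cliffs / n_pairs
            for key, (n_pairs, n_cliffs) in counts.items() if n_pairs >= min_pairs}
```

Raising `min_pairs` restricts the output to transformations observed often enough, across scaffolds and targets, for a "general tendency" to be meaningful.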
Journal of Medicinal Chemistry | 2011
Mathias Wawer; Jürgen Bajorath
The systematic extraction of structure-activity relationship (SAR) information from large and diverse compound data sets depends on the application of computational analysis methods. Irrespective of the methodological details, the ultimate goal of large-scale SAR analysis is to identify the most informative compounds and rationalize the structural changes that determine SAR behavior. Such insights provide a basis for further chemical exploration. Herein we introduce the first graphical SAR analysis method that globally organizes large compound data sets on the basis of local structural relationships, hence providing immediate access to important structural modifications and SAR determinants.
Journal of Chemical Information and Modeling | 2010
Ye Hu; Jürgen Bajorath
Increasing evidence that many pharmaceutically relevant compounds elicit their effects through binding to multiple targets, so-called polypharmacology, is beginning to change conventional drug discovery and design strategies. In light of this paradigm shift, we have mined publicly available compound and bioactivity data for promiscuous chemotypes. For this purpose, a hierarchy of active compounds, atomic property-based scaffolds, and unique molecular topologies was generated, and activity annotations were analyzed using this framework. Starting from ∼35 000 compounds active against human targets with at least 1 μM potency, 33 chemotypes with distinct topology were identified that represented molecules active against at least 3 different target families. Network representations were utilized to study scaffold-target family relationships and activity profiles of scaffolds corresponding to promiscuous chemotypes. A subset of promiscuous chemotypes displayed a significant enrichment in drugs over bioactive compounds. A total of 190 drugs were identified that had on average only 2 known target annotations but belonged to the 7 most promiscuous chemotypes, which were active against 8-15 target families. These drugs should be attractive candidates for polypharmacological profiling.
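The counting step behind "chemotypes active against at least 3 different target families" and the selection of sparsely annotated drugs from promiscuous chemotypes can be sketched as follows. The input layout (tuples of compound, chemotype, target family), the helper names, and the cutoff of at most 2 known targets are assumptions for illustration; deriving chemotypes from structures is not shown.

```python
from collections import defaultdict


def promiscuous_chemotypes(activities, min_families=3):
    """Chemotypes whose compounds are active against at least min_families target families.

    activities: iterable of (compound, chemotype, target_family) annotations.
    Returns chemotype -> number of distinct target families.
    """
    families = defaultdict(set)
    for _compound, chemotype, family in activities:
        families[chemotype].add(family)
    return {chemotype: len(fams) for chemotype, fams in families.items()
            if len(fams) >= min_families}


def profiling_candidates(drug_chemotype, drug_targets, promiscuous, max_targets=2):
    """Drugs with few known targets that belong to a promiscuous chemotype (sorted names)."""
    return sorted(drug for drug, chemotype in drug_chemotype.items()
                  if chemotype in promiscuous
                  and len(drug_targets.get(drug, ())) <= max_targets)
```

Counting target families rather than individual targets keeps closely related targets (for example, several kinases) from inflating a chemotype's promiscuity.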
Journal of Chemical Information and Modeling | 2010
Mathias Wawer; Jürgen Bajorath
An intuitive and generally applicable analysis method, termed similarity-potency tree (SPT), is introduced to mine structure-activity relationship (SAR) information in compound data sets of any source. Only compound potency values and nearest-neighbor similarity relationships are considered. Rather than analyzing a data set as a whole, partly overlapping compound neighborhoods are systematically generated and represented as SPTs. This local analysis scheme simplifies the evaluation of SAR information, and SPTs of high SAR information content are easily identified. By inspecting only a limited number of compound neighborhoods, it is also straightforward to determine whether data sets contain little or no interpretable SAR information. Interactive analysis of SPTs is facilitated by reading the trees in two directions, which makes it possible to extract SAR rules, if available, in a consistent manner. The simplicity and interpretability of the data structure and the ease of calculation are characteristic features of this approach. We apply the methodology to high-throughput screening and lead optimization data sets, compare the approach to standard clustering techniques, illustrate how SAR rules are derived, and provide some practical guidance on how best to utilize the methodology. The SPT program is made freely available to the scientific community.
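A tree over one compound neighborhood, built only from nearest-neighbor similarities and annotated with potency changes, might look like the sketch below. The attachment rule is an assumption made for illustration and is not necessarily the published SPT construction: neighbors of the root with Tanimoto similarity at or above a hypothetical threshold (0.5) are added in order of decreasing similarity to the root, and each is attached to the most similar compound already in the tree.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def similarity_potency_tree(root, fps, threshold=0.5):
    """Return a child -> parent map for the similarity neighborhood of `root`.

    Assumed construction (illustrative only): neighbors are inserted in order of
    decreasing similarity to the root and hung from their most similar tree member.
    """
    sim_to_root = {name: tanimoto(fps[root], fp) for name, fp in fps.items() if name != root}
    neighbors = sorted((name for name, sim in sim_to_root.items() if sim >= threshold),
                       key=lambda name: sim_to_root[name], reverse=True)
    placed = [root]
    parent = {}
    for name in neighbors:
        parent[name] = max(placed, key=lambda member: tanimoto(fps[name], fps[member]))
        placed.append(name)
    return parent


def potency_steps(parent, potency):
    """(parent, child, potency change) for every tree edge, read from the root outward."""
    return [(par, child, potency[child] - potency[par]) for child, par in parent.items()]
```

Reading `potency_steps` from the root outward, and then in reverse from the leaves inward, mirrors the two reading directions mentioned in the abstract.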
Journal of Chemical Information and Modeling | 2011
Ye Hu; Dagmar Stumpfe; Jürgen Bajorath
The scaffold concept is one of the most frequently applied concepts in medicinal chemistry and virtual screening. The term scaffold is used to describe molecular core structures that are utilized in drug design or detected in virtual screening and, in addition, building blocks for synthetic efforts. For a series of analogs, a scaffold might be derived by determining their maximum common substructure, but there are many other ways to define scaffolds (vide infra). Unfortunately, in chemoinformatics, the scaffold concept is often applied in a rather subjective manner, without adherence to clear, formal, and consistent definitions. For scaffold hopping, i.e., the identification of different scaffolds with similar activity, which represents the “holy grail” of virtual screening, the frequent lack of formal consistency presents a substantial problem and often makes it impossible to compare different studies and methods. In fact, the absence of generally accepted evaluation standards for benchmarking and the inconsistency in assessing scaffold hopping analyses are currently major roadblocks for the further development of the virtual screening field. To further complicate matters, the terms scaffolds, substructures, and fragments are often used to refer to similar or identical structures. Substructures and fragments are rather general designations and are applied to describe small or large structural moieties, scaffolds, parts of scaffolds, or R-groups. Moreover, many different substructures are utilized in drug design applications, and different molecular fragmentation schemes have been introduced [8]. These fragmentation methods include knowledge-based approaches, such as the generation of fragment dictionaries to flag reactive and toxic compounds or predict ADME properties, as well as systematic fragmentation schemes based on synthetic or retrosynthetic criteria.
Such fragmentation and fragment organization approaches have also provided a basis for the design of fragment libraries in the context of fragment-based drug discovery [11]. Even random fragmentation approaches have been introduced to generate structural signatures of compounds with certain biological activities. In addition to knowledge-based and synthetically oriented fragmentation methods, fragments can be systematically derived on the basis of a defined molecular hierarchy, and such approaches have become particularly relevant for scaffold generation and analysis. Regardless of how scaffolds are ultimately rationalized, general aims of scaffold analysis include the assessment of the structural diversity of small molecules, the generation of structural classes and structural organization schemes, and the evaluation of biological activities or other molecular properties associated with different structural motifs. In this Perspective, we largely, but not exclusively, focus on studies that have analyzed scaffold distributions in selected compound data sets (such as drugs or screening libraries) or in currently available bioactive compounds. A number of these investigations have explored different types of relationships between scaffolds and the biological activities of the compounds they represent.
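As a concrete example of a hierarchy-based scaffold definition, the graph framework in the sense of Bemis and Murcko keeps ring systems and the linkers between them and discards acyclic side chains. On a molecular graph this amounts to repeatedly deleting atoms with at most one neighbor. The sketch below assumes a molecule given as a list of bonds between integer atom indices (a hypothetical input format); element types, bond orders, and exocyclic double bonds are ignored, so it illustrates only one of the many scaffold definitions discussed here.

```python
from collections import defaultdict


def from_bonds(bonds):
    """Build an adjacency map (atom -> set of neighbor atoms) from (atom, atom) bond pairs."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def murcko_framework(adjacency):
    """Strip side chains by repeatedly deleting atoms with degree <= 1.

    What remains are ring atoms plus the linker atoms connecting rings;
    an acyclic molecule yields an empty framework.
    """
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    terminal = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while terminal:
        atom = terminal.pop()
        if atom not in graph:  # an atom may be queued twice; skip it if already removed
            continue
        for neighbor in graph.pop(atom):
            graph[neighbor].discard(atom)
            if len(graph[neighbor]) <= 1:
                terminal.append(neighbor)
    return graph
```

Further hierarchy levels, such as the atomic property-based scaffolds and molecular topologies mentioned in the polypharmacology study above, can be obtained by additionally abstracting atom and bond types of this framework.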
Journal of Chemical Information and Modeling | 2011
Kathrin Heikamp; Jürgen Bajorath
A large-scale similarity search investigation has been carried out on 266 well-defined compound activity classes extracted from the ChEMBL database. The analysis was performed using two widely applied two-dimensional (2D) fingerprints that mark opposite ends of the current performance spectrum of these types of fingerprints, i.e., MACCS structural keys and the extended connectivity fingerprint with bond diameter four (ECFP4). For each fingerprint, three nearest neighbor search strategies were applied. On the basis of these search calculations, a similarity search profile of the ChEMBL database was generated. Overall, the fingerprint search campaign was surprisingly successful. In 203 of 266 test cases (∼76%), a compound recovery rate of at least 50% was observed with at least the better performing fingerprint and one search strategy. The similarity search profile also revealed several general trends. For example, fingerprint searching was often characterized by an early enrichment of active compounds in database selection sets. In addition, compound activity classes have been categorized according to different similarity search performance levels, which helps to put the results of benchmark calculations into perspective. Therefore, a compendium of activity classes falling into different search performance categories is provided. On the basis of our large-scale investigation, the performance range of state-of-the-art 2D fingerprinting has been delineated for compound data sets directed against a wide spectrum of pharmaceutical targets.
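The compound recovery rate used in such benchmarks can be sketched as follows: rank the database by nearest-neighbor similarity to a set of reference actives, take a fixed-size selection set, and report the fraction of held-out actives it contains. The abstract does not specify its three search strategies, so the two shown here (1-NN, i.e., maximum similarity to any reference, and the mean of the k highest similarities) are assumptions, as are the function names and selection set size.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def nn_score(candidate, references, k=1):
    """Mean of the k highest similarities to the references (k=1 gives 1-NN/max fusion)."""
    sims = sorted((tanimoto(candidate, ref) for ref in references), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def recovery_rate(references, actives, database, n_select=100, k=1):
    """Fraction of held-out actives ranked within the top n_select database compounds.

    references: list of reference fingerprints (assumed non-empty)
    actives:    set of names of held-out active compounds contained in `database`
    database:   name -> fingerprint
    """
    ranked = sorted(database, key=lambda name: nn_score(database[name], references, k),
                    reverse=True)
    selected = set(ranked[:n_select])
    return len(selected & actives) / len(actives)
```

Under this definition, the reported "compound recovery rate of at least 50%" would correspond to `recovery_rate(...) >= 0.5` for a given fingerprint, search strategy, and selection set size.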