Dragos Horvath
University of Strasbourg
Publication
Featured research published by Dragos Horvath.
Current Computer-Aided Drug Design | 2008
Alexandre Varnek; Denis Fourches; Dragos Horvath; Olga Klimchuk; Cédric Gaudin; Philippe Vayer; Vitaly P. Solov'ev; Frank Hoonakker; Igor V. Tetko; Gilles Marcou
In this paper we illustrate the application of the ISIDA (In SIlico design and Data Analysis) software to perform virtual screening of large databases of compounds and reactions and to assess some ADME/Tox properties. ISIDA is an ensemble of tools that allow users to store, search and analyze data, to perform similarity searches in large databases of molecules and reactions, to build and validate QSAR models, and to generate and screen virtual combinatorial libraries. It uses its own descriptors (substructural molecular fragments and fuzzy pharmacophore triplets). Workflows can be easily organized by combining different ISIDA modules. Several examples of ISIDA applications (similarity search of potent benzodiazepine ligands with FPT, QSAR modeling of aqueous solubility, aquatic toxicity, tissue-air partition coefficients, anti-HIV activity, and screening of the “Chimiotheque Nationale” Database) are discussed. Particular attention is paid to mining reaction databases using the Condensed Reaction Graphs approach.
Environmental Health Perspectives | 2016
Kamel Mansouri; Ahmed Abdelaziz; Aleksandra Rybacka; Alessandra Roncaglioni; Alexander Tropsha; Alexandre Varnek; Alexey V. Zakharov; Andrew Worth; Ann M. Richard; Christopher M. Grulke; Daniela Trisciuzzi; Denis Fourches; Dragos Horvath; Emilio Benfenati; Eugene N. Muratov; Eva Bay Wedebye; Francesca Grisoni; Giuseppe Felice Mangiatordi; Giuseppina M. Incisivo; Huixiao Hong; Hui W. Ng; Igor V. Tetko; Ilya Balabin; Jayaram Kancherla; Jie Shen; Julien Burton; Marc C. Nicklaus; Matteo Cassotti; Nikolai Georgiev Nikolov; Orazio Nicolotti
Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and, thus, have the potential to be endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. Methods: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict ER activity of a common set of 32,464 chemical structures. Quantitative structure–activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. Results: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliabilities. Out of the 32,464 chemicals, the consensus model predicted 4,001 chemicals (12.3%) as high priority actives and 6,742 potential actives (20.8%) to be considered for further testing. Conclusion: This project demonstrated the possibility to screen large libraries of chemicals using a consensus of different in silico approaches. 
This concept will be applied in future projects related to other end points.
Citation: Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. 2016. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 124:1023–1033; http://dx.doi.org/10.1289/ehp.1510267
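The consensus step described in the CERAPP abstract (weighting individual models by scores based on their evaluated accuracies) can be sketched as below. This is a minimal illustration and not the CERAPP implementation: the function name `weighted_consensus`, the example calls and weights, and the 0.5 cut-off are all assumptions.

```python
from typing import Sequence

def weighted_consensus(predictions: Sequence[int], scores: Sequence[float]) -> float:
    """Return the score-weighted fraction of models calling a chemical active.

    predictions: 0/1 calls from individual models (hypothetical data).
    scores: per-model weights, e.g. evaluated accuracies (assumed positive).
    """
    if len(predictions) != len(scores) or not predictions:
        raise ValueError("need one score per prediction")
    total = sum(scores)
    return sum(p * s for p, s in zip(predictions, scores)) / total

# Three hypothetical models with scores inside the 0.69-0.85 range quoted above.
calls = [1, 0, 1]
weights = [0.85, 0.69, 0.75]
consensus = weighted_consensus(calls, weights)
is_active = consensus >= 0.5  # illustrative cut-off, not taken from the paper
```

A weighted vote of this kind lets a reliable model outvote a weaker one without discarding the weaker model entirely.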
Molecular Informatics | 2010
Fiorella Ruggiu; Gilles Marcou; Alexandre Varnek; Dragos Horvath
ISIDA Property‐Labelled Fragment Descriptors (IPLF) were introduced as a general framework to numerically encode molecular structures in chemoinformatics, as counts of specific subgraphs in which atom vertices are coloured with respect to some local property/feature. Combining various colouring strategies of the molecular graph – notably pH‐dependent pharmacophore and electrostatic potential‐based flagging – with several fragmentation schemes, the different subtypes of IPLFs may range from classical atom pair and sequence counts, to monitoring population levels of branched fragments or feature multiplets. The pH‐dependent feature flagging, pursued at the level of each significantly populated microspecies involved in the protolytic equilibrium, may furthermore add some competitive advantage over classical descriptors, even when the chosen fragmentation scheme is one of the state‐of‐the‐art pattern extraction procedures (feature sequence or pair counts, etc.) in chemoinformatics. The implemented fragmentation schemes support counting (1) linear feature sequences, (2) feature pairs, (3) circular feature fragments, a.k.a. “augmented atoms”, or (4) feature trees. Fuzzy rendering – optionally allowing nonterminal fragment atoms to be counted as wildcards, ignoring their specific colours/features – ensures a seamless transition between the “strict” counts (sequences or circular fragments) and the “fuzzy” multiplet counts (pairs or trees). Also, bond information may be represented or ignored, thus leaving the user a vast choice in terms of the level of resolution at which chemical information should be extracted into the descriptors. Selected IPLF subsets – tree descriptors, in particular – were successfully tested in both neighbourhood behaviour and QSAR modelling challenges, with very promising results. 
They showed excellent results in similarity‐based virtual screening for analogue protease inhibitors, and generated highly predictive octanol‐water partition coefficient and hERG channel inhibition models.
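To make the idea of counting property-coloured fragments concrete, here is a minimal sketch of the simplest scheme named in the abstract, feature pair counts. It is not the ISIDA code: the function names, the toy four-atom graph and the feature labels are assumptions chosen only for illustration.

```python
from collections import Counter, deque

def topological_distances(adjacency, start):
    """Breadth-first search: bond-count distance from start to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist

def coloured_pair_counts(adjacency, colours, max_dist=3):
    """Count (colour, colour, distance) pairs, visiting each unordered atom pair once."""
    counts = Counter()
    for i in adjacency:
        for j, d in topological_distances(adjacency, i).items():
            if i < j and d <= max_dist:
                a, b = sorted((colours[i], colours[j]))
                counts[(a, b, d)] += 1
    return counts

# Toy chain of four heavy atoms, 0-1-2-3, with arbitrary illustrative colours.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
colours = {0: "cation", 1: "hydrophobe", 2: "hydrophobe", 3: "donor"}
descriptor = coloured_pair_counts(adjacency, colours)
```

Changing the `colours` mapping (for example, per protonation state) while keeping the same graph changes the descriptor, which is the sense in which the colouring strategy and the fragmentation scheme are independent choices.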
Molecular Informatics | 2012
Natalia Kireeva; I. I. Baskin; Héléna A. Gaspar; Dragos Horvath; Gilles Marcou; Alexander Varnek
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure‐activity modeling and database comparison is evaluated using subsets of the Directory of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon Mapping or Self‐Organizing Maps, GTMs have the great advantage of providing data probability distribution functions (PDFs), both in the high‐dimensional space defined by molecular descriptors and in the 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries.
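The overlap measure mentioned above can be illustrated with the discrete Bhattacharyya coefficient. This sketch is a simplification: it compares probability masses over map nodes rather than the Gaussian mixtures used in the paper, and the two example libraries are hypothetical.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions: 1.0 if identical, 0.0 if disjoint."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Hypothetical probability mass of two libraries over four map nodes.
library_a = [0.5, 0.5, 0.0, 0.0]
library_b = [0.0, 0.5, 0.5, 0.0]
overlap = bhattacharyya_coefficient(library_a, library_b)
```

Here the two libraries share only the second node, so the coefficient falls between the disjoint and identical extremes.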
Future Generation Computer Systems | 2007
Alexandru-Adrian Tantar; Nouredine Melab; El-Ghazali Talbi; Benjamin Parent; Dragos Horvath
Solving the structure prediction problem for complex proteins is difficult and computationally expensive. In this paper, we propose a bicriterion parallel hybrid genetic algorithm (GA) in order to efficiently deal with the problem using the computational grid. The use of a near-optimal metaheuristic, such as a GA, allows a significant reduction in the number of explored potential structures. However, the complexity of the problem remains prohibitive as far as large proteins are concerned, making the use of parallel computing on the computational grid essential for its efficient resolution. A conjugate gradient-based Hill Climbing local search is combined with the GA in order to intensify the search in the neighborhood of its provided configurations. In this paper we consider two molecular complexes: the tryptophan-cage protein (Brookhaven Protein Data Bank ID 1L2Y) and α-cyclodextrin. The experimentation results obtained on a computational grid show the effectiveness of the approach.
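The combination of a GA with a local search can be sketched on a toy problem. This is only an illustration of the hybrid idea under strong assumptions: it is sequential and single-objective (the paper's algorithm is parallel and bicriterion), the energy function is a stand-in, and a simple coordinate-step hill climb replaces the conjugate gradient-based search. All names and parameters are hypothetical.

```python
import random

def energy(x):
    """Toy stand-in for a conformational energy: minimum 0 at the origin."""
    return sum(v * v for v in x)

def hill_climb(x, step=0.1, iters=50):
    """Greedy local search: accept a coordinate move only if energy decreases."""
    best = list(x)
    for _ in range(iters):
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if energy(trial) < energy(best):
                    best = trial
    return best

def hybrid_ga(dim=3, pop_size=20, generations=30, seed=0):
    """GA with truncation selection; every child is refined by hill climbing."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=energy)
        parents = pop[: pop_size // 2]            # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)            # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(dim)] += rng.gauss(0, 0.5)  # mutation
            children.append(hill_climb(child))     # local intensification
        pop = parents + children
    return min(pop, key=energy)

best = hybrid_ga()
```

The GA supplies diverse starting points while the local search refines each one, which is the division of labour the abstract describes.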
Journal of Chemical Information and Modeling | 2013
Héléna A. Gaspar; Gilles Marcou; Dragos Horvath; Alban Arault; Sylvain Lozano; Philippe Vayer; Alexandre Varnek
Earlier (Kireeva et al. Mol. Inf. 2012, 31, 301-312), we demonstrated that generative topographic mapping (GTM) can be efficiently used both for data visualization and for building classification models in the initial D-dimensional space of molecular descriptors. Here, we describe the modeling in two-dimensional latent space for the four classes of the BioPharmaceutics Drug Disposition Classification System (BDDCS) involving VolSurf descriptors. Three new definitions of the applicability domain (AD) of models have been suggested: one class-independent AD, which considers the GTM likelihood, and two class-dependent ADs, which consider, respectively, the predominant class in a given node of the map and the informational entropy. The class entropy AD was found to be the most efficient for the BDDCS modeling. The predominant class AD can be directly visualized on GTM maps, which helps the interpretation of the model.
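A class-entropy applicability domain of the kind described above can be sketched as follows. This is a hedged illustration rather than the published procedure: the function names, the normalisation by log(number of classes) and the 0.8 entropy threshold are assumptions.

```python
import math

def class_posteriors(likelihoods, priors):
    """Bayes' rule at one map node: P(class | node) from P(node | class) and P(class)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]

def normalized_entropy(probs):
    """Shannon entropy divided by log(n_classes); 0 = pure node, 1 = uniform."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def predict_with_ad(likelihoods, priors, max_entropy=0.8):
    """Return the predominant class index, or None if the node is outside the AD."""
    post = class_posteriors(likelihoods, priors)
    if normalized_entropy(post) > max_entropy:
        return None
    return max(range(len(post)), key=post.__getitem__)

# Four classes (as in BDDCS) with equal, hypothetical priors.
priors = [0.25, 0.25, 0.25, 0.25]
confident = predict_with_ad([0.90, 0.05, 0.03, 0.02], priors)
ambiguous = predict_with_ad([0.25, 0.25, 0.25, 0.25], priors)
```

A node dominated by one class yields a prediction, whereas a node where all classes are equally likely is rejected as outside the domain.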
Journal of Chemical Information and Modeling | 2015
Héléna A. Gaspar; I. I. Baskin; Gilles Marcou; Dragos Horvath; Alexandre Varnek
This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. Data set similarity in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
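The comparison of data sets through cumulated responsibility vectors can be sketched as below, here with the continuous Tanimoto coefficient as the metric. This is an assumption-laden illustration, not the ISIDA-GTM code: the function names and the tiny responsibility matrices are hypothetical.

```python
def cumulated_responsibilities(responsibilities):
    """Sum per-compound responsibility rows into one normalised vector over nodes."""
    n_nodes = len(responsibilities[0])
    totals = [sum(row[k] for row in responsibilities) for k in range(n_nodes)]
    grand = sum(totals)
    return [t / grand for t in totals]

def tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) + sum(y * y for y in b) - dot)

# Hypothetical responsibilities of compounds (rows) over three map nodes (columns).
supplier_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
supplier_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vec_a = cumulated_responsibilities(supplier_a)
vec_b = cumulated_responsibilities(supplier_b)
similarity = tanimoto(vec_a, vec_b)
```

Because each data set is reduced to one fixed-length vector, libraries of very different sizes can be compared directly.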
Biochemistry | 2010
Isabelle Landrieu; Xavier Hanoulle; Fanny Bonachera; Arnaud Hamel; Nathalie Sibille; Yanxia Yin; Jean-Michel Wieruszeski; Dragos Horvath; Qun Wei; Grégoire Vuagniaux; Guy Lippens
Debio 025 is a cyclosporin A (CsA) analogue that interferes strongly with the hepatitis C viral life cycle. Compared to CsA, Debio 025 has an additional methyl group at position 3 of the cyclic undecapeptide and an N-ethylvaline instead of an N-methylleucine at position 4. Unlike CsA, Debio 025 lacks immunosuppressive activity in vitro and in vivo. We show here that, in vitro, the cyclophilin A (CypA)-Debio 025 complex can no longer interact with calcineurin (CaN), a determinant for the immunosuppressive activity of CsA. We further use NMR spectroscopy to investigate at the molecular level the interaction of Debio 025 with CypA and thereby understand the basis for this loss of CaN interaction. NMR data and molecular modeling indicate that Debio 025 optimally interacts with CypA, which underlies the anti-HCV properties of Debio 025. However, the interaction between CaN and the CypA-Debio 025 complex is impeded by steric hindrance between CaN and the side chain of the Val4 residue of Debio 025. This is in sharp contrast with the case for the CypA-CsA-CaN ternary complex, where the Leu4 side chain can enter a hydrophobic cavity at the CaN interface. The structure of the CypA-Debio 025 complex thus provides a rational explanation for the non-immunosuppressive character of Debio 025.
Journal of Chemical Information and Modeling | 2008
Fanny Bonachera; Dragos Horvath
Topological fuzzy pharmacophore triplets (2D-FPT), using the number of interposed bonds to measure separation between the atoms representing pharmacophore types, were employed to establish and validate quantitative structure-activity relationships (QSAR). Thirteen data sets for which state-of-the-art QSAR models were reported in the literature were revisited in order to benchmark 2D-FPT biological activity-explaining propensities. Linear and nonlinear QSAR models were constructed for each compound series (following the original authors' splitting into training/validation subsets) with three different 2D-FPT versions, using the genetic algorithm-driven Stochastic QSAR sampler (SQS) to pick relevant triplets and fit their coefficients. 2D-FPT QSARs are computationally cheap, interpretable, and perform well in benchmarking. In a majority of cases (10/13), default 2D-FPT models validated better than or as well as the best among those reported, including 3D overlay-dependent approaches. Most of the analogue series, either unaffected by protonation equilibria or unambiguously adopting expected protonation states, were equally well described by rule- or pKa-based pharmacophore flagging. Thermolysin inhibitors represent a notable exception: pKa-based flagging boosts model quality, although--surprisingly--not due to protolytic equilibrium effects. The optimal degree of 2D-FPT fuzziness is compound set dependent. This work further confirmed the higher robustness of nonlinear over linear SQS models. In spite of the wealth of studied sets, benchmarking is nevertheless flawed by low intraset diversity: a whole series of resulting artifacts were evidenced, implicitly raising questions about the way QSAR studies are conducted nowadays. An in-depth investigation of thrombin inhibition models revealed that some of the selected triplets make sense (one of these stands for a topological pharmacophore covering the P1 and P2 binding pockets). 
Nevertheless, equations were either unable to predict the activity of the structurally different ligands or tended to indiscriminately predict any compound outside the training family to be active. 2D-FPT QSARs do not, however, depend on any common scaffold required for molecule superimposition and may in principle be trained on diverse sets, which is a must in order to obtain widely applicable models. Adding (assumed) inactives of various families for training enabled discovery of models that specifically recognize the structurally different actives.
Journal of Computer-aided Molecular Design | 2014
J.B. Brown; Yasushi Okuno; Gilles Marcou; Alexandre Varnek; Dragos Horvath
High-throughput assays challenge us to extract knowledge from multi-ligand, multi-target activity data. In QSAR, weights are statically fitted to each ligand descriptor with respect to a single endpoint or target. However, computational chemogenomics (CG) has demonstrated benefits of learning from entire grids of data at once, rather than building target-specific QSARs. A possible reason for this is the emergence of inductive knowledge transfer (IT) between targets, providing statistical robustness to the model, with no assumption about the structure of the targets. Relevant protein descriptors in CG should allow one to learn how to dynamically adjust ligand attribute weights with respect to protein structure. Hence, models built through explicit learning (EL) by including protein information, while benefitting from IT enhancement, should provide additional predictive capability, notably for protein deorphanization. This interplay between IT and EL in CG modeling is not sufficiently studied. While IT is likely to occur irrespective of the injected target information, it is not clear whether and when boosting due to EL may occur. EL is only possible if protein description is appropriate to the target set under investigation. The key issue here is the search for evidence of genuine EL exceeding expectations based on pure IT. We explore the problem in the context of Support Vector Regression, using more than 9,400