
Publication


Featured research published by Eugene N. Muratov.


Journal of Medicinal Chemistry | 2014

QSAR Modeling: Where have you been? Where are you going to?

Artem Cherkasov; Eugene N. Muratov; Denis Fourches; Alexandre Varnek; I. I. Baskin; Mark T. D. Cronin; John C. Dearden; Paola Gramatica; Yvonne C. Martin; Roberto Todeschini; Viviana Consonni; Victor E. Kuz’min; Richard D. Cramer; Romualdo Benigni; Chihae Yang; James F. Rathman; Lothar Terfloth; Johann Gasteiger; Ann M. Richard; Alexander Tropsha

Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will facilitate communication between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high-quality, validated QSAR models for regulatory decision making.


Journal of Chemical Information and Modeling | 2010

Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research

Denis Fourches; Eugene N. Muratov; Alexander Tropsha

Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers, who may inadvertently publish (partially) erroneous data. Thus, data set curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, and virtual screening, especially now that the availability of chemical data sets in the public domain has skyrocketed. Despite the obvious importance of this preliminary step in the computational analysis of any data set, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database, including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts, and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled a successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement in model prediction accuracy. We also demonstrate that, in some cases, rigorously developed QSAR models could even be used to correct erroneous biological data associated with chemical compounds.
We believe that the good practices for curation of chemical records outlined in this paper will be of value to all scientists working in the fields of molecular modeling, cheminformatics, and QSAR studies.
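The curation steps listed above can be sketched in miniature. The toy pipeline below works on raw SMILES strings with crude string heuristics; real curation would use a cheminformatics toolkit (e.g., RDKit) for structure validation, ring aromatization, and tautomer handling. All function names and the metal-token list are illustrative inventions, not the paper's implementation.

```python
# Toy sketch of a few curation steps (salt stripping, organometallic removal,
# duplicate deletion) applied to raw SMILES records. String heuristics only;
# this illustrates the flow of the workflow, not production-grade curation.

METAL_TOKENS = {"[Na", "[K", "[Fe", "[Zn", "[Pt", "[Cu", "[Mg", "[Ca"}

def strip_salts(smiles: str) -> str:
    """Keep the dot-separated fragment with the most atoms (crudely counted)."""
    return max(smiles.split("."), key=lambda f: sum(c.isalpha() for c in f))

def is_organometallic(smiles: str) -> bool:
    """Flag records containing common metal atoms (crude filter)."""
    return any(tok in smiles for tok in METAL_TOKENS)

def curate(records):
    seen, curated = set(), []
    for smiles in records:
        parent = strip_salts(smiles)          # remove counterions / salts
        if is_organometallic(parent):
            continue                          # remove organometallics
        if parent in seen:
            continue                          # delete duplicates
        seen.add(parent)
        curated.append(parent)
    return curated

raw = ["CCO", "CCO.[Na+].[Cl-]", "c1ccccc1", "CC[Fe]CC", "CCO"]
print(curate(raw))  # -> ['CCO', 'c1ccccc1']
```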


Chemical Research in Toxicology | 2011

Predicting drug-induced hepatotoxicity using QSAR and toxicogenomics approaches.

Yen Low; Takeki Uehara; Y. Minowa; H. Yamada; Y. Ohno; T. Urushidani; Alexander Sedykh; Eugene N. Muratov; Victor E. Kuz'min; Denis Fourches; Hao Zhu; Ivan Rusyn; Alexander Tropsha

Quantitative structure-activity relationship (QSAR) modeling and toxicogenomics are typically used independently as predictive tools in toxicology. In this study, we evaluated the power of several statistical models for predicting drug hepatotoxicity in rats using different descriptors of drug molecules, namely, their chemical descriptors and toxicogenomics profiles. The records were taken from the Toxicogenomics Project rat liver microarray database containing information on 127 drugs (http://toxico.nibio.go.jp/datalist.html). The model end point was hepatotoxicity in the rat following 28 days of continuous exposure, established by liver histopathology and serum chemistry. First, we developed multiple conventional QSAR classification models using a comprehensive set of chemical descriptors and several classification methods (k nearest neighbor, support vector machines, random forests, and distance weighted discrimination). With chemical descriptors alone, external predictivity (correct classification rate, CCR) from 5-fold external cross-validation was 61%. Next, the same classification methods were employed to build models using only toxicogenomics data (24 h after a single exposure) treated as biological descriptors. The optimized models used only 85 selected toxicogenomics descriptors and had CCR as high as 76%. Finally, hybrid models combining both chemical descriptors and transcripts were developed; their CCRs were between 68 and 77%. Although the accuracy of hybrid models did not exceed that of the models based on toxicogenomics data alone, the use of both chemical and biological descriptors enriched the interpretation of the models. In addition to finding 85 transcripts that were predictive and highly relevant to the mechanisms of drug-induced liver injury, chemical structural alerts for hepatotoxicity were identified.
These results suggest that concurrent exploration of the chemical features and acute treatment-induced changes in transcript levels will both enrich the mechanistic understanding of subchronic liver injury and afford models capable of accurate prediction of hepatotoxicity from chemical structure and short-term assay results.
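The external predictivity metric quoted above, the correct classification rate (CCR), is the average of per-class correct rates, which keeps an imbalanced data set (e.g., far more nontoxic than toxic drugs) from inflating the score. A minimal sketch:

```python
# Correct classification rate (CCR): the mean of per-class sensitivities.
# Unlike plain accuracy, it is not dominated by the majority class.
from collections import defaultdict

def ccr(y_true, y_pred):
    per_class = defaultdict(lambda: [0, 0])   # class -> [correct, total]
    for t, p in zip(y_true, y_pred):
        per_class[t][1] += 1
        if t == p:
            per_class[t][0] += 1
    rates = [correct / total for correct, total in per_class.values()]
    return sum(rates) / len(rates)

# 8 nontoxic drugs (6 classified right) and 2 toxic (1 right):
# plain accuracy is 0.70, but CCR is (0.75 + 0.50) / 2 = 0.625.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 6 + [1] * 2 + [1, 0]
print(ccr(y_true, y_pred))  # -> 0.625
```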


Journal of Chemical Information and Modeling | 2012

Does rational selection of training and test sets improve the outcome of QSAR modeling?

Todd M. Martin; Paul Harten; Douglas M. Young; Eugene N. Muratov; Alexander Golbraikh; Hao Zhu; Alexander Tropsha

Prior to using a quantitative structure-activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform statistical external validation, in which the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20%) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20%) using rational division methods and using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of the two types of models is comparable.
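One of the rational division methods named above, the Kennard-Stone algorithm, can be sketched as follows: seed the training set with the two most distant compounds in descriptor space, then repeatedly add the compound farthest from its nearest already-selected neighbor. This is a generic textbook rendition, not the authors' code; Euclidean distance on toy 2-D points stands in for real molecular descriptors.

```python
# Minimal Kennard-Stone selection for rational training/test splitting.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kennard_stone(points, n_train):
    # Seed with the pair of points at the largest mutual distance.
    i0, j0 = max(
        ((i, j) for i in range(len(points)) for j in range(i + 1, len(points))),
        key=lambda ij: dist(points[ij[0]], points[ij[1]]),
    )
    selected = [i0, j0]
    remaining = set(range(len(points))) - set(selected)
    while len(selected) < n_train:
        # Add the point with the largest distance to its nearest selected point.
        nxt = max(remaining,
                  key=lambda r: min(dist(points[r], points[s]) for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, sorted(remaining)   # training indices, test indices

pts = [(0, 0), (0.1, 0), (5, 5), (5, 5.1), (2.5, 2.4)]
train, test = kennard_stone(pts, 3)
print(train, test)  # -> [0, 3, 4] [1, 2]
```

The two near-duplicate pairs land on the same side of the split, so the test compounds are well covered by the training set, which is exactly the behavior rational division is meant to guarantee.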


Nature Biotechnology | 2016

Comprehensive characterization of the Published Kinase Inhibitor Set

J.M. Elkins; Vita Fedele; M. Szklarz; Kamal R. Abdul Azeez; E. Salah; Jowita Mikolajczyk; Sergei Romanov; Nikolai Sepetov; Xi-Ping Huang; Bryan L. Roth; Ayman Al Haj Zen; Denis Fourches; Eugene N. Muratov; Alex Tropsha; Joel Morris; Beverly A. Teicher; Mark Kunkel; Eric C. Polley; Karen E Lackey; Francis Atkinson; John P. Overington; Paul Bamborough; Susanne Müller; Daniel J. Price; Timothy M. Willson; David H. Drewry; Stefan Knapp; William J. Zuercher

Despite the success of protein kinase inhibitors as approved therapeutics, drug discovery has focused on a small subset of kinase targets. Here we provide a thorough characterization of the Published Kinase Inhibitor Set (PKIS), a set of 367 small-molecule ATP-competitive kinase inhibitors that was recently made freely available with the aim of expanding research in this field and as an experiment in open-source target validation. We screen the set in activity assays with 224 recombinant kinases and 24 G protein–coupled receptors and in cellular assays of cancer cell proliferation and angiogenesis. We identify chemical starting points for designing new chemical probes of orphan kinases and illustrate the utility of these leads by developing a selective inhibitor for the previously untargeted kinases LOK and SLK. Our cellular screens reveal compounds that modulate cancer cell growth and angiogenesis in vitro. These reagents and associated data illustrate an efficient way forward to increasing understanding of the historically untargeted kinome.


Chemistry of Materials | 2015

Materials Cartography: Representing and Mining Materials Space Using Structural and Electronic Fingerprints

Olexandr Isayev; Denis Fourches; Eugene N. Muratov; Corey Oses; Kevin Rasch; Alexander Tropsha; Stefano Curtarolo

As the proliferation of high-throughput approaches in materials science increases the wealth of data in the field, the gap between accumulated information and derived knowledge widens. We address the issue of scientific discovery in materials databases by introducing novel analytical approaches based on structural and electronic materials fingerprints. The framework is employed to (i) query large databases of materials using similarity concepts, (ii) map the connectivity of materials space (i.e., as materials cartograms) for rapidly identifying regions with unique organizations/properties, and (iii) develop predictive Quantitative Materials Structure-Property Relationship models for guiding materials design. In this study, we test these fingerprints by seeking target material properties. As a quantitative example, we model the critical temperatures of known superconductors. Our novel materials fingerprinting and materials cartography approaches contribute to the emerging field of materials informatics.
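The similarity-query idea in (i) can be illustrated in miniature: represent each material as a fixed-length fingerprint vector and rank a database by similarity to a query vector. The fingerprints below are invented four-component vectors; the paper derives its fingerprints from structural and electronic (e.g., band-structure) descriptors, and cosine similarity is just one reasonable choice of metric.

```python
# Ranking a (toy) materials database by cosine similarity to a query
# fingerprint. Vectors and material names are made up for illustration.
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den

database = {
    "mat_A": [1.0, 0.0, 0.5, 0.2],
    "mat_B": [0.9, 0.1, 0.6, 0.1],
    "mat_C": [0.0, 1.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.4, 0.2]

# Most similar material first.
ranked = sorted(database, key=lambda m: cosine(query, database[m]), reverse=True)
print(ranked)  # -> ['mat_A', 'mat_B', 'mat_C']
```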


Environmental Health Perspectives | 2016

CERAPP: Collaborative Estrogen Receptor Activity Prediction Project

Kamel Mansouri; Ahmed Abdelaziz; Aleksandra Rybacka; Alessandra Roncaglioni; Alexander Tropsha; Alexandre Varnek; Alexey V. Zakharov; Andrew Worth; Ann M. Richard; Christopher M. Grulke; Daniela Trisciuzzi; Denis Fourches; Dragos Horvath; Emilio Benfenati; Eugene N. Muratov; Eva Bay Wedebye; Francesca Grisoni; Giuseppe Felice Mangiatordi; Giuseppina M. Incisivo; Huixiao Hong; Hui W. Ng; Igor V. Tetko; Ilya Balabin; Jayaram Kancherla; Jie Shen; Julien Burton; Marc C. Nicklaus; Matteo Cassotti; Nikolai Georgiev Nikolov; Orazio Nicolotti

Background: Humans are exposed to thousands of man-made chemicals in the environment. Some chemicals mimic natural endocrine hormones and thus have the potential to act as endocrine disruptors. Most of these chemicals have never been tested for their ability to interact with the estrogen receptor (ER). Risk assessors need tools to prioritize chemicals for evaluation in costly in vivo tests, for instance, within the U.S. EPA Endocrine Disruptor Screening Program. Objectives: We describe a large-scale modeling project called CERAPP (Collaborative Estrogen Receptor Activity Prediction Project) and demonstrate the efficacy of using predictive computational models trained on high-throughput screening data to evaluate thousands of chemicals for ER-related activity and prioritize them for further testing. Methods: CERAPP combined multiple models developed in collaboration with 17 groups in the United States and Europe to predict the ER activity of a common set of 32,464 chemical structures. Quantitative structure-activity relationship models and docking approaches were employed, mostly using a common training set of 1,677 chemical structures provided by the U.S. EPA, to build a total of 40 categorical and 8 continuous models for binding, agonist, and antagonist ER activity. All predictions were evaluated on a set of 7,522 chemicals curated from the literature. To overcome the limitations of single models, a consensus was built by weighting models on scores based on their evaluated accuracies. Results: Individual model scores ranged from 0.69 to 0.85, showing high prediction reliability. Of the 32,464 chemicals, the consensus model predicted 4,001 (12.3%) as high-priority actives and 6,742 (20.8%) as potential actives to be considered for further testing. Conclusion: This project demonstrated the feasibility of screening large libraries of chemicals using a consensus of different in silico approaches.
This concept will be applied in future projects related to other end points. Citation: Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, Zakharov A, Worth A, Richard AM, Grulke CM, Trisciuzzi D, Fourches D, Horvath D, Benfenati E, Muratov E, Wedebye EB, Grisoni F, Mangiatordi GF, Incisivo GM, Hong H, Ng HW, Tetko IV, Balabin I, Kancherla J, Shen J, Burton J, Nicklaus M, Cassotti M, Nikolov NG, Nicolotti O, Andersson PL, Zang Q, Politi R, Beger RD, Todeschini R, Huang R, Farag S, Rosenberg SA, Slavov S, Hu X, Judson RS. 2016. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect 124:1023–1033; http://dx.doi.org/10.1289/ehp.1510267
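The accuracy-weighted consensus described in the Methods can be sketched as follows: each model casts a vote for a class, and votes are weighted by the model's evaluated accuracy score. The model names and scores below are invented for illustration; the actual CERAPP weighting scheme is more elaborate than this minimal version.

```python
# Accuracy-weighted consensus of categorical model predictions.
from collections import defaultdict

def consensus(predictions, accuracies):
    """predictions: {model: class_label}; accuracies: {model: score in [0, 1]}."""
    weight = defaultdict(float)
    for model, label in predictions.items():
        weight[label] += accuracies[model]   # vote weighted by model accuracy
    return max(weight, key=weight.get)

preds = {"m1": "active", "m2": "inactive", "m3": "active", "m4": "inactive"}
scores = {"m1": 0.85, "m2": 0.70, "m3": 0.80, "m4": 0.69}
# The vote is tied 2-2, but the two more accurate models favor "active"
# (0.85 + 0.80 = 1.65 vs. 0.70 + 0.69 = 1.39).
print(consensus(preds, scores))  # -> active
```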


Nature Chemical Biology | 2015

Curation of chemogenomics data.

Denis Fourches; Eugene N. Muratov; Alexander Tropsha

To the Editor: With the rapid accumulation of data in all areas of chemical biology research, scientists rely increasingly on historical chemogenomics data and computational models to guide small-molecule bioactivity screens and chemical probe development. However, there is a growing public concern about the frequent irreproducibility of experimental data reported in peer-reviewed scientific publications1,2. An editorial in this journal3 emphasized a critical need to address this problem, an issue that has also received attention from the US National Institutes of Health (NIH) leadership4. Since successful development of chemical probes and robust screening assays (one central objective of chemical biology) relies on the prior art in the field, it is critical that researchers establish the highest possible quality standards for data deposited in chemogenomics databases. Concerning the impact of poor data in chemogenomics databases, we5 and others6 have shown that inaccurate and inconsistent representations of chemical structures in available molecular data sets result in models of poor accuracy, whereas data curation improves the modeling outcome. Researchers relying on non-curated historical data risk corrupting their results owing to the following ‘five I’s’: data may be incomplete, inaccurate, imprecise, incompatible, and/or irreproducible. These considerations emphasize the need for thorough curation as the first critical step of any data analysis study to ensure the stability and reliability of the models and to guide experimental follow-up5.

As one means of addressing the data quality problem, we propose a general chemical and biological data curation workflow (Fig. 1) that relies on existing cheminformatics approaches to flag, and in some cases correct, possibly erroneous entries in large chemogenomics data sets. This workflow begins with chemical data curation following a previously established protocol5 (step 1 in Fig. 1), resulting in the identification and correction of structural errors. Duplicate analysis (step 2) assesses data quality and removes duplicate chemical structures and contradictory records. Analysis of intra- and interlaboratory experimental variability (step 3) and exclusion of unreliable data sources (step 4) help increase data quality and aid decision-making about the combination of data from different sources. Detection and verification of activity ‘cliffs’ (step 5) and calculation and tuning of the data set modelability index7 (step 6), which estimates the feasibility of obtaining predictive quantitative structure-activity relationship (QSAR) models for a given data set, serve as additional indicators of data quality. Consensus QSAR modeling (step 7), used for the identification and correction of potentially erroneous values or categories of compound bioactivities (step 8), concludes the workflow.

As a community, we must take multifaceted approaches to ensure the quality and reproducibility of chemogenomics data through better data generation and reporting. The Nature family of journals8 has taken steps in this direction by removing space restrictions for methods sections and having external statisticians verify the correctness of statistical tests reported in some manuscripts considered for publication. The NIH is also developing plans to stimulate researchers to enhance the reproducibility of their research results (http://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-103.html). It is also crucial for journals to support and encourage the use of standardized electronic protocols and formats (such as MIABE9) for chemical data sharing and to require authors to upload their data electronically to public repositories at the time of manuscript submission. Among other measures, the chemical biology community should adopt a culture of curation as a mandatory component of primary data processing and a prerequisite for data sharing. Chemical and biological data curation workflows can be developed further and utilized to flag (and where possible, fix) problematic records and ultimately improve the quality of data analysis and the prediction performance of modeling approaches. Experimental and computational scientists should convene to agree upon standards and best practices for the generation, reporting, and curation of chemogenomics data, which will improve data reproducibility and accelerate the progression from data to knowledge in chemical biology research.
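Step 2 of the proposed workflow (duplicate analysis) can be sketched in miniature: group records by structure, keep concordant duplicates once, and flag structures whose replicate measurements disagree. The record layout below is a hypothetical simplification, not the authors' implementation.

```python
# Duplicate analysis on (structure_id, activity_label) records: concordant
# duplicates are collapsed to a single entry; contradictory ones are flagged
# for manual review rather than silently kept or dropped.
from collections import defaultdict

def duplicate_analysis(records):
    by_structure = defaultdict(set)
    for structure, label in records:
        by_structure[structure].add(label)
    kept, flagged = {}, []
    for structure, labels in by_structure.items():
        if len(labels) == 1:
            kept[structure] = next(iter(labels))   # concordant: keep once
        else:
            flagged.append(structure)              # contradictory: flag
    return kept, flagged

records = [("C1", "active"), ("C1", "active"),     # concordant duplicate
           ("C2", "active"), ("C2", "inactive"),   # contradictory records
           ("C3", "inactive")]
kept, flagged = duplicate_analysis(records)
print(kept, flagged)  # -> {'C1': 'active', 'C3': 'inactive'} ['C2']
```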


Journal of Chemical Information and Modeling | 2016

Trust, but Verify II: A Practical Guide to Chemogenomics Data Curation

Denis Fourches; Eugene N. Muratov; Alexander Tropsha

There is a growing public concern about the lack of reproducibility of experimental data published in peer-reviewed scientific literature. Herein, we review the most recent alerts regarding experimental data quality and discuss initiatives taken thus far to address this problem, especially in the area of chemical genomics. Going beyond just acknowledging the issue, we propose a chemical and biological data curation workflow that relies on existing cheminformatics approaches to flag, and when appropriate, correct possibly erroneous entries in large chemogenomics data sets. We posit that the adherence to the best practices for data curation is important for both experimental scientists who generate primary data and deposit them in chemical genomics databases and computational researchers who rely on these data for model development.


Journal of Chemical Information and Modeling | 2014

Data set modelability by QSAR

Alexander Golbraikh; Eugene N. Muratov; Denis Fourches; Alexander Tropsha

We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (correct classification rate above 0.7) for a binary data set of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of nearest-neighbor pairs of compounds with the same activity class versus the total number of pairs. The MODI values were calculated for more than 100 data sets, and the threshold of 0.65 was found to separate the nonmodelable and modelable data sets.
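The MODI definition above translates almost directly into code: for each activity class, compute the fraction of its compounds whose nearest neighbor (in descriptor space) shares the class, then average over the classes. A minimal sketch on toy 2-D points, assuming Euclidean distance:

```python
# MODelability Index (MODI): class-weighted fraction of compounds whose
# nearest neighbor belongs to the same activity class.
import math

def modi(points, labels):
    classes = sorted(set(labels))
    per_class = []
    for c in classes:
        idx = [i for i, lab in enumerate(labels) if lab == c]
        same = 0
        for i in idx:
            # Nearest neighbor of compound i among all other compounds.
            nn = min((j for j in range(len(points)) if j != i),
                     key=lambda j: math.dist(points[i], points[j]))
            same += labels[nn] == c
        per_class.append(same / len(idx))
    return sum(per_class) / len(classes)

# Two well-separated classes: every nearest neighbor shares its class, so
# MODI = 1.0, well above the 0.65 modelability threshold from the abstract.
pts = [(0, 0), (0.1, 0.1), (5, 5), (5.1, 5.1)]
print(modi(pts, [0, 0, 1, 1]))  # -> 1.0
```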

Collaboration


Dive into Eugene N. Muratov's collaboration network.

Top Co-Authors

Denis Fourches (North Carolina State University)
Anatoly G. Artemenko (National Academy of Sciences of Ukraine)
Carolina H. Andrade (Universidade Federal de Goiás)
Victor E. Kuz'min (National Academy of Sciences of Ukraine)
Vinicius M. Alves (University of North Carolina at Chapel Hill)
Rodolpho C. Braga (Universidade Federal de Goiás)
Stephen J. Capuzzi (University of North Carolina at Chapel Hill)
Bruno J. Neves (Universidade Federal de Goiás)