I. I. Baskin
Moscow State University
Publication
Featured research published by I. I. Baskin.
Journal of Medicinal Chemistry | 2014
Artem Cherkasov; Eugene N. Muratov; Denis Fourches; Alexandre Varnek; I. I. Baskin; Mark T. D. Cronin; John C. Dearden; Paola Gramatica; Yvonne C. Martin; Roberto Todeschini; Viviana Consonni; Victor E. Kuz’min; Richard D. Cramer; Romualdo Benigni; Chihae Yang; James F. Rathman; Lothar Terfloth; Johann Gasteiger; Ann M. Richard; Alexander Tropsha
Quantitative structure-activity relationship modeling is one of the major computational tools employed in medicinal chemistry. However, throughout its entire history it has drawn both praise and criticism concerning its reliability, limitations, successes, and failures. In this paper, we discuss (i) the development and evolution of QSAR; (ii) the current trends, unsolved problems, and pressing challenges; and (iii) several novel and emerging applications of QSAR modeling. Throughout this discussion, we provide guidelines for QSAR development, validation, and application, which are summarized in best practices for building rigorously validated and externally predictive QSAR models. We hope that this Perspective will help communications between computational and experimental chemists toward collaborative development and use of QSAR models. We also believe that the guidelines presented here will help journal editors and reviewers apply more stringent scientific standards to manuscripts reporting new QSAR studies, as well as encourage the use of high quality, validated QSARs for regulatory decision making.
Journal of Computer-aided Molecular Design | 2011
Iurii Sushko; Sergii Novotarskyi; Robert Körner; Anil Kumar Pandey; Matthias Rupp; Wolfram Teetz; Stefan Brandmaier; Ahmed Abdelaziz; Volodymyr V. Prokopenko; Vsevolod Yu. Tanchuk; Roberto Todeschini; Alexandre Varnek; Gilles Marcou; Peter Ertl; Vladimir Potemkin; Maria A. Grishina; Johann Gasteiger; Christof H. Schwab; I. I. Baskin; V. A. Palyulin; E. V. Radchenko; William J. Welsh; Vladyslav Kholodovych; Dmitriy Chekmarev; Artem Cherkasov; João Aires-de-Sousa; Qingyou Zhang; Andreas Bender; Florian Nigsch; Luc Patiny
The Online Chemical Modeling Environment (OCHEM) is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Unlike other similar systems, OCHEM is not intended to re-implement existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and become members of the growing research community. Our intention is to make OCHEM a widely used platform for performing QSPR/QSAR studies online and sharing them with other users on the Web. The ultimate goal of OCHEM is to collect all possible chemoinformatics tools within one simple, reliable and user-friendly resource. OCHEM is free for web users and is available online at http://www.ochem.eu.
Journal of Chemical Information and Modeling | 2007
Alexandre Varnek; Natalia Kireeva; Igor V. Tetko; I. I. Baskin; Vitaly P. Solov'ev
Several popular machine learning methods, namely Associative Neural Networks (ASNN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), a modified version of partial least-squares analysis (PLSM), backpropagation neural networks (BPNN), and Multiple Linear Regression Analysis (MLR), implemented in the ISIDA, NASAWIN, and VCCLAB software, have been used to perform QSPR modeling of the melting points of a structurally diverse data set of 717 bromides of nitrogen-containing organic cations (FULL), including 126 pyridinium bromides (PYR), 384 imidazolium and benzoimidazolium bromides (IMZ), and 207 quaternary ammonium bromides (QUAT). Several types of descriptors were tested: E-state indices, counts of atoms determined for E-state atom types, molecular descriptors generated by the DRAGON program, and different types of substructural molecular fragments. The predictive ability of the models was analyzed using a 5-fold external cross-validation procedure in which every compound in the parent set was included in one of five test sets. Among the 16 types of developed structure-melting point models, the nonlinear SVM, ASNN, and BPNN techniques demonstrate slightly better performance than the other methods. For the full set, the accuracy of predictions does not change significantly as a function of the type of descriptors. For the other sets, the performance of descriptors varies as a function of the method and data set used. The root-mean-square error (RMSE) of prediction calculated on independent test sets is in the range of 37.5-46.4 °C (FULL), 26.2-34.8 °C (PYR), 38.8-45.9 °C (IMZ), and 34.2-49.3 °C (QUAT). The moderate accuracy of predictions can be attributed to the quality of the experimental data used to build the models, as well as to the difficulty of accounting for the structural features of ionic liquids in the solid state (polymorphic effects, eutectics, glass formation).
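The external cross-validation scheme described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' ISIDA, NASAWIN, or VCCLAB code: the function names, the random shuffling, the pooling of out-of-fold predictions into a single RMSE, and the mean-value baseline model are all assumptions made for the example, and the melting points are synthetic.

```python
import math
import random


def external_folds(n_items, n_folds=5, seed=0):
    """Return n_folds disjoint test sets that together cover every index once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def cross_validated_rmse(y, fit, n_folds=5, seed=0):
    """Pool out-of-fold predictions and score them with RMSE.

    `fit` takes (train_indices, y) and returns a function index -> prediction,
    so each compound is predicted by a model that never saw it.
    """
    folds = external_folds(len(y), n_folds, seed)
    predictions = [None] * len(y)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit(train, y)
        for i in test:
            predictions[i] = model(i)
    return rmse(y, predictions)


def fit_mean_baseline(train, y):
    """Hypothetical placeholder model: always predicts the training-set mean."""
    mean = sum(y[i] for i in train) / len(train)
    return lambda i: mean


if __name__ == "__main__":
    # Synthetic stand-in values, not data from the paper.
    melting_points = [float(50 + (7 * i) % 200) for i in range(717)]
    print(cross_validated_rmse(melting_points, fit_mean_baseline))
```

Any real descriptor-based model can be substituted for `fit_mean_baseline` as long as it is trained only on the indices it is given.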
Molecular Informatics | 2011
Alexandre Varnek; I. I. Baskin
Here, chemoinformatics is considered as a theoretical chemistry discipline complementary to quantum chemistry and force‐field molecular modeling. These three fields are compared with respect to molecular representation, inference mechanisms, basic concepts and application areas. A chemical space, a fundamental concept of chemoinformatics, is considered with respect to complex relations between chemical objects (graphs or descriptor vectors). Statistical Learning Theory, one of the main mathematical approaches in structure‐property modeling, is briefly reviewed. Links between chemoinformatics and its “sister” fields (machine learning, chemometrics and bioinformatics) are discussed.
Molecular Informatics | 2012
Natalia Kireeva; I. I. Baskin; Héléna A. Gaspar; Dragos Horvath; Gilles Marcou; Alexandre Varnek
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure‐activity modeling and database comparison is evaluated using subsets of the Directory of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon Mapping or Self‐Organizing Maps, the great advantage of GTM is that it provides data probability distribution functions (PDFs), both in the high‐dimensional space defined by molecular descriptors and in the 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries.
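The Bayesian use of class-specific PDFs mentioned in this abstract comes down to Bayes' rule. The sketch below is a generic illustration rather than the paper's implementation: it assumes the class-conditional densities at a given point have already been evaluated (for example, from a fitted GTM), and the function names, class labels, and numbers are hypothetical.

```python
def class_posteriors(class_densities, priors):
    """Apply Bayes' rule: P(class | x) is proportional to p(x | class) * P(class).

    class_densities maps each class label to its density value at the query
    point; priors maps each label to its prior probability.
    """
    joint = {c: class_densities[c] * priors[c] for c in class_densities}
    evidence = sum(joint.values())
    if evidence == 0.0:
        raise ValueError("all class densities are zero at this point")
    return {c: value / evidence for c, value in joint.items()}


def classify(class_densities, priors):
    """Return the label with the highest posterior probability."""
    posteriors = class_posteriors(class_densities, priors)
    return max(posteriors, key=posteriors.get)


if __name__ == "__main__":
    # Illustrative density values only, not results from the paper.
    densities = {"active": 0.3, "decoy": 0.1}
    print(classify(densities, {"active": 0.5, "decoy": 0.5}))
    print(classify(densities, {"active": 0.1, "decoy": 0.9}))
```

The second call shows why priors matter for imbalanced sets: a higher density for one class can be outweighed by a strongly skewed prior.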
Molecular Informatics | 2010
I. I. Baskin; Natalia Kireeva; Alexandre Varnek
In this paper, we associate the applicability domain (AD) of QSAR/QSPR models with the area in the input (descriptor) space in which the density of training data points exceeds a certain threshold. It can be shown that the predictive performance of models built on the training set is higher for test compounds inside the high‐density area than for those outside it. Instead of searching for a decision surface separating high and low density areas in the input space, the one‐class classification 1‐SVM approach looks for a hyperplane in the associated feature space. Unlike other AD definitions reported in the literature, this approach: (i) is purely “data‐based”, i.e. it assigns the same AD to all models built on the same training set, (ii) provides results that depend only on the initial descriptor pool generated for the training set, (iii) can be used with a huge number of descriptors, as well as in the framework of structured kernel‐based approaches, e.g., chemical graph kernels. The developed approach has been applied to improve the performance of QSPR models for stability constants of the complexes of organic ligands with alkaline‐earth metals in water.
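The density-threshold idea behind this applicability domain can be illustrated with a simple Gaussian kernel density score. Note that this is a stand-in chosen for brevity: the paper uses a one-class SVM (1-SVM), not the kernel average below, and the function names, bandwidth, threshold, and toy descriptor vectors are all assumptions.

```python
import math


def density_score(x, training_points, bandwidth=1.0):
    """Average Gaussian kernel value between x and the training points.

    The constant normalization factor of a true density estimate is omitted
    because it does not change which points pass a fixed threshold.
    """
    total = 0.0
    for t in training_points:
        squared_distance = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return total / len(training_points)


def in_applicability_domain(x, training_points, threshold, bandwidth=1.0):
    """A compound is inside the AD if its density score reaches the threshold."""
    return density_score(x, training_points, bandwidth) >= threshold


if __name__ == "__main__":
    # Toy 2D descriptor vectors clustered around the origin.
    train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1)]
    print(in_applicability_domain((0.0, 0.0), train, threshold=0.5))
    print(in_applicability_domain((10.0, 10.0), train, threshold=0.5))
```

Because the score depends only on the training descriptors, every model built on the same training set gets the same domain, which mirrors property (i) in the abstract.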
Journal of Chemical Information and Modeling | 2009
Alexandre Varnek; Cédric Gaudin; Gilles Marcou; I. I. Baskin; Anil Kumar Pandey; Igor V. Tetko
Two inductive knowledge transfer approaches, multitask learning (MTL) and Feature Net (FN), have been used to build predictive neural network (ASNN) and PLS models for 11 types of tissue-air partition coefficients (TAPC). Unlike conventional single-task learning (STL) modeling, which focuses on a single target property without any relation to other properties, the inductive transfer framework views the individual models as nodes in a network of interrelated models built in parallel (MTL) or sequentially (FN). It has been demonstrated that the MTL and FN techniques are extremely useful in structure-property modeling on small and structurally diverse data sets, when conventional STL modeling is unable to produce any predictive model. Predictive individual STL models were obtained for 4 out of 11 TAPC, whereas application of inductive knowledge transfer techniques resulted in models for 9 TAPC. Differences in the prediction performance of the models as a function of the machine-learning method and of the number of properties simultaneously involved in the learning are discussed.
Expert Opinion on Drug Discovery | 2016
I. I. Baskin; David A. Winkler; Igor V. Tetko
Introduction: Neural networks are becoming a very popular method for solving machine learning and artificial intelligence problems. The variety of neural network types and their application to drug discovery requires expert knowledge to choose the most appropriate approach. Areas covered: In this review, the authors discuss traditional and newly emerging neural network approaches to drug discovery. Their focus is on backpropagation neural networks and their variants, self-organizing maps and associated methods, and a relatively new technique, deep learning. The most important technical issues are discussed, including overfitting and its prevention through regularization, ensemble and multitask modeling, model interpretation, and estimation of applicability domain. Different aspects of using neural networks in drug discovery are considered: building structure-activity models with respect to various targets; predicting drug selectivity, toxicity profiles, ADMET and physicochemical properties; characteristics of drug-delivery systems and virtual screening. Expert opinion: Neural networks continue to grow in importance for drug discovery. Recent developments in deep learning suggest further improvements may be gained in the analysis of large chemical data sets. It is anticipated that neural networks will be more widely used in drug discovery in the future, and applied in non-traditional areas such as drug delivery systems, biologically compatible materials, and regenerative medicine.
Journal of Chemical Information and Modeling | 2015
Héléna A. Gaspar; I. I. Baskin; Gilles Marcou; Dragos Horvath; Alexandre Varnek
This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. Data set similarity in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
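Two of the quantities named in this abstract have compact textbook definitions, sketched below for discrete distributions over map nodes. This is a minimal illustration and not the ISIDA-GTM code: it assumes a cumulated responsibility vector has been normalized to sum to one, that the entropy is normalized by the logarithm of the number of nodes, and the function names and toy vectors are hypothetical.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def normalized_shannon_entropy(p):
    """Shannon entropy of p divided by log(K), where K is the number of nodes.

    The result is 0 when all mass sits on one node and 1 for a uniform spread.
    """
    k = len(p)
    if k < 2:
        return 0.0
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return entropy / math.log(k)


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions: sum over nodes of sqrt(p_i * q_i)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


if __name__ == "__main__":
    # Toy cumulated responsibilities on a 4-node map for two libraries.
    library_a = normalize([2.0, 2.0, 0.0, 0.0])
    library_b = normalize([1.0, 1.0, 1.0, 1.0])
    print(normalized_shannon_entropy(library_a))
    print(normalized_shannon_entropy(library_b))
    print(bhattacharyya_coefficient(library_a, library_b))
```

A library spread evenly over the map scores close to 1 on coverage, while two libraries occupying disjoint regions have a Bhattacharyya coefficient of 0.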
Annals of the New York Academy of Sciences | 2006
S. O. Bachurin; Sergey E. Tkachenko; I. I. Baskin; N. N. Lermontova; Tatyana Mukhina; Lyudmila Petrova; Anatoliy Ustinov; A. N. Proshin; V. V. Grigoriev; Nikolay Lukoyanov; V. A. Palyulin; Nikolay S. Zefirov
Neuroprotective and biobehavioral properties of a series of novel open chain MK‐801 analogs, as well as their structure‐activity relationships, have been investigated. Three groups of compounds were synthesized: monobenzylamino, benzhydrylamino, and dibenzylamino (DBA) analogs of MK‐801. It was revealed that DBA analogs exhibit pronounced glutamate‐induced calcium uptake blocking properties and anti‐NMDA activity. The hit compound of the DBA series, NT‐1505, was investigated for its ability to improve cognitive functions in an animal model of Alzheimer's disease type dementia, simulated by treating animals with the cholinotoxin AF64A. The results from an active avoidance test and a Morris water maze test showed that experimental animals, treated additionally with NT‐1505, exhibited much better learning ability and memory than the control group (AF64A treated) and close to that of the vehicle group of animals (treated with physiological solution). A study of the influence of NT‐1505 on locomotor activity revealed that it is characterized by a spectrum of behavioral activity radically different from that of MK‐801 and, in contrast to the latter, does not produce any psychotomimetic side effects in the therapeutically significant dose interval. The computed docking of MK‐801 and its flexible analogs on the NMDA receptor elucidated the crucial role of the hydrogen bond formed between these compounds and the asparagine residue for magnesium binding in the NMDA receptor. It was suggested that strong hydrophobic interaction between MK‐801 and the hydrophobic pocket in the NMDA receptor‐channel complex determines much higher irreversibility of this adduct compared to the intermediates formed between this site and Mg ions or flexible DBA derivatives, which might explain the absence of PCP‐like side effects of the latter compounds.