Publication
Featured research published by Anil Kumar Pandey.
Journal of Chemical Information and Modeling | 2008
Igor V. Tetko; Iurii Sushko; Anil Kumar Pandey; Hao Zhu; Alexander Tropsha; Ester Papa; Tomas Öberg; Roberto Todeschini; Denis Fourches; Alexandre Varnek
The estimation of the accuracy of predictions is a critical problem in QSAR modeling. The distance to model can be defined as a metric that defines the similarity between the training set molecules and the test set compound for the given property in the context of a specific model. It could be expressed in many different ways, e.g., using Tanimoto coefficient, leverage, correlation in space of models, etc. In this paper we have used mixtures of Gaussian distributions as well as statistical tests to evaluate six types of distances to models with respect to their ability to discriminate compounds with small and large prediction errors. The analysis was performed for twelve QSAR models of aqueous toxicity against T. pyriformis obtained with different machine-learning methods and various types of descriptors. The distances to model based on standard deviation of predicted toxicity calculated from the ensemble of models afforded the best results. This distance also successfully discriminated molecules with low and large prediction errors for a mechanism-based model developed using log P and the Maximum Acceptor Superdelocalizability descriptors. Thus, the distance to model metric could also be used to augment mechanistic QSAR models by estimating their prediction errors. Moreover, the accuracy of prediction is mainly determined by the training set data distribution in the chemistry and activity spaces but not by QSAR approaches used to develop the models. We have shown that incorrect validation of a model may result in the wrong estimation of its performance and suggested how this problem could be circumvented. The toxicity of 3182 and 48774 molecules from the EPA High Production Volume (HPV) Challenge Program and EINECS (European chemical Substances Information System), respectively, was predicted, and the accuracy of prediction was estimated. The developed models are available online at http://www.qspr.org site.
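To illustrate the ensemble-based distance to model described in this abstract, here is a minimal Python sketch that uses the standard deviation of an ensemble's predictions as the distance for each compound. The function names, the threshold, and the toy numbers are hypothetical and are not taken from the paper or from the qspr.org tools. The use of the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def ensemble_distance(predictions_per_model):
    """Return (mean prediction, std across models) for every compound.

    predictions_per_model holds one list per model; each inner list
    contains that model's predictions for the same ordered compounds.
    The population standard deviation is an assumption of this sketch.
    """
    per_compound = zip(*predictions_per_model)
    return [(mean(values), pstdev(values)) for values in per_compound]


def split_by_distance(distances, threshold):
    """Split compound indices into low-distance and high-distance groups."""
    inside = [i for i, d in enumerate(distances) if d <= threshold]
    outside = [i for i, d in enumerate(distances) if d > threshold]
    return inside, outside


# Hypothetical toy data: three models, three compounds.
toy_predictions = [
    [1.0, 2.0, 0.5],
    [1.1, 3.0, 0.5],
    [0.9, 1.0, 0.5],
]
summary = ensemble_distance(toy_predictions)
distances = [std for _, std in summary]
inside, outside = split_by_distance(distances, threshold=0.5)
print(inside, outside)
```

Compounds on which the models disagree strongly receive a large distance and, following the abstract, would be expected to have larger prediction errors.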
Journal of Computer-aided Molecular Design | 2011
Iurii Sushko; Sergii Novotarskyi; Robert Körner; Anil Kumar Pandey; Matthias Rupp; Wolfram Teetz; Stefan Brandmaier; Ahmed Abdelaziz; Volodymyr V. Prokopenko; Vsevolod Yu. Tanchuk; Roberto Todeschini; Alexandre Varnek; Gilles Marcou; Peter Ertl; Vladimir Potemkin; Maria A. Grishina; Johann Gasteiger; Christof H. Schwab; I. I. Baskin; V. A. Palyulin; E. V. Radchenko; William J. Welsh; Vladyslav Kholodovych; Dmitriy Chekmarev; Artem Cherkasov; João Aires-de-Sousa; Qingyou Zhang; Andreas Bender; Florian Nigsch; Luc Patiny
The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu.
Journal of Chemical Information and Modeling | 2010
Iurii Sushko; Sergii Novotarskyi; Robert Körner; Anil Kumar Pandey; Artem Cherkasov; Jiazhong Li; Paola Gramatica; Katja Hansen; Timon Schroeter; Klaus-Robert Müller; Lili Xi; Huanxiang Liu; Xiaojun Yao; Tomas Öberg; Farhad Hormozdiari; Phuong Dao; Cenk Sahinalp; Roberto Todeschini; Pavel G. Polishchuk; A. G. Artemenko; Victor E. Kuz’min; Todd M. Martin; Douglas M. Young; Denis Fourches; Eugene N. Muratov; Alexander Tropsha; I. I. Baskin; Dragos Horvath; Gilles Marcou; Christophe Muller
The estimation of accuracy and applicability of QSAR and QSPR models for biological and physicochemical properties represents a critical problem. The developed parameter of distance to model (DM) is defined as a metric of similarity between the training and test set compounds that have been subjected to QSAR/QSPR modeling. In our previous work, we demonstrated the utility and optimal performance of DM metrics that have been based on the standard deviation within an ensemble of QSAR models. The current study applies such analysis to 30 QSAR models for the Ames mutagenicity data set that were previously reported within the 2009 QSAR challenge. We demonstrate that the DMs based on an ensemble (consensus) model provide systematically better performance than other DMs. The presented approach identifies 30-60% of compounds having an accuracy of prediction similar to the interlaboratory accuracy of the Ames test, which is estimated to be 90%. Thus, the in silico predictions can be used to halve the cost of experimental measurements by providing a similar prediction accuracy. The developed model has been made publicly available at http://ochem.eu/models/1 .
Journal of Chemical Information and Modeling | 2011
Sergii Novotarskyi; Iurii Sushko; Robert Körner; Anil Kumar Pandey; Igor V. Tetko
Prediction of the CYP450 inhibition activity of small molecules is an important task because of the high risk of drug-drug interactions. CYP1A2 is an important member of the CYP450 superfamily and accounts for 15% of the total CYP450 presence in the human liver. This article compares 80 in-silico QSAR models that were created by following the same procedure with different combinations of descriptors and machine learning methods. The training and test sets consist of 3745 and 3741 inhibitors and noninhibitors from the PubChem BioAssay database. A heterogeneous external test set of 160 inhibitors was collected from the literature. The studied descriptor sets involve E-state, Dragon and ISIDA SMF descriptors. Machine learning methods involve Associative Neural Networks (ASNN), K Nearest Neighbors (kNN), Random Tree (RT), C4.5 Tree (J48), and Support Vector Machines (SVM). The influence of descriptor selection on model accuracy was studied. The benefits of the bagging modeling approach were shown. The applicability domain approach was successfully applied in this study, ways of increasing model accuracy through the use of applicability domain measures were demonstrated, and fragment-based model interpretation was performed. The most accurate models in this study achieved values of 83% and 68% correctly classified instances on the internal and external test sets, respectively. The applicability domain approach allowed increasing the prediction accuracy to 90% for 78% of the internal and 17% of the external test sets, respectively. The most accurate models are available online at http://ochem.eu/models/Q5747 .
Journal of Chemical Information and Modeling | 2009
Alexandre Varnek; Cédric Gaudin; Gilles Marcou; I. I. Baskin; Anil Kumar Pandey; Igor V. Tetko
Two inductive knowledge transfer approaches - multitask learning (MTL) and Feature Net (FN) - have been used to build predictive neural networks (ASNN) and PLS models for 11 types of tissue-air partition coefficients (TAPC). Unlike conventional single-task learning (STL) modeling, which focuses only on a single target property without any relation to other properties, in the framework of the inductive transfer approach the individual models are viewed as nodes in a network of interrelated models built in parallel (MTL) or sequentially (FN). It has been demonstrated that MTL and FN techniques are extremely useful in structure-property modeling on small and structurally diverse data sets, when conventional STL modeling is unable to produce any predictive model. Predictive individual STL models were obtained for 4 out of 11 TAPC, whereas application of inductive knowledge transfer techniques resulted in models for 9 TAPC. Differences in the prediction performance of the models as a function of the machine-learning method and of the number of properties simultaneously involved in the learning have been discussed.
Journal of Chemical Information and Modeling | 2008
Muthukumarasamy Karthikeyan; Subramanian Krishnan; Anil Kumar Pandey; Andreas Bender; Alexander Tropsha
We present the application of a Java remote method invocation (RMI) based open source architecture to distributed chemical computing. This architecture was previously employed for distributed data harvesting of chemical information from the Internet via the Google application programming interface (API; ChemXtreme). Due to its open source character and its flexibility, the underlying server/client framework can be quickly adapted to virtually every computational task that can be parallelized. Here, we present the server/client communication framework as well as an application to distributed computing of chemical properties on a large scale (currently the size of PubChem; about 18 million compounds), using both the Marvin toolkit and the open source JOELib package. As an application, for this set of compounds, the agreement of log P and TPSA between the packages was compared. Outliers were found to be mostly non-druglike compounds, and differences could usually be explained by differences in the underlying algorithms. ChemStar is the first open source distributed chemical computing environment built on Java RMI, which is also easily adaptable to user demands due to its plug-in architecture. The complete source code as well as calculated properties along with links to PubChem resources are available on the Internet via a graphical user interface at http://moltable.ncl.res.in/chemstar/.
Journal of Cheminformatics | 2011
Iurii Sushko; Anil Kumar Pandey; Sergii Novotarskyi; Robert Körner; Matthias Rupp; Wolfram Teetz; Stefan Brandmaier; Ahmed Abdelaziz; Volodymyr V. Prokopenko; Vsevolod Yu. Tanchuk; Roberto Todeschini; Alexandre Varnek; Gilles Marcou; Peter Ertl; V. A. Potemkin; Maria A. Grishina; Johann Gasteiger; I. I. Baskin; V. A. Palyulin; E. V. Radchenko; William J. Welsh; Vladyslav Kholodovych; Dmitriy Chekmarev; Artem Cherkasov; João Aires-de-Sousa; Qingyou Zhang; Andreas Bender; Florian Nigsch; Luc Patiny; Antony J. Williams
The Online Chemical Modeling Environment is a unique platform on the Web that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. The database is user-contributed and contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses on data quality and verification. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Our intention is to make OCHEM an ultimate platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The OCHEM is free for the web users and it is available online at http://ochem.eu. “Computing chemistry on the web” [1] is becoming a reality.
Journal of Cheminformatics | 2010
Sergii Novotarskyi; Iurii Sushko; Robert Körner; Anil Kumar Pandey; Igor V. Tetko
Cytochromes P450 (CYP450) are a superfamily of enzymes involved in the metabolism of a large number of xenobiotic compounds. CYP450 are involved in the degradation of a large number of drugs currently on the market. Their promiscuity with respect to substrates makes the CYP450 enzymes prone to inhibition by a large number of drugs, which gives rise to clinically significant drug-drug interactions. In this work, different machine learning methods were applied to classify the inhibitors and noninhibitors of human CYP450 1A2. The structures and the active/inactive classification concerning CYP1A2 inhibition were taken from the PubChem BioAssay database. This assay uses human CYP1A2 to measure the demethylation of luciferin 6 methyl ether (Luciferin-ME; Promega-Glo) to luciferin. The tested methods include k nearest neighbors (kNN), decision tree, random forest, support vector machine (SVM) and associative neural networks (ASNN). The descriptors used were those from the Dragon software, the fragment descriptors and the E-state indices. The training and test sets were handled separately to avoid different possibilities of overfitting, including overfitting by descriptor selection. Different applicability domain (AD) approaches were used to estimate the confidence of classification. As a result, the models correctly classified 80% of the test set instances. The accuracy of classification was found to be up to 95% if only the 30% most confident predictions were taken into account. The model was also applied to an external test set of 187 molecules, collected from the literature and measured using a different reference reaction. For this set, an accuracy of 78% was achieved on the 30% most confident predictions. All the developed models are fast enough to be used for virtual screening of CYP1A2 inhibitors and noninhibitors. The developed models are publicly available online at the http://qspr.eu web site.
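The trade-off between accuracy and coverage reported in this abstract (accuracy measured only on the most confident fraction of predictions) can be sketched as follows. This is an illustrative helper, not the authors' implementation: the function name, the confidence scores, and the toy labels are all hypothetical.

```python
def accuracy_at_coverage(confidences, correct, coverage):
    """Accuracy over the `coverage` fraction of most confident predictions.

    confidences: one score per prediction, higher means more confident.
    correct:     booleans, True where the prediction matched the label.
    coverage:    fraction in (0, 1] of predictions to keep.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Rank predictions from most to least confident.
    ranked = sorted(zip(confidences, correct), key=lambda pair: pair[0], reverse=True)
    n_kept = max(1, round(coverage * len(ranked)))
    kept = ranked[:n_kept]
    return sum(1 for _, ok in kept if ok) / n_kept


# Hypothetical toy data: ten predictions with decreasing confidence.
toy_confidences = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_correct = [True, True, True, False, True, False, True, False, True, False]

top_30 = accuracy_at_coverage(toy_confidences, toy_correct, 0.3)
overall = accuracy_at_coverage(toy_confidences, toy_correct, 1.0)
print(top_30, overall)
```

When the confidence score is informative, accuracy on the retained subset rises as the coverage shrinks, which is the pattern the abstract reports.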
Journal of Cheminformatics | 2010
Iurii Sushko; Sergii Novotarskyi; Anil Kumar Pandey; Robert Körner; Igor V. Tetko
Classification models are frequent in QSAR modeling. It is of crucial importance to provide a good accuracy estimate for classification. The applicability domain provides additional information to identify which compounds are classified with the best accuracy and which are expected to have poor and unreliable predictions. Selecting the most reliable predictions can dramatically improve the performance of methods while decreasing the coverage of predictions [1]. In binary classification problems, the labels for machine learning methods are discrete {-1, 1}. Nonetheless, a model usually yields a prediction that is continuous. The most apparent metric for accuracy estimation is the distance between the prediction point and the edge of a class, i.e., the larger the distance between the prediction and the edge of the class, the more reliable and accurate is the prediction for the given compound. This metric has already been used in several previous studies (e.g., [2]) and demonstrated good separation of reliable and non-reliable classifications. In quantitative predictions, the standard deviation of ensemble predictions was found to be the most accurate distance measure in a recent benchmarking [3]. We propose to integrate both metrics. Rather than giving a point estimate, this approach provides a probability distribution of finding a particular compound in one of the classes. The suggested metric is the probability $P = \int_E N(a, v, x)\,dx$, where E is the class domain, a is the ensemble's average prediction, v is the variance of the ensemble's predictions, and N(a, v, x) is the probability density of the Gaussian distribution. The performance of this metric and its comparison to the traditional ones are evaluated for several QSAR/QSPR classification problems. The developed approach can be freely accessed to develop and estimate the applicability domain of classification models at the http://qspr.eu web site.
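A minimal Python sketch of the probability metric described in this abstract, assuming it is the integral of the Gaussian density N(a, v, x) over the class domain E. The function name is hypothetical. Taking the domain of the +1 class to be x > 0 (a decision threshold of 0 between the labels -1 and 1) is also an assumption of this sketch, not something stated in the abstract.

```python
import math


def class_probability(avg, var, lower=0.0, upper=math.inf):
    """Integrate a Gaussian with mean `avg` and variance `var` over [lower, upper].

    avg: average prediction of the ensemble (a in the abstract).
    var: variance of the ensemble's predictions (v in the abstract).
    The default interval (0, inf) is the assumed domain of the +1 class.
    """
    if var <= 0:
        raise ValueError("variance must be positive")
    scale = math.sqrt(2.0 * var)

    def cdf(x):
        # Gaussian cumulative distribution expressed through the error function.
        return 0.5 * (1.0 + math.erf((x - avg) / scale))

    return cdf(upper) - cdf(lower)


# Same average prediction, different agreement within the ensemble.
confident = class_probability(avg=0.8, var=0.04)
uncertain = class_probability(avg=0.8, var=1.0)
print(confident, uncertain)
```

With the same average prediction, a smaller ensemble variance gives a probability closer to 1, so the metric combines the distance from the class edge with the ensemble disagreement in a single number.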
Journal of Chemometrics | 2010
Iurii Sushko; Sergii Novotarskyi; Robert Körner; Anil Kumar Pandey; Vasily V. Kovalishyn; Volodymyr V. Prokopenko; Igor V. Tetko