N. I. Zhokhova
Moscow State University
Publication
Featured research published by N. I. Zhokhova.
Journal of Computer-aided Molecular Design | 2013
I. I. Baskin; N. I. Zhokhova
The continuous molecular fields (CMF) approach is based on the application of continuous functions for the description of molecular fields instead of finite sets of molecular descriptors (such as interaction energies computed at grid nodes) commonly used for this purpose. These functions can be encapsulated into kernels and combined with kernel-based machine learning algorithms to provide a variety of novel methods for building classification and regression structure–activity models, visualizing chemical datasets and conducting virtual screening. In this article, the CMF approach is applied to building 3D-QSAR models for 8 datasets through the use of five types of molecular fields (the electrostatic, steric, hydrophobic, hydrogen-bond acceptor and donor ones), the linear convolution molecular kernel with the contribution of each atom approximated with a single isotropic Gaussian function, and the kernel ridge regression data analysis technique. It is shown that the CMF approach even in this simplest form provides either comparable or enhanced predictive performance in comparison with state-of-the-art 3D-QSAR methods.
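The pipeline summarized in this abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes that a molecular field is a weighted sum of isotropic Gaussians centred on the atoms, with all atoms sharing one exponent `alpha`, and that the kernel is the overlap integral of two such fields, which has a closed form. The function names, the toy "molecules", the weights, and the activity values are all invented for illustration.

```python
import math

def cmf_kernel(mol_a, mol_b, alpha=0.5):
    """Overlap integral of two fields, each a sum of w * exp(-alpha * |r - r_atom|^2).

    Uses the closed form for two Gaussians with the same exponent:
    integral = (pi / (2 * alpha))**1.5 * exp(-alpha / 2 * d^2).
    A molecule is a list of (weight, (x, y, z)) pairs (hypothetical format).
    """
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for w_i, r_i in mol_a:
        for w_j, r_j in mol_b:
            d2 = sum((p - q) ** 2 for p, q in zip(r_i, r_j))
            total += w_i * w_j * math.exp(-0.5 * alpha * d2)
    return prefactor * total

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def krr_fit(train_mols, y, lam=1e-6, alpha=0.5):
    """Kernel ridge regression: coefficients solve (K + lam * I) c = y."""
    k = [[cmf_kernel(a, b, alpha) for b in train_mols] for a in train_mols]
    for i in range(len(train_mols)):
        k[i][i] += lam
    return solve(k, y)

def krr_predict(mol, train_mols, coef, alpha=0.5):
    """Prediction is a kernel-weighted sum over the training molecules."""
    return sum(c * cmf_kernel(mol, t, alpha) for c, t in zip(coef, train_mols))

# Toy data: weights play the role of, e.g., partial charges (invented values).
mols = [
    [(1.0, (0.0, 0.0, 0.0))],
    [(1.0, (3.0, 0.0, 0.0))],
    [(1.0, (0.0, 3.0, 0.0)), (-0.5, (0.0, 0.0, 0.0))],
]
y = [1.2, 0.4, -0.7]  # invented activity values
coef = krr_fit(mols, y)
predictions = [krr_predict(m, mols, coef) for m in mols]
```

With a very small regularization parameter `lam`, the model nearly interpolates the training activities. In practice `lam` and `alpha` would be tuned by cross-validation, and one kernel per field type (electrostatic, steric, and so on) would be combined.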
Doklady Chemistry | 2009
I. I. Baskin; N. I. Zhokhova; V. A. Palyulin; A. N. Zefirov; N. S. Zefirov
The methodology for constructing quantitative structure-activity and structure-property relationship (QSAR/QSPR) models, which aims at improving the descriptor representation of chemical compounds and at applying increasingly sophisticated methods of analysis, has reached a saturation level at which the available methods can extract from databases almost all the information useful for prediction. As stated in [1], in most cases, the predictive power of models constructed with the use of “fairly good” sets of descriptors and fairly good methods of data processing depends only slightly on both the descriptor set and the method used and is almost completely determined by the database used for constructing a model. Thus, further improvement of the descriptor representation of chemical compounds and the introduction of new machine-learning methods will lead to only a little progress, whereas radically new ideas are required for a real breakthrough in this direction, one that overcomes the limitations caused by a lack of useful information in chemical databases.
Doklady Chemistry | 2009
N. I. Zhokhova; I. I. Baskin; D. K. Bakhronov; V. A. Palyulin; N. S. Zefirov
The rapid development of methods of design of new pharmaceuticals calls for new efficient computational approaches that can reliably predict various types of biological activity of organic compounds to be synthesized. This is due to the fact that the available methods widely used to search for quantitative structure-activity relationships (QSARs) have significant drawbacks. In particular, common methods of constructing 3D QSARs, such as comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA), underlying the state-of-the-art approaches to the design of new pharmaceuticals are very sensitive to the dimensions, resolution, and spatial alignment of the hypothetical grid constructed around a molecule and used for approximating the electrostatic, steric, and hydrophobic molecular fields, the potentials of the latter being calculated at the grid points as molecular structure descriptors (1-3). This leads to the ambiguity of the resulting 3D QSAR models and, hence, to the unreliability of the prediction based on these models. In this work, we propose a new method for constructing 3D QSAR models, namely, the method of continuous molecular fields (MCMF). The basic idea of this approach is in direct analysis of continuous molecular fields rather than a discrete array of their potentials calculated at the points of the discrete grid of finite size (as in the standard CoMFA and CoMSIA methods). Such a description better corresponds to the physical nature of molecular fields; therefore, we can expect better statistical characteristics of 3D QSAR models upon such a substitution. Until recently, it was impossible to use continuous molecular fields in the framework of statistical analysis since common statistical procedures are intended to operate only with finite and limited number of molecular descriptors.
Therefore, we sought to implement this idea on the basis of recent statistical approaches, for example, support vector machines (SVM), which are free of this limitation and can operate with an infinite number of variables (4). This is achieved by using so-called kernels (5). As is known, for any kernel, there must exist a linear vector space (referred to as the reproducing kernel Hilbert space) in which the kernel can be uniquely represented as the scalar product of the corresponding vectors. It is evident that the scalar product of the molecular field potential values at the grid points is also a kernel (by definition). Inasmuch as an increase in the grid dimensions and a decrease in the grid cell size do not violate this property, the integral of the product of molecular fields taken over the entire physical space also remains a kernel, which can be used in appropriate statistical methods, such as support vector regression. Thus, this offers possibilities for constructing statistical models based on the description of molecular objects as continuous molecular fields. The basic element of the MCMF proposed in this work is the procedure of calculation of kernels. The use of these kernels will be exemplified by constructing QSARs in the framework of the statistical method of support vector regression (5).
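The limiting argument above (a grid scalar product that stays a kernel as the grid grows and its cells shrink) can be checked numerically. The sketch below is an illustration under stated assumptions and not the procedure from the paper. It assumes fields built from isotropic Gaussians with a shared exponent `ALPHA`, and it multiplies the grid scalar product by the cell volume, a positive constant that does not affect the kernel property, so that it approximates the integral. All names and the toy atom weights and positions are hypothetical.

```python
import math

ALPHA = 0.5  # assumed Gaussian exponent, the same for every atom

def field(mol, point, alpha):
    """Field value at `point`: a weighted sum of isotropic Gaussians on atoms."""
    total = 0.0
    for weight, centre in mol:
        d2 = sum((p - c) ** 2 for p, c in zip(point, centre))
        total += weight * math.exp(-alpha * d2)
    return total

def grid_kernel(mol_a, mol_b, alpha, half_width, step):
    """Scalar product of the two fields sampled on a cubic grid, times cell volume."""
    n = int(round(2 * half_width / step)) + 1
    axis = [-half_width + i * step for i in range(n)]
    total = 0.0
    for x in axis:
        for y in axis:
            for z in axis:
                point = (x, y, z)
                total += field(mol_a, point, alpha) * field(mol_b, point, alpha)
    return total * step ** 3

def analytic_kernel(mol_a, mol_b, alpha):
    """Exact integral of the product of the two fields over all space."""
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for w_i, r_i in mol_a:
        for w_j, r_j in mol_b:
            d2 = sum((p - q) ** 2 for p, q in zip(r_i, r_j))
            total += w_i * w_j * math.exp(-0.5 * alpha * d2)
    return prefactor * total

# Toy molecules as (weight, (x, y, z)) pairs; all values are invented.
mol_a = [(1.0, (0.0, 0.0, 0.0)), (-0.5, (1.5, 0.0, 0.0))]
mol_b = [(0.8, (0.5, 0.5, 0.0)), (-0.3, (0.0, 1.0, 0.0))]

exact = analytic_kernel(mol_a, mol_b, ALPHA)
grid_values = [(step, grid_kernel(mol_a, mol_b, ALPHA, 6.0, step))
               for step in (2.0, 1.0, 0.5)]
for step, value in grid_values:
    print(f"step={step}: grid={value:.6f}  exact={exact:.6f}")
```

The coarsest grid value depends on how the grid happens to sit relative to the atoms, which is the sensitivity attributed to grid-based methods above. The analytic integral has no grid and therefore no such dependence.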
Russian Journal of Physical Chemistry A | 2007
N. I. Zhokhova; V. A. Palyulin; I. I. Baskin; A. N. Zefirov; N. S. Zefirov
The quantitative structure-property relationship (QSPR) method was used to study the enthalpy of vaporization at 25°C of 65 organic compounds representing 13 different classes. As an alternative to the dependence of the enthalpy of vaporization on the boiling temperature, a neural network QSPR model is suggested that allows this property to be predicted with the use of descriptors taking into account the fragment composition of molecules.
Russian Journal of Applied Chemistry | 2003
N. I. Zhokhova; I. I. Baskin; V. A. Palyulin; A. N. Zefirov; N. S. Zefirov
The enthalpies of sublimation of organic compounds belonging to various classes were studied for the first time in terms of the fragment approach based on the QSPR (Quantitative Structure-Property Relationships) method. The applicability of this technique to calculation of this parameter was demonstrated and a model that makes it possible to predict the enthalpy of sublimation of compounds on the basis of descriptors taking into account the fragment composition of a molecule was suggested.
Russian Chemical Bulletin | 2003
N. I. Zhokhova; I. I. Baskin; V. A. Palyulin; A. N. Zefirov; N. S. Zefirov
Applicability of the fragmental approach developed in the framework of the QSPR methodology to prediction of the molecular polarizability of various classes of organic compounds is demonstrated. The model proposed allows reliable prediction of the molecular polarizability of organic compounds based on their chemical composition and a set of fragmental descriptors, which characterize the multiple and aromatic bonds as well as fused aromatic systems.
Chemistry Central Journal | 2009
I. I. Baskin; N. I. Zhokhova; V. A. Palyulin; N. S. Zefirov
Due to the existence of many additive properties in chemistry and physics (the energy is the most prominent one), AIL is especially well suited for solving numerous problems in chemo- and bioinformatics, ranging from QSAR/QSPR studies up to molecular modeling, computing binding energies, etc. We suggest two basic approaches to AIL: (i) the use of special additive neural networks (developed by us for this purpose), and (ii) the use of kernel-based approaches, such as the support vector machines, with a special type of kernels:
Journal of Computer-aided Molecular Design | 2015
Gleb V. Sitnikov; N. I. Zhokhova; Yury A. Ustynyuk; Alexandre Varnek; I. I. Baskin
A novel type of molecular fields, Continuous Indicator Fields (CIFs), is suggested to provide 3D structural description of molecules. The values of CIFs are calculated as the degree to which a point with given 3D coordinates belongs to an atom of a certain type. They can be used similarly to standard physicochemical fields for building 3D structure–activity models. One can build CIF-based 3D structure–activity models in the framework of the continuous molecular fields approach described earlier (J Comput-Aided Mol Des 27 (5):427–442, 2013) for the case of physicochemical molecular fields. CIFs are thought to complement and further extend traditional physicochemical fields. The models built with CIFs can be interpreted in terms of preferable and undesirable positions of certain types of atoms in space. This helps to understand which changes in chemical structure should be made in order to design a compound possessing desirable properties. We have demonstrated that CIFs can be considered as 3D analogues of 2D topological molecular fragments. The performance of this approach is demonstrated in structure–activity studies of thrombin inhibitors, multidentate N-heterocyclic ligands for Am3+/Eu3+ separation, and coloring dyes.
Doklady Chemistry | 2010
N. I. Zhokhova; I. I. Baskin; A. N. Zefirov; V. A. Palyulin; N. S. Zefirov
Among a great number of QSPR/QSAR approaches to the prediction of physical and chemical properties and biological activity of organic compounds, the methods using fragmental descriptors play a specific role [1, 2]. The values of the latter can be either the occurrence numbers or indicators of the presence of some fragments in the structures of chemical compounds. Advantages of these descriptors are their transparent meaning and the possibility of fast automatic generation on the basis of only the structural formula. Fragmental descriptors can be calculated without knowledge of the 3D structure or electronic structure of molecules and, therefore, can be easily used for operating large databases. One of the disadvantages of fragmental descriptors is the problem of rare fragments that can be absent in the training set but can exist in the compounds for which the prediction is performed. Since the contributions of rare fragments cannot be determined on the basis of the training set, considerable errors of prediction are expected for compounds containing such fragments. We suggest solving this problem by introducing additional descriptors with the values being to an extent related to the contributions of fragments to the predicted property. For this purpose, we also suggest using special fragmental descriptors with the values being calculated by combining the properties of the atoms that constitute these fragments. Such descriptors are referred to as pseudofragmental descriptors in order to distinguish them from “proper” descriptors assigned the values of the occurrence numbers or indicators of the presence of certain fragments in the structures of chemical compounds.
The atomic properties that are believed to influence the contributions of fragmental descriptors to the predicted property, for example, the atomic weight, number of electrons, covalent radius, electronegativity, ionization potential, etc., can be used for predicting physical and chemical properties of organic molecules. It is also important for the combinations of properties used to have a clear physical meaning since this provides a better chance of a correlation between their values and the fragmental contributions. If such a correlation exists, a small number of pseudofragmental descriptors enter into statistical models instead of numerous proper fragmental descriptors, including potentially rare ones, thus acting as a compressed generalization of the latter. This largely solves the problem of rare fragments if the pseudofragmental descriptors are constructed on the basis of frequently encountered fragments consisting of separate atoms or short chains of arbitrary atoms, which are present in almost all molecules.
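The difference between proper and pseudofragmental descriptors can be illustrated with a small sketch. It is not the authors' algorithm. The molecule format (heavy atoms plus a bond list), the choice of two-atom chains as fragments, and the combination rule (the product of Pauling electronegativities summed over all bonded pairs) are assumptions made here for illustration; the abstract does not specify how the atomic properties are combined.

```python
from collections import Counter

# Pauling electronegativities for a few elements.
ELECTRONEGATIVITY = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44, "F": 3.98, "Cl": 3.16}

def bond_fragments(atoms, bonds):
    """Two-atom chain fragments, labelled by element pair (e.g. 'C-O')."""
    return ["-".join(sorted((atoms[i], atoms[j]))) for i, j in bonds]

def fragment_counts(atoms, bonds):
    """Proper fragmental descriptors: occurrence number of each fragment."""
    return Counter(bond_fragments(atoms, bonds))

def pseudofragmental(atoms, bonds, prop):
    """Hypothetical pseudofragmental descriptor: the sum over all two-atom
    chains of the product of an atomic property of the two atoms."""
    return sum(prop[atoms[i]] * prop[atoms[j]] for i, j in bonds)

# Heavy-atom skeletons as (atoms, bonds); a toy representation.
training_set = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "propane": (["C", "C", "C"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
}
query = (["C", "C", "F"], [(0, 1), (1, 2)])  # fluoroethane

known = set()
for atoms, bonds in training_set.values():
    known.update(fragment_counts(atoms, bonds))

# Fragments in the query with no training contribution (the rare-fragment problem).
unseen = set(fragment_counts(*query)) - known

# The pseudofragmental descriptor is still defined for the query molecule.
pf_ethanol = pseudofragmental(*training_set["ethanol"], ELECTRONEGATIVITY)
pf_query = pseudofragmental(*query, ELECTRONEGATIVITY)
print("unseen fragments:", unseen)
print("pseudofragmental values:", pf_ethanol, pf_query)
```

The query contains a C-F fragment that never occurs in the toy training set, so a model built on occurrence numbers has no contribution for it. The pseudofragmental value is computed from the ever-present two-atom chains and remains available as a model input.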
Moscow University Chemistry Bulletin | 2007
N. I. Zhokhova; E. V. Bobkov; I. I. Baskin; V. A. Palyulin; A. N. Zefirov; N. S. Zefirov
The QSPR model for predicting the Gibbs free energy of formation of complexes of different organic compounds with β-cyclodextrin has been constructed by the multiple linear regression method with the use of the double cross-validation procedure. The model has good predictive power.