David J. Livingstone
University of Portsmouth
Publications
Featured research published by David J. Livingstone.
Journal of Computer-aided Molecular Design | 2005
Igor V. Tetko; Johann Gasteiger; Roberto Todeschini; A. Mauri; David J. Livingstone; Peter Ertl; V. A. Palyulin; E. V. Radchenko; Nikolai S. Zefirov; Alexander Makarenko; Vsevolod Yu. Tanchuk; Volodymyr V. Prokopenko
Internet technology offers an excellent opportunity for developing tools through the cooperative effort of various groups and institutions. We have developed a multi-platform software system, the Virtual Computational Chemistry Laboratory, http://www.vcclab.org, which allows computational chemists to perform a comprehensive series of molecular index/property calculations and data analyses. The software is based on a three-tier architecture, one of the standard approaches to providing client-server services on the Internet. It includes several popular programs: the indices-generation program DRAGON, the 3D structure generator CORINA, the lipophilicity and aqueous solubility prediction program ALOGPS, and others. All of these programs run at host institutes located in five countries across Europe. In this article we review the main features and usage statistics of the developed system, which can serve as a prototype for academic and industrial models.
Journal of Chemical Information and Computer Sciences | 1995
Igor V. Tetko; David J. Livingstone; A. I. Luik
The application of feed-forward back-propagation artificial neural networks with one hidden layer (ANNs) to perform the equivalent of multiple linear regression (MLR) has been examined using artificially structured data sets and real literature data. The predictive ability of the networks has been estimated using a training/test set protocol. The results show advantages of ANNs over MLR analysis: the ANNs do not require high-order terms or indicator variables to establish complex structure-activity relationships, and overfitting has no influence on network prediction ability when overtraining is avoided by cross-validation. The application of ANN ensembles has allowed chance correlations to be avoided, and satisfactory predictions of new data have been obtained for a wide range of hidden-layer sizes.
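The comparison described above can be sketched with modern tools. This is an illustrative example, not the authors' code: a one-hidden-layer network and an MLR model are fitted to synthetic data with a deliberately nonlinear "structure-activity" relationship, and both are scored on a held-out test set, mirroring the training/test protocol.

```python
# Hypothetical sketch (not the paper's code or data): a one-hidden-layer
# feed-forward network vs. multiple linear regression on synthetic data
# whose target contains quadratic and cross terms that MLR cannot capture
# without explicit high-order descriptors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                    # four synthetic "descriptors"
y = X[:, 0] + X[:, 1] ** 2 - X[:, 2] * X[:, 3]   # nonlinear "activity"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mlr = LinearRegression().fit(X_tr, y_tr)
ann = MLPRegressor(hidden_layer_sizes=(16,), solver="lbfgs",
                   max_iter=5000, random_state=0).fit(X_tr, y_tr)

# The hidden layer lets the network learn the nonlinear terms directly.
print(f"MLR test R2: {mlr.score(X_te, y_te):.2f}")
print(f"ANN test R2: {ann.score(X_te, y_te):.2f}")
```

On data like this the network's test score should exceed the linear model's, illustrating the paper's point that no hand-crafted high-order terms are needed.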
Journal of Chemical Information and Computer Sciences | 2000
David C. Whitley; Martyn G. Ford; David J. Livingstone
An unsupervised learning method is proposed for variable selection and its performance assessed using three typical QSAR data sets. The aims of this procedure are to generate a subset of descriptors from any given data set in which the resultant variables are relevant, redundancy is eliminated, and multicollinearity is reduced. Continuum regression, an algorithm encompassing ordinary least squares regression, regression on principal components, and partial least squares regression, was used to construct models from the selected variables. The variable selection routine is shown to produce simple, robust, and easily interpreted models for the chosen data sets.
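The redundancy-elimination goal described above can be illustrated with a much simpler stand-in than the paper's method. The sketch below is an assumed example, not the published algorithm: a greedy correlation filter that keeps a descriptor only if it is not highly correlated with any descriptor already kept, reducing multicollinearity.

```python
# Hedged sketch of redundancy elimination (a simple correlation filter,
# not the paper's actual variable-selection procedure).
import numpy as np

def select_descriptors(X, cutoff=0.9):
    """Return column indices of a reduced, less collinear descriptor set."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        # Keep column j only if it is not near-duplicate of a kept column.
        if all(corr[j, k] < cutoff for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
# Append a near-duplicate of column 0; the filter should drop it.
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=100)])
print(select_descriptors(X))
```

The surviving columns could then feed a regression step, in the spirit of the continuum regression modelling described in the abstract.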
European Journal of Medicinal Chemistry | 1999
David T. Manallack; David J. Livingstone
Over the last decade neural networks have become an efficient method for data analysis in the field of drug discovery. The early problems encountered with neural networks, such as overfitting and overtraining, have been addressed, resulting in a technique that surpasses traditional statistical methods. Neural networks have thus largely lived up to their promise, which was to overcome the statistical problems of QSAR. The next revolution in QSAR will no doubt involve research into producing better descriptors for these studies, to improve our ability to relate chemical structure to biological activity. This review focuses on the applications of neural network methods and their development over the last five years.
Journal of Chemical Information and Computer Sciences | 1996
Igor V. Tetko; A. E. P. Villa; David J. Livingstone
Quantitative structure-activity relationship (QSAR) studies usually require an estimation of the relevance of a very large set of initial variables. Determining the most important variables theoretically allows better generalization by all pattern recognition methods. This study introduces and investigates five pruning algorithms designed to estimate the importance of input variables in applications of feed-forward artificial neural networks (ANNs) trained by the back-propagation algorithm, and to prune non-relevant variables in a statistically reliable way. The analyzed algorithms produced similar estimates of variable importance for simulated data sets, but differences were detected for real QSAR examples. Improvement of ANN predictive ability was shown after pruning of redundant input variables. The statistical coefficients computed by ANNs for the QSAR examples were better than those of multiple linear regression. Restrictions of the proposed algorithms and the potential use of ANNs are discussed.
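One simple way to estimate input importance in a trained network, illustrative only and not one of the paper's five algorithms, is to sum the magnitudes of each input's first-layer weights and rank the inputs accordingly; low-ranked inputs are candidates for pruning.

```python
# Assumed illustration (not the paper's pruning algorithms): rank input
# variables by total first-layer weight magnitude in a trained network.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 6))
y = 3 * X[:, 0] - 2 * X[:, 1]    # only the first two inputs carry signal

net = MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                   max_iter=5000, random_state=0).fit(X, y)

# net.coefs_[0] has shape (n_inputs, n_hidden): sum |w| along each input row.
importance = np.abs(net.coefs_[0]).sum(axis=1)
ranking = np.argsort(importance)[::-1]       # most important input first
print("inputs ranked by importance:", ranking)
```

After pruning the low-ranked inputs, the network would be retrained on the reduced variable set and its predictive ability re-checked, as the abstract describes.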
Journal of Computer-aided Molecular Design | 1997
David J. Livingstone; David T. Manallack; Igor V. Tetko
The origins and operation of artificial neural networks are briefly described and their early application to data modelling in drug design is reviewed. Four problems in the use of neural networks in data modelling are discussed, namely overfitting, chance effects, overtraining and interpretation, and examples are given of the means by which the first three of these may be avoided. The use of neural networks as a variable selection tool is shown and the advantage of networks as a nonlinear data modelling device is discussed. The display of multivariate data in two dimensions employing a neural network is illustrated using experimental and theoretical data for a set of charge transfer complexes.
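One plausible reading of the two-dimensional display technique mentioned above, a hypothetical sketch rather than the paper's actual network, is an autoencoder-style arrangement: a network is trained to reproduce its own inputs through a two-unit hidden layer, and the hidden activations serve as display coordinates.

```python
# Hypothetical illustration (assumed technique, not the paper's network):
# a two-unit bottleneck trained to reconstruct its inputs; the bottleneck
# activations give a 2D display of the multivariate data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(150, 6))    # 150 "compounds", six properties each

auto = MLPRegressor(hidden_layer_sizes=(2,), activation="identity",
                    solver="lbfgs", max_iter=5000,
                    random_state=0).fit(X, X)   # targets = inputs

# Forward pass to the bottleneck: these two columns are the 2D coordinates.
coords = X @ auto.coefs_[0] + auto.intercepts_[0]
print(coords.shape)  # (150, 2)
```

With an identity activation this reduces to a linear projection; a nonlinear activation would give the non-linear displays the abstract contrasts with linear ones.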
Journal of Chemical Information and Computer Sciences | 1998
Vasyl Kovalishyn; Igor V. Tetko; A. I. Luik; Vladyslav Kholodovych; A. E. P. Villa; David J. Livingstone
Pruning methods for feed-forward artificial neural networks trained by the cascade-correlation learning algorithm are proposed. The cascade-correlation algorithm starts with a small network and dynamically adds new nodes until the analyzed problem has been solved. This feature of the algorithm removes the requirement to predefine the architecture of the neural network prior to network training. The developed pruning methods are used to estimate the importance of large sets of initial variables for quantitative structure-activity relationship studies and simulated data sets. The calculated results are compared with the performance of fixed-size back-propagation neural networks and multiple regression analysis and are carefully validated using different training/test set protocols, such as leave-one-out and full cross-validation procedures. The results suggest that the pruning methods can be successfully used to optimize the set of variables for the cascade-correlation learning algorithm neural networks. Th...
Journal of Computer-aided Molecular Design | 2001
David J. Livingstone; Martyn G. Ford; Jarmo Huuskonen; David W. Salt
It has been shown that water solubility and the octanol/water partition coefficient for a large, diverse set of compounds can be predicted simultaneously using molecular descriptors derived solely from a two-dimensional representation of molecular structure. These properties have been modelled using multiple linear regression, artificial neural networks and a statistical method known as canonical correlation analysis. The neural networks give slightly better models, in terms of both fitting and prediction, presumably because they include non-linear terms. The statistical methods, on the other hand, provide information about the variance explained and allow easy interrogation of the models. Models were fitted using a training set of 552 compounds, a validation set and a test set each containing 68 molecules, and two separate literature test sets for solubility and partition.
European Journal of Medicinal Chemistry | 2000
Jarmo Huuskonen; Jukka Rantanen; David J. Livingstone
We describe robust methods for estimating the aqueous solubility of a set of 734 organic compounds from different structural classes, based on multiple linear regression (MLR) and artificial neural network (ANN) models. The structures were represented by atom-type electrotopological state (E-state) indices. The squared correlation coefficient and standard deviation for the MLR with 34 structural parameters were r² = 0.94 and s = 0.58 for the training set of 675 compounds. For the test set of 21 compounds, the equivalent statistics were r²pred = 0.80 and s = 0.87, respectively. Neural networks gave a significant improvement using the same set of parameters: the standard deviations were s = 0.52 for the training set and s = 0.75 for the test set when an artificial neural network with five neurons in the hidden layer was used. The results clearly show that accurate models for estimating the aqueous solubility of a large and diverse set of organic compounds can be calculated rapidly from easily computed structural parameters.
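The r² and s statistics quoted above are straightforward to compute for any fitted MLR. The sketch below uses invented count-type descriptors standing in for E-state indices, purely to show how the reported quantities are obtained.

```python
# Sketch with invented data (not the paper's 734 compounds or E-state
# indices): fit an MLR and compute the reported statistics r² and s.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
X = rng.poisson(3, size=(100, 5)).astype(float)   # count-type "indices"
y = X @ np.array([0.5, -0.3, 0.2, 0.1, -0.4]) + 0.3 * rng.normal(size=100)

mlr = LinearRegression().fit(X, y)
resid = y - mlr.predict(X)
r2 = mlr.score(X, y)                               # squared correlation
# Residual standard deviation with degrees-of-freedom correction.
s = np.sqrt((resid ** 2).sum() / (len(y) - X.shape[1] - 1))
print(f"r2 = {r2:.2f}, s = {s:.2f}")
```

The same two numbers computed on a held-out test set give the r²pred and test-set s that the abstract reports.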
Journal of Computer-aided Molecular Design | 1989
Brian D. Hudson; David J. Livingstone; Elizabeth Rahr
Pattern recognition methods, particularly the ‘unsupervised learning’ techniques, are well suited to the preliminary analysis of the large data sets produced by computer chemistry. The use of linear and non-linear display methods for such exploratory analysis is exemplified with the aid of two data sets of biologically active molecules. Advantages and disadvantages of these techniques are discussed.