Molecular Informatics | 2021

Machine Learning Methods to Predict the Terrestrial and Marine Origin of Natural Products

 

Abstract


In recent years there has been growing interest in the differences between the chemical and biological space represented by natural products (NPs) of terrestrial and marine origin. To learn more about these two chemical spaces, marine natural products (MNPs) and terrestrial natural products (TNPs), a machine learning (ML) approach was developed in the current work to predict three classes: MNPs, TNPs, and a third class of NPs that appear in both the terrestrial and marine environments. In total, 22,398 NPs were retrieved from the Reaxys® database; of those, 10,790 molecules are recorded as MNPs, 10,857 as TNPs, and 761 are registered as both MNPs and TNPs. Several ML algorithms, such as Random Forest, Support Vector Machines, and deep learning Multilayer Perceptron networks, were benchmarked. The best performance was achieved with a consensus classification model, which predicted the external test set with an overall accuracy of up to 81%. As far as we know, this approach has not been attempted before; it can be used to better understand the chemical space defined by MNPs, TNPs, or both, and also in virtual screening to define the applicability domain of QSAR models of MNPs and TNPs.
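The abstract does not say how the consensus model combines the individual classifiers. As a minimal sketch of one common option, the Python below assumes simple majority voting over the per-molecule predictions of three models, with ties resolved in favour of the earliest-listed model. The function names, the class labels "MNP", "TNP" and "BOTH", and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus_label(votes):
    """Return the majority label among the votes cast for one molecule.

    Assumption: ties are resolved by taking the first vote (in model
    order) that belongs to one of the tied labels.
    """
    counts = Counter(votes)
    top_count = max(counts.values())
    tied = {label for label, count in counts.items() if count == top_count}
    for vote in votes:
        if vote in tied:
            return vote


def consensus_predict(per_model_predictions):
    """Combine aligned prediction lists (one list per model) by voting."""
    return [consensus_label(votes) for votes in zip(*per_model_predictions)]


def accuracy(y_true, y_pred):
    """Fraction of molecules whose predicted class matches the true class."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Toy, made-up predictions for four molecules from three hypothetical models.
rf_pred = ["MNP", "TNP", "BOTH", "MNP"]
svm_pred = ["MNP", "MNP", "TNP", "TNP"]
mlp_pred = ["TNP", "TNP", "MNP", "TNP"]
y_true = ["MNP", "TNP", "TNP", "TNP"]

y_consensus = consensus_predict([rf_pred, svm_pred, mlp_pred])
print(y_consensus)
print(accuracy(y_true, y_consensus))
```

With three models and three classes, a tie only occurs when every model predicts a different class; in that case this sketch returns the first model's vote, so the order in which the models are listed matters.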

Volume 40
DOI 10.1002/minf.202060034
Language English
Journal Molecular Informatics

Full Text