Archive | 2019

Gradient boosting for the prediction of gas chromatographic retention indices

 
 
 

Abstract


The estimation of gas chromatographic retention indices from compound structures is an important problem. Predicted retention indices can be used in a mass spectral library search for the identification of unknowns. Various machine learning methods are used for this task, but methods based on decision trees, in particular gradient boosting, are not widely used. The aim of this work is to examine the usability of this method for retention index prediction. 177 molecular descriptors computed with the Chemistry Development Kit are used as the input representation of a molecule. Random subsets of the whole NIST 17 database are used as training, test and validation sets. The gradient boosting model uses 8000 trees with 6 leaves each. A neural network with one hidden layer (90 hidden nodes) is used for comparison. The same data sets and the same set of descriptors are used for the neural network and gradient boosting. The model based on gradient boosting outperforms the neural network with one hidden layer for subsets of NIST 17 and for the set of essential oils. The performance of this model is comparable to or better than that of other modern retention prediction models. For subsets of NIST 17, the average relative deviation is ~3.0% and the median relative deviation is ~1.7%. The median absolute deviation is ~34 retention index units. Only non-polar liquid stationary phases (such as polydimethylsiloxane, 5% phenyl 95% polydimethylsiloxane, squalane) are considered. Errors obtained with different machine learning algorithms and with the same representation of the molecule strongly correlate with each other.
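The three error measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that "relative deviation" means the absolute error divided by the reference retention index, expressed in percent, and that "median absolute deviation" means the median of the absolute errors in retention index units; the abstract does not spell out these definitions. The function name `deviation_summary` and the numbers are hypothetical and are not taken from NIST 17.

```python
from statistics import mean, median


def deviation_summary(reference, predicted):
    """Summarize prediction errors for retention indices.

    Assumed definitions (not stated in the abstract):
      absolute deviation = |predicted - reference|
      relative deviation = 100 * |predicted - reference| / reference
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    abs_err = [abs(p - r) for r, p in zip(reference, predicted)]
    rel_err = [100.0 * e / r for e, r in zip(abs_err, reference)]

    return {
        "mean_relative_pct": mean(rel_err),
        "median_relative_pct": median(rel_err),
        "median_absolute": median(abs_err),
    }


# Made-up retention indices, for illustration only.
reference = [1000.0, 1200.0, 800.0, 1500.0]
predicted = [1010.0, 1170.0, 820.0, 1500.0]
summary = deviation_summary(reference, predicted)
print(summary)
```

Reporting the median alongside the mean is useful because a few badly predicted compounds can inflate the mean relative deviation while leaving the median nearly unchanged.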

Volume 19
Pages 630-635
DOI 10.17308/sorpchrom.2019.19/2223
Language English
