Journal of Geochemical Exploration | 2019
Predictive lithologic mapping of South Korea from geochemical data using decision trees
Abstract
Two decision tree machine learning algorithms, C4.5 and random forest, were used to directly establish the relationship between geochemical maps of South Korea and its geology. Using a large database containing geochemical and lithologic properties, inconsistencies in the a priori lithologic information were corrected through confusion matrix analysis and F-measure comparison via iterative C4.5 implementation. This corrective method resulted in 18 rock classes, but the subsequent C4.5 and random forest applications focused only on classifying the 10 most common rock units. Geologic age was included as an attribute at this stage. Results were assessed using accuracy, precision, recall, the kappa statistic, and F-measure. Average concentrations of major oxides, computed from records of correctly classified rock units, were evaluated through Z-score normalization. C4.5 classification successfully predicted the spatial distribution of key lithologic units at 87%, whereas random forest classification reached 96%. For both decision tree models, the average standardized concentrations of major oxides in each lithology agree with established geologic knowledge, thereby confirming the validity of the results. Rock age was identified as the most important predictor, whereas the major elements Al2O3, Na2O, and MgO, together with the trace elements Cr, Ni, and Cu, are the strongest numeric predictors. Misclassified data points are mainly due to interpolation errors at or near map polygon boundaries, especially where map polygons are smaller than 50 km², and/or to natural and anthropogenic contamination. Despite the misclassifications, decision trees proved to be effective techniques for classifying lithologic units, and can thus reproduce a reliable geologic map of South Korea from geochemical data.
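The assessment measures named in the abstract (accuracy, precision, recall, kappa, F-measure) can all be derived from a confusion matrix, and the Z-score normalization is a simple standardization. The following is a minimal sketch of those standard definitions, not the authors' implementation: the function names `classification_metrics` and `z_scores`, the row/column convention, and the use of the population standard deviation are assumptions made for illustration.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, object]:
    """Compute standard metrics from a square confusion matrix.

    Assumed convention (hypothetical, not from the paper):
    cm[i][j] = number of samples whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(row) for row in cm]                            # true counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts
    correct = sum(cm[i][i] for i in range(k))

    accuracy = correct / total

    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0

    precision, recall, f_measure = [], [], []
    for i in range(k):
        p = cm[i][i] / col_tot[i] if col_tot[i] else 0.0
        r = cm[i][i] / row_tot[i] if row_tot[i] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0  # harmonic mean of p and r
        precision.append(p)
        recall.append(r)
        f_measure.append(f)

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "precision": precision,   # one value per class
        "recall": recall,         # one value per class
        "f_measure": f_measure,   # one value per class
    }


def z_scores(values: List[float]) -> List[float]:
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [(v - mean) / std if std else 0.0 for v in values]
```

Per-class precision, recall, and F-measure are returned as lists so that individual rock units can be compared, which is the kind of comparison the iterative class-correction step relies on.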