Expert Syst. Appl. | 2021

A two-layer feature selection method using Genetic Algorithm and Elastic Net

 
 

Abstract


Feature selection, a critical pre-processing step in machine learning, aims to identify representative predictors in a high-dimensional dataset in order to improve prediction accuracy. However, when the dimensionality of the feature space grows relative to the number of observations, many existing feature selection methods struggle with both computational efficiency and prediction performance. This paper presents a new two-layer feature selection approach that combines a wrapper method and an embedded method to construct an appropriate subset of predictors. In the first layer, a Genetic Algorithm (GA) is adopted as a wrapper to search for the optimal subset of predictors, with the aim of reducing both the number of predictors and the prediction error. GA, a meta-heuristic approach, is selected for its computational efficiency; however, GAs do not guarantee optimality. To address this issue, a second layer is added to eliminate any remaining redundant or irrelevant predictors and thereby improve prediction accuracy. Elastic Net (EN) is selected as the embedded method in the second layer because of its flexibility in adjusting the penalty terms in the regularization process and its time efficiency. The two-layer approach is applied to a maize genetic dataset from the NAM population, which consists of multiple sub-datasets with different ratios of the number of predictors to the number of observations. The numerical results confirm the superiority of the proposed model.
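The two-layer idea can be illustrated with a minimal sketch. It is not the paper's implementation. The least-squares fitness with a subset-size penalty, truncation selection, one-point crossover, the holdout split, and every hyperparameter value (`pop_size`, `n_gen`, `p_mut`, `size_penalty`, `lam`, `alpha`) are assumptions chosen for illustration, and the function names are hypothetical. The Elastic Net layer is written as plain coordinate descent on centered data without an intercept.

```python
import numpy as np

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)                      # residual y - X b (b starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]              # remove feature j's contribution
            rho = X[:, j] @ r / n
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
            b[j] = shrunk / (col_sq[j] + lam * (1 - alpha))
            r -= X[:, j] * b[j]              # add the updated contribution back
    return b

def fitness(mask, X_tr, y_tr, X_va, y_va, size_penalty=0.01):
    """Assumed fitness: holdout MSE of a least-squares fit plus a size penalty."""
    if not mask.any():
        return np.inf
    coef, *_ = np.linalg.lstsq(X_tr[:, mask], y_tr, rcond=None)
    mse = np.mean((y_va - X_va[:, mask] @ coef) ** 2)
    return mse + size_penalty * mask.sum()

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=30, n_gen=25, p_mut=0.05, rng=None):
    """Layer 1 (wrapper): evolve boolean feature masks, lower fitness is better."""
    rng = np.random.default_rng(rng)
    p = X_tr.shape[1]
    pop = rng.random((pop_size, p)) < 0.5
    score = lambda m: fitness(m, X_tr, y_tr, X_va, y_va)
    for _ in range(n_gen):
        order = np.argsort([score(m) for m in pop])
        parents = pop[order[: pop_size // 2]]        # truncation selection
        children = []
        while len(children) < pop_size - 1:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, p)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(p) < p_mut           # bit-flip mutation
            children.append(child)
        pop = np.vstack([pop[order[0]]] + children)  # elitism: keep the best mask
    return pop[np.argmin([score(m) for m in pop])]

def two_layer_select(X, y, seed=0):
    """Layer 1: GA wrapper. Layer 2: Elastic Net prunes the GA's subset."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    split = 2 * len(y) // 3
    tr, va = idx[:split], idx[split:]
    kept = np.flatnonzero(ga_select(X[tr], y[tr], X[va], y[va], rng=rng))
    coef = elastic_net(X[:, kept], y)
    return kept[np.abs(coef) > 1e-8]                 # indices with nonzero weight
```

Calling `two_layer_select(X, y)` on a centered design matrix `X` of shape `(n, p)` and a centered response `y` returns the indices of the predictors that survive both layers. The GA only proposes a candidate subset; the Elastic Net step then zeroes out weak predictors that the GA left in.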

Volume 166
Pages 114072
DOI 10.1016/J.ESWA.2020.114072
Language English
Journal Expert Syst. Appl.
