Stochastic Environmental Research and Risk Assessment | 2021

Investigating the impact of input variable selection on daily solar radiation prediction accuracy using data-driven models: a case study in northern Iran


Abstract


Data-driven models have been explored in numerous studies for solar radiation ($R_s$) prediction. However, the use of different input variable selection (IVS) methods for improving $R_s$ prediction accuracy has mostly been neglected. This study explores various IVS methods, including Gamma test (GT), Procrustes analysis (PA) and Edgeworth approximation-based conditional mutual information (EA), and evaluates their ability to improve $R_s$ prediction accuracy by coupling them with popular non-linear data-driven models: multilayer perceptron (MLP), support vector machine, extreme learning machine and multi-gene genetic programming (MGGP). The partial correlation input selection method was coupled with multiple linear regression to serve as a linear benchmark. Meteorological data from eight stations in northern Iran was used for building the $R_s$ prediction models.
The type and number of variables selected differed between stations and depended on the IVS method. The models utilizing EA selected fewer variables than those using the GT method and had higher accuracy, while models using PA selected fewer variables than all other methods but were not able to adequately predict $R_s$. It was also found that predictive performance varied substantially when pairing the IVS methods with different model types. For example, coupling MLP, the model with the best average performance, with EA instead of PA resulted in a ~27% improvement (decrease) in the normalized root mean square error (nRMSE). The results also indicated that MGGP produced the least accurate predictions, with the nRMSE increasing by up to 40% compared to MLP when the EA method was used for IVS. Finally, IVS hyper-parameter adjustment (which is routinely overlooked in the literature) profoundly affected the results and is recommended as a very important step to consider when developing data-driven models for solar radiation prediction.
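
The nRMSE comparisons reported above can be illustrated with a minimal Python sketch. The abstract does not state how the RMSE is normalized, so dividing by the mean of the observations is an assumption made here (normalizing by the observed range is another common choice). The function names `nrmse` and `relative_change` are hypothetical and are not taken from the study.

```python
import math


def nrmse(observed, predicted):
    """Root mean square error divided by the mean of the observations.

    Assumption: mean normalization. The source abstract does not say
    which normalization the authors used.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


def relative_change(baseline, candidate):
    """Percent change of candidate relative to baseline.

    A negative value means the candidate has the lower error, which is
    the sense in which a decrease in nRMSE counts as an improvement.
    """
    return 100.0 * (candidate - baseline) / baseline
```

Under this convention, comparing one model's nRMSE with two different IVS methods amounts to calling `relative_change(nrmse_with_first_method, nrmse_with_second_method)`; a negative result corresponds to an improvement of the kind described in the abstract.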

Pages 1 - 25
DOI 10.1007/s00477-021-02070-5
Language English
Journal Stochastic Environmental Research and Risk Assessment
