Model ensembles of artificial neural networks and support vector regression for improved accuracy in the prediction of vegetation conditions
Chrisgone Adede, Robert Oboko, Peter W. Wagacha, Clement Atzberger
School of Computing and Informatics, University of Nairobi (UoN), P.O. Box GPO, Nairobi, Kenya; [email protected] (R.O.); [email protected] (P.W.W.)
National Drought Management Authority (NDMA), Lonrho House, Standard Street, Nairobi, Kenya
Institute of Surveying, Remote Sensing and Land Information, University of Natural Resources and Life Sciences (BOKU), Peter Jordan Strasse, Vienna, Austria; [email protected]
* Correspondence: [email protected]
Abstract: There is an increasing need for highly predictive and stable models for the prediction of drought as an aid to better planning for drought response. This paper presents the performance of both homogenous and heterogenous model ensembles in the prediction of drought severity, using artificial neural networks (ANN) and support vector regression (SVR) as case study techniques. For each of the homogenous and heterogenous model ensembles, the study investigates the performance of three model ensembling approaches: linear (non-weighted) averaging, ranked weighted averaging and model stacking using artificial neural networks. Using the "over-produce then select" approach, the study used several years of data on selected variables for predictive drought monitoring to build individual ANN and SVR models, from which models were selected for the building of the model ensembles. The results indicate marginal superiority of heterogenous over homogenous model ensembles. Model stacking is shown to realize models that are superior in the prediction of future vegetation conditions compared to the linear averaging and weighted averaging approaches. The best of the heterogenous stacked model ensembles surpassed, as measured by R², both the ANN and SVR models realized through the traditional champion model approach. We conclude that despite the computational resource intensiveness of the model ensembling approach to drought prediction, the returns in terms of model performance are worth the investment, especially in the context of the recent exponential increase in computational power.

Keywords: ensemble; support vector regression; artificial neural networks; overfitting; drought risk management; drought forecasting; ensemble member selection

Introduction
One point of divergence in the practice of drought monitoring is the definition of drought. The differences in definition are informed by the variance in interest and the attendant types of drought, that is, any of meteorological, hydrological, agricultural or socio-economic drought, as documented in UNOOSA (2015). Despite the lack of a standard definition, the need for drought monitoring is well understood in the context of the losses arising from drought occurrence and the need for planned action. Losses from past droughts are documented, for example, in Government of Kenya (2012) and Cody (2010), with a detailed review of a range of impacts in Ding, Hayes & Widhalm (2011).
Drought monitoring happens in the context of drought early warning systems (DEWS) that are increasingly either near real time (NRT) or ex-ante (predictive). NRT systems include, for example, the univariate BOKU system (Klisch & Atzberger) and the Famine Early Warning Systems Network (FEWSNET) (Brown et al.), both based on MODIS vegetation datasets, as well as the US drought monitor in Svoboda et al. (2002), which is a multi-variate system. Predictive systems are based on either a single variable/index, a multi-variable index or multiple indices (variables). Single variable indices, especially those that use the standardised precipitation index (SPI), are deployed in Ali et al. (2017) and Khadr (2016). The single variable indices are generally easier to interpret compared to multi-variable (super-index) indices like the Enhanced Combined Drought Index (ECDI) (Enenkel et al.), which integrates four input datasets (rainfall, soil moisture, land surface temperature and vegetation status).
Increasingly, the use of multiple variables in predictive drought systems is gaining currency, in an approach where multiple indices are used to build the predictive systems. Such systems include the use of variables and indices in Tadesse et al. (2014) and in Adede et al. (2019), which uses lagged variables in the prediction of future vegetation conditions. The increasing popularity of predictive DEWS in the light of increased damages suffered from droughts, together with the proliferation of multiple indices and variables for drought monitoring, has led to the need for multi-variate models that are both highly predictive and stable over time to support proactive drought risk management (DRM) initiatives. One way to improve the stability and accuracy of models is model ensembling. Variously, model ensembling is defined as the formulation of multiple individually trained models and the subsequent combination of their outputs (Cortes, Kuznetsov & Mohri; Dietterich; Opitz & Maclin, 1999; Re & Valentini, 2012). In this sense, model ensembling is akin to the innate human nature of seeking multiple opinions in decision making. Model ensembling, therefore, aims to produce more accurate and more stable predictors that arise from the ability to average out noise and therefore achieve better generalizability (Güneş; Opitz & Maclin, 1999).
The common issues in model ensembling include the process of over-production of ensemble base models, the selection of base models for ensemble membership, and the combination of the outputs of the ensemble members. The process of over-production of models has hyper-parameter tuning at its core. Hyper-parameter tuning becomes even more critical in instances where automation of model building and selection is a key objective, as in Nay, Burchfield & Gilligan (2018), in which three (3) hyper-parameters needed to be tuned for the gradient boosted machine (GBM). The problem of the selection of ensemble membership deals with the question of which sub-set of the models from the model over-production process offers the best predictive power. The problem of ensemble membership selection is documented, for example, in Partalas, Tsoumakas & Vlahavas (2012), Re & Valentini (2012) and Reid (2007). The distinct approaches to ensemble member selection include greedy search (Partalas, Tsoumakas & Vlahavas, 2012), which realizes the global best sub-set by taking locally optimal decisions when changing the current set; ensemble pruning in Re & Valentini (2012), which uses both statistical and semi-definite programming approaches; and the statistical ensemble method in Escolano et al. (2003), which uses resampling to estimate the accuracy of individual members and multiple comparisons to choose ensemble membership. Generally, it is agreed that no single approach fits all when it comes to ensemble member selection.

Related work on ensemble modeling includes the use of bagging and boosting (Belayneh et al., 2016; Opitz & Maclin, 1999). While both bagging and boosting are ensembles that aim to generate strong learners from weak learners, the major difference is that bagging averages predictions from multiple sub-sets of the training data with the aim of reducing variance, while boosting sequentially learns predictors, first from the whole data set and then on training sets informed by previous performance, with the aim of reducing bias. Apart from bagging and boosting, an increasingly common approach to model ensembling is stacking, which uses a meta/super learner to combine weak base predictors to reduce generalization error. Dzeroski & Zenko (2004) document stacking as having better performance compared to the selection of the best classifier. The study in Belayneh et al. (2016), for example, uses both bagging and boosting in drought prediction using wavelet transforms, while Ganguli & Reddy (2014) used the copula method on support vector regression (SVR) to simulate ensembles of drought forecasts. Common to both Belayneh et al. (2016) and Ganguli & Reddy (2014) is the use of a single drought index in the prediction of meteorological drought. On the contrary, the systems in Wardlow et al. (2012) and Tadesse et al. (2010), for example, use multiple indices in the forecasting of future vegetation conditions. Model ensembles can either be homogenous, thus based on the same technique (Adhikari & Agrawal), or heterogenous and hence multi-technique (Reid, 2007).

The objective of this study is to investigate the performance of both homogenous and heterogenous multi-variate model ensembles in the prediction of vegetation conditions one month ahead, as a proxy to drought conditions, using ANN and SVR as the chosen case study techniques. The homogenous and heterogenous model ensembles are realized using three different ensemble approaches: non-weighted averaging, weighted averaging and ANN-driven model stacking.

Material and Methods

Study Area
The study area (Figure 1), also documented in Adede et al. (2019), is located in Northern Kenya and covers four counties that are classified as arid counties within Kenya's arid and semi-arid lands (ASALs). The area is characterized by both low rainfall and low vegetation cover, with average normalized difference vegetation index (NDVI) values that remain low except for peaks in May and November. Average rainfall is comparable across Turkana, Marsabit and Mandera, and a little higher in Wajir. The region experiences a bimodal rainfall pattern, with the two seasons in March-May (MAM) and October-December (OND). Across the counties in the region, only a few months are considered wet.

Figure 1. The study area (right) and its location within Kenya. The inset (left) provides the location of Kenya in Africa, while the map of Kenya (center) shows the grouping of the Kenyan counties into arid and semi-arid lands (ASAL) and non-ASAL.
Pre-modelling

The Data

The study uses variables and indices derived from precipitation, vegetation, temperature and evapotranspiration datasets. The variables are grouped into three categories: vegetation, precipitation and water balance (influencer) datasets. The data used in the study covered the period March to December and are as provided in Table 1.

Table 1. The study base datasets: categories, sources and description.

| Category | Base dataset | Source | Description |
| --- | --- | --- | --- |
| Vegetation | Normalised Difference Vegetation Index (NDVI) | Land Processes Distributed Active Archive Center (LPDAAC), as documented in Didan (2015a) and Didan (2015b) | Combination of both MODIS Terra (MOD13Q1) and MODIS Aqua (MCD13Q1) using the Whittaker smoothing approach (Klisch & Atzberger) |
| Precipitation | Rainfall Estimates (RFE) | RFE from both Tropical Applications of Meteorology using SATellite (TAMSAT) and Climate Hazards Group InfraRed Precipitation (CHIRPS), as documented in Tarnavsky et al. (2014) and Funk et al. (2014) respectively | TAMSAT and CHIRPS products aggregated and spatially sub-set as in the BOKU system |
| Water Balance | Land Surface Temperature (LST) | LPDAAC (Wan, Hook & Hulley) | MODIS Terra Land Surface Temperature/Emissivity Day L3 Global SIN Grid V006 product (MOD11A2) |
| Water Balance | Evapotranspiration (EVT) | LPDAAC (Running, Mu & Zhao) | MODIS/Terra Net Evapotranspiration Day L4 Global SIN Grid V006 (MOD16A2) |
| Water Balance | Potential Evapotranspiration (PET) | LPDAAC (Running, Mu & Zhao) | MODIS/Terra Net Evapotranspiration Day L4 Global SIN Grid V006 (MOD16A2) |
| Water Balance | Standardized Precipitation-Evapotranspiration Index (SPEI) | SPEI Global Drought Monitor (Beguería et al., 2014) | The difference between precipitation and potential evapotranspiration, standardised and used like the standardised precipitation index (SPI) |

The variables

The above datasets are transformed into the indices/variables that are then used for the predictive study following two approaches: the relative range difference approach and the standardization approach. The relative range difference is calculated as shown in Equation (1). The standardization approach, on the other hand, involves fitting the base dataset to an appropriate probability distribution so that the mean of the transformed variable is zero (0) and the standard deviation is one (1). Both the relative range and standardised transformations are done at pixel level (c) for each time step (i) prior to aggregation.

RR_h(c, i) = 100 × [X(c, i) − MIN(c, i)] / [MAX(c, i) − MIN(c, i)]    (1)

where RR_h is the scaled relative range difference, X(c, i) the current value, MIN(c, i) the historical minimum and MAX(c, i) the historical maximum.
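Equation (1) can be sketched in a few lines; a minimal illustration, with the function name and scalar interface as assumptions (the study applies the transform per pixel across raster time series):

```python
def relative_range(x, hist_min, hist_max):
    """Scaled relative range difference of Equation (1): maps the current
    value x for a pixel onto a 0-100 scale relative to the historical
    minimum and maximum observed for that pixel."""
    if hist_max == hist_min:
        raise ValueError("degenerate history: MIN equals MAX")
    return 100.0 * (x - hist_min) / (hist_max - hist_min)

# e.g. an NDVI value of 0.45 against a pixel history spanning 0.30-0.60
vci = relative_range(0.45, 0.30, 0.60)  # roughly the middle of the range
```

Applied to NDVI with per-pixel historical extremes, this is the VCI rescaling used for the vegetation condition variables in Table 2.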
The variables/indices used in this predictive modelling study are derived from the datasets above and are described in Table 2.

Table 2. Variables used in the study to predict the vegetation condition index. Near infrared (NIR) and Red are the spectral reflectances in the near infrared and red spectral channels of the MODIS satellite.

| No | Variable | Description | Index Calculation |
| --- | --- | --- | --- |
| 1 | NDVIDekad | NDVI for the last dekad of the month | NDVI = (NIR − Red)/(NIR + Red) |
| 2 | VCIdekad | VCI for the last dekad of the month | Transformation of NDVI following Equation (1) |
| 3 | VCI1M | VCI aggregated over the month | Transformation of NDVI following Equation (1) |
| 4 | VCI3M | VCI aggregated over the last 3 months | Transformation of NDVI following Equation (1) |
| 5 | TAMSAT_RFE1M | TAMSAT Rainfall Estimate aggregated over the month | RFE from the TAMSAT product (in mm), Tarnavsky et al. (2014) |
| 6 | TAMSAT_RFE3M | TAMSAT Rainfall Estimate aggregated over the last 3 months | RFE from the TAMSAT product (in mm), Tarnavsky et al. (2014) |
| 7 | TAMSAT_RCI1M | TAMSAT Rainfall Condition Index aggregated over the last month | TAMSAT RFE calculated using Equation (1) |
| 8 | TAMSAT_RCI3M | TAMSAT Rainfall Condition Index aggregated over the last 3 months | TAMSAT RFE calculated using Equation (1) |
| 9 | TAMSAT_SPI1M | TAMSAT Standardized Precipitation Index aggregated over the last month | TAMSAT RFE transformed to a normal distribution (mean 0, standard deviation 1), WMO (2012) |
| 10 | TAMSAT_SPI3M | TAMSAT Standardized Precipitation Index aggregated over the last 3 months | TAMSAT RFE transformed to a normal distribution (mean 0, standard deviation 1), WMO (2012) |
| 11 | CHIRPS_RFE1M | CHIRPS Rainfall Estimate aggregated over the month | RFE from the CHIRPS product (in mm), Funk et al. (2014) |
| 12 | CHIRPS_RFE3M | CHIRPS Rainfall Estimate aggregated over the last 3 months | RFE from the CHIRPS product (in mm), Funk et al. (2014) |
| 13 | CHIRPS_RCI1M | CHIRPS Rainfall Condition Index aggregated over the last month | CHIRPS RFE calculated using Equation (1) |
| 14 | CHIRPS_RCI3M | CHIRPS Rainfall Condition Index aggregated over the last 3 months | CHIRPS RFE calculated using Equation (1) |
| 15 | CHIRPS_SPI1M | CHIRPS Standardized Precipitation Index aggregated over the last month | CHIRPS RFE transformed to a normal distribution (mean 0, standard deviation 1), WMO (2012) |
| 16 | CHIRPS_SPI3M | CHIRPS Standardized Precipitation Index aggregated over the last 3 months | Same as Index No. 15, aggregated over 3 months |
| 17 | LST1M | Land Surface Temperature aggregated over the month | Average LST over the last one month |
| 18 | EVT1M | Evapotranspiration aggregated over the month | Average MODIS EVT over the last one month |
| 19 | PET1M | Potential Evapotranspiration aggregated over the month | Average MODIS PET over the last one month |
| 20 | TCI1M | Temperature Condition Index aggregated over the month | MODIS LST transformed using Equation (1) |
| 21 | SPEI1M | Standardized Precipitation-Evapotranspiration Index aggregated over the month | Standardization of the difference between precipitation (P_i) and potential evapotranspiration (PET_i) using the log-logistic probability distribution |
| 22 | SPEI3M | Standardized Precipitation-Evapotranspiration Index aggregated over the last 3 months | Standardization of the difference between precipitation (P_i) and potential evapotranspiration (PET_i) using the log-logistic probability distribution |

Variables 1-4 are vegetation indices, while variables 5-16 are two sets of precipitation indices from TAMSAT (5-10) and CHIRPS (11-16) respectively. The study methodology is designed to select between TAMSAT and CHIRPS for the modeling process.
Variables 17-22 are commonly used together with the vegetation and precipitation indices in predictive drought modeling and are used in this study as influencer variables. The vegetation datasets in Table 1 are smoothed using a modified Whittaker smoothing algorithm and are directly sourced from the Institute for Surveying, Remote Sensing and Land Information, University of Natural Resources and Life Sciences (BOKU). The vegetation datasets are calculated at pixel level prior to aggregation in both time and space. The vegetation condition index (VCI) variables are calculated following the relative range formula in Equation (1). The precipitation datasets from both CHIRPS and TAMSAT are also calculated at pixel level prior to aggregation. While the RCI-based datasets follow Equation (1), the SPI variables (9, 10, 15 & 16) have the base dataset (RFE) fitted to a probability distribution and then transformed to a normal distribution so that the SPI has a mean of zero (0) and a standard deviation of one (1), as recommended by WMO (2012). The SPEI datasets are also standardised, following the log-logistic probability distribution (Beguería et al.; Vicente-Serrano et al.).

The learning scenario

To be able to develop predictive drought monitoring models, the phenomenon of drought needs to be well defined. From the literature review, drought has key characteristics in its definition, including a spatial coverage, a severity dimension and a temporal aspect. With these key concepts, we formulate the predictive drought monitoring problem using Equation (2):

D(i, j) = f(x_1, x_2, x_3, ..., x_n)    (2)

where D(i, j) is a quantification of drought severity (intensity) for a spatial extent i at time j, and f is a function that accepts a set of n (n ≥ 1) variables and transforms them to approximate the real value of drought severity D(i, j). The n variables x_1, x_2, x_3, ..., x_n are predictor variables that are used in drought monitoring. The learning scenario in this study therefore involves the need to define D(i, j) for all the four counties in the study area and to over-produce and select the appropriate f's for model ensembling. We interpret f as machine learning (ML) technique-generated functions using artificial neural networks (ANN) and support vector regression (SVR) as case study techniques.

Definition of drought (D(i, j))

Given that drought severity, D(i, j), cannot be directly quantified, the practice is to use proxy index(es) in its quantification.
The most common proxies used in the definition of drought D(i, j) are the standardised precipitation index (SPI) and the vegetation condition index (VCI), both of whose underlying data and variables are part of the study. Based on McKee et al. (1993), the SPI standardizes precipitation through an initial transformation and subsequent fitting to a normal distribution. The results are values typically between −3 and +3, with higher values indicating wet conditions. The VCI (Liu & Kogan; Klisch & Atzberger), on the other hand, is a relative range normalization of the NDVI that is generally scaled to the 0-100 range (both end points included). The occurrence of a new minimum or maximum will, however, result in values below 0 or beyond 100 respectively. The study considered both the SPI and the VCI approach to the definition of drought on the 3-month aggregation of the indices. From the properties of both the SPI and VCI described above, we chose the VCI as the basis of the definition of drought for three reasons. First is the ease of its interpretation, with a range of 0 to 100. Second is the fact that, as an index, it is used in the monitoring of agricultural drought, which is a later-stage drought compared to the meteorological drought indicated by the SPI. Third is the fact that it is a measured quantity, as opposed to the SPI which, for the case of this study, is a modelled quantity. The task of drought prediction is therefore expressed as the task of predicting future (1 month ahead) VCI values aggregated over three months (VCI3M) using lagged values of the study variables.

Methods
Variable selection

The study processed a duplicate set of precipitation variables, from both TAMSAT and CHIRPS. The selection of variables in this study was thus formulated as the selection between the TAMSAT and CHIRPS variables for the prediction of future drought intensity as defined by VCI3M. Multiple methods were used in the selection between the two datasets to quantify the relationship between the variables and vegetation conditions as quantified by VCI3M.
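One such relationship measure, Spearman's rank correlation, can be sketched in pure Python; this is an illustrative version with average ranks for ties, not the implementation used in the study:

```python
def rank(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average position, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks are used, a monotonically increasing but non-linear relationship (for example, a lagged rainfall index against VCI3M) still scores a perfect +1.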
Variable selection was, however, preceded by the establishment of the fitness for purpose of the methods through tests for normality using both density plots and the Shapiro-Wilk test. Subsequently, correlational analysis, step-wise regression, the Akaike information criterion (AIC), the relative importance (RImp) of variables and a modelling approach to variable selection were used to establish which of the TAMSAT and CHIRPS datasets to use in the modelling process.

Modelling methodology
With the drought prediction problem formulated as in Equation (2), the modelling methodology is reduced to the search for all the f's that approximate the drought severity D(i, j). This definition mirrors that of the generic machine learning (ML) problem and, in the context of model ensembling using the over-produce, select and combine approach, it equates to a model space search for all the functions (f's) from the set of all functions (F_M) deemed to approximate D(i, j) with some degree of accuracy, as measured by model performance metrics. The functions (f's) are, in the context of this study, from the families of artificial neural networks (ANN) and support vector regression (SVR). The above formulation gives rise to the following key concepts: the variable space (x_1, x_2, x_3, ..., x_n) corresponds to all the variables used to measure D(i, j), while the model space (F_M) is the set of all possible models that can be derived from all possible combinations of the variables deemed to measure D(i, j) using the machine learning technique M, which is either ANN or SVR.
Model building

Bagging, as a method of model ensembling, is used as a standard to build both the ANN and SVR models, in a setup in which the training dataset is resampled as indicated in the model building process presented in Figure 2. Not indicated in the model building process is the normalization of the variables prior to the building of the models, done to ensure the input variables were all at a comparable range. The main steps in the model building process (Figure 2) are sampling, actual model building and model performance evaluation for both the ANN and SVR techniques.
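The repeated-resampling evaluation at the heart of this process can be sketched as below; the ANN/SVR training call is abstracted behind a `fit` callback (a hypothetical stand-in), since the study's actual models are built with dedicated ML libraries:

```python
import random
import statistics

def r_squared(actual, predicted):
    """Coefficient of determination (R2) on held-out data."""
    mean_a = statistics.fmean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def bootstrap_evaluate(rows, target, fit, n_splits=30, train_frac=0.7, seed=1):
    """Repeated random 70:30 train/validation splits. `fit` stands in for
    the ANN or SVR training step and must return a predict(row) callable."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        idx = list(range(len(rows)))
        rng.shuffle(idx)
        cut = int(train_frac * len(rows))
        train_idx, valid_idx = idx[:cut], idx[cut:]
        predict = fit([rows[i] for i in train_idx],
                      [target[i] for i in train_idx])
        scores.append(r_squared([target[i] for i in valid_idx],
                                [predict(rows[i]) for i in valid_idx]))
    return statistics.fmean(scores)
```

In the study, each bootstrap iteration's performance is logged for later use in ensemble member selection; only the R² score is shown here.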
Figure 2: Model building process from model over-production to model selection for ensemble membership. The actual model building process using both ANN and SVR is preceded by a model space reduction process undertaken in two distinct steps: first, the formulation of assumptions and, second, the use of a cut-off criterion for models considered predictive enough to be included in the ensemble.
Model space reduction and ensemble membership

Following the selection between the TAMSAT and CHIRPS variables, the study has 16 variables. The initial cardinality of the modelling space, with VCI3M as the target variable, is massive, covering all models derivable from the lagged predictors. The distribution of the number of models for each given number of variables is as illustrated in Figure 3.

Figure 3. Model space of the drought severity prediction problem. The figure indicates the number of models of each length.

To achieve a reduction in the cardinality of the model space, we group the variables into categories and follow this up with assumptions. The variables are grouped into three categories: vegetation (VCI3M, NDVIDekad, VCI1M & VCIdekad), precipitation (RFE1M, RFE3M, RCI1M, RCI3M, SPI1M & SPI3M) and influencer variables (LST1M, EVT1M, PET1M, TCI1M, SPEI1M & SPEI3M).
The influencer variables either account for the impact of temperature on drought severity or indicate water balance as a function of both supply and demand. The key assumption made by the study is that, for clarity and interpretability of models, one variable of the same category is sufficient in a single model. This assumption was validated for soundness in Adede et al. (2019). The process of model space reduction is presented in Figure 4.

Figure 4. The model space reduction process.

The assumption of including only one variable of each type (precipitation, vegetation and water balance/influencer variables) massively reduces the set of possible models. All the models were built using both the ANN and SVR techniques. A further reduction was achieved by only retaining the models whose R² met a cut-off from both the ANN and SVR techniques. The final set was further reduced by two models from the ANN technique that we judged to have too great a loss in performance between training and validation and were thus overfit. Given that multi-model ensembles combine the predictions from different models, and the fact that there is a need to search for the smallest sub-set that is most predictive, we used the models from the ANN process as the basis for the selection of ensemble membership. The ensemble membership was realized through an experimental process. The experimental process involved the ranking of all the models by descending R², the iterative elimination of the lowest-ranked models in batches from the ensemble, and a recalculation of the performance of the ensemble as measured by R². The elimination stops when a reduction in performance is realized. After the recording of a drop in performance, the last batch is eliminated and its models added back one at a time, best first, retaining only those that do not degrade ensemble performance. The membership is then chosen as the smallest membership for which R² is greedily maximized.

Sampling process

To build the requisite set of models, the dataset (March to December) is partitioned into in-sample and out-of-sample data in an approach similar to that used in Adede et al. (2019) and Nay, Burchfield & Gilligan (2018).
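The batch-elimination-with-re-addition procedure for ensemble membership described above can be sketched as follows; the batch size and the `ensemble_score` callback are illustrative placeholders, since the study's actual batch size and ensemble scoring are tied to its model set:

```python
def select_members(models, ensemble_score, batch=5):
    """Greedy ensemble membership selection: rank models by descending R2,
    drop the weakest in batches while the ensemble score does not fall,
    then re-add dropped models one at a time, best first, keeping only
    those that do not degrade the score."""
    ranked = sorted(models, key=lambda m: m[1], reverse=True)
    current, best = list(ranked), ensemble_score(ranked)
    while len(current) > batch:
        trial = current[:-batch]
        score = ensemble_score(trial)
        if score < best:            # performance dropped: stop eliminating
            break
        current, best = trial, score
    for m in ranked:                # re-addition pass over dropped models
        if m in current:
            continue
        score = ensemble_score(current + [m])
        if score >= best:
            current, best = current + [m], score
    return current, best
```

Here each model is a (name, R²) pair; in practice `ensemble_score` would rebuild and re-evaluate the combined ensemble on validation data.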
The in-sample data is subsequently, repeatedly and randomly, split into training and validation datasets in the ratio of 70:30 for each iteration of the model building process. The partitioning of the dataset is as shown in Figure 5.

Figure 5. The in-sample (70:30) approach to model building and validation, with a separate (out-of-sample) dataset for model testing.

ANN and SVR model building process: model parameters and model assessment

The models formulated after all assumptions were accounted for were subjected to an automated brute-force process that built, assessed and logged the performance of each of the models, both for the ANN and for the SVR case study techniques. The choice of the model hyper-parameters for the ANN process followed the rule of thumb in Huang (2003), together with an experimental process for obtaining the appropriate number of hidden layers. The ANNs were trained using resilient backpropagation (RPROP). RPROP is documented to have faster convergence speed, accuracy and robustness without the requirement for parameter tuning, as documented in Riedmiller & Braun (1993) and in Chen & Lin (2011).
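RPROP adapts a separate step size per weight from the sign of successive gradients. A minimal sketch of the RPROP− update rule for a single weight follows; the hyper-parameter defaults are common choices from the literature, not necessarily the study's settings:

```python
def rprop_step(grad, prev_grad, step, weight,
               eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    """One RPROP- update for a single weight (after Riedmiller & Braun, 1993).
    Only the sign of the gradient is used: the per-weight step size grows
    when the gradient sign repeats and shrinks when it flips."""
    s = grad * prev_grad
    if s > 0:                       # same direction: accelerate
        step = min(step * eta_plus, step_max)
        weight -= step * (1 if grad > 0 else -1)
        prev_grad = grad
    elif s < 0:                     # sign flip: shrink step, skip update
        step = max(step * eta_minus, step_min)
        prev_grad = 0.0
    else:                           # first step or after a flip
        if grad != 0:
            weight -= step * (1 if grad > 0 else -1)
        prev_grad = grad
    return weight, step, prev_grad
```

For example, iterating this rule on the gradient 2w of the quadratic f(w) = w² drives w towards its minimum at 0 without any learning-rate tuning.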
For SVR, we did an initial run that realized the best performance across all the models at a single epsilon and cost parameter configuration. The search for the single optimal configuration for the SVR technique followed the grid search approach using the statistical computing software R. Model performance evaluation was done using the coefficient of determination (R²). Error measures such as the mean absolute error (MAE), root mean squared error (RMSE) and mean absolute percentage error (MAPE) were also calculated.

Model ensembling approaches
The different models built were selected and their scores combined in two distinct approaches: homogenous ensembles, in which only models from one technique are combined, and heterogenous ensembles, which combine the outputs from both the ANN and SVR techniques.
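Both ensemble families combine member predictions with one of the three schemes detailed below (linear averaging, rank weighted averaging, model stacking). A sketch of the two averaging rules, with weights min-max scaled from each member's evaluation score and normalized to sum to one (function names illustrative):

```python
def linear_average(preds):
    """Non-weighted mean of member predictions."""
    return sum(preds) / len(preds)

def rank_weighted_average(preds, scores):
    """Members weighted by their evaluation score (e.g. R2), min-max
    scaled and normalized to sum to 1. Degenerate when all scores are
    equal, in which case the plain mean is returned."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return linear_average(preds)
    w = [(s - lo) / (hi - lo) for s in scores]   # stretch between 0 and 1
    total = sum(w)
    w = [x / total for x in w]                   # normalize to sum to 1
    return sum(p * x for p, x in zip(preds, w))
```

Note that min-max scaling assigns the weakest member a weight of zero, so rank weighting implicitly drops the worst model; stacking instead learns the weights with a meta-learner.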
For both the homogenous and heterogenous ensembles, three methods of ensembling were investigated including linear averaging (non ‐ weighted), rank weighted averaging and model stacking. While linear averaging assumes similar weights for the individual models, weighted averaging uses the performance of the individual models in evaluation to assign weights on their prediction on the out ‐ of ‐ sample data. The linear average approach and the rank weighted approach follow the equations in and respectively. (cid:2869)(cid:3041) ∑ (cid:1868) (cid:3036)(cid:3041)(cid:3036)(cid:2880)(cid:2869) …………. (3) where (cid:1868) (cid:3036) as the prediction from the i th model and (cid:1866) is the number of models in the ensemble (cid:2869)(cid:2924) ∑ p (cid:2919) ∗ w (cid:2919)(cid:2924)(cid:2919)(cid:2880)(cid:2869) …………. (4) Where w (cid:2919) is the normalized weight for each model in the ensemble such that ∑ w (cid:2919)(cid:2924)(cid:2919)(cid:2880)(cid:2869) (cid:3404) 1. The weights are therefore stretched between and centred around min with the max as the scale. The model stacking approach, however, builds a meta ‐ model using the ANN process to learn weights for the individual models using a perceptron learning approach. Performance evaluation of the model ensembles The process for evaluating the performance of the model ensembles had the following steps ‐ (1) the selection of a common measure of performance; (2) the identification of the base models for comparison and; (3) scenarios of performance of the best individual models as compared to model ensembles. Performance in regression was evaluated using R while performance in classification was evaluated using both accuracy and area under the receiver operating characteristic curve (AUROC). The best models from both the
ANN and
SVR processes were used as the base models. Performance in classification was analyzed following the five (5) vegetation deficit classes defined on VCI3M as used in Klisch,
Atzberger & Luminari (2015),
Klisch & Atzberger (2016),
Meroni et al. (2019) and Adede et al. (2019). This classification is presented in the table below.
Table: Drought classes using the approach in Klisch, Atzberger & Luminari (2015), Klisch & Atzberger (2016), Meroni et al. (2019) and Adede et al. (2019).
VCI3M Lower Limit | VCI3M Upper Limit | Description of Class
- | <10 | Extreme vegetation deficit
≥10 | <20 | Severe vegetation deficit
≥20 | <35 | Moderate vegetation deficit
≥35 | <50 | Normal vegetation conditions
≥50 | - | Above normal vegetation conditions

Results and
Discussion
Selection between
TAMSAT & CHIRPS
The choice between
TAMSAT and
CHIRPS was undertaken using multiple methods.
While the
SPI variables are normalized by construction, the other variables were tested for normality using the Shapiro-Wilk test.
The Shapiro-Wilk test results are as provided in the table below.
Table: Shapiro-Wilk test on the normalized CHIRPS and TAMSAT datasets.
No | Variable | p-value
1. TAMSAT_RFE1M
2. CHIRPS_RFE1M
3. TAMSAT_RFE3M
4. CHIRPS_RFE3M
5. TAMSAT_RCI1M
6. CHIRPS_RCI1M
7. TAMSAT_RCI3M
8. CHIRPS_RCI3M
With the null hypothesis corresponding to the sample being drawn from a normally distributed population, we reject normality for all cases where the p-value < α = 0.05. The non-SPI variables are not normally distributed.
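The normality screening above can be reproduced with SciPy's Shapiro-Wilk implementation. This is a minimal sketch: the series below are synthetic placeholders standing in for the study's TAMSAT/CHIRPS indicators, not its actual data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Placeholder series standing in for the rainfall indicators; rainfall totals
# are typically skewed (gamma-like) while SPI is normalized by construction.
indicators = {
    "TAMSAT_RFE1M": rng.gamma(shape=2.0, scale=30.0, size=200),  # skewed, rain-like
    "CHIRPS_RFE1M": rng.gamma(shape=2.0, scale=25.0, size=200),
    "TAMSAT_SPI1M": rng.normal(loc=0.0, scale=1.0, size=200),    # SPI is normalized
}

alpha = 0.05
for name, series in indicators.items():
    stat, p_value = stats.shapiro(series)
    # Null hypothesis: the sample is drawn from a normal population.
    verdict = "reject normality" if p_value < alpha else "cannot reject normality"
    print(f"{name}: W={stat:.3f}, p={p_value:.4f} -> {verdict}")
```

Rejecting the null for the raw rainfall series while failing to reject it for SPI mirrors the mix of distributions reported above.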
Given the mix of normally distributed and non-normally distributed variables, the choice of methods for analysis therefore used non-parametric methods. The Spearman's correlation on the 1-month lag of the variables returned the results in the table below for each of the indicators from the TAMSAT and CHIRPS datasets.
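Such a rank-based screening can be computed with `scipy.stats.spearmanr`; the arrays here are illustrative placeholders for a 1-month-lagged rainfall indicator and the VCI3M response, not the study's values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder stand-ins: a lagged rainfall indicator and the VCI3M response,
# linked by a noisy monotone relationship.
rfe3m_lag1 = rng.gamma(shape=2.0, scale=30.0, size=120)
vci3m = 20.0 + 0.4 * rfe3m_lag1 + rng.normal(0.0, 5.0, size=120)

rho, p_value = stats.spearmanr(rfe3m_lag1, vci3m)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.4f})")

# Spearman works on ranks, so any monotone transform leaves rho unchanged:
rho_log, _ = stats.spearmanr(np.log(rfe3m_lag1), vci3m)
assert np.isclose(rho, rho_log)
```

Rank invariance is the reason Spearman's correlation is appropriate here despite the non-normal rainfall distributions.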
Table: Spearman's correlation of VCI3M against the 1-month lag of TAMSAT/CHIRPS variables.
Dataset | RFE1M | RFE3M | RCI1M | RCI3M | SPI1M | SPI3M | Mean
TAMSAT
CHIRPS
The TAMSAT variables are more highly correlated with drought severity than the CHIRPS variables, except for RCI1M.
Using bi ‐ directional step ‐ wise regression, the results offer a mixed case in which four variables were selected: TAMSAT_SPI3M_lag1,
CHIRPS_RCI3M_lag1,
CHIRPS_SPI3M_lag1 and
TAMSAT_RCI1M_lag1.
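Bi-directional (forward-backward) stepwise selection of this kind can be sketched with an AIC-driven loop. This is an illustrative implementation, not the study's exact procedure: the variable names and synthetic data are placeholders, and AIC is computed from an ordinary least-squares fit under a Gaussian likelihood.

```python
import numpy as np

def ols_aic(X, y):
    # AIC of a least-squares fit under a Gaussian likelihood (constants
    # dropped): n*ln(RSS/n) + 2k, where k counts the intercept and slopes.
    n = len(y)
    A = np.column_stack([np.ones(n)] + list(X)) if len(X) else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + 2 * A.shape[1]

def stepwise_aic(candidates, y):
    """Bi-directional stepwise selection: add or drop variables while AIC improves."""
    selected, current = [], ols_aic([], y)
    improved = True
    while improved:
        improved = False
        # Forward step: try adding each remaining candidate variable.
        for name in candidates:
            if name in selected:
                continue
            if (aic := ols_aic([candidates[v] for v in selected + [name]], y)) < current:
                selected.append(name); current = aic; improved = True
        # Backward step: try dropping each currently selected variable.
        for name in list(selected):
            kept = [v for v in selected if v != name]
            if (aic := ols_aic([candidates[v] for v in kept], y)) < current:
                selected = kept; current = aic; improved = True
    return selected, current

rng = np.random.default_rng(1)
x1, x2, x3 = rng.normal(size=(3, 150))               # stand-ins for lagged indicators
y = 2.0 * x1 - 1.5 * x2 + rng.normal(0.0, 0.5, 150)  # x3 carries no signal
vars_, aic = stepwise_aic({"SPI3M_lag1": x1, "RCI3M_lag1": x2, "RCI1M_lag1": x3}, y)
print(vars_, round(aic, 1))
```

The AIC penalty of 2 per parameter is what keeps uninformative candidates out of the final selection.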
Though TAMSAT_SPI3M ranks high, the CHIRPS variables are also competitive. TAMSAT ranks consistently higher on SPI, which is widely used for drought monitoring. The
Akaike information criterion (AIC), generally used to estimate the quality of a model relative to others, had the 3-month aggregates of both SPI and
RCI from
TAMSAT as the best predictors. These results were confirmed by the relative importance of variables as partitioned by R², which also ranked
SPI3M and
RCI3M from
TAMSAT ahead of the other variables. A final selection of variables using SVR and generalized additive model (GAM) techniques posted the results in the figure below. The SVR model generally outperforms the GAM model for each of the variables except for TAMSAT_RCI3M and TAMSAT_SPI1M, where similar performance is realized for the two models. The top performers in each case are two TAMSAT variables.
Figure: R² for SVR and GAM models for variable selection.
From the multiple selection metrics, even though both precipitation datasets were competitive in drought prediction, TAMSAT generally produced better-ranked variables and was the dataset chosen for model building.

Multi-collinearity of predictor variables
An investigation of possible multi-collinearity between the pairs of the predictor variables is provided in the figure below.
Figure: The collinearity matrix for the X (predictor) variables.
From the correlation matrix, the following are observed: (1) there is relatively high correlation between the vegetation datasets (for example between VCI1M and NDVIDekad), so the assumption not to use multiple vegetation variables in the same model is justified; (2) SPI and RCI are highly correlated, even though, in general, pairings of precipitation data could be used in the same model; (3) the modifier variables of LST, EVT, PET, TCI and SPEI have acceptable correlation coefficients with the vegetation and precipitation pairings.
Grouping the variables in the modeling process into precipitation, vegetation and water balance (modifier) variables is thus a sound assumption.

Correlation between predicted variable and all predictors
The drought intensity is formulated based on VCI3M, with higher values essentially indicating lower severity. As indicated in the table below, the variables TCI, LST and PET are negatively correlated with VCI3M, while all others have a positive correlation with VCI3M.
With the assumption not to have two variables of the same type in a model, it is clear that the problem of multi-collinearity is avoided, given that the lags of the variables highly correlated with VCI3M, i.e. VCI3M, VCI1M and VCIdekad lagged by one month, are not used together in the same model.
Table: Correlation between non-precipitation data and VCI3M.
Variable | Correlation with drought severity (VCI3M)
TCI1M_lag1 | negative
LST1M_lag1 | negative
PET1M_lag1 | negative
NDVIDekad_lag1 | positive
SPEI1M_lag1 | positive
RFE1M | positive
SPEI3M_lag1 | positive
RCI1M | positive
SPI1M | positive
RFE3M | positive
EVT1M_lag1 | positive
RCI3M | positive
SPI3M | positive
VCI3M_lag1 | positive
VCI1M_lag1 | positive
VCIdekad_lag1 | positive
Performance of ANN in model training
Models from the different combinations of the study variables were subjected to training using both the ANN and SVR techniques to predict values of VCI3M one month ahead. For the ANN process, a share of the models were found to have been overfit, as indicated by the performance of the ANN models in the figure below.
Figure: Performance of ANN models in the prediction of VCI3M, grouped by descending performance (R²).
Of the models subjected to the model training process, 59.43% have an R² at or above the cut-off in the validation dataset. Interestingly, overfitting is much less of a problem amongst the models above the cut-off, as only a small share of them are considered to suffer over-fitting. The top models, ordered by decreasing R² in the validation dataset, are presented in the table below.
Table: Performance of the top ANN models in training and validation, ordered by descending R² in the validation dataset; a trailing "(-)" marks a negative overfit index.
No. | Model (with R² Training, R² Validation, Overfit Index and Overfit flag)
1. VCI3M_lag1 + TAMSAT_RCI3M_lag1 + SPEI1M_lag1
2. VCIdekad_lag1 + TAMSAT_RFE1M_lag1 + TCI1M_lag1 (-)
3. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + TCI1M_lag1 (-)
4. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + TCI1M_lag1 (-)
5. VCI1M_lag1 + TAMSAT_RCI3M_lag1 + TCI1M_lag1 (-)
6. VCI3M_lag1 + TAMSAT_SPI3M_lag1 + SPEI1M_lag1
7. VCI1M_lag1 + TAMSAT_SPI3M_lag1 + TCI1M_lag1 (-)
8. VCIdekad_lag1 + TAMSAT_SPI1M_lag1 + PET1M_lag1 (-)
9. VCIdekad_lag1 + TAMSAT_SPI1M_lag1 + TCI1M_lag1 (-)
10. VCI1M_lag1 + TAMSAT_SPI3M_lag1 + PET1M_lag1 (-)
11. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + LST1M_lag1 (-)
12. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + PET1M_lag1 (-)
13. VCIdekad_lag1 + TAMSAT_RFE3M_lag1 + TCI1M_lag1 (-)
14. VCI1M_lag1 + TAMSAT_RFE1M_lag1 + TCI1M_lag1 (-)
15. VCI1M_lag1 + TAMSAT_SPI3M_lag1 + LST1M_lag1 (-)
16. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + PET1M_lag1 (-)
17. VCI1M_lag1 + TAMSAT_RFE3M_lag1 + TCI1M_lag1 (-)
18. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + SPEI1M_lag1 (-)
19. VCI3M_lag1 + TAMSAT_RCI3M_lag1 + SPEI3M_lag1
20. VCIdekad_lag1 + TAMSAT_SPI1M_lag1 + LST1M_lag1 (-)
21. VCI1M_lag1 + TAMSAT_SPI1M_lag1 + PET1M_lag1 (-)
22. VCI1M_lag1 + TAMSAT_SPI1M_lag1 + TCI1M_lag1 (-)
23. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + LST1M_lag1 (-)
24. VCI1M_lag1 + TAMSAT_RCI3M_lag1 + PET1M_lag1 (-)
25. VCIdekad_lag1 + TAMSAT_RCI1M_lag1 + TCI1M_lag1 (-)
26. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + SPEI1M_lag1 (-)
27. VCIdekad_lag1 + TAMSAT_RFE1M_lag1 + LST1M_lag1 (-)
28. VCI1M_lag1 + TAMSAT_RCI3M_lag1 + LST1M_lag1 (-)
29. VCI1M_lag1 + TAMSAT_RFE1M_lag1 + LST1M_lag1 (-)
30. VCIdekad_lag1 + TAMSAT_RFE1M_lag1 + SPEI1M_lag1 (-)
The top models are noted to have no case of model over-fitting. In fact, 10% of the top models are considered underfit and therefore performed better in the validation dataset than in the training dataset. An extended analysis shows this trend replicated among the top models, which have no occurrence of model over-fitting, with a share of them actually underfit.

Performance of SVR in model training
When subjected to the SVR process, 59.43% of the models have an R² at or above the cut-off in the validation dataset. This performance is comparable to that of the ANN technique. In terms of model overfitting, 8.61% of the models are overfit, as shown in the figure below.
Figure: Performance of SVR models in the prediction of VCI3M, grouped by descending performance (R²).
Like the case for ANN, there is no occurrence of overfitting in the top models ordered by descending R² in the validation dataset. In fact, the occurrence of overfitting for the SVR technique is confined to the lower-performing models. In a setting where the selection of models is based on the R² cut-off, the problem of model over-fitting would therefore be confined to the ANN technique rather than the SVR technique.
The tendency of ANN to suffer over-fitting as performance increases is, for example, documented in Mitchell (1997).
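The tables report an "Overfit Index" for each model without stating its formula. One plausible proxy, used purely for illustration here (the index definition, tolerance and R² values below are assumptions, not taken from the paper), is the gap between training and validation R²:

```python
def overfit_status(r2_train: float, r2_val: float, tol: float = 0.05) -> str:
    """Classify a model from its train/validation R-squared gap.

    A positive gap (training much better than validation) suggests
    over-fitting; a negative gap suggests under-fitting. `tol` is an
    assumed tolerance, not a threshold from the paper.
    """
    gap = r2_train - r2_val
    if gap > tol:
        return "overfit"
    if gap < 0:
        return "underfit"
    return "ok"

# Illustrative (made-up) train/validation scores for two model formulas.
models = [
    ("VCI3M_lag1 + TAMSAT_RCI3M_lag1 + SPEI1M_lag1", 0.93, 0.94),
    ("VCIdekad_lag1 + TAMSAT_RFE1M_lag1 + TCI1M_lag1", 0.95, 0.85),
]
for formula, r2_tr, r2_va in models:
    print(formula, "->", overfit_status(r2_tr, r2_va))
```

Under this proxy, a negative index (better validation than training performance) corresponds to the "underfit" cases discussed above.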
Like in the case of the ANN technique, we present in the table below the performance, in both training and validation, of the top SVR models ordered by descending R² in the validation dataset.
Table: Performance of the top SVR models in training and validation, ordered by descending R² in the validation dataset; a trailing "(-)" marks a negative overfit index.
No. | Model (with R² Training, R² Validation, Overfit Index and Overfit flag)
1. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + TCI1M_lag1
2. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + TCI1M_lag1
3. VCI3M_lag1 + TAMSAT_RCI3M_lag1 + SPEI1M_lag1
4. VCI3M_lag1 + TAMSAT_SPI3M_lag1 + SPEI1M_lag1
5. VCI1M_lag1 + TAMSAT_SPI3M_lag1 + TCI1M_lag1
6. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + PET1M_lag1
7. VCI1M_lag1 + TAMSAT_RCI3M_lag1 + TCI1M_lag1
8. VCIdekad_lag1 + TAMSAT_SPI1M_lag1 + TCI1M_lag1
9. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + LST1M_lag1
10. VCIdekad_lag1 + TAMSAT_RFE1M_lag1 + TCI1M_lag1
11. VCI1M_lag1 + TAMSAT_SPI3M_lag1 + PET1M_lag1
12. VCI1M_lag1 + TAMSAT_SPI3M_lag1 + LST1M_lag1
13. VCIdekad_lag1 + TAMSAT_SPI1M_lag1 + PET1M_lag1
14. VCI1M_lag1 + TAMSAT_RFE1M_lag1 + TCI1M_lag1 (-)
15. VCI1M_lag1 + TAMSAT_SPI1M_lag1 + TCI1M_lag1 (-)
16. VCIdekad_lag1 + TAMSAT_RCI1M_lag1 + TCI1M_lag1
17. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + LST1M_lag1
18. VCIdekad_lag1 + TAMSAT_RFE3M_lag1 + TCI1M_lag1
19. VCIdekad_lag1 + TAMSAT_RFE1M_lag1 + LST1M_lag1
20. VCIdekad_lag1 + TAMSAT_SPI1M_lag1 + LST1M_lag1
21. VCI3M_lag1 + TAMSAT_RCI3M_lag1 + TCI1M_lag1
22. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + SPEI1M_lag1
23. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + SPEI1M_lag1
24. VCIdekad_lag1 + TAMSAT_RCI3M_lag1 + PET1M_lag1
25. VCI1M_lag1 + TAMSAT_RCI3M_lag1 + LST1M_lag1
26. VCI1M_lag1 + TAMSAT_SPI1M_lag1 + PET1M_lag1
27. VCI3M_lag1 + TAMSAT_RCI3M_lag1 + SPEI3M_lag1
28. VCI1M_lag1 + TAMSAT_RFE3M_lag1 + TCI1M_lag1 (-)
29. VCIdekad_lag1 + TAMSAT_SPI3M_lag1 + EVT1M_lag1
30. VCI1M_lag1 + TAMSAT_RCI1M_lag1 + TCI1M_lag1

Comparative performance of the ANN & SVR techniques
The analysis of the performance of the pairings of the ANN and SVR models is presented in the figure below. The ANN and SVR techniques turned out competitive in model validation: 52% of the pairings posted similar performance, with ANN outperforming SVR in 43% of the pairings and SVR outperforming ANN in 5%.
Figure: Performance difference between ANN and SVR model pairings.
The analysis of the competitiveness of the ANN and SVR techniques is done using the summary statistics of minimum, maximum, average and range on the models that have an R² at or above the cut-off and are not overfit from the ANN process. As indicated in the table below, the techniques are quite competitive in this set of models.
Table: Summary of performance (R²) for each technique in model validation.
Technique | Min | Max | Average | Range
SVR
ANN
Given that no models with an R² at or above the cut-off were overfit from the SVR process, the choice of models for model ensembling was therefore a function of the models from the ANN process.
The selection of the appropriate models for ensemble membership was thus from the ANN models paired with the corresponding SVR models of the same formula.

Selection of ensemble membership
Ensemble membership selection, also called model pruning, is the selection phase of model ensembling. The reasons for model selection documented in Mendes-Moreira et al. (2012) are to reduce computational costs, to increase prediction accuracy where possible, and to avoid the multi-collinearity problem. From the models that had an R² at or above the cut-off and were not overfit, the construction of the ensemble membership faced two questions: first, whether all the models were needed for the ensemble and, second, whether there existed a smaller ensemble size that would perform as well as, if not better than, the full set. This question was answered following the experimental process described in the methodology section, whose results are plotted in the figure below.
Figure: Ensemble membership selection showing the reduction in the number of models in the ensemble using the backward-forward selection procedure. The models are eliminated in batches, but in instances where a drop in performance is realized, a forward selection beginning with the last smallest ensemble size is done by adding one model at a time (orange lines in the plot).
The elimination of the ranked models in batches and the forward selection in single units realizes the smallest best-performing ensemble membership. For the ensemble membership, we opted for the smaller size due to the trade-off between size and performance. This guarantees a reduced computational complexity associated with the ensemble size choice whilst not losing performance.

Performance of ANN and
SVR champion models in out-of-sample data
In training, two different models are identified as the champion for each of the ANN and SVR techniques. Both the ANN and SVR champion models posted comparable R² values in the validation dataset, and their performance in training was likewise comparable.
The performance of the ANN and SVR processes in the out-of-sample dataset contradicts the performance in model training: the ANN champion posted a relatively stable performance (a 4% loss in performance) while the SVR champion loses more performance. This is despite the fact that a new SVR model arises with a higher performance than the SVR champion model.
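The out-of-sample R² per county can be computed directly from predictions. A minimal sketch follows; the county series and forecast values are placeholders, not the study's data.

```python
import numpy as np

def r_squared(actual, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(7)
# Placeholder test set: 36 months per county in the study area.
counties = np.repeat(["Mandera", "Marsabit", "Turkana", "Wajir"], 36)
actual = rng.uniform(5, 80, size=counties.size)            # placeholder VCI3M
predicted = actual + rng.normal(0, 4, size=counties.size)  # placeholder forecasts

overall = r_squared(actual, predicted)
by_county = {c: r_squared(actual[counties == c], predicted[counties == c])
             for c in np.unique(counties)}
print(f"overall R2 = {overall:.2f}", {c: round(v, 2) for c, v in by_county.items()})
```

Evaluating both overall and per county, as in the table below, is what reveals cases such as SVR outperforming ANN in a single county.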
The performance in the test data at county level for both the ANN and the SVR champions is as provided in the table below.
Table: Performance (R²) of the champion models for each of the counties in the study area.
Model | Mandera | Marsabit | Turkana | Wajir
ANN
SVR
The results at county level indicate the consistency of the ANN champion in outperforming the SVR champion, except for Turkana county where SVR outperforms ANN.
Given that the models built are non-county specific, the performance of the best ANN and SVR models across the counties remains acceptable for both techniques.
The study therefore uses the R² values from the table above as the baseline performance of the best-model approach to model selection. These values will be used as the basis for comparison with the performance of both the homogenous and heterogenous model ensembles.

Performance of homogenous model ensembles
From the models selected for the investigation of model ensembles, we built homogenous ensembles of ANN and SVR independently.
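The simple and rank-weighted averaging of equations (3) and (4) reduce to a few lines. The member predictions and validation scores below are placeholders chosen for illustration:

```python
import numpy as np

def simple_average(preds):
    # Equation (3): equal weights across the n member models.
    return np.mean(preds, axis=0)

def rank_weighted_average(preds, val_scores):
    # Equation (4): min-max scale the validation scores to [0, 1], then
    # normalise so the weights sum to 1 before taking the weighted sum.
    s = np.asarray(val_scores, float)
    stretched = (s - s.min()) / (s.max() - s.min())
    w = stretched / stretched.sum()
    return np.tensordot(w, np.asarray(preds, float), axes=1)

# Three member models predicting VCI3M for four test points (placeholders).
preds = np.array([
    [42.0, 55.0, 30.0, 61.0],
    [40.0, 52.0, 33.0, 64.0],
    [45.0, 58.0, 28.0, 60.0],
])
val_r2 = [0.90, 0.85, 0.88]  # each member's validation R^2 (illustrative)

print(simple_average(preds))
print(rank_weighted_average(preds, val_r2))
```

Note that min-max stretching assigns the worst-scoring member a weight of zero, so the rank-weighted ensemble effectively discards it; this matches the "stretched" weighting described in the methods.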
The performance of the homogenous ANN and SVR ensembles was then investigated in both regression and classification.

Performance of homogenous ensembles in regression
The performance of the homogenous model ensembles in regression is provided in the tables below for the ANN and SVR techniques respectively. For each technique, results from the three ensembling approaches of non-weighted, weighted and stacked are presented.
Table:
Performance (R²) of the ANN homogenous model ensembles for each county, with results from the non-weighted, weighted and stacked approaches to model ensembling.
Approach | Mandera | Marsabit | Turkana | Wajir | Overall
ANN Champion
ANN Homogenous Simple Average
ANN Homogenous Weighted Average
ANN Homogenous Stacked

Table: Performance (R²) of the SVR homogenous model ensembles for each county, with results from the non-weighted, weighted and stacked approaches to model ensembling.
Approach | Mandera | Marsabit | Turkana | Wajir | Overall
SVR Champion
SVR Homogenous Simple Average
SVR Homogenous Weighted Average
SVR Homogenous Stacked
For both the ANN and SVR techniques, the model stacking approach to the formulation of model ensembles offers the best improvement as compared to the best-model approach. In general, across the counties, the weighted together with the non-weighted approaches to model ensembling provide competitive performance relative to the best-model approach, though the weighted approach posts mixed results when compared to the non-weighted approach. For both the weighted averaging and the simple averaging approaches, the performance remains competitive as compared to the best-model approach, although cases of loss of performance are also recorded, as with the SVR ensemble for Turkana county and the simple average for Mandera county.
Performance of homogenous ensembles in classification
The summary of the performance of both the ANN and SVR homogenous ensembles is presented using model accuracy in the tables below.
Table: Classification accuracy for the ANN homogenous ensembles.
Approach | Mandera | Marsabit | Turkana | Wajir | Overall
ANN Champion
ANN Homogenous Simple Average
ANN Homogenous Weighted Average
ANN Homogenous Stacked

Table: Classification accuracy for the SVR homogenous ensembles.
Approach | Mandera | Marsabit | Turkana | Wajir | Overall
SVR Champion
SVR Homogenous Simple Average
SVR Homogenous Weighted Average
SVR Homogenous Stacked
Using the three approaches to the building of model ensembles, for both the ANN and SVR techniques, it is clear that the homogenous ensembles are superior to the traditional champion-model selection approach. The simple averaging (non-weighted) approach together with the weighted averaging approach are noted to offer improved performance for the homogenous ANN model ensembles. In the case of the SVR homogenous model ensembles, even though performance gains are recorded as compared to the base champion model, simple non-weighted averaging loses performance, since there existed alternative models to the champion with higher performance.

Performance of heterogenous model ensembles
The performance of the heterogenous model ensembles of ANN and SVR was assessed both in regression and classification. Like in the case of homogenous models, we use the champion models (ANN & SVR) as the base models for the evaluation. Given that the predictions were averaged across the two techniques, the ensemble outputs were still the average of the member models for each input data point in the test data.

Performance of heterogenous ensembles in regression
The performance of the heterogenous models is presented at county level in the table below, with the champion ANN and champion
SVR as the base models.
Table: Performance (R²) of the heterogenous model ensembles for each county, with results from the non-weighted, weighted and stacked approaches to model ensembling.
Approach | Mandera | Marsabit | Turkana | Wajir | Overall
ANN Champion
SVR Champion
Heterogenous Simple Average
Heterogenous Weighted Average
Heterogenous Stacked
With the champion model approach as the benchmark, loss of performance in regression is recorded for Mandera and Wajir counties for simple averaging and weighted averaging respectively, with performance in Turkana considered relatively stable for these two approaches.
The heterogenous stacked approach in regression is seen to offer the best improvement in performance across each of the counties and on the entire dataset. The improvement in performance by R² using the heterogenous stacked approach with the ANN champion as the base ranges from the smallest in Turkana to the largest in Wajir.
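Model stacking trains a meta-learner on the members' predictions. The study's meta-model is an ANN; this sketch approximates it with a single linear perceptron trained by gradient descent on squared error, and all data here are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

# Placeholder target and the predictions of two member models
# (one ANN-like, one SVR-like) for 200 samples.
target = rng.uniform(5, 80, size=200)
member_preds = np.stack([
    target + rng.normal(0, 6, 200),  # noisier member
    target + rng.normal(0, 5, 200),  # slightly more accurate member
])

X = member_preds.T        # meta-features: one column per member model
w = np.zeros(X.shape[1])  # perceptron weights, learned below
b = 0.0
lr = 1e-4
for _ in range(2000):     # plain gradient descent on mean squared error
    err = X @ w + b - target
    w -= lr * (X.T @ err) / len(target)
    b -= lr * err.mean()

stacked = X @ w + b
mse_members = [float(np.mean((m - target) ** 2)) for m in member_preds]
mse_stacked = float(np.mean((stacked - target) ** 2))
print("member MSEs:", [round(m, 1) for m in mse_members],
      "stacked MSE:", round(mse_stacked, 1))
```

Because the members' errors are independent, the learned combination achieves a lower error than either member alone, which is the mechanism behind the stacked ensemble's gains reported above.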
Performance of heterogenous ensembles in classification
The classification accuracy of the heterogenous ensemble as compared to that of the champion models is presented in the table below.
Table: Classification accuracy of the heterogenous ensemble.
Approach | Mandera | Marsabit | Turkana | Wajir | Overall
ANN Champion
SVR Champion
Heterogenous Simple Average
Heterogenous Weighted Average
Heterogenous Stacked
The accuracy of the heterogenous model ensembles in classification shows quite an overall improvement in performance for both the ANN and
SVR processes respectively.
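Classification accuracy here follows from mapping both the observed and the predicted VCI3M onto the five vegetation deficit classes. A sketch using the thresholds of the drought classification table (the month series below are placeholders):

```python
def vci3m_class(v: float) -> str:
    # Thresholds follow the drought classification used in the study
    # (Klisch & Atzberger, 2016): <10 extreme ... >=50 above normal.
    if v < 10:
        return "Extreme vegetation deficit"
    if v < 20:
        return "Severe vegetation deficit"
    if v < 35:
        return "Moderate vegetation deficit"
    if v < 50:
        return "Normal vegetation conditions"
    return "Above normal vegetation conditions"

def class_accuracy(actual, predicted):
    # Share of months where the predicted VCI3M falls in the observed class.
    hits = sum(vci3m_class(a) == vci3m_class(p) for a, p in zip(actual, predicted))
    return hits / len(actual)

# Placeholder month series for one county.
actual = [8.0, 18.0, 25.0, 41.0, 62.0, 33.0]
predicted = [12.0, 17.0, 28.0, 43.0, 55.0, 36.0]
print(vci3m_class(8.0), "|", f"accuracy = {class_accuracy(actual, predicted):.2f}")
```

Note that a regression forecast can be numerically close yet still miss the class when the true value sits near a threshold, which is why classification accuracy is reported alongside R².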
For Turkana county, the SVR champion however outperforms the ANN champion and the three approaches to homogenous model ensembling.

Further analysis of the performance of the heterogenous stacked ensembles
Given the superiority of the stacking approach to model ensembling as applied to heterogenous ensembles, we present a further analysis of their performance, both in regression and in classification, on the out-of-sample data. The performance of the heterogenous stacked approach in regression is presented in the figure below for each of the data points in the out-of-sample datasets of the counties in the study area.
Figure: Plot of the actual values of VCI3M versus the values predicted one month ahead from the heterogenous stacked ensembles in the test data for (a) Mandera (R²=0.94); (b) Marsabit (R²=0.94); (c) Turkana (R²=0.91) and (d) Wajir (R²=0.96).
The heterogenous stacked model, as indicated in the figure, posts quite a good agreement between the measured vegetation conditions and the predicted values, with an R² of between 0.91 and 0.96 across the counties. In classification, the stacking approach on the heterogenous models realizes the performance presented earlier. The month-by-month performance of the heterogenous stacked classifier is presented in the figure below.
Figure: Performance of the heterogenous ensemble classifier for each of the counties, showing months of difference in grey and those of agreement in blue. Predictions are done one month ahead.
The months in blue indicate when the class prediction is correct, while grey indicates incorrect predictions. The heterogenous stacked ensemble accuracy ranges from a best in Marsabit to a minimum in Mandera.
Clearly, the performance of the heterogenous stacked classifiers, in terms of overall accuracy, is superior to that of the best champion model (ANN) over the entire test dataset. Even at county level, the heterogenous stacked classifier outperforms the champion classifiers across all counties except for Turkana, where the SVR champion performs better.
Moderate to extreme vegetation deficits correspond to the occurrence of drought events. Their correct prediction forms the best judgment of the utility of the outputs of the modelling approaches. The table below presents the performance in the prediction of moderate to extreme vegetation deficits across the different classes for the counties in the study area.
Table: Performance in the prediction of moderate to extreme drought of the heterogenous stacked ensembles compared to the best ANN and SVR models.
County | ANN Champion | SVR Champion | Heterogenous Stacked Ensemble
Mandera
Marsabit
Turkana
Wajir
Overall
The heterogenous stacked ensemble is shown in the table above to offer the best performance in the prediction of moderate to extreme vegetation deficit. The distribution across the counties is also acceptable as compared, for example, to the SVR champion, which performs below chance for Mandera county.

Conclusion
The precipitation datasets from both TAMSAT and CHIRPS were not normally distributed. A multiple-metrics investigation indicated the marginal superiority of TAMSAT as compared to CHIRPS in correlation to drought intensity. The use of the backward-forward approach in the selection of models for ensemble membership was realized to be a viable one, as it substantially reduced the model space. The traditional approach to model selection, which ends up with one champion model based on performance on the validation dataset, is shown to be prone to loss of model performance, as evidenced by the SVR process in which a loss in performance was recorded. The building of model ensembles would not only guarantee stability but also ensure increased accuracy in model performance.
The model ensembling approaches investigated in the study included non-weighted averaging, weighted averaging and model stacking, as applied to both homogenous and heterogenous model ensembling. Empirically, it was shown that heterogenous ensembles are generally more robust than homogenous ensembles. Also, model stacking is indicated to be the surest way to realize model ensembles that perform better than the champion-model approach. In fact, it is empirically shown that a loss in performance could be suffered when the averaging approaches are used, especially when the models in the ensembles are selected based on a common performance metric. The models learnt using the heterogenous model stacking approach are noted to be robust both in regression and in classification, and also in their performance when generalized to the individual units, even though the models learnt were not administrative-unit specific. This is a key finding, since it implies the approach is robust enough to learn a single model applicable in prediction across multiple administrative units.
The performance of the models in the prediction of moderate to extreme vegetation deficit remains acceptable at county level. Given that the prediction of these conditions forms the practical application of the ensemble models, this good performance guarantees the utility of the forecasts, since they are well better than chance. The study, however, advises the use of more techniques in the model ensembles and the building of many more ensembles using different ensemble sizes to fully settle the question of the performance of model ensembles. There is a guaranteed gain in performance in using model stacking as an approach to building homogenous model ensembles, as evidenced by the improvement in accuracy for the ANN homogenous ensemble across the four counties.
Author Contributions: Conceptualization, Chrisgone Adede; Formal analysis, Chrisgone Adede; Investigation, Chrisgone Adede; Methodology, Chrisgone Adede, Robert Oboko, Peter Wagacha and Clement Atzberger; Supervision, Robert Oboko and Peter Wagacha; Validation, Robert Oboko, Peter Wagacha and Clement Atzberger; Visualization, Chrisgone Adede; Writing, original draft, Chrisgone Adede, Robert Oboko and Peter Wagacha; Writing, review & editing, Clement Atzberger.
Funding: This research received no direct external funding. The data used in the study was, however, partly funded by the European Commission under a grant contract to the Institute for Surveying, Remote Sensing and Land Information, University of Natural Resources and Life Sciences (BOKU).
Acknowledgements: Our appreciation to the National Drought Management Authority for providing the data from the operational drought monitoring system. We are also grateful to Luigi Luminari for the continued discussion of the ideas of the paper towards shaping it to have outputs applicable in an operational drought monitoring environment. The helpful contributions of the editors and reviewers are also acknowledged.

Conflict of Interest:
The authors declare no conflict of interest. References UNOOSA (2015),
Data application of the month: Drought monitoring. UN ‐ SPIDER .[Online].
Available at: ‐ spider.org/links ‐ and ‐ resources/data ‐ sources/daotm ‐ drought (Accessed on November Government of Kenya. (2012).
Kenya
Post ‐ Disaster
Needs
Assessment: ‐ Drought . Retrieved from Cody, B. A. (2010). California drought:
Hydrological and regulatory water supply issues.
DIANE
Publishing. Ding,
Y.,
Hayes, M. J., & Widhalm, M. (2011). Measuring economic impacts of drought: a review and discussion. Disaster
Prevention and
Management: An International
Journal , (4), ‐ Klisch,
A., & Atzberger, C. (2016). Operational
Drought
Monitoring in Kenya
Using
MODIS
NDVI
Time
Series.
Remote
Sensing , (4), Brown,
J.,
Howard,
D.,
Wylie,
B.,
Frieze,
A.,
Ji,
L., & Gacke, C. (2015). Application ‐ ready expedited MODIS data for operational land surface monitoring of vegetation condition. Remote
Sensing , (12), ‐ Svoboda,
M.,
LeComte,
D.,
Hayes,
M.,
Heim,
R.,
Gleason,
K.,
Angel,
J., ... & Miskus, D. (2002). The drought monitor.
Bulletin of the American
Meteorological
Society , (8), ‐ Ali,
Z.,
Hussain,
I.,
Faisal,
M.,
Nazir, H. M.,
Hussain,
T.,
Shad, M. Y., ... & Hussain
Gani, S. (2017). Forecasting
Drought
Using
Multilayer
Perceptron
Artificial
Neural
Network
Model.
Advances in Meteorology , . Khadr, M. (2016). Forecasting of meteorological drought using hidden Markov model (case study: the upper
Blue
Nile river basin,
Ethiopia).
Ain
Shams
Engineering
Journal , (1), ‐ Enenkel,
M.,
Steiner,
C.,
Mistelbauer,
T.,
Dorigo,
W.,
Wagner,
W.,
See,
L., ... & Rogenhofer, E. (2016). A combined satellite ‐ derived drought indicator to support humanitarian aid organizations. Remote
Sensing , (4), Tadesse,
T.,
Demisse, G. B.,
Zaitchik,
B., & Dinku, T. (2014). Satellite ‐ based hybrid drought monitoring tool for prediction of vegetation condition in Eastern
Africa: A case study for Ethiopia.
Water
Resources
Research , (3), ‐ Adede,
C.,
Oboko,
R.,
Wagacha, P. W., & Atzberger, C. (2019). A Mixed
Model
Approach to Vegetation
Condition
Prediction
Using
Artificial
Neural
Networks (ANN):
Case of Kenya’s
Operational
Drought
Monitoring.
Remote
Sensing , (9), Cortes,
C.,
Kuznetsov,
V., & Mohri, M. (2014, January).
Ensemble methods for structured prediction. In International
Conference on Machine
Learning (pp. ‐ Dietterich, T. G. (2000, June).
Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. ‐ Springer,
Berlin,
Heidelberg.
Opitz,
D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of artificial intelligence research , , ‐ Re,
M., & Valentini, G. (2012). Ensemble
Methods.
Advances in machine learning and data mining for astronomy , ‐ Güne ş , F.,
Wolfinger,
R., & Tan, P. Y. (2017). Stacked ensemble models for improved prediction accuracy. In Proc.
Static
Anal.
Symp. (pp. ‐ Nay,
J.,
Burchfield,
E., & Gilligan, J. (2018). A machine ‐ learning approach to forecasting remotely sensed vegetation health. International journal of remote sensing , (6), ‐ Partalas,
I.,
Tsoumakas,
G., & Vlahavas, I. (2012). A study on greedy algorithms for ensemble pruning. Aristotle
University of Thessaloniki,
Thessaloniki,
Greece . Reid, S. (2007). A review of heterogeneous ensemble methods. Department of Computer
Science,
University of Colorado at Boulder . Escolano, A. Y.,
Junquera, J. P.,
Vázquez, E. G., & Riaño, P. G. (2003). A new Meta
Machine
Learning (MML) method based on combining non ‐ significant different neural networks. In ESANN (pp. ‐ Belayneh,
A.,
Adamowski,
J.,
Khalil,
B., & Quilty, J. (2016). Coupling machine learning methods with wavelet transforms and the bootstrap and boosting ensemble approaches for drought prediction.
Atmospheric
Research , , ‐ Dzeroski,
S., & Zenko, B. (2004). Is combining classifiers with stacking better than selecting the best one?. Machine learning , (3), ‐ Ganguli,
P., & Reddy, M. J. (2014). Ensemble prediction of regional droughts using climate inputs and the SVM–copula approach.
Hydrological processes , (19), ‐ Tadesse,
T.,
Wardlow, B. D.,
Hayes, M. J.,
Svoboda, M. D., & Brown, J. F. (2010). The
Vegetation
Outlook (VegOut): A new method for predicting vegetation seasonal greenness. GIScience & Remote
Sensing , (1), ‐ Wardlow, B. D.,
Tadesse,
T.,
Brown, J. F.,
Callahan,
K.,
Swain,
S., & Hunt, E. (2012). Vegetation
Drought
Response
Index An Integration of Satellite,
Climate, and
Biophysical
Data.
Adhikari, R., & Agrawal, R. K. (2013). A homogeneous ensemble of artificial neural networks for time series forecasting. arXiv preprint arXiv:1302.6210.
Tarnavsky, E., Grimes, D., Maidment, R., Black, E., Allan, R. P., Stringer, M., Chadwick, R., & Kayitakire, F. Extension of the TAMSAT satellite-based rainfall monitoring over Africa and from to present. Journal of Applied Meteorology and Climatology.
Funk, C. C., Peterson, P. J., Landsfeld, M. F., Pedreros, D. H., Verdin, J. P., Rowland, J. D., & Pedreros, P. (2014). A quasi-global precipitation time series for drought monitoring. US Geological Survey Data Series.
Didan, K. (2015a). MOD13Q1 MODIS/Terra Vegetation Indices -Day L3 Global SIN Grid V006 [Data set]. NASA EOSDIS LP DAAC. doi:
Didan, K. (2015b). MYD13Q1 MODIS/Aqua Vegetation Indices -Day L3 Global SIN Grid V006 [Data set]. NASA EOSDIS LP DAAC. doi:
Wan, Z., Hook, S., & Hulley, G. (2015). MOD11A2 MODIS/Terra Land Surface Temperature/Emissivity -Day L3 Global SIN Grid V006 [Data set]. NASA EOSDIS LP DAAC. doi:
Running, S., Mu, Q., & Zhao, M. (2017). MOD16A2 MODIS/Terra Net Evapotranspiration -Day L4 Global SIN Grid V006 [Data set]. NASA EOSDIS Land Processes DAAC. doi:
Beguería, S., Vicente-Serrano, S. M., Reig, F., & Latorre, B. (2014). Standardized precipitation evapotranspiration index (SPEI) revisited: Parameter fitting, evapotranspiration models, tools, datasets and drought monitoring. International Journal of Climatology, (10).
World Meteorological Organization (WMO). Standardized Precipitation Index User Guide. WMO-No. Available online: (accessed on April).
Vicente-Serrano, S. M., Beguería, S., López-Moreno, J. I., Angulo, M., & El Kenawy, A. (2010). A new global gridded dataset (1901–2006) of a multiscalar drought index: Comparison with current drought index datasets based on the Palmer Drought Severity Index. Journal of Hydrometeorology, (4).
McKee, T. B., Doesken, N. J., & Kleist, J. (1993, January). The relationship of drought frequency and duration to time scales. In Proceedings of the Conference on Applied Climatology. Boston, MA: American Meteorological Society.
Liu, W. T., & Kogan, F. N. (1996). Monitoring regional drought using the vegetation condition index. International Journal of Remote Sensing, (14).
Huang, G. B. (2003). Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Transactions on Neural Networks, (2).
Riedmiller, M., & Braun, H. (1993, March). A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In Proceedings of the IEEE International Conference on Neural Networks.
Chen, C. S., & Lin, J. M. (2011). Applying Rprop neural network for the prediction of the mobile station location. Sensors, (4).
Klisch, A., Atzberger, C., & Luminari, L. (2015). Satellite-based drought monitoring in Kenya in an operational setting. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences.
Meroni, M., Fasbender, D., Rembold, F., Atzberger, C., & Klisch, A. (2019). Near real-time vegetation anomaly detection with MODIS NDVI: Timeliness vs. accuracy and effect of anomaly computation options. Remote Sensing of Environment, –521.