# Machine Learning Matching of Sentinel-2 and GPS Combine Harvester Data to Estimate Within-Field Wheat Grain Yield

### Abstract

This study aimed to identify the most suitable combination of input data and estimation model for within-field durum wheat (Triticum durum) grain yield using Sentinel-2 imagery. The study was conducted in Spain, one of the top European producers. Within-field grain yield data were obtained from a GPS-equipped combine harvester for 7 fields in 2018 and subsequently processed to match the Sentinel-2 10 m pixel size. The vegetation indices NDVI and GNDVI and the biophysical parameters LAI and FAPAR were calculated from Sentinel-2 bands using the SNAP Sentinel-2 Toolbox. In addition, the Sentinel-2 10 m resolution spectral bands were used as predictor variables. Three regression models were evaluated: multilinear regression, random forest (RF) and support vector machine. Various combinations of variables were tested to find the most suitable training dataset. At validation, the combination of LAI, FAPAR and the Sentinel-2 10 m bands was found to be the most suitable ($R^{2}=0.77$ and $\text{RMSE}=1.01\ \mathrm{t/ha}$). The model developed with vegetation indices alone yielded the lowest $R^{2}$ and the highest RMSE (0.54 and 1.44 t/ha, respectively). When compared with the other models trained on the same dataset, RF outperformed support vector machine ($R^{2}=0.73$ and $\text{RMSE}=1.11\ \mathrm{t/ha}$) and multilinear regression ($R^{2}=0.65$ and $\text{RMSE}=1.26\ \mathrm{t/ha}$). Hence, this study showed the effectiveness of RF and Sentinel-2 data for estimating within-field wheat grain yield when the 10 m bands and biophysical parameters are used to train the model.
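
As a rough illustration of the quantities named in the abstract, the sketch below computes NDVI and GNDVI for a single pixel and the two validation metrics. It is not the authors' processing chain: the function names and sample reflectance values are hypothetical. The sketch assumes the standard index definitions on the Sentinel-2 10 m bands (B8 near-infrared, B4 red, B3 green) and the coefficient-of-determination form of $R^{2}$, which the abstract does not specify.

```python
import math


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else float("nan")


def ndvi(nir, red):
    """NDVI from near-infrared (B8) and red (B4) reflectance."""
    return normalized_difference(nir, red)


def gndvi(nir, green):
    """GNDVI from near-infrared (B8) and green (B3) reflectance."""
    return normalized_difference(nir, green)


def rmse(observed, predicted):
    """Root mean square error, in the units of the inputs (e.g. t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical surface reflectances for one 10 m pixel (not from the study).
pixel_ndvi = ndvi(nir=0.45, red=0.05)
pixel_gndvi = gndvi(nir=0.45, green=0.08)
```

In a setting like the one described, the indices would be computed per pixel over each field, and the metrics would compare harvester-derived yield with model estimates on the validation pixels.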