Long-Range Seasonal Forecasting of 2m-Temperature with Machine Learning

Abstract

A significant challenge in seasonal climate prediction is whether a prediction can beat climatology. We present results from two data-driven models, a convolutional (CNN) and a recurrent (RNN) neural network, that predict 2 m temperature out to 52 weeks for six geographically diverse locations. The motivation for testing the two classes of ML models is to allow the CNN to leverage information related to teleconnections and the RNN to leverage long-term historical temporal signals. The ML models show improved accuracy of long-range temperature forecasts up to a lead time of 30 weeks for PCC and up to 52 weeks for RMSESS, but only for select locations. Further iteration is required to ensure the ML models have value beyond regions where the climatology has a noticeably reduced correlation skill, namely the tropics.


Etienne E. Vos

IBM Research, South Africa
etienne.vos@ibm.com

Ashley Gritzman

IBM Research, South Africa

Sibusisiwe Makhanya

IBM Research, South Africa

Thabang Mashinini

IBM Research, South Africa

Campbell Watson

IBM Research, USA


Climate change as a result of global warming is a pressing international problem. Concerns are mounting over the significant changes in the variability and extremes of weather, with an increasing possibility of catastrophes from the activation of tipping points in the earth’s climate system [1, 2]. There is therefore an increased interest in accurate long-range seasonal forecasts of key climate variables such as surface temperature and precipitation, given their relevance to developing strategies that mitigate anticipated seasonal impacts on various sectors, including disaster risk reduction and prevention [3].

Numerical climate models (NCMs) have a long history of being used to produce accurate weather and climate predictions, albeit at the cost of running large and expensive physics-based simulations (e.g. [4, 5]). The focus of this work is to investigate how convolutional (CNN) and recurrent (RNN) neural networks can be applied to seasonal forecasting of 2 m temperature in lieu of NCMs, and if they are capable of improving upon a generally accepted benchmark that is the 30-year climatology.

Previous works [6, 7, 8, 9] have shown that these data-driven approaches can perform adequately with respect to physics-based simulations and, in certain cases, surpass them to some extent. For example, [9] developed a CNN model with consistently superior all-season correlation skill ( > . ) when compared to a state-of-the-art dynamical model (SINTEX-F) [5] for predicting the Nino3.4 index for lead times of up to 17 months.

Figure 1: Left: A global map of the PCC calculated between the ERA5 reanalysis data and the 30-year standard climatology for 2m-temperature. Right: Similar to the left panel, but for MAE. Darker red regions indicate lower PCC or higher MAE values.

For this work, the ERA5 reanalysis data [10] is used for training (1979–2007), validation (2008–2011) and testing (2012–2020) of ML models. Data is pre-processed by regridding global fields of variables from a native spatial resolution of . ° × . ° to ° × °, as well as aggregating over time from hourly intervals to weekly. The predictor variables used here are mb geopotential ( gp), mb geopotential ( gp), and 2 m temperature (t2m) fields, the latter of which is also the predictand.

Training and inference for both the CNN- and LSTM-type models were set up in a similar manner: given a series of inputs $s_{in} = \{x^{k}_{-h_{in}+1}, \ldots, x^{k}_{0}\}$ that spans an input horizon of $h_{in}$ time steps, with $k \in \{\mathrm{t2m}, \mathrm{gp}, \mathrm{gp}\}$ and $x^{k}$ the global field of variable $k$ at a given time step, the task of the models is to produce predictions $y_{out}$ that estimate the ground-truth values $x^{\mathrm{t2m}}_{h_f}$, which is the 2 m temperature for a given target location at a lead time of $h_f$ (forecast horizon) steps ahead of the latest input time step. This is done by minimizing the mean squared error (MSE) loss between $y_{out}$ and $x^{\mathrm{t2m}}_{h_f}$ via gradient descent. The final results are multiple sets of time series predictions of the test data from 2013 up to 2020, each of which is essentially a rolling time-series forecast with a constant lead time $h_f$, where $h_f \in [1 : 52]$. The year 2012 is reserved as a buffer year for the input horizon.

Predictions are made for single target locations, so separate models had to be trained for all locations. The following locations were chosen at low and mid/high latitudes across the globe to illustrate the capabilities and limitations of the CNN and LSTM models:

Low latitudes: Honolulu, USA ( . °N, . °W); Panama City ( . °N, . °W); Singapore ( . °N, . °E); Middle of the Pacific Ocean ( . °N, . °W)

Mid/High latitudes: Moscow, Russia ( . °N, . °E); London, UK ( . °N, . °W); Christchurch, NZ ( . °N, . °E); Perth, Australia ( . °S, . °E)

In addition to training a separate model for each location, a separate CNN model was required to make predictions for each lead time. This setup was mirrored for the LSTM by using a many-to-one model. The main difference between the CNN and LSTM approaches is that inputs to the CNN are full global fields of all of the predictor variables over an input horizon of $h_{in} = 6$ weeks, whereas inputs to the LSTM are multivariate time series of the predictor variables extracted at the position of the target location over an input horizon of $h_{in} = 52$ weeks.

The metrics used to evaluate the final results on the test data are the Pearson Correlation Coefficient (PCC) and the Root Mean Square Error Skill Score (RMSESS), given by the following equations:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad \text{and} \quad \mathrm{RMSESS} = 1 - \frac{\mathrm{RMSE}_{\mathrm{model}}}{\mathrm{RMSE}_{\mathrm{clim}}}, \tag{1}$$
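As an illustration of the two evaluation metrics in Equation (1), the following is a minimal NumPy sketch. The function names (`pcc`, `rmse`, `rmsess`) and the toy arrays are assumptions made for this example and are not taken from the authors' code.

```python
import numpy as np


def pcc(x, y):
    """Pearson correlation between ground truth x and predictions y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def rmse(x, y):
    """Root mean square error between ground truth x and predictions y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def rmsess(x, y_model, y_clim):
    """Skill score: 1 - RMSE_model / RMSE_clim (positive means model beats climatology)."""
    return 1.0 - rmse(x, y_model) / rmse(x, y_clim)


# Hypothetical toy values, for illustration only.
truth = np.array([1.0, 2.0, 3.0, 4.0])
model = np.array([1.1, 1.9, 3.2, 3.8])
clim = np.array([1.5, 2.5, 2.5, 3.5])

print("PCC (model):", pcc(truth, model))
print("PCC (clim): ", pcc(truth, clim))
print("RMSESS:     ", rmsess(truth, model, clim))
```

In this toy case the model's RMSE is smaller than the climatology's, so the skill score is positive; a forecast identical to the climatology would score exactly zero.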

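The rolling input/target construction described in the methodology (an input window of $h_{in}$ steps and a target $h_f$ steps after the latest input step) might be sketched as follows for the single-location time-series case. This is a hedged illustration: the function name `make_samples`, the assumption that column 0 holds t2m, and the toy data are all hypothetical. For the CNN the same indexing would apply to stacks of global fields rather than to a per-location series.

```python
import numpy as np


def make_samples(series, h_in, h_f):
    """Build (input window, target) pairs for one fixed lead time.

    series : array of shape (T, n_vars) with weekly values; column 0 is
             assumed (for this sketch) to be the 2 m temperature.
    h_in   : number of input time steps (input horizon).
    h_f    : lead time in steps ahead of the latest input step.
    Returns X with shape (N, h_in, n_vars) and y with shape (N,).
    """
    series = np.asarray(series, dtype=float)
    n_steps = series.shape[0]
    windows, targets = [], []
    # t is the index of the latest input step; the target lies h_f steps later.
    for t in range(h_in - 1, n_steps - h_f):
        windows.append(series[t - h_in + 1 : t + 1])
        targets.append(series[t + h_f, 0])
    return np.stack(windows), np.array(targets)


# Hypothetical toy data: 200 weeks, 3 variables.
weeks = np.arange(200, dtype=float)
toy = np.stack([weeks, 10 * weeks, 100 * weeks], axis=1)

X, y = make_samples(toy, h_in=52, h_f=10)
print(X.shape, y.shape)
```

Because each lead time yields a different set of targets, one such dataset (and, in the paper's setup, one model) would be built per value of $h_f$.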

Figure 2: Left & Center Panels: Time series plots comparing ERA5 weekly data to the climatology and predictions from the CNN and LSTM for Panama City (top) and Perth (bottom). Left and center panels correspond to predictions at lead times of 10 weeks and 40 weeks, respectively. Right Panels: The PCC at different lead times for Panama City (top) and Perth (bottom).

In Equation (1), $x$ and $y$ represent the ground-truth and predicted samples, respectively, with $\bar{x}$ and $\bar{y}$ the corresponding sample means over the test data. The RMSESS compares the model’s RMSE to that of the 30-year climatology. It is generally difficult to improve upon the climatology in terms of correlation and absolute error.

The CNN architecture consists of 4 convolution blocks (Conv2D → ReLU → MaxPool → Dropout), followed by a 50-unit fully-connected layer and a single-unit output layer. Fields that comprise $s_{in}$ are stacked as input channels for the CNN. The LSTM architecture consists of an RNN layer with 64 LSTM units, followed by a fully-connected layer with 32 units, and a single-unit output layer. The LSTM does not produce any intermediate predictions, and only produces an output prediction after reading in the full input horizon of 52 weeks.

Anomaly fields with respect to the standard 30-year climatology were used for all variables during training and inference. The climatology was subsequently added to the outputs to obtain the actual values for final evaluation with the PCC and RMSESS metrics.

The motivation for investigating the two classes of ML models is to allow the CNN to leverage information related to teleconnections in the predictor variables to improve its forecasting skill, while the LSTM should be able to leverage long-term historical temporal information to achieve the same. In this work, ML results for selected target locations are compared against a baseline prediction, which is the 30-year standard climatology calculated from weekly-aggregated ERA5 data between 1981 and 2010 (similar to the approach by [11]).

For low-latitude locations (near the equator), the climatology has a noticeably reduced correlation skill, as shown in Figure 1.
Using Panama City as an example, we show in Figure 2 (top panels) that the CNN and LSTM are able to improve on the climatology’s PCC skill up to lead times of around weeks and weeks, respectively. For a lead time of 10 weeks, both models predict the peaks and troughs with reasonable accuracy, capturing to some extent the warmer than usual summers and winters during 2015, 2016 and 2019. As expected, correlation skill reduces for larger lead times, as indicated by the red and green PCC curves that fall below the climatology line. This can also be seen in the 40-week lead time series plot, where CNN and LSTM predictions do not seem to deviate much from climatology, except for a few instances of warmer summers and winters.

In the bottom panels of Figure 2, Perth is used as an example of a mid/high-latitude location for which the climatology alone already has a PCC skill of ∼ . . The time series plots show that Perth exhibits a regular annual cycle that is well represented by the climatology so that, for the most part, deviations from the climatology for Perth likely only represent high-frequency noise. This likely explains why the CNN and LSTM models fail to learn any useful patterns outside of the annual cycle.
Figure 3: Plots of the RMSESS for lead times of 1–52 weeks, and for different locations. Left Panels: RMSESS results from the CNN for locations where predictions have improved skill relative to the climatology (top), and for locations where predictions have similar or reduced skill relative to the climatology (bottom). Right Panels: Similar to the left panels, but for the LSTM.

Figure 3 gives the RMSESS results for the CNN (left panels) and the LSTM (right panels). These results convey a similar message as those in Figure 2, but in terms of RMSE. An RMSESS value > 0 indicates that the ML model has a lower RMSE than the climatology and, conversely, a value < 0 means the climatology has a lower RMSE than the model. For low-latitude locations (top panels) the CNN predictions are able to improve on the climatology for almost all lead times considered. The same is true for the LSTM, except for the Mid-Pacific location, which falls below the climatology for lead times > weeks. Evidently, neither model fares any better than the climatology for the mid/high-latitude locations (bottom panels), even at lead times of < weeks. The LSTM does, however, marginally improve on the RMSE for London.

The standard 30-year climatology, often used as a baseline for seasonal forecasts, does not perform equally well across the globe, as highlighted in Figure 1. However, the 30-year climatology accurately represents the most important modes of variability for 2 m temperature at all locations with relatively high PCC ( > . ), i.e. outside of the tropics.

Despite the 30-year climatology being generally difficult to outperform, this study shows that ML methods do achieve comparable and, for some locations, improved accuracy of long-range temperature forecasts up to a lead time of 30 weeks for PCC and up to 52 weeks for RMSESS.
Being able to improve upon such a baseline in the context of seasonal forecasting is an invaluable advantage when considering preparedness against extreme climate events that have characterized climate change impacts over the past two decades.

Other future considerations and improvements on this work include using a more accurate climatology, training on larger datasets like CMIP 5/6, implementing a U-Net approach [12] in order to generate predictions across the entire globe, and combining the CNN and LSTM models into a unified approach that exploits the spatio-temporal dynamics of the underlying processes.

Acknowledgments and Disclosure of Funding

The authors would like to thank Brian White for his mentorship and advice during the preparation of this paper.

References

[1] T. M. Lenton, J. Rockström, O. Gaffney, S. Rahmstorf, K. Richardson, W. Steffen, and H. J. Schellnhuber, “Climate tipping points - too risky to bet against,” Nature, vol. 575, pp. 592–595, 2019.

[2] IPCC, “Summary for policymakers,” in Special Report: Global Warming of 1.5 °C, p. 32, 2018.

[3] W. J. Merryfield, J. Baehr, L. Batté, E. J. Becker, A. H. Butler, C. A. S. Coelho, G. Danabasoglu, P. A. Dirmeyer, F. J. Doblas-Reyes, D. I. V. Domeisen, et al., “Current and emerging developments in subseasonal to decadal prediction,” Bulletin of the American Meteorological Society, vol. 101, no. 6, pp. E869–E896, 2020.

[4] S. J. Johnson, T. N. Stockdale, L. Ferranti, M. A. Balmaseda, F. Molteni, L. Magnusson, S. Tietsche, D. Decremer, A. Weisheimer, G. Balsamo, S. P. E. Keeley, K. Mogensen, H. Zuo, and B. M. Monge-Sanz, “SEAS5: the new ECMWF seasonal forecast system,” Geoscientific Model Development, vol. 12, no. 3, pp. 1087–1117, 2019.

[5] T. Doi, S. K. Behera, and T. Yamagata, “Improved seasonal prediction using the SINTEX-F2 coupled model,” Journal of Advances in Modeling Earth Systems, vol. 8, no. 4, pp. 1847–1867, 2016.

[6] L. Xu, N. Chen, X. Zhang, and Z. Chen, “A data-driven multi-model ensemble for deterministic and probabilistic precipitation forecasting at seasonal scale,” Climate Dynamics, vol. 54, no. 7-8, pp. 3355–3374, 2020.

[7] J. Cohen, D. Coumou, J. Hwang, L. Mackey, P. Orenstein, S. Totz, and E. Tziperman, “S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts,” WIREs Climate Change, vol. 10, no. 2, p. e00567, 2019.

[8] M. Kämäräinen, P. Uotila, A. Y. Karpechko, O. Hyvärinen, I. Lehtonen, and J. Räisänen, “Statistical learning methods as a basis for skillful seasonal temperature forecasts in Europe,” Journal of Climate, vol. 32, no. 17, pp. 5363–5379, 2019.

[9] Y.-G. Ham, J.-H. Kim, and J.-J. Luo, “Deep learning for multi-year ENSO forecasts,” Nature, vol. 573, no. 7775, pp. 568–572, 2019.

[10] H. Hersbach, B. Bell, P. Berrisford, S. Hirahara, A. Horányi, J. Muñoz-Sabater, J. Nicolas, C. Peubey, R. Radu, D. Schepers, et al., “The ERA5 global reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020.

[11] M. Janoušek, “ERA-interim daily climatology,” ECMWF, 2011.

[12] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv preprint arXiv:1505.04597.