Machine learning for weather and climate are worlds apart

Abstract

Modern weather and climate models share a common heritage, and often even components; however, they are used in different ways to answer fundamentally different questions. As such, attempts to emulate them using machine learning should reflect this. While the use of machine learning to emulate weather forecast models is a relatively new endeavour, there is a rich history of climate model emulation. This is primarily because, while weather modelling is an initial condition problem which intimately depends on the current state of the atmosphere, climate modelling is predominantly a boundary condition problem. In order to emulate the response of the climate to different drivers, therefore, representation of the full dynamical evolution of the atmosphere is neither necessary nor, in many cases, desirable. Climate scientists are also typically interested in different questions. Indeed, emulating the steady-state climate response has been possible for many years and provides significant speed increases that allow solving inverse problems for e.g. parameter estimation. Nevertheless, the large datasets, non-linear relationships and limited training data make climate a domain which is rich in interesting machine learning challenges. Here I seek to set out the current state of climate model emulation and demonstrate how, despite some challenges, recent advances in machine learning provide new opportunities for creating useful statistical models of the climate.


rsta.royalsocietypublishing.org

Perspective

Article submitted to journal

Author for correspondence:

D. Watson-Parris
e-mail: [email protected]

Machine learning for weather and climate are worlds apart

D. Watson-Parris
Atmospheric, Oceanic and Planetary Physics, Department of Physics, University of Oxford, UK

© The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

arXiv [physics.ao-ph]. rsta.royalsocietypublishing.org, Phil. Trans. R. Soc. A

1. Introduction

Climate models in general, and general circulation models (GCMs) in particular, are the primary tools used for generating projections of climate change under different future socio-economic scenarios. Fully coupled GCMs, which include atmosphere, cryosphere, land and ocean components, are referred to as Earth System Models (ESMs) and are the gold standard of climate modelling. Due to the large range of spatial and temporal scales and the huge number of processes being modelled, these are extremely computationally expensive to run and are often only run in coordinated international experiments designed to explore particular scientific questions. They also create huge volumes of data which can be difficult to analyse and interpret using traditional tools and methods. There is naturally great interest, then, in how machine learning (ML) might help to reduce the computational expense of generating this data, or to extract more value from the data once it is produced [1,2]. Here I focus on a third aspect, discussing the current state of the art in climate model emulation for uncertainty quantification and reduction, and highlighting opportunities for new machine learning tools to greatly improve this.

The need for fast emulators of computer simulations has long been recognised in the context of performing inference, where these are often referred to as 'surrogate' models [3]. These surrogates are trained on a few selected samples of the full, expensive simulations using supervised machine learning tools. Gaussian processes (GPs), a non-linear, non-parametric regression technique, are typically used [4] because of their flexibility and accurate uncertainty estimates. Traditionally, the computational cost of training a GP scales as $O(N^3)$, where $N$ is the number of training data points, inhibiting their use for large datasets.
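To make the cubic scaling concrete, the following is a minimal sketch of exact GP regression in plain Python. It assumes a one-dimensional input, a squared-exponential kernel and fixed hyperparameters; none of these choices come from the text, and the function names (`gp_fit`, `gp_predict`) and the sine-wave "simulator" are hypothetical. The Cholesky factorisation of the $N \times N$ covariance matrix is the step whose triple loop gives the $O(N^3)$ training cost.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a; the nested loops are the O(N^3) step."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def solve_lower_transposed(low, y):
    """Back substitution: solve L^T x = y."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / low[i][i]
    return x

def gp_fit(xs, ys, noise=1e-6):
    """Factorise the training covariance once and precompute K^{-1} y."""
    k = [[rbf_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = solve_lower_transposed(low, solve_lower(low, ys))
    return {"xs": list(xs), "low": low, "alpha": alpha}

def gp_predict(model, x_new):
    """Posterior mean and variance of the emulator at a new input."""
    k_star = [rbf_kernel(x_new, x) for x in model["xs"]]
    mean = sum(k * a for k, a in zip(k_star, model["alpha"]))
    v = solve_lower(model["low"], k_star)
    var = rbf_kernel(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

# Hypothetical "expensive simulator" sampled at only six inputs.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [math.sin(x) for x in train_x]
model = gp_fit(train_x, train_y)
mean, var = gp_predict(model, 2.5)
print(f"emulated output at 2.5: {mean:.3f} +/- {math.sqrt(var):.3f}")
```

The predictive variance returned alongside the mean is what makes GPs attractive as surrogates: it shrinks near training samples and reverts to the prior variance far from them.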
Recent developments, however, have demonstrated new techniques for alleviating these constraints, making them competitive with other techniques such as neural networks (NNs) [5]. However they are constructed, these surrogate models allow approximate model inversion (determining the inputs given certain outputs) where the exact inverse is not available [6], which is invariably the case for complex models and certainly true for whole climate models. These inverse methods allow the tuning of particular parameters against observations, the analysis and exploration of model uncertainties to different inputs, and the constraint of some of these uncertainties using history matching.

The uncertainties of GCMs and their output can be broadly categorised into: 1) internal variability due to the chaotic fluctuations of the Earth system over different time-scales; 2) model uncertainty due to incomplete or incorrect process representations (structural uncertainty); 3) model parametric uncertainty due to uncertain input parameters; 4) scenario uncertainty due to assumptions and incomplete knowledge of the greenhouse gas (GHG), aerosol and other short-lived climate forcer (SLCF) emissions pathways.

Numerical weather prediction (NWP) models share a common heritage with the atmospheric components of GCMs and are subject to the same uncertainties, but with different emphasis. While in weather prediction the uncertainties in the initial state of the system (1) are a key component, climate projection uncertainties are dominated by model (2+3) and scenario (4) uncertainties over 50 and 100 year timescales respectively [7,8]. Figure 1 shows the fractional uncertainty in the projection of temperature across the CMIP6 multi-model ensemble and demonstrates this clearly. The internal variability dominates the uncertainty for the first 10 years but rapidly becomes less important as the model, and ultimately scenario, uncertainties start to dominate.
In exploring climate questions one can thus often neglect internal variability and emulate only the steady-state response of the system, significantly simplifying the machine learning problem.

Quantifying, and ultimately minimising, the remaining uncertainties is central to efforts to improve climate projections [9,10], but is also of value when seeking to improve our understanding of the physical climate [11]. By framing the discussion of climate emulation around these key uncertainties I hope to demonstrate how machine learning could help in this endeavour. In the rest of this paper I will describe the ways in which climate emulation is already looking to reduce uncertainties in each of the key areas outlined above, before providing an outlook over the ways new and rapidly evolving ML techniques might transform these efforts in the future.

Figure 1. The fractional uncertainty in CMIP6 projections of surface air temperature due to internal variability, model uncertainty and scenario uncertainty as a function of time into the future. These uncertainties are calculated as in [7] and described in Appendix A.
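The partition of variance behind Figure 1 (defined in Appendix A) can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the polynomial fits and residuals have already been computed and are held as nested lists indexed `[model][scenario][time]`, it uses population variances (the text does not say which estimator was used), and the function names and toy numbers are hypothetical.

```python
from statistics import fmean, pvariance

def partition_uncertainty(fit, resid):
    """Split projection variance into internal, model and scenario terms.

    fit[m][s][t]   : smooth fitted temperature change for model m, scenario s
    resid[m][s][t] : residual about that fit
    """
    n_models, n_scen, n_time = len(fit), len(fit[0]), len(fit[0][0])
    # Internal variability: residual variance over scenario and time,
    # averaged over models (assumed constant in time).
    internal = fmean([
        pvariance([e for scen in resid[m] for e in scen])
        for m in range(n_models)
    ])
    model, scenario = [], []
    for t in range(n_time):
        # Model uncertainty: spread across models, averaged over scenarios.
        model.append(fmean([
            pvariance([fit[m][s][t] for m in range(n_models)])
            for s in range(n_scen)
        ]))
        # Scenario uncertainty: spread across scenarios of the multi-model mean.
        scenario.append(pvariance([
            fmean([fit[m][s][t] for m in range(n_models)])
            for s in range(n_scen)
        ]))
    total = [internal + s + m for s, m in zip(scenario, model)]
    return internal, model, scenario, total

def fractional_uncertainty(internal, model, scenario, total):
    """Fraction of the total variance carried by each term at each time."""
    return [
        {"internal": internal / tt, "model": m / tt, "scenario": s / tt}
        for m, s, tt in zip(model, scenario, total)
    ]

# Toy example: 2 models x 2 scenarios x 3 time steps (made-up numbers).
toy_fit = [[[0.0, 1.0, 2.0], [0.0, 2.0, 4.0]],
           [[0.0, 2.0, 4.0], [0.0, 3.0, 6.0]]]
toy_resid = [[[0.1, -0.1, 0.1], [-0.1, 0.1, -0.1]],
             [[0.2, -0.2, 0.2], [-0.2, 0.2, -0.2]]]
parts = partition_uncertainty(toy_fit, toy_resid)
for step, frac in enumerate(fractional_uncertainty(*parts)):
    print(step, {name: round(value, 3) for name, value in frac.items()})
```

In the toy data the fits agree at the first time step, so internal variability carries all of the variance there, and the model and scenario terms grow as the trajectories diverge, mirroring the qualitative behaviour described for Figure 1.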

2. Climate emulation

(a) Internal variability

While short-period (up to a few weeks) internal variability and uncertainty in the exact current state of the atmosphere dominate uncertainties in weather forecasts, in many climate simulations this is essentially treated as noise which is either controlled for [12,13] or averaged away. In such settings emulating the atmospheric variability is not useful. Longer-period, decadal variability can however be important in climate settings, particularly when comparing historical simulations with observations [e.g. 14]. The use of ensembles of simulations, which sample this uncertainty, enables weather forecast models to generate probabilistic forecasts with improved skill [15] and allows natural variability to be understood over climate timescales [16]. These ensembles are extremely computationally expensive to create, however, and recent efforts have explored creating machine learning based emulators which could sample this uncertainty more efficiently.

One approach is to emulate the dynamical evolution of these numerical models directly, and this has been explored for both weather [17,18] and climate [19–21]. While these are obviously very early efforts in this direction, they demonstrate that developing machine learning models which can compete with their traditional counterparts in numerical weather prediction is extremely challenging, and extending this to climate time-scales even more so, especially given the difficulty in maintaining the energy and mass conservation required for a stable simulation. Where an estimate of the decadal variability is needed, a more promising approach may be to emulate the variability directly from existing ensembles [22].

(b) Model structural uncertainty

Model uncertainties due to incomplete or incorrect representations of the underlying processes are extremely hard to quantify directly and are often neglected entirely when evaluating individual models against observations. Some estimate can be made by comparing the outputs from multiple models performing the same experiment, often referred to as multi-model ensembles (MMEs), although interpreting any differences is not trivial as many of the models in use around the world are not truly independent and share underlying components [23]. Further, some models are also demonstrably better or worse in certain aspects [24], making simple averages over such ensembles potentially misleading. Nevertheless, used appropriately, large multi-model ensembles, such as those provided by the Coupled Model Intercomparison Project (CMIP) 5 [25] and CMIP 6 [26] experiments, provide valuable insights in this regard. For example, some early machine learning work in the field developed approaches for combining models from the CMIP5 ensemble [27].

It is worth noting that one of the key ways numerical weather forecasts and regional climate models reduce model uncertainties is by post-processing the predictions using statistical error correction [28]. A novel approach using ML has recently been proposed for climate models [29] which could provide valuable model improvements, although clearly such an approach can only be validated for observed climate states.

(c) Model parametric uncertainty

The numerical discretization which is necessary to integrate GCMs forward in time defines a spatial (and temporal) scale below which any physical process must be 'parameterized'. These parameterizations are often only approximate representations of the processes they represent and the input parameters must be tuned so as best to reflect the observed climate.
There are invariably many combinations of such parameters which can produce a plausible model, a problem termed equifinality [30], and so large parametric uncertainty can persist in even the best models. The representation of clouds, for which even the largest examples occur on scales much smaller than typical climate model grid resolutions, is a key uncertainty in this regard [31]. Climate feedbacks due to changes in clouds in response to a given temperature perturbation have been shown to be particularly sensitive to their parameterizations in climate models [32].

There is a long history of exploring these parametric uncertainties using ensembles of climate simulations sampled across parameter space [33], including multi-thousand member grand ensembles generated using large networks of home computers [34]. Simple linear regression emulators [35,36], and more recently Gaussian process (GP) [3] emulators, are then built to span this space so that sensitivity analysis [37] and parameter inference can be performed by comparison against relevant observations [38,39].

An example of an emulator trained on such a perturbed parameter ensemble (PPE) is shown in Figure 2. Three parameters identified as being important for the calculation of the absorptivity of aerosol in the atmosphere were perturbed across a wide range of values using Latin hypercube sampling. Using a Python package designed to simplify climate model emulation, the global distribution of Absorption Aerosol Optical Depth (AAOD) is predicted for a particular parameter combination by both a GP and a convolutional neural network (CNN) emulator. The errors introduced by emulation are small compared to observation and model-observation comparison errors. This emulator can then be used for comparison against observation to rule out implausible parameter combinations, or infer the optimal set depending on the objective [40].
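As an illustration of how a perturbed parameter ensemble might be designed, the sketch below draws a Latin hypercube sample in plain Python. It is not the design used for Figure 2: the parameter names and ranges are placeholders, and `latin_hypercube` is a hypothetical helper. The idea is that each parameter range is cut into as many equal strata as there are ensemble members, and each stratum is used exactly once per parameter.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples points; bounds is a list of (low, high) per parameter."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of n_samples equal-width strata of [0, 1).
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across parameters.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose from per-parameter columns to per-member parameter sets.
    return [list(point) for point in zip(*columns)]

# Hypothetical parameter ranges for a three-parameter ensemble of eight members.
param_bounds = {"param_a": (0.5, 5.0), "param_b": (0.0, 1.0), "param_c": (10.0, 100.0)}
design = latin_hypercube(8, list(param_bounds.values()), seed=42)
for member in design:
    print(dict(zip(param_bounds, (round(v, 3) for v in member))))
```

Each row of `design` would then be run through the full model, and the resulting inputs and outputs used as training data for the emulator.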
Difficulties in scaling traditional emulators to large datasets and the problem of finding relevant summary statistics have limited their use somewhat, and I discuss the opportunities recent advances in ML could provide in the following section.

Machine learning could also be used to completely replace these parameterizations, learning directly from high-resolution simulations [41,42] or even observations [43]. While these can offer some speed improvements they will not drastically decrease the computational expense of running a whole climate model. Indeed, much of their value comes from being able to run improved parameterizations, which in turn would lead to better projections (and better training data for whole-model emulators).

Emulation package: https://github.com/duncanwp/GCEm

Figure 2. The annual mean absorption aerosol optical depth (AAOD) for a particular set of (three) aerosol micro-physical parameters not shown to the emulator during training. a) shows the true modelled output, b) shows the emulated output using a Gaussian process, c) shows the emulated output using a simple convolutional neural network, d-e) show the differences between the modelled data and the Gaussian process emulator and the neural network emulator respectively.

(d) Scenario uncertainty

Over longer time-scales of more than 50 years the scenario uncertainty starts to dominate the model uncertainties. Similarly to the parametric uncertainty discussed above, these uncertainties relate to the inputs of the climate models. The primary distinction is that these input parameters are derived from socio-political considerations, and so cannot be reduced through improved modelling or understanding of the physical climate. Improved sampling of these uncertainties would nevertheless prove valuable to policy makers, who need to weigh the cost and impact of different mitigation and adaptation strategies and currently mostly rely on one-dimensional impulse response models [44,45] or simple pattern scaling approaches [46]. Impulse response models are physically interpretable and can capture non-linear behaviour, but are inherently unable to model regional climate changes, while the pattern scaling approaches rely on a simple scaling of spatial distributions of e.g. precipitation by global mean temperature changes, neglecting strong non-linearities in these relationships.

Given the similarity to emulating parametric uncertainty, statistical emulators of the regional climate have been developed [47,48], although these have been quite bespoke and focus on the relatively simple problem of emulating temperature. Approaches including non-linear pattern scaling [49] and GP emulation over million-year time-scales [50] hint at the possibility of using modern machine learning tools to produce robust and general emulators over future scenarios. The opportunities, and significant challenges, of realising these possibilities are discussed in the next section.
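The pattern scaling idea mentioned above can be illustrated with a short sketch. It assumes the simplest possible form, a per-grid-cell least-squares slope through the origin relating local change to global mean temperature change; the function names and the three-cell toy data are hypothetical and are not taken from [46]. The third cell is given a quadratic response to show how a linear pattern misses non-linear behaviour when extrapolated.

```python
def fit_pattern(global_dt, local_change):
    """Least-squares slope through the origin for each grid cell.

    global_dt[k]       : global mean temperature change in training sample k
    local_change[k][c] : local change (e.g. precipitation) in cell c, sample k
    """
    denom = sum(dt * dt for dt in global_dt)
    n_cells = len(local_change[0])
    return [
        sum(dt * sample[c] for dt, sample in zip(global_dt, local_change)) / denom
        for c in range(n_cells)
    ]

def scale_pattern(pattern, dt):
    """Project the local change for a new global mean temperature change."""
    return [slope * dt for slope in pattern]

# Toy training data: two cells respond linearly, the third quadratically.
global_dt = [0.5, 1.0, 2.0]
local_change = [[1.5 * dt, -0.2 * dt, dt ** 2] for dt in global_dt]

pattern = fit_pattern(global_dt, local_change)
projected = scale_pattern(pattern, 3.0)
truth = [1.5 * 3.0, -0.2 * 3.0, 3.0 ** 2]
for cell, (p, t) in enumerate(zip(projected, truth)):
    print(f"cell {cell}: pattern-scaled {p:.2f}, true {t:.2f}")
```

The linear cells are recovered exactly, while the quadratic cell is underestimated once the projection moves beyond the range of the training data.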

3. Challenges and Opportunities

Many of the challenges and opportunities which arise in using the plethora of new ML techniques which have recently become available to emulate the climate are common among the potential applications detailed above, and I elucidate some of them below.

(a) Challenges

(i) Few training samples

One of the reasons the latest deep-learning techniques have proved so successful is the enormous number of data samples available for the training of these algorithms. While a single climate model integration can certainly produce many terabytes of data, the number of model samples spanning the dimensions over which one might want to emulate is often small. For example, the Community Earth System Model (CESM) Large Ensemble [51] contains only 40 independent members but is over 500 TB in size, as each member represents a long time-series of detailed climate variables. Many multi-model ensembles contain even fewer members. While GPs are well suited to such problems, NNs can easily overfit the limited data. Using neural architecture search [52] to find the simplest network able to fit the data can help relieve this to some extent.

(ii) Out of distribution

The training of climate emulators requires an underlying training dataset which spans all possible outcomes to ensure the model does not try to predict outside of the distribution of the training dataset [53]. This requires careful consideration when creating ensembles [38] and should perhaps be considered when designing future multi-model experiments [54] to ensure emulators interpolate between training points rather than extrapolate beyond them. Besides well-calibrated uncertainties, the use of automatic out-of-distribution detection techniques could prove valuable [55].

(iii) Accurate quantification of uncertainties

Climate model emulation introduces another source of uncertainty in any predictions, which needs to be robustly quantified in order for the prediction to be useful. While GPs provide such uncertainty estimates by construction, uncertainties of NN predictions can be approximated using dropout [56]. This is of particular importance given the previous two challenges.

(iv) Short-term and seasonal prediction

Internal variability plays a key role over shorter timescales and cannot simply be averaged away when considering seasonal prediction. Some element of dynamic evolution of the atmospheric state is thus needed in order to accurately emulate these systems, although this can still take the form of simple statistical models of the large-scale dynamics [57].

(b) Opportunities

New ML tools and techniques provide opportunities for climate scientists to improve on, and explore new applications for, existing emulators. Besides the important societal impacts of climate research, the large datasets also provide unique opportunities for ML research.

(i) Large, open datasets

While not always designed to train machine learning emulators, large climate model ensembles from the latest climate models are now available in the cloud, with the tools and infrastructure to easily access them [58]. This includes ensembles of climate simulations exploring scenario uncertainty [59], model uncertainty [60,61], and natural variability [51], including some at very high resolution [62]. Training an emulator over combinations of these complementary ensembles to explore joint uncertainties, or to maximise the available training data, is one promising avenue for further research. This wealth of large spatio-temporal datasets situated next to tremendous computing power also provides opportunities to develop and train more complex emulators.

(ii) New emulators

To date, emulation has relied on relatively simple techniques applied to highly aggregated climate data. However, the rapid development of new ML architectures and tools, such as deep GPs [63,64], Neural Architecture Search [52] and Spherical-CNNs [65], provides exciting opportunities to develop larger, more accurate emulators. These could provide higher spatio-temporal resolution outputs, complementing existing down-scaling techniques [66], or better calibrated uncertainties to account for the large co-variabilities often encountered in climate-relevant outputs.

As described above, due to their huge computational expense climate models are typically run at a coarse resolution, with simplified (parameterized) models used to represent all processes which occur at scales smaller than around ∼ […]

(iii) Improved inference

Many current parameter estimation approaches rely on simple rejection sampling to perform model inference, whereby the emulator is sampled a large number of times and all parameter combinations for which the outputs disagree with observations are rejected. This gradually provides a posterior probability distribution for the input parameters, although it requires subjective error metrics and performs poorly for high-dimensional outputs. Simulation-based inference is a rich sub-field of machine learning, and many improved techniques are now available [69]. Active learning using Bayesian optimization can ensure that training samples are generated where they provide most information for the emulator, and new probabilistic programming tools can use additional diagnostics to improve inference by no longer treating the models as black boxes. There are also opportunities for automated model calibration and tuning [70] and summary statistic detection to improve the current state of the art.

(iv) Observational emulators

I have primarily focused on the emulation of physical climate models as these are the only tools available for generating future projections. In principle, an emulator could be trained on the large satellite-based datasets which are now available, with the hope that this would provide some skill in future predictions. For example, by training an emulator on observed precipitation and meteorology one could hope to estimate precipitation changes under a future climate.

Many significant challenges exist in designing such a system, however, in particular the relatively short observational record and the reliance on interpolating into unknown future states. Encoding strong physical constraints [71] on such a model, for example by enforcing conservation of mass and energy, may provide a useful complement to traditional climate model projections.
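The simple rejection sampling described under "Improved inference" above can be sketched as follows. This is a toy illustration under stated assumptions: `toy_emulator` is a hypothetical stand-in for a trained emulator with a single input parameter and a scalar output, the prior is uniform, and the acceptance tolerance plays the role of the subjective error metric mentioned in the text.

```python
import random

def toy_emulator(theta):
    """Hypothetical stand-in for a trained emulator: parameter -> scalar output."""
    return 2.0 * theta + 1.0

def rejection_sample(emulator, observed, tolerance, prior_bounds,
                     n_draws=5000, seed=0):
    """Keep prior draws whose emulated output lies within tolerance of the observation."""
    rng = random.Random(seed)
    low, high = prior_bounds
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(low, high)  # draw from a uniform prior
        if abs(emulator(theta) - observed) <= tolerance:
            accepted.append(theta)      # output agrees with the observation
    return accepted

posterior = rejection_sample(
    toy_emulator, observed=3.0, tolerance=0.2, prior_bounds=(0.0, 2.0)
)
print(f"accepted {len(posterior)} of 5000 draws; "
      f"range {min(posterior):.3f} to {max(posterior):.3f}")
```

Because every draw costs one emulator evaluation, the approach is only practical when the emulator is cheap, and the fraction of accepted draws falls quickly as the tolerance tightens or the number of outputs being compared grows.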

4. Outlook

While climate may just be an accumulation of weather, and similar numerical models are used in each domain, as often in the physical sciences, more is different [72]. Different processes dominate the responses, different questions are being asked and different uncertainties dominate the predictions. In many respects these differences make climate projections easier to emulate than weather forecasts and much has been achieved already, but significant opportunities, and some challenges, remain.

The improved techniques available through the recent advances in ML will allow for improved parameter estimation and model tuning; direct emulation of internal variability; emulation of non-linear regional climate responses with higher accuracy and resolution; and potentially observation-based models. These will both benefit from, and offer insights into, the underlying physical processes governing our climate.

In order to realise these opportunities we must foster collaborations between the climate and ML communities to develop a shared understanding of the problems and tools available to solve them. Workshops such as this, climatechange.ai and Climate Informatics are invaluable in doing so.

Data Accessibility.

The CMIP6 data used here is available through the Earth System Grid Federation and can be accessed through different international nodes e.g.: https://esgf-index1.ceda.ac.uk/search/cmip6-ceda/. The black carbon PPE data is available here: https://doi.org/10.5281/zenodo.3856644

Competing Interests.

The author declares that they have no competing interests.

Funding.

The author receives funding from the European Union’s Horizon 2020 research and innovation programme iMIRACLI under Marie Skłodowska-Curie grant agreement No 860100 and also gratefully acknowledges funding from the NERC ACRUISE project NE/S005390/1.

Acknowledgements.

The author acknowledges the World Climate Research Programme, which, through its Working Group on Coupled Modelling, coordinated and promoted CMIP6. I thank the climate modelling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the data and providing access, and the multiple funding agencies who support CMIP6 and ESGF. I also gratefully acknowledge the support of Amazon Web Services through an AWS Machine Learning Research Award. I thank Mat Chantry for valuable feedback and discussions during the writing of the manuscript.

A. CMIP6 uncertainty analysis

The uncertainty analysis presented in Figure 1 is calculated using global, annual mean surface air temperature from 20 models that participated in CMIP6 across six scenarios. I follow the approach of [7] but choose not to weight the models since their skill is not of concern, and it makes no significant difference to the results presented here.

The time-series for each model ($m$) and scenario ($s$) can be represented as:

$$X_{m,s}(t) = x_{m,s}(t) + i_{m,s} + \epsilon_{m,s}(t) \tag{A 1}$$

where $x$ is a fourth-order polynomial fit using Ordinary Least Squares, $i$ is a reference temperature (taken as the mean between 2015-2020 inclusive) and $\epsilon$ is the residual. In the expressions below, $|\cdot|_m$ and $|\cdot|_s$ denote means over models and scenarios respectively. The internal variability is assumed to be constant and is defined as the model-mean variance in the residual:

$$V = \left| \operatorname{var}_{s,t}(\epsilon_{m,s,t}) \right|_m \tag{A 2}$$

The model uncertainty is the scenario-mean variance in the model estimates:

$$M(t) = \left| \operatorname{var}_{m}(x_{m,s,t}) \right|_s \tag{A 3}$$

while the scenario uncertainty is the variance of the multi-model mean:

$$S(t) = \operatorname{var}_{s}\left( \left| x_{m,s,t} \right|_m \right) \tag{A 4}$$

The total variance is then the sum of each of these terms:

$$T(t) = V + S(t) + M(t). \tag{A 5}$$

References

1. Huntingford C, Jeffers ES, Bonsall MB, Christensen HM, Lees T, Yang H.Machine learning and artiﬁcial intelligence to aid climate change research and preparedness.Environmental Research Letters. 2019;14(12):124007.2. Reichstein M, Camps-Valls G, Stevens B, Jung M, Denzler J, Carvalhais N, et al.Deep learning and process understanding for data-driven Earth system science.Nature. 2019;566(7743):195–204.3. O’Hagan A.Bayesian analysis of computer code outputs: A tutorial.Reliability Engineering & System Safety. 2006;91(10-11):1290–1300.4. Kennedy MC, O’Hagan A.Bayesian calibration of computer models.Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2001;63(3):425–464. r s t a . r o y a l s o c i e t y pub li s h i ng . o r g P h il . T r an s . R . S o c . A .................................................................. Available from: https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/1467-9868.00294 .5. Wang KA, Pleiss G, Gardner JR, Tyree S, Weinberger KQ, Wilson AG. Exact GaussianProcesses on a Million Data Points; 2019.6. Diggle PJ, Gratton RJ.Monte Carlo Methods of Inference for Implicit Statistical Models.Journal of the Royal Statistical Society Series B (Methodological). 1984;46(2):193–227.Available from: .7. Hawkins E, Sutton R.The Potential to Narrow Uncertainty in Regional Climate Predictions.Bulletin of the American Meteorological Society. 2009;90(8):1095–1108.8. Wilcox LJ, Liu Z, Samset BH, Hawkins E, Lund MT, Nordling K, et al.Accelerated increases in global and Asian summer monsoon precipitation from future aerosolreductions.Atmospheric Chemistry and Physics Discussions. 2020:1–30.9. Allen MR, Stainforth DA.Towards objective probabalistic climate forecasting.Nature. 2002;419(6903):228–228.10. Collins M.Ensembles and probabilities: a new era in the prediction of climate change.Philosophical Transactions of the Royal Society A: Mathematical, Physical and EngineeringSciences. 2007;365(1857):1957–1970.11. 
Carslaw KS, Lee LA, Reddington CL, Pringle KJ, Rap A, Forster PM, et al. Large contribution of natural aerosols to uncertainty in indirect forcing. Nature. 2013 Nov;503(7474):67.
12. Lohmann U, Hoose C. Sensitivity studies of different aerosol indirect effects in mixed-phase clouds. Atmospheric Chemistry and Physics. 2009;9(22):8917–8934.
13. Jeuken ABM, Siegmund PC, Heijboer LC, Feichter J, Bengtsson L. On the potential of assimilating meteorological analyses in a global climate model for the purpose of model validation. Journal of Geophysical Research: Atmospheres. 1996;101(D12):16939–16950. Available from: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/96JD01218 .
14. Stevens B. Rethinking the Lower Bound on Aerosol Radiative Forcing. Journal of Climate. 2015 Mar;28(12):4794–4819.
15. Bauer P, Thorpe A, Brunet G. The quiet revolution of numerical weather prediction. Nature. 2015;525(7567):47–55.
16. Maher N, Milinski S, Suarez-Gutierrez L, Botzet M, Dobrynin M, Kornblueh L, et al. The Max Planck Institute Grand Ensemble: Enabling the Exploration of Climate System Variability. Journal of Advances in Modeling Earth Systems. 2019;11(7):2050–2069.
17. Rasp S, Dueben PD, Scher S, Weyn JA, Mouatadid S, Thuerey N. WeatherBench: A benchmark dataset for data-driven weather forecasting. arXiv. 2020.
18. Weyn JA, Durran DR, Caruana R. Can Machines Learn to Predict Weather? Using Deep Learning to Predict Gridded 500-hPa Geopotential Height From Historical Weather Data. Journal of Advances in Modeling Earth Systems. 2019;11(8):2680–2693. Available from: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2019MS001705 .
19. Weber T, Corotan A, Hutchinson B, Kravitz B, Link R. Technical note: Deep learning for creating surrogate models of precipitation in Earth system models. Atmospheric Chemistry and Physics. 2020;20(4):2303–2317.
20. Scher S, Messori G. Weather and climate forecasting with neural networks: using general circulation models (GCMs) with different complexity as a study ground. Geoscientific Model Development. 2019;12(7):2797–2809.
21. Scher S. Toward Data-Driven Weather and Climate Forecasting: Approximating a Simple General Circulation Model With Deep Learning. Geophysical Research Letters. 2018;45(22):12,616–12,622. Available from: https://agupubs.onlinelibrary.wiley.com/doi/abs/10.1029/2018GL080704 .
22. Castruccio S, Hu Z, Sanderson B, Karspeck A, Hammerling D. Reproducing Internal Variability with Few Ensemble Runs. Journal of Climate. 2019 11;32(24):8511–8522. Available from: https://doi.org/10.1175/JCLI-D-19-0280.1 .
23. Knutti R, Masson D, Gettelman A. Climate model genealogy: Generation CMIP5 and how we got there. Geophysical Research Letters. 2013;40(6):1194–1199.
24. Pincus R, Batstone CP, Hofmann RJP, Taylor KE, Glecker PJ. Evaluating the present-day simulation of clouds, precipitation, and radiation in climate models. Journal of Geophysical Research. 2008;113(D14).
25. Taylor KE, Stouffer RJ, Meehl GA. An Overview of CMIP5 and the Experiment Design. Bulletin of the American Meteorological Society. 2011 Oct;93(4):485–498.
26. Eyring V, Bony S, Meehl GA, Senior CA, Stevens B, Stouffer RJ, et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development. 2016;9(5):1937–1958.
27. Monteleoni C, Schmidt GA, Saroha S, Asplund E. Tracking climate models. Statistical Analysis and Data Mining. 2011;4(4):372–392.
28. Wilcke RAI, Mendlik T, Gobiet A. Multi-variable error correction of regional climate models. Climatic Change. 2013 Oct;120(4):871–887. Available from: https://doi.org/10.1007/s10584-013-0845-x .
29. Watson PAG. Applying machine learning to improve simulations of a chaotic dynamical system using empirical error correction. Journal of Advances in Modeling Earth Systems. 2019.
30. Beven K, Freer J. Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology. Journal of Hydrology. 2001;249(1–4):11–29.
31. Stevens B, Bony S. What Are Climate Models Missing? Science. 2013 May;340(6136):1053–1054.
32. Ceppi P, Brient F, Zelinka MD, Hartmann DL. Cloud feedback mechanisms and their representation in global climate models. Wiley Interdisciplinary Reviews: Climate Change. 2017;8(4).
33. Allen MR, Stott PA, Mitchell JFB, Schnur R, Delworth TL. Quantifying the uncertainty in forecasts of anthropogenic climate change. Nature. 2000;407(6804):617–620.
34. Stainforth DA, Aina T, Christensen C, Collins M, Faull N, Frame DJ, et al. Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature. 2005;433(7024):403–406.
35. Rougier J, Sexton DMH. Inference in ensemble experiments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2007;365(1857):2133–2143.

36. Sexton DMH, Murphy JM, Collins M, Webb MJ. Multivariate probabilistic projections using imperfect climate models part I: outline of methodology. Climate Dynamics. 2012;38(11-12):2513–2542.
37. Lee LA, Carslaw KS, Pringle KJ, Mann GW, Spracklen DV. Emulation of a complex global aerosol model to quantify sensitivity to uncertain parameters. Atmospheric Chemistry and Physics. 2011;11(23):12253–12273.
38. Sexton DMH, Karmalkar AV, Murphy JM, Williams KD, Boutle IA, Morcrette CJ, et al. Finding plausible and diverse variants of a climate model. Part 1: establishing the relationship between errors at weather and climate time scales. Climate Dynamics. 2019;53(1–2):989–1022.
39. Watson-Parris D, Bellouin N, Deaconu L, Schutgens N, Yoshioka M, Regayre LA, et al. Constraining uncertainty in aerosol direct forcing. Geophysical Research Letters. 2020.
40. Williamson DB, Blaker AT, Sinha B. Tuning without over-tuning: parametric uncertainty quantification for the NEMO ocean model. Geoscientific Model Development. 2017;10(4):1789–1816. Available from: .
41. Rasp S, Pritchard MS, Gentine P. Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences of the United States of America. 2018;115(39):9684–9689.
42. Brenowitz ND, Bretherton CS. Prognostic Validation of a Neural Network Unified Physics Parameterization. Geophysical Research Letters. 2018 Jun;45(12):6289–6298.
43. Schneider T, Lan S, Stuart A, Teixeira J. Earth System Modeling 2.0: A Blueprint for Models That Learn From Observations and Targeted High-Resolution Simulations. Geophysical Research Letters. 2017;44(24):12,396–12,417.
44. Meinshausen M, Raper SCB, Wigley TML. Emulating coupled atmosphere-ocean and carbon cycle models with a simpler model, MAGICC6 – Part 1: Model description and calibration. Atmospheric Chemistry and Physics. 2011;11(4):1417–1456.
45. Smith CJ, Forster PM, Allen M, Leach N, Millar RJ, Passerello GA, et al. FAIR v1.3: a simple emissions-based impulse response and carbon cycle model. Geoscientific Model Development. 2018;11(6):2273–2297.
46. Santer BD, Wigley TM, Schlesinger ME, Mitchell JF. Developing climate scenarios from equilibrium GCM results; 1990.
47. Holden PB, Edwards NR. Dimensionally reduced emulation of an AOGCM for application to integrated assessment modelling. Geophysical Research Letters. 2010;37(21):n/a–n/a.
48. Castruccio S, McInerney DJ, Stein ML, Crouch FL, Jacob RL, Moyer EJ. Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs*. Journal of Climate. 2014;27(5):1829–1844.
49. Beusch L, Gudmundsson L, Seneviratne SI. Emulating Earth system model temperatures with MESMER: from global mean temperature trajectories to grid-point-level realizations on land. Earth System Dynamics. 2020;11(1):139–159.
50. Holden PB, Edwards NR, Rangel TF, Pereira EB, Tran GT, Wilkinson RD. PALEO-PGEM v1.0: a statistical emulator of Pliocene–Pleistocene climate. Geoscientific Model Development. 2019;12(12):5137–5155.
51. Kay JE, Deser C, Phillips A, Mai A, Hannay C, Strand G, et al. The Community Earth System Model (CESM) Large Ensemble Project: A Community Resource for Studying Climate Change in the Presence of Internal Climate Variability. Bulletin of the American Meteorological Society. 2015 09;96(8):1333–1349. Available from: https://doi.org/10.1175/BAMS-D-13-00255.1 .

52. Kasim MF, Watson-Parris D, Deaconu L, Oliver S, Hatfield P, Froula DH, et al. Up to two billion times acceleration of scientific simulations with deep neural architecture search. arXiv. 2020.
53. Scher S, Messori G. Generalization properties of feed-forward neural networks trained on Lorenz systems. Nonlinear Processes in Geophysics. 2019;26(4):381–399. Available from: https://npg.copernicus.org/articles/26/381/2019/ .
54. McCollum DL, Gambhir A, Rogelj J, Wilson C. Energy modellers should explore extremes more systematically in scenarios. Nature Energy. 2020;5(2):104–107.
55. Ren J, Liu PJ, Fertig E, Snoek J, Poplin R, DePristo MA, et al. Likelihood Ratios for Out-of-Distribution Detection; 2019.
56. Gal Y, Ghahramani Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning; 2016.
57. Cohen J, Coumou D, Hwang J, Mackey L, Orenstein P, Totz S, et al. S2S reboot: An argument for greater inclusion of machine learning in subseasonal to seasonal forecasts. Wiley Interdisciplinary Reviews: Climate Change. 2018;10(2).
58. Abernathey R, kevin paul, joe hamman, matthew rocklin, chiara lepore, michael tippett, et al. Pangeo NSF Earthcube Proposal; 2017. Available from: https://figshare.com/articles/Pangeo_NSF_Earthcube_Proposal/5361094 .
59. O’Neill BC, Tebaldi C, van Vuuren DP, Eyring V, Friedlingstein P, Hurtt G, et al. The Scenario Model Intercomparison Project (ScenarioMIP) for CMIP6. Geoscientific Model Development. 2016;9(9):3461–3482. Available from: https://gmd.copernicus.org/articles/9/3461/2016/ .
60. Eyring V, Bony S, Meehl GA, Senior CA, Stevens B, Stouffer RJ, et al. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development. 2016;9(5):1937–1958. Available from: https://gmd.copernicus.org/articles/9/1937/2016/ .
61. Watson-Parris D, Deaconu L. Example Perturbed Parameter Ensemble (Black Carbon). zenodo. 2020. Available from: https://doi.org/10.5281/zenodo.3856645 .
62. Haarsma RJ, Roberts MJ, Vidale PL, Senior CA, Bellucci A, Bao Q, et al. High Resolution Model Intercomparison Project (HighResMIP v1.0) for CMIP6. Geoscientific Model Development. 2016;9(11):4185–4208. Available from: https://gmd.copernicus.org/articles/9/4185/2016/ .
63. Damianou A, Lawrence N. Deep gaussian processes. In: Artificial Intelligence and Statistics; 2013. p. 207–215.
64. Tran GT, Oliver KIC, Sóbester A, Toal DJJ, Holden PB, Marsh R, et al. Building a traceable climate model hierarchy with multi-level emulators. Advances in Statistical Climatology, Meteorology and Oceanography. 2016;2(1):17–37.
65. Cohen TS, Geiger M, Koehler J, Welling M. Spherical CNNs; 2018.
66. Gutowski Jr WJ, Giorgi F, Timbal B, Frigon A, Jacob D, Kang HS, et al. WCRP COordinated Regional Downscaling EXperiment (CORDEX): a diagnostic MIP for CMIP6. Geoscientific Model Development. 2016;9(11):4087–4095. Available from: https://gmd.copernicus.org/articles/9/4087/2016/ .
67. Holden PB, Edwards NR, Garthwaite PH, Fraedrich K, Lunkeit F, Kirk E, et al. PLASIM-ENTSem v1.0: a spatio-temporal emulator of future climate change for impacts assessment. Geoscientific Model Development. 2014;7(1):433–451.
68. Perdikaris P, Raissi M, Damianou A, Lawrence ND, Karniadakis GE. Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proceedings Mathematical, physical, and engineering sciences. 2017;473(2198):20160751.
69. Cranmer K, Brehmer J, Louppe G. The frontier of simulation-based inference. Proceedings of the National Academy of Sciences. 2020 May:201912789. Available from: http://dx.doi.org/10.1073/pnas.1912789117