[PDF] Analysis of a bistable climate toy model with physics-based machine learning methods

Abstract

We propose a comprehensive framework able to address both the predictability of the first and of the second kind for high-dimensional chaotic models. For this purpose, we analyse the properties of a newly introduced multistable climate toy model constructed by coupling the Lorenz '96 model with a zero-dimensional energy balance model. First, the attractors of the system are identified with Monte Carlo Basin Bifurcation Analysis. Additionally, we are able to detect the Melancholia state separating the two attractors. Then, Neural Ordinary Differential Equations are applied in order to predict the future state of the system in both of the identified attractors.

Full PDF

EEPJ manuscript No. (will be inserted by the editor)

Analysis of a bistable climate toy model withphysics-based machine learning methods

Maximilian Gelbrecht , , a , Valerio Lucarini , , Niklas Boers , , , and J¨urgenKurths , , Potsdam Institute for Climate Impact Research, Potsdam, Germany Department of Physics, Humboldt-Universit¨at zu Berlin, Berlin, Germany Department of Mathematics and Statistics, University of Reading, Reading, United King-dom Centre for the Mathematics of Planet Earth, University of Reading, Reading, UnitedKingdom Department of Mathematics and Computer Science, Freie Universit¨at Berlin, Berlin, Ger-many Department of Mathematics and Global Systems Institute, University of Exeter, Exeter,United Kingdom Lobachevsky State University of Nizhny Novgorod, Nizhny Novgorod, Russia

Abstract.

We propose a comprehensive framework able to address boththe predictability of the ﬁrst and of the second kind for high-dimensionalchaotic models. For this purpose, we analyse the properties of a newlyintroduced multistable climate toy model constructed by coupling theLorenz ’96 model with a zero-dimensional energy balance model. First,the attractors of the system are identiﬁed with Monte Carlo Basin Bi-furcation Analysis. Additionally, we are able to detect the Melancholiastate separating the two attractors. Then, Neural Ordinary DiﬀerentialEquations are applied in order to predict the future state of the systemin both of the identiﬁed attractors.

Understanding the qualitative behaviour and dynamics of high-dimensional chaoticmodels of complex systems is a challenging task. The Earth’s climate [1, 2, 3], but alsopower grids [4], the human brain [5, 6], and perception [7] or gene expression networks[8] all exhibit, in certain range of the system’s parameters, multiple attractors withdiﬀerent basins of attractions. Energy balance climate models exhibit the well knownhysteresis behaviour with respect to the solar radiation between a cold snowball earthstate and a warmer state corresponding to the present-day climate, with discontinuoustransitions taking place at the lower and upper boundary of the region of bistability [1,2, 9, 10, 11]. Indeed, multistability also almost always gives rise to a potential abruptchange of the system when a bifurcation point is - adiabatically - reached and one stateloses it stability. In the context of geosciences, the crossing of a bifurcation point isusually referred to as tipping point [12]. Studying them is one of the crucial challenges a e-mail: [email protected] a r X i v : . [ phy s i c s . d a t a - a n ] N ov Will be inserted by the editor of understanding and hopefully mitigating climate change [13]. The notion of tippingpoint has been recently widened in order to accmomodate transitions that are causedspeciﬁcally by the presence of a non-vanishing rate of change of the parameter ofinterest or by noise [14]. Far from the tipping point, when the system is a regime ofmultistability, transitions between the competing states is not possible in the case ofautonomous dynamics, as the asymptotic state is determined by the initial condition,depending on which basin of attraction it belongs to. Initial conditions located on thebasin boundaries - which have vanishing Lebesgue measure - are, instead, attracted tothe edge states, which are saddles located on such basin boundaries [15, 16, 17]. Suchsaddles determine the global instabilities of the system and, additionally, if the systemis, under fairly general conditions, forced with Gaussian noise, they are the gatewaysfor noise-induced transitions between competing metastable states [18, 19, 3, 20].In order to characterise multistable systems, one needs to recover informationon each competing attractor, so that it is possible to dynamically distinguish them.Indeed, one should be able to compute dynamical properties as e.g. diﬀerent Lyapunovexponents or the basin stability [21]. Any approach aimed at performing predictions inpotentially multistable systems thus needs to ﬁrst identify the diﬀerent attractors andtheir basins of attractions to be able to make meaningful predictions on either of them.In this article, we will present a two-part framework to approach high-dimensional,spatiotemporally chaotic models, to understand their complex behaviour and theirbasins of attraction, and then predict future states of each of their attractors. InLorenz’ terminology, in this paper we try to bundle together a methodology to addressboth the predictability of the ﬁrst kind - the model sensitivity to inaccurate initialconditions, hindering inﬁnitely long deterministic predictions - and of the second kind,associated with the uncertainty on the asymptotic state reached by the system [22].We proceed as follows. The Monte Carlo Basin Bifurcation Analysis (MCBB) [23]is used to identify the basins of attractions with the largest volumes and how thevolume changes when control parameters of the system are changed. MCBB makesuse of random sampling and clustering techniques to quantify the volume of the basinsof attraction and track how they change when control parameters of the system arevaries. Based on the MCBB results, we learn which trajectories are asymptoticallyevolving towards which attractor.For achieving predictability of the ﬁrst kind, we apply a hybrid approach that com-plements potentially incomplete models with data-driven methods. Most numericalmodels describing a real world system and especially those of Earth system mod-els, can be seen as incomplete in some regard. This can be due to e.g. neglectinghigher order terms or by an unknown external inﬂuence. Predicting even only par-tially known chaotic systems has been successfully done with hybrid approaches thatcombine knowledge of the governing equation with data-driven methods [24, e.g.].One particular promising approach are Neural Diﬀerential Equation or Universal Dif-ferential Equations [25, 26] that enable us to train artiﬁcial neural networks (ANNs)inside of the diﬀerential equations.We will test our framework on a new prototypical bistable model that we introducebelow. The model is constructed by coupling a zero-dimensional classic energy balancemodel featuring bistability with the Lorenz96 (L96) model [27], which has gainedprominence as prototypical system featuring spatially extended chaos. This modelwill serve as an ideal testbed for our approach.The paper is structured as follows: First, we will introduce the Bistable ClimateToy Model, recap the dynamic properties of its key ingredient, the Lorenz96 model,and investigate the basic properties of the full coupled toy model. Subsequently, wewill apply our two-part approach by ﬁrst identifying and tracking the attractors ofthe model with MCBB, and then demonstrating the Neural Diﬀerential Equation ill be inserted by the editor 3 approach on the model by replacing its energy balance model with an ANN andmaking predictions of the model.

The Bistable Climate Toy Model is set up by coupling a Lorenz96 (L96) model [27, 28]to a zero-dimensional energy balance model (EBM). The L96 model describes a highlynontrivial dynamics on a one-dimensional periodic lattice composed of N grid points,and is rapidly becoming a reference for studying non-equilibrium steady states in spa-tially extended systems. The L96 model features processes of advection, forcing, anddissipation. It can be thought of as representative of the dynamics of the atmospherealong one latitudinal circle [27, 28], even if such correspondence is more metaphoricalthan actual, because the L96 model does not correspond to a truncated version of anyknown ﬂuid dynamical system. The L96 model has rapidly gained relevance in manydiﬀerent research areas in order to study bifurcations [29, 30, 31, 32, 33, 34, 35],to test parametrizations [36, 37, 38, 39, 40, 41, 42], to investigate extreme events[43, 44, 45, 46], to improve data assimilation schemes [47, 48, 49, 50] and ensembleforecasting techniques [51, 52, 53], to develop new tools for investigating predictability[54, 55, 56, 57], and for addressing basic issues in non-equilibrium statistical mechanics[58, 59, 60, 61, 62]. The L96 model is formulated as follows:˙ X n = ( X n +1 − X n − ) X n − − γX n + F, n = 1 , . . . , N (1)with periodic boundary conditions given by X j − N = X j = x j + N ∀ j = 1 , . . . , N . Theparameter F describes the forcing acting on the model, γ controls the intensity of thedissipation and, thus, of the contraction of phase space volume, and the nonlinearterm on the right hand side describes a non-standard advection. The energy of thesystem E = 1 / (cid:80) Nn =1 X n is conserved in the unviscid and unforced limit and actsas generator of the time translation, even though the system is not Hamiltonian.For a detailed analysis of the mechanics and energetics of the L96 model and of ageneralisation thereof we refer to [63].If γ = 1 (which is the default choice in most studies) and N (cid:29)

1, the model’sattractor is the ﬁxed point X k = F , k = 1 , . . . , N for 0 ≤ F ≤ /

9. This ﬁxed pointloses stability as F is increased and, after a complex set of bifurcations [32, 33, 34],the system settles in a chaotic regime for F ≥ . N [61, 63], and one can establish power laws that accuratelydescribe how some fundamental properties of the system - such as its energy andLyapunov exponents - depend of the parameter F [61].As far as numerical evidence goes, the L96 model possesses a unique asymptoticstate characterised by a physical measure supported on a compact attractor. In orderto introduce multistability in the L96 model, an eﬃcient strategy is to suitably coupleit with a multistable (say, bistable) model, as described in [64]. Our simple bistablemodel of reference is the zero-dimensional EBM of the form:˙ T = − d V ( T )d T (2)where V ( T ) is a conﬁning potential ( V ( T ) → ∞ suﬃciently fast as | T | → ∞ ) withtwo local minima separated by a local maximum. We then come to the following Will be inserted by the editor

Table 1.

ParametersReduced Solar Constant S a . a . σ / Reference Forcing F N T ∆ T X → T α T → X β coupled L96-EBM model:˙ T = S (cid:16) − a + a (cid:16) tanh (cid:16) T − ˜ T (cid:17)(cid:17)(cid:17) − σT − α (cid:18) E . · F − (cid:19) ˙ X n = ( X n +1 − X n − ) X n − − X n + F (cid:32) β T − ˜ T∆ T (cid:33) (3)where the usual periodic boundary conditions apply ( X j − N = X j = x j + N ∀ j =1 , . . . , N ), and E = E/N . The value used for the diﬀerent parameters - all intendedto be non-negative - can be found in Tab. 1. The L96 model and the EBM are uncou-pled if one sets α = β = 0. The coupling between the two models can be explained asfollows. If the temperature of the EBM is higher (lower) than the reference temper-ature ˜ T , the L96 model receives an enhanced (reduced) forcing, mimicking - in veryrough terms - the presence of higher energy in the atmosphere. A negative feedbackin the system is introduced as follows. If the energy per site of the L96 componentof the model exceeds the average value realised in the uncoupled case ¯ E ≈ . F / [61], the temperature of the system is accordingly reduced. Note that, according tothe framework set in [64], the L96 model is the fast component and the EBM is theslow component of the coupled model, where the fast component acts as an almoststochastic forcing on the slow component, and the slow component modulates thedynamics of the fast component.In Fig. 1 we portray the two competing attractors of the model correspondingto the Warm (W) state and Snowball (SB) state state in the reduced phase stateconstructed by performing a projection on the variables M = 1 /N (cid:80) Nj =1 X j , E , and T . One sees clearly that the W state has higher temperature and larger values for themean and for the intensity of the ﬂuctuations of the dynamic variables with respect tothe SB state, in agreement with the actual features of the competing W and SB statesof the climate system [65, 10, 3]. We additionally portray the Melancholia (M) Statesitting between the two competing attractors. The M State is a saddle embeddedin the boundary between the two co-existing basins of attraction and attracts theorbits whose initial conditions are on such basin boundary [17]. The M state, whichis the gateway for the noise-induced transitions between the co-existing attractorsregardless of the kind of noise included in the system [19, 3], has been constructedusing the edge-tracking algorithm [66], and features non-trivial dynamics. Indeed, asin [17], it is a chaotic saddle. ill be inserted by the editor 5 Fig. 1.

Competing Attractors and M state between them in the projection of the phasespace given by M = 1 /N (cid:80) Nj =1 X j , E = E/N , and T . Simulations performed with S = 16and N = 32. Warm (W): red line. Snowball (SB) state: blue line. Melancholia (M) State:green line. In the following, we outline a two-part framework to analyse and predict spatiotem-porally chaotic system such as the Bistable Climate Toy Model. First we demonstratehow the attractors of the system can be can be identiﬁed. Based on that knowledge,a method to make predictions of states on both attractors is then introduced.

Multistability is a universal phenomenon of complex systems and is most likely presentin several sub-systems of Earth’s climate [67, 68, 69, e.g.], as well as in the energybalance of the Earth [1, 2, 3]. When analysing and working with high-dimensionalmodels, knowledge of the largest basins of attractions is instrumental to understand-ing the model itself. The Monte Carlo Basin Bifurcation Analysis (MCBB) [23] is anumerical method tailored for analysing basins of attraction of high-dimensional sys-tems and how they vary when control parameters of the system are changed. MCBB isconceptually situated in between a thorough bifurcation analysis and a macroscopicorder parameter. By utilizing random sampling and clustering techniques, MCBBlearns classes of similar attractors C and their basins. To regard two attractors A as part of the same class, we require a notion of continuity of an invariant measure ρ A on the attractor: if for a control parameter p the diﬀerence between ρ A ( p ) and ρ A ( p + ∆p ) vanishes smoothly for ∆p →

0, we classify them as similar. Fig. 2 illus-trates this continuity requirement. By sampling trajectories we can built these classeswith a suitable pseudometric on the space of these measures. Directly comparing thehigh-dimensional trajectories with each other would put us close to a bifurcation anal-ysis, but is potentially prohibitively expensive. We might also not be interested in anin-depth bifurcation analysis, but rather in a coarser deﬁnition of similarity. For the

Will be inserted by the editor case of a climate model, similar climate regimes may for example be of interest. Webuilt the pseudometric on weighted diﬀerences of statistics S k ( ρ i ): D ( ρ i , ρ j ) = (cid:88) k w k | S k ( ρ i ) − S k ( ρ j ) | . (4)If one wishes not to distinguish between symmetric conﬁgurations, the weighted diﬀer-ence can also be replaced with a Wasserstein distance of histograms over the statistics.Especially in these cases two diﬀerent points ( x, y ) may have D ( x, y ) = 0, this is whywe are referring to D as a pseudometric and not a metric.In the following, we will use as statistics – the position of the attractor E k = (cid:104) x (cid:105) ρ k – the width of the attractor Var k = (cid:10) ( x − E k ) (cid:11) ρ k – the non-normality, measured with the Kullback–Leibler divergenceKL k = KL ( ρ k ||N ( E k , Var k )),where ρ k is the marginal distribution on system dimension k . By using these statis-tics on each of the system dimension separately, MCBB achieves better scalabilitywith the system dimension N . The classes of attractors can then be identiﬁed by ap-plying a clustering algorithm. Density-based clustering algorithms such as DBSCAN[70] are ideally suited for this purpose, since DBSCAN relies on a similar notion ofcontinuity as we require. One sample is connected to another sample if it is within (cid:15) DB -neighbourhood of the sample. A cluster is then formed by all the samples thathave a chain of connections to each other. The results of applying DBSCAN to sampletrajectories is C = C ( { D } ) where each sample trajectory is assigned to one of the N C clusters C i with i ∈ [1; N C ]. Note that DBSCAN can also identify samples as outliers.We regard all outliers, if there are any, as the zeroth cluster. The approximate relativebasin size of a class of asymptotic states can then be computed by applying a slidingparameter window and normalizing the results withˆ b C i ( p ) = || CL ( p ) i || / N C (cid:88) j || CL ( p ) j || (5) CL ( p ) i = (cid:110) j | ( C j = i ) ∩ (cid:16) p ( j ) ∈ [ p min ; p max ] (cid:17)(cid:111) . (6)This is the most important result from MCBB. b C i ( p ) shows the size of the largestbasins of the system and how they react to changes of the control parameter p . In theresults we will also see other possibilities to further evaluate the collected results fromMCBB. By identifying the attractors, we also know which initial conditions are partof which basin. After achieving information about the attractors of the systems andtheir basins, we can investigate these individual attractor and apply further methodsto predict trajectories on them. ill be inserted by the editor 7 Fig. 2.

Sketch of the notion of continuity of invariant measures ρ A that MCBB requiresfor attractors to belong to the same class of attractors. Solid red lines indicate stable statesand dashed lines unstable states. If the diﬀerence between ρ A ( p ) and ρ A ( p ± ∆p ) vanishessmoothly for ∆ → (cid:15) DB neighbourhoods. Most - if not all - numerical models of real-world processes are incomplete in somesense. Be it due to unknown eﬀects that are not modelled, or by omitting higher-orderterms of known eﬀects on purpose. A classical example are subgrid-scale processesthat are not explicitly resolved in general circulation models of the Earth’s atmo-sphere and oceans, and hence need to be parametrized. Hybrid modelling methodscan remedy deﬁciencies of incomplete models by combining an incomplete process-based model with data-driven methods that learn to compensate these deﬁciencies.This approach seems especially promising with climate models as we have easy accessto observational data and the climate system is so immensely complex that everymodel of it is always incomplete in some sense. We will demonstrate the Neural Or-dinary Diﬀerential Equation (NODE) approach [26, 25], which provides an elegantway to construct and parametrize such hybrid models, on the Bistable Climate Toymodel. NODEs integrate universal functions approximators such as artiﬁcal neuralnetworks (ANN) directly into the equation˙ x = f ( x, t, N ( x, t ; Θ )) (7)where N ( x, t ; Θ ) is a data-driven function approximator such as an Artiﬁcial NeuralNetwork (ANN) with parameters Θ . Integrating the NODE yields a predicted tra-jectory ˆ x ( t ; Θ ) and the integration time can be freely set, but our previous resultsshow that for chaotic systems very short integration times are necessary to ensurethat the learning process succeeds [71]. Similar to regular ANNs, NODEs are a su-pervised learning method and their parameters, in our case only the parameters ofthe ANN, are found by minimizing a loss function, most commonly the least-squareserror between observed and model-predicted trajectories L ( Θ ) = (cid:88) i t ( x ( i t ) − ˆ x ( i t ; Θ )) (8)using a stochastic, adaptive gradient descent with weight decay[72]. Additional regu-larization of the parameters of the ANN may be added to avoid overﬁtting. In orderto train the model one needs to be able to compute gradients of the loss function withrespect to all the parameters of the NODE. Will be inserted by the editor

While a regular ANN relies on backpropagation – which is essentially the chainrule – to compute these derivatives, this is not as straightforward for diﬀerential equa-tions solvers. However, with methods from adjoint sensitivity analysis and responsetheory in conjunction with automatic diﬀerentiation techniques, such gradients canbe computed as well. For a detailed overview of the algorithms used for this purpose,please see [26, 25].

We apply MCBB to the Bistable Climate Toy Model by sampling N T = 15 , U ( −

7; 7) for the L96 state variables X and U (240 , T . The solar constant is varied within S ∈ [5; 20]. The trajectories are integrated for 400 time units and the ﬁrst 80% of thetrajectory are not included in the analysis to avoid transient eﬀects. Only the statistics E k ,Var k ,KL k of the L96 model are used for the identiﬁcation of the attractors, but theEBM mean E EBM is saved for further analysis as well. Fig. 3 shows the approximaterelative basin volume estimated by MCBB. Two classes, i.e. clusters, are found. Thesystem is multistable in the interval of around S ∈ [7; 15] with each of the basinsbeing approximately equal-sized with respect to the distributions of initial conditionschosen. For (cid:15) DB = 0 .

05 the clustering algorithm detects several outliers. These aremostly the trajectories that go through the Melancholia state (M). As shown in Fig. 5they exhibit a saddle-like behaviour for the EBM variable. When (cid:15) DB is increased to0 . F . Asone can expect the ”blue” cluster in Fig. 3 exhibits the much larger values of theforcing (see Fig. 4b)) and is thus the warm state of the model and the ”red” clusterin Fig. 3 is the cold state of the model. Again, the hysteresis behaviour of the modelis evidently shown. For S <

S >

15 onlythe warm state is stable.

MCBB is thus able to identify the two attractors of the system. These two attrac-tors will, in general, exhibit diﬀerent properties, like e.g. diﬀerent maximum Lya-punov exponent. The maximum Lyapunov exponents computed with the method of[73] (implementation of DynamicalSystems.jl [74]) are λ (cold max ≈ .

04 for the cold and λ (warm max ≈ .

60 for the warm state. The larger Lyapunov exponent of the warm stateshows that, as expected, the L96 model is more chaotic for larger values of the forc-ing. When we want to predict the model’s behavior, we have to be aware of thatand evaluate predictions on both attractors separately. MCBB also classiﬁes all ini-tial conditions that were used by the algorithms to either of the attractors. In thatway, we have many possible initial conditions for prediction on these attractors. Wedemonstrate the capabilities of the NODE approach by ”forgetting” the equation ofthe EBM and replacing it with an ANN. In this case this is an artiﬁcial example, butin many observational scenarios and model, one has incomplete models whose deﬁ-ciencies can be corrected with the NODE approach. In our case replacing the EBM ill be inserted by the editor 9

Solar Constant S A pp r o x i m a t e R e l a t i v e B a s i n V o l u m e cold state warm state Fig. 3.

Approximate relative basin size of the Bistable Climate Toy Model when changingthe solar constant S estimated with MCBB. The model exhibits a cold and a warm state.Trajectories from the shaded area are used as training data for the NODE approach topredict each of these states. e ﬀ e c t i v e F o r c i n g warm statecold state Fig. 4.

Sliding histogram plot of the mean of the EBM dimension of the Bistable ClimateToy Model computed with MCBB. The two histogram plots of each of the identiﬁed states arejoined together to better illustrate the two stable branches of the EBM and their hysteresisbehaviour. On the y-axis the value of complete eﬀective forcing term of the L96, so F ∗ = F (cid:16) β T − ˜ T∆ T (cid:17) , is shown. The relative magnitude of each of these values appearing in theindividual sliding histograms is shown in shades of blue and red, however in this case aremostly all zero or one. with an ANN is supposed to mirror setups of more realistic models in which oneprobably has much better knowledge of the governing equations of the atmospherethan the energy balance.The ANN N is set up to have the same input variables as the EBM in Eq. 3 hasarguments: all variables of the L96 model and the EBM itself. Due to the spatial input,Convolutional Layers are best suited. Fig. 6 shows the ANN used. The ConvolutionalLayers have only the L96 dimensions as inputs, whereas the forcing, the result of theEBM itself, skips these layers and inputs directly into the densely connected layers.The swish activation function [75] swish( x ) = x/ (1+exp ( − x )) is used as an activationfunction and MaxPooling layers reduce the dimension. By replacing the EBM the full Fig. 5.

Example of the trajectory of the energy balance model for one of the Melancholiastates found by MCBB as an outlier. Visible is a trajectory typical for a saddle. The tra-jectory remains close to the Melancholia state for some ﬁnite time until it collapses into thesnowball/cold state.

NODE reads ˙ F = N ( X , F ; Θ )˙ X n = ( X n +1 − X n − ) X n − − X n + F ; (9)where Θ are the parameters of all ANN layers. An adaptive stochastic gradient descentwith weight decay (AdamW) [72] is used to minimize the loss function L ( Θ ) = (cid:88) n (cid:16) X n − ˆ X n ( Θ ) (cid:17) + (cid:88) i t ,n (cid:16) F − ˆ F n ( Θ ) (cid:17) + γ N Θ (cid:88) i || θ i || . (10)Similar to earlier results for applying NODE techniques to spatiotemporally chaoticsystems [71], we found that integrating the NODE for long time spans does not signif-icantly decrease the loss on neither the training nor the validation set, but increasesthe computational complexity massively. Therefore, the NODE is only integrated for ∆t = 0 .

05 with only one time step saved. Hence, we minimize the one-step-ahead lossof the predicted states ( ˆ X , ˆ F ) against the training data ( X , F ). As training data twoseparated trajectories, each 100 time steps long (at ∆t = 0 . E n ( i t ) = X n ( i t ) − ˆ X n ( i t ) (11)and the normalized error e ( i t ) = || X ( i t ) − ˆ X ( i t ) || < || X ( i t ) || > / t . (12)The forecast length or valid time of the NODE is then the ﬁrst time step t v where e ( i t ) > . λ max t = λ max ∆t · i t . Similar to how the NODE was trained with data ill be inserted by the editor 11 Fig. 6.

ANN setup used to replace the EBM in the NODE. Convolutional layers with twoﬁlters, i.e. channels, a (3 ×

1) kernel and a swish activation function are used on the L96dimensions, the output of these layers and the old forcing value F are used as inputs of twodensely connected layers. N in is chosen to have the correct input dimension which dependson the dimension of the L96 model. S p a t i a l C oo r d i n a t e a b warm statecold state n o n - n o r m a li z e d e rr o r Fig. 7.

NODE predictions of the Bistable Climate Toy Model, the non-normalized error E n ( i t ) of the L96 dimensions is shown. (a) shows a prediction on the cold state, (b) thewarm state. The valid time t v is marked with the dashed line. from both attractors, we also predict and evaluate trajectories from both attractors.Fig. 7 shows the trajectories of the NODE that were integrated from initial conditionsof the ﬁrst time step outside of the training dataset for both attractors. As expectedthe valid time is smaller for the more chaotic warm state than for the cold state. Forthe cold state the valid time 169 time steps or 8 . λ max t and for the warm state it is39 time steps or 5 . λ max t . We have presented a framework for addressing the predictability of the ﬁrst kindand of the second kind in high-dimensional chaotic systems. First, we understandthe qualitative properties of the system by discovering the attractors with the largestbasins of attractors and evaluate how the volume of these basins changes when control parameters of the system are varied. As we gathered knowledge on the attractors ofthe systems, the NODE approach allows one to predict the evolution of the systemseven when the model is only incomplete with respect to the data it is trained with.With the NODE we replaced a sub-module of the model with a data-driven functionapproximator in the form of an ANN. This approach has the potential to be appliedto more complex coupled models in conditions where only incomplete or no knowledgeof a speciﬁc part of the model is available. When predicting observational data withmodels, this could also be used to account for unknown or neglected eﬀects in themodel. The Bistable Climate Toy Model introduced here is an ideal testbed for thisapproach. The model is built by coupling a bistable EBM to the L96 model andexhibits two competing attractors in a vast range of the model’s parameters. Theseattractors correspond to a cold and a warm state, for which we are able to identifythe basins of attractions and deﬁne the bifurcations conducive to tipping points. Weare also able to identify accurately which trajectories lead to which of the attractors,so that we use these as training data for the NODE. In the subsequent applicationof the NODE approach, we purposefully forgot a part of the model, the EBM, andreplaced it with an ANN. The NODE can model these introduced deﬁciencies for bothattractors at the same time and make accurate prediction even when only presentedwith very short training datasets for the data-driven part to be trained on. Theresults on the presented toy model can also be seen as a ﬁrst step towards analysingand predicting more complex climate models with the presented methods. Especiallyapplying the NODE approach to e.g. atmospheric models and observational data isa highly promising outlook.

Acknowledgments

This paper was developed within the scope of the IRTG 1740/TRP2015/50122-0, funded by the DFG/FAPESP. The authors thank the German FederalMinistry of Education and Research and the Land Brandenburg for supporting thisproject by providing resources on the high performance computer system at the Pots-dam Institute for Climate Impact Research. NB and VL acknowledge funding fromthe European Union’s Horizon 2020 research and innovation program under grantagreement No 820970 (TiPES). NB acknowledges funding by the Volkswagen foun-dation. JK acknowledges funding by the Russian Ministry of Education and Scienceof the Russian Federation Agreement No. 075-15-2020-808.We wish to acknowledge the authors of the Julia libraries DiﬀEqFlux.jl [26], Dif-ferentialEquations.jl [76] and Flux.jl [77] that were used for this study.

References

1. Budyko, M. I. The eﬀect of solar radiation variations on the climate of the earth.

Tellus , 611–619 (1969). URL https://doi.org/10.3402/tellusa.v21i5.10109 . https://doi.org/10.3402/tellusa.v21i5.10109 .2. Sellers, W. D. A Global Climatic Model Based on the Energy Balanceof the Earth-Atmosphere System. Journal of Applied Meteorology , 392–400 (1969). URL https://doi.org/10.1175/1520-0450(1969)008<0392:AGCMBO>2.0.CO;2 . https://journals.ametsoc.org/jamc/article-pdf/8/3/392/4975545/1520-0450(1969)008_0392_agcmbo_2_0_co_2.pdf .3. Lucarini, V. & B´odai, T. Global stability properties of the climate: Melancholiastates, invariant measures, and phase transitions. Nonlinearity , R59–R92(2020). URL https://doi.org/10.1088%2F1361-6544%2Fab86cc . ill be inserted by the editor 13

4. Machowski, J., Bialek, J. & Bumby, J.

Power System Dynamics: Stability andControl, 2nd Edition (Wiley, 2008).5. Babloyantz, A. & Destexhe, A. Low-dimensional chaos in an instance ofepilepsy.

Proceedings of the National Academy of Sciences , 3513–3517 (1986).URL . .6. Lytton, W. W. Computer modelling of epilepsy. Nature Reviews Neuroscience (2008).7. Schwartz, J.-L., Grimault, N., Hup´e, J.-M., Moore, B. C. J. & Pressnitzer, D.Multistability in perception: binding sensory modalities, an overview. Philo-sophical Transactions of the Royal Society B: Biological Sciences , 896–905(2012). URL https://royalsocietypublishing.org/doi/abs/10.1098/rstb.2011.0254 . https://royalsocietypublishing.org/doi/pdf/10.1098/rstb.2011.0254 .8. Smole, P., Baxter, D. & Byrne, J. Mathematical modeling of gene networks. Neuron , 567–580 (2000).9. Ghil, M. Climate stability for a Sellers-type model. J. Atmos. Sci. , 3–20(1976).10. Lucarini, V., Fraedrich, K. & Lunkeit, F. Thermodynamic analysis of snowballearth hysteresis experiment: Eﬃciency, entropy production, and irreversibility. Q.J. Royal Met. Soc. , 2–11 (2010).11. Ghil, M. & Lucarini, V. The physics of climate variability and climate change.

Rev. Mod. Phys. , 035002 (2020). URL https://link.aps.org/doi/10.1103/RevModPhys.92.035002 .12. Lenton, T. M. et al. Tipping elements in the earth’s climate system.

Proceedingsof the National Academy of Sciences , 1786–1793 (2008). URL . .13. Lenton, T. M. et al. Climate tipping points—too risky to bet against (2019).14. Ashwin, P., Wieczorek, S., Vitolo, R. & Cox, P. Tipping points in open systems:bifurcation, noise-induced and rate-dependent examples in the climate system.

Philosophical Transactions of the Royal Society A: Mathematical, Physical andEngineering Sciences , 1166–1184 (2012).15. Grebogi, C., Ott, E. & Yorke, J. A. Fractal basin boundaries, long-lived chaotictransients, and unstable-unstable pair bifurcation.

Phys. Rev. Lett. , 935–938(1983). URL https://link.aps.org/doi/10.1103/PhysRevLett.50.935 .16. Vollmer, J., Schneider, T. M. & Eckhardt, B. Basin boundary, edge of chaosand edge state in a two-dimensional model. New Journal of Physics , 013040(2009). URL https://doi.org/10.1088%2F1367-2630%2F11%2F1%2F013040 .17. Lucarini, V. & B´odai, T. Edge states in the climate system: exploring globalinstabilities and critical transitions. Nonlinearity , R32–R66 (2017).18. Graham, R., Hamm, A. & T´el, T. Nonequilibrium potentials for dynamical sys-tems with fractal attractors or repellers. Phys. Rev. Lett. , 3089–3092 (1991).URL https://link.aps.org/doi/10.1103/PhysRevLett.66.3089 .19. Lucarini, V. & B´odai, T. Transitions across melancholia states in a climate model:Reconciling the deterministic and stochastic points of view. Phys. Rev. Lett. ,158701 (2019). URL https://link.aps.org/doi/10.1103/PhysRevLett.122.158701 .20. Margazoglou, G., Grafke, T., Laio, A. & Lucarini, V. Dynamical landscape andmultistability of the earth’s climate (2020). .21. Menck, P. J., Heitzig, J., Marwan, N. & Kurths, J. How basin stability com-plements the linear-stability paradigm.

Nature Physics , 89–92 (2013). URL https://doi.org/10.1038/nphys2516 .

22. Lorenz, E. N. The physical bases of climate and climate modelling. climatepredictability. In

GARP Publication Series , 132–136 (WMO, 1975).23. Gelbrecht, M., Kurths, J. & Hellmann, F. Monte carlo basin bifurcation analysis.

New Journal of Physics , 033032 (2020). URL https://doi.org/10.1088%2F1367-2630%2Fab7a05 .24. Pathak, J. et al. Hybrid forecasting of chaotic processes: Using machine learning inconjunction with a knowledge-based model.

Chaos: An Interdisciplinary Journalof Nonlinear Science , 041101 (2018). URL https://doi.org/10.1063/1.5028373 . https://doi.org/10.1063/1.5028373 .25. Chen, R. T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. Neural ordinarydiﬀerential equations (2018). .26. Rackauckas, C. et al. Universal diﬀerential equations for scientiﬁc machine learn-ing (2020). .27. Lorenz, E. Predictability: a problem partly solved. In

Seminar on Predictability,4-8 September 1995 , vol. 1, 1–18. ECMWF (ECMWF, Shinﬁeld Park, Reading,1995). URL .28. Lorenz, E. N. Designing Chaotic Models.

Journal of the AtmosphericSciences , 1574–1587 (2005). URL https://doi.org/10.1175/JAS3430.1 . https://journals.ametsoc.org/jas/article-pdf/62/5/1574/3483520/jas3430_1.pdf .29. Ott, E. Chaos in Dynamical Systems (Cambridge University Press, Cambridge,England, 1993).30. Broer, H., Sim´o, C. & Vitolo, R. Bifurcations and strange attractors in the Lorenz-84 climate model with seasonal forcing.

Nonlinearity , 1205–1267 (2002).31. Orrell, D. & Smith, L. Visualising bifurcations in high dimensional systems: Thespectral bifurcation diagram. International Journal of Bifurcation and Chaos ,3015–3027 (2003).32. van Kekem, D. L. & Sterk, A. E. Travelling waves and their bifurcations in thelorenz-96 model. Physica D: Nonlinear Phenomena , 38 – 60 (2018). URL .33. van Kekem, D. L. & Sterk, A. E. Wave propagation in the lorenz-96 model.

Nonlinear Processes in Geophysics , 301–314 (2018). URL https://npg.copernicus.org/articles/25/301/2018/ .34. van Kekem, D. L. & Sterk, A. E. Symmetries in the lorenz-96 model. International Journal of Bifurcation and Chaos , 1950008 (2019). URL https://doi.org/10.1142/S0218127419500081 . https://doi.org/10.1142/S0218127419500081 .35. Kerin, J. & Engler, H. On the lorenz ’96 model and some generalizations (2020). .36. Orrell, D. Model Error and Predictability over Diﬀerent Timescales in the Lorenz’96 Systems. Journal of the Atmospheric Sciences , 2219–2228 (2003).37. Wilks, D. Eﬀects of stochastic parametrizations in the Lorenz ’96 system. Quar-terly Journal of the Royal Meteorological Society , 389–407 (2005).38. Abramov, R. A simple stochastic parameterization for reduced models of multi-scale dynamics.

Fluids (2016). URL .39. Arnold, H. M., Moroz, I. M. & Palmer, T. N. Stochastic parametrizationsand model uncertainty in the lorenz system. Philosophical Transactions of theRoyal Society A: Mathematical, Physical and Engineering Sciences , 20110479(2013). URL https://royalsocietypublishing.org/doi/abs/10.1098/rsta.2011.0479 . https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2011.0479 . ill be inserted by the editor 15

40. Vissio, G. & Lucarini, V. A proof of concept for scale-adaptive parametrizations:the case of the Lorenz ’96 model.

Quarterly Journal of the Royal MeteorologicalSociety , 63–75 (2018).41. Vissio, G. Statistical mechanical methods for parametrization in geophysical ﬂuiddynamics.

Reports on Earth System Science (2018).42. Chattopadhyay, A., Hassanzadeh, P. & Subramanian, D. Data-driven predic-tions of a multiscale lorenz 96 chaotic system using machine-learning meth-ods: reservoir computing, artiﬁcial neural network, and long short-term mem-ory network.

Nonlinear Processes in Geophysics , 373–389 (2020). URL https://npg.copernicus.org/articles/27/373/2020/ .43. Blender, R. & Lucarini, V. Nambu representation of an extended lorenz modelwith viscous heating. Physica D: Nonlinear Phenomena , 86 – 91 (2013). URL .44. B´odai, T.

Extreme Value Analysis in Dynamical Systems: Two Case Studies ,392–429 (Cambridge University Press, 2017).45. Sterk, A. E. & van Kekem, D. L. Predictability of extreme waves in the lorenz-96 model near intermittency and quasi-periodicity.

Complexity , 9419024(2017). URL https://doi.org/10.1155/2017/9419024 .46. Hu, G., B´odai, T. & Lucarini, V. Eﬀects of stochastic parametrization on ex-treme value statistics.

Chaos: An Interdisciplinary Journal of Nonlinear Sci-ence , 083102 (2019). URL https://doi.org/10.1063/1.5095756 . https://doi.org/10.1063/1.5095756 .47. Trevisan, A. & Uboldi, F. Assimilation of Standard and Targeted Observationswithin the Unstable Subspace of the Observation – Analysis – Forecast CycleSystem. Journal of the Atmospheric Sciences , 103–113 (2004).48. Trevisan, A., Isidoro, D. & Talagrand, O. Four-dimensional variational assimi-lation in the unstable subspace and the optimal subspace dimension. QuarterlyJournal of the Royal Meteorological Society , 487–496 (2010).49. Hu, G. & Franzke, C. Data assimilation in a multi-scale model.

Mathematics ofClimate and Weather Forecasting , 118–139 (2017).50. Brajard, J., Carrassi, A., Bocquet, M. & Bertino, L. Combining data assimilationand machine learning to emulate a dynamical model from sparse and noisy obser-vations: A case study with the lorenz 96 model. Journal of Computational Science , 101171 (2020). URL .51. Wilks, D. S. Comparison of ensemble-mos methods in the lorenz ’96 setting. Meteorological Applications , 243–256 (2006).52. Duan, W. & Huo, Z. An Approach to Generating Mutually Indepen-dent Initial Perturbations for Ensemble Forecasts: Orthogonal ConditionalNonlinear Optimal Perturbations. Journal of the Atmospheric Sciences , 997–1014 (2016). URL https://doi.org/10.1175/JAS-D-15-0138.1 . https://journals.ametsoc.org/jas/article-pdf/73/3/997/4811703/jas-d-15-0138_1.pdf .53. Le Carrer, N. & Green, P. L. A possibilistic interpretation of ensemble forecasts:experiments on the imperfect lorenz 96 system. Advances in Science and Research , 39–45 (2020). URL https://asr.copernicus.org/articles/17/39/2020/ .54. Paz´o, D., Szendro, I. G., L´opez, J. M. & Rodr´ıguez, M. A. Structure of character-istic lyapunov vectors in spatiotemporal chaos. Phys. Rev. E , 016209 (2008).URL https://link.aps.org/doi/10.1103/PhysRevE.78.016209 .55. Hallerberg, S., Paz´o, D., L´opez, J. & Rodr´ıguez, M. Logarithmic bred vectorsin spatiotemporal chaos: Structure and growth. Physical Review E - Statistical,Nonlinear, and Soft Matter Physics , 1–8 (2010).

56. Karimi, A. & Paul, M. R. Extensive chaos in the Lorenz-96 model.

Chaos: AnInterdisciplinary Journal of Nonlinear Science , 043105 (2010).57. Carlu, M., Ginelli, F., Lucarini, V. & Politi, A. Lyapunov analysis of multiscaledynamics: the slow bundle of the two-scale lorenz 96 model. Nonlinear Pro-cesses in Geophysics , 73–89 (2019). URL https://npg.copernicus.org/articles/26/73/2019/ .58. Abramov, R. V. & Majda, A. J. New approximations and tests of linearﬂuctuation-response for chaotic nonlinear forced-dissipative dynamical systems. Journal of Nonlinear Science , 303–341 (2008). URL https://doi.org/10.1007/s00332-007-9011-9 .59. Lucarini, V. & Sarno, S. A statistical mechanical approach for the computationof the climatic response to general forcings. Nonlinear Processes in Geophysics , 7–28 (2011).60. Lucarini, V. Stochastic perturbations to dynamical systems: A response theoryapproach. Journal of Statistical Physics , 774–786 (2012). URL https://doi.org/10.1007/s10955-012-0422-0 .61. Gallavotti, G. & Lucarini, V. Equivalence of Non-equilibrium Ensembles andRepresentation of Friction in Turbulent Flows : The Lorenz 96 Model.

Journalof Statistical Physics , 1027–1065 (2014).62. Abramov, R. V. Leading order response of statistical averages of a dynamicalsystem to small stochastic perturbations.

Journal of Statistical Physics ,1483–1508 (2017). URL https://doi.org/10.1007/s10955-017-1721-2 .63. Vissio, Gabriele & Lucarini, Valerio. Mechanics and thermodynamics of a newminimal model of the atmosphere.

Eur. Phys. J. Plus , 807 (2020). URL https://doi.org/10.1140/epjp/s13360-020-00814-w .64. B´odai, T. & Lucarini, V. Rough basin boundaries in high dimension: Can weclassify them experimentally?

Chaos: An Interdisciplinary Journal of NonlinearScience , 103105 (2020). URL https://doi.org/10.1063/5.0002577 . https://doi.org/10.1063/5.0002577 .65. Pierrehumbert, R. T., Abbot, D., Voigt, A. & Koll, D. Climate of the neopro-terozoic. Ann. Rev, Earth Plan. Sci. , 417 (2011).66. Skufca, J. D., Yorke, J. A. & Eckhardt, B. Edge of chaos in a parallel shear ﬂow. Physical Review Letters , 174101 (2006).67. Hirota, M., Holmgren, M., Van Nes, E. H. & Scheﬀer, M. Global resilience oftropical forest and savanna to critical transitions. Science , 232–235 (2011).URL https://science.sciencemag.org/content/334/6053/232 . https://science.sciencemag.org/content/334/6053/232.full.pdf .68. Ciemer, C. et al. Higher resilience to climatic disturbances in tropical vegetationexposed to more variable rainfall.

Nature Geoscience , 174–179 (2019). URL https://doi.org/10.1038/s41561-019-0312-z .69. May, R. M. Thresholds and breakpoints in ecosystems with a multiplicity of stablestates. Nature , 471–477 (1977). URL https://doi.org/10.1038/269471a0 .70. Ester, M., Xu, X., peter Kriegel, H. & Sander, J. Density-based algorithm fordiscovering clusters in large spatial databases with noise.

Proceedings Of TheAcm Sigkdd International Conference On Knowledge Discovery And Data Mining pages , 226–231 (1996).71. Gelbrecht, M., Boers, N. & Kurths, J. Neural partial diﬀerential equations forchaotic systems. (under review) (2020).72. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization (2017). .73. Benettin, G., Galgani, L., Giorgilli, A. & Strelcyn, J.-M. Lyapunov characteristicexponents for smooth dynamical systems and for hamiltonian systems; a methodfor computing all of them. part 2: Numerical application.

Meccanica , 21–30 ill be inserted by the editor 17 (1980). URL https://doi.org/10.1007/BF02128237 .74. Datseris, G. Dynamicalsystems.jl: A julia software library for chaos and nonlineardynamics. Journal of Open Source Software , 598 (2018). URL https://doi.org/10.21105/joss.00598 .75. Ramachandran, P., Zoph, B. & Le, Q. V. Searching for activation functions(2017). .76. Rackauskas, C. & Nie, Q. Diﬀerentialequations.jl – a performant and feature-richecosystem for solving diﬀerential equations in julia. Journal of Open ResearchSoftware (2017).77. Innes, M. et al.

Fashionable modelling with ﬂux.

CoRR abs/1811.01457 (2018).URL https://arxiv.org/abs/1811.01457 .1811.01457