Good and bad predictions: Assessing and improving the replication of chaotic attractors by means of reservoir computing
Alexander Haluszczynski 1, 2, a) and Christoph Räth 3, b)

1) Ludwig-Maximilians-Universität, Department of Physics, Schellingstraße 4, 80799 Munich, Germany
2) risklab GmbH, Seidlstraße 24, 80335 Munich, Germany
3) Deutsches Zentrum für Luft- und Raumfahrt, Institut für Materialphysik im Weltraum, Münchner Str. 20, 82234 Wessling, Germany

(Dated: 11 October 2019)

a) Electronic mail: [email protected]
b) Electronic mail: [email protected]
The prediction of complex nonlinear dynamical systems with the help of machine learning techniques has become increasingly popular. In particular, reservoir computing turned out to be a very promising approach, especially for the reproduction of the long-term properties of a nonlinear system. Yet, a thorough statistical analysis of the forecast results is missing. Using the Lorenz and Rössler system, we statistically analyze the quality of prediction for different parametrizations - both the exact short-term prediction and the reproduction of the long-term properties (the "climate") of the system as estimated by the correlation dimension and the largest Lyapunov exponent. We find that both short- and long-term predictions vary significantly among the realizations. Thus, special care must be taken in selecting good predictions, as predictions which deliver a better short-term forecast also tend to better resemble the long-term climate of the system. Instead of using only purely random Erdős-Rényi networks, we also investigate the benefit of alternative network topologies such as small world or scale-free networks and show which effect they have on the prediction quality. Our results suggest that the overall performance with respect to the reproduction of the climate of both the Lorenz and Rössler system is worst for scale-free networks. For the Lorenz system there seems to be a slight benefit of using small world networks, while for the Rössler system small world and Erdős-Rényi networks performed equivalently well. In general, the observation is that reservoir computing works for all network topologies investigated here.
The application of machine learning techniques to various fields in science and technology yields very promising and fast advancing results. However, the robustness of these methods is a critical aspect that is often not adequately addressed. Particularly when trying to predict complex nonlinear systems - here by using a recurrent neural network based approach called reservoir computing - it is very useful to know how likely it is to end up with a good prediction and how different the results can be in terms of quality. In our context a good prediction is achieved when the predicted trajectory matches that of the actual system in the short term while reproducing its statistical properties in the long term. In order to thoroughly investigate the prediction quality we run our analysis not on a single prediction but on many realizations, which are based on the same parameters but different random number seeds. As a result we find strong variability in the quality of the predictions, indicating that robustness is an issue, and we show the effect of varying the network topology of the reservoir.
I. INTRODUCTION
In recent years the use of machine learning (ML) techniques has not only become increasingly important in research but also popular in media, public perception and businesses. The euphoric application in all possible areas, however, carries the risk of misinterpreting the results if deeper methodological knowledge is lacking. This is reminiscent of the situation in the late 1980s and early 1990s when chaos was a hot topic in the scientific community. In the absence of adequate statistical analysis, many systems were erroneously categorized as being chaotic on the basis of, e.g., assessing the attractor dimensions by single measurements of short and noisy time series. Only after Theiler et al. introduced the concept of surrogate data could the errors of the nonlinear measures for a given data set be assessed, and it turned out that many claims of chaos had to be rejected. The lesson learned is that in the absence of a proper (linear) model of the underlying process, credible results can only be obtained by applying thorough statistical analyses involving averaging over a large number of realizations of simulations or surrogates.

In recent years the use of reservoir computing for quantifying and predicting the spatiotemporal dynamics of nonlinear systems has attracted much attention. Many of the achievements - be it, e.g., the cross-prediction of variables in two-dimensional excitable media or the reproduction of the spectrum of Lyapunov exponents in lower dimensional (Lorenz or Rössler) and higher dimensional (Kuramoto-Sivashinsky) systems - are impressive and guide the way to a range of new applications of ML in complex systems research. However, all results shown until now are based on a single or few realizations of reservoir computing. It is thus so far impossible to judge the robustness of the results with respect to, e.g., variations of the set of random variables specifying the reservoir. Here, we perform the first thorough statistical analysis of predicting the short- and long-term behaviour of nonlinear time series by means of reservoir computing.

The heart of reservoir computing is - as the name already says - a so-called reservoir, which consists of D_r nodes that are sparsely connected with each other. The nodes are supposed to yield a proper "echo state" to a given input, which is then transferred to the output layer. That is why most types of reservoir networks are often called "echo state networks" (ESN). Beginning with the first introduction of ESNs by Maass and Jaeger, the reservoir has typically been modelled as a random Erdős-Rényi network, where two nodes are connected with a certain probability p. However, the groundbreaking works of Watts and Strogatz, Albert and Barabási, and many others have shown that random networks are far from being common in physics, biology, finance or sociology. Rather, more complex networks like scale-free, small world or intermediate forms of networks with intriguing new properties are most often found in real world applications. Having this in mind, it seems natural to ask whether the topology of the network also has a significant influence on the prediction results of reservoir computing.
As a first step we use the three aforementioned prototypical classes of networks as reservoir and compare them regarding their ability of short- and long-term prediction of time series.

The paper is organized as follows: Section II introduces reservoir computing and the methods used in our study. In Section III we present the main results obtained from the statistical analysis of the prediction results as well as from studying different reservoir topologies. Our summary and the conclusions are given in Section IV.
II. METHODS

A. Lorenz and Rössler system
As in Pathak et al. and Lu et al. we use the Lorenz system as an example for replicating chaotic attractors using reservoir computing. It has been developed as a simplified model for atmospheric convection and exhibits chaos for certain parameter ranges. The standard Lorenz system, however, is symmetric in x and y with respect to the transformation x → −x and y → −y. This can be an issue, for example, when trying to infer the x and y dimension from knowledge of the z dimension as outlined in Lu et al. In order to study a more general example we modify the Lorenz system such that this symmetry is broken. This can be achieved by adding the term x to the z-component, which then reads

\dot{x} = \sigma (y - x),
\dot{y} = x (\rho - z) - y,
\dot{z} = x y - \beta z + x.   (1)

We use the standard parameters σ = 10, β = 8/3, ρ = 28. This system is referred to as the modified Lorenz system. The equations are solved using the 4th order Runge-Kutta method with a time resolution of Δt = 0.02. As a second example we study the Rössler system, whose equations read

\dot{x} = -y - z,
\dot{y} = x + a y,
\dot{z} = b + z (x - c),   (2)

where we use the parameters a = 0.5, b = 2.0 and c = 4.0. Again this is a D = 3 dimensional chaotic system, but it is said to be less chaotic than the Lorenz attractor. Thus it is an interesting object to study, in particular when it comes to the short-term prediction capabilities of reservoir computing. For the Rössler system the time resolution is chosen to be Δt = 0.05 in order to ensure a sufficient manifestation of the attractor within the t_train = 5000 training time steps.
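To make the data generation concrete, the following Python sketch integrates the modified Lorenz system of Eq. (1) with the classical 4th order Runge-Kutta scheme. It is a minimal illustration rather than the implementation used for this study; the initial condition and all function names are our own choices.

import numpy as np

def modified_lorenz(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    # Right-hand side of the modified Lorenz system, Eq. (1);
    # the extra +x term in the z component breaks the x, y symmetry.
    x, y, z = state
    return np.array([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z + x])

def rk4_trajectory(f, x0, dt, n_steps):
    # Classical 4th order Runge-Kutta integration of dx/dt = f(x).
    traj = np.empty((n_steps, len(x0)))
    x = np.asarray(x0, dtype=float)
    for i in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x = x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[i] = x
    return traj

# Hypothetical initial condition; the paper does not specify one.
u_train = rk4_trajectory(modified_lorenz, x0=[1.0, 1.0, 1.0], dt=0.02, n_steps=5000)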
B. Reservoir Computing
Reservoir computing is a machine learning technique that falls into the category of artificial recurrent neural networks. The core of the model is a network called reservoir which - in contrast to feedforward neural networks - exhibits loops. This means that past values feed back into the system and thus allow for dynamics.
In order to complete the task of predicting time series, the ability to capture dynamics is essential. Moreover, reservoir computing has a powerful advantage: While in other methods the network itself is dynamical, here the training is based only on the linear output layer and therefore allows for a large reservoir dimensionality while still being computationally feasible.

In our implementation we stick to the setup used by Pathak et al., which works as follows. We have an input signal u(t) with dimension D that we would like to feed into a reservoir A. The reservoir is chosen to be a sparse Erdős-Rényi random network with D_r = 300 nodes and p = 0.02. Here p describes the probability of connecting two nodes, which leads to an unweighted average degree of d = 6. To obtain the weighted network we then replace all nonzero elements of the adjacency matrix by numbers drawn independently and uniformly from [−1, 1]. In order to feed the input signal u(t) into the reservoir, a D_r × D input matrix W_in is required. The entries of W_in are chosen here to be uniformly distributed random numbers within the range of the nonzero elements of the reservoir.

A key property of the system are its D_r reservoir states r(t), which represent the scalar states of the nodes of the reservoir network. We initially assume r_i(t) = 0 for all nodes and update the reservoir states in each time step according to the equation

r(t + \Delta t) = \alpha \, r(t) + (1 - \alpha) \tanh\left( A r(t) + W_{\mathrm{in}} u(t) \right).   (3)

As in Pathak et al. we set α = 0 and therefore do not mix the new reservoir states with past ones. Now we have a fully dynamical system where the network edges are constant and the states of the nodes are time dependent.

The next step is to map the reservoir states r(t) back to the D dimensional output v given by

v(t + \Delta t) = W_{\mathrm{out}}\left( r(t + \Delta t), P \right).   (4)

Here we assume that W_out depends linearly on P and reads W_out(r, P) = P r. This means that the output depends only on the reservoir states r(t) and the output matrix P, which contains a large number of adjustable parameters - all its elements. Therefore, after acquiring a sufficient number of reservoir states r(t), we have to choose P such that the output v of the reservoir is as close as possible to the known real output v_R. This process is called training. In general, the task is to find an output matrix P using Ridge regression, which minimizes

\sum_{-T \le t \le 0} \left\| W_{\mathrm{out}}(r(t), P) - v_R(t) \right\|^2 + \beta \left\| P \right\|^2,   (5)

where β denotes the regularization constant that prevents overfitting by penalizing large values of the fitting parameters. In this study we choose β = 0.01. The notation ‖P‖² describes the sum of the squared elements of the matrix P. For solving this problem we apply the matrix form of the Ridge regression, which leads to

P = \left( r^T r + \beta \right)^{-1} r^T v_R.   (6)

The notation r and v_R without the time indexing denotes matrices whose columns are the vectors r(t) and v_R(t), respectively, in each time step. In our implementation we chose t_train = 5000 training time steps while allowing for a washout or initialisation phase of t_init = 100. During this time we do not "record" the reservoir states r(t), which means that only 4900 time steps are used for the regression. In order to ensure that a washout of 100 time steps is sufficient we also ran the analysis with a washout of 1000 time steps and found both results to be equivalent.

After P is determined we can switch to the prediction mode by giving the predicted state v(t) as input instead of the actual data u(t). The update equation for the network states r(t) then reads

r(t + \Delta t) = \tanh\left( A r(t) + W_{\mathrm{in}} W_{\mathrm{out}}(r(t), P) \right) = \tanh\left( A r(t) + W_{\mathrm{in}} P r(t) \right).   (7)

This allows us to produce a predicted trajectory of any length by iterating Eq. (7) and applying Eq. (4).
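The complete training and prediction procedure of Eqs. (3)-(7) can be summarized in a short sketch. The following Python code is a minimal illustration under the conventions above, not the authors' original implementation; all function names are ours, and it assumes the trajectory u_train from the sketch in Sec. II A.

import numpy as np
import networkx as nx

D, D_r, p, beta = 3, 300, 0.02, 0.01
rng = np.random.default_rng(seed=0)  # the seed is what differs between realizations

# Sparse Erdos-Renyi reservoir with weights drawn uniformly from [-1, 1]
A = nx.to_numpy_array(nx.erdos_renyi_graph(D_r, p, seed=0))
A[A != 0] = rng.uniform(-1.0, 1.0, size=np.count_nonzero(A))
W_in = rng.uniform(-1.0, 1.0, size=(D_r, D))  # input matrix

def run_reservoir(u, t_init=100):
    # Drive the reservoir with the input data, Eq. (3) with alpha = 0,
    # and record the states after the washout phase.
    r, states = np.zeros(D_r), []
    for t, u_t in enumerate(u):
        r = np.tanh(A @ r + W_in @ u_t)
        if t >= t_init:
            states.append(r)
    return np.array(states)

def train_output(states, targets):
    # Ridge regression in matrix form, Eq. (6).
    return np.linalg.solve(states.T @ states + beta * np.eye(D_r),
                           states.T @ targets).T

def predict(P, r, n_steps):
    # Closed-loop prediction, Eqs. (7) and (4).
    v = np.empty((n_steps, D))
    for t in range(n_steps):
        r = np.tanh(A @ r + W_in @ (P @ r))
        v[t] = P @ r
    return v

states = run_reservoir(u_train)
P = train_output(states[:-1], u_train[101:])  # the target is the next input value
v_pred = predict(P, states[-1], n_steps=10000)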
C. Alternative network topologies

So far it has been standard practice to use purely random Erdős-Rényi networks for the reservoir A. However, there is a variety of conceivable network topologies that may have an influence on the results. In this study we investigate the use of small world and scale-free networks as an alternative.

Small world networks are graphs where the distance - in terms of steps via other nodes - between any pair of nodes is small. At the same time the clustering coefficient is relatively high, which means that neighbouring nodes tend to be connected. This so-called small world property is observed in many real world networks such as social networks, electric power grids, chemical reaction networks and neuronal networks. In order to have the same average degree d = 6 as the random Erdős-Rényi networks we construct the small world networks in the following way: First we connect each node with its six nearest neighbours, implying periodic boundary conditions. This is equivalent to arranging all nodes as a ring. Then we loop over each edge x - y and rewire it to x - z with probability p = 0.2, where node z is randomly chosen.

Scale-free networks are graphs where the distribution of the number of edges per node decays with a power law tail. This is again a property which is observed in many real world networks. For example, the above mentioned electric power grid networks and neuronal networks exhibit not only the small world property but are also scale free. Other examples include the world wide web or networks of citations of scientific papers. Again, the network is constructed such that its average degree is d = 6. For this we use the scale-free graph generator of the NetworkX package with parameters α, β, γ, δ_in and δ_out = 0. Note that in this case the graph is directed, while the other two network topologies result in undirected graphs. Here, α sets the probability of adding a new node which is connected to an already existing node chosen randomly according to the in-degree distribution, while γ does the same except that the node is chosen according to the out-degree distribution. In addition, β regulates the probability of creating an edge between two existing nodes, where one is chosen according to the in-degree distribution and the other node according to the out-degree distribution.
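For illustration, all three reservoir topologies can be generated with a few NetworkX calls and then weighted as described in Sec. II B. The sketch below uses the package's default parameters for the scale-free generator, since the exact values used in this study are not reproduced in the text.

import numpy as np
import networkx as nx

D_r, d = 300, 6
rng = np.random.default_rng(seed=1)

# Erdos-Renyi: connection probability chosen such that the mean degree is d
er = nx.erdos_renyi_graph(D_r, p=d / (D_r - 1), seed=1)

# Small world: ring of D_r nodes, each linked to its k = 6 nearest neighbours,
# with every edge rewired with probability 0.2
sw = nx.watts_strogatz_graph(D_r, k=6, p=0.2, seed=1)

# Scale free: directed graph grown by preferential attachment; NetworkX
# default parameters are used here as a stand-in for the paper's values
sf = nx.scale_free_graph(D_r, seed=1)

def to_weighted_reservoir(graph):
    # Replace nonzero adjacency entries by uniform random weights from [-1, 1].
    A = nx.to_numpy_array(graph)
    A[A != 0] = rng.uniform(-1.0, 1.0, size=np.count_nonzero(A))
    return A

A_small_world = to_weighted_reservoir(sw)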
D. Measures of the System

In order to assess the quality of the prediction we mainly use three different measures. The goal is to adequately address both the exact short-term prediction and the long-term reproduction of the statistical properties of the system - its so-called climate.
FIG. 1. Left: X coordinate of two predicted (blue) trajectories of the Lorenz system plotted over n = 2000 time steps. The results are compared against the trajectory of the simulated actual Lorenz system (green). The upper plot shows a good realization where both trajectories are overlapping, while the lower part shows a bad prediction. Right: Three dimensional attractor for both cases. The spectral radius is ρ = 0.3 and random Erdős-Rényi networks are used for the reservoir A. The correlation dimension is ν = 1.992 for the upper and ν = 0.007 for the lower realization, while the largest Lyapunov exponents are λ = 0.851 (upper) and λ = 0.420 (lower).
1. Forecast Horizon
To quantify the quality and duration of the exact prediction of the trajectory we use a fairly simple measure which we call forecast horizon. Here we track the number of time steps during which the predicted and the actual trajectory are matching. As soon as one of the three coordinates exceeds certain deviation thresholds we consider the trajectories as not matching anymore. Throughout our study we use

| v(t) - v_R(t) | > \delta,   (8)

where the thresholds δ are chosen per coordinate to be of the order of 5 for the Lorenz system and of the order of 2 for the Rössler system. The values are chosen this way due to the different ranges of the state variables in both systems. The aim is that small fluctuations around the actual trajectory as well as minor detours do not exceed the threshold. Empirically we found that the distances between the trajectories become much larger than the threshold values as soon as the short-term prediction collapses.
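A minimal sketch of this measure, assuming predicted and actual trajectories stored as arrays of shape (T, 3) and using an illustrative threshold vector:

import numpy as np

def forecast_horizon(v_pred, v_true, delta):
    # Number of time steps until any coordinate deviates from the actual
    # trajectory by more than its threshold, Eq. (8).
    exceeded = np.abs(v_pred - v_true) > delta  # boolean array of shape (T, D)
    bad = exceeded.any(axis=1)
    return int(np.argmax(bad)) if bad.any() else len(v_pred)

# Illustrative threshold vector (the exact values differ per system):
# h = forecast_horizon(v_pred, v_true, delta=np.array([5.0, 5.0, 5.0]))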
2. Correlation Dimension
One important characteristic of the long-term properties of the system is its structural complexity. This can be quantified by calculating the correlation dimension, which measures the dimensionality of the space populated by the trajectory. It is based on the correlation integral

C(r) = \lim_{N \to \infty} \frac{1}{N^2} \sum_{i,j=1}^{N} \theta\left( r - | \mathbf{x}_i - \mathbf{x}_j | \right) = \int_0^r \mathrm{d}r' \, c(r'),   (9)

which describes the mean probability that two states in phase space are close to each other at different time steps. The condition close to is met if the distance between the two states is less than the threshold distance r. Here θ represents the Heaviside function, while c(r') denotes the standard correlation function. For self-similar strange attractors the following power-law relationship holds in a range of r:

C(r) \propto r^{\nu}.   (10)

FIG. 2. Forecast horizon scattered against the correlation dimension for each of the N = 1000 predictions of the Lorenz system per spectral radius. Different spectral radii are differentiated by colours. Random Erdős-Rényi networks are used for the reservoir A.

The correlation dimension is then measured by the scaling exponent ν. We use the Grassberger-Procaccia algorithm to calculate the correlation dimension of our trajectories. This approach is purely data driven and therefore does not require any knowledge about the system.
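The estimate can be illustrated with the following sketch, which approximates C(r) from randomly sampled pairs of states and fits ν as the slope of log C(r) versus log r. It is a pedagogical simplification, not the full Grassberger-Procaccia implementation used in this study; the scaling range (r_min, r_max) and sample size are our choices.

import numpy as np

def correlation_dimension(traj, r_min, r_max, n_r=20, n_pairs=200000, seed=0):
    # Estimate the scaling exponent nu of C(r) ~ r^nu, Eqs. (9) and (10),
    # from randomly sampled pairs of states.
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(traj), size=n_pairs)
    j = rng.integers(0, len(traj), size=n_pairs)
    keep = i != j
    dists = np.linalg.norm(traj[i[keep]] - traj[j[keep]], axis=1)
    radii = np.logspace(np.log10(r_min), np.log10(r_max), n_r)
    C = np.array([np.mean(dists < r) for r in radii])  # correlation integral
    mask = C > 0  # avoid log(0) for radii below the smallest sampled distance
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)
    return slope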
3. Largest Lyapunov Exponent
Another aspect of the long-term behaviour is the temporal complexity of the system. When dealing with chaotic systems, looking at the Lyapunov exponents is an obvious choice. The Lyapunov exponents λ_i describe the average rate of divergence of nearby states in phase space and thus measure the sensitivity to initial conditions. For each dimension in phase space there is one exponent. If the system exhibits at least one positive Lyapunov exponent it is classified as chaotic, while the magnitude of the exponent quantifies the time scale on which the system becomes unpredictable. Therefore it is sufficient for our analysis to determine only the largest Lyapunov exponent λ, for which

d(t) = C e^{\lambda t},   (11)

which makes the task computationally easier. Here, d(t) is the average distance or separation of the initially nearby states at time t and C is a constant that normalizes the initial separation. To calculate the largest Lyapunov exponent we use the Rosenstein algorithm.
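The following sketch illustrates the idea behind the Rosenstein algorithm: for each state the nearest temporally separated neighbour is found, both trajectories are followed forward, and the slope of the mean logarithmic divergence yields λ. It uses a brute-force O(N²) neighbour search and is meant as an illustration for short time series, not the exact implementation used here; min_sep and n_steps are our choices.

import numpy as np

def largest_lyapunov(traj, dt, min_sep=50, n_steps=100):
    # Fit the slope of the mean log-divergence of initially nearby states,
    # d(t) = C * exp(lambda * t), Eq. (11).
    usable = len(traj) - n_steps
    dists = np.linalg.norm(traj[:usable, None] - traj[None, :usable], axis=2)
    for i in range(usable):  # exclude temporally close states as neighbours
        dists[i, max(0, i - min_sep):i + min_sep + 1] = np.inf
    nn = np.argmin(dists, axis=1)  # nearest neighbour of every state
    idx = np.arange(usable)
    log_div = np.empty(n_steps)
    for k in range(n_steps):
        sep = np.linalg.norm(traj[idx + k] - traj[nn + k], axis=1)
        log_div[k] = np.log(sep[sep > 0]).mean()
    slope, _ = np.polyfit(np.arange(n_steps) * dt, log_div, 1)
    return slope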
III. RESULTS

Although machine learning techniques and reservoir computing in particular have become increasingly popular, a thorough statistical analysis of the forecast results is still missing. Therefore we found it insightful not to perform only a single prediction, where one prediction means forecasting the trajectory for t_prediction = 10000 time steps. As there are random numbers involved in the construction of the reservoir A as well as of the input matrix W_in, we can run the prediction with N = 1000 different random number seeds while keeping all other parameters of the network constant. For different seeds both A and W_in will therefore vary. This allows us to gain a statistical view on the quality of the prediction instead of analysing only single realizations. In addition to the parameters mentioned in Section II there is one more parameter that we can tune. This is the spectral radius ρ of the adjacency matrix of the reservoir A, which is defined as its largest absolute eigenvalue

\rho(A) = \max\{ |\lambda_1|, \ldots, |\lambda_{D_r}| \}   (12)

and reflects some kind of average degree of the network. We can adjust the spectral radius by

A^* = \frac{\rho^*}{\rho(A)} \, A,   (13)

where ρ* is the desired spectral radius. Note that the λ_i here denote the eigenvalues of the adjacency matrix of the reservoir A - not to be confused with the Lyapunov exponents in Eq. (11), which are commonly called λ as well.
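In practice the rescaling of Eq. (13) amounts to one line of linear algebra. A minimal sketch, assuming the weighted reservoir matrix from the sketches above:

import numpy as np

def set_spectral_radius(A, rho_target):
    # Rescale the reservoir matrix so that its largest absolute eigenvalue
    # equals the desired value rho*, Eqs. (12) and (13).
    rho_A = np.max(np.abs(np.linalg.eigvals(A)))
    return A * (rho_target / rho_A)

A_star = set_spectral_radius(A_small_world, 0.3)  # reservoir matrix from Sec. II C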
Other studies showed results for particular values of ρ, such as Pathak et al. claiming, e.g., that the prediction using a spectral radius of ρ = 1.2 reproduces the climate while ρ = 1.45 does not. To possibly reproduce these results and to assess the best ranges for the spectral radius we ran the reservoir computing with N = 1000 different random number seeds for each of 19 different spectral radii ρ*_i, while all other parameters remain constant. We simulate one trajectory which is used for the training of the network as well as for the comparison of the predicted trajectory with the actual one. Furthermore, we simulated an additional 1000 trajectories of the actual Lorenz and Rössler system with different randomly chosen initial conditions in order to investigate the statistical error when calculating the correlation dimension and the largest Lyapunov exponent from time series of limited length.

FIG. 3. Forecast horizon scattered against the correlation dimension for each of the N = 1000 predictions of the Rössler system per spectral radius. Different spectral radii are differentiated by colours. Random Erdős-Rényi networks are used for the reservoir A.

Table I shows the means and standard deviations for both measures, indicating that the error is reasonably small. As we use only 10000 data points per trajectory, the results are slightly below the literature values of about 2.04 for the correlation dimension and 0.89 for the largest Lyapunov exponent of the Lorenz system. For the Rössler system the results for the correlation dimension are significantly below the desired value of around 2 because the Grassberger-Procaccia algorithm converges more slowly than for the Lorenz system when using only 10000 data points. This is also reflected in the higher standard deviation σ of the correlation dimension. However, we verified by increasing the number of data points that our calculations converge to the expected results.

TABLE I. Mean and standard deviation σ of the two measures calculated from 1000 simulated trajectories with different initial conditions.

Lorenz                      Mean    σ
Correlation Dimension       2.026   0.014
Largest Lyapunov Exponent   0.878   0.029

Rössler                     Mean    σ
Correlation Dimension       1.713   0.037
Largest Lyapunov Exponent   0.107   0.011

FIG. 4. Largest Lyapunov exponent scattered against the correlation dimension for each of the N = 1000 predictions per spectral radius. Results are shown for the Lorenz (left) and Rössler system (right). Different spectral radii are differentiated by colours. The red object represents the five sigma error ellipse of both measures calculated based on 1000 simulated true trajectories. Random Erdős-Rényi networks are used for the reservoir A.

Figure 1 shows two examples of predicted trajectories using reservoir computing in the setup described above with a spectral radius of ρ = 0.3. Although we ran the prediction over n = 10000 time steps, we plotted the results for n = 2000 time steps for the sake of clarity. On the left side of the plot one can see the trajectory of the X coordinate for the predicted system using reservoir computing (blue) and the simulated system based on the Lorenz differential equations (green). In the upper plot both trajectories overlap for around 200 time steps and then deviate while still showing the characteristic pattern of the Lorenz system. However, in the lower plot both trajectories separate already after less than 100 time steps, leading to a pattern which looks completely different.

This is remarkable given the fact that the setup for both cases is identical except for a different random number seed, which results in different realizations of the input matrix W_in and the reservoir A. Since looking solely at the X coordinate yields insufficient information about the overall prediction quality, it is meaningful to investigate the whole attractor as plotted on the right side of Fig. 1. Here we can see that the Lorenz attractor is reconstructed very well by the upper prediction, while the lower prediction has nothing to do with the butterfly-shaped Lorenz attractor. Instead, the trajectory quickly runs into a fixed point after detaching and partly forming some kind of mirrored Lorenz attractor. The difference in prediction quality is not only reflected in the ability to match the original trajectory in the short term but also with respect to the correlation dimension and the largest Lyapunov exponent. While in the upper case the resulting values of ν = 1.992 and λ = 0.851 are well within the expectations for the Lorenz system, the lower realization completely misses to reconstruct the long-term climate. Immediately the question arises whether this observation is an exception or whether the prediction quality is not robust with respect to different random number seeds. Therefore we systematically investigated this effect by running the same setup with N = 1000 different random number seeds. Since it would be laborious to visually inspect the trajectories and attractors of all realizations, we rely on the measures introduced in Section II in order to assess whether a prediction is good or bad.

Figure 2 shows a scatter plot where the forecast horizon is plotted against the correlation dimension for all realizations. The colours represent different spectral radii, and for each spectral radius there is one point for each of the N = 1000 random number seeds. In order to make the results better readable we divided the plot into four sections where we grouped the results for four to five different spectral radii. The first thing we can observe is that the prediction of the Lorenz system using reservoir computing works better for smaller spectral radii. But more importantly, one can also see that the prediction quality varies significantly even when using the same spectral radius - as already indicated by Fig. 1. This becomes evident not only from the results for the forecast horizon but especially when considering the correlation dimension. Its values quickly spread when increasing the spectral radius, indicating that the resulting predicted trajectories in many cases do not resemble the long-term climate of the system well. Figure 3 shows the same results for the Rössler system. In contrast to the Lorenz system, not the smallest spectral radius but a choice of ρ = 0.4 leads to the best results.

Figure 4 shows the largest Lyapunov exponent scattered against the correlation dimension. For the Lorenz system, even for the best working spectral radii a considerable number of realizations lies outside the σ = 5 error ellipse. This indicates strong variability, given that σ = 5 is a large error. For the larger spectral radii shown in the bottom left plot - including the values ρ = 1.2 and ρ = 1.45 as used in Pathak et al. - there is not a single point within the ellipse. This indicates that the prediction completely fails to reproduce the long-term climate in those cases. The results for the Rössler system on the right side of the plot give a similar picture. However, even for the best working spectral radius of ρ = 0.4 only few points lie within the σ = 5 error ellipse. In addition, one can also see from the upper plots of Fig. 4 that a good reproduction of the correlation dimension also tends to coincide with a better reproduction of the largest Lyapunov exponent, although this effect is not very significant.

FIG. 5. Top plot: Cumulative distribution of χ² for the best working spectral radius ρ = 0.3 of the Lorenz system. Bottom plot: Cumulative distribution of χ² for the best working spectral radius ρ = 0.4 of the Rössler system.

So far we only looked at the results based on the random Erdős-Rényi networks. In order to compare the performance of the three different network topologies on a statistical level we perform a χ² analysis of the long-term climate prediction. This means that we calculate

\chi^2(i, \rho) = \sum_{j=1}^{2} \left[ \frac{X_j(i, \rho) - \langle X_j \rangle}{\sigma_{X_j}} \right]^2,   (14)

where i is the i-th random number seed and ρ indexes the spectral radius. The sum goes over the correlation dimension (j = 1) and the largest Lyapunov exponent (j = 2). ⟨X_j⟩ represents the average value derived from 1000 simulated actual Lorenz trajectories - as shown in Table I - while σ_{X_j} is the corresponding standard deviation. Therefore the resulting χ² describes how strongly the predicted results deviate from the actual values, weighted by the errors of the actual values.

Figure 5 shows the cumulative distribution of χ² for the three network topologies. The top plot contains the results for the Lorenz system using the spectral radius ρ = 0.3, where we calculated the cumulative distribution for values of χ² between 0 and 10. We can see that scale-free networks (red line) tend to work worst, while small world networks (blue line) slightly outperform the Erdős-Rényi networks (yellow line). However, it becomes clear that the method of reservoir computing works for all network topologies tested here. In contrast, there is a difference between scale-free networks and the other topologies in the case of the Rössler system (bottom plot). Here, we calculated the cumulative distribution for values of χ² between 0 and 500 based on ρ = 0.4. The reason is that even for the best working spectral radius the variability is significantly higher as compared to the Lorenz system, which leads to higher values of χ² with only very few data points in the 0 to 10 range. It is interesting to note that the performance of scale-free networks is now strongly below the other two networks, while Erdős-Rényi networks are slightly leading overall. Therefore, in contrast to the Lorenz system, network topology seems to matter in this case.
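As an illustration, the χ² value of Eq. (14) for a single realization can be computed directly from the reference statistics in Table I; the Lorenz values are used here, and the variable names are ours.

import numpy as np

# Reference statistics from Table I (Lorenz system): correlation dimension
# and largest Lyapunov exponent of 1000 simulated true trajectories.
means = np.array([2.026, 0.878])
sigmas = np.array([0.014, 0.029])

def chi_squared(nu, lam):
    # Eq. (14): deviation of the two climate measures of one realization
    # from the reference values, weighted by their standard deviations.
    x = np.array([nu, lam])
    return float(np.sum(((x - means) / sigmas) ** 2))

# For an ensemble one evaluates chi_squared for every random number seed and
# plots the cumulative distribution, e.g.
# chis = sorted(chi_squared(nu_i, lam_i) for nu_i, lam_i in climate_measures)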
IV. CONCLUSIONS AND OUTLOOK

In this paper we investigated the prediction of chaotic attractors by means of reservoir computing from a statistical perspective. Instead of only predicting one trajectory we simulated 1000 realizations - where each realization corresponds to another random number seed - for a number of different spectral radii ρ. Analyzing both the Lorenz and Rössler system we found that the ability to exactly forecast the correct trajectory as well as the reconstruction of the long-term climate, measured by the correlation dimension and the largest Lyapunov exponent, varies strongly. Even for the exact same parameter setup there can be very good results that match the true trajectory for a large number of time steps and nicely reconstruct the attractor. On the other hand there can be results that completely fail in either one or both respects and do not reflect the desired properties of the system. The results suggest that smaller spectral radii than typically used work better for both systems, while in case of the Lorenz system even the smallest spectral radius considered yields good results. Still, a considerable fraction of the realizations falls outside the σ = 5 error ellipse even for the best working spectral radius. Comparing the network topologies, scale-free networks showed the weakest performance in the χ² analysis. A tentative explanation for the lower performance of scale-free networks could be the following: According to Singh, the more capable a brain or neuronal system is, the less scaling is present in its degree distribution. Overall it is important to point out that despite the differences presented here, the general methodology of reservoir computing works for different network topologies. However, even after trying different parameters and, alternatively, a setup where also the input states enter the regression as described by Lukoševičius and Jaeger, the variability can always be observed.

Once the network is trained, the prediction is deterministic and depends strongly on the weights and only weakly on the topology. It should thus be possible to associate good and bad predictions with differential properties of the respective realization of the reservoir in a systematic way. First attempts in this direction for a reservoir with unweighted edges have recently been reported by Carroll and Pecora. Once more stable predictions are possible based on those insights, a more precise analysis of the attractor properties, e.g. with the spectrum of Lyapunov exponents, could be useful and necessary. Furthermore, the role of the network size is also an interesting aspect. Current and future work is dedicated to the investigation of these questions - not least because the answers to them will shed new light on the complexity of the underlying dynamical system.
ACKNOWLEDGEMENTS
We wish to acknowledge useful discussions and comments from Jonas Aumeier, Ingo Laut and Mierk Schwabe.

J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. Doyne Farmer, "Testing for nonlinearity in time series: the method of surrogate data," Physica D: Nonlinear Phenomena, 77-94 (1992).
Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, and E. Ott, "Reservoir observers: Model-free inference of unmeasured variables in chaotic systems," Chaos, 041102 (2017).
J. Pathak, Z. Lu, B. R. Hunt, M. Girvan, and E. Ott, "Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data," Chaos: An Interdisciplinary Journal of Nonlinear Science, 121102 (2017).
J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott, "Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach," Physical Review Letters, 024102 (2018).
J. Pathak, A. Wikner, R. Fussell, S. Chandra, B. R. Hunt, M. Girvan, and E. Ott, "Hybrid forecasting of chaotic processes: Using machine learning in conjunction with a knowledge-based model," Chaos, 041101 (2018), arXiv:1803.04779 [cs.LG].
R. S. Zimmermann and U. Parlitz, "Observing spatio-temporal dynamics of excitable media using reservoir computing," Chaos, 043118 (2018).
T. L. Carroll, "Using reservoir computers to distinguish chaotic signals," Physical Review E, 052209 (2018), arXiv:1810.04574 [nlin.AO].
Z. Lu, B. R. Hunt, and E. Ott, "Attractor reconstruction by machine learning," Chaos: An Interdisciplinary Journal of Nonlinear Science, 061104 (2018).
P. Antonik, M. Gulina, J. Pauwels, and S. Massar, "Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography," Physical Review E, 012215 (2018).
W. Maass, T. Natschläger, and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations," Neural Computation, 2531-2560 (2002).
H. Jaeger and H. Haas, "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication," Science, 78-80 (2004).
D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, 440 (1998).
R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Reviews of Modern Physics, 47-97 (2002), arXiv:cond-mat/0106096 [cond-mat.stat-mech].
A. D. Broido and A. Clauset, "Scale-free networks are rare," arXiv:1801.03400 [physics.soc-ph] (2018).
M. Gerlach and E. G. Altmann, "Testing statistical laws in complex systems," arXiv:1904.11624 [physics.data-an] (2019).
H. Cui, X. Liu, and L. Li, "The architecture of dynamic reservoir in the echo state network," Chaos: An Interdisciplinary Journal of Nonlinear Science, 033127 (2012).
E. N. Lorenz, "Deterministic nonperiodic flow," Journal of the Atmospheric Sciences, 130-141 (1963).
O. E. Rössler, "An equation for continuous chaos," Physics Letters A, 397-398 (1976).
M. Lukoševičius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training," Computer Science Review, 127-149 (2009).
R. Gençay and T. Liu, "Nonlinear modelling and prediction with feedforward and recurrent networks," Physica D: Nonlinear Phenomena, 119-134 (1997).
P. Erdős, "On random graphs," Publicationes Mathematicae, 290-297 (1959).
A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, 55-67 (1970).
A.-L. Barabási and E. Bonabeau, "Scale-free networks," Scientific American, 60-69 (2003).
L. Amaral, A. Scala, M. Barthélémy, and H. Stanley, "Classes of small-world networks," Proceedings of the National Academy of Sciences, 11149-11152 (2000).
A. Hagberg, P. Swart, and D. S. Chult, "Exploring network structure, dynamics, and function using NetworkX," Tech. Rep. (Los Alamos National Laboratory, Los Alamos, NM, 2008).
B. Bollobás, C. Borgs, J. Chayes, and O. Riordan, "Directed scale-free graphs," in Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (Society for Industrial and Applied Mathematics, 2003) pp. 132-139.
P. Grassberger and I. Procaccia, "Measuring the strangeness of strange attractors," Physica D: Nonlinear Phenomena, 189-208 (1983).
P. Grassberger, "Generalized dimensions of strange attractors," Physics Letters A, 227-230 (1983).
A. Wolf, J. B. Swift, H. L. Swinney, and J. A. Vastano, "Determining Lyapunov exponents from a time series," Physica D: Nonlinear Phenomena, 285-317 (1985).
R. Shaw, "Strange attractors, chaotic behavior, and information flow," Zeitschrift für Naturforschung A, 80-112 (1981).
M. T. Rosenstein, J. J. Collins, and C. J. De Luca, "A practical method for calculating largest Lyapunov exponents from small data sets," Physica D: Nonlinear Phenomena, 117-134 (1993).
S. S. Singh, B. Khundrakpam, A. T. Reid, J. D. Lewis, A. C. Evans, R. Ishrat, B. I. Sharma, and R. B. Singh, "Scaling in topological properties of brain networks," Scientific Reports, 24926 (2016).