Reservoir Computers Modal Decomposition and Optimization
Chad Nathe, Enrico Del Frate, Thomas Carroll, Louis Pecora, Afroza Shirin, Francesco Sorrentino
Mechanical Engineering Department, University of New Mexico, Albuquerque, NM 87131, USA
U.S. Naval Research Laboratory, Washington, DC 20375, USA
The topology of a network associated with a reservoir computer is often taken so that the connectivity and the weights are chosen randomly. Optimization is hardly considered as the parameter space is typically too large. Here we investigate this problem for a class of reservoir computers for which we obtain a decomposition of the reservoir dynamics into modes, which can be computed independently of one another. Each mode depends on an eigenvalue of the network adjacency matrix. We then take a parametric approach in which the eigenvalues are parameters that can be appropriately designed and optimized. In addition, we introduce the application of a time shift to each individual mode. We show that manipulations of the individual modes, either in terms of the eigenvalues or the time shifts, can lead to dramatic reductions in the training error.

a) Electronic mail: [email protected]

A reservoir computer (RC) is a complex nonlinear dynamical system that is used for processing and analyzing empirical data, with applications to, e.g., the modeling of complex dynamical systems, speech recognition, learning of context-free and context-sensitive languages, the reconstruction and prediction of chaotic attractors, image recognition, and control of robotic systems. A typical RC consists of a set of nodes coupled together to form a network. Each node of the RC evolves in time in response to an input signal that is fed into the reservoir. An output signal is then generated from the time evolutions of the RC nodes. In a RC, the output connections (those that connect the RC nodes to the output) are trained to produce a best fit between the output signal and a training signal related to the original input signal. On the other hand, the connections between the nodes of the reservoir are constant parameters of the system. As a result, RCs are easier to analyze than other machine learning tools for which all the connections are typically trained.

Earlier work studied the effects of the network topology on the performance of a reservoir computer and focused on the sparsity of the connections and the presence of network symmetries. Recent work has analyzed linear reservoir computers and pointed out a connection with the theory of dynamic mode decomposition. A common assumption is that nonlinear reservoirs can outperform linear reservoirs. Optimizing the hyperparameters of a reservoir computer is often done, but optimizing the connections between the RC nodes is more difficult due to the high-dimensional parameter space. The standard recipe is to use random matrices. Our analysis that follows shows that under certain conditions, the reservoir equations can be rewritten in an equivalent form which corresponds to individual uncoupled nodes, which are easier to optimize.

We consider a reservoir computer modeled by the following dynamical equations in continuous time,
$$\dot r_i(t) = F\Big( r_i(t), \sum_{j=1}^{N} A_{ij} r_j(t), s_1(t), s_2(t), \ldots, s_l(t) \Big), \quad i = 1, \ldots, N, \qquad (1)$$
where $r_i$ is the scalar state of node $i$ of the reservoir, $N$ is the number of nodes, the adjacency matrix $A = \{A_{ij}\}$ describes the connectivity between the network nodes, and $s_1, s_2, \ldots, s_l$ are input signals to the reservoir. These can represent different data being fed into the reservoir, such as, in a weather prediction application, a rainfall time series, a wind time series, a humidity time series, and so on.
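To make the general form of Eq. (1) concrete, a minimal sketch of a single right-hand-side evaluation is given below. The function names and the NumPy implementation are our own illustrative choices, not the authors'; $F$ is left as a user-supplied map.

```python
# Minimal sketch of Eq. (1): every node i sees its own state r_i, the
# network coupling sum_j A_ij r_j, and the l input signals. Nothing
# here is tied to a particular choice of F.
import numpy as np

def reservoir_rhs(r, A, inputs, F):
    """Return dr/dt for all N nodes at one instant of time."""
    coupling = A @ r              # vector of sums: sum_j A_ij r_j for each i
    return F(r, coupling, inputs)

# Example with a simple, hypothetical choice of F:
# F = lambda r, c, s: np.tanh(r + c + s[0])
```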
These input signals are in general a function of an underlying process to which the reservoir is applied. The training signal $g(t)$ is another signal from the same underlying process which is related to the input signals through a complex relation (e.g., in the weather prediction application, a temperature time series). The function $F: \mathbb{R}^{l+2} \to \mathbb{R}$ determines the particular dynamics of the reservoir nodes.

Next we focus on a specific class of reservoirs, which possess the property of universality, described by the following set of equations,
$$\dot r_i(t) = \alpha \big( 1 + \epsilon s_2(t) \big) r_i(t) + \sum_{j=1}^{N} A_{ij} r_j(t) + w_i s_1(t), \quad i = 1, \ldots, N, \qquad (2)$$
where $l = 2$ and $s_1$ and $s_2$ are the two input signals. In what follows, we will often refer to Eq. (2) as that of a linear reservoir computer. The particular process used to generate the adjacency matrix $A$ is described in the Supplementary Information. In the rest of this paper we set $N = 100$ and we also assume for simplicity that $A$ is symmetric, $A = A^T$. The coefficients $w_i$ represent the weight by which the input signal is multiplied in the dynamics of node $i$. These are also typically randomly chosen.

The underlying process we want to model may evolve in time based on a set of deterministic (chaotic) equations, such as the equations of the Lorenz chaotic attractor, in the variables $x_L(t), y_L(t), z_L(t)$ (see Supplementary Information). One task we can give the reservoir is to reconstruct the $z_L(t)$ time evolution (training signal) from knowledge of either $x_L(t)$ or $y_L(t)$ or both (input signals). We will also consider other tasks for which the time series are generated by other chaotic or periodic systems, such as the Hindmarsh-Rose system or the Duffing system (see Supplementary Information).

In order to examine the accuracy of the reservoir computer relative to the dynamical system it is modeling, we must have a way to quantify how well the reservoir is able to reproduce the training signal $g(t)$ from knowledge of the input signals $s_1(t)$ and $s_2(t)$. After integrating the reservoir equations for a long enough time, its dynamics can be described by the $T \times (N+1)$ matrix,
$$\Omega = \begin{bmatrix} r_1(1) & r_2(1) & \cdots & r_N(1) & 1 \\ r_1(2) & r_2(2) & \cdots & r_N(2) & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ r_1(T) & r_2(T) & \cdots & r_N(T) & 1 \end{bmatrix} \qquad (3)$$
Here, $N$ is the number of nodes in the reservoir computer and $T$ is the number of time steps taken. We add a column whose entries are all ones to account for any constant offset. The best fit $\mathbf{h} = [h(1), h(2), \ldots, h(T)]$ to the training signal $\mathbf{g} = [g(1), g(2), \ldots, g(T)]$ is computed as $h(t) = \sum_{i=1}^{N} \kappa_i r_i(t) + \kappa_{N+1}$ (or, equivalently, in vectorial form $\mathbf{h} = \Omega \boldsymbol{\kappa}$), where the vector $\boldsymbol{\kappa} = [\kappa_1, \kappa_2, \ldots, \kappa_{N+1}]$ contains a set of unknown coefficients to be determined. We set
$$\boldsymbol{\kappa} = \Omega^\dagger \mathbf{g}, \qquad (4)$$
where with the symbol $\Omega^\dagger$ we indicate the pseudo-inverse of the matrix $\Omega$.

From this, we can compute the training error $\Delta = \langle \Omega \boldsymbol{\kappa} - \mathbf{g} \rangle / \langle \mathbf{g} \rangle$, where the notation $\langle \cdot \rangle$ is defined as $\langle X \rangle = \sqrt{\frac{1}{T} \sum_{i=1}^{T} (X(i) - \mu)^2}$ for $X$ any $T$-dimensional vector and $\mu = \frac{1}{T} \sum_{i=1}^{T} X(i)$.

Feeding the reservoir with more than one input signal, or even driving the reservoir in different ways with the same input signal, can lead to improved performance. To see this, we perform numerical simulations in order to compare the single-input case with the $l = 2$ input case of Eqs. (2).
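The pipeline of Eqs. (2)-(4) is compact enough to sketch in full. The snippet below uses a forward-Euler integrator and NumPy's pseudo-inverse; the integrator, step size, and any numerical values are our placeholder choices, not necessarily those used in the paper.

```python
import numpy as np

def run_linear_reservoir(A, w, s1, s2, alpha, eps, dt):
    """Integrate Eq. (2) with forward Euler; return the T x N state matrix."""
    N, T = len(w), len(s1)
    r = np.zeros(N)
    R = np.zeros((T, N))
    for t in range(T):
        drdt = alpha * (1.0 + eps * s2[t]) * r + A @ r + w * s1[t]
        r = r + dt * drdt
        R[t] = r
    return R

def train_readout(states, g):
    """Build Omega (Eq. (3)), solve kappa = Omega^+ g (Eq. (4)),
    and return the training error Delta."""
    Omega = np.hstack([states, np.ones((len(states), 1))])  # offset column of ones
    kappa = np.linalg.pinv(Omega) @ g
    h = Omega @ kappa                     # best fit to the training signal
    delta = np.std(h - g) / np.std(g)     # np.std implements the <.> defined above
    return kappa, h, delta
```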
In Fig. 1 we plot the training error $\Delta$ vs the coefficient $\epsilon$ of Eq. (2) (the case $\epsilon = 0$ corresponds to no effect of $s_2(t)$ on the reservoir dynamics). In this figure we deal with three different tasks, i.e., reconstructing $g(t) = z(t)$ from $s_1(t) = s_2(t) = x(t)$ for the Lorenz chaotic system (A) and the Hindmarsh-Rose chaotic system (B), and reconstructing $g(t) = y(t)$ from $s_1(t) = s_2(t) = x(t)$ for the Duffing periodic system (C).

We see from these plots that as we increase $\epsilon$ the training error is first reduced and then increases, indicating the advantage of picking specific values of $\epsilon$. We have observed this type of relationship between the training error and $\epsilon$ in a large variety of situations, including discrete time reservoirs (see Supplementary Information). Our results show that typically two-input reservoir computers (2) are advantageous compared to the single-input case ($\epsilon = 0$). A simple sweep of this kind is sketched below, after the figure caption.

FIG. 1. Plots of the training error vs. $\epsilon$. Here we plot the training error $\Delta$ for the following tasks in continuous time: Lorenz attractor (A), Hindmarsh-Rose attractor (B), and the Duffing attractor (C). In A and B, $s_1(t) = s_2(t) = x(t)$ and $g(t) = z(t)$. In C, $s_1(t) = s_2(t) = x(t)$ and $g(t) = y(t)$.
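As a usage example, a sweep over $\epsilon$ in the spirit of Fig. 1 could look as follows. It reuses the hypothetical helpers defined above; `A, w, s1, s2, g, alpha, dt` are assumed to be set up for the task at hand, and the $\epsilon$ range is a placeholder.

```python
import numpy as np

# Hypothetical epsilon sweep mirroring Fig. 1; eps = 0 recovers the
# single-input case, and the dip in `errors` marks favorable eps values.
errors = []
eps_values = np.linspace(0.0, 2.0, 21)   # placeholder range
for eps in eps_values:
    R = run_linear_reservoir(A, w, s1, s2, alpha, eps, dt)
    _, _, delta = train_readout(R, g)
    errors.append(delta)
```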
We also considered the case that the reservoir is only driven by $s_2(t)$ and not by $s_1(t)$, i.e., for which the coefficients $w_i = 0$. However, we found that for this case the training error was always close to 1, which seems to indicate that the advantage of the reservoir (2) is limited to the case that both $s_1(t)$ and $s_2(t)$ are used.

Next we obtain a modal decomposition for the reservoir dynamics (2). Our derivations that follow are obtained for continuous time, but analogous derivations can be obtained for discrete time (see Supplementary Information). We first rewrite Eq. (2) in vector form,
$$\dot{\mathbf{r}}(t) = p(t) I \mathbf{r}(t) + A \mathbf{r}(t) + \mathbf{w} s_1(t), \qquad (5)$$
where $I$ is the identity matrix, $p(t) = \alpha(1 + \epsilon s_2(t))$, and
$$\mathbf{r}(t) = \begin{bmatrix} r_1(t) \\ r_2(t) \\ \vdots \\ r_N(t) \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_N \end{bmatrix}. \qquad (6)$$
As we have set the adjacency matrix $A$ to be symmetric, it is also diagonalizable, $A = V \Lambda V^T$, where $V$ is the matrix whose columns are the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix with the real eigenvalues of the matrix $A$ on the main diagonal. We pre-multiply Eq. (5) by $V^T$ and, after setting $\mathbf{q}(t) = V^T \mathbf{r}(t)$ and $\mathbf{c} = V^T \mathbf{w}$, we obtain
$$\dot{\mathbf{q}}(t) = p(t) \mathbf{q}(t) + \Lambda \mathbf{q}(t) + \mathbf{c} s_1(t), \qquad (7)$$
which breaks up into a set of $N$ independent modes,
$$\dot q_i(t) = p(t) q_i(t) + \lambda_i q_i(t) + c_i s_1(t), \quad i = 1, \ldots, N, \qquad (8)$$
with solution
$$q_i(t) = \exp\Big( \int_0^t (\lambda_i + p(\tau)) \, d\tau \Big) q_i(0) + c_i \int_0^t \exp\Big( \int_\tau^t (\lambda_i + p(\rho)) \, d\rho \Big) s_1(\tau) \, d\tau, \qquad (9)$$
where the first term on the right-hand side of (9) is the free evolution, which by the assumption of stability goes to zero for large $t$. For large $t$, each mode $q_i$ differs from the others through the coefficient $\lambda_i$, while the particular modal amplitude is given by $c_i$. We set
$$h(t) = \sum_j \kappa'_j q_j(t) + \kappa_{N+1}, \qquad (10)$$
where
$$\kappa'_j = \sum_{i=1}^{N} V_{ij} \kappa_i, \qquad (11)$$
i.e., $h(t)$ can be written as a linear combination of the modes $q_j(t)$. It is important to emphasize that for large enough $t$ the particular value of $c_i$ becomes irrelevant in order to determine the best fit to the training signal, as in Eq. (10) each mode is 'rescaled' by a particular coefficient $\kappa'_i$. Another key observation is that the magnitude of $\int_0^t \exp\big( \int_\tau^t (\lambda_i + p(\rho)) \, d\rho \big) s_1(\tau) \, d\tau$ will depend on the value of $\lambda_i$. In practice, it may be convenient to properly rescale each mode to be
$$\tilde q_i(t) = (\lambda_i + \bar p)^{-1} \int_0^t \exp\Big( \int_\tau^t (\lambda_i + p(\rho)) \, d\rho \Big) s_1(\tau) \, d\tau, \qquad (12)$$
where $\bar p$ is a time-average value for $p$.

We can now formulate the problem of finding the best fit $h(t)$ to the training signal in terms of the modes $q_1(t), q_2(t), \ldots, q_N(t)$. To this end, we introduce the $T \times (N+1)$ matrix
$$\Omega' = \begin{bmatrix} q_1(1) & q_2(1) & \cdots & q_N(1) & 1 \\ q_1(2) & q_2(2) & \cdots & q_N(2) & 1 \\ \vdots & \vdots & & \vdots & \vdots \\ q_1(T) & q_2(T) & \cdots & q_N(T) & 1 \end{bmatrix} \qquad (13)$$
and we define the best fit to the training signal, $\mathbf{h}' = \Omega' \boldsymbol{\kappa}'$, where the vector $\boldsymbol{\kappa}' = [\kappa'_1, \kappa'_2, \ldots, \kappa'_{N+1}]$ contains a set of unknown coefficients to be determined. The best fit is obtained by setting $\boldsymbol{\kappa}' = \Omega'^\dagger \mathbf{g}$.

One best fit is equal to the other, and vice versa. To see this, assume we first compute the coefficients $\boldsymbol{\kappa}'$. To this set of coefficients there corresponds a set of coefficients $\boldsymbol{\kappa}$, which can be obtained by solving Eq. (11).

An illustration of the reservoir computer and of its modal decomposition is shown in Fig. 2.
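The decomposition of Eqs. (5)-(11) can be checked numerically, as sketched below. The sketch reuses the hypothetical helpers defined above; the claim being verified is that the networked reservoir and the uncoupled modes give the same least-squares fit.

```python
import numpy as np

def run_modes(lams, c, s1, s2, alpha, eps, dt):
    """Integrate the N uncoupled modes of Eq. (8) with forward Euler."""
    N, T = len(lams), len(s1)
    q = np.zeros(N)
    Q = np.zeros((T, N))
    for t in range(T):
        p = alpha * (1.0 + eps * s2[t])
        q = q + dt * ((p + lams) * q + c * s1[t])  # elementwise: no coupling
        Q[t] = q
    return Q

# For a symmetric A:
#   lams, V = np.linalg.eigh(A)   # A = V diag(lams) V^T
#   c = V.T @ w                   # modal input weights, c = V^T w
#   Q = run_modes(lams, c, s1, s2, alpha, eps, dt)
# Then Q equals R @ V (with R from run_linear_reservoir), so the fits
# h = Omega kappa and h' = Omega' kappa' coincide, with kappa' = V^T kappa
# as in Eq. (11).
```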
Figure 3 is a plot of the individual modes for the case of the Lorenz system as the parameter $\lambda$ is varied.

There are several advantages of the modal decomposition. One is that in case of instability of one or a few modes, these can be 'removed' without the instability affecting the remaining modes. Another advantage is the possibility to individually manipulate each $q_i(t)$ before it is used to generate the best fit to the training signal. One simple such manipulation that we will study in what follows is the application of a time shift, $q_i(t) \to q_i(t + \tau_i)$ (a sketch is given below, after the figure captions). As we will see, this simple modification will lead to dramatic reductions in the training error.

FIG. 2. On the left, we have a reservoir computer with a network of connected nodes and on the right a modal decomposition of the reservoir dynamics, where each mode is described by a node.

FIG. 3. Mode amplitude plot from a continuous time reservoir (8). The input signal is the $x_L$ state from the Lorenz system. We set $\alpha = -15$ and $\epsilon = 0$.
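In discrete time, the shift $q_i(t) \to q_i(t + \tau_i)$ amounts to sliding each sampled mode by an integer number of steps. A minimal sketch follows; the wrap-around treatment of the boundary samples is our own simplification, not specified in the paper.

```python
import numpy as np

def shift_modes(Q, shifts):
    """Shift column i of the T x N mode matrix Q by shifts[i] samples
    (positive = advance the mode, negative = delay it)."""
    Qs = np.empty_like(Q)
    for i, s in enumerate(shifts):
        Qs[:, i] = np.roll(Q[:, i], -s)  # wrap-around boundary, for simplicity
    return Qs

# Random integer shifts applied to each mode before the readout fit:
# rng = np.random.default_rng(0)
# Qs = shift_modes(Q, rng.integers(-50, 51, size=Q.shape[1]))  # placeholder range
```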
Our analysis shows that the eigenvalues $\lambda_i$ differentiate the individual modes. We now consider a parametric approach in which the eigenvalues $\lambda_i$ are treated as parameters of Eq. (8). We first choose an interval $[a, b]$ and then select the $\lambda_i \in [a, b]$. A trivial choice is to pick the eigenvalues to be uniformly randomly distributed in the interval (see the sketch after Fig. 4).

We are now interested in how the coefficients $\kappa'$ (i.e., the mode weights) depend on the particular choice of the eigenvalues $\lambda$. We observe that typically the curve $\kappa'(\lambda)$ is robust to (i) the particular choice of $N$, (ii) the particular sampling of the eigenvalues from the interval $[a, b]$, and (iii) the particular choice of the three time series $s_1(t), s_2(t), g(t)$ from a given (the same) attractor. Property (iii) indicates robustness of the vector $\boldsymbol{\kappa}'$ with respect to variations in the initial condition, which predicts a low testing error (see Supplementary Information).

As an example of this, in Fig. 4 we plot the resulting $\kappa'$ vs the eigenvalues $\lambda$, for the case of a continuous-time reservoir applied to the Lorenz chaotic system. In plot (A) we used linearly spaced eigenvalues and in plot (B) we used randomly spaced eigenvalues from a uniform distribution. In both plots we set $N = 100$ nodes and sample the eigenvalues from a fixed interval of negative values. Different curves in the same plot are for several choices of the initial conditions on the Lorenz attractor. In order to ensure that the time traces $s_1(t), s_2(t), g(t)$ are from the attractor, we take the last point from the Lorenz attractor for the previous iteration as the initial point for the new iteration. In all cases, we see that certain eigenvalues are associated with larger $\kappa'$ values in modulus (both positive and negative). Other eigenvalues instead have associated $\kappa'$ close to zero, indicating that these do not play a significant role in the mapping between the input signals and the output (training) signal. We also see that the plots in Fig. 4 are consistent over different iterations of the same task, indicating that the preference for certain eigenvalues is robust with respect to the particular choice of the input and training time series from the same attractor. The figure also shows that the functional relationship between $\lambda$ and $\kappa'$ is robust with respect to variations in the number of nodes $N$. We thus envision an advantage of picking the eigenvalues $\lambda$ near the maxima and minima of the $\kappa'$ vs. $\lambda$ plot, which provides the motivation for the optimization study presented next.

FIG. 4. Lorenz system. Continuous time RC, $N = 100$ nodes. We use $x_L(t)$ as the input signal and $z_L(t)$ as the training signal. Linearly spaced eigenvalues are used in plot A and uniformly randomly distributed eigenvalues are used in plot B. For both plots, eigenvalues are taken from the same interval $[a, b]$.
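The parametric approach above can be sketched directly: rather than inheriting the $\lambda_i$ from a random adjacency matrix, we pick them in an interval $[a, b]$ and inspect the fitted mode weights. The endpoint values and $c_i = 1$ below are placeholder choices (the paper's intervals are not reproduced here), and the sketch reuses the hypothetical `run_modes` and `train_readout` helpers.

```python
import numpy as np

a, b = -2.0, -0.1                    # placeholder interval of stable (negative) modes
lams_lin = np.linspace(a, b, 100)    # linearly spaced, as in Fig. 4A
rng = np.random.default_rng(1)
lams_rnd = np.sort(rng.uniform(a, b, 100))  # uniformly random, as in Fig. 4B
c = np.ones(100)                     # modal amplitudes are irrelevant at large t

# Q = run_modes(lams_lin, c, s1, s2, alpha, eps, dt)
# kappa, _, _ = train_readout(Q, g)
# Plotting kappa[:-1] against the eigenvalues gives the kappa'(lambda) curve;
# its extrema mark the most useful eigenvalues.
```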
FIG. 5. We plot the training error $\Delta$ vs the length of the time series $T$. (A) The HR system, with $\alpha$ negative and $\epsilon = 20$. For the linear RC $p_2 = p_3 = 0$ and $c_i = 1$; for the non-linear RC $p_2$ and $p_3$ are negative and $c_i = 400$. (B) The Lorenz system, with $\alpha$ negative and $\epsilon = 0.5$. For the linear RC $p_2 = p_3 = 0$ and $c_i = 1$; for the non-linear RC $p_2$ and $p_3$ are negative and $c_i = 30$. In each plot we compare the following cases: nonlinear, linear, linear with application of random time shifts to the individual modes, linear with optimized time shifts of the individual modes, and linear with optimized time shifts and eigenvalues of the individual modes. The time shifts ($\tau_i$) are taken from an interval of both negative and positive values (delays and advances) and the eigenvalues ($\lambda_i$) from an interval of negative values.

We now consider a comparison of linear and nonlinear reservoirs, described by the following equations,
$$\dot q_i(t) = p(t) q_i(t) + p_2 q_i^2(t) + p_3 q_i^3(t) + \lambda_i q_i(t) + c_i s_1(t), \quad i = 1, \ldots, N, \qquad (14)$$
where $p_2$ and $p_3$ are negative constants. Note that Eq. (14) has the same parametric form in $\lambda_i$ as (8), for direct comparison. When the parameters $c_i$ are small, Eq. (14) is well approximated by the linear reservoir (8) ($p_2 = p_3 = 0$). In Fig. 5 we plot the training error $\Delta$ vs the length of the time series $T$. Plot A is for the HR system and plot B is for the Lorenz system. In each plot, we compare the following cases: nonlinear (Eq. (14)), linear (Eq. (8)), linear with application of random time shifts to the individual modes, linear with optimized time shifts of the individual modes, and linear with optimized time shifts and eigenvalues of the individual modes. Optimization of the time shifts and of the eigenvalues was obtained via simulated annealing (see Supplementary Information; a schematic sketch is given below). From Fig. 5, we see that the nonlinear reservoir performs better than the linear one, which is expected. However, this relation is inverted when manipulations of the individual modes of the linear reservoir are introduced, either in terms of the eigenvalues or the time shifts, for which the training error is much lower (also in the case of $\epsilon = 0$, see Supplementary Information). A considerable reduction in the training error is observed even when the time shifts applied to the individual modes are randomly chosen. In the Supplementary Information we show that a linear reservoir with random time shifts has both lower training and testing error than a nonlinear reservoir.

In this paper we have studied a special class of reservoir computers for which a modal decomposition is possible. This is equivalent to replacing the reservoir network with a set of uncoupled nodes, each one corresponding to a 'mode' of the reservoir. We have shown that the training error for the two reservoir computers (coupled network and uncoupled nodes) is the same. We build on this result and show that the modes can be manipulated to significantly decrease the overall training error. For example, the simple application of time shifts to the individual modes is found to be highly beneficial; namely, a linear reservoir formed of uncoupled nodes, with random time shifts applied to the nodes' outputs, is highly competitive against a nonlinear reservoir. As shown in Fig. 5, sometimes the improvement is by orders of magnitude. It is worth noting that the ability to either temporally delay or advance the individual modes is limited to the uncoupled-nodes configuration (right panel of Fig. 2), as in the coupled network configuration (left panel of Fig. 2) the reservoir states are linear combinations of the modes.
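For concreteness, a schematic simulated-annealing loop over the per-mode time shifts might look like the sketch below. It reuses the hypothetical `shift_modes` and `train_readout` helpers from the earlier sketches; the cooling schedule, move size, and Metropolis acceptance rule are generic choices rather than the authors' (their procedure is detailed in the Supplementary Information). An analogous loop over the $\lambda_i$ would treat the eigenvalues as the annealed variables instead.

```python
import numpy as np

def anneal_shifts(Q, g, n_steps=2000, temp0=0.1, seed=0):
    """Minimize the training error Delta over integer per-mode time shifts."""
    rng = np.random.default_rng(seed)
    shifts = np.zeros(Q.shape[1], dtype=int)
    _, _, cur = train_readout(shift_modes(Q, shifts), g)
    for k in range(n_steps):
        temp = max(temp0 * (1.0 - k / n_steps), 1e-9)          # linear cooling
        cand = shifts.copy()
        cand[rng.integers(Q.shape[1])] += rng.integers(-5, 6)  # perturb one shift
        _, _, err = train_readout(shift_modes(Q, cand), g)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if err < cur or rng.random() < np.exp((cur - err) / temp):
            shifts, cur = cand, err
    return shifts, cur
```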
REFERENCES

Herbert Jaeger. The "echo state" approach to analysing and training recurrent neural networks - with an erratum note. Bonn, Germany: German National Research Center for Information Technology GMD Technical Report, 148(34):13, 2001.

Benjamin Schrauwen, David Verstraeten, and Jan Van Campenhout. An overview of reservoir computing: theory, applications and implementations. In Proceedings of the 15th European Symposium on Artificial Neural Networks, pages 471-482, 2007.

Thomas Natschläger, Wolfgang Maass, and Henry Markram. The "liquid computer": A novel strategy for real-time computing on time series. Special issue on Foundations of Information Processing of TELEMATIK, 8:39-43, 2002.

Wolfgang Maass, Thomas Natschläger, and Henry Markram. Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11):2531-2560, 2002.

Romain Martinenghi, Sergei Rybalko, Maxime Jacquot, Yanne K. Chembo, and Laurent Larger. Photonic nonlinear transient computing with multiple-delay wavelength dynamics. Physical Review Letters, 108(24):244101, 2012.

Daniel Brunner, Miguel C. Soriano, Claudio R. Mirasso, and Ingo Fischer. Parallel photonic information processing at gigabyte per second data rates using transient states. Nature Communications, 4:1364, 2013.

Kohei Nakajima, Helmut Hauser, Tao Li, and Rolf Pfeifer. Information processing via physical soft body. Scientific Reports, 5:10487, 2015.

Michiel Hermans, Miguel C. Soriano, Joni Dambre, Peter Bienstman, and Ingo Fischer. Photonic delay systems as machine learning implementations. Journal of Machine Learning Research, 2015.

Quentin Vinckier, François Duport, Anteo Smerieri, Kristof Vandoorne, Peter Bienstman, Marc Haelterman, and Serge Massar. High-performance photonic reservoir computer based on a coherently driven passive cavity. Optica, 2(5):438-446, 2015.

François Duport, Anteo Smerieri, Akram Akrout, Marc Haelterman, and Serge Massar. Fully analogue photonic reservoir computer. Scientific Reports, 6:22381, 2016.

Laurent Larger, Antonio Baylón-Fuentes, Romain Martinenghi, Vladimir S. Udaltsov, Yanne K. Chembo, and Maxime Jacquot. High-speed photonic reservoir computing using a time-delay-based architecture: Million words per second classification. Physical Review X, 7(1):011015, 2017.

Johan A. K. Suykens, Joos P. L. Vandewalle, and Bart L. de Moor. Artificial Neural Networks for Modelling and Control of Non-Linear Systems. Springer Science & Business Media, 2012.

James P. Crutchfield, William L. Ditto, and Sudeshna Sinha. Introduction to focus issue: Intrinsic and designed computation: information processing in dynamical systems - beyond the digital hegemony, 2010.

Paul Rodriguez. Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13(9):2093-2118, 2001.

Felix A. Gers and E. Schmidhuber. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 12(6):1333-1340, 2001.

Zhixin Lu, Brian R. Hunt, and Edward Ott. Attractor reconstruction by machine learning. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(6):061104, 2018.

Roland S. Zimmermann and Ulrich Parlitz. Observing spatio-temporal dynamics of excitable media using reservoir computing. Chaos: An Interdisciplinary Journal of Nonlinear Science, 28(4):043118, 2018.

Piotr Antonik, Marvyn Gulina, Jaël Pauwels, and Serge Massar. Using a reservoir computer to learn chaotic attractors, with applications to chaos synchronization and cryptography. Physical Review E, 98(1):012215, 2018.

Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78-80, 2004.

Jaideep Pathak, Zhixin Lu, Brian R. Hunt, Michelle Girvan, and Edward Ott. Using machine learning to replicate chaotic attractors and calculate Lyapunov exponents from data. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(12):121102, 2017.

Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott. Model-free prediction of large spatiotemporally chaotic systems from data: A reservoir computing approach. Physical Review Letters, 120(2):024102, 2018.

Azarakhsh Jalalvand, Kris Demuynck, Wesley De Neve, and Jean-Pierre Martens. On the application of reservoir computing networks for noisy image recognition. Neurocomputing, 277:237-248, 2018.

Alex Graves, Douglas Eck, Nicole Beringer, and Juergen Schmidhuber. Biologically plausible speech recognition with LSTM neural nets. In International Workshop on Biologically Inspired Approaches to Advanced Information Technology, pages 127-136. Springer, 2004.

Tony Robinson. An application of recurrent nets to phone probability estimation. IEEE Transactions on Neural Networks, 5(2), 1994.

Mantas Lukoševičius, Herbert Jaeger, and Benjamin Schrauwen. Reservoir computing trends. KI - Künstliche Intelligenz, 26(4):365-371, 2012.

Thomas L. Carroll and Louis M. Pecora. Network structure effects in reservoir computers. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(8):083130, 2019.

Stephen Boyd and Leon Chua. Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, 32(11):1150-1161, 1985.

Erik Bollt. On explaining the surprising success of reservoir computing forecaster of chaos? The universal machine learning dynamical system with contrasts to VAR and DMD. arXiv preprint arXiv:2008.06530, 2020.

Peter J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5-28, 2010.

Lyudmila Grigoryeva and Juan-Pablo Ortega. Universal discrete-time reservoir computers with stochastic inputs and linear readouts using non-homogeneous state-affine systems. The Journal of Machine Learning Research, 19(1):892-931, 2018.

Afroza Shirin, Isaac S. Klickstein, and Francesco Sorrentino. Stability analysis of reservoir computers dynamics via Lyapunov functions.