Adding Filters to Improve Reservoir Computer Performance
T. L. Carroll
Code 6392, US Naval Research Lab, Washington, DC 20375 USA
Abstract
Reservoir computers are a type of neuromorphic computer that may be built with analog hardware, potentially creating powerful computers that are small, light and consume little power. Typically a reservoir computer is built by connecting together a set of nonlinear nodes into a network; connecting the nonlinear nodes may be difficult or expensive, however. This work shows how a reservoir computer may be expanded by adding functions to its output. The particular functions described here are linear filters, but other functions are possible. The design and construction of linear filters is well known, and such filters may be easily implemented in hardware such as field programmable gate arrays (FPGAs). The effect of adding filters on reservoir computer performance is simulated for a signal fitting problem, a prediction problem and a signal classification problem.
Keywords: reservoir computer, machine learning
Email address: [email protected] (T. L. Carroll)
Preprint submitted to Physica D, October 20, 2020
1. Introduction
A reservoir computer is a high dimensional dynamical system that may be used to do computation [1, 2]. The reservoir computer by itself will evolve to a stable fixed point; in use, the reservoir computer is driven by an input signal s(n). The reservoir computer is synchronized in the general sense to the input signal, meaning that the reservoir computer will follow the same trajectory every time it is driven with the same input signal (after an initial transient). To train a reservoir computer, a number of time series signals are extracted and used to do a linear fit to a training signal. The fit coefficients are the output of the training process. To use the reservoir computer for a computation, the same dynamical system is driven with a different input signal and a linear combination is made from the output signals using the coefficients found during the training process. An example of training and testing would be using the x, y and z signals from the Lorenz chaotic system as the input signal and creating a linear combination of reservoir signals to fit the Lorenz z signal [3]. In the computation stage, the reservoir would be driven by the signals x′, y′ and z′ from the Lorenz system with different initial conditions, and the linear combination made from the training coefficients and the reservoir signals would be a close fit to the corresponding z′ signal.
Typically a reservoir computer is built by linking together a set of nonlinear nodes in a directed network. A reservoir computer is similar to a recurrent neural network, but the connections between nodes do not change in a reservoir computer. As a result, reservoir computers may be built as analog systems. Examples of analog reservoir computers so far include photonic systems [4, 5], analog circuits [6], mechanical systems [7] and field programmable gate arrays [8]. This analog approach means that reservoir computers can potentially be very fast, and yet consume little power, while being small and light.

One obstacle to building analog reservoir computers is that creating and connecting the analog nodes may be difficult, which is why some photonic systems [4, 5] use one actual node and then use time multiplexing to create a set of virtual nodes by adding a delay loop to the laser system. This time multiplexing increases the number of nodes but slows the response time of the photonic reservoir computer. Other types of analog reservoir computers, such as those constructed from field programmable gate arrays [8], have nodes that are coupled directly, but in most systems the coupling of large numbers of nonlinear nodes into a reservoir computer network is still a research problem.

It has been shown that the number of linearly independent signals in a reservoir computer is an indicator of how well a reservoir computer can fit signals. In [9, 10] this number was measured as the covariance rank. A related measure of capacity was proposed in [11]. The covariance rank of a reservoir computer is ultimately limited by the number of nodes in the network, but in some cases the rank is less than the number of nodes.

In this work I show that it is possible to increase the covariance rank of a reservoir computer by adding filters to the reservoir computer output.
I also note that adding filters to a reservoir computer can increase the memory capacity of a reservoir computer without affecting the nonlinearity. There are many types of filters that could be used; to keep things simple, in this work I use linear finite impulse response (FIR) filters. Infinite impulse response (IIR) filters could also be used, but they can increase the fractal dimension of a signal [12], so FIR filters are used to avoid this complication. Filter design and implementation is a mature technology, and filters can be implemented in off the shelf devices such as field programmable gate arrays (FPGAs) [13].

I begin by describing how adding functions to a reservoir computer may increase the rank of the reservoir computer (Section 2). Section 3 then describes reservoir computers and how filters may be added to increase their rank. Section 4 shows how adding filters improves signal fitting, Section 5 describes the impact on prediction, and signal classification is discussed in Section 6. Section 7 describes changes in the memory capacity caused by adding filters.
2. Filters and Rank
Given a signal x(n) and some function f(x), one may create a basis of rank 2. Applying Gram-Schmidt orthogonalization,

$$u_1(n) = \frac{x(n)}{\left\| x(n) \right\|} \quad y(n) = \frac{f(x)}{\left\| f(x) \right\|} \quad z(n) = y(n) - \left\langle u_1(n), y(n) \right\rangle u_1(n) \quad u_2(n) = \frac{z(n)}{\left\| z(n) \right\|} \qquad (1)$$

where ⟨ ⟩ indicates a dot product and ‖x‖ = ⟨x, x⟩^{1/2}. The signals u_1(n) and u_2(n) form an orthonormal basis of rank 2, as long as f(x) is not just a multiple of x. The Gram-Schmidt procedure may be repeated to create bases of higher rank using additional functions.

One type of function that can be used for f is a linear filter, in particular a finite impulse response (FIR) filter. An FIR filter is a linear system with no feedback. The design of FIR filters uses well established techniques, and implementing FIR filters in hardware is also well known; a field programmable gate array (FPGA) may be used to implement these filters, for example. Because FIR filters have no feedback, stability is not a question. It is possible to design stable infinite impulse response filters (which include feedback), but it is known that filters with feedback can increase the fractal dimension of a signal from a chaotic dynamical system [12], a complication that is avoided with FIR filters.

An FIR filter may be described by

$$y(t) = \sum_{k=0}^{N_F} a_k x(t-k) \qquad (2)$$

where N_F is the filter order.

The particular type of FIR filter I used was a Bessel filter [14]. The denominator of the transfer function of an n'th order Bessel filter is the n'th order Bessel polynomial (the numerator is a constant).

Table 1: Filter coefficients

filter number   k = 1    k = 2    k = 3    k = 4    k = 5
1               1        0        0        0        0
2               1.7321   1        0        0        0
3               2.4329   2.4662   1        0        0
4               3.1239   4.3916   3.2011   1        0
5               3.8107   6.7767   6.8864   3.9363   1
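As an illustration of eq. (1), the rank-2 construction can be sketched in a few lines; the choice of tanh as the function f and the signal length are arbitrary here, not taken from the text.

```python
import numpy as np

# Sketch of the rank-2 basis construction of eq. (1): given a signal x(n)
# and a function f(x) that is not a scalar multiple of x, Gram-Schmidt
# orthogonalization yields two orthonormal basis vectors.
def gram_schmidt_pair(x, f):
    u1 = x / np.linalg.norm(x)             # u1(n) = x(n)/||x(n)||
    y = f(x) / np.linalg.norm(f(x))        # y(n) = f(x)/||f(x)||
    z = y - np.dot(u1, y) * u1             # remove the component along u1
    u2 = z / np.linalg.norm(z)             # u2(n) = z(n)/||z(n)||
    return u1, u2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
u1, u2 = gram_schmidt_pair(x, np.tanh)     # tanh is not a multiple of x

# The pair is orthonormal, so together they span a rank-2 basis.
print(abs(np.dot(u1, u2)) < 1e-10, abs(np.linalg.norm(u2) - 1.0) < 1e-10)
# -> True True
```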
Other filters may also work, and multiple different types of filters could be used simultaneously; low order Bessel filters were used here because they are simple to describe.

I used Bessel filters with orders from 1 to 5. The Bessel filters were designated

$$y_i^{\eta}(t) = \sum_{k=1}^{\eta} a_k \chi_i(t-k) \qquad (3)$$

where i indicates the node index from the reservoir computer and η is the filter order. The filter coefficients are given in Table 1.
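The filters of Table 1 are straightforward to apply. A minimal sketch, with the table's coefficients hard-coded and a sine wave standing in for a node signal χ_i(n):

```python
import numpy as np

# Bessel filter coefficients a_k for orders eta = 1..5 (rows of Table 1).
BESSEL_COEFFS = {
    1: [1.0],
    2: [1.7321, 1.0],
    3: [2.4329, 2.4662, 1.0],
    4: [3.1239, 4.3916, 3.2011, 1.0],
    5: [3.8107, 6.7767, 6.8864, 3.9363, 1.0],
}

def fir_filter(chi, eta):
    """Eq. (3): y_i^eta(t) = sum_{k=1}^{eta} a_k chi_i(t-k)."""
    a = BESSEL_COEFFS[eta]
    y = np.zeros_like(chi)
    for k in range(1, eta + 1):
        y[k:] += a[k - 1] * chi[:-k]
    return y

chi = np.sin(np.linspace(0, 10, 200))   # stand-in for one node signal
y1 = fir_filter(chi, 1)
# The first-order filter has the single coefficient a_1 = 1, so it is just
# a one-step delay of the input signal.
print(np.allclose(y1[1:], chi[:-1]))    # -> True
```

This delay behavior of the first-order filter is why, in the results below, a single filter is equivalent to the identity.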
3. Reservoir Computers
Two different node types are used for the reservoir computers in this work. The first reservoir computer uses leaky tanh nodes [15],

$$\chi_i(n+1) = \alpha \chi_i(n) + (1-\alpha) \tanh\left( \sum_{j=1}^{M} A_{ij} \chi_j(n) + w_i s(n) + 1 \right) \qquad (4)$$

where the reservoir computer variables are χ_i(n), i = 1…M with M the number of nodes, A is an adjacency matrix that describes how the different nodes in the network are connected to each other, and W = [w_1, w_2, …, w_M] describes how the input signal s(n) is coupled into the different nodes; the tanh function supplies the nonlinearity.

For the leaky tanh map reservoir computer, half of the elements of the adjacency matrix A were chosen randomly and set to random numbers drawn from a uniform random distribution between -1 and 1. The diagonal elements of A were then set to zero. The spectral radius σ is the largest magnitude of the eigenvalues of A. The entire adjacency matrix was renormalized to have the spectral radius specified for the different examples below.

The second node type was a model for the laser experiment of [4]. This system is described by

$$\epsilon \dot{x}(s) + x(s) = \beta \sin\left[ \mu x(s-1) + \rho u_I(s-1) + \phi \right] \qquad (5)$$

where s is a normalized time.

Converting s and x to discrete variables, this system may be modeled by the map

$$x(n + \tau_s) = \sum_{j=1}^{N} \beta H(j) \sin\left[ \mu x(n-j) + W s\left( \left\lfloor \tfrac{n}{M} \right\rfloor \right) + \phi \right] \qquad (6)$$

where ⌊ ⌋ is the floor function and M is the number of nodes. The floor function means that the value of the input signal is sampled once every M time steps of the map. The variable τ_s in this equation is an integer. The parameter β was set at 0.5 and φ = 0. The signal H(j) is the impulse response of the low pass filter in eq. (5). The low pass filter was a first order filter with time constant τ_R. The elements of the vector W are drawn from a uniform random distribution between +1 and -1.

The number of nodes M was varied, so for a map time step t_s, the total time for one update of the reservoir computer was M × t_s.

The map signal x(n) was rearranged into a matrix Ω to create the set of reservoir computer nodes:

$$n = 1, \ldots, N \quad i = (n \bmod M) + 1 \quad j = \left\lfloor \tfrac{n}{M} \right\rfloor + 1 \quad \Omega_{i,j} = x(n) \qquad (7)$$

Each column of this matrix represented one virtual node, while each row corresponded to one time step for the input signal. The parameters for eq. (7) were taken from the experiment in [4].

When the reservoir computer was driven with s(n), the first 1000 time steps were discarded as a transient. The next N = 10000 time steps from each node were combined in an N × (M + 1) matrix

$$\Omega = \begin{bmatrix} \chi_1(1) & \cdots & \chi_M(1) & 1 \\ \chi_1(2) & \cdots & \chi_M(2) & 1 \\ \vdots & & \vdots & \vdots \\ \chi_1(N) & \cdots & \chi_M(N) & 1 \end{bmatrix} \qquad (8)$$

The last column of Ω was set to 1 to account for any constant offset in the fit. The training signal is fit by

$$h(n) = \Omega C \qquad (9)$$

where h(n) = [h(1), h(2), …, h(N)] is the fit to the training signal g(n) = [g(1), g(2), …, g(N)] and C = [c_1, c_2, …, c_{M+1}] is the coefficient vector.
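A minimal sketch of the leaky tanh reservoir of eq. (4); the node count, parameter values and random seed here are illustrative, not the paper's exact settings.

```python
import numpy as np

# Adjacency matrix construction following the text: half the entries
# uniform in [-1, 1], zero diagonal, rescaled to a chosen spectral radius.
def make_adjacency(M, spectral_radius, rng):
    A = np.where(rng.random((M, M)) < 0.5,
                 rng.uniform(-1, 1, (M, M)), 0.0)
    np.fill_diagonal(A, 0.0)
    A *= spectral_radius / np.max(np.abs(np.linalg.eigvals(A)))
    return A

def run_reservoir(s, M=100, alpha=0.75, spectral_radius=0.48, seed=0):
    """Drive the leaky tanh reservoir of eq. (4) with input s(n)."""
    rng = np.random.default_rng(seed)
    A = make_adjacency(M, spectral_radius, rng)
    w = rng.uniform(-1, 1, M)           # input coupling vector W
    chi = np.zeros(M)
    out = np.empty((len(s), M))
    for n, sn in enumerate(s):
        # eq. (4): chi(n+1) = alpha*chi(n) + (1-alpha)*tanh(A chi + w s + 1)
        chi = alpha * chi + (1 - alpha) * np.tanh(A @ chi + w * sn + 1.0)
        out[n] = chi
    return out
```

Because tanh is bounded by 1 and the update is a convex combination, the node signals stay in [-1, 1], which keeps the reservoir stable for any input.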
The fit coefficient vector C is found by ridge regression. The training error may be computed from

$$\Delta_{RC} = \frac{\mathrm{std}\left[ \Omega C - g(n) \right]}{\mathrm{std}\left[ g(n) \right]} \qquad (10)$$

where std[ ] indicates a standard deviation.

In the testing configuration, the reservoir computer is driven by a new signal s′(n) and the matrix of signals from the reservoir is now Ω′. The testing error is

$$\Delta_{tx} = \frac{\mathrm{std}\left[ \Omega' C - g'(n) \right]}{\mathrm{std}\left[ g'(n) \right]} \qquad (11)$$

where g′(n) is the testing signal.

Each node output χ_i(n) was passed through between one and five filters, with filter coefficients defined in Table 1. The filter outputs were y_i^η(t), found as

$$y_i^{\eta}(t) = \sum_{k=1}^{\eta} a_k \chi_i(t-k) \qquad (12)$$

where i indicates the node index from the reservoir computer and η is the filter order.

The filter outputs were arranged in a matrix similar in form to Ω:

$$\Lambda = \begin{bmatrix} y_1^1(1) & y_1^2(1) & \cdots & y_2^1(1) & \cdots & y_M^{N_F}(1) & 1 \\ y_1^1(2) & & & & & & \vdots \\ \vdots & & & & & & \vdots \\ y_1^1(N) & y_1^2(N) & \cdots & y_2^1(N) & \cdots & y_M^{N_F}(N) & 1 \end{bmatrix} \qquad (13)$$

For N time series points, M nodes and a filter order of N_F, the size of Λ is N × (M × N_F + 1).

Fitting proceeds as with the reservoir only, but using the full Λ matrix instead; the fit coefficients are found as

$$C_F = \Lambda^{+} g(n) \qquad (14)$$

where Λ⁺ indicates the pseudo-inverse of Λ and g(n) is the same training signal as was used for the reservoir. The training and testing errors are calculated in the same manner as for the reservoir, substituting Λ for Ω.

3.3. Covariance Rank

The individual columns of the reservoir matrix Ω or the filter matrix Λ may be used as a basis to fit the training signal g(n). Among other things, the fit will depend on the number of orthogonal columns in Ω.

Principal component analysis [16] states that the eigenvectors of the covariance matrix of Ω, Θ = Ω^T Ω, form an uncorrelated basis set.
The rank of the covariance matrix tells us the number of uncorrelated vectors. Therefore, we will use the rank of the covariance matrix of Ω (or Λ),

$$\Gamma(\Omega) = \mathrm{rank}\left( \Omega^T \Omega \right) \quad \Gamma(\Lambda) = \mathrm{rank}\left( \Lambda^T \Lambda \right) \qquad (15)$$

to characterize the reservoir matrix Ω or the filter matrix Λ. We calculate the rank using the MATLAB rank() function. The maximum covariance rank is equal to the number of nodes M, or the number of nodes times the number of filters. In [9, 10], higher covariance rank was associated with lower testing error.
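The covariance rank of eq. (15) is simple to compute; here `numpy.linalg.matrix_rank` plays the role of MATLAB's rank(), applying the same kind of singular-value threshold mentioned later in the text.

```python
import numpy as np

def covariance_rank(Omega):
    """Eq. (15): rank of the covariance matrix Omega^T Omega."""
    return np.linalg.matrix_rank(Omega.T @ Omega)

rng = np.random.default_rng(1)
Omega = rng.standard_normal((1000, 10))
full = covariance_rank(Omega)            # 10 independent columns

# Duplicating a column adds no new direction, so the rank is unchanged:
# extra signals only help if they are linearly independent.
degenerate = covariance_rank(np.hstack([Omega, Omega[:, :1]]))
print(full, degenerate)   # -> 10 10
```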
4. Signal Fitting
The first example of adding filters to a reservoir computer will be fitting the z signal from the Lorenz chaotic system based on the x signal. The Lorenz system [17] is described by

$$\frac{dx}{dt} = c_1 (y - x) \quad \frac{dy}{dt} = x (c_2 - z) - y \quad \frac{dz}{dt} = x y - c_3 z \qquad (16)$$

with c_1 = 10, c_2 = 28, and c_3 = 8/3. The equations were numerically integrated with a time step of t_s = 0.02.

The optimum parameters for fitting the Lorenz z signal when the Lorenz x signal drove the leaky tanh reservoir computer were found by simulations to be α = 0.75 and a spectral radius of 0.48. Figure 1 shows the testing error Δ_tx(Ω) and the covariance rank Γ(Ω) as a function of the number of nodes M for the leaky tanh nodes.

Figure 1: Testing error Δ_tx(Ω) and the covariance rank Γ(Ω) as a function of the number of nodes M for the leaky tanh nodes of eq. (4) when the input signal is the Lorenz x signal and the testing signal is the Lorenz z signal. These quantities were calculated from the reservoir signal matrix as in eq. (8).

The notation (Ω) is used to indicate that the testing error and the covariance rank are calculated from the reservoir signal matrix Ω defined in eq. (8). Figure 1 shows a typical result: the covariance rank increases and the testing error decreases as the number of reservoir computer nodes goes up.

Figure 2 shows the ratio of the covariance rank calculated from the filter matrix Λ of eq. (13) to the covariance rank calculated from the reservoir matrix Ω. Figure 2 shows that adding FIR filters as described in Section 2.1 can increase the covariance rank for the leaky tanh nodes if the reservoir is small, 20 nodes or less. For larger reservoirs, the improvement in rank is not as large. It was seen in figure 1 that the covariance rank of the reservoir matrix Ω saturated when the reservoir computer reached 100 nodes. Part of this saturation may be numerical in nature; the MATLAB rank algorithm sets a threshold below which singular values are considered to be zero. If the reservoir matrix Ω is too large, numerical errors may cause a number of singular values to be below this threshold. For the same reason, the lack of increase in rank of the filter matrix Λ relative to the reservoir matrix may be caused by these round off errors.

Figure 3 shows the ratio of the testing error calculated from the filter matrix Λ of eq. (13) to the testing error calculated from the reservoir matrix Ω of eq. (8).
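The Lorenz system of eq. (16) can be integrated with any fixed-step scheme; a fourth-order Runge-Kutta sketch at the 0.02 time step follows (the choice of integrator and initial condition are assumptions, not stated in the text).

```python
import numpy as np

def lorenz_deriv(v, c1=10.0, c2=28.0, c3=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (16)."""
    x, y, z = v
    return np.array([c1 * (y - x), x * (c2 - z) - y, x * y - c3 * z])

def integrate_lorenz(n_steps, dt=0.02, v0=(1.0, 1.0, 1.0)):
    """Fixed-step 4th-order Runge-Kutta integration of eq. (16)."""
    v = np.array(v0, dtype=float)
    traj = np.empty((n_steps, 3))
    for i in range(n_steps):
        k1 = lorenz_deriv(v)
        k2 = lorenz_deriv(v + 0.5 * dt * k1)
        k3 = lorenz_deriv(v + 0.5 * dt * k2)
        k4 = lorenz_deriv(v + dt * k3)
        v = v + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj[i] = v
    return traj   # columns: x(n), y(n), z(n)
```

The x column would drive the reservoir while the z column serves as the training signal g(n) in the fitting example above.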
When a single filter is used, it is a first order Bessel filter; Table 1 shows that its only nonzero coefficient is a_1 = 1, so it simply passes the node signal through with a one-step delay.

Figure 2: Ratio of covariance rank Γ(Λ) when N_f filters are added after the reservoir to the covariance rank found using only the reservoir, Γ(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer is based on the leaky tanh nodes of eq. (4).

Figure 3: Ratio of testing error Δ_tx(Λ) when N_f filters are used after the reservoir to the testing error found using only the reservoir, Δ_tx(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer is based on the leaky tanh nodes of eq. (4).

The parameter α in eq. (4) was varied from 0.05 to 1, while the spectral radius σ was varied from 0.1 to 3. For each value of α and σ, 20 random adjacency matrices and 20 random input vectors W were generated. The elements of W were chosen from a uniform random distribution between -1 and 1, while a random selection of half of the elements of the adjacency matrix A were set to zero. The nonzero elements of A were set to uniformly distributed random numbers between -1 and 1. The diagonal elements of A were then set to zero. Figure 4 shows the mean testing error for the leaky tanh reservoir computer as the parameter α and the spectral radius σ are changed.

Figure 4: Mean testing error ⟨Δ_tx(Ω)⟩ as a function of the parameter α and the spectral radius σ for a leaky tanh reservoir with 80 nodes.

Figure 5: Mean ratio ⟨Δ_tx(Λ)/Δ_tx(Ω)⟩ of the testing error Δ_tx(Λ) when five filters are used in the filter matrix Λ to the testing error for the leaky tanh reservoir by itself, as a function of the parameter α and the spectral radius σ. The reservoir had 80 nodes.

Figures 4 and 5 show that the largest improvement in testing error caused by adding filters comes in the parameter region where the reservoir computer by itself already has the smallest testing error.
Possibly the regions of poor fit in both figures occur because the reservoir signals are not a good match for the training signal, so increasing the rank of the reservoir computer does not provide any benefit. For α = 1 in figures 4 and 5, χ_i(n+1) = χ_i(n) in eq. (4), so the reservoir computer does not respond to the input signal.

The leaky tanh nodes of the previous section are usually implemented on a digital computer, in which case there is no advantage to adding filters. The laser system of [4] was built as an actual experiment. The performance of the reservoir computer could be improved by adding more virtual nodes, but adding virtual nodes slowed the response time, so there is a more obvious advantage to adding filters to this system.

Figure 6: Testing error Δ_tx(Ω) and the covariance rank Γ(Ω) as a function of the number of nodes M for the laser system nodes of eqs. (6-7) when the input signal is the Lorenz x signal and the testing signal is the Lorenz z signal. These quantities were calculated from the reservoir signal matrix as in eq. (8).

For a filter of order η, there is a startup delay in the FPGA of η/2. After that the filter runs in real time; one filter time step equals one laser system time step. For M nodes, the time for one full update of the reservoir computer will be M × t_s. If instead we have M_f nodes followed by filters, the update time is M_f × t_s × (1 + η/2), where η is the maximum filter order. We gain in speed if M_f < M/(1 + η/2). For η = 5, M/(1 + η/2) ≈ 0.29M, while M_f = 0.2M, so there is a speedup factor of 1.4.

Up to the capacity of the FPGA, the filters can be operated in parallel, so by using different types of filters, or filters with different frequency or phase characteristics, we could combine more than η filters with a maximum order of η to achieve an even greater speedup. It is also possible to use both FIR and IIR (infinite impulse response) filters, as long as the IIR filters are stable.

The map of eqs. (6-7) was used to simulate the laser reservoir computer. The parameters for the laser map were taken from the experiment in [4]. Figure 6 shows the testing error and covariance rank for the laser simulation of eqs. (6-7). The testing error stops decreasing as the number of nodes becomes greater than M = 20. Instead of an adjacency matrix, the coupling between nodes in the laser system comes from the low pass filter that is part of the delay loop. The low pass filter creates a fading memory in the laser reservoir computer, but it also means that nodes separated by a number of time steps longer than the memory of the low pass filter are not coupled. As a result, increasing the reservoir past a certain size does not lead to a further decrease in testing error. The ratio of the low pass filter time constant to the reservoir time step was τ_R/t_s = 20, which is about the number of nodes for which the testing error stops decreasing.

Figure 7: Ratio of covariance rank Γ(Λ) when N_f filters are used after the reservoir to the covariance rank found using only the reservoir, Γ(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer was modeled by the laser system of eqs. (6-7).

Figure 7 shows the ratio of the covariance rank calculated from the filter matrix Λ of eq. (13) to the covariance rank calculated from the reservoir matrix Ω of eq. (8) for the laser reservoir computer simulation.
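The speed comparison can be checked with a few lines of arithmetic. The assumption that five filters allow the node count to drop to M_f = M/5 (to match the covariance rank of the unfiltered reservoir) is an illustrative choice consistent with the speedup factor of 1.4 quoted in the text.

```python
# Worked version of the update-time comparison (times in units of t_s).
M, eta = 100.0, 5                         # illustrative node count, max order
t_reservoir_only = M                      # M nodes, one time step each
M_f = M / 5.0                             # assumed: five filters -> M/5 nodes
t_with_filters = M_f * (1 + eta / 2.0)    # M_f * t_s * (1 + eta/2) = 0.7 M t_s
print(round(t_reservoir_only / t_with_filters, 2))   # -> 1.43
```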
As in figures 2 and 3, when only a single filter is present, it is equivalent to the identity. For reservoirs up to about 50 nodes, adding filters increased the rank of the filter matrix Λ relative to the reservoir matrix Ω in proportion to the number of added filters. As the reservoir computer became larger, the increase in rank was not as large. Figure 6 shows that the covariance rank stops increasing with the number of nodes for a reservoir computer with about 250 nodes. When five filters are added to the reservoir computer, the filter matrix Λ will have dimensions N × (5M + 1).

Figure 8: Ratio of testing error Δ_tx(Λ) when N_f filters are used after the reservoir to the testing error found using only the reservoir, Δ_tx(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer was modeled by the laser system of eqs. (6-7).

Figure 9: Ratio of testing error Δ_tx to covariance rank Γ for both types of reservoir computer, for the reservoir computer only and the reservoir computer followed by five filters.

Adding filters also changes the memory capacity of the reservoir; these changes will be addressed in section 7.

This section on signal fitting shows that augmenting a reservoir computer with filters can improve performance, but the improvement is larger for reservoir computers with smaller numbers of nodes. This is not necessarily bad; if it is easy to build the reservoir computer with large numbers of nodes, then the performance improvement from filters is not as necessary.
5. Prediction
Predicting the future time evolution of a signal given its past time evolution is a variation on fitting signals. In this case, if the input signal is s(n), the training signal is g(n) = s(n + τ), where τ represents some number of time steps into the future. A useful time scale for predicting the Lorenz x signal is the Lyapunov time, or the reciprocal of the largest Lyapunov exponent. For the parameters in eq. (16), the largest Lyapunov exponent is 0.9/s, so the Lyapunov time is T_L = 1.1 s. The reservoir computer was driven with the x signal to predict the x signal 0.25 T_L into the future. At an integration time step of 0.02 s, this amounts to 13 points into the future. The error in prediction is calculated by a process analogous to the testing error in eq. (11), but to avoid confusion the prediction error will be called Δ_P.

The prediction time here does not look very large, but it is a prediction time for an open loop configuration, where the output of the reservoir computer is not fed back into the input. The time interval in Lyapunov times is similar to the prediction time in [18], which was one of the first papers to use a reservoir computer for predicting a chaotic system.

Figure 10 shows the error for predicting the future of the Lorenz x signal 0.25 Lyapunov times into the future, or Δ_P, as a function of the number of nodes M for a reservoir computer using the leaky tanh nodes (no filters).

Figure 10: Δ_P(Ω) is the error for predicting the future of the Lorenz x signal 0.25 Lyapunov times into the future, plotted versus the number of nodes M. This prediction is for the reservoir computer only, using leaky tanh nodes.
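Setting up the prediction target only requires shifting the input; a sketch, where the ramp signal is a placeholder for the Lorenz x series.

```python
import numpy as np

def prediction_target(s, tau=13):
    """Prediction target g(n) = s(n + tau).

    At dt = 0.02 and a Lyapunov time of about 1.1, 0.25 Lyapunov times
    corresponds to tau = 13 points. The last tau reservoir rows have no
    future value to fit, so the caller should drop them.
    """
    return s[tau:], tau

s = np.arange(100, dtype=float)          # stand-in for the Lorenz x signal
g, tau = prediction_target(s)
print(g[0] == s[13], len(g) == len(s) - 13)   # -> True True
```

Training then proceeds exactly as in the fitting case, with Ω (or Λ) truncated to the first len(g) rows so that each row is paired with its future value.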
Figure 11 shows the ratio of the prediction error using a reservoir computer augmented with filters to the prediction error using the reservoir computer only. Figure 11 shows that adding linear filters to the leaky tanh reservoir computer can lower the error in predicting the Lorenz x signal, but the improvement in prediction error is only large for reservoir computers with fewer than 50 nodes.

Figure 11: Prediction error ratio Δ_P(Λ)/Δ_P(Ω) for predicting the future of the Lorenz x signal when the leaky tanh reservoir computer is augmented with up to five filters. The number of nodes is M, while N_f is the number of filters.

The prediction error for the leaky tanh reservoir may also be evaluated for different values of the parameter α and the spectral radius. As with the parameter sweeps for fitting the z signal, 20 random adjacency matrices were created for each value of spectral radius and α and the mean testing errors were plotted. Figure 12 shows the mean error in predicting the Lorenz x variable as a function of α and the spectral radius.

Figure 13 shows the mean value of the ratio of the prediction error for the Lorenz x signal with five filters following the reservoir to the prediction error for the reservoir only, as the parameter α and the spectral radius σ are scanned. As with the error in fitting the Lorenz z signal, the most improvement in predicting the x signal when filters are added comes in the same parameter range as the smallest prediction error for the reservoir only.

The prediction error for the laser system model is plotted in figure 14. The prediction error saturates somewhere between 50 and 100 nodes. Adding filters to the reservoir computer modeled on the laser system can improve prediction, as shown in figure 15. The largest improvement in prediction error when three or more filters are used appears to come when the reservoir computer has more than 20 nodes.
The prediction error does not improve for19 igure 12: Mean error in predicting the Lorenz x variable, ∆ tx (Ω), as a function of theparameter α and the spectral radius σ for a leaky tanh reservoir with 80 nodes. more than three filters. It is likely that the prediction error for the laser systemis not limited by covariance rank in this reservoir computer, but rather by howwell the reservoir signals χ i ( n ) match the testing signal.
6. Classification
The reservoir computers from the previous sections will be used to determine if adding filters to a reservoir computer can improve the ability to classify a set of signals. The signals in this case are the x components of the 19 Sprott chaotic systems [19]. Each of the Sprott systems was numerically integrated with a time step of 0.5.

Some of the attractors for the Sprott systems have small basins of attraction, so rather than set random initial conditions to create different realizations of each Sprott system, a long time series of the x signal was generated for each of the Sprott systems. The test and training signals were taken from different sections of this long time series. The reservoir computers used to classify the Sprott signals each had M = 100 nodes.

Figure 13: Mean ratio of the prediction error for the Lorenz x variable, Δ_tx(Λ), when five filters are used in the filter matrix Λ to the prediction error for the leaky tanh reservoir by itself, as a function of the parameter α and the spectral radius σ. The reservoir had 80 nodes.

Figure 14: Δ_P(Ω) is the error for predicting the future of the Lorenz x signal 0.25 Lyapunov times into the future, plotted versus the number of nodes M. This prediction is for the reservoir computer only, using nodes modeled on the laser system (eqs. 6-7).

Figure 15: Prediction error ratio Δ_P(Λ)/Δ_P(Ω) for predicting the future of the Lorenz x signal when the reservoir computer based on the laser system model is augmented with up to five filters. The number of nodes is M, while N_f is the number of filters.

The reservoir computers were trained with 100 training examples each. For each training example, the reservoir computer was first driven by a 1000 point signal to eliminate transients, after which the next 1000 output points from each node were used to fit the training signal. Fit coefficients were found using both the reservoir matrix Ω and the filter matrix Λ. For each of the Sprott systems, the fit coefficient vectors were given by c(j, k), j = 1…100, k = 1…19, where j indicated the j'th section of the x signal and k indicated the particular Sprott system. Each coefficient vector had M components: c(j, k) = [c_1(j, k), c_2(j, k), …, c_M(j, k)], where M, the number of nodes, was 100. For each of the Sprott systems a reference coefficient vector was defined as the mean of the coefficient vectors:

$$\bar{C}(k) = \frac{1}{100} \sum_{j=1}^{100} c(j, k). \qquad (17)$$

The set of C̄(k) = [C̄_1(k), C̄_2(k), …, C̄_M(k)], k = 1…19 coefficients formed a reference library.

To identify the Sprott systems, the reservoir computers were again driven with a 1000 point time series of the x signal from each of the Sprott systems to eliminate transients. The next 1000 points were saved in the reservoir computer matrix Ω or the filter matrix Λ. Once again, for each section a set of fit coefficients c(j, l), j = 1…100, l = 1…19 was found, for both the reservoir computer matrix and the filter matrix.

Each time a coefficient vector was found, it was compared to the reference library according to

$$\Psi_j(l, k) = \sqrt{ \sum_{i=1}^{M} \left[ c_i(j, l) - \bar{C}_i(k) \right]^2 } \qquad (18)$$

where i = 1…M indicates the components (or node numbers) of the coefficient vectors. The difference Ψ_j(l, k) was computed for all 19 reference coefficient vectors C̄(k), and the value of k that gave the minimum of Ψ_j(l, k) was identified as the Sprott system that generated the coefficient vector c(j, l).

The probability of making an error when identifying from which of the Sprott systems an x signal originated is shown in figure 16. Each time a signal from a Sprott system was compared to the reference library, if the value of k that gave a minimum of Ψ_j(l, k) did not correspond to the Sprott system that generated the signal, an error was recorded. The probability of error P_E was the total number of errors divided by the total number of comparisons.

Figure 16 shows that adding just two filters (really just one filter, since one of the filters is the identity) to the leaky tanh reservoir computer improves the ability to identify the Sprott systems if the reservoir computer has only two nodes, but not if it has more than two nodes. Adding five filters improves the classification of signals if the reservoir computer has fewer than eight nodes. Once the reservoir has eight nodes, adding more nodes or adding filters gives little extra benefit.

Adding filters to a reservoir computer showed a greater advantage when fitting signals (Section 4) than when classifying signals. In fitting signals the reservoir time series acts as a basis, so the covariance rank of the reservoir output is important. Adding filters increases this rank.
Classifying signals may not depend as much on how well the reservoir computer fits the signals; what is more important is that the sets of fit coefficients are sufficiently different for different inputs. Adding filters to a reservoir computer does create more coefficients, but the filters are linear, so the extra coefficients may not be useful in distinguishing the different Sprott signals.

Figure 17 shows the probability of error in identifying the Sprott systems using a reservoir computer based on the laser system model of eqs. (6-7). Figure 17 for the laser model reservoir computer shows that adding two or five filters to this type of reservoir computer does reduce classification error when the reservoir has fewer than six nodes. Once the reservoir computer has six or more nodes, adding filters does not show any advantage for classification. Adding filters to a very small reservoir computer can increase the dimension of the coefficient vector, but once the coefficient vector has enough dimensions, adding additional filters does not lower the classification error. Still, if creating a reservoir with many coupled nodes is difficult or expensive, adding linear filters to a small reservoir computer can be useful.

Figures 18 and 19 are confusion matrices for the Sprott classification problem for the leaky tanh reservoir computer and the laser reservoir computer. In each of these figures, the reservoir computer had four nodes, so when two filters were added there were 8 coefficients (eq. 17) and when five filters were added there were 20 coefficients.

Figure 18 shows that for the leaky tanh reservoir computer, the classification accuracy was limited by systems 2 and 3. The reservoir computer with two added filters has fewer misclassifications than the reservoir computer alone, but the probability of misclassification between systems 2 and 3 is about the same.
When five filters are added, the probability of misclassification between systems 2 and 3 is smaller.

With no added filters, the confusion matrices for the laser system reservoir computer in figure 19 also show a high probability of misclassification between systems 2 and 3, as well as for systems 6, 7, 8 and 17. The classification errors for all but systems 2 and 3 drop sharply when two filters are added, and the misclassification between systems 2 and 3 is smaller when five filters are added.
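The confusion matrices of figures 18 and 19 tally how often each true system is assigned each label. A small helper of the usual form, written here as an illustrative sketch (the function name and label encoding are hypothetical):

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes=19):
    """Rows index the true Sprott system, columns the system the
    classifier chose; entry (j, k) counts how often system j's
    signal was identified as system k. The diagonal holds the
    correct classifications."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm
```

Persistent off-diagonal weight between a pair of rows, as seen for systems 2 and 3 above, indicates that their coefficient vectors remain close even after filters are added.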
7. Memory
Memory capacity, as defined in [20], is considered an important quantity in reservoir computers. Memory capacity is a measure of how well the reservoir can reproduce previous values of the input signal. The memory capacity as a function of delay is

MC_k = \frac{\left[ \sum_{n=1}^{N} \left( s(n-k) - \bar{s} \right) \left( g_k(n) - \bar{g}_k \right) \right]^2}{\sum_{n=1}^{N} \left( s(n-k) - \bar{s} \right)^2 \sum_{n=1}^{N} \left( g_k(n) - \bar{g}_k \right)^2} \qquad (19)

where the overbar indicates the mean. The signal g_k(n) is the fit of the reservoir signals \chi_i(n) to the delayed input signal s(n-k). The memory capacity is

MC = \sum_{k=1}^{\infty} MC_k \qquad (20)

Input signals such as the Lorenz x signal contain correlations in time, which would cause errors in the memory calculation, so in eq. (19) s(n) is a random signal uniformly distributed between -1 and +1. There are some drawbacks to defining memory in this way: the reservoir is nonlinear, so its response will differ for different input signals. Nevertheless, this is the standard definition used in the field of reservoir computing.

Figure 20 shows the testing error for both reservoir types as a function of memory capacity. It shows two things: the testing error decreases as memory capacity increases, and the memory capacity for both reservoirs is higher with five filters following the reservoir than for the reservoir alone. The data are more scattered for the laser system, but the trend is still there. It has been noted that there is a tradeoff between nonlinearity and memory in reservoir computers [21]; adding filters is a way to add memory to a reservoir computer without affecting the nonlinearity.
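The memory capacity of eqs. (19-20) can be estimated numerically by fitting the reservoir signals to delayed copies of a random input. The sketch below stands a toy linear delay-line "reservoir" in for the node signals \chi_i(n), purely so the example is self-contained; the variable names and the truncation of the infinite sum in eq. (20) at k_max are assumptions, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5000, 10

# Random input, uniform on [-1, 1], as specified for eq. (19)
s = rng.uniform(-1.0, 1.0, N)

# Toy linear delay-line "reservoir": node i roughly holds a
# scaled copy of s delayed by i + 1 steps
chi = np.zeros((N, M))
for n in range(1, N):
    chi[n] = 0.5 * np.roll(chi[n - 1], 1)
    chi[n, 0] += s[n - 1]

def memory_capacity(chi, s, k_max=30):
    """Estimate MC = sum_k MC_k (eq. 20), truncating the sum at
    k_max. For each delay k, g_k(n) is the least-squares fit of
    the reservoir signals to s(n - k); MC_k is the squared
    correlation between g_k(n) and s(n - k) (eq. 19)."""
    mc = 0.0
    for k in range(1, k_max + 1):
        X, y = chi[k:], s[:-k]          # pair chi(n) with s(n - k)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        g = X @ w
        num = np.sum((y - y.mean()) * (g - g.mean())) ** 2
        den = np.sum((y - y.mean()) ** 2) * np.sum((g - g.mean()) ** 2)
        mc += num / den if den > 0 else 0.0
    return mc
```

For this linear delay line the capacity comes out close to the number of nodes M, consistent with the known bound that linear memory capacity cannot exceed the number of independent reservoir signals.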
8. Summary
Reservoir computers should show the greatest advantage over other types of computing when they are built as analog systems, but building these systems may be difficult or expensive. Creating individual nonlinear nodes may be difficult, but the largest cost in building analog reservoir computers may be in connecting the nodes into a network. The work in this paper demonstrates that reservoir computers may be expanded by following the nonlinear network with a set of linear filters. The design and implementation of linear filters is well known, so there should be little cost for adding filters to the reservoir computer. This work showed that adding filters improved the performance of two types of reservoir computer for signal fitting, prediction and classification. Improvements in classifying signals were largest for small reservoir computers, but small reservoir computers are where the improvement is most needed.

Reservoir computers may be expanded using other types of functions besides linear FIR filters; linear filters were used here because they are simple to design and characterize. It is even possible to use nonlinear functions; one early paper, for example, connected linear nodes into a network and followed the linear network with nonlinear output functions [22].

This work was supported by the Naval Research Laboratory's Basic Research Program.
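The expansion described above, appending FIR-filtered copies of the node time series to the reservoir output, can be sketched as follows. This is a minimal illustration assuming the reservoir signals are stored as a NumPy array; including the identity filter in the bank retains the original signals, as in the two-filter case discussed earlier:

```python
import numpy as np

def expand_with_filters(chi, filter_bank):
    """Append FIR-filtered copies of each reservoir signal.
    chi: (N, M) array of node time series. filter_bank: list of
    FIR tap arrays; the identity filter [1.0] passes the
    original signals through unchanged. Returns an
    (N, M * len(filter_bank)) array of signals for the linear
    fit, expanding the output without adding nodes or links."""
    n_steps = chi.shape[0]
    cols = []
    for taps in filter_bank:
        for i in range(chi.shape[1]):
            # Causal FIR output, truncated to the original length
            cols.append(np.convolve(chi[:, i], taps)[:n_steps])
    return np.column_stack(cols)
```

With two filters (identity plus one FIR filter) a four-node reservoir yields eight output signals, matching the coefficient counts quoted for the classification experiments.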
References

[1] H. Jaeger, The echo state approach to analysing and training recurrent neural networks - with an erratum note, German National Research Center for Information Technology GMD Technical Report 148 (1) (2001) 34. URL http://publica.fraunhofer.de/documents/B-73135.html
[2] T. Natschlaeger, W. Maass, H. Markram, The "liquid computer": A novel strategy for real-time computing on time series, Special Issue on Foundations of Information Processing of TELEMATIK 8 (1) (2002) 39-43.
[3] Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, E. Ott, Reservoir observers: Model-free inference of unmeasured variables in chaotic systems, Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (4) (2017) 041102. doi:10.1063/1.4979665.
[4] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, I. Fischer, Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Optics Express 20 (3) (2012) 3241-3249. doi:10.1364/oe.20.003241.
[5] G. Van der Sande, D. Brunner, M. C. Soriano, Advances in photonic reservoir computing, Nanophotonics 6 (3) (2017) 561-576. doi:10.1515/nanoph-2016-0132.
[6] F. Schurmann, K. Meier, J. Schemmel, Edge of chaos computation in mixed-mode VLSI - a hard liquid, in: Advances in Neural Information Processing Systems 17, MIT Press, 2004, pp. 1201-1208.
[7] G. Dion, S. Mejaouri, J. Sylvestre, Reservoir computing with a single delay-coupled non-linear mechanical oscillator, Journal of Applied Physics 124 (15) (2018) 152132. doi:10.1063/1.5038038.
[8] D. Canaday, A. Griffith, D. J. Gauthier, Rapid time series prediction with a hardware-based reservoir computer, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (12) (2018) 123119. doi:10.1063/1.5048199.
[9] T. L. Carroll, L. M. Pecora, Network structure effects in reservoir computers, Chaos 29 (8) (2019) 083130. doi:10.1063/1.5097686.
[10] T. L. Carroll, Dimension of reservoir computers, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (1) (2020) 013102. doi:10.1063/1.5128898.
[11] J. Dambre, D. Verstraeten, B. Schrauwen, S. Massar, Information processing capacity of dynamical systems, Scientific Reports 2 (2012) 514.
[12] R. Badii, G. Broggi, B. Derighetti, M. Ravani, S. Ciliberto, A. Politi, M. A. Rubio, Dimension increase in filtered chaotic signals, Phys. Rev. Lett. 60 (11) (1988) 979-982. doi:10.1103/PhysRevLett.60.979.
[13] Wikipedia, Field-programmable gate array. URL https://en.wikipedia.org/wiki/Field-programmable_gate_array
[14] U. Tietze, C. Shenk, Electronic Circuits, Springer, Berlin, 1991.
[15] H. Jaeger, M. Lukoševičius, D. Popovici, U. Siewert, Optimization and applications of echo state networks with leaky-integrator neurons, Neural Networks 20 (3) (2007) 335-352. doi:10.1016/j.neunet.2007.04.016.
[16] I. T. Jolliffe, Principal Component Analysis, Springer, 2011.
[17] E. N. Lorenz, Deterministic non-periodic flow, Journal of Atmospheric Science 20 (2) (1963) 130-141. doi:10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
[18] H. Jaeger, H. Haas, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science 304 (5667) (2004) 78-80. doi:10.1126/science.1091277.
[19] J. C. Sprott, Some simple chaotic flows, Physical Review E 50 (2) (1994) R647-R650. doi:10.1103/PhysRevE.50.R647.
[20] H. Jaeger, Short term memory in echo state networks, Technical report, GMD-Forschungszentrum Informationstechnik.
[21] M. Inubushi, K. Yoshimura, Reservoir computing beyond memory-nonlinearity trade-off, Scientific Reports 7 (1) (2017) 10199. doi:10.1038/s41598-017-10257-6.
[22] S. Boyd, L. Chua, Fading memory and the problem of approximating nonlinear operators with Volterra series, IEEE Transactions on Circuits and Systems 32 (11) (1985) 1150-1161. doi:10.1109/TCS.1985.1085649.

Figure 16: Probability of error P_E in identifying the source of an x signal from one of the 19 Sprott systems as a function of the number of nodes M in the leaky tanh reservoir computer.

Figure 17: Probability of error P_E in identifying the source of an x signal from one of the 19 Sprott systems as a function of the number of nodes M in the reservoir computer based on the laser system model of eqs. (6-7).

Figure 18: Confusion matrices for classifying the 19 Sprott chaotic systems based on the leaky tanh reservoir computer. The reservoir computer had four nodes for these figures.

Figure 19: Confusion matrices for classifying the 19 Sprott chaotic systems based on the laser reservoir computer. The reservoir computer had four nodes for these figures.

Figure 20: Testing error \Delta_{tx} for both types of reservoir as a function of memory capacity.