Adding Filters to Improve Reservoir Computer Performance
T. L. Carroll
Code 6392, US Naval Research Lab, Washington, DC 20375 USA
Abstract
Reservoir computers are a type of neuromorphic computer that may be built with analog hardware, potentially creating powerful computers that are small, light and consume little power. Typically a reservoir computer is built by connecting together a set of nonlinear nodes into a network; connecting the nonlinear nodes may be difficult or expensive, however. This work shows how a reservoir computer may be expanded by adding functions to its output. The particular functions described here are linear filters, but other functions are possible. The design and construction of linear filters is well known, and such filters may be easily implemented in hardware such as field programmable gate arrays (FPGAs). The effect of adding filters on reservoir computer performance is simulated for a signal fitting problem, a prediction problem and a signal classification problem.
Keywords: reservoir computer, machine learning
Email address: [email protected] (T. L. Carroll)
Preprint submitted to Physica D, October 20, 2020
1. Introduction
A reservoir computer is a high dimensional dynamical system that may be used to do computation [1, 2]. The reservoir computer by itself will evolve to a stable fixed point; in use, the reservoir computer is driven by an input signal s(n). The reservoir computer is synchronized in the general sense to the input signal, meaning that the reservoir computer will follow the same trajectory every time it is driven with the same input signal (after an initial transient). To train a reservoir computer, a number of time series signals are extracted and used to do a linear fit to a training signal. The fit coefficients are the output of the training process. To use the reservoir computer for a computation, the same dynamical system is driven with a different input signal and a linear combination is made from the output signals using the coefficients found during the training process. An example of training and testing would be using the x, y and z signals from the Lorenz chaotic system as the input signal and creating a linear combination of reservoir signals to fit the Lorenz z signal [3]. In the computation stage, the reservoir would be driven by the signals x′, y′ and z′ from the Lorenz system with different initial conditions, and the linear combination made from the training coefficients and the reservoir signals would be a close fit to the corresponding z′ signal.
Typically a reservoir computer is built by linking together a set of nonlinear nodes in a directed network. A reservoir computer is similar to a recurrent neural network, but the connections between nodes do not change in a reservoir computer. As a result, reservoir computers may be built as analog systems. Examples of analog reservoir computers so far include photonic systems [4, 5], analog circuits [6], mechanical systems [7] and field programmable gate arrays [8]. This analog approach means that reservoir computers can potentially be very fast, and yet consume little power, while being small and light.

One obstacle to building analog reservoir computers is that creating and connecting the analog nodes may be difficult, which is why some photonic systems [4, 5] use one actual node and then use time multiplexing to create a set of virtual nodes by adding a delay loop to the laser system. This time multiplexing increases the number of nodes but slows the response time of the photonic reservoir computer. Other types of analog reservoir computers, such as those constructed from field programmable gate arrays [8], have nodes that are coupled directly, but in most systems the coupling of large numbers of nonlinear nodes into a reservoir computer network is still a research problem.

It has been shown that the number of linearly independent signals in a reservoir computer is an indicator of how well a reservoir computer can fit signals. In [9, 10] this number was measured as the covariance rank. A related measure of capacity was proposed in [11]. The covariance rank of a reservoir computer is ultimately limited by the number of nodes in the network, but in some cases the rank is less than the number of nodes.

In this work I show that it is possible to increase the covariance rank of a reservoir computer by adding filters to the reservoir computer output.
I also note that adding filters to a reservoir computer can increase the memory capacity of a reservoir computer without affecting the nonlinearity. There are many types of filters that could be used; to keep things simple, in this work I use linear finite impulse response (FIR) filters. Infinite impulse response (IIR) filters could also be used, but they can increase the fractal dimension of a signal [12], so FIR filters are used to avoid this complication. Filter design and implementation is a mature technology, and filters can be implemented in off the shelf devices such as field programmable gate arrays (FPGAs) [13].

I begin by describing how adding functions to a reservoir computer may increase the rank of the reservoir computer (Section 2). Section 3 then describes reservoir computers and how filters may be added to increase their rank. Section 4 shows how adding filters improves signal fitting, Section 5 describes the impact on prediction, and signal classification is discussed in Section 6. Section 7 describes changes in the memory capacity caused by adding filters.
2. Filters and Rank
Given a signal x(n) and some function f(x), one may create a basis of rank 2. Applying Gram-Schmidt orthogonalization,

$$u_1(n) = \frac{x(n)}{\left\| x(n) \right\|} \quad y(n) = \frac{f(x)}{\left\| f(x) \right\|} \quad z(n) = y(n) - \left\langle u_1(n), y(n) \right\rangle u_1(n) \quad u_2(n) = \frac{z(n)}{\left\| z(n) \right\|} \qquad (1)$$

where ⟨ ⟩ indicates a dot product and ‖x‖ = ⟨x, x⟩^{1/2}. The signals u_1(n) and u_2(n) form an orthonormal basis of rank 2, as long as f(x) is not just a multiple of x. The Gram-Schmidt procedure may be repeated to create bases of higher rank using additional functions.

One type of function that can be used for f is a linear filter, in particular a finite impulse response (FIR) filter. An FIR filter is a linear system with no feedback. The design of FIR filters uses well established techniques, and implementing FIR filters in hardware is also well known; a field programmable gate array (FPGA) may be used to implement these filters, for example. Because FIR filters have no feedback, stability is not a question. It is possible to design stable infinite impulse response filters (which include feedback), but it is known that filters with feedback can increase the fractal dimension of a signal from a chaotic dynamical system [12], a complication that is avoided with FIR filters.

An FIR filter may be described by

$$y(t) = \sum_{k=0}^{N_F} a_k x(t-k) \qquad (2)$$

where N_F is the filter order.

The particular type of FIR filter I used was a Bessel filter [14]. The denominator of the transfer function of an n'th order Bessel filter is the n'th order Bessel polynomial (the numerator is a constant).

Table 1: Filter coefficients

filter number   k = 1    k = 2    k = 3    k = 4    k = 5
1               1        0        0        0        0
2               1.7321   1        0        0        0
3               2.4329   2.4662   1        0        0
4               3.1239   4.3916   3.2011   1        0
5               3.8107   6.7767   6.8864   3.9363   1
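As an illustration of eq. (1), the rank-2 construction can be sketched in a few lines; the choice of tanh as the function f and the signal length are arbitrary here, not taken from the text.

```python
import numpy as np

# Sketch of the rank-2 basis construction of eq. (1): given a signal x(n)
# and a function f(x) that is not a scalar multiple of x, Gram-Schmidt
# orthogonalization yields two orthonormal basis vectors.
def gram_schmidt_pair(x, f):
    u1 = x / np.linalg.norm(x)             # u1(n) = x(n)/||x(n)||
    y = f(x) / np.linalg.norm(f(x))        # y(n) = f(x)/||f(x)||
    z = y - np.dot(u1, y) * u1             # remove the component along u1
    u2 = z / np.linalg.norm(z)             # u2(n) = z(n)/||z(n)||
    return u1, u2

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
u1, u2 = gram_schmidt_pair(x, np.tanh)     # tanh is not a multiple of x

# The pair is orthonormal, so together they span a rank-2 basis.
print(abs(np.dot(u1, u2)) < 1e-10, abs(np.linalg.norm(u2) - 1.0) < 1e-10)
# -> True True
```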
Other filters may also work, and multiple different types of filters could be used simultaneously; low order Bessel filters were used here because they are simple to describe.

I used Bessel filters with orders from 1 to 5. The Bessel filters were designated

$$y_i^{\eta}(t) = \sum_{k=1}^{\eta} a_k \chi_i(t-k) \qquad (3)$$

where i indicates the node index from the reservoir computer and η is the filter order. The filter coefficients are given in Table 1.
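The filters of Table 1 are straightforward to apply. A minimal sketch, with the table's coefficients hard-coded and a sine wave standing in for a node signal χ_i(n):

```python
import numpy as np

# Bessel filter coefficients a_k for orders eta = 1..5 (rows of Table 1).
BESSEL_COEFFS = {
    1: [1.0],
    2: [1.7321, 1.0],
    3: [2.4329, 2.4662, 1.0],
    4: [3.1239, 4.3916, 3.2011, 1.0],
    5: [3.8107, 6.7767, 6.8864, 3.9363, 1.0],
}

def fir_filter(chi, eta):
    """Eq. (3): y_i^eta(t) = sum_{k=1}^{eta} a_k chi_i(t-k)."""
    a = BESSEL_COEFFS[eta]
    y = np.zeros_like(chi)
    for k in range(1, eta + 1):
        y[k:] += a[k - 1] * chi[:-k]
    return y

chi = np.sin(np.linspace(0, 10, 200))   # stand-in for one node signal
y1 = fir_filter(chi, 1)
# The first-order filter has the single coefficient a_1 = 1, so it is just
# a one-step delay of the input signal.
print(np.allclose(y1[1:], chi[:-1]))    # -> True
```

This delay behavior of the first-order filter is why, in the results below, a single filter is equivalent to the identity.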
3. Reservoir Computers
Two different node types are used for the reservoir computers in this work. The first reservoir computer uses leaky tanh nodes [15],

$$\chi_i(n+1) = \alpha \chi_i(n) + (1-\alpha) \tanh\left( \sum_{j=1}^{M} A_{ij} \chi_j(n) + w_i s(n) + 1 \right) \qquad (4)$$

where the reservoir computer variables are χ_i(n), i = 1…M with M the number of nodes, A is an adjacency matrix that describes how the different nodes in the network are connected to each other, and W = [w_1, w_2, …, w_M] describes how the input signal s(n) is coupled into the different nodes; the tanh function supplies the nonlinearity.

For the leaky tanh map reservoir computer, half of the elements of the adjacency matrix A were chosen randomly and set to random numbers drawn from a uniform random distribution between -1 and 1. The diagonal elements of A were then set to zero. The spectral radius σ is the largest magnitude of the eigenvalues of A. The entire adjacency matrix was renormalized to have the spectral radius specified for the different examples below.

The second node type was a model for the laser experiment of [4]. This system is described by

$$\epsilon \dot{x}(s) + x(s) = \beta \sin\left[ \mu x(s-1) + \rho u_I(s-1) + \phi \right] \qquad (5)$$

where s is a normalized time.

Converting s and x to discrete variables, this system may be modeled by the map

$$x(n + \tau_s) = \sum_{j=1}^{N} \beta H(j) \sin\left[ \mu x(n-j) + W s\left( \left\lfloor \tfrac{n}{M} \right\rfloor \right) + \phi \right] \qquad (6)$$

where ⌊ ⌋ is the floor function and M is the number of nodes. The floor function means that the value of the input signal is sampled once every M time steps of the map. The variable τ_s in this equation is an integer. The parameter β was set at 0.5 and φ = 0. The signal H(j) is the impulse response of the low pass filter in eq. (5). The low pass filter was a first order filter with time constant τ_R. The elements of the vector W are drawn from a uniform random distribution between +1 and -1.

The number of nodes M was varied, so for a map time step t_s, the total time for one update of the reservoir computer was M × t_s.

The map signal x(n) was rearranged into a matrix Ω to create the set of reservoir computer nodes:

$$n = 1, \ldots, N \quad i = (n \bmod M) + 1 \quad j = \left\lfloor \tfrac{n}{M} \right\rfloor + 1 \quad \Omega_{i,j} = x(n) \qquad (7)$$

Each column of this matrix represented one virtual node, while each row corresponded to one time step for the input signal. The parameters for eq. (7) were taken from the experiment in [4].

When the reservoir computer was driven with s(n), the first 1000 time steps were discarded as a transient. The next N = 10000 time steps from each node were combined in an N × (M + 1) matrix

$$\Omega = \begin{bmatrix} \chi_1(1) & \cdots & \chi_M(1) & 1 \\ \chi_1(2) & \cdots & \chi_M(2) & 1 \\ \vdots & & \vdots & \vdots \\ \chi_1(N) & \cdots & \chi_M(N) & 1 \end{bmatrix} \qquad (8)$$

The last column of Ω was set to 1 to account for any constant offset in the fit. The training signal is fit by

$$h(n) = \Omega C \qquad (9)$$

where h(n) = [h(1), h(2), …, h(N)] is the fit to the training signal g(n) = [g(1), g(2), …, g(N)] and C = [c_1, c_2, …, c_{M+1}] is the coefficient vector.
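A minimal sketch of the leaky tanh reservoir of eq. (4); the node count, parameter values and random seed here are illustrative, not the paper's exact settings.

```python
import numpy as np

# Adjacency matrix construction following the text: half the entries
# uniform in [-1, 1], zero diagonal, rescaled to a chosen spectral radius.
def make_adjacency(M, spectral_radius, rng):
    A = np.where(rng.random((M, M)) < 0.5,
                 rng.uniform(-1, 1, (M, M)), 0.0)
    np.fill_diagonal(A, 0.0)
    A *= spectral_radius / np.max(np.abs(np.linalg.eigvals(A)))
    return A

def run_reservoir(s, M=100, alpha=0.75, spectral_radius=0.48, seed=0):
    """Drive the leaky tanh reservoir of eq. (4) with input s(n)."""
    rng = np.random.default_rng(seed)
    A = make_adjacency(M, spectral_radius, rng)
    w = rng.uniform(-1, 1, M)           # input coupling vector W
    chi = np.zeros(M)
    out = np.empty((len(s), M))
    for n, sn in enumerate(s):
        # eq. (4): chi(n+1) = alpha*chi(n) + (1-alpha)*tanh(A chi + w s + 1)
        chi = alpha * chi + (1 - alpha) * np.tanh(A @ chi + w * sn + 1.0)
        out[n] = chi
    return out
```

Because tanh is bounded by 1 and the update is a convex combination, the node signals stay in [-1, 1], which keeps the reservoir stable for any input.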
The fit coefficient vector C is found by ridge regression. The training error may be computed from

$$\Delta_{RC} = \frac{\mathrm{std}\left[ \Omega C - g(n) \right]}{\mathrm{std}\left[ g(n) \right]} \qquad (10)$$

where std[ ] indicates a standard deviation.

In the testing configuration, the reservoir computer is driven by a new signal s′(n) and the matrix of signals from the reservoir is now Ω′. The testing error is

$$\Delta_{tx} = \frac{\mathrm{std}\left[ \Omega' C - g'(n) \right]}{\mathrm{std}\left[ g'(n) \right]} \qquad (11)$$

where g′(n) is the testing signal.

Each node output χ_i(n) was passed through between one and five filters, with filter coefficients defined in Table 1. The filter outputs were y_i^η(t), found as

$$y_i^{\eta}(t) = \sum_{k=1}^{\eta} a_k \chi_i(t-k) \qquad (12)$$

where i indicates the node index from the reservoir computer and η is the filter order.

The filter outputs were arranged in a matrix similar in form to Ω:

$$\Lambda = \begin{bmatrix} y_1^1(1) & y_1^2(1) & \cdots & y_2^1(1) & \cdots & y_M^{N_F}(1) & 1 \\ y_1^1(2) & & & & & & \vdots \\ \vdots & & & & & & \vdots \\ y_1^1(N) & y_1^2(N) & \cdots & y_2^1(N) & \cdots & y_M^{N_F}(N) & 1 \end{bmatrix} \qquad (13)$$

For N time series points, M nodes and a filter order of N_F, the size of Λ is N × (M × N_F + 1).

Fitting proceeds as with the reservoir only, but using the full Λ matrix instead; the fit coefficients are found as

$$C_F = \Lambda^{+} g(n) \qquad (14)$$

where Λ⁺ indicates the pseudo-inverse of Λ and g(n) is the same training signal as was used for the reservoir. The training and testing errors are calculated in the same manner as for the reservoir, substituting Λ for Ω.

3.3. Covariance Rank

The individual columns of the reservoir matrix Ω or the filter matrix Λ may be used as a basis to fit the training signal g(n). Among other things, the fit will depend on the number of orthogonal columns in Ω.

Principal component analysis [16] states that the eigenvectors of the covariance matrix of Ω, Θ = Ω^T Ω, form an uncorrelated basis set.
The rank of the covariance matrix tells us the number of uncorrelated vectors. Therefore, we will use the rank of the covariance matrix of Ω (or Λ),

$$\Gamma(\Omega) = \mathrm{rank}\left( \Omega^T \Omega \right) \quad \Gamma(\Lambda) = \mathrm{rank}\left( \Lambda^T \Lambda \right) \qquad (15)$$

to characterize the reservoir matrix Ω or the filter matrix Λ. We calculate the rank using the MATLAB rank() function. The maximum covariance rank is equal to the number of nodes M, or the number of nodes times the number of filters. In [9, 10], higher covariance rank was associated with lower testing error.
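The covariance rank of eq. (15) is simple to compute; here `numpy.linalg.matrix_rank` plays the role of MATLAB's rank(), applying the same kind of singular-value threshold mentioned later in the text.

```python
import numpy as np

def covariance_rank(Omega):
    """Eq. (15): rank of the covariance matrix Omega^T Omega."""
    return np.linalg.matrix_rank(Omega.T @ Omega)

rng = np.random.default_rng(1)
Omega = rng.standard_normal((1000, 10))
full = covariance_rank(Omega)            # 10 independent columns

# Duplicating a column adds no new direction, so the rank is unchanged:
# extra signals only help if they are linearly independent.
degenerate = covariance_rank(np.hstack([Omega, Omega[:, :1]]))
print(full, degenerate)   # -> 10 10
```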
4. Signal Fitting
The first example of adding filters to a reservoir computer will be fitting the z signal from the Lorenz chaotic system based on the x signal. The Lorenz system [17] is described by

$$\frac{dx}{dt} = c_1 (y - x) \quad \frac{dy}{dt} = x (c_2 - z) - y \quad \frac{dz}{dt} = x y - c_3 z \qquad (16)$$

with c_1 = 10, c_2 = 28, and c_3 = 8/3. The equations were numerically integrated with a time step of t_s = 0.02.

The optimum parameters for fitting the Lorenz z signal when the Lorenz x signal drove the leaky tanh reservoir computer were found by simulations to be α = 0.75 and a spectral radius of 0.48. Figure 1 shows the testing error Δ_tx(Ω) and the covariance rank Γ(Ω) as a function of the number of nodes M for the leaky tanh nodes.

Figure 1: Testing error Δ_tx(Ω) and the covariance rank Γ(Ω) as a function of the number of nodes M for the leaky tanh nodes of eq. (4) when the input signal is the Lorenz x signal and the testing signal is the Lorenz z signal. These quantities were calculated from the reservoir signal matrix as in eq. (8).

The notation (Ω) is used to indicate that the testing error and the covariance rank are calculated from the reservoir signal matrix Ω defined in eq. (8). Figure 1 shows a typical result: the covariance rank increases and the testing error decreases as the number of reservoir computer nodes goes up.

Figure 2 shows the ratio of the covariance rank calculated from the filter matrix Λ of eq. (13) to the covariance rank calculated from the reservoir matrix Ω. Figure 2 shows that adding FIR filters as described in Section 2.1 can increase the covariance rank for the leaky tanh nodes if the reservoir is small, 20 nodes or less. For larger reservoirs, the improvement in rank is not as large. It was seen in figure 1 that the covariance rank of the reservoir matrix Ω saturated when the reservoir computer reached 100 nodes. Part of this saturation may be numerical in nature; the MATLAB rank algorithm sets a threshold below which singular values are considered to be zero. If the reservoir matrix Ω is too large, numerical errors may cause a number of singular values to be below this threshold. For the same reason, the lack of increase in rank of the filter matrix Λ relative to the reservoir matrix may be caused by these round off errors.

Figure 3 shows the ratio of the testing error calculated from the filter matrix Λ of eq. (13) to the testing error calculated from the reservoir matrix Ω of eq. (8).
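The Lorenz system of eq. (16) can be integrated with any fixed-step scheme; a fourth-order Runge-Kutta sketch at the 0.02 time step follows (the choice of integrator and initial condition are assumptions, not stated in the text).

```python
import numpy as np

def lorenz_deriv(v, c1=10.0, c2=28.0, c3=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (16)."""
    x, y, z = v
    return np.array([c1 * (y - x), x * (c2 - z) - y, x * y - c3 * z])

def integrate_lorenz(n_steps, dt=0.02, v0=(1.0, 1.0, 1.0)):
    """Fixed-step 4th-order Runge-Kutta integration of eq. (16)."""
    v = np.array(v0, dtype=float)
    traj = np.empty((n_steps, 3))
    for i in range(n_steps):
        k1 = lorenz_deriv(v)
        k2 = lorenz_deriv(v + 0.5 * dt * k1)
        k3 = lorenz_deriv(v + 0.5 * dt * k2)
        k4 = lorenz_deriv(v + dt * k3)
        v = v + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        traj[i] = v
    return traj   # columns: x(n), y(n), z(n)
```

The x column would drive the reservoir while the z column serves as the training signal g(n) in the fitting example above.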
When a single filter is used, it is a first order Bessel filter; Table 1 shows that its only nonzero coefficient is a_1 = 1, so it simply passes the node signal through with a one-step delay.

Figure 2: Ratio of covariance rank Γ(Λ) when N_f filters are added after the reservoir to the covariance rank found using only the reservoir, Γ(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer is based on the leaky tanh nodes of eq. (4).

Figure 3: Ratio of testing error Δ_tx(Λ) when N_f filters are used after the reservoir to the testing error found using only the reservoir, Δ_tx(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer is based on the leaky tanh nodes of eq. (4).

The parameter α in eq. (4) was varied from 0.05 to 1, while the spectral radius σ was varied from 0.1 to 3. For each value of α and σ, 20 random adjacency matrices and 20 random input vectors W were generated. The elements of W were chosen from a uniform random distribution between -1 and 1, while a random selection of half of the elements of the adjacency matrix A were set to zero. The nonzero elements of A were set to uniformly distributed random numbers between -1 and 1. The diagonal elements of A were then set to zero. Figure 4 shows the mean testing error for the leaky tanh reservoir computer as the parameter α and the spectral radius σ are changed.

Figure 4: Mean testing error ⟨Δ_tx(Ω)⟩ as a function of the parameter α and the spectral radius σ for a leaky tanh reservoir with 80 nodes.

Figure 5: Mean ratio ⟨Δ_tx(Λ)/Δ_tx(Ω)⟩ of the testing error Δ_tx(Λ) when five filters are used in the filter matrix Λ to the testing error for the leaky tanh reservoir by itself, as a function of the parameter α and the spectral radius σ. The reservoir had 80 nodes.

Figures 4 and 5 show that the largest improvement in testing error caused by adding filters comes in the parameter region where the reservoir computer by itself already has the smallest testing error.
Possibly the regions of poor fit in both figures occur because the reservoir signals are not a good match for the training signal, so increasing the rank of the reservoir computer does not provide any benefit. For α = 1 in figures 4 and 5, χ_i(n+1) = χ_i(n) in eq. (4), so the reservoir computer does not respond to the input signal.

The leaky tanh nodes of the previous section are usually implemented on a digital computer, in which case there is no advantage to adding filters. The laser system of [4] was built as an actual experiment. The performance of the reservoir computer could be improved by adding more virtual nodes, but adding virtual nodes slowed the response time, so there is a more obvious advantage to adding filters to this system.

Figure 6: Testing error Δ_tx(Ω) and the covariance rank Γ(Ω) as a function of the number of nodes M for the laser system nodes of eqs. (6-7) when the input signal is the Lorenz x signal and the testing signal is the Lorenz z signal. These quantities were calculated from the reservoir signal matrix as in eq. (8).

For a filter of order η, there is a startup delay in the FPGA of η/2. After that the filter runs in real time; one filter time step equals one laser system time step. For M nodes, the time for one full update of the reservoir computer will be M × t_s. If instead we have M_f nodes followed by filters, the update time is M_f × t_s × (1 + η/2), where η is the maximum filter order. We gain in speed if M_f < M/(1 + η/2). For η = 5, M/(1 + η/2) ≈ 0.29M, while M_f = 0.2M, so there is a speedup factor of 1.4.

Up to the capacity of the FPGA, the filters can be operated in parallel, so by using different types of filters, or filters with different frequency or phase characteristics, we could combine more than η filters with a maximum order of η to achieve an even greater speedup. It is also possible to use both FIR and IIR (infinite impulse response) filters, as long as the IIR filters are stable.

The map of eqs. (6-7) was used to simulate the laser reservoir computer. The parameters for the laser map were taken from the experiment in [4]. Figure 6 shows the testing error and covariance rank for the laser simulation of eqs. (6-7). The testing error stops decreasing as the number of nodes becomes greater than M = 20. Instead of an adjacency matrix, the coupling between nodes in the laser system comes from the low pass filter that is part of the delay loop. The low pass filter creates a fading memory in the laser reservoir computer, but it also means that nodes separated by a number of time steps longer than the memory of the low pass filter are not coupled. As a result, increasing the reservoir past a certain size does not lead to a further decrease in testing error. The ratio of the low pass filter time constant to the reservoir time step was τ_R/t_s = 20, which is about the number of nodes for which the testing error stops decreasing.

Figure 7: Ratio of covariance rank Γ(Λ) when N_f filters are used after the reservoir to the covariance rank found using only the reservoir, Γ(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer was modeled by the laser system of eqs. (6-7).

Figure 7 shows the ratio of the covariance rank calculated from the filter matrix Λ of eq. (13) to the covariance rank calculated from the reservoir matrix Ω of eq. (8) for the laser reservoir computer simulation.
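The speed comparison can be checked with a few lines of arithmetic. The assumption that five filters allow the node count to drop to M_f = M/5 (to match the covariance rank of the unfiltered reservoir) is an illustrative choice consistent with the speedup factor of 1.4 quoted in the text.

```python
# Worked version of the update-time comparison (times in units of t_s).
M, eta = 100.0, 5                         # illustrative node count, max order
t_reservoir_only = M                      # M nodes, one time step each
M_f = M / 5.0                             # assumed: five filters -> M/5 nodes
t_with_filters = M_f * (1 + eta / 2.0)    # M_f * t_s * (1 + eta/2) = 0.7 M t_s
print(round(t_reservoir_only / t_with_filters, 2))   # -> 1.43
```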
As in figures 2 and 3, when only a single filter is present, it is equivalent to the identity. For reservoirs up to about 50 nodes, adding filters increased the rank of the filter matrix Λ relative to the reservoir matrix Ω in proportion to the number of added filters. As the reservoir computer became larger, the increase in rank was not as large. Figure 6 shows that the covariance rank stops increasing with the number of nodes for a reservoir computer with about 250 nodes. When five filters are added to the reservoir computer, the filter matrix Λ will have dimensions N × (5M + 1).

Figure 8: Ratio of testing error Δ_tx(Λ) when N_f filters are used after the reservoir to the testing error found using only the reservoir, Δ_tx(Ω). The number of filters used is N_f while the number of nodes is M. The reservoir computer was modeled by the laser system of eqs. (6-7).

Figure 9: Ratio of testing error Δ_tx to covariance rank Γ for both types of reservoir computer, for the reservoir computer only and the reservoir computer followed by five filters.

Adding filters also changes the memory capacity of the reservoir; these changes will be addressed in section 7.

This section on signal fitting shows that augmenting a reservoir computer with filters can improve performance, but the improvement is larger for reservoir computers with smaller numbers of nodes. This is not necessarily bad; if it is easy to build the reservoir computer with large numbers of nodes, then the performance improvement from filters is not as necessary.
5. Prediction
Predicting the future time evolution of a signal given its past time evolution is a variation on fitting signals. In this case, if the input signal is s(n), the training signal is g(n) = s(n + τ), where τ represents some number of time steps into the future. A useful time scale for predicting the Lorenz x signal is the Lyapunov time, or the reciprocal of the largest Lyapunov exponent. For the parameters in eq. (16), the largest Lyapunov exponent is 0.9/s, so the Lyapunov time is T_L = 1.1 s. The reservoir computer was driven with the x signal to predict the x signal 0.25 T_L into the future. At an integration time step of 0.02 s, this amounts to 13 points into the future. The error in prediction is calculated by a process analogous to the testing error in eq. (11), but to avoid confusion the prediction error will be called Δ_P.

The prediction time here does not look very large, but it is a prediction time for an open loop configuration, where the output of the reservoir computer is not fed back into the input. The time interval in Lyapunov times is similar to the prediction time in [18], which was one of the first papers to use a reservoir computer for predicting a chaotic system.

Figure 10 shows the error for predicting the future of the Lorenz x signal 0.25 Lyapunov times into the future, or Δ_P, as a function of the number of nodes M for a reservoir computer using the leaky tanh nodes (no filters).

Figure 10: Δ_P(Ω) is the error for predicting the future of the Lorenz x signal 0.25 Lyapunov times into the future, plotted versus the number of nodes M. This prediction is for the reservoir computer only, using leaky tanh nodes.
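Setting up the prediction target only requires shifting the input; a sketch, where the ramp signal is a placeholder for the Lorenz x series.

```python
import numpy as np

def prediction_target(s, tau=13):
    """Prediction target g(n) = s(n + tau).

    At dt = 0.02 and a Lyapunov time of about 1.1, 0.25 Lyapunov times
    corresponds to tau = 13 points. The last tau reservoir rows have no
    future value to fit, so the caller should drop them.
    """
    return s[tau:], tau

s = np.arange(100, dtype=float)          # stand-in for the Lorenz x signal
g, tau = prediction_target(s)
print(g[0] == s[13], len(g) == len(s) - 13)   # -> True True
```

Training then proceeds exactly as in the fitting case, with Ω (or Λ) truncated to the first len(g) rows so that each row is paired with its future value.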
Figure 11 shows the ratio of the prediction error using a reservoir computer augmented with filters to the prediction error using the reservoir computer only. Figure 11 shows that adding linear filters to the leaky tanh reservoir computer can lower the error in predicting the Lorenz x signal, but the improvement in prediction error is only large for reservoir computers with fewer than 50 nodes.

Figure 11: Prediction error ratio Δ_P(Λ)/Δ_P(Ω) for predicting the future of the Lorenz x signal when the leaky tanh reservoir computer is augmented with up to five filters. The number of nodes is M, while N_f is the number of filters.

The prediction error for the leaky tanh reservoir may also be evaluated for different values of the parameter α and the spectral radius. As with the parameter sweeps for fitting the z signal, 20 random adjacency matrices were created for each value of spectral radius and α and the mean testing errors were plotted. Figure 12 shows the mean error in predicting the Lorenz x variable as a function of α and the spectral radius.

Figure 13 shows the mean value of the ratio of the prediction error for the Lorenz x signal with five filters following the reservoir to the prediction error for the reservoir only, as the parameter α and the spectral radius σ are scanned. As with the error in fitting the Lorenz z signal, the most improvement in predicting the x signal when filters are added comes in the same parameter range as the smallest prediction error for the reservoir only.

The prediction error for the laser system model is plotted in figure 14. The prediction error saturates somewhere between 50 and 100 nodes. Adding filters to the reservoir computer modeled on the laser system can improve prediction, as shown in figure 15. The largest improvement in prediction error when three or more filters are used appears to come when the reservoir computer has more than 20 nodes.
The prediction error does not improve for19 igure 12: Mean error in predicting the Lorenz x variable, ∆ tx (Ω), as a function of theparameter α and the spectral radius σ for a leaky tanh reservoir with 80 nodes. more than three filters. It is likely that the prediction error for the laser systemis not limited by covariance rank in this reservoir computer, but rather by howwell the reservoir signals χ i ( n ) match the testing signal.
6. Classification
The reservoir computers from the previous sections will be used to determine if adding filters to a reservoir computer can improve the ability to classify a set of signals. The signals in this case are the x components of the 19 Sprott chaotic systems [19]. Each of the Sprott systems was numerically integrated with a time step of 0.5.

Some of the attractors for the Sprott systems have small basins of attraction, so rather than set random initial conditions to create different realizations of each Sprott system, a long time series of the x signal was generated for each of the Sprott systems. The test and training signals were taken from different sections of this long time series. The reservoir computers used to classify the Sprott signals each had M = 100 nodes.

Figure 13: Mean ratio of the prediction error for the Lorenz x variable, Δ_tx(Λ), when five filters are used in the filter matrix Λ to the prediction error for the leaky tanh reservoir by itself, as a function of the parameter α and the spectral radius σ. The reservoir had 80 nodes.

Figure 14: Δ_P(Ω) is the error for predicting the future of the Lorenz x signal 0.25 Lyapunov times into the future, plotted versus the number of nodes M. This prediction is for the reservoir computer only, using nodes modeled on the laser system (eqs. 6-7).

Figure 15: Prediction error ratio Δ_P(Λ)/Δ_P(Ω) for predicting the future of the Lorenz x signal when the reservoir computer based on the laser system model is augmented with up to five filters. The number of nodes is M, while N_f is the number of filters.

The reservoir computers were trained with 100 training examples each. For each training example, the reservoir computer was first driven by a 1000 point signal to eliminate transients, after which the next 1000 output points from each node were used to fit the training signal. Fit coefficients were found using both the reservoir matrix Ω and the filter matrix Λ. For each of the Sprott systems, the fit coefficient vectors were given by c(j, k), j = 1…100, k = 1…19, where j indicated the j'th section of the x signal and k indicated the particular Sprott system. Each coefficient vector had M components: c(j, k) = [c_1(j, k), c_2(j, k), …, c_M(j, k)], where M, the number of nodes, was 100. For each of the Sprott systems a reference coefficient vector was defined as the mean of the coefficient vectors:

$$\bar{C}(k) = \frac{1}{100} \sum_{j=1}^{100} c(j, k). \qquad (17)$$

The set of C̄(k) = [C̄_1(k), C̄_2(k), …, C̄_M(k)], k = 1…19 coefficients formed a reference library.

To identify the Sprott systems, the reservoir computers were again driven with a 1000 point time series of the x signal from each of the Sprott systems to eliminate transients. The next 1000 points were saved in the reservoir computer matrix Ω or the filter matrix Λ. Once again, for each section a set of fit coefficients c(j, l), j = 1…100, l = 1…19 was found, for both the reservoir computer matrix and the filter matrix.

Each time a coefficient vector was found, it was compared to the reference library according to

$$\Psi_j(l, k) = \sqrt{ \sum_{i=1}^{M} \left[ c_i(j, l) - \bar{C}_i(k) \right]^2 } \qquad (18)$$

where i = 1…M indicates the components (or node numbers) of the coefficient vectors. The difference Ψ_j(l, k) was computed for all 19 reference coefficient vectors C̄(k), and the value of k that gave the minimum of Ψ_j(l, k) was identified as the Sprott system that generated the coefficient vector c(j, l).

The probability of making an error when identifying from which of the Sprott systems an x signal originated is shown in figure 16. Each time a signal from a Sprott system was compared to the reference library, if the value of k that gave a minimum of Ψ_j(l, k) did not correspond to the Sprott system that generated the signal, an error was recorded. The probability of error P_E was the total number of errors divided by the total number of comparisons.

Figure 16 shows that adding just two filters (really just one filter, since one of the filters is the identity) to the leaky tanh reservoir computer improves the ability to identify the Sprott systems if the reservoir computer has only two nodes, but not if it has more than two nodes. Adding five filters improves the classification of signals if the reservoir computer has fewer than eight nodes. Once the reservoir has eight nodes, adding more nodes or adding filters gives little extra benefit.

Adding filters to a reservoir computer showed a greater advantage when fitting signals (Section 4) than when classifying signals. In fitting signals the reservoir time series acts as a basis, so the covariance rank of the reservoir output is important. Adding filters increases this rank.
Classifying signals may not depend as much on how well the reservoir computer fits the signals; what is more important is that the sets of fit coefficients are sufficiently different for different inputs. Adding filters to a reservoir computer does create more coefficients, but the filters are linear, so the extra coefficients may not be useful in distinguishing the different Sprott signals.

Figure 17 shows the probability of error in identifying the Sprott systems using a reservoir computer based on the laser system model of eqs. (6-7). Figure 17 for the laser model reservoir computer shows that adding two or five filters to this type of reservoir computer does reduce classification error when the reservoir has fewer than six nodes. Once the reservoir computer has six or more nodes, adding filters does not show any advantage for classification. Adding filters to a very small reservoir computer can increase the dimension of the coefficient vector, but once the coefficient vector has enough dimensions, adding additional filters does not lower the classification error. Still, if creating a reservoir with many coupled nodes is difficult or expensive, adding linear filters to a small reservoir computer can be useful.

Figures 18 and 19 are confusion matrices for the Sprott classification problem for the leaky tanh reservoir computer and the laser reservoir computer. In each of these figures, the reservoir computer had four nodes, so when two filters were added there were 8 coefficients (eq. 17) and when five filters were added there were 20 coefficients.

Figure 18 shows that for the leaky tanh reservoir computer, the classification accuracy was limited by systems 2 and 3. The reservoir computer with two added filters has fewer misclassifications than the reservoir computer alone, but the probability of misclassification between systems 2 and 3 is about the same.
When five filters are added, the probability of misclassification between systems 2 and 3 is smaller.

With no added filters, the confusion matrices for the laser system reservoir computer in figure 19 also show a high probability of misclassification between systems 2 and 3, as well as for systems 6, 7, 8 and 17. The classification errors for all but systems 2 and 3 drop sharply when two filters are added, and the misclassification between systems 2 and 3 is smaller when five filters are added.
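The confusion matrices of figures 18 and 19 tally how often each true system is assigned each label. A small helper of the usual form, written here as an illustrative sketch (the function name and label encoding are hypothetical):

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes=19):
    """Rows index the true Sprott system, columns the system the
    classifier chose; entry (j, k) counts how often system j's
    signal was identified as system k. The diagonal holds the
    correct classifications."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm
```

Persistent off-diagonal weight between a pair of rows, as seen for systems 2 and 3 above, indicates that their coefficient vectors remain close even after filters are added.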
7. Memory
Memory capacity, as defined in [20], is considered an important quantity in reservoir computers. Memory capacity is a measure of how well the reservoir can reproduce previous values of the input signal. The memory capacity as a function of delay is

MC_k = \frac{\left[ \sum_{n=1}^{N} \left( s(n-k) - \bar{s} \right) \left( g_k(n) - \bar{g}_k \right) \right]^2}{\sum_{n=1}^{N} \left( s(n-k) - \bar{s} \right)^2 \sum_{n=1}^{N} \left( g_k(n) - \bar{g}_k \right)^2} \qquad (19)

where the overbar indicates the mean. The signal g_k(n) is the fit of the reservoir signals \chi_i(n) to the delayed input signal s(n-k). The memory capacity is

MC = \sum_{k=1}^{\infty} MC_k \qquad (20)

Input signals such as the Lorenz x signal contain correlations in time, which would cause errors in the memory calculation, so in eq. (19) s(n) is a random signal uniformly distributed between -1 and +1. There are some drawbacks to defining memory in this way: the reservoir is nonlinear, so its response will differ for different input signals. Nevertheless, this is the standard definition used in the field of reservoir computing.

Figure 20 shows the testing error for both reservoir types as a function of memory capacity. It shows two things: the testing error decreases as memory capacity increases, and the memory capacity for both reservoirs is higher with five filters following the reservoir than for the reservoir alone. The data are more scattered for the laser system, but the trend is still there. It has been noted that there is a tradeoff between nonlinearity and memory in reservoir computers [21]; adding filters is a way to add memory to a reservoir computer without affecting the nonlinearity.
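The memory capacity of eqs. (19-20) can be estimated numerically by fitting the reservoir signals to delayed copies of a random input. The sketch below stands a toy linear delay-line "reservoir" in for the node signals \chi_i(n), purely so the example is self-contained; the variable names and the truncation of the infinite sum in eq. (20) at k_max are assumptions, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5000, 10

# Random input, uniform on [-1, 1], as specified for eq. (19)
s = rng.uniform(-1.0, 1.0, N)

# Toy linear delay-line "reservoir": node i roughly holds a
# scaled copy of s delayed by i + 1 steps
chi = np.zeros((N, M))
for n in range(1, N):
    chi[n] = 0.5 * np.roll(chi[n - 1], 1)
    chi[n, 0] += s[n - 1]

def memory_capacity(chi, s, k_max=30):
    """Estimate MC = sum_k MC_k (eq. 20), truncating the sum at
    k_max. For each delay k, g_k(n) is the least-squares fit of
    the reservoir signals to s(n - k); MC_k is the squared
    correlation between g_k(n) and s(n - k) (eq. 19)."""
    mc = 0.0
    for k in range(1, k_max + 1):
        X, y = chi[k:], s[:-k]          # pair chi(n) with s(n - k)
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        g = X @ w
        num = np.sum((y - y.mean()) * (g - g.mean())) ** 2
        den = np.sum((y - y.mean()) ** 2) * np.sum((g - g.mean()) ** 2)
        mc += num / den if den > 0 else 0.0
    return mc
```

For this linear delay line the capacity comes out close to the number of nodes M, consistent with the known bound that linear memory capacity cannot exceed the number of independent reservoir signals.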
8. Summary
Reservoir computers should show the greatest advantage over other types of computing when they are built as analog systems, but building these systems may be difficult or expensive. Creating individual nonlinear nodes may be difficult, but the largest cost in building analog reservoir computers may be in connecting the nodes into a network. The work in this paper demonstrates that reservoir computers may be expanded by following the nonlinear network with a set of linear filters. The design and implementation of linear filters is well known, so there should be little cost for adding filters to the reservoir computer. This work showed that adding filters improved the performance of two types of reservoir computer for signal fitting, prediction and classification. Improvements in classifying signals were largest for small reservoir computers, but small reservoir computers are where the improvement is most needed.

Reservoir computers may be expanded using other types of functions besides linear FIR filters; linear filters were used here because they are simple to design and characterize. It is even possible to use nonlinear functions; one early paper, for example, connected linear nodes into a network and followed the linear network with nonlinear output functions [22].

This work was supported by the Naval Research Laboratory's Basic Research Program.
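The expansion described above, appending FIR-filtered copies of the node time series to the reservoir output, can be sketched as follows. This is a minimal illustration assuming the reservoir signals are stored as a NumPy array; including the identity filter in the bank retains the original signals, as in the two-filter case discussed earlier:

```python
import numpy as np

def expand_with_filters(chi, filter_bank):
    """Append FIR-filtered copies of each reservoir signal.
    chi: (N, M) array of node time series. filter_bank: list of
    FIR tap arrays; the identity filter [1.0] passes the
    original signals through unchanged. Returns an
    (N, M * len(filter_bank)) array of signals for the linear
    fit, expanding the output without adding nodes or links."""
    n_steps = chi.shape[0]
    cols = []
    for taps in filter_bank:
        for i in range(chi.shape[1]):
            # Causal FIR output, truncated to the original length
            cols.append(np.convolve(chi[:, i], taps)[:n_steps])
    return np.column_stack(cols)
```

With two filters (identity plus one FIR filter) a four-node reservoir yields eight output signals, matching the coefficient counts quoted for the classification experiments.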
References

[1] H. Jaeger, The echo state approach to analysing and training recurrent neural networks - with an erratum note, German National Research Center for Information Technology GMD Technical Report 148 (1) (2001) 34. URL http://publica.fraunhofer.de/documents/B-73135.html
[2] T. Natschlaeger, W. Maass, H. Markram, The "liquid computer": A novel strategy for real-time computing on time series, Special Issue on Foundations of Information Processing of TELEMATIK 8 (1) (2002) 39-43.
[3] Z. Lu, J. Pathak, B. Hunt, M. Girvan, R. Brockett, E. Ott, Reservoir observers: Model-free inference of unmeasured variables in chaotic systems, Chaos: An Interdisciplinary Journal of Nonlinear Science 27 (4) (2017) 041102. doi:10.1063/1.4979665.
[4] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, I. Fischer, Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Optics Express 20 (3) (2012) 3241-3249. doi:10.1364/oe.20.003241.
[5] G. Van der Sande, D. Brunner, M. C. Soriano, Advances in photonic reservoir computing, Nanophotonics 6 (3) (2017) 561-576. doi:10.1515/nanoph-2016-0132.
[6] F. Schurmann, K. Meier, J. Schemmel, Edge of chaos computation in mixed-mode VLSI - a hard liquid, in: Advances in Neural Information Processing Systems 17, MIT Press, 2004, pp. 1201-1208.
[7] G. Dion, S. Mejaouri, J. Sylvestre, Reservoir computing with a single delay-coupled non-linear mechanical oscillator, Journal of Applied Physics 124 (15) (2018) 152132. doi:10.1063/1.5038038.
[8] D. Canaday, A. Griffith, D. J. Gauthier, Rapid time series prediction with a hardware-based reservoir computer, Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (12) (2018) 123119. doi:10.1063/1.5048199.
[9] T. L. Carroll, L. M. Pecora, Network structure effects in reservoir computers, Chaos 29 (8) (2019) 083130. doi:10.1063/1.5097686.
[10] T. L. Carroll, Dimension of reservoir computers, Chaos: An Interdisciplinary Journal of Nonlinear Science 30 (1) (2020) 013102. doi:10.1063/1.5128898.
[11] J. Dambre, D. Verstraeten, B. Schrauwen, S. Massar, Information processing capacity of dynamical systems, Scientific Reports 2 (2012) 514.
[12] R. Badii, G. Broggi, B. Derighetti, M. Ravani, S. Ciliberto, A. Politi, M. A. Rubio, Dimension increase in filtered chaotic signals, Phys. Rev. Lett. 60 (11) (1988) 979-982. doi:10.1103/PhysRevLett.60.979.
[13] Wikipedia, Field-programmable gate array. URL https://en.wikipedia.org/wiki/Field-programmable_gate_array
[14] U. Tietze, C. Shenk, Electronic Circuits, Springer, Berlin, 1991.
[15] H. Jaeger, M. Lukoševičius, D. Popovici, U. Siewert, Optimization and applications of echo state networks with leaky-integrator neurons, Neural Networks 20 (3) (2007) 335-352. doi:10.1016/j.neunet.2007.04.016.
[16] I. T. Jolliffe, Principal Component Analysis, Springer, 2011.
[17] E. N. Lorenz, Deterministic non-periodic flow, Journal of Atmospheric Science 20 (2) (1963) 130-141. doi:10.1175/1520-0469(1963)020<0130:DNF>2.0.CO;2.
[18] H. Jaeger, H. Haas, Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication, Science 304 (5667) (2004) 78-80. doi:10.1126/science.1091277.
[19] J. C. Sprott, Some simple chaotic flows, Physical Review E 50 (2) (1994) R647-R650. doi:10.1103/PhysRevE.50.R647.
[20] H. Jaeger, Short term memory in echo state networks, Technical report, GMD-Forschungszentrum Informationstechnik.
[21] M. Inubushi, K. Yoshimura, Reservoir computing beyond memory-nonlinearity trade-off, Scientific Reports 7 (1) (2017) 10199. doi:10.1038/s41598-017-10257-6.
[22] S. Boyd, L. Chua, Fading memory and the problem of approximating nonlinear operators with Volterra series, IEEE Transactions on Circuits and Systems 32 (11) (1985) 1150-1161. doi:10.1109/TCS.1985.1085649.

Figure 16: Probability of error P_E in identifying the source of an x signal from one of the 19 Sprott systems as a function of the number of nodes M in the leaky tanh reservoir computer.

Figure 17: Probability of error P_E in identifying the source of an x signal from one of the 19 Sprott systems as a function of the number of nodes M in the reservoir computer based on the laser system model of eqs. (6-7).

Figure 18: Confusion matrices for classifying the 19 Sprott chaotic systems based on the leaky tanh reservoir computer. The reservoir computer had four nodes for these figures.

Figure 19: Confusion matrices for classifying the 19 Sprott chaotic systems based on the laser reservoir computer. The reservoir computer had four nodes for these figures.

Figure 20: Testing error \Delta_{tx} for both types of reservoir as a function of memory capacity.