Limitations of the recall capabilities in delay based reservoir computing systems

Abstract

We analyze the memory capacity of a delay based reservoir computer with a Hopf normal form as nonlinearity and numerically compute the linear as well as the higher order recall capabilities. A possible physical realisation could be a laser with external cavity, for which the information is fed via electrical injection. A task independent quantification of the computational capability of the reservoir system is done via a complete orthonormal set of basis functions. Our results suggest that even for constant readout dimension the total memory capacity depends on the ratio between the information input period, also called the clock cycle, and the time delay in the system. Optimal performance is found for a time delay of about 1.6 times the clock cycle.


Cognitive Computation manuscript No. (will be inserted by the editor)


Felix Köster · Dominik Ehlert · Kathy Lüdge



Keywords

Lasers · Reservoir Computing · Nonlinear Dynamics

Felix Köster (corresponding author) · Dominik Ehlert · Kathy Lüdge
Institut für Theoretische Physik, Technische Universität Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany
Tel.: +49-30-314-24254 (F. Köster), +49-30-314-23002 (K. Lüdge)

Reservoir computing is a machine learning paradigm [1] inspired by the human brain [2], which utilizes the natural computational capabilities of dynamical systems. As a subset of recurrent neural networks it was developed to predict time-dependent tasks with the advantage of a very fast training procedure. Generally, the training of recurrent neural networks is connected with high computational cost resulting e.g. from connections that are correlated in time. Therefore, problems like the vanishing gradient in time arise [3]. Reservoir computing avoids this problem by training just a linear output layer, leaving the rest of the system (the reservoir) as it is. Thus, the inherent computing capabilities can be exploited. One can divide a reservoir into three distinct subsystems: the input layer, which corresponds to the projection of the input information into the system; the dynamical system itself, which processes the information; and the output layer, which is a linear combination of the system's states trained to predict an often time-dependent task.

Many different realisations have been presented in recent years, ranging from a bucket of water [4] over field programmable gate arrays (FPGAs) [5] to dissociated neural cell cultures [6]. They have been used for satellite communications [7], real-time audio processing [8,9], bit-error correction for optical data transmission [10], prediction of the amplitude of chaotic laser pulses [11] and cross-prediction of the dynamics of an injected laser [12]. Especially opto-electronic [13,14] and optical setups [15,16,17,18,19] were frequently studied because their high speed and low energy consumption make them preferable for hardware realisations.

The interest in reservoir computing was renewed when Appeltant et al. showed a realisation with a single dynamical node under the influence of feedback [20], which introduced a time-multiplexed reservoir rather than a spatially extended system. A schematic sketch is shown in Fig. 1. In general, the delay architecture slows down the information processing speed but reduces the complexity of the hardware. Many neuron based, electromechanical, opto-electronic and photonic realisations [21,22,23,24,25,26] showed capabilities ranging from time-series predictions [27,28] over an equalization task on nonlinearly distorted signals [29] up to fast word recognition [30]. A more general analysis showed the general and task-independent computational capabilities of semiconductor lasers [31]. A broad overview is given in [32,33].

Fig. 1: Schematic sketch of the time-multiplexed reservoir computing scheme. The input is preprocessed by multiplication with a mask that induces the time-multiplexing and is then electrically injected. The laser in our case is governed by a Hopf normal form. The output dimension of the system is in this example 4.

In this paper we perform a numerical analysis of the recall capabilities and the computing performance of a simple nonlinear oscillator, modelled by a Hopf normal form, with delayed feedback. We calculate the total memory capacity as well as the linear and nonlinear contributions using the method derived by Dambre et al. in [34].

The paper is structured as follows. First, we briefly explain the concept of time-multiplexed reservoir computing and give a short overview of the method used for calculating the memory capacity. After that we present our results and discuss the impact of the delay time on the performance and the different nonlinear recall contributions.

Traditionally, reservoir computing was realised by randomly connecting nodes with simple dynamics (for example the tanh-function [1]) to a network, which was then used to process information. The linear estimator of the readouts is then trained to approximate a target, e.g. to predict a time-dependent task. The network thus transforms the input into a high dimensional space in which a linear combination can be used to separate different inputs, i.e. to classify the given data.

In the traditional reservoir computing setup a reaction of the system $\mathbf{s}_n = (s_n^1, s_n^2, \ldots, s_n^M) \in \mathbb{R}^M$ is recorded together with the corresponding input $u_n$ and the target $o_n$. Here $n$ is the index of the $n$-th input-output training data point, ranging from 1 to $N$, and $M$ is the dimension of the measured system states. The goal of the reservoir computing paradigm is to approximate the target $o_n$ as closely as possible with linear combinations of the states $\mathbf{s}_n$ for all input-output pairs $n$, meaning that $\sum_{m=1}^{M} w_m s_n^m = \hat{o}_n \approx o_n$ for all $n$, where $\mathbf{w} = (w_1, w_2, \ldots, w_M) \in \mathbb{R}^M$ are the weights to be trained. We want to find the best solution for

$$\mathbf{s} \cdot \mathbf{w} \approx \mathbf{o}, \tag{1}$$

where $\mathbf{s} \in \mathbb{R}^{N \times M}$ is the state matrix defined by all system state reactions $\mathbf{s}_n$ to their corresponding inputs $u_n$, $\mathbf{w}$ are the weights to train and $\mathbf{o} \in \mathbb{R}^N$ is the vector of targets to be approximated. This is equivalent to a least squares problem, which is analytically solved by [35]

$$\mathbf{w} = (\mathbf{s}^T \mathbf{s})^{-1} \mathbf{s}^T \mathbf{o}. \tag{2}$$

The capability of the system to approximate the target task can be quantified by the normalized root mean square error (NRMSE) between the approximated answers $\hat{o}_n$ and the targets $o_n$,

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_{n=1}^{N} (o_n - \hat{o}_n)^2}{N \cdot \mathrm{var}(\mathbf{o})}}, \tag{3}$$

where $\mathrm{var}(\mathbf{o})$ is the variance of the target values $\mathbf{o} = (o_1, o_2, \ldots, o_N)$ and $N$ is the number of sample points.
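As a minimal sketch of Eqs. (1)-(3), the following Python snippet trains the readout weights on a small synthetic state matrix and evaluates the NRMSE. The toy data, the helper names (`solve_linear`, `train_readout`, `nrmse`) and the use of plain Gaussian elimination on the normal equations are assumptions made for illustration; they are not the implementation used for the results reported here.

```python
import random


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def train_readout(states, targets):
    """Least-squares weights w = (s^T s)^{-1} s^T o, cf. Eq. (2)."""
    m = len(states[0])
    gram = [[sum(row[i] * row[j] for row in states) for j in range(m)]
            for i in range(m)]
    proj = [sum(row[i] * o for row, o in zip(states, targets)) for i in range(m)]
    return solve_linear(gram, proj)


def nrmse(targets, predictions):
    """Normalized root mean square error, cf. Eq. (3)."""
    n = len(targets)
    mean = sum(targets) / n
    variance = sum((o - mean) ** 2 for o in targets) / n
    squared_error = sum((o - p) ** 2 for o, p in zip(targets, predictions))
    return (squared_error / (n * variance)) ** 0.5


# Toy data (assumption): N = 200 samples of M = 3 random "states".
rng = random.Random(0)
states = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(200)]
true_weights = [0.5, -1.0, 2.0]
targets = [sum(w * s for w, s in zip(true_weights, row)) for row in states]

weights = train_readout(states, targets)
predictions = [sum(w * s for w, s in zip(weights, row)) for row in states]
error = nrmse(targets, predictions)
print(weights, error)
```

Because the toy target is an exact linear combination of the states, the recovered weights agree with `true_weights` and the NRMSE is numerically zero; for states recorded from a real reservoir the error stays finite.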
An NRMSE of 1 indicates that the system is not capable of approximating the task better than predicting the mean value, while an NRMSE of 0 indicates that it computes the task perfectly. For successful operation $N \gg M$ needs to be fulfilled, where $M$ is the number of output weights $w_m$ and $N$ is the number of training data points. In other words, the training data set of size $N$ has to be significantly larger than the output dimension $M$ to prevent overfitting.

Appeltant et al. introduced in [20] a time-multiplexed scheme for applying the reservoir computing paradigm to a dynamical system with delayed feedback. In this case, the measured states for one input-output pair $\mathbf{s}_n = (s_n^1, s_n^2, \ldots, s_n^M)$ are recorded at different times $t_m = t_n + m\theta$, with $m = 1, 2, \ldots, M$, where $t_n$ is the time at which the $n$-th input $u_n$ is fed into the system. $\theta$ describes the time between two recorded states of the system and is called the virtual node separation time. The time between two inputs $t_{n+1} - t_n$ is called the clock cycle $T$ and describes the period during which one input $u_n$ is applied to the system. To get different reactions at the different virtual nodes, a time-multiplexed masking process is applied. The information fed into the system is preprocessed by multiplying the inputs with a $T$-periodic mask $g$ (see sketch in Fig. 1), which is a piecewise constant function consisting of $M$ intervals, each of length $\theta$. This corresponds to the input weights in the spatially extended system, with the difference that the input weights are now distributed over time.

Dambre et al. showed in [34] that the computational capability of a system can be quantified completely via a complete orthonormal set of basis functions on a sequence of inputs $\mathbf{u}_n = (\ldots, u_{n-2}, u_{n-1}, u_n)$ at time $n$. Here the index $n - \Delta n$ indicates the input $\Delta n$ time steps ago.
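The time-multiplexing described in this section (clock cycle $T$, virtual node separation $\theta = T/M$ and a $T$-periodic, piecewise constant mask $g$) can be illustrated with a short Python sketch. The binary mask values, the function names and the toy input sequence are assumptions chosen for illustration only; $T = 80$ and $M = 50$ are the values used for the simulations in this paper.

```python
import random


def make_mask(num_nodes, rng):
    """One constant mask value per virtual node (binary +-1 is an assumption)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(num_nodes)]


def masked_input(t, inputs, mask, clock_cycle):
    """Return g(t) * u_n, the signal injected into the system at time t >= 0."""
    theta = clock_cycle / len(mask)            # virtual node separation
    n = int(t // clock_cycle)                  # index of the input applied at t
    m = int((t - n * clock_cycle) // theta)    # mask interval within the cycle
    m = min(m, len(mask) - 1)                  # guard against rounding at the edge
    return mask[m] * inputs[n]


def sample_times(n, num_nodes, clock_cycle):
    """Times t_m = t_n + m * theta, m = 1..M, at which the states are read."""
    theta = clock_cycle / num_nodes
    return [n * clock_cycle + m * theta for m in range(1, num_nodes + 1)]


clock_cycle = 80.0
num_nodes = 50
theta = clock_cycle / num_nodes
mask = make_mask(num_nodes, random.Random(0))
inputs = [0.3, -0.7, 0.1]                      # toy input sequence u_0, u_1, u_2

print(theta)
print(masked_input(0.5, inputs, mask, clock_cycle))
print(sample_times(1, num_nodes, clock_cycle)[:3])
```

Each input $u_n$ is thus held for one clock cycle while the mask changes its sign or amplitude every $\theta$, so that the $M$ states sampled during that cycle respond to differently weighted copies of the same input.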
The goal is to investigate how the system transforms the inputs $\mathbf{u}_n$. For this, the chosen basis functions $z(\mathbf{u}_n)$, forming a Hilbert space, are constructed and used to describe every possible transformation of the inputs $\mathbf{u}_n$. The system's capability to approximate these basis functions is evaluated. Consider the following examples: The function $z(\mathbf{u}_n) = u_{n-5}$ is chosen as a task $o_n$. This is a transformation of the input sequence 5 steps back; the task asks how well the system can remember the input 5 steps ago. Another case would be $o_n = z(\mathbf{u}_n) = u_{n-5} u_{n-2}$, asking how well the system can perform the nonlinear transformation of multiplying the input 5 steps into the past with the input 2 steps into the past. A useful quantity to measure the capability of the system is the capacity defined as

$$C = 1 - \mathrm{NRMSE}^2. \tag{4}$$

A value of 1 corresponds to the system being perfectly capable of approximating the transformation task and 0 corresponds to no capability at all. A simpler method to calculate $C$, developed by Dambre et al. in [34] and giving the same results as Eq. (4), is

$$C = \frac{\mathbf{o}^T \mathbf{s} (\mathbf{s}^T \mathbf{s})^{-1} \mathbf{s}^T \mathbf{o}}{\|\mathbf{o}\|^2}, \tag{5}$$

where $^T$ indicates the transpose of a matrix and $^{-1}$ the inverse. We use Eq. (5) to calculate the memory capacity.

In this paper we use finite products of normalized Legendre polynomials $P_{d_{\Delta n}}$ as a full basis of the constructed Hilbert space for each combination of input steps. Here $d_{\Delta n}$ is the order of the Legendre polynomial and $\Delta n$ denotes the $\Delta n$-th step into the past whose input is passed to the polynomial. Multiplying a set of those Legendre polynomials gives the target task $y_{\{d_{\Delta n}\}}$, which yields (see example below for clarification)

$$y_{\{d_{\Delta n}\}} = \prod_{\Delta n} P_{d_{\Delta n}}(u_{-\Delta n}). \tag{6}$$

This is directly taken from [34]. It is important that the inputs to the system are uniformly distributed random numbers $u_n$, which are independent and identically drawn in $[-1, 1]$ to match the used normalized Legendre polynomials. To calculate the memory capacity $MC_d$ for a degree $d$, a summation over all possible sets of past inputs is done,

$$MC_d = \sum_{\{\Delta n\}} C^d_{\{\Delta n\}}, \tag{7}$$

where $\{\Delta n\}$ is the set of past input steps, $C^d_{\{\Delta n\}}$ is the capacity of the system to approximate a specific transformation task $z_{\{\Delta n\}}(\mathbf{u}_n)$ and $d$ is the combined degree of all Legendre polynomials in the task $z_{\{\Delta n\}}(\mathbf{u}_n)$. In the example from above with $z_{\{-5,-2\}}(\mathbf{u}_n) = u_{n-5} u_{n-2}$, it is $d = 2$ and $\{\Delta n\} = \{-5, -2\}$. For $d = 1$ we get the well known linear memory capacity. To compute the total memory capacity, a summation over all degrees $d$ is done,

$$MC = \sum_{d=1}^{D} MC_d. \tag{8}$$

Dambre et al. showed in [34] that $MC$ is limited by the readout dimension $M$, given here by the number of virtual nodes $N_V$.

The simulation was written in C++ using standard libraries, except for linear algebra calculations, which were done with the library "Armadillo". A fourth-order Runge-Kutta method was applied to numerically integrate the delay-differential equation given by Eq. (10) with an integration step $\Delta t = 0.01$ in time units of the system. First, the system was simulated without any inputs to let transients decay. Afterwards a buffer of 100000 inputs was applied, which was excluded from the training process. Then, the training and testing process itself was done with 250000 inputs to have sufficient statistics. The tasks were constructed via Eq. (6) and the corresponding capacities $C^d_{\{\Delta n\}}$ were calculated via Eq. (5). All possible combinations of the Legendre polynomials up to degree $D = 10$ and $\Delta n = 1000$ input steps into the past were considered. Capacities $C^d_{\{\Delta n\}}$ below 0.001 were excluded because of finite statistics. To calculate the inverse, the Moore–Penrose pseudoinverse from the C++ linear algebra library "Armadillo" was used.

We characterize the performance of our nonlinear oscillator by evaluating the total memory capacity $MC$, the contributions $MC_d$, as well as the NRMSE of the NARMA10 task. The latter is a benchmark test and combines memory and nonlinear transformations. It is given by the iterative formula

$$A_{n+1} = 0.3 A_n + 0.05 A_n \left( \sum_{i=0}^{9} A_{n-i} \right) + 1.5\, u_{n-9} u_n + 0.1. \tag{9}$$
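The capacity estimate of Eq. (5) for a Legendre target of the form of Eq. (6) can be sketched in Python as follows. The "reservoir" used here is a hypothetical stand-in whose three states are simply the last three inputs, chosen so that the expected capacities are obvious. The names `legendre`, `target_task` and `capacity` are illustrative, unnormalized Legendre polynomials are used (the ratio in Eq. (5) does not change when the target is rescaled), and a direct solve of the normal equations replaces the Moore–Penrose pseudoinverse.

```python
import random


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def legendre(degree, x):
    """Legendre polynomial P_degree(x) via Bonnet's recursion."""
    p_prev, p = 1.0, x
    if degree == 0:
        return p_prev
    for k in range(1, degree):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def target_task(inputs, degrees, start):
    """Eq. (6): y_n = prod P_d(u_{n - steps_back}) for degrees = {steps_back: d}."""
    targets = []
    for n in range(start, len(inputs)):
        y = 1.0
        for steps_back, d in degrees.items():
            y *= legendre(d, inputs[n - steps_back])
        targets.append(y)
    return targets


def capacity(states, targets):
    """Eq. (5): C = o^T s (s^T s)^{-1} s^T o / ||o||^2."""
    m = len(states[0])
    gram = [[sum(row[i] * row[j] for row in states) for j in range(m)]
            for i in range(m)]
    proj = [sum(row[i] * o for row, o in zip(states, targets)) for i in range(m)]
    weights = solve_linear(gram, proj)
    return sum(p * w for p, w in zip(proj, weights)) / sum(o * o for o in targets)


rng = random.Random(1)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(2005)]
depth = 5  # largest number of steps into the past used by any task below
# Stand-in reservoir (assumption): states are the inputs 0, 1 and 2 steps back.
states = [[inputs[n], inputs[n - 1], inputs[n - 2]] for n in range(depth, len(inputs))]

c_recall_1 = capacity(states, target_task(inputs, {1: 1}, depth))        # u_{n-1}
c_recall_5 = capacity(states, target_task(inputs, {5: 1}, depth))        # u_{n-5}
c_product = capacity(states, target_task(inputs, {1: 1, 2: 1}, depth))   # u_{n-1} u_{n-2}
print(c_recall_1, c_recall_5, c_product)
```

The stand-in recalls the input one step back with a capacity close to 1, whereas the input five steps back and the product task yield only small values that stem from the finite sample size. Such spurious contributions are the reason why capacities below a threshold are discarded before summing them according to Eqs. (7) and (8).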

Table 1: Parameters used in the simulation if not stated otherwise.

Parameter   Description               Value
λ           pump rate                 −.…
η           input strength            0.…
ω           free running frequency    0.…
γ           nonlinearity              −.…
κ           feedback strength         0.…
φ           feedback phase            0.…
N_V         number of virtual nodes   50
T           input period time         80
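The NARMA10 recursion of Eq. (9) can be generated with a few lines of Python. This is a minimal sketch under the standard NARMA10 conventions (coefficients 0.3, 0.05, 1.5 and 0.1, inputs drawn uniformly from [0, 0.5]); the function name `narma10`, the zero initial history and the sequence length are assumptions for illustration.

```python
import random


def narma10(inputs):
    """Iterate Eq. (9); the first ten values A_0..A_9 are set to zero (assumption)."""
    a = [0.0] * len(inputs)
    for n in range(9, len(inputs) - 1):
        history = sum(a[n - 9:n + 1])  # A_n + A_{n-1} + ... + A_{n-9}
        a[n + 1] = (0.3 * a[n] + 0.05 * a[n] * history
                    + 1.5 * inputs[n - 9] * inputs[n] + 0.1)
    return a


rng = random.Random(42)
u = [rng.uniform(0.0, 0.5) for _ in range(500)]
series = narma10(u)
print(series[10:15])
```

In a reservoir computing experiment the sequence `u` would drive the reservoir and `series` would serve as the target, so that the readout has to combine memory of the last ten steps with a quadratic nonlinearity.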

In Eq. (9), $A_n$ is an iteratively given number and $u_n$ is an independent and identically drawn uniformly distributed random number in $[0, 0.5]$. The reservoir is driven by the inputs $u_n$ and has to be able to predict the value of $A_{n+1}$, i.e. $\mathbf{o} = \mathbf{A}$.

The reservoir we use for our analysis is a Stuart-Landau oscillator, also called Hopf normal form [36], with delayed feedback. This is a generalized model applicable to all systems operated close to a Hopf bifurcation, i.e. close to the onset of intensity oscillations. One example would be a laser operated closely above threshold [37]. A derivation from the Class B rate equations is shown in the Appendix. The equation of motion is given by

$$\dot{Z} = (\lambda + \eta g I + i\omega + \gamma |Z|^2) Z + \kappa e^{i\phi} Z(t - \tau), \tag{10}$$

and was taken from [18]. Here, $Z$ is a complex dynamical variable (in the case of a laser $|Z|^2$ resembles the intensity), $\lambda$ is a dimensionless pump rate, $\eta$ the input strength of the information fed into the system via electrical injection, $g$ is the masking function, $I$ is the input, $\omega$ is the frequency with which the dynamical variable $Z$ rotates in the complex plane without feedback (in the case of a laser, this is the frequency of the emitted laser light), $\gamma$ the nonlinearity in the system, $\kappa$ the feedback strength, $\phi$ the feedback phase and $\tau$ the delay time. The corresponding parameters used in the simulations are found in Tab. 1 if not stated otherwise.

To get a first impression of how the system can recall inputs from the input sequence $\mathbf{u}_n = (\ldots, u_{n-2}, u_{n-1}, u_n)$, we show the linear recall capacities $C^1_{\{\Delta n\}}$ in Fig. 2. Here, each set $\{\Delta n\}$ consists of only one input step $\Delta n$, because $d = 1$, for which $z_{\{\Delta n\}}(\mathbf{u}_n)$ consists of the Legendre polynomial $P_1(u_{n-\Delta n}) = u_{n-\Delta n}$. The capacities $C^1_{\{\Delta n\}}$ are plotted over the step $\Delta n$ to be recalled for 3 different delay times $\tau$ (blue, orange and green in Fig. 2), while the input period time $T$ is kept fixed at 80 and the readout dimension $N_V$ at 50. These timescale parameters were chosen to fit the characteristic timescale of the system, such that the time between two virtual nodes $\theta$ is long enough for the system to react, but short enough that the processing speed remains as high as possible.

Fig. 2: Linear capacity $C^1_{\{\Delta n\}}$ as defined in Eq. 7 plotted over the $\Delta n$-th input step to recall for 3 different delay times $\tau$ ($\tau = T$, $\tau = 3T$ and $\tau = 3.06T$). The input period time is $T = 80$.

For $\tau = T$ (the blue solid line in Fig. 2) a high capacity is achieved for a few recall steps, after which the recallability drops steadily to 0 at about the 15th step ($\Delta n = 15$). This changes when the delay time reaches 3 times the input period time, $\tau = 3T$ (the orange solid line in Fig. 2). Here, the linear recallability $C^1_{\{\Delta n\}}$ oscillates between high and low values as a function of $\Delta n$, while its envelope steadily decreases until it reaches 0 at around the 35th step ($\Delta n = 35$). Considering that $\tau = 3T$ is a resonance between the input period time $T$ and the delay time $\tau$, one can also take a look at off-resonant setups, shown by the green solid line in Fig. 2 with $\tau \approx 3.06T$. This parameter choice shows a similar behaviour as the $\tau = 3T$ case, but with higher capacities for short recall steps and a faster decay of the recallability, which vanishes at around the 29th step ($\Delta n = 29$).

To get a more complete picture, we evaluated the linear capacities $C^1_{\{\Delta n\}}$ and quadratic capacities $C^2_{\{\Delta n\}}$ of the system and depicted these as heatmaps over the delay time $\tau$ and the input steps $\Delta n$ in Fig. 3 for a constant input period time $T$. The x-axis indicates the $\Delta n$-th step to be recalled, while the delay time $\tau$ is varied from bottom to top on the y-axis. In Fig. 3(a) the linear capacities $C^1_{\{\Delta n\}}$ are shown, for which the red horizontal solid lines indicate the scans from Fig. 2. One can see a continuous capacity $C^1_{\{\Delta n\}}$ for $\tau < T$, which forks into rays of certain recallable steps $\Delta n$ that increase linearly with the delay time $\tau$. This implies that specific steps $\Delta n$ can be remembered while others in between are forgotten, a crucial limitation to the performance of the system. Generally the number of steps into the past that can be remembered increases with $\tau$ (at constant $T$), while on the other hand the gaps between the recallable steps also increase. Thus, the total memory capacity stays constant. This will be discussed later in Fig. 4.

Fig. 3: (a) Linear capacity $C^1_{\{\Delta n\}}$ plotted color-coded over the delay time $\tau$ and the input steps $\Delta n$ to recall. Parameters as given in Tab. 1. The red horizontal solid lines indicate the scans from Fig. 2. (b) Pure quadratic capacity $C^{2,p}_{\{\Delta n\}}$. (c) Combination of two Legendre polynomials of degree 1, indicating the capability of nonlinear transformations of the form $u_{-\Delta n_1} u_{-\Delta n_2}$. Here $\Delta n_1$ of the first polynomial is plotted, while between two $\Delta n_1$-steps $\Delta n_2$ is increased from 0 to 45 steps into the past. Yellow indicates good while blue and black indicate bad recallability. The input period time is $T = 80$.

In Fig. 3(b) the pure quadratic capacity $C^{2,p}_{\{\Delta n\}}$ is plotted within the same parameter space as in Fig. 3(a). Pure means that only Legendre polynomials of degree 2, i.e. $P_2(u_{n-\Delta n}) = \tfrac{1}{2}(3u_{n-\Delta n}^2 - 1)$, were considered, rather than also considering combinations of two Legendre polynomials of degree 1, i.e. $P_1(u_{n-\Delta n_1}) P_1(u_{n-\Delta n_2}) = u_{n-\Delta n_1} u_{n-\Delta n_2}$. In the graph one can see the same behaviour as for the linear capacities $C^1_{\{\Delta n\}}$ (Fig. 3(a)), but with fewer rays and thus fewer steps that can be remembered from the past. This indicates that the dynamical system is not as effective in recalling inputs and additionally transforming them nonlinearly as it is in just recalling them linearly.

For the full quadratic nonlinear transformation capacity, all combinations of two Legendre polynomials of degree 1 for different input steps into the past have to be considered, i.e. $P_1(u_{n-\Delta n_1}) P_1(u_{n-\Delta n_2}) = u_{n-\Delta n_1} u_{n-\Delta n_2}$. This is shown in Fig. 3(c). Again, the capacities $C^2_{\{\Delta n\}}$ are depicted as a heatmap and the delay time $\tau$ is varied along the y-axis. This time the x-axis shows the step $\Delta n_1$ of the first Legendre polynomial, while in between two ticks of the x-axis the second Legendre polynomial's step $\Delta n_2$ is scanned from 0 up to 45 steps into the past. For the steps of the second Legendre polynomial $\Delta n_2$ the capacity exhibits the same behaviour as already discussed for Fig. 3(a) and (b). This also applies to the first Legendre polynomial, which induces interference patterns in the capacity space of the two combined Legendre polynomials. The red dashed lines highlight the ray behaviour of the first Legendre polynomial. We therefore learn that the performance of a reservoir computer described by a Hopf normal form with delay depends drastically on the task. There are certain nonlinear transformation combinations $u_{-\Delta n_1} u_{-\Delta n_2}$ of the inputs $u_{-\Delta n_1}$ and $u_{-\Delta n_2}$ which cannot be approximated due to the missing memory at specific steps. To overcome these limitations, one could use multiple systems with different parameters that compensate for each other.

Fig. 4: (a) Total memory capacity $MC$ as defined by Eq. 8 (blue) and memory capacities $MC_{1,2,3,4}$ of degree 1 to 4 (orange, green, red, violet) plotted over the delay time $\tau$ for the same parameters as in Fig. 3. (b) NARMA10 error NRMSE plotted over the delay time $\tau$. Resonances between the clock cycle $T$ and the delay time $\tau$ are depicted as vertical red and green dashed lines. One can see the loss in memory capacity at the resonances, especially for degree 2.

To fully characterize the computational capabilities of our reservoir computer, a full analysis of the degree-$d$ memory capacities $MC_d$ and the total memory capacity $MC$ as defined in Eq. (8) is done. The results are depicted in Fig. 4(a) as a function of the delay time $\tau$. All other parameters are fixed as in Fig. 3. The orange solid line in Fig. 4 refers to the linear, the green, red and violet lines to the quadratic, cubic and quartic memory capacities $MC_{1,2,3,4}$, respectively. The blue solid line shows the total memory capacity $MC$ summed over all degrees up to 10. Dambre et al. showed in [34] that the $MC$ is limited by the number of read-out dimensions and equals it when all read-out dimensions are linearly independent. In our case the read-out dimension is given by the number of virtual nodes $N_V = 50$. Nevertheless, the total memory capacity $MC$ starts at around 15 for a very short time delay with respect to the input period time $T$. This low value arises from the fact that a short delay induces a high correlation between the responses of the dynamical system, which leads to highly linearly dependent virtual nodes. This is an important general result that has to be kept in mind for all delay based reservoir computing systems: With $\tau < \ldots T$ the capability of the reservoir computer is partially wasted. Increasing the delay time $\tau$ also increases the total memory capacity $MC$, which reaches the upper bound of 50 at around $1.\ldots T$.

For $\tau > \ldots T$ an interesting behaviour emerges. Depicted by the vertical red dashed lines are multiples of the input period time $T$, at which the total memory capacity $MC$ again drops significantly to around 40. A drop in the linear memory capacity was discussed in the paper by Stelzer et al. [38] and explained by the fact that resonances between the delay time $\tau$ and the input period time $T$ result in sparse connections between the virtual nodes. Our results now show that this affects the total memory capacity $MC$, mainly by reducing the quadratic memory capacity $MC_2$. At the resonances the quadratic nonlinear transformation capability of the system is reduced. To conclude, delay based reservoir computing systems should be kept off the resonances between $T$ and $\tau$ to maximize the computational capability. A surprising result is that for the chosen Hopf nonlinearity the linear memory capacity $MC_1$ is only slightly influenced by the resonances. A result from Dambre et al. in [34], analysed by Inubushi et al. in [39], showed that a trade-off between the linear recallability and the nonlinear transformation capability exists. This is clearly only the case if the theoretical limit of the total memory capacity $MC$ is reached and kept constant, so that every change in the linear memory capacity $MC_1$ has to induce a change in the nonlinear memory capacities $MC_d$, $d > 1$. In the case of resonances, the total memory capacity $MC$ decreases, and this loss can be distributed in any possible way over the different memory capacities $MC_d$. In our case, we see that the influence on the quadratic memory capacity $MC_2$ is highest.

The system is capable of a small amount of cubic transformations, depicted by the solid red line in Fig. 4(a), which also decreases at the resonances in a similar way as the quadratic contribution does. Higher order memory capacities $MC_d$ with $d > 3$ have only small contributions for short delay times $\tau$, dropping to 0 for increased time delay $\tau$. A possible explanation is that short delays induce an interaction of the last input directly with itself $k = T/\tau$ times, depending on the ratio between $\tau$ and $T$. As a result, short delay times $\tau$ enable highly nonlinear tasks at the expense of a lower total memory capacity $MC$.

For more insights into the computing capabilities of our nonlinear oscillator we now also discuss the NARMA10 time series prediction task, shown in Fig. 4(b). Comparing the memory capacities $MC_d$ to the NARMA10 computation error NRMSE in Fig. 4(b), a small increase in the NARMA10 NRMSE can be seen at the resonances with $n\tau = mT$, where $n \in [0, 1, \ldots]$ and $m \in [0, 1, \ldots]$.

Fig. 5: (a) Linear memory capacity $MC_1$ plotted color-coded over the delay time $\tau$ and the input period time $T$. (b) Degree 2, $MC_2$. (c) Total memory capacity, $MC$. (d) Degree 3, $MC_3$. (e) Degree 4, $MC_4$. (f) NARMA10 prediction error NRMSE.

For a systematic characterization, a scan of the input period time $T$ and the delay time $\tau$ was done, and the total memory capacity $MC$ (Fig. 5(c)), the memory capacities of degrees 1-4 $MC_{1,2,3,4}$ (Fig. 5(a,b,d,e)) and the NARMA10 NRMSE (Fig. 5(f)) were plotted color-coded over the two timescales. This is an extension of the results of Röhm et al. in [19], where only the linear memory capacity and the NARMA10 computation error were analysed. For short time delays $\tau$ and input period times $T$ the memory capacities of degree 1-3 $MC_{1,2,3}$ and the total memory capacity $MC$ are significantly below the theoretical limit of 50, as already seen in the results from Fig. 4, while the NARMA10 NRMSE also shows high errors of around 0.8. This comes from the fact that short input period times $T$ also mean short virtual node distances $\theta$, which induce a high linear correlation between the read-out dimensions. Degree 4, on the other hand, only has non-zero values as long as $T > \tau$, a result coming from the fact that the input $u_n$ has to interact with itself to produce a transformation of degree 4. A possible explanation is that the dynamical system itself is not capable of transformations higher than degree 3, since the highest order in Eq. (10) is 3. If the delay time $\tau$ and the input period time $T$ are long enough, the total memory capacity $MC$ reaches 50, with the exception of resonances between $\tau$ and $T$. These resonances are also seen in the NARMA10 NRMSE, for which higher errors occur. Looking at the memory capacities of degree 1 and 2 $MC_{1,2}$ and comparing them with the NARMA10 NRMSE, one can see a tendency for the NARMA10 NRMSE to be lowest where both have the highest capacities, arising from the fact that the NARMA10 task is highly dependent on linear memory and quadratic nonlinear transformations. This can also be seen in the area below the $\tau = T$ resonance. To conclude, one can use the parameter dependencies of the memory capacities $MC_d$ to make predictions of the reservoir's capability to approximate certain tasks.

We analysed the memory capacities and nonlinear transformation capabilities of a reservoir computer consisting of an oscillatory system with delayed feedback operated close to a Hopf bifurcation, i.e. a paradigmatic model also applicable to lasers close to threshold. We systematically varied the timescales and found regions of high and low reservoir computing performance. Resonances between the information input period time $T$ and the delay time $\tau$ should be avoided to fully utilize the natural computational capability of the nonlinear oscillator. A ratio of $\tau = 1.6T$ was found to be optimal for the computed memory capacities, resulting in a good NARMA10 task approximation. Furthermore, it was shown that the recallability for long delay times $\tau \gg T$ is restricted to specific past inputs, which rules out certain tasks. By computing the memory capacities of a Hopf normal form, one can make general assumptions about the reservoir computing capabilities of any system operated close to a Hopf bifurcation. This significantly helps in understanding and predicting the task dependence of reservoir computers.

Acknowledgements

The authors would like to thank André Röhm, Joni Dambre and David Hering for fruitful discussions.

Funding Information

This study was funded by the "Deutsche Forschungsgemeinschaft" (DFG) in the framework of SFB910.

Conflict

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

References

1. H. Jaeger, The 'echo state' approach to analysing and training recurrent neural networks. GMD Report 148, GMD - German National Research Institute for Computer Science (2001). doi: publica.fraunhofer.de/documents/b-73135.html
2. W. Maass, T. Natschläger, H. Markram, Neural Comp., 2531 (2002). doi: 10.1162/089976602760407955
3. S. Hochreiter, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 107 (1998). doi: 10.1142/S0218488598000094
4. C. Fernando, S. Sojakka, in Advances in Artificial Life (2003), pp. 588–597. doi: 10.1007/978-3-540-39432-7_63
5. P. Antonik, F. Duport, M. Hermans, A. Smerieri, M. Haelterman, S. Massar, IEEE Trans. Neural Netw. Learn. Syst. (11) (2016). doi: 10.1109/tnnls.2016.2598655
6. K. Dockendorf, I. Park, P. He, J.C. Principe, T.B. DeMarse, Biosystems (2) (2009). doi: 10.1016/j.biosystems.2008.08.001
7. M. Bauduin, A. Smerieri, S. Massar, F. Horlin, in 2015 IEEE 81st Vehicular Technology Conference (VTC Spring) (2015)
8. L. Keuninckx, J. Danckaert, G. Van der Sande, Cogn. Comput. (3) (2017). doi: 10.1007/s12559-017-9457-5
9. S. Scardapane, A. Uncini, Cognitive Computation, 125–135 (2017). doi: 10.1007/s12559-016-9439-z
10. A. Argyris, J. Bueno, M.C. Soriano, I. Fischer, in 2017 Conf. on Lasers and Electro-Optics Europe European Quantum Electronics Conference (CLEO/Europe-EQEC) (2017), p. 1. doi: 10.1109/cleoe-eqec.2017.8086463
11. P. Amil, M.C. Soriano, C. Masoller, Chaos: An Interdisciplinary Journal of Nonlinear Science (11), 113111 (2019). doi: 10.1063/1.5120755. URL https://doi.org/10.1063/1.5120755
12. A. Cunillera, M.C. Soriano, I. Fischer, Chaos: An Interdisciplinary Journal of Nonlinear Science (11), 113113 (2019). doi: 10.1063/1.5120822. URL https://doi.org/10.1063/1.5120822
13. L. Larger, M.C. Soriano, D. Brunner, L. Appeltant, J.M. Gutierrez, L. Pesquera, C.R. Mirasso, I. Fischer, Opt. Express (3), 3241 (2012). doi: 10.1364/oe.20.003241
14. Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, S. Massar, Sci. Rep. (287) (2012). doi: 10.1038/srep00287
15. D. Brunner, M.C. Soriano, C.R. Mirasso, I. Fischer, Nat. Commun., 1364 (2013). doi: 10.1038/ncomms2368
16. Q. Vinckier, F. Duport, A. Smerieri, K. Vandoorne, P. Bienstman, M. Haelterman, S. Massar, Optica (5) (2015). doi: 10.1364/optica.2.000438
17. R.M. Nguimdo, E. Lacot, O. Jacquin, O. Hugon, G. Van der Sande, H.G. de Chatellus, Opt. Lett. (3) (2017). doi: 10.1364/ol.42.000375
18. A. Röhm, K. Lüdge, J. Phys. Commun., 085007 (2018). doi: 10.1088/2399-6528/aad56d
19. A. Röhm, L.C. Jaurigue, K. Lüdge, IEEE J. Sel. Top. Quantum Electron. (1), 7700108 (2019). doi: 10.1109/jstqe.2019.2927578
20. L. Appeltant, M.C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C.R. Mirasso, I. Fischer, Nat. Commun., 468 (2011). doi: 10.1038/ncomms1476
21. S. Ortin, L. Pesquera, Cognitive Computation (3), 327 (2017). doi: 10.1007/s12559-017-9463-7
22. G. Dion, S. Mejaouri, J. Sylvestre, J. Appl. Phys. (15), 152132 (2018). doi: 10.1063/1.5038038
23. D. Brunner, B. Penkovsky, B.A. Marquez, M. Jacquot, I. Fischer, L. Larger, J. Appl. Phys. (15), 152004 (2018). doi: 10.1063/1.5042342
24. Y. Chen, L. Yi, J. Ke, Z. Yang, Y. Yang, L. Huang, Q. Zhuge, W. Hu, Opt. Express (20), 27431 (2019). doi: 10.1364/oe.27.027431
25. Y.S. Hou, G.Q. Xia, W.Y. Yang, D. Wang, E. Jayaprasath, Z. Jiang, C.X. Hu, Z.M. Wu, Opt. Express (8), 10211 (2018). doi: 10.1364/oe.26.010211
26. C. Sugano, K. Kanno, A. Uchida, IEEE J. Sel. Top. Quantum Electron. (1), 1500409 (2020). doi: 10.1109/jstqe.2019.2929179
27. J. Bueno, D. Brunner, M.C. Soriano, I. Fischer, Opt. Express (3), 2401 (2017). doi: 10.1364/oe.25.002401
28. Y. Kuriki, J. Nakayama, K. Takano, A. Uchida, Opt. Express (5), 5777 (2018).
doi: 10.1364/oe.26.00577729. A. Argyris, J. Cantero, M. Galletero, E. Pereda, C.R. Mi-rasso, I. Fischer, M.C. Soriano, IEEE J. Sel. Top. Quan-tum Electron. (1), 5100309 (2020). doi: 10.1109/jstqe.2019.293694730. L. Larger, A. Bayl´on-Fuentes, R. Martinenghi, V.S.Udaltsov, Y.K. Chembo, M. Jacquot, Phys. Rev. X ,011015 (2017). doi: 10.1103/physrevx.7.01101531. K. Harkhoe, G. Van der Sande, Photonics (4) (2019).doi: 10.3390/photonics604012432. D. Brunner, M. Soriano, G. Van der Sande, J. Dambre,P. Bienstman, L. Larger, L. Pesquera, S. Mas-sar, PHOTONIC RESERVOIR COMPUTING OpticalRecurrent Neural Networks (2019)33. G. Van der Sande, D. Brunner, M.C. Soriano, Nanopho-tonics (3), 561 (2017). doi: https://doi.org/10.1515/nanoph-2016-0132 34. J. Dambre, D. Verstraeten, B. Schrauwen, S. Massar, Sci.Rep. , 514 (2012). doi: 10.1038/srep0051435. J.H. Williams., Quantifying measurement : the tyrannyof numbers (Morgan & Claypool Publishers, UK, 2016).doi: http://iopscience.iop.org/book/978-1-6817-4433-936. E. Sch¨oll, H.G. Schuster (eds.), Handbook of ChaosControl (Wiley-VCH, Weinheim, 2008). Second com-pletely revised and enlarged edition37. T. Erneux, P. Glorieux, Laser Dynamics (Cambridge Uni-versity Press, UK, 2010). doi: https://doi.org/10.1017/cbo978051177690838. F. Stelzer, A. R¨ohm, K. L¨udge, S. Yanchuk, Neural Netw. , 158 (2020). doi: https://doi.org/10.1016/j.neunet.2020.01.01039. M. Inubushi, K. Yoshimura, Sci. Rep. , 10199 (2017) Derivation of the Stuart-Landau Equation with delay fromthe Class-B laser rate equations˙ E = (1 + iα ) EN (11)˙ N = 1 T ( P + ηgI − N − (1 + 2 N ) | E | ) , (12)where E is the non-dimensionalized complex eletrical ﬁeldand N the non-dimensionalized carrier inversion, P the pumprelativ to the threshold for P thresh = 0 and α the Henryfactor. The reservoir computing signal is fed into the systemvia electrical injection ηgI . 
If fast carriers are considered, anadiabatic elimination of the charge carriers yields0 = 1 T ( P + ηgI − N − (1 + 2 N ) | E | ) (13) N = P + ηgI − | E | | E | (14),which after substituting into Eq. (11) gives˙ E = (1 + iα ) E ˜ P − | E | | E | , (15)where we introduced the quantity ˜ P = P + ηgI for conve-nience purposes. This equation yields the full Class A rateequation for the non-dimensionalized complex electric ﬁeld.Simulations with the full Class A rate equation close to thethreshold show similar results to the reduced case.Because we consider laser that are operated close to the thresh-old level, a taylor expansion of the denominator for | E | ≈ E = (1 + iα ) E ( ˜ P − | E | − P | E | ) , (16)where we set | E | ≈ P and the intensity | E | are of Order O ( (cid:15) ), where (cid:15) is a smallfactor. This holds true only if the input signal ηgI is a smallelectrical injection. After applying this the equation is givenby˙ E = (1 + iα ) E ( ˜ P − | E | ) (17)We can substitute ˜ P = P + ηgI back into the equation, changethe rotating frame of the laser by setting E = Ze − i ( ω − α ˜ P ) t imitations of the recall capabilities in delay based reservoir computing systems 9and introduce a complex factor γ = − (1 + iα ) that scales thenonlinearity˙ Z = Z ( P + ηgI + iω + γ | Z | ) , (18)By addding feedback κe φ Z ( t − τ ) to the system one arrivesat Eq. (10).˙ Z = Z ( P + ηgI + iω + γ | Z | ) + κe φ Z ( t − τ ) ,,
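The size of the error made in going from the full Class A equation (15) to the reduced equation (17) can be checked numerically. The following is a minimal sketch, not the authors' simulation code: the function names, the value `alpha = 2.0`, the field phase, and the choice $\tilde{P} = \epsilon$, $|E|^2 = \epsilon/2$ are illustrative assumptions that do not come from the paper.

```python
import numpy as np


def class_a_rhs(E, P_tilde, alpha):
    """Right-hand side of the full Class A equation, Eq. (15)."""
    intensity = np.abs(E) ** 2
    return (1 + 1j * alpha) * E * (P_tilde - intensity) / (1 + 2 * intensity)


def reduced_rhs(E, P_tilde, alpha):
    """Right-hand side of the reduced equation, Eq. (17)."""
    intensity = np.abs(E) ** 2
    return (1 + 1j * alpha) * E * (P_tilde - intensity)


def relative_error(eps, alpha=2.0):
    """Relative deviation of Eq. (17) from Eq. (15).

    Assumed operating point (illustrative only):
    P_tilde = eps and |E|^2 = eps / 2, both of order eps.
    """
    E = np.sqrt(eps / 2) * np.exp(0.3j)  # arbitrary field phase
    full = class_a_rhs(E, eps, alpha)
    reduced = reduced_rhs(E, eps, alpha)
    return np.abs(reduced - full) / np.abs(full)


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.025):
        print(f"eps = {eps:<6} relative error = {relative_error(eps):.4f}")
```

The two right-hand sides differ only by the factor $1 + 2|E|^2$, so the relative deviation is $2|E|^2$, which equals $\epsilon$ for the operating point assumed here. The reduction therefore becomes more accurate linearly as the laser is operated closer to threshold.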