Unveiling the role of plasticity rules in reservoir computing
Guillermo B. Morales a,b, Claudio R. Mirasso a and Miguel C. Soriano a,∗
a Instituto de Física Interdisciplinar y Sistemas Complejos (IFISC, UIB-CSIC), Campus Universitat de les Illes Balears, E-07122 Palma de Mallorca, Spain
b Instituto Carlos I de Física Teórica y Computacional, Facultad de Ciencias, Campus de Fuentenueva, 18071 Granada, Spain
ARTICLE INFO
Keywords: Reservoir Computing, Plasticity, Hebbian learning, Intrinsic plasticity, Nonlinear time series prediction.
ABSTRACT
Reservoir Computing (RC) is an appealing approach in Machine Learning that combines the high computational capabilities of Recurrent Neural Networks with a fast and easy training method. Likewise, the successful implementation of neuro-inspired plasticity rules into RC artificial networks has boosted the performance of the original models. In this manuscript, we analyze the role that plasticity rules play in the changes that lead to a better performance of RC. To this end, we implement synaptic and non-synaptic plasticity rules in a paradigmatic example of an RC model: the Echo State Network. Testing on nonlinear time-series prediction tasks, we show evidence that the improved performance in all plastic models is linked to a decrease of the pair-wise correlations in the reservoir, as well as a significant increase of the individual neurons' ability to separate similar inputs in their activity space. Here we provide new insights into this observed improvement through the study of different stages of the plastic learning. From the perspective of the reservoir dynamics, optimal performance is found to occur close to the so-called edge of instability. Our results also show that it is possible to combine different forms of plasticity (namely synaptic and non-synaptic rules) to further improve the performance on prediction tasks, obtaining better results than those achieved with single-plasticity models.
1. Introduction
From the first bird-inspired "flying machines" of Leonardo da Vinci to the latest advances in artificial photosynthesis, humankind has constantly sought to mimic nature in order to solve complex problems. It is therefore not surprising that the dawn of Machine Learning (ML) and Artificial Neural Networks (ANN) was also characterized by the idea of emulating the functionalities and characteristics of the human brain. In his book
The Organization of Behavior, Donald Hebb proposed in 1949 a neurophysiological model of neuron interactions that attempted to explain the way associative learning takes place [1]. Theorizing on the basis of synaptic plasticity, Hebb suggested that the simultaneous activation of cells would lead to the reinforcement of the involved synapses, a hypothesis often summarized in today's well-known statement: "neurons that fire together, wire together". Hebbian theory was thus swiftly taken up by neurophysiologists and early brain modelers as the foundation upon which to build the first working artificial neural network. In 1950, Nat Rochester at the IBM research lab embarked on the project of modeling an artificial cell assembly following Hebb's rules [2]. However, he would soon be discouraged by an obvious flaw in Hebb's initial theory: as connection strength increases with the learning process, neural activity eventually spreads across the whole assembly, saturating the network. It would not be until 1957 that Frank Rosenblatt —who had previously read
The Organization of Behavior and sought to find a more "model-friendly" version of Hebb's assembly— came with a solution: the Perceptron, the first example of a Feed-Forward Neural Network (FFNN) [3]. Dismissing the idea of a homogeneous mass of cells, Rosenblatt introduced three different types of units within the network, which would correspond today to what is usually known as the input, hidden and output layers of a FFNN. Mathematically, the output of the perceptron is computed as:

f(x) = 1 if w·x + b > 0, and 0 otherwise,   (1)

where w·x is the dot product of the input x with the weight vector w, and b is a bias term that acts like a moving threshold. In modern FFNNs, the step function is usually substituted by a nonlinearity φ(w·x + b), which receives the name of activation function. Being computationally more applicable than the original ideas of Hebb, Rosenblatt paved the way that would progressively detach ML from its biological inspiration.

Despite the initial excitement, in 1969 Marvin Minsky and Seymour Papert proved that perceptrons could only be trained to recognize linearly separable patterns [4]. The authors already foresaw the need for Multilayer Perceptrons (MLP) to tackle nonlinear classification problems, but the lack of suitable learning algorithms led to the first of the AI winters [5], with neural network research stagnating for many years. The thaw would not arrive until 1974 with the advent of today's widely known backpropagation algorithms [6, 7]. Understood as a supervised learning method in multilayer networks, backpropagation aims at adjusting the internal weights in each layer to minimize the error or loss function at the output using a gradient-descent approach. Despite their success in tasks as diverse as speech recognition, natural language processing, medical image analysis or board-game programs, backpropagation methods lack a corresponding biological representation. Instead, ANNs that aim to resemble the biology behind the operation of the human brain ought to include neurons that send feedback signals to each other. This is the idea behind a Recurrent Neural Network (RNN). Whereas FFNNs are able to approximate a mathematical function, RNNs can approximate dynamical systems —i.e. functions with an added time component— so that the same input can result in a different output at different time steps [8].

It is within this context that two fundamentally new approaches to RNNs appeared independently: the Echo State Network (ESN) [9] and the Liquid State Machine (LSM) [10], both constituting trailblazing models of what today is known as the Reservoir Computing (RC) paradigm. These models are particularly fast and computationally much less expensive since training happens only at the output layer through the adjustment of the readout weights. Although very flexible, this approach also leaves the open question of how to choose the reservoir connectivity to maximize performance. While most reservoir computing approaches consider a reservoir with fixed internal connection weights, plasticity was rediscovered as an unsupervised, biologically inspired adaptation to implement an adaptive reservoir. It appeared first as a type of Hebbian synaptic plasticity to modify the reservoir weights [11], but soon the ideas of nonsynaptic plasticity that inspired the first Intrinsic Plasticity (IP) rule [12] were also implemented in an Echo State Network [13]. After that, many different models of plasticity rules have been implemented in RC networks with promising results [14, 15, 16].

∗ Corresponding author. [email protected] (G.B. Morales); [email protected] (C.R. Mirasso); [email protected] (M.C. Soriano)
Preprint submitted to Elsevier
Today, the fact that biologically meaningful learning algorithms have a place in these models, together with recent discoveries suggesting that biological neural networks display RC properties [17, 18], makes reservoir computing a field of machine learning in continuous growth.

Echo State Networks have been shown to perform successfully in a wide number of tasks, ranging from speech recognition [19], channel equalization [20] and robot control [21] to stock data mining [22]. Here, we will focus on the challenging problem of chaotic time-series forecasting. This type of task has been addressed for a large number of different time series [23, 24, 15, 25], and ESNs implementing plasticity rules to improve time-series forecasting have been treated before in [15, 25, 11]. Nevertheless, in this paper we will move away from the finest-performance approach, focusing instead on understanding how unsupervised learning through plasticity rules affects the ESN architecture in a way that boosts its performance.

The paper is structured as follows. The Methods section includes the standard definition of the ESN, the models considered for synaptic and intrinsic forms of plasticity, as well as the measures that will be employed for performance characterization. We consider the so-called anti-Hebbian types of learning rules for synaptic plasticity, which in our case means that the neurons' activities at subsequent times tend to become decorrelated. As for intrinsic plasticity, we modify the parameters of the response functions of individual neurons to accommodate a target Gaussian distribution function.

Figure 1: Architecture and functioning of a basic Echo State Network for a one-step-ahead chaotic time-series prediction task.

In the Results section, we find that the best performance is usually obtained by employing a combination of both synaptic and intrinsic plasticity, thus revealing the emergence of synergistic effects. Finally, we discuss the influence of the plasticity rules on the dynamical response of the individual neurons as well as on the global activity of the reservoir.
2. Methods
The basic architecture of an ESN model is made of three layers: an input layer, a hidden layer or reservoir, and an output layer. Fig. 1 illustrates the ESN architecture, where we have already particularized the more general concept for two input units —one feeding a point of the series at each discrete time step and a second one acting as a bias— and one output neuron.

In Fig. 1, points u(t) ∈ ℝ in the temporal series are fed as input after being multiplied by a weight matrix W_in ∈ ℝ^(N_x×2). The internal connections between neurons in the reservoir are defined by W_res ∈ ℝ^(N_x×N_x), where N_x is the number of neurons in the reservoir. The states of the neurons in the reservoir produce the final output after multiplication by an output weight matrix W_out. Thus, the network dynamics for the reservoir and readout states are given by:

x(t) = tanh( ϵ W_in [1; u(t)] + W_res x(t−1) ),   (2)
y(t) = W_out x(t),   (3)

where ϵ is the input scaling, and W_in and W_res are often randomly initialized. Here we chose the hyperbolic tangent as our activation function, but it could in general be any nonlinear function. Using a supervised learning scheme, the goal is to generate an output y(t) ∈ ℝ that not only matches as closely as possible the desired target y_target(t) ∈ ℝ but can also generalize to unseen data. Because large output weights are commonly associated with overfitting of the training data [26], it is common practice to keep their values low by adding a regularization term to the error in the target reconstruction. Although several regularization methods have
been proposed [9, 27, 28], here we use the Ridge regression method, for which the error is defined as:

E_ridge = (1/T) Σ_{t=1..T} ( y_target(t) − y(t) )² + β ‖W_out‖²,   (4)

where ‖·‖ stands for the Euclidean norm, β is the regularization coefficient and T is the total number of points in the training set. Notice that choosing β = 0 removes the regularization, turning the ridge regression into a generalized linear regression problem. After training, the expression for the optimal readout weights W_out can be easily obtained —minimizing the above error— as:

W_out = Y_target X^T ( X X^T + β I )^(−1),   (5)

where I is the identity matrix, Y_target ∈ ℝ^T contains all output targets y_target(t), and X consists of all concatenated vectors [1; u(t); x(t)]. It is worth noticing that the standard training in ESNs focuses on the optimization of the readout weights, W_out, but does not modify the initial reservoir, which is usually considered randomly connected. A natural step forward is then to optimize the weights of the reservoir connections, W_res, or the excitability of the neurons according to the inputs reaching the reservoir, that is, to introduce rules of neuronal plasticity.

The term plasticity has been used in brain science for well over a century to refer to the suspected changes in neural organization that may account for various forms of behavioral changes, either short- or long-lasting [29]. From a biological point of view, mechanisms of plasticity in the brain can be grouped into two large categories: synaptic and nonsynaptic. Synaptic plasticity deals directly with the strength of the connection between neurons, which is linked to the amount of neurotransmitter released from the presynaptic neuron and the response generated in the postsynaptic channels [30, 31, 32].
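As a concrete illustration, the reservoir update of Eq. (2) and the ridge readout of Eq. (5) can be sketched in a few lines of NumPy. This is a simplified sketch rather than the paper's code: the sizes and scalings are illustrative, and the state matrix below contains only the reservoir states, whereas the X defined above also concatenates the bias and input.

```python
import numpy as np

rng = np.random.default_rng(0)
N_x, eps, beta = 300, 1.0, 1e-7      # reservoir size, input scaling, ridge coefficient
W_in = rng.uniform(-1, 1, (N_x, 2))  # weights for the [bias; u(t)] input pair
W_res = rng.uniform(-1, 1, (N_x, N_x))
W_res[rng.random((N_x, N_x)) < 0.9] = 0.0          # 90% sparsity, as in the text
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))  # rescale spectral radius below 1

def esn_step(x, u):
    """One reservoir update, Eq. (2): x(t) = tanh(eps*W_in [1; u] + W_res x(t-1))."""
    return np.tanh(eps * W_in @ np.array([1.0, u]) + W_res @ x)

def train_readout(X, Y_target, beta=1e-7):
    """Closed-form ridge readout, Eq. (5): W_out = Y_target X^T (X X^T + beta I)^(-1)."""
    return Y_target @ X.T @ np.linalg.inv(X @ X.T + beta * np.eye(X.shape[0]))

# Drive the reservoir with a toy scalar series and fit a one-step-ahead readout.
series = np.sin(np.linspace(0, 20, 400))
x, states = np.zeros(N_x), []
for u in series[:-1]:
    x = esn_step(x, u)
    states.append(x)
X = np.array(states).T                        # shape (N_x, T-1)
W_out = train_readout(X, series[1:][None, :], beta)
y = W_out @ X                                 # readout predictions, Eq. (3)
```

Replacing the explicit inverse with np.linalg.solve is numerically preferable in practice; the explicit form is kept here to mirror Eq. (5).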
Nonsynaptic plasticity, instead, involves modification of the intrinsic excitability of the neuron itself, operating through structural changes that usually affect voltage-dependent membrane conductances in the axon, dendrites or soma [33, 34].

Likewise, but now from the perspective of RC, plasticity rules aim to modify either the weights of the connections W_res (synaptic plasticity) or the excitability of the reservoir units (nonsynaptic plasticity) based on the activity stimulated by the input. In this manner, the information carried by the input signal is partly embedded in the reservoir. Although non-Hebbian forms of synaptic plasticity have been found empirically [35, 36], most rules modifying the synaptic strength among neurons fall into the category of Hebbian learning. The Hebbian rule, as originally proposed by Hebb [1], can be described mathematically as a change in the synaptic strength between two neurons that is proportional to the product of the pre- and post-synaptic activities at time t:

w_kj(t+1) = w_kj(t) + η x_j(t) x_k(t+1),   (6)

where w_kj is the weight of a synapse connecting neurons k and j —with j triggering the activity of k—; x_j(t) and x_k(t+1) represent the activity of the pre- and post-synaptic neurons, η is a parameter accounting for the learning rate, and all weights in the reservoir are updated in parallel at each discrete time step. Notice that we refer to matrices using capital letters and denote their elements with small letters. The growth of the weights in the direction of the correlations between pre- and post-synaptic units has an obvious flaw: as the connections get stronger following Hebb's postulate, activity will eventually spread and increase uncontrollably throughout the network. To avoid this, one possibility is to normalize the weights arriving at each post-synaptic neuron x_k, so that √( Σ_j w_kj² ) = 1. We can then rewrite the update rule in Eq. 6 as:

w_kj(t+1) = ( w_kj(t) + η x_k(t+1) x_j(t) ) / √( Σ_j ( w_kj(t) + η x_k(t+1) x_j(t) )² ).   (7)

Note that Eq. 7 is non-local (NL), meaning that a modification of a given weight w_kj also depends on other neurons in addition to the connected neurons k and j. Finally, assuming a small learning rate η and linear activation functions in the absence of external inputs, Oja derived a local approximation to Eq. 7, known today as Oja's rule [37]:

w_kj(t+1) = w_kj(t) + η x_k(t+1) ( x_j(t) − x_k(t+1) w_kj(t) ).   (8)

It has been suggested that a change in the sign of Hebbian plasticity rules may be advantageous in making an effective use of the dynamic range of cortical neurons [38], while also promoting decorrelation between the activity induced by different inputs. Therefore, in this paper, we will work with such so-called anti-Hebbian learning rules, which are obtained simply by changing the sign of the weight update in Eqs. 6, 7, and 8. The precise writing of the anti-Hebbian learning rules used here and a complete derivation of the anti-Oja rule can be found in App. B. For the sake of clarity we stress that, from a practical point of view, the synaptic strengths w_kj updated with the plastic rules correspond to the reservoir weights w_kj^res of our ESN models.

Although there are examples of the anti-Oja rule applied to ESNs with nonlinear activation functions [11, 15], Eq. 8 is strictly valid only when the state of the post-synaptic neurons is a linear combination of the pre-synaptic states in the form x_k(t+1) = Σ_j w_kj x_j(t), which is no longer true in the presence of nonlinear neurons. In order to evaluate the influence of the local approximation derived by Oja, we will compare the performance obtained by using Eq. 7 (with the minus sign, see Eq. 16) and the one obtained by using Eq. 8 (with the minus sign, see Eq. 17).
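As a sketch, the sign-flipped (anti-Oja) variant of Eq. 8 can be vectorized for a whole reservoir as follows; the exact forms used in the paper are those of App. B, and the function name and test values here are ours.

```python
import numpy as np

def anti_oja_step(W, x_pre, x_post, eta=1e-6):
    """Anti-Oja update: Eq. (8) with the sign of the weight change flipped,
    w_kj <- w_kj - eta * x_k(t+1) * (x_j(t) - x_k(t+1) * w_kj),
    applied to all weights in parallel. W has shape (N, N)."""
    return W - eta * (np.outer(x_post, x_pre) - (x_post ** 2)[:, None] * W)

# Tiny worked example with an exaggerated learning rate for visibility.
W_new = anti_oja_step(np.ones((2, 2)),
                      x_pre=np.array([1.0, 0.0]),
                      x_post=np.array([1.0, 1.0]),
                      eta=0.1)
```

The second term, proportional to x_post², is the self-limiting part inherited from Oja's rule that keeps the weights bounded.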
We will show in the Results section that the NL anti-Hebbian rule outperforms the anti-Oja rule in chaotic time-series prediction tasks.

We now consider intrinsic plasticity (IP), which adjusts the neurons' internal excitability instead of the individual synapses. Based on the idea that every single neuron
Page 3 of 12nveiling the role of plasticity rules in RC intends to maximize its information transmission while min-imizing its energy consumption, Jochen Triesch proposed amathematical learning rule that leads to maximum entropydistributions for the neurons output with certain fixed mo-ments [12]. Although the original derivation of Triesch ap-plied to Fermi activation functions and exponential desireddistributions, soon Schrauwen et al. [13] extended the ruleto account for neurons with hyperbolic tangent functions. Inthis case, each neuron updates its state through the followingexpression: 𝑥 𝑘 ( 𝑡 ) = 𝑡𝑎𝑛ℎ ( 𝑎 𝑘 𝑧 𝑘 ( 𝑡 ) + 𝑏 𝑘 ) (9)where 𝑎 𝑘 and 𝑏 𝑘 are the gain and bias of the post-synapticneuron, and 𝑧 𝑘 ( 𝑡 ) = 𝜖𝑤 𝑖𝑛𝑘 [1; 𝑢 ( 𝑡 )] + ∑ 𝑗 𝑤 𝑖𝑗 𝑥 𝑗 ( 𝑡 − 1) is the to-tal arriving input. The minimization of the Kullback-Leiblerdivergence with respect to a desired Gaussian output distri-bution with a given mean and variance leads to the followingonline learning rules for the gain and bias: Δ 𝑏 𝑘 = − 𝜂 ( − 𝜇𝜎 + 𝑥 𝑘 ( 𝑡 ) 𝜎 ( 𝜎 + 1 − 𝑥 𝑘 ( 𝑡 ) + 𝜇𝑥 𝑘 ( 𝑡 ) )) , (10) Δ 𝑎 𝑘 = 𝜂𝑎 𝑘 + Δ 𝑏 𝑘 𝑧 𝑘 ( 𝑡 ) , (11)where 𝜂 is the learning rule and 𝜇 and 𝜎 the mean and stan-dard deviation of the targeted distribution, respectively.Finally, we will also consider the combination of two ofthe above rules —the NL anti-Hebbian and IP algorithms—to assess the performance of an ESN when these two typesof plasticity act in a synergistic manner. For this combi-nation, there are three natural ways in which the trainingcan be carried out: 𝑖 ) applying both rules simultaneouslyto update the intrinsic parameters and connections weightsafter each input; 𝑖𝑖 ) modifying first the connections throughthe synaptic plasticity and then applying the IP rule; or 𝑖𝑖𝑖 ) conversely, changing first the intrinsic plasticity of the neu-rons and then the synapses strength among them. 
From all the alternatives, the application of the NL anti-Hebbian rule through the whole training set followed by the application of the IP rule through the same training set yielded the best performance, and it is therefore used in the forthcoming Results section. Computational models combining the effect of synaptic and non-synaptic plasticity have been previously suggested in the literature for simple model neurons [39], FFNNs [40] and RNNs [40, 41, 42]. However, we find that a simple combination of two standard plasticity rules can ease the tractability of the results, while allowing fairer comparisons against the other plasticity models.

The task at hand consists of the prediction of the points continuing a Mackey-Glass series, a classical benchmarking dataset generated from a time-delay differential equation (see App. A for details on the generation of the dataset). Since this series exhibits chaotic behavior when the time delay τ > 16.8, we construct two different sets: one with τ = 17 (MG-17), often used as an example of a mildly chaotic series; and a second one with τ = 30 (MG-30) that presents stronger chaotic behavior. To assess its performance, we initially feed the ESN with the last input of the training set, u(T), and then run the network for a number F of steps using the predicted output at time t as the next input at time t+1 (i.e. u(t+1) = y(t)). In this manner, the testing phase is done in the so-called autonomous or generative mode with output feedback. To quantify the error for this task, we use two different quantities:

• The root mean square error (RMSE) over the predicted continuation of the series:
RMSE = √( (1/F) Σ_{t=1..F} ( y(t) − y_target(t) )² )   (12)

• The furthest predicted point (FPP): this is the furthest point up to which the trained ESN is able to continue the series without significantly deviating from the original one. The tolerance for a significant deviation is taken to be approximately 2% of the maximum distance between any two points in the original MG-17 and MG-30 series.

The task of memory capacity (MC) is based on the network's ability to retrieve past information from the reservoir using linear combinations of the reservoir unit activations. To assess the ability of each ESN model to restore previous inputs fed into the network, we compute the (short-term) MC as introduced by Jaeger in [43]:

MC = Σ_{d=1..∞} MC_d = Σ_{d=1..∞} cov²( u(t−d), y_d(t) ) / ( var(u(t)) · var(y_d(t)) ),   (13)

where cov and var denote covariance and variance, respectively. In the above expression, u(t−d) is the input presented d steps before the current input u(t), and y_d(t) = W_d^out X = ũ(t−d) is its reconstruction at the output unit d with trained output weights W_d^out. A value MC_d ∼ 1 means that the system is able to accurately reconstruct the input fed to the network d steps ago. Thus, the sum of all MC_d represents an estimation of the number of past inputs the ESN is able to recall. Although the sum runs to infinity in the original definition —accounting for the complete past of the input— in practice the data fed are finite and it suffices to set d_max = L, with L being the number of output units of the ESN. Each of the L output units is independently trained to approximate past inputs with a different value of d. A theoretical limit for the memory capacity was derived in [43] to be MC_max ≈ N − 1, with N the number of reservoir neurons.
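Equation (13) can be estimated numerically as follows; the function and variable names are ours, and the perfect-recall check illustrates that each fully reconstructed delay contributes MC_d = 1.

```python
import numpy as np

def memory_capacity(u, outputs):
    """Short-term memory capacity, Eq. (13): sum over delays d of
    cov(u(t-d), y_d(t))^2 / (var(u) * var(y_d)). outputs[d-1] holds y_d(t)."""
    mc = 0.0
    for d, y in enumerate(outputs, start=1):
        c = np.cov(u[:-d], y[d:])          # align u(t-d) with y_d(t)
        mc += c[0, 1] ** 2 / (c[0, 0] * c[1, 1])
    return mc

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 500)
# A perfect recaller: y_d(t) = u(t-d) exactly for t >= d, so each MC_d = 1.
perfect = [np.roll(u, d) for d in (1, 2, 3)]
mc = memory_capacity(u, perfect)
```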
3. Results
One of the biggest drawbacks of Echo State Networks is their high sensitivity to the choice of hyper-parameters (see [26] for a detailed review of their effects on network performance). In this work, we focus on tuning four hyper-parameters to improve the performance of each ESN model: the reservoir size or number of neurons in the reservoir N, the input scaling ϵ, the spectral radius ρ of the reservoir's weight matrix (i.e. the maximum absolute eigenvalue of W_res) and the regularization parameter β in the ridge regression. Weights in the reservoir and input layers are initialized randomly according to a uniform distribution between −1 and 1. Sparseness of the reservoir matrix is set to 90%, meaning that only 10% of all connections initially have a non-zero value. When incorporating plasticity rules, an extra tunable hyper-parameter η describing the learning rate in the update rules is included. When IP is implemented, we find that the best results are obtained when using μ = 0 and σ = 0. as the mean and standard deviation of the targeted distribution for the neuron states. For the sake of comparison between different ESN models, we initially choose a common non-optimal, but generally well-performing set of hyper-parameters {ρ = 0., ϵ = 1, β = 10⁻⁷} for all of them, with N = 300, η = 10⁻⁶ for the MG-17 series prediction and N = 600, η = 10⁻⁷ for the MG-30.

In order to compare the influence of the different plasticity rules, we first estimate the number of plasticity training epochs that optimizes the performance of each model. We note that the neuronal plasticity rules are only active during the unsupervised learning procedure but not during the prediction, as detailed in App. A. The unsupervised learning can last for several epochs, with each epoch containing T = 4000 points of the time series for the MG-17 task. Once the plastic unsupervised learning has finished, W_out is computed in a supervised fashion after letting the reservoir evolve for an additional T = 4000 steps.
Fig. 2 shows the evolution of the RMSE and FPP for the non-local anti-Hebbian and IP rules. In this figure, the optimal number of epochs can be easily found as the point at which the RMSE (FPP) presents a global minimum (maximum). As we see, the performance gets worse as the ESNs with plasticity are over-trained. We will focus on understanding the role of plasticity rules in Sec. 3.4.

Figure 2: Evolution of the total RMSE over F = 300 predicted points and of the FPP for the MG-17 chaotic time-series prediction task as a function of the number of training epochs for different plasticity rules. At each epoch, averages were computed over 20 independent realizations of a 300-neuron ESN.

In Table 1 we show the results obtained for the anti-Oja, NL anti-Hebbian and IP rules when each model is trained optimally (i.e. for the optimal number of epochs). The numbers of epochs for the anti-Oja, NL anti-Hebbian, and IP rules are 10, 8, and 100, respectively. For the sake of comparison, we also include in Table 1 the results for a non-plastic ESN with the hyper-parameters mentioned at the end of Sec. 3.1. It can be observed that the implementation of plasticity rules reduces the average prediction error and its uncertainty, especially for the highly chaotic series MG-30, while keeping consecutively predicted points close to the original test set for a longer time. For the chosen set of hyper-parameters, the non-plastic ESN is only able to properly predict autonomously the initial points of the MG-30 chaotic time series. Thus, the corresponding error computed over F = 100 points of the testing time series is very large.

When comparing the different plastic rules in Table 1, we find that the NL anti-Hebbian and IP rules yield a better prediction than the anti-Oja one. In addition, we find that the combination of NL anti-Hebbian and IP reaches the lowest RMSE and largest FPP, thus providing "better and further" predictions.

We construct single-input-node ESNs with N = 150 reservoir neurons and L = 300 output nodes, such that MC_d is computed up to a delay d_max = L. For this task, we feed the network with a random time series of T = 4000 points, drawn from a uniform probability distribution in the interval [−1, 1]. Fig. 3 shows the memory curves for an ESN before and after implementation of the different plasticity rules. Again, we notice how the models with implemented plasticity outperform the original non-plastic ESN, with the memory decaying faster in the latter case. In Table 2, we present the estimated MC computed for the plastic and non-plastic versions of the ESN. Here, we find that the IP rule and the combination of NL anti-Hebbian and IP yield the largest memory capacities. These results are in agreement with the average values presented in [44], where the maximum memory was observed at the edge of stability for a random recurrent neural network.

In the next section, we explore in more detail the properties of the plastic ESNs.

Influence of plasticity rules on the reservoir dynamics
To analyze the effects of plasticity on the ESN performance, we focus now on the MG-17 prediction task, turning our attention to the dependence of the performance on the number of training epochs. As mentioned above, Fig. 2 shows that the measures of performance exhibit absolute
Table 1
RMSE and Furthest Predicted Point (FPP) on the MG-17 and MG-30 prediction tasks for different implementations of plasticity rules. The RMSE was calculated using F steps of the predicted time series, with F = 300 for the MG-17 and F = 100 for the MG-30. Averages were computed over 20 independent realizations.

              Non-Plastic    Anti-Oja       NL anti-Hebbian   IP              NL anti-Hebbian + IP
MG17   RMSE   0.05 ± 0.05    0.02 ± 0.03    0.004 ± 0.004     0.004 ± 0.002   0.003 ± 0.…
       FPP    136 ± 73       208 ± 89       288 ± 35          289 ± 33        299 ± 2
MG30   RMSE   …              0.03 ± 0.02    0.011 ± 0.010     0.018 ± 0.011   0.011 ± 0.…
       FPP    …              …              …                 …               …
Table 2
Memory Capacity for an ESN with 150 nodes and 300 output neurons, with and without implementation of the different plastic rules.

       Non-Plastic   Anti-Oja   NL anti-Hebb   IP   NL anti-Hebb + IP
MC     …             …          …              …    …

Figure 3:
Memory curves, MC_d, as a function of the delay d for the different ESN models studied. Averages were taken over 20 independent realizations of each model.

extrema (minimum of the errors, maximum of the number of predicted points), which are followed by a worsening of the predictions as the number of epochs increases. In order to understand this behavior, we studied quantities related to the reservoir dynamics as the plasticity training advanced. In Fig. 4, we show the average absolute Pearson correlation coefficient among reservoir states at consecutive times, as defined in App. C. In addition, and for the case of synaptic plasticity only, we present the spectral radius of the reservoir matrix (which does not change under the IP rule) as the non-supervised plasticity training evolves.

Focusing first on the NL anti-Hebbian rule, we observe in Fig. 2a) that the prediction error increases significantly beyond 10 training epochs. This fact could be attributed in the first place to the associated increase of the reservoir matrix spectral radius, as seen in Fig. 4a). A maximum absolute eigenvalue exceeding unity has often been regarded as a source of instability in ESNs due to the loss of the "echo state property", a mathematical condition ensuring that the effect of the initial conditions dies out asymptotically with time [9, 45, 26].

Figure 4: Evolution of the Pearson correlation coefficient as a function of the training epochs for the a) NL anti-Hebbian and b) IP rules. For the NL anti-Hebbian rule, the evolution of the spectral radius of W_res is also shown.

Nevertheless, subsequent studies proved that the echo state property can actually be maintained above a unitary spectral radius, depending on the input fed to the reservoir [46, 47], which could be the reason why we find optimal performance slightly above ρ = 1.

The results presented here seem to agree with those presented in [44], where it was suggested that information transfer and storage in ESNs are maximized at the edge between a stable and an unstable (chaotic) dynamical regime. In our case the ESN becomes unstable (periodic) for ρ ∼ 1. and we find an associated decrease in the memory capacity of the ESN. Chaotic dynamics inside the reservoir are not observed in our numerical simulations.

Additionally, we find that the increase in the prediction error coincides with a sharp decrease in the consecutive-time pair-wise absolute correlations, as shown in Figs. 2 and 4. This decrease in the correlations, which was to be expected in any anti-Hebbian type of rule —by their very own definition— also occurs along the training of the IP rule. This remarkable common trend hints at the possibility that, to some extent, decorrelation inside the reservoir could indeed enhance the network's computational capability. However, over-training of the plasticity rules yields an error increase.
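The correlation measure tracked in Fig. 4 is defined precisely in App. C; a plausible sketch of an average absolute Pearson correlation between neuron activity traces is shown below (our reading, with our own function name, not necessarily the exact App. C definition).

```python
import numpy as np

def mean_abs_corr(X):
    """Mean absolute off-diagonal Pearson correlation between the rows of X,
    where X has shape (N_neurons, T_timesteps)."""
    C = np.corrcoef(X)
    mask = ~np.eye(C.shape[0], dtype=bool)   # exclude the trivial self-correlations
    return np.abs(C[mask]).mean()

# Three perfectly (anti-)correlated traces give a mean absolute correlation of 1.
t = np.linspace(0.0, 1.0, 50)
rho = mean_abs_corr(np.vstack([t, 2 * t + 1, -t]))
```

Applied to the matrix of reservoir states collected during training, a decreasing value of this quantity reflects the decorrelation discussed above.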
Figure 5: Distribution of reservoir states after training for the explored ESN models. Histograms were constructed by averaging over 20 different realizations of the corresponding ESNs.
To further evaluate the effects of plasticity on the dynamics of the reservoir, we also analyzed the distribution of the reservoir states \(\mathbf{x}(t)\) before and after the implementation of each rule. Keeping the values of all the states at each input \(u_t\) of the training sequence, and then averaging the resulting matrix over 20 realizations with different training sets, an "average" histogram describing the distribution of the states is presented in Fig. 5. It can be observed that the application of the plasticity rules changes the distribution of the reservoir states from a rather uniform shape to a unimodal one. The initial uniform shape is given by our choice of uniform input weights and the given input scaling. As expected from the mathematical formulation of the IP rule, the distribution of the states after its implementation approaches that of a Gaussian centered around zero. Remarkably, the application of synaptic plasticity also reshapes the initial distribution into a unimodal one peaking around zero. The observed distribution of reservoir states after the application of plasticity rules entails that the individual reservoir neurons tend to avoid operating at the saturation of the \(\tanh\) non-linearity.

Influence of plasticity rules on the neuron dynamics.
So far, we have focused on understanding the effects of plasticity at the network level, but nothing has been said about the way each individual neuron "sees" or "reacts" to the input after the implementation of the plastic rules. To shed some light on this question, we define the effective input \(\tilde{u}_n(t)\) of a neuron \(n\) at time \(t\) as the sum of the input and bias unit once filtered through the input mask, \(\tilde{u}_n(t) = w^{in}_n + w^{in}_n\, u(t)\). In this way, the state update equation for a single neuron (with no IP implemented) can be rewritten as:

\[ x_n(t+1) = \tanh\Big( \epsilon\, \tilde{u}_n(t) + \sum_j w^{res}_{nj} x_j(t) \Big) \quad (14) \]

In Fig. 6, we plot the response of 4 different neurons to this effective input before (blue dots) and after (red and yellow dots) the implementation of the non-local anti-Hebbian and IP rules.

Figure 6:
Activity of 4 different neurons as a function of the effective input in a non-plastic ESN (blue) and in the same reservoir after training it with the NL anti-Hebbian rule for 8 epochs (red) and the IP rule for 100 epochs (yellow). On the right side we zoom in on one of the neurons, plotting also the evolution of the effective input over a section of the training. We highlight in green the range of inputs for which the activity broadens most notably with respect to the non-plastic case, coinciding with one of the most variable parts of the input.

On the right side we have zoomed in on one of these neurons and plotted also 1000 points of the effective input \(\tilde{u}_n(t)\) that arrives to it. It can be clearly seen that plasticity has the effect of widening the activity range of the neurons, especially in those areas, highlighted in green, in which the same point may lead to very different continuations of the series depending on its past. To quantify this widening, we measured the average area of each reservoir neuron's activity phase space before and after the implementation of the plasticity rules. The resulting average phase space areas, presented in Table 3, back up the aforementioned expansion, which is especially significant in the case of the IP rule.

Note from Eq. 14 that if a neuron \(n\) is mainly influenced by the external input at each time \(t\), then \(x_n(t+1) \approx \tanh(\epsilon\, \tilde{u}_n(t))\) and the corresponding states are distributed in a narrow region around the hyperbolic tangent curve. This is what we see in Fig. 6 for the non-plastic case.
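The single-neuron update of Eq. 14, and the recording of (effective input, activity) pairs underlying plots like Fig. 6, can be sketched as follows. This is a toy non-plastic reservoir with illustrative sizes and scalings, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100                                    # reservoir size (illustrative)
W_res = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
w_in = rng.uniform(-0.5, 0.5, size=N)      # input mask, applied to bias and input
eps = 1.0                                  # input scaling (assumed value)

def esn_step(x, u):
    """Eq. (14): x_n(t+1) = tanh(eps * u_eff_n(t) + sum_j W_res[n, j] x_j(t)),
    with effective input u_eff_n(t) = w_in[n] + w_in[n] * u(t)."""
    u_eff = w_in * (1.0 + u)
    return np.tanh(eps * u_eff + W_res @ x), u_eff

# drive the reservoir with a surrogate scalar series and record neuron 0's
# (effective input, activity) points, i.e. its phase-space cloud as in Fig. 6
x = np.zeros(N)
pairs = []
for u in rng.uniform(0.0, 1.0, size=500):
    x, u_eff = esn_step(x, u)
    pairs.append((u_eff[0], x[0]))
```

Scatter-plotting `pairs` for a non-plastic reservoir produces points hugging the \(\tanh\) curve; after plasticity training the cloud broadens, as discussed above.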
Conversely, a broadened phase space (as found after plasticity implementation) suggests a greater role of the interactions among past values of the reservoir units in determining the neuron state. In the case of neurons where IP was implemented, a further displacement of their activity towards the center of the activation function is observed, which should come as no surprise since we chose a zero-mean Gaussian as our IP target distribution. The fact that mechanisms apparently so disparate exhibit similar effects at the neuron and network level motivates the idea of synergistic learning involving both synaptic and non-synaptic plasticity, which has been extensively backed up also in biological systems [48, 49].

To finalize, we apply the same neuron-level framework to see if we can understand the effects of an over-trained plasticity. It was shown in Fig. 2 that once a certain number of epochs was exceeded, the prediction error increased, and
Table 3
Average neuron phase space area for plastic and non-plastic implementations of a 300-neuron ESN. The error is given as the standard deviation over all neurons.

Non-Plastic   Anti-Oja      NL anti-Hebb   IP            NL anti-Hebb + IP
0.02 ± 0.02   0.03 ± 0.02   0.04 ± 0.03    0.07 ± 0.07   0.07 ± 0.…

Figure 7:
Activity of 4 different neurons as a function of the effective input in a non-plastic ESN (blue) and in the same network after training it with the NL anti-Hebbian rule for 25 epochs (red) and the IP rule for 175 epochs (yellow).

that this co-occurred with a sharp increase in the spectral radius of the weight matrix for the NL anti-Hebbian rule (as shown in Fig. 4). Is this observed transition from stable to unstable (periodic) dynamics reflected in any way in the activity of the neurons? Choosing the same initial ESN and training set as in Fig. 6, we now apply either the NL anti-Hebbian or the IP rule for a total of \(n_{Hebb} = 25\) or \(n_{IP} = 175\) epochs, respectively. From the resulting plot of the activity as a function of the effective input, shown in Fig. 7, two different paths leading to the reported worsening of the prediction performance can be identified. On the one hand, the IP rule leads to a seemingly blurred phase space representation at each unit of the reservoir, in which each effective input value leads to a very spread network activity. We have identified that in this regime some neurons of the ESN lose their consistency (an important property that needs to be fulfilled in RC, as discussed in [50, 51]), producing different responses for the same input when the initial conditions are changed. On the other hand, an excess of NL anti-Hebbian training produces the splitting of the original phase space representation into two disjoint regions. We observed that the instability in this case is associated with a self-sustained periodic dynamics of the reservoir states, leading to consecutive jumps from one phase space region to the other. We have noticed that this transition, which results from the imposed decorrelation, is also followed by a decrease in the memory capacity of the network.
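The consistency property mentioned above can be probed numerically by driving two replicas of the same reservoir, started from different initial conditions, with an identical input sequence and correlating their responses after a washout period. A minimal sketch on a toy echo-state reservoir (names and parameters are illustrative, not the paper's setup):

```python
import numpy as np

def consistency(W_res, w_in, u_series, x0_a, x0_b, washout=100):
    """Pearson correlation between the responses of two replicas of the same
    reservoir to one input series; values near 1 indicate consistent replies."""
    def run(x0):
        x, states = x0.copy(), []
        for u in u_series:
            x = np.tanh(w_in * (1.0 + u) + W_res @ x)  # same update as Eq. (14)
            states.append(x.copy())
        return np.asarray(states)[washout:]            # discard transients
    a, b = run(x0_a), run(x0_b)
    a, b = a - a.mean(), b - b.mean()
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

# a contracting reservoir (spectral radius < 1) should be highly consistent
rng = np.random.default_rng(2)
N = 50
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))        # set spectral radius to 0.9
w_in = rng.uniform(-0.5, 0.5, size=N)
u = rng.uniform(0.0, 1.0, size=400)
c = consistency(W, w_in, u, rng.normal(size=N), rng.normal(size=N))
```

In the over-trained IP regime described above, this correlation would drop noticeably below 1 for the affected neurons.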
4. Discussion and Outlook
We showed that the numerical implementation of plasticity rules can increase both the prediction capabilities on temporal tasks and the memory capacity of reservoir computers. In the case of Hebbian-type synaptic plasticity, we showed that a non-local anti-Hebbian rule outperforms the classically used anti-Oja approximation. We also found that the synergistic action of synaptic (non-local anti-Hebbian) and non-synaptic (Intrinsic Plasticity) rules leads to the best results.

At the network level, we analyzed different quantities that might be modified by the plasticity rules. For the non-local anti-Hebbian rule, we showed in Fig. 4 how the sudden increase in the spectral radius of the reservoir weight matrix co-occurs with a sharp drop in the correlations of the states at consecutive times. More concretely, we observed that the optimal number of epochs occurred just before the transition to a periodic self-sustained dynamics inside the reservoir. Similarly, continuous application of the IP rule also tends to decorrelate the states of the neurons within the reservoir. An over-training of the IP rule eventually results in a loss of the consistency of the neural responses and the corresponding performance degradation.

From the distributions of the states depicted in Fig. 5, we found that both types of plasticity lead to unimodal distributions of states centered around zero. This seems to imply that the optimal performance is achieved, for this type of temporal task, when most neurons distribute along the hyperbolic tangent while avoiding the saturation regions. Indeed, similar results observed both in vivo, in large monopolar cells of the fly [52], and in artificial single-neuron models [53], suggest that this form of state distribution helps to achieve optimal encoding of the inputs.

Interesting results emerged also from the one-to-one comparison among individual neurons before and after the implementation of plasticity rules.
At the neuron level, we saw how plasticity rules expand each neuron's activity space, measured in terms of its area, adapting to the properties of the input and thus possibly enhancing the computational capability of the whole network. Within this same framework, we observed that the regime of performance degradation found when over-training the plastic parameters is of a different nature for the synaptic and non-synaptic rules. In the synaptic case, the phase space region occupied by the activity of each single neuron splits into two disjoint regions, with the state jumping from one region to the other at consecutive time steps. In the IP rule, on the other hand, we found that the decorrelation of the states and the expansion of their phase space continue progressively, with different inputs eventually leading to similarly broad projections of the reservoir states in the activity phase space.

Our findings also raise interesting questions that will hopefully stimulate future work. On the one hand, we observe in Fig. 6 that the resulting phase space after the implementation of synaptic and non-synaptic plasticity is qualitatively similar for three out of the four neurons presented, while differing considerably from the non-plastic units. This remarkable result comes as fairly surprising given that the synaptic and non-synaptic rules employed have very little in common from an algorithmic point of view. Nevertheless, they drive the network toward similar optimal states. On the other hand, the instability arising from an over-training of the plastic connections or intrinsic parameters shows to be of a fundamentally different nature for the NL anti-Hebbian and IP rules. A thorough characterization of these transitions and a deeper understanding of the underlying similarities between synaptic and non-synaptic plasticity rules will likely trigger interesting research avenues.

The computational paradigm of reservoir computing has been shown to be compatible with the implementation constraints of hardware systems [54, 55]. The finding that a physical substrate with non-optimized conditions can be used for computation has been exploited in the context of electronic and photonic implementations of reservoir computing [56, 57]. Although the physical implementation of plasticity rules is certainly challenging, the results presented in this manuscript anticipate a potential advantage of considering such plasticity rules also in physical systems.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
We thank M. A. Muñoz and M. A. Matías for fruitful scientific discussions. This work was supported by MINECO (Spain), through project TEC2016-80063-C3 (AEI/FEDER, UE). We acknowledge the Spanish State Research Agency, through the Severo Ochoa and María de Maeztu Program for Centers and Units of Excellence in R&D (MDM-2017-0711). The work of MCS has been supported by the Spanish Ministerio de Ciencia, Innovación y Universidades, la Agencia Estatal de Investigación, the European Social Fund and the University of the Balearic Islands through a "Ramón y Cajal" Fellowship (RYC-2015-18140). GBM has been partly supported by the Spanish National Research Council via a JAE Intro fellowship (JAEINT18_EX_0684).
A. Mackey-Glass series.
The Mackey-Glass chaotic time series is generated from the following delay differential equation:

\[ \frac{dx}{dt} = \frac{\alpha\, x(t-\tau)}{1 + x(t-\tau)^{\beta}} - \gamma\, x(t), \quad (15) \]

where \(\tau\) represents the delay and the parameters are set to \(\alpha = 0.2\), \(\beta = 10\) and \(\gamma = 0.1\), a common choice for this type of prediction task [15, 58].

To construct the temporal series we solved Eq. 15 using Matlab's dde23 delay differential equation solver, generating 10000 points with an initial washout period of 1000 points. The step size between points in the extracted series was set to 1, although the series was initially computed with a step size of 0.1 and then sampled every 10 points. The absolute error tolerance was set to \(10^{-16}\), as suggested in [9]. All series were re-scaled to lie in the range [0, 1] before being fed to the ESNs. Similarly, the predicted points were re-scaled back to the original range of values of the series before computing the prediction accuracy.

We evaluated the prediction performance of the ESN for two values of the delay, \(\tau = 17\) for MG-17 and \(\tau = 30\) for MG-30, respectively. For the MG-30 we generated \(T = 6000\) consecutive points for the training set, while for the MG-17 we found that \(T = 4000\) training points suffice to obtain good results. The evaluated prediction task consisted in the continuation of the series from the last input of the training set. Accordingly, the target series in the supervised training was defined as the one-step-ahead prediction \(y_{target} = [u_2, u_3, ..., u_{T+1}]\) for an input \(u = [u_1, u_2, ..., u_T]\). For the computation of the output weights \(W^{out}\), we kept all internal reservoir states of the ESN and only after passing all the input training set did we apply Eq. 5.

When implementing any of the plasticity rules, we ran the corresponding unsupervised learning procedure using the same \(T\) points of the training set mentioned above. The reservoir configuration was updated after every point of the input during this procedure.
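The paper integrates Eq. 15 with Matlab's dde23; as a rough stand-in, the delay equation can be integrated with a fixed-step Euler scheme and a delay buffer. A sketch with the parameter values above (the function name and the simple integrator are ours, not the paper's code; dde23 is adaptive and more accurate):

```python
def mackey_glass(n_points, tau=17.0, alpha=0.2, beta=10.0, gamma=0.1,
                 dt=0.1, subsample=10, washout=1000, x0=1.2):
    """Approximate Mackey-Glass series: Euler steps of size dt, a buffer
    holding the delayed value x(t - tau), sampling every `subsample` steps,
    discarding `washout` sampled points, and rescaling to [0, 1]."""
    delay_steps = int(round(tau / dt))
    history = [x0] * delay_steps           # constant initial history on [-tau, 0]
    x = x0
    series = []
    for step in range((n_points + washout) * subsample):
        x_tau = history.pop(0)             # x(t - tau)
        x = x + dt * (alpha * x_tau / (1.0 + x_tau ** beta) - gamma * x)
        history.append(x)
        if (step + 1) % subsample == 0:
            series.append(x)
    series = series[washout:]
    lo, hi = min(series), max(series)      # rescale to [0, 1] before feeding the ESN
    return [(v - lo) / (hi - lo) for v in series]
```

With `tau=17` this yields an MG-17-like series; `tau=30` gives the MG-30 variant.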
We passed over the whole training set a number of times (epochs). At the end of the unsupervised learning procedure, the reservoir is kept fixed in its last configuration. We then collected the states of the reservoir after a last presentation of the training set and computed the new optimal readout weights using Eq. 5, as described in the previous paragraph.

B. Derivation of the anti-Oja rule.
When applying a plasticity rule of the Hebbian type to ESNs, a common choice is the anti-Oja rule [15, 59]. It is usually implemented using the following local rule:

\[ w_{kj}(t+1) = w_{kj}(t) - \eta\, y_k(t) \left( x_j(t) - y_k(t)\, w_{kj}(t) \right), \]

where we denote the post-synaptic state of neuron \(k\) by \(y_k(t) \equiv x_k(t+1)\) in our RC scheme.

Originally, Oja proposed the plasticity rule that takes his name as a way of deriving a local rule that implements Hebbian learning while normalizing the synaptic weights at each step of the training. To do so, he started from a common normalization of the basic Hebbian rule:

\[ w_{kj}(t+1) = \frac{w_{kj}(t) - \eta\, y_k(t)\, x_j(t)}{\sqrt{\sum_j \left( w_{kj}(t) - \eta\, y_k(t)\, x_j(t) \right)^2}}, \quad (16) \]
where we are already using the minus sign to account for anti-Hebbian behavior. Notice that this form of the rule does not assume any particular form of the activation function \(y_k(t) = f(\vec{x})\). Now, this update rule can be approximated by expanding the above expression in powers of \(\eta\):

\[ w_{kj}(t+1) = \frac{w_{kj}(t)}{\sqrt{\sum_j w_{kj}(t)^2}} + \eta \left[ -\frac{y_k(t)\, x_j(t)}{\sqrt{\sum_j w_{kj}(t)^2}} + \frac{w_{kj}(t)\, y_k(t) \sum_j w_{kj}(t)\, x_j(t)}{\left( \sum_j w_{kj}(t)^2 \right)^{3/2}} \right] + O(\eta^2). \]

Imposing normalization of the incoming weights, \(\sqrt{\sum_j w_{kj}(t)^2} = 1\), leads to

\[ w_{kj}(t+1) \sim w_{kj}(t) - \eta \left[ y_k(t)\, x_j(t) - w_{kj}(t)\, y_k(t) \sum_j w_{kj}(t)\, x_j(t) \right]. \]

Finally, assuming linear activation functions and no external input, so that \(y_k(t) = \sum_j w_{kj}(t)\, x_j(t)\), we obtain the widely-known anti-Oja rule:

\[ w_{kj}(t+1) \sim w_{kj}(t) - \eta \left( y_k(t)\, x_j(t) - w_{kj}(t)\, y_k(t)^2 \right). \quad (17) \]

The more adequate use of Eq. 16 comes of course with an important computational cost compared to Eq. 17, but it is still feasible for the reservoir sizes we considered.

C. Measures of reservoir dynamics during plasticity training.
To evaluate the decorrelation among pre- and post-synaptic reservoir states, we employed the Pearson correlation coefficient between the activity of unit \(i\) at time \(t\) and that of unit \(k\) at time \(t+1\), given by:

\[ corr(x_i(t), x_k(t+1)) = \frac{\sum_{t=1}^{T-1} \left( x_i(t) - \bar{x}_i \right) \left( x_k(t+1) - \bar{x}_k \right)}{\sqrt{\sum_{t=1}^{T-1} \left( x_i(t) - \bar{x}_i \right)^2} \sqrt{\sum_{t=1}^{T-1} \left( x_k(t+1) - \bar{x}_k \right)^2}}. \quad (18) \]

After each epoch of the plasticity training, the mean absolute correlation was computed as:

\[ Corr = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{N^2} \sum_{i=1}^{N} \sum_{k=1}^{N} \left| corr(x_i(t), x_k(t+1)) \right|, \quad (19) \]

where N denotes the size of the reservoir and M the number of independent realizations over which the results were averaged.

References

[1] D. O. Hebb, The Organization of Behavior. A Neuropsychological Theory, Wiley, 1949.
[2] P. Milner, A Brief History of the Hebbian Learning Rule, Canadian Psychology 44 (1) (2003) 5–9. doi:10.1037/h0085817.
[3] F. Rosenblatt, Cornell Aeronautical Laboratory, The Perceptron: a Theory of Statistical Separability in Cognitive Systems (Project PARA), Cornell Aeronautical Laboratory, 1958.
[4] M. Minsky, S. Papert, Perceptrons: an Introduction to Computational Geometry, Cambridge, Mass.-London, 1969.
[5] A. Kurenkov, A 'Brief' History of Neural Nets and Deep Learning (2015). URL
[6] S. Linnainmaa, The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors, Ph.D. thesis (1970).
[7] P. Werbos, New Tools for Prediction and Analysis in the Behavioral Sciences, Ph.D. thesis, Harvard University, Cambridge, MA (1974).
[8] F. Grezes, Reservoir Computing. Dissertation Submitted to the Graduate Faculty in Computer Science, The City University of New York, Ph.D. thesis (2014).
[9] H. Jaeger, The "Echo State" Approach to Analysing and Training Recurrent Neural Networks, GMD-Report 148, German National Research Institute for Computer Science (2001).
[10] W. Maass, T. Natschläger, H.
Markram, Real-Time Computing without Stable States: a New Framework for Neural Computation Based on Perturbations, Neural Computation 14 (11) (2002) 2531–2560. doi:10.1162/089976602760407955.
[11] Š. Babinec, J. Pospíchal, Improving the Prediction Accuracy of Echo State Neural Networks by Anti-Oja's Learning, Lecture Notes in Computer Science 4668 (2007) 19–28. doi:10.1007/978-3-540-74690-4_3.
[12] J. Triesch, A Gradient Rule for the Plasticity of a Neuron's Intrinsic Excitability, Artificial Neural Networks: Biological Inspirations ICANN 2005, 2005. doi:10.1007/11550822_11.
[13] B. Schrauwen, M. Wardermann, D. Verstraeten, J. J. Steil, D. Stroobandt, Improving Reservoirs Using Intrinsic Plasticity, Neurocomputing 71 (7-9) (2008) 1159–1171. doi:10.1016/j.neucom.2007.12.020.
[14] J. J. Steil, Online Reservoir Adaptation by Intrinsic Plasticity for Backpropagation-Decorrelation and Echo State Learning, Neural Networks 20 (3) (2007) 353–364. doi:10.1016/j.neunet.2007.04.011.
[15] M.-H. Yusoff, J. Chrol-Cannon, Y. Jin, Modeling Neural Plasticity in Echo State Networks for Classification and Regression, Information Sciences 364-365 (2016) 184–196. doi:10.1016/j.ins.2015.11.017.
[16] X. Wang, Y. Jin, K. Hao, Echo State Networks Regulated by Local Intrinsic Plasticity Rules for Regression, Neurocomputing 351 (2019) 111–122. doi:10.1016/j.neucom.2019.03.032.
[17] H. Ju, M. R. Dranias, G. Banumurthy, A. M. J. VanDongen, Spatiotemporal Memory Is an Intrinsic Property of Networks of Dissociated Cortical Neurons, Journal of Neuroscience 35 (9) (2015) 4040–4051. doi:10.1523/jneurosci.3793-14.2015.
[18] P. Enel, E. Procyk, R. Quilodran, P. F. Dominey, Reservoir Computing Properties of Neural Dynamics in Prefrontal Cortex, PLOS Computational Biology 12 (6) (2016) e1004967. doi:10.1371/journal.pcbi.1004967.
[19] M. D. Skowronski, J. G. Harris, Automatic Speech Recognition Using a Predictive Echo State Network Classifier, Neural Networks 20 (3) (2007) 414–423.
doi:10.1016/j.neunet.2007.04.006.
[20] H. Jaeger, Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication, Science 304 (5667) (2004) 78–80. doi:10.1126/science.1091277.
[21] J. Hertzberg, H. Jaeger, F. Schönherr, Learning to Ground Fact Symbols in Behavior-Based Robots, ECAI'02 Proceedings of the 15th European Conference on Artificial Intelligence (2002).
[22] X. Lin, Z. Yang, Y. Song, The Application of Echo State Network in Stock Data Mining, Advances in Knowledge Discovery and Data Mining (2008) 932–937. doi:10.1007/978-3-540-68125-0_95.
[23] Š. Babinec, J. Pospíchal, Gating Echo State Neural Networks for Time Series Forecasting, Advances in Neuro-Information Processing. Lecture Notes in Computer Science 5506 (2009) 200–207. doi:10.1007/978-3-642-02490-0_25.
[24] X. Lin, Z. Yang, Y. Song, Short-term Stock Price Prediction Based on Echo State Networks, Expert Systems with Applications 36 (3) (2009) 7313–7317. doi:10.1016/j.eswa.2008.09.049.
[25] H. Wang, Y. Bai, C. Li, Z. Guo, J. Zhang, Time Series Prediction Model of Grey Wolf Optimized Echo State Network, Data Science Journal 18 (2019) 16. doi:10.5334/dsj-2019-016.
[26] M. Lukoševičius, A Practical Guide to Applying Echo State Networks, in: Neural Networks: Tricks of the Trade, Springer, 2012, pp. 659–686.
[27] R. F. Reinhart, J. J. Steil, A Constrained Regularization Approach for Input-Driven Recurrent Neural Networks, Differential Equations and Dynamical Systems 19 (1-2) (2010) 27–46. doi:10.1007/s12591-010-0067-x.
[28] R. F. Reinhart, J. J. Steil, Reservoir Regularization Stabilizes Learning of Echo State Networks with Output Feedback, ESANN 2011 Proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (2011).
[29] G. Berlucchi, H. A. Buchtel, Neuronal Plasticity: Historical Roots and Evolution of Meaning, Experimental Brain Research 192 (3) (2008) 307–319. doi:10.1007/s00221-008-1611-6.
[30] L. F. Abbott, S. B. Nelson, Synaptic Plasticity: Taming the Beast, Nature Neuroscience 3 (S11) (2000) 1178. doi:10.1038/81453.
[31] I. Song, R. L. Huganir, Regulation of AMPA Receptors during Synaptic Plasticity, Trends in Neurosciences 25 (11) (2002) 578–588. doi:10.1016/s0166-2236(02)02270-1.
[32] K. Gerrow, A. Triller, Synaptic Stability and Plasticity in a Floating World, Current Opinion in Neurobiology 20 (5) (2010) 631–639. doi:10.1016/j.conb.2010.06.010.
[33] G. G. Turrigiano, Homeostatic Plasticity in Neuronal Networks: the More Things Change, the More They Stay the Same, Trends in Neurosciences 22 (5) (1999) 221–227.
[34] E. Marder, J.-M.
Goaillard, Variability, Compensation and Homeostasis in Neuron and Network Function, Nature Reviews Neuroscience 7 (7) (2006) 563.
[35] A. Alonso, M. de Curtis, R. Llinas, Postsynaptic Hebbian and Non-Hebbian Long-term Potentiation of Synaptic Efficacy in the Entorhinal Cortex in Slices and in the Isolated Adult Guinea Pig Brain, Proceedings of the National Academy of Sciences 87 (23) (1990) 9280–9284. doi:10.1073/pnas.87.23.9280.
[36] H. K. Kato, A. M. Watabe, T. Manabe, Non-Hebbian Synaptic Plasticity Induced by Repetitive Postsynaptic Action Potentials, Journal of Neuroscience 29 (36) (2009) 11153–11160. doi:10.1523/jneurosci.5881-08.2009.
[37] E. Oja, Simplified Neuron Model as a Principal Component Analyzer, Journal of Mathematical Biology 15 (3) (1982) 267–273. doi:10.1007/bf00275687.
[38] H. B. Barlow, Adaptation and Decorrelation in the Cortex, The Computing Neuron (1989).
[39] J. Triesch, Synergies between Intrinsic and Synaptic Plasticity Mechanisms, Neural Computation 19 (4) (2007) 885–909. doi:10.1162/neco.2007.19.4.885.
[40] Y. Li, C. Li, Synergies between Intrinsic and Synaptic Plasticity Based on Information Theoretic Learning, PloS One 8 (5) (2013) e62894. doi:10.1371/journal.pone.0062894.
[41] M. K. Janowitz, M. C. W. van Rossum, Excitability Changes that Complement Hebbian Learning, Network (Bristol, England) 17 (1) (2006) 31–41. doi:10.1080/09548980500286797.
[42] W. Aswolinskiy, G. Pipa, RM-SORN: a Reward-Modulated Self-Organizing Recurrent Neural Network, Frontiers in Computational Neuroscience 9 (2015) 36. doi:10.3389/fncom.2015.00036.
[43] H. Jaeger, Short Term Memory in Echo State Networks, GMD-Report 152, German National Research Center for Information Technology (2001).
[44] J. Boedecker, O. Obst, J. T. Lizier, N. M. Mayer, M. Asada, Information Processing in Echo State Networks at the Edge of Chaos, Theory in Biosciences 131 (3) (2011) 205–213. doi:10.1007/s12064-011-0146-8.
[45] M. Lukoševičius, H.
Jaeger, Reservoir Computing Approaches to Recurrent Neural Network Training, Computer Science Review 3 (3) (2009) 127–149. doi:10.1016/j.cosrev.2009.03.005.
[46] I. B. Yildiz, H. Jaeger, S. J. Kiebel, Re-visiting the Echo State Property, Neural Networks 35 (2012) 1–9. doi:10.1016/j.neunet.2012.07.005.
[47] G. Manjunath, H. Jaeger, Echo State Property Linked to an Input: Exploring a Fundamental Characteristic of Recurrent Neural Networks, Neural Computation 25 (3) (2013) 671–696. doi:10.1162/neco_a_00411.
[48] E. Hanse, Associating Synaptic and Intrinsic Plasticity, The Journal of Physiology 586 (3) (2008) 691–692. doi:10.1113/jphysiol.2007.149476.
[49] R. Mozzachiodi, J. H. Byrne, More than Synaptic Plasticity: Role of Nonsynaptic Plasticity in Learning and Memory, Trends in Neurosciences 33 (1) (2010) 17–26. doi:10.1016/j.tins.2009.10.001.
[50] J. Bueno, D. Brunner, M. C. Soriano, I. Fischer, Conditions for Reservoir Computing Performance Using Semiconductor Lasers with Delayed Optical Feedback, Optics Express 25 (3) (2017) 2401–2412.
[51] T. Lymburn, A. Khor, T. Stemler, D. C. Corrêa, M. Small, T. Jüngling, Consistency in Echo-State Networks, Chaos 29 (2) (2019) 023118.
[52] S. Laughlin, A Simple Coding Procedure Enhances a Neuron's Information Capacity, Zeitschrift für Naturforschung. Section C, Biosciences 36 (9-10) (1981) 910–912.
[53] A. Bell, T. Sejnowski, An Information-Maximization Approach to Blind Separation and Blind Deconvolution, Neural Computation 7 (1995) 1129–59. doi:10.1162/neco.1995.7.6.1129.
[54] G. Van der Sande, D. Brunner, M. C. Soriano, Advances in Photonic Reservoir Computing, Nanophotonics 6 (3) (2017) 561–576.
[55] G. Tanaka, T. Yamane, J. B. Héroux, R. Nakane, N. Kanazawa, S. Takeda, H. Numata, D. Nakano, A. Hirose, Recent Advances in Physical Reservoir Computing: a Review, Neural Networks (2019).
[56] L. Appeltant, M. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. Mirasso, I.
Fischer, Information Processing Using a Single Dynamical Node as Complex System, Nature Communications 2 (1) (2011). doi:10.1038/ncomms1476.
[57] D. Brunner, M. C. Soriano, C. R. Mirasso, I. Fischer, Parallel Photonic Information Processing at Gigabyte per Second Data Rates Using Transient States, Nature Communications 4 (1) (2013) 1–7.
[58] S. Ortín, M. C. Soriano, L. Pesquera, D. Brunner, D. San-Martín, I. Fischer, C. Mirasso, J. Gutiérrez, A Unified Framework for Reservoir Computing and Extreme Learning Machines Based on a Single Time-delayed Neuron, Scientific Reports 5 (2015) 14945.
[59] Š. Babinec, J. Pospíchal, Improving the Prediction Accuracy of Echo State Neural Networks by Anti-Oja's Learning, in: International Conference on Artificial Neural Networks, Springer, 2007, pp. 19–28.

Guillermo B. Morales is currently working towards his PhD at the University of Granada. Previously, he received the MSc degree on complex systems at the University of the Balearic Islands. His main research interests cover topics of neural network dynamics and epidemic spreading from a complex systems perspective.

Claudio R. Mirasso is Full Professor at the Physics Department of the Universitat de les Illes Balears and member of the IFISC. He has co-authored over 160 publications included in the SCI with more than 7500 citations. He was coordinator (and principal investigator) of the OCCULT project (IST-2000-29683) and the PHOCUS project (IST-2010-240763), and principal investigator of other national and European projects. His research interests include information processing in complex systems, synchronization, fundamentals and
applications of machine learning, neuronal modelling and dynamics, and applications of nonlinear dynamics in general.

Miguel C. Soriano was born in Benicarlo, Spain, in 1979. He received the Ph.D. degree in applied sciences from the Vrije Universiteit Brussel, Brussels, Belgium, in 2006. He currently holds a tenure-track position at the University of the Balearic Islands, Spain. His main research interests cover topics of nonlinear dynamics and information processing based on reservoir computing. As an author or co-author, he has published over 60 research papers in international refereed journals.