Chaos may enhance expressivity in cerebellar granular layer
Keita Tokuda a,∗, Naoya Fujiwara b,c,d, Akihito Sudo e, Yuichi Katori f,c
a Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan
b Graduate School of Information Sciences, Tohoku University, 6-3-09 Aoba, Aramaki-aza Aoba-ku, Sendai, Miyagi 980-8579, Japan
c Institute of Industrial Science, The University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo 153-8505, Japan
d Center for Spatial Information Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8568, Japan
e Faculty of Informatics, Shizuoka University, 3-5-1 Johoku, Naka-ku, Hamamatsu-shi, Shizuoka, Japan
f The School of Systems Information Science, Future University Hakodate, 116-2 Kamadanakano-cho, Hakodate, Hokkaido 041-8655, Japan
Abstract
Recent evidence suggests that Golgi cells in the cerebellar granular layer are densely connected to each other with massive gap junctions. Here, we propose that the massive gap junctions between the Golgi cells contribute to the representational complexity of the granular layer of the cerebellum by inducing chaotic dynamics. We construct a model of the cerebellar granular layer with diffusion coupling through gap junctions between the Golgi cells, and evaluate the representational capability of the network within the reservoir computing framework. First, we show that the chaotic dynamics induced by diffusion coupling results in complex output patterns containing a wide range of frequency components. Second, the long non-recursive time series of the reservoir represents the passage of time from an external input. These properties of the reservoir enable mapping different spatial inputs into different temporal patterns.
Keywords:
Cerebellar Golgi cells, Cerebellar granular layer, Reservoir computing, Gap junction, Diffusion coupling, Chaotic dynamics, Degrees of freedom, Sierpinski gasket, Reaction-diffusion system
1. Introduction
Recent experimental studies have revealed that neighboring Golgi cells in the granular layer of the cerebellar cortex are densely interconnected with gap junctions that allow direct diffusion of ions between neuronal intracellular spaces (Dugué et al. (2009); Vervaeke et al. (2010)). Vervaeke et al. (2010) reported that more than 80% of neighboring neuron pairs are interconnected with gap junctions, and that each Golgi cell is connected to approximately 10 other Golgi cells via gap junctions. They also showed that the diffusion current between neighboring Golgi cells has the effect of transiently desynchronizing the spike activities after external excitation (Vervaeke et al. (2010)). This is contradictory to the classical view of the role of gap junctions, which is to synchronize nearby neurons (Watanabe (1958)). In spite of the complex effect of diffusion coupling between Golgi cells on the ongoing dynamics, the causal relationship between this dynamics and cerebellar computation has yet to be elucidated.

∗ Corresponding author: Keita Tokuda
Email address: [email protected] (Keita Tokuda)
Preprint submitted to Elsevier June 23, 2020

Several theoretical studies have pointed out that diffusion coupling between nonlinear oscillators does not necessarily realize synchronization, but can also induce instability (Turing (1952)) or even chaotic activity (Yamada & Kuramoto (1976); Fujii & Tsuda (2004); Tsuda et al. (2004); Schweighofer et al. (2004); Tokuda et al. (2010); Katori et al. (2010); Tadokoro et al. (2011); Tokuda et al. (2019)). Fujii & Tsuda (2004) and Tsuda et al. (2004) reported that introducing diffusion coupling through gap junctions between class 1 neurons induces chaotic dynamics. Schweighofer et al. (2004) proposed a theory that the abundant gap junctions in the inferior olive produce chaotic neural activity that enables efficient transmission of information in the high-frequency components of inputs. It has also been proposed that the adaptive strength of the gap junctions in the inferior olive regulates the degrees of freedom of the system, and that the brain modifies the gap junction strength during learning to ensure that the system operates at an optimal level of degrees of freedom (Kawato et al. (2011); Tokuda et al. (2013, 2017); Hoang et al. (2019)). The possible computational role of diffusion coupling through gap junctions in the granular layer should be elucidated as well.

The majority of cerebellar computational theories assume that the cerebellum is a supervised machine that learns a desirable input-output relationship (Marr (1969); Albus (1971); Ito (1970); Kawato et al. (1987); Buonomano & Mauk (1994); Wolpert et al. (1998); Schweighofer et al. (2004); Yamazaki & Tanaka (2007); Raymond & Medina (2018)). It is well known that two major input pathways converge on the Purkinje cells: the mossy fiber-granular layer-Purkinje cell pathway originating from a precerebellar nucleus such as the pontine nucleus, and the climbing fiber-Purkinje cell pathway originating from the inferior olive (Ruigrok et al. (2015)).
These theories assume that the former pathway is the input layer of the supervised machine and that the latter pathway conveys the supervising signals. The computational role of the cerebellar granular layer is assumed to be the preprocessing, i.e., feature engineering, of incoming signals from the mossy fibers. It transforms an input into a dynamical representation in a high-dimensional space realized by the enormous number of granule neurons in the human brain (Marr (1969); Albus (1971); Badura & De Zeeuw (2017); Raymond & Medina (2018)). The granule cells and Golgi cells are the two major components of the cerebellar granular layer. Even though the major outputs of the granular layer are conveyed by the parallel fibers of the granule cells, the Golgi cells are also thought to play the central role in forming the representation because of the lack of direct recurrent connections within the granule cells (Marr (1969); Albus (1971); Raymond & Medina (2018)).

It has long been known that the cerebellum plays a crucial role in motor learning, which requires execution of sequential movements with temporally precise timing (Ito (1984)). For example, a vast amount of experimental studies have characterized the essential information flow in the cerebellum that supports a motor learning paradigm called classical eyeblink conditioning (Thompson (2005)). In a typical eyeblink conditioning experiment, the animal is repetitively exposed to paired presentations of a tone stimulus and a periorbital air puff stimulus separated by a fixed interval (typically 250 ms). After learning occurs, the animal acquires a temporally precise motor response (eyeblink) to the tone. The eyeblink response is precisely timed at the air puff onset with millisecond precision. It is also known that an interval discrimination task can be learned in eyeblink conditioning: animals can learn to elicit eyeblink responses at different latencies to different tone stimuli (Kehoe et al. (1993); Green & Steinmetz (2005)).
A vast amount of evidence supports the view that the cerebellum acquires the desired map to return a specific spatiotemporal pattern to a specific input. To explain this computation of the cerebellum, Buonomano & Mauk (1994) proposed a model of the granular layer consisting of sparsely, reciprocally connected granule cells and Golgi cells that is capable of representing the passage of time from the onset of an external sensory stimulus. In their model, mossy fiber excitation conveying the information of an external tone stimulus elicits activity of the granule cells and the Golgi cells, with different sub-populations activated at different times. As a result, a specific sub-population of the granule cells is activated at a specific time from the onset of the stimulus, thereby representing the passage of time. This model successfully explains important aspects of the behavioral and physiological traits of eyeblink conditioning.

Yamazaki & Tanaka (2007) extended Buonomano's model and proposed the view that the cerebellum is a liquid state machine, a type of reservoir machine (Jaeger (2001); Maass et al. (2002)). In the reservoir computing framework, the input signals to the system project to a recurrent network called the reservoir that has highly nonlinear dynamics in a high-dimensional space, and only the readout connections from this reservoir are modified to give the desired output signals. In Yamazaki's model, the general computational role of the cerebellum is to acquire a map between spatiotemporal input patterns and desired spatiotemporal output patterns. This is a natural elaboration of the classical Marr-Albus-Ito model regarding the cerebellum as a supervised machine, in that the reservoir machine can process spatiotemporal patterns. The granular layer works as the reservoir, and long-term depression (LTD) of the parallel fiber-Purkinje cell connection works as the learning rule. The Purkinje cell corresponds to the output neuron of the reservoir.
They showed that a network of random recurrent connections between granule and Golgi cells realizes temporally specific activation of different sub-populations of granule cells in response to external inputs. Their model successfully acquires a function that maps specific different inputs to specific different temporal patterns. To date, several studies of cerebellar function within the reservoir computing framework have been conducted (Yamazaki & Nagao (2012); Rössert et al. (2015)). However, even though these theories assume an inevitable functional role of the Golgi cells in realizing the reservoir, to our knowledge, no study has focused on the functional role of the massive gap junctions between the Golgi cells. Considering the facts that chaotic activity is related to the performance of the reservoir (Jaeger (2001); Bertschinger & Natschläger (2004); Natschläger et al. (2005); Sussillo & Abbott (2009); Yildiz et al. (2012); Laje & Buonomano (2013)) and that gap junctions often induce chaotic activity (Yamada & Kuramoto (1976); Fujii & Tsuda (2004); Tsuda et al. (2004); Schweighofer et al. (2004); Katori et al. (2010); Tokuda et al. (2010); Tadokoro et al. (2011); Tokuda et al. (2019)), it is necessary to elucidate how the gap junctions affect the computational performance of the granular layer as the reservoir, especially in terms of the effect of the chaotic dynamics they may produce.
[Figure 1. Panel labels: (a), (b) Reservoir; Input (mossy fiber); Output (Purkinje cell, read out via parallel fibers with weights W out); Instructional signal (climbing fiber); legend: excitatory and inhibitory connections, Golgi cell, granule cell.]
Figure 1: Two network architectures of the reservoir machine considered in the current study. The assumed corresponding anatomical structures of the cerebellum are also indicated. (a) The Golgi cells coupled with gap junctions are the reservoir. Other structures in the granular layer, such as the granule cells and chemical synapses, are omitted in order to focus on studying the computational performance of neurons with gap junctions acting as the reservoir. (b) The model incorporating the granule cells, the excitatory projections from the granule cells to the Golgi cells, and the inhibitory projections from the Golgi cells to the granule cells.
2. Methods
Figure 1 shows the schematic diagrams of the two network architectures of the reservoir machines studied in the current study. We take the view that the cerebellum is a reservoir machine (Yamazaki & Tanaka (2007)). In this view, the granular layer of the cerebellum works as the reservoir, the mossy fiber projecting to the granular layer is the input, and the Purkinje cell is the output neuron. Learning via synaptic modification is realized by changing the connection strengths of the parallel fibers, which are the readout connections. The reservoir maps an incoming input into a high-dimensional time series by nonlinear dynamics. Figure 1(a) shows a simple reservoir machine composed of only Golgi cells that are mutually connected with diffusion coupling through gap junctions. In this study, we mainly focus on this model to evaluate the computational performance of diffusion-induced chaos as a reservoir machine. To confirm whether the results obtained with this model can also be observed in a more realistic situation, we perform a simulation incorporating the granule cells as well (Fig. 1(b)). This model incorporates the other major components of the granular layer: the granule cells, the excitatory projections from the granule cells to the Golgi cells, and the inhibitory projections from the Golgi cells to the granule cells. The readout projection to the Purkinje cell originates from the granule cells, as in the actual cerebellar anatomical structure. Note that we do not incorporate a feedback loop from the reservoir output to the input in this study.
A cerebellar Golgi cell is known to have the following properties: (1) it shows periodic activity in vitro (Forti et al. (2006); Solinas et al. (2007); Vervaeke et al. (2010)), (2) its spike frequency increases as external input increases (Forti et al. (2006); Solinas et al. (2007)), and (3) diffusion coupling induces desynchronization of neighboring cells (Vervaeke et al. (2010)). We model the Golgi cells with the µ-model, a simple model described by a two-dimensional ordinary differential equation (Tsuda et al. (2004)). The µ-model is a class 1 neuron model that shows spiking activity after a saddle-node bifurcation occurs as the tonic external input increases. This model shows periodic activity under isolated conditions with a tonic input, increases its spike frequency with increasing tonic input, and shows aperiodic activity when coupled with diffusion (Tsuda et al. (2004); Katori et al. (2010); Tadokoro et al. (2011); Tokuda et al. (2013); Tokuda et al. (2019)). The simple model shown in Fig. 1(a) is described as follows:
\[
\frac{dV^{\mathrm{go}}_i}{dt} = -R^{\mathrm{go}}_i - \mu (V^{\mathrm{go}}_i)^2 \left( V^{\mathrm{go}}_i - \frac{3}{2} \right) + J^{\mathrm{go}}_i + I^{\mathrm{go,tonic}} + I^{\mathrm{go,input}}_i, \quad (1)
\]
\[
\frac{dR^{\mathrm{go}}_i}{dt} = -R^{\mathrm{go}}_i - \mu (V^{\mathrm{go}}_i)^2, \quad (2)
\]
\[
J^{\mathrm{go}}_i = \begin{cases} g_{\mathrm{GJ}} (V^{\mathrm{go}}_2 - V^{\mathrm{go}}_1) & (i = 1), \\ g_{\mathrm{GJ}} (V^{\mathrm{go}}_{i+1} + V^{\mathrm{go}}_{i-1} - 2 V^{\mathrm{go}}_i) & (i = 2, \ldots, N^{\mathrm{go}} - 1), \\ g_{\mathrm{GJ}} (V^{\mathrm{go}}_{N^{\mathrm{go}}-1} - V^{\mathrm{go}}_{N^{\mathrm{go}}}) & (i = N^{\mathrm{go}}), \end{cases} \quad (3)
\]
where V^go_i is the membrane potential of the i-th Golgi cell, R^go_i is the recovery variable representing the ion channel activity of the i-th Golgi cell, µ is the parameter of the µ-model, I^go,tonic is the common tonic input to all cells, J^go_i is the diffusion current into the i-th Golgi cell from the neighboring cells conducted through the gap junctions, g_GJ is the conductance of a gap junction, I^go,input_i is the input signal to the reservoir described below, and N^go is the number of Golgi cells. The model differs from those of the former studies (Fujii & Tsuda (2004); Tsuda et al. (2004); Tadokoro et al. (2011)) only in that the input signal I^go,input_i is incorporated in Eq. (1). In the µ-model, both the units of time and the variables are arbitrary. We use milliseconds as the unit of time for convenience. We use the parameters µ = 1. and g_GJ = 0.08 (a typical parameter set showing chaotic dynamics), except in Figs. 4 and 5, where the dependency of the dynamics on these parameters is studied, and in Figs. 3 and 9, where g_GJ = 0 is also used. For the parameter I^go,tonic, we use I^go,tonic = 0.004 except in the simulations shown in Fig. 2(b) and (c).

We restrict the form of the input signal I^go,input ∈ R^{N^go} to an instantaneous pulse as follows:
\[
I^{\mathrm{go,input}} = x_{\mathrm{in}} \, \delta(t - t_0), \quad (4)
\]
where δ(t) is the Dirac delta function, x_in = (x_in,i) ∈ R^{N^go} is the N^go-dimensional vector representing the amplitude of the input, and t_0 is the time when the input is given to the reservoir. Practically, giving an input pulse is done by setting V^go → V^go + x_in at time t_0, where V^go = (V^go_i) ∈ R^{N^go} is the vector representation of the membrane potentials. In Sec. 3.4, a series of two different pulses is given to the system. In this case, the input signal I^go,input is described as follows:
\[
I^{\mathrm{go,input}} = x_{\mathrm{in},1} \, \delta(t - t_1) + x_{\mathrm{in},2} \, \delta(t - t_2), \quad (5)
\]
where t_1 and t_2 are the times when the input pulses are given to the reservoir.

The model incorporating the granule cells shown in Fig. 1(b) is described as follows:
\[
\frac{dV^{\mathrm{gr}}_i}{dt} = -R^{\mathrm{gr}}_i - \mu (V^{\mathrm{gr}}_i)^2 \left( V^{\mathrm{gr}}_i - \frac{3}{2} \right) + I^{\mathrm{gr,tonic}} + I^{\mathrm{ei}}_i + I^{\mathrm{gr,input}}_i, \quad (6)
\]
\[
\frac{dR^{\mathrm{gr}}_i}{dt} = -R^{\mathrm{gr}}_i - \mu (V^{\mathrm{gr}}_i)^2, \quad (7)
\]
\[
\frac{dV^{\mathrm{go}}_i}{dt} = -R^{\mathrm{go}}_i - \mu (V^{\mathrm{go}}_i)^2 \left( V^{\mathrm{go}}_i - \frac{3}{2} \right) + J^{\mathrm{go}}_i + I^{\mathrm{go,tonic}} + I^{\mathrm{ie}}_i + I^{\mathrm{go,input}}_i, \quad (8)
\]
\[
\frac{dR^{\mathrm{go}}_i}{dt} = -R^{\mathrm{go}}_i - \mu (V^{\mathrm{go}}_i)^2, \quad (9)
\]
where V^gr_i is the membrane potential of the i-th granule cell, R^gr_i is the recovery variable representing the ion channel activity of the i-th granule cell, I^gr,tonic is the common tonic input to the granule cells, and I^gr,input_i is the input signal to the reservoir projecting to the i-th granule cell. The diffusion currents are described as follows:
\[
J^{\mathrm{go}}_i = \begin{cases} g_{\mathrm{GJ}} (V^{\mathrm{go}}_2 - V^{\mathrm{go}}_1) & (i = 1), \\ g_{\mathrm{GJ}} (V^{\mathrm{go}}_{i+1} + V^{\mathrm{go}}_{i-1} - 2 V^{\mathrm{go}}_i) & (i = 2, \ldots, N^{\mathrm{go}} - 1), \\ g_{\mathrm{GJ}} (V^{\mathrm{go}}_{N^{\mathrm{go}}-1} - V^{\mathrm{go}}_{N^{\mathrm{go}}}) & (i = N^{\mathrm{go}}). \end{cases} \quad (10)
\]
The currents I^ei_i and I^ie_i are caused by chemical synapses, representing the inhibition of the i-th granule cell by the Golgi cells and the excitation of the i-th Golgi cell by the granule cells, respectively, and are described by the following equations:
\[
I^{\mathrm{ei}}_i = \sum_{j=1}^{N^{\mathrm{go}}} w^{\mathrm{ei}}_{ij} f(V^{\mathrm{go}}_j - \theta), \quad (11)
\]
\[
I^{\mathrm{ie}}_i = \sum_{j=1}^{N^{\mathrm{gr}}} w^{\mathrm{ie}}_{ij} f(V^{\mathrm{gr}}_j - \theta), \quad (12)
\]
where w^ei_ij is the strength of the synaptic connection from the j-th Golgi cell to the i-th granule cell, w^ie_ij is the strength of the synaptic connection from the j-th granule cell to the i-th Golgi cell, θ is a parameter defining the threshold above which each neuron can be regarded as emitting a spike, and f(V) is an activation function. In this study, we use θ = 0. and a steep sigmoidal form for f(V). The synaptic strengths are determined using the following procedure. For each granule cell i, n^ei presynaptic Golgi cells are randomly chosen. Then, the strength is set at w^ei_ij = c^ei / n^ei if the j-th Golgi cell is in the chosen group, and at w^ei_ij = 0 if it is not. Similarly, for each Golgi cell i, n^ie presynaptic granule cells are randomly chosen.
Then, the strength is set at w^ie_ij = c^ie / n^ie if the j-th granule cell is in the chosen group, and at w^ie_ij = 0 if it is not. The parameter values used are µ = 1. , g_GJ ∈ {0, 0.08}, N^gr = 10, N^go = 10, n^ei = 4, n^ie = 10, c^ie = −c^ei = 0.2. These parameters are determined such that the ratio of the numbers of granule cells and Golgi cells, N^gr/N^go, and the numbers of synapses each neuron receives, n^ei and n^ie, are compatible with those described in the former study (Sudhakar et al. (2017)).

Neurons in a specific subset of the reservoir neurons are connected to the outputs; we refer to these as the projecting neurons hereafter. Let V^out_i be the membrane potential of the i-th projecting neuron. In the simple model shown in Fig. 1(a), all the Golgi cells are the projecting neurons; thus, V^out_i = V^go_i. In the model shown in Fig. 1(b), all (and only) the granule cells are the projecting neurons; thus, V^out_i = V^gr_i. The i-th output of the reservoir, y_i, is defined as follows:
\[
y_i = \sum_{j=1}^{N^{\mathrm{out}}} w^{\mathrm{out}}_{ij} V^{\mathrm{out}}_j, \quad (13)
\]
where N^out is the number of projecting neurons in the reservoir, and w^out_ij is the synaptic weight of the connection from the j-th projecting neuron in the reservoir to the i-th output y_i. The vector y = (y_i) ∈ R^{N_y} defines the instantaneous output of the reservoir machine, where N_y is the number of outputs (the number of Purkinje cells considered). The output synaptic weight w^out_ij is time independent, and its value is modified only in the batch learning procedure described below.

Let y^target ∈ R^{N_y} be the target pattern, an N_y-dimensional time series defined over a time interval [t_0, t_0 + T^train], where T^train is the length of the time series. We determine the readout weight matrix W^out = [w^out_ij] ∈ R^{N_y × N^out} to minimize the following residual value:
\[
\int_{t_0}^{t_0 + T^{\mathrm{train}}} \left| y^{\mathrm{target}} - y \right|^2 dt = \int_{t_0}^{t_0 + T^{\mathrm{train}}} \left| y^{\mathrm{target}} - W^{\mathrm{out}} V^{\mathrm{out}} \right|^2 dt, \quad (14)
\]
where |x| is the Euclidean norm of a vector x. In practice, this is conducted by sampling both the output vectors and the target vectors with a small sampling interval ∆.
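For concreteness, the coupled chain of Eqs. (1)-(3) with a pulse input of the form of Eq. (4) can be integrated with a simple explicit scheme. The sketch below is illustrative only: the Euler step size, the quadratic form of the nonlinearity, and the digits of µ are assumptions filled in where the source text is truncated, and do not reproduce the authors' exact code (which used Matlab's ode45).

```python
import numpy as np

def mu_model_chain_step(V, R, mu, g_gj, I_tonic, dt):
    """One explicit-Euler step of the gap-junction-coupled chain, Eqs. (1)-(3).

    V, R : arrays of membrane potentials and recovery variables (length N).
    The diffusion current J uses nearest-neighbor coupling with open-chain
    boundaries, as in Eq. (3).
    """
    # Diffusion current through gap junctions (Eq. 3).
    J = np.zeros_like(V)
    J[0] = g_gj * (V[1] - V[0])
    J[1:-1] = g_gj * (V[2:] + V[:-2] - 2.0 * V[1:-1])
    J[-1] = g_gj * (V[-2] - V[-1])
    # mu-model right-hand sides (Eqs. 1-2); the squared nonlinearity and the
    # value mu = 1.0 are assumptions reconstructed from the garbled source.
    dV = -R - mu * V**2 * (V - 1.5) + J + I_tonic
    dR = -R - mu * V**2
    return V + dt * dV, R + dt * dR

# Example: N = 100 neurons, instantaneous pulse x_in = 0.2 e_50 (Eq. 4)
# applied to the center neuron of an otherwise synchronized resting state.
N, mu, g_gj, I_tonic, dt = 100, 1.0, 0.08, 0.004, 0.01
V = np.zeros(N)
R = np.zeros(N)
V[N // 2] += 0.2
for _ in range(5000):  # integrate 50 units of model time
    V, R = mu_model_chain_step(V, R, mu, g_gj, I_tonic, dt)
```

An explicit Euler step is used here only for brevity; the paper integrates the same equations with an adaptive solver, so quantitative trajectories will differ.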
Let Ω ∈ R^{N^out × (K+1)} be the discretized time series of the numerically integrated membrane potentials of the reservoir dynamics over the time interval [t_0, t_0 + T^train]:
\[
\Omega = \left( V^{\mathrm{out}}(t_0), V^{\mathrm{out}}(t_0 + \Delta), V^{\mathrm{out}}(t_0 + 2\Delta), \cdots, V^{\mathrm{out}}(t_0 + K\Delta) \right), \quad (15)
\]
where K is the natural number satisfying K∆ ≤ T^train < (K+1)∆, and V^out(t) = (V^out_i(t)) ∈ R^{N^out} is the vector of instantaneous membrane potentials of the projecting neurons. The target pattern matrix Y^target is defined by the same discretization:
\[
Y^{\mathrm{target}} = \left( y^{\mathrm{target}}(t_0), y^{\mathrm{target}}(t_0 + \Delta), y^{\mathrm{target}}(t_0 + 2\Delta), \cdots, y^{\mathrm{target}}(t_0 + K\Delta) \right). \quad (16)
\]
Then, the optimal readout weight matrix Ŵ^out is obtained by solving the following linear least squares regression:
\[
\widehat{W}^{\mathrm{out}} = \arg\min_{W^{\mathrm{out}}} \left| Y^{\mathrm{target}} - W^{\mathrm{out}} \Omega \right|_{\mathrm{fro}}^2, \quad (17)
\]
where |W|_fro is the Frobenius norm of a matrix W.

2.4. Evaluation of model performance

In order to evaluate the performance of the model, we use the root mean square error normalized by that of the target pattern (nRMSE):
\[
\mathrm{nRMSE} = \left( \frac{ \int_{t_0}^{t_0 + T^{\mathrm{train}}} \left| y^{\mathrm{target}} - \widehat{W}^{\mathrm{out}} V^{\mathrm{out}} \right|^2 dt }{ \int_{t_0}^{t_0 + T^{\mathrm{train}}} \left| y^{\mathrm{target}} \right|^2 dt } \right)^{1/2}. \quad (18)
\]
With the discretized time series, the nRMSE is calculated as follows:
\[
\mathrm{nRMSE} = \frac{ \left| Y^{\mathrm{target}} - \widehat{W}^{\mathrm{out}} \Omega \right|_{\mathrm{fro}} }{ \left| Y^{\mathrm{target}} \right|_{\mathrm{fro}} }. \quad (19)
\]

For the simple model (Fig. 1(a)) under no dynamical input, we characterize the strength of the chaotic activity of the reservoir with the Lyapunov dimension (Kaplan & Yorke (1979)). First, we calculate the Lyapunov spectrum by the standard method with continuous Gram-Schmidt orthonormalization of the fundamental solutions of the linearized differential equation along the trajectory (Shimada & Nagashima (1979)). Then, let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_{N^go} be the Lyapunov exponents of the reservoir dynamics, and let k be the maximal value of j such that Σ_{i=1}^{j} λ_i ≥ 0. The Lyapunov dimension of the system is defined as follows:
\[
D_L = k + \frac{ \sum_{i=1}^{k} \lambda_i }{ \left| \lambda_{k+1} \right| }. \quad (20)
\]
In this study, the Lyapunov dimension of the system is calculated under no external input, where the system can be regarded as an autonomous dynamical system. Because we restrict the form of the input to the system to an instantaneous pulse (Eq. (4)), the property of the reservoir without any external input characterizes the system's response to the external input.

It is commonly assumed that the cerebellar granular layer is able to exhibit activity specific to the passage of time in response to an input (Buonomano & Mauk (1994); Yamazaki & Tanaka (2007)). In order to evaluate the specificity of the instantaneous activity of the reservoir with respect to the passage of time and the external input, we use a similarity index between different states of the reservoir. Let V^{out,(1)}(t) and V^{out,(2)}(t) be two different time series of the projecting neurons of the reservoir generated with two different external inputs. We use the following correlation function between the instantaneous values of V^{out,(1)}(t_1) and V^{out,(2)}(t_2) at two time points t_1 and t_2, which we call the similarity index:
\[
C^{(1),(2)}(t_1, t_2) = \frac{ \sum_{i=1}^{N^{\mathrm{out}}} \left( V^{\mathrm{out},(1)}_i(t_1) - \langle V^{\mathrm{out},(1)}(t_1) \rangle \right) \left( V^{\mathrm{out},(2)}_i(t_2) - \langle V^{\mathrm{out},(2)}(t_2) \rangle \right) }{ \sqrt{ \sum_{i=1}^{N^{\mathrm{out}}} \left( V^{\mathrm{out},(1)}_i(t_1) - \langle V^{\mathrm{out},(1)}(t_1) \rangle \right)^2 } \sqrt{ \sum_{i=1}^{N^{\mathrm{out}}} \left( V^{\mathrm{out},(2)}_i(t_2) - \langle V^{\mathrm{out},(2)}(t_2) \rangle \right)^2 } }, \quad (21)
\]
where ⟨V^{out,(1)}(t_1)⟩ is the mean membrane potential at time t_1:
\[
\langle V^{\mathrm{out},(1)}(t_1) \rangle = \frac{1}{N^{\mathrm{out}}} \sum_{i=1}^{N^{\mathrm{out}}} V^{\mathrm{out},(1)}_i(t_1). \quad (22)
\]

2.7. Numerical calculation

The numerical simulations were conducted using the ode45 function in Matlab R2019a (MathWorks Inc., Natick, MA, USA).
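The batch readout fitting of Eqs. (17)-(19) is an ordinary linear least-squares problem. A minimal sketch follows; the function names are illustrative, and a random surrogate matrix stands in for the simulated reservoir states Ω.

```python
import numpy as np

def fit_readout(Omega, Y_target):
    """Solve Eq. (17): W_out = argmin || Y_target - W_out @ Omega ||_fro.

    Omega    : (N_out, K+1) matrix of sampled reservoir states, Eq. (15).
    Y_target : (N_y, K+1) matrix of sampled target outputs, Eq. (16).
    Returns the (N_y, N_out) readout matrix (cf. Matlab's mldivide).
    """
    # Least squares for W @ Omega ~ Y  <=>  Omega.T @ W.T ~ Y.T.
    W_T, *_ = np.linalg.lstsq(Omega.T, Y_target.T, rcond=None)
    return W_T.T

def nrmse(Omega, Y_target, W_out):
    """Discretized nRMSE, Eq. (19): ratio of Frobenius norms."""
    num = np.linalg.norm(Y_target - W_out @ Omega)
    den = np.linalg.norm(Y_target)
    return num / den

# Toy check with a surrogate "reservoir": 50 random state traces and one
# target that lies exactly in their span, so the nRMSE should be near zero.
rng = np.random.default_rng(0)
Omega = rng.standard_normal((50, 500))
w_true = rng.standard_normal((1, 50))
Y = w_true @ Omega
W = fit_readout(Omega, Y)
err = nrmse(Omega, Y, W)
```

Because only this linear readout is trained while the reservoir itself stays fixed, the whole learning step reduces to one matrix solve.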
The obtained time series of the reservoir states were further discretized with time step ∆ = 0.1 using the interp1 function (Eq. (15)). The learning procedure to obtain the optimal readout weight matrix Ŵ^out was conducted by solving the linear least squares regression in Eq. (17) using the Matlab function mldivide.
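The Kaplan-Yorke construction of Eq. (20) is straightforward to compute once a sorted Lyapunov spectrum is available. A minimal sketch (the function name and the example spectrum are illustrative, not taken from the paper):

```python
import numpy as np

def lyapunov_dimension(spectrum):
    """Kaplan-Yorke (Lyapunov) dimension, Eq. (20).

    spectrum : Lyapunov exponents, sorted here into descending order.
    Returns k + (sum of the first k exponents) / |lambda_{k+1}|, where k is
    the largest j whose partial sum is non-negative, or 0.0 if none is.
    """
    lam = np.sort(np.asarray(spectrum, dtype=float))[::-1]
    partial = np.cumsum(lam)
    nonneg = np.where(partial >= 0.0)[0]
    if nonneg.size == 0:
        return 0.0                       # no expanding directions at all
    k = nonneg[-1] + 1                   # count of exponents in the sum
    if k == lam.size:                    # degenerate: every partial sum >= 0
        return float(k)
    return k + partial[k - 1] / abs(lam[k])

# Example: a Lorenz-like spectrum (1.0, 0.0, -2.0) gives D_L = 2 + 1.0/2.0 = 2.5.
d = lyapunov_dimension([1.0, 0.0, -2.0])
```

A non-integer D_L above 2, as in this example, is the usual signature of a chaotic attractor; for a non-chaotic system the partial sums never become positive and the dimension estimate collapses.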
3. Results

[Figure 2. Axis labels: frequency (s⁻¹), time (ms); panels (a)-(f); variable V^go.]
Figure 2: The chaotic dynamics induced by gap junctions in the simple reservoir consisting of a one-dimensional chain (Eqs. (1)-(3), Fig. 1(a)). (a) The dynamics of the model with a small positive tonic input I^go,tonic; the parameter values are N^go = 100, µ = 1. , g_GJ = 0. . (b), (c) The dynamics under a negative tonic input, with N^go = 100 in (b) and N^go = 1000 in (c). (d) The time evolution of the membrane potential of the 50th neuron in (a). (e) Periodic activity of an isolated µ-model neuron under a small positive tonic input. (f) The distribution of interspike intervals; the bin width is 1 ms.

First, we show that introducing diffusion coupling causes chaotic dynamics, which in turn results in a broad distribution of the interspike intervals (ISI) of a neuron in the model. Figure 2 shows examples of the evolution of the simple reservoir (Fig. 1(a)) described by Eqs. (1)-(3). Namely, the model network is a one-dimensional chain of neurons coupled to their nearest neighbors with gap junctions and does not contain any connections via chemical synapses. Figure 2(a) illustrates an evolution of the membrane potentials (N^go = 100) under a positive tonic input. At t = 0, a pulse is given to the all-synchronized state with x_in in Eq. (4) set as x_in = 0.2 e_50, where e_50 is the 50th standard basis vector. Namely, the membrane potential of the neuron in the center (50th neuron) is set as V^go_50 → V^go_50 + 0.2. The network shows chaotic activity induced by diffusion coupling, as reported previously (Tsuda et al. (2004); Tadokoro et al. (2011)). Periodic synchronous activity is observed before the propagation of chaotic dynamics is elicited by the external perturbation. The behavior of the cell in the center of the network is shown in Fig. 2(d). The activity is quite different from that of an isolated µ-model neuron with the same parameters, as shown in Fig. 2(e). The µ-model shows a saddle-node bifurcation at I^go,tonic = 0, and it has a periodic spiking activity for I^go,tonic > 0. The distribution of the ISIs is calculated from a long time series (Fig. 2(f)) by regarding the neuron as emitting a spike when its membrane potential crosses the threshold θ. The dynamics under a negative tonic input (N^go = 1000) shown in Fig. 2(c) clearly depicts the self-similarity of the spatiotemporal pattern. Namely, the spatiotemporal pattern has no characteristic scale. The spatiotemporal pattern with a positive tonic input (Fig. 2(a)) could be interpreted as a ruined pattern of the Sierpinski gasket (Fig. 2(b)), thus inheriting the fractal property of scale invariance over multiple time scales (i.e., a broad distribution of ISIs).

Next, we evaluate the effect of introducing gap junctions on the expressivity of the reservoir. More specifically, we examine how closely the model can output a sinusoidal temporal pattern with various temporal frequencies. Namely, we use the following sinusoidal wave (a scalar function of time t) as the target pattern y^target described in Eq. (14):
\[
f(t; T^{\mathrm{wave}}) = \sin\left( \frac{2\pi t}{T^{\mathrm{wave}}} \right), \quad (23)
\]
where T^wave is the period of the sinusoidal wave. We quantify the dependency of the following nRMSE value on the period of the target sinusoidal wave, T^wave, and on the model parameters:
\[
\mathrm{nRMSE}(T^{\mathrm{wave}}) = \left( \frac{ \mathrm{RSS}(T^{\mathrm{wave}}) }{ \int_{t_0}^{t_0 + T^{\mathrm{train}}} \left| f(t; T^{\mathrm{wave}}) \right|^2 dt } \right)^{1/2}, \quad (24)
\]
where RSS(T^wave) is the value of the objective function described in Eq.
(14), which depends on T^wave. The spectrum of the value nRMSE(T^wave) over various T^wave gives a way to evaluate the model's ability to output a more general complex temporal pattern that contains various frequency components. Suppose there is a target pattern y^target that is composed of a linear superposition of various sinusoidal waves as follows:
\[
y^{\mathrm{target}} = \sum_{k=1}^{N^{\mathrm{wave}}} c_k f(t; T^{\mathrm{wave}}_k), \quad (25)
\]
where N^wave is the number of different sinusoidal waves that compose the target pattern, and T^wave_k is the period of the k-th sinusoidal pattern. Let nRMSE^target be the optimal nRMSE value for y^target that minimizes the objective function described in Eq. (14). Then, from the triangle inequality, the following inequality holds:
\[
\mathrm{nRMSE}^{\mathrm{target}} \le \sum_{k=1}^{N^{\mathrm{wave}}} c_k \, \mathrm{nRMSE}(T^{\mathrm{wave}}_k). \quad (26)
\]
Thus, the right-hand side of Eq. (26) gives an upper limit of the nRMSE value for the target pattern.

[Figure 3. Panel labels: (a), (b) model outputs vs. time (ms) for T^wave = 1, 2, 3, 10, 40, 100; (c) nRMSE vs. T^wave (ms), with and without gap junctions.]
Figure 3: Evaluation of the expressivity in the frequency domain. (a) The readout weights are determined separately for each target sinusoidal pattern with a different period, T^wave. Each dashed line indicates the target sinusoidal pattern and each solid line indicates the fitted output of the model. The parameters are T^train = 500 ms, µ = 1. , g_GJ = 0. , N^go = 500, I^go,tonic = 0. , ∆ = 0.1. Note the finite spike width of a neuron. Only the intervals t = [0, 50] are shown for T^wave ≤ 3, though fitting is done for t = [0, 500]. (b) Without diffusion through the gap junctions (g_GJ = 0). Each solid line indicates the fitted output of the model. (c) The nRMSE dependency on the period of the target pattern, with gap junctions (g_GJ = 0.08) (solid line) and without gap junctions (g_GJ = 0) (dashed line).

To evaluate nRMSE(T^wave) (Eq. (24)), the following analysis is conducted. First, a time evolution over [t_0, t_0 + T^train] of the reservoir with a random initial input at t = 0 is numerically generated. Then, we use various sinusoidal waves (scalar functions of time) over the same time interval [t_0, t_0 + T^train] as the target patterns. The readout connection is fitted to match each sinusoidal wave separately by linear least squares. The dashed lines in Fig. 3(a) indicate the target sinusoidal patterns and the solid lines indicate the fitted outputs of the model. The difference between the target patterns and the model outputs is visually not detectable when T^wave is sufficiently large. We then repeat the analysis without gap junctions, with two modifications. First, the gap junction conductance is changed from g_GJ = 0.08 to g_GJ = 0. Second, the initial state of the reservoir is generated by randomly shuffling the phases of the neurons. This is because, with g_GJ = 0, all neurons behave as parallel isolated neurons with a shared identical period; thus, the distribution of the phases of the neurons is time invariant, and a biased distribution of the phases would be disadvantageous in producing the sinusoidal target patterns. The precision of the model output drastically decreases compared to the condition with diffusion coupling through gap junctions (Fig. 3(b)). The nRMSE values for both conditions are shown in Fig. 3(c). The nRMSE takes very small values over a wide range of the period of the target pattern when the gap junctions are incorporated in the model, whereas the nRMSE takes small values only in some specific range of the period if the model lacks the gap junctions. This result suggests that incorporating diffusion coupling in a reservoir enhances the expressivity of the output that the network can generate.

Figures 4 and 5 illustrate the parameter dependency of the nRMSE and the Lyapunov dimension. The color map in Fig. 4(a) depicts the dependency of the nRMSE on the strength of gap junction coupling, g_GJ/N, and the target wave period T^wave. Figure 4(b) illustrates the nRMSE dependence on g_GJ/N when the target pattern period is T^wave = 100 ms. We employ g_GJ/N rather than g_GJ because different network models with different sizes, but the same value of g_GJ/N, behave qualitatively the same (Tadokoro et al. (2011)). This is because the model described by Eqs. (1)-(3) can be regarded as a discretization of a partial differential equation describing a continuous one-dimensional excitable medium. Models with different network sizes, N^go, and the same value of g_GJ/N correspond to discretizations of the same partial differential equation with different spatial resolutions.
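The evaluation of nRMSE(T^wave) described above, i.e., fitting one linear least-squares readout per target period and normalizing the residual as in Eq. (24), can be sketched as follows. The surrogate multi-frequency traces below merely stand in for the simulated membrane potentials; all names and parameter values are illustrative, not the authors' settings.

```python
import numpy as np

def nrmse_spectrum(states, dt, periods):
    """Evaluate nRMSE(T_wave), Eq. (24), for sinusoidal targets, Eq. (23).

    states  : (N_out, K) sampled reservoir trajectories (surrogate here).
    dt      : sampling interval Delta in ms.
    periods : iterable of target periods T_wave in ms.
    A separate linear least-squares readout is fitted for every period.
    """
    t = np.arange(states.shape[1]) * dt
    out = []
    for T in periods:
        y = np.sin(2.0 * np.pi * t / T)                 # target, Eq. (23)
        w, *_ = np.linalg.lstsq(states.T, y, rcond=None)
        resid = y - states.T @ w
        out.append(np.linalg.norm(resid) / np.linalg.norm(y))
    return np.array(out)

# Surrogate reservoir: 100 traces mixing random frequencies, 500 ms at
# dt = 0.1 ms, loosely imitating a broadband chaotic state ensemble.
rng = np.random.default_rng(1)
t = np.arange(0.0, 500.0, 0.1)
freqs = rng.uniform(0.005, 0.5, size=(100, 1))          # cycles per ms
phases = rng.uniform(0.0, 2.0 * np.pi, size=(100, 1))
states = np.sin(2.0 * np.pi * freqs * t + phases)
errs = nrmse_spectrum(states, 0.1, [2.0, 10.0, 100.0])
```

Since the zero readout is always admissible, each nRMSE value lies between 0 and 1; sweeping `periods` densely yields the kind of expressivity spectrum plotted in Fig. 3(c).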
Equivalently, different models with the same network size N and different g GJ values show spatiotemporal patterns with spatial scales proportional to 1/√g GJ. As illustrated in Figs. 4(a) and (b), the nRMSE takes small values over a wide but specific range of g GJ. At the same time, the Lyapunov dimension, D L, takes large values over the same range of g GJ (Fig. 4(c)). The Lyapunov dimension characterizes the strength of chaos and represents the degrees of freedom of the dynamics (Kaplan & Yorke (1979)). Figure 5 illustrates the nRMSE dependency on the parameter µ. Chaotic activity induced by diffusion coupling appears in a specific range of µ, as shown by the positive Lyapunov dimension in Fig. 5(c). As in the case of the dependency of the nRMSE on g GJ (Fig. 4), the nRMSE takes very small values when the dynamics is chaotic and the Lyapunov dimension is large. These results, showing an inverse correlation between the Lyapunov dimension and the nRMSE, suggest that diffusion-induced chaotic dynamics enhances the complexity of the representation in the reservoir.

Figure 4: The dependency of the nRMSE on the period of the target pattern, T wave, and the strength of the gap junction, g GJ /N. (a) The nRMSE over various values of T wave and g GJ /N. The parameter values are N go = 500, T train = 500 ms, µ = 1. , I go,tonic = 0. , ∆ = 0.1. (b) The nRMSE value for a target pattern with T wave = 100 ms plotted against g GJ /N. (c) The Lyapunov dimension, D L, plotted against g GJ /N.
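The Lyapunov dimension plotted in Figs. 4(c) and 5(c) is the Kaplan–Yorke dimension, D L = k + (λ1 + … + λk)/|λ(k+1)|, where the exponents are sorted in descending order and k is the largest index whose cumulative sum is still non-negative (Kaplan & Yorke (1979)). A minimal sketch follows; the function name is ours, and the spectrum itself must come from a separate numerical estimation (e.g., the method of Shimada & Nagashima (1979)).

```python
def lyapunov_dimension(spectrum):
    """Kaplan-Yorke (Lyapunov) dimension from a Lyapunov spectrum.

    D_L = k + (lambda_1 + ... + lambda_k) / |lambda_{k+1}|, with exponents
    sorted in descending order and k the largest index whose cumulative
    sum remains non-negative.
    """
    exps = sorted(spectrum, reverse=True)
    cumsum = 0.0
    k = 0
    for lam in exps:
        if cumsum + lam < 0.0:
            break
        cumsum += lam
        k += 1
    if k == 0:
        return 0.0        # all exponents negative: a stable fixed point
    if k == len(exps):
        return float(k)   # cumulative sum never becomes negative
    return k + cumsum / abs(exps[k])
```

For a spectrum (0.5, 0.0, -1.0) this gives k = 2 and D L = 2 + 0.5/1.0 = 2.5; a positive D L larger than the dimension of any periodic orbit signals chaos.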
Figure 5: The dependency of the nRMSE on the period of the target pattern, T wave, and the parameter µ. (a) The nRMSE over various values of T wave and µ. The parameter values are N go = 500, T train = 500 ms, g GJ = 0. , I go,tonic = 0. , ∆ = 0.1. (b) The nRMSE value for a target pattern with T wave = 100 ms plotted against µ. (c) The Lyapunov dimension plotted against µ.

We evaluate the reservoir's ability to represent the passage of time with the similarity index (Eq. (21)). The color map in Fig. 6(a) shows the similarity indices C(1),(1)(t1, t2) defined by Eq. (21), calculated within a single time series, V go,(1)(t), shown in the upper and left panels of Fig. 6(a). An external perturbation is given at t = 0 to an all-synchronized state as V go,50 → V go,50 + 0.5. The length of the time series V go,(1)(t) is 1000 ms, and it is discretized with a time step of 1 ms to calculate the similarity indices, yielding a 1000 × 1000 matrix. Since C(1),(1)(t1, t2) is calculated within one time series, the diagonal elements are all 1. Figure 6(b) shows the histogram of the values of the upper triangular elements of the matrix shown in panel (a). Most of the elements have values smaller than ∼0.4. This suggests that the state of the system does not come back close to the same point in the phase space – close enough that the similarity index takes a value close to 1 – within 1000 ms. In other words, the state of the reservoir activity is specific to the passage of time. The distribution of the similarity indices between different time points shifts towards even smaller values with larger network size, as shown for the case of N go = 500 (Fig. 6(d)).
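The similarity index of Eq. (21) is not reproduced in this excerpt; the sketch below assumes it is the normalized inner product (cosine similarity) between two instantaneous state vectors, which matches the properties used in the text (diagonal elements equal to 1, values near 1 indicating recurrence). Function and variable names are ours.

```python
import numpy as np

def similarity_matrix(V):
    """Similarity indices between instantaneous network states.

    V has shape (T, N): row t is the state vector of the N membrane
    potentials at time step t. The index is sketched here as the cosine
    similarity between two state vectors (an assumption; Eq. (21) itself
    is not shown in this excerpt), so diagonal elements are 1.
    """
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ U.T
```

The histogram of similarities between distinct time points, as in Fig. 6(b), then corresponds to the upper triangular entries, e.g. `C[np.triu_indices(len(C), k=1)]`.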
Figure 6: The reservoir state is specific to the passage of time from a specific input. (a) Similarity indices C(1),(1)(t1, t2) calculated within a time series. The upper and left panels show the time series, V go,(1)(t): the membrane potentials of the reservoir neurons over 1000 ms. The neuron in the center (cell 50) is perturbed at t = 0 as V go,50 → V go,50 + 0.5. The parameter values are µ = 1. , g GJ = 0. , N go = 100, I go,tonic = 0.01. (b) Histogram of the upper triangular elements of the matrix shown in (a). Normalized with total counts. (c) The similarity indices calculated between two time series, V go,(1)(t) and V go,(2)(t). The time series on the left, V go,(2)(t), evolves from the same initial condition as V go,(1)(t), but an additional input pulse is given at the 100th cell at t = 200 ms (shown with an open magenta circle). The lower panel shows the diagonal elements versus time. (d) Histogram of the upper triangular part of a matrix of similarity indices calculated with a model with N go = 500 cells. Normalized with total counts.

The discriminative ability of the reservoir with respect to different inputs is also important. Figure 6(c) shows the similarity indices calculated between two time series, V go,(1)(t) and V go,(2)(t), that have slightly different inputs. The time series shown in the left panel, V go,(2)(t), evolves from the same initial condition as V go,(1)(t) shown in Fig. 6(a), but an additional input pulse is given to the cell at the edge of the network at t = 200 ms. The perturbed time series rapidly deviates from V go,(1)(t), and the correlation between the two time series vanishes at around t = 350 ms (Fig. 6(c), lower panel). This can be explained by the sensitivity of chaotic dynamics to initial conditions. Thus, the reservoir's response has high specificity to the input.

3.5. Generation of different activities for different inputs

Next, we examine whether a model with a fixed readout weight matrix is actually able to generate different temporal patterns as outputs for different inputs. First, we show a simulation of eyeblink conditioning: an extensively studied paradigm of cerebellum-dependent learning, which has been reproduced in several computational studies as well (Buonomano & Mauk (1994); Bullock et al. (1994); Medina et al. (2000); Yamazaki & Tanaka (2007); Li et al. (2013); Yamazaki & Igarashi (2013)). We simulated a situation where the model outputs a specific time series in response to a specific external input.
In eyeblink conditioning, animals are able to acquire motor responses with different timings for different types of tone stimuli (Kehoe et al. (1993); Green & Steinmetz (2005)). For example, an animal can learn to elicit a motor reflex 200 ms after the stimulus onset if the pitch of the tone is 600 Hz, and 600 ms after the onset if its pitch is 1 kHz (Kehoe et al. (1993)). To reproduce this phenomenon, we consider the case where the model output, y, is a scalar function of time representing the eyeblink response, and train the model with multiple target patterns whose peaks occur at different latencies from the external input. Figure 7(a) shows similarity indices calculated between four time series of the reservoir state obtained from the same initial condition and four different inputs x in,0, x in,1, x in,2, x in,3 given at time t = 50. The four inputs are generated from a multivariate Gaussian distribution N(0, σ²E), where E is the identity matrix. As shown in Fig. 7(a), the four spatiotemporal patterns evoked by the inputs x in,0, x in,1, x in,2, x in,3 rapidly lose similarity among each other. When the time series evoked by the inputs x in,0, x in,1, x in,2, x in,3 are fitted to different target patterns (dashed lines in Fig. 7(b)) simultaneously, the model acquires a different output assigned to each time series of the reservoir (solid lines).

Figure 7: Different output patterns for different inputs acquired by the model with the same readout connection matrix. (a) Each heat map shows the similarity indices calculated between two different time series corresponding to two different inputs. The parameter values are µ = 1. , g GJ = 0. , N go = 500, I go,tonic = 0. . (b) The time series evoked by the inputs x in,0, x in,1, x in,2, x in,3 are fitted to different target patterns (dashed lines) with a shared readout connection.
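Fitting several reservoir trajectories to several different targets with one shared readout amounts to a single stacked linear least-squares problem. A minimal sketch, with function and variable names of our own choosing:

```python
import numpy as np

def fit_shared_readout(state_runs, target_runs):
    """Fit one readout matrix W to several reservoir runs simultaneously.

    state_runs: list of arrays of shape (T, N), one reservoir trajectory
    per input. target_runs: list of arrays of shape (T, d), the desired
    outputs. Stacking them turns multi-pattern training into a single
    linear least-squares problem X @ W ~= Y.
    """
    X = np.vstack(state_runs)
    Y = np.vstack(target_runs)
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape (N, d); the output at time t is states[t] @ W
```

Because the chaotic trajectories evoked by different inputs quickly become nearly orthogonal, one W can map each trajectory onto its own target, which is the situation of Fig. 7(b).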
The output patterns are plotted with solid lines.

We also demonstrate the model's ability to generate two different human motions in response to two different inputs. Figure 8 shows the two outputs of a trained model with a fixed output matrix: walking (Fig. 8(a)) and boxing (Fig. 8(b)). The model is trained on two time series simultaneously, each associated with a specific input at time t = 0. The motion capture data provided by the Carnegie Mellon University Motion Capture Library (MOCAP) (http://mocap.cs.cmu.edu/) were used: datasets 08_01.amc (walking) and 13_17.amc (boxing) served as the target output patterns. Three variables representing the spatial offset of the person are discarded from the 62-dimensional signal; thus, the target patterns are the remaining 59-dimensional temporal patterns. The result demonstrates that the model is able to generate completely different temporal patterns with a fixed readout connection when the initial condition is different. See also the supplementary videos (walk.avi, box.avi).

Figure 8: The network generating walking (a) and boxing (b) patterns for two different initial inputs. The upper panels show the human motion generated by the model output, visualized using a skeleton, and the lower panels show the corresponding raw output signals of the model representing the angles of the bones. Only nine representative traces out of 59 are shown for simplicity. The same readout connection matrix is used for both movements, walking and boxing. The motion capture data from a human are learned. The parameters are µ = 1. , g GJ = 0. , N go = 500, I go,tonic = 0. .

Lastly, we briefly confirm that the observed properties of the simple model (Fig. 1(a)) can also be reproduced in the model incorporating the excitatory granule cells and the chemical synapses, shown in Fig. 1(b) (Eqs. (6)–(10)).
A model incorporating 1000 granule cells and 100 Golgi cells, with reciprocal connections between the granule cells and the Golgi cells through chemical synapses, is composed. We evaluate the similarity index using the membrane potentials of the granule cells, because in the real cerebellar granular layer it is the granule cells that project to the Purkinje cells. Figure 9 shows examples of the dynamics of the model after a random initial input, with gap junctions (g GJ = 0.08, Figs. 9(a)-(d)) and without gap junctions (g GJ = 0, Figs. 9(e)-(h)). Figures 9(a) and (b) show the raster plots of the spikes of the granule cells and the Golgi cells, respectively. Both the granule cells and the Golgi cells show irregular activity with no apparent repetitive pattern. The similarity index between two different time points within this time series verifies that the state of the granule cells, V gr, is specific to time (Fig. 9(c)). The histogram of the upper triangular elements of this matrix reveals the small similarity indices between pairs of different time points (Fig. 9(d)). The similarity indices are distributed at far smaller values than those of the Golgi membrane potentials in the simple model (Figs. 6(c) and (d)), presumably because of the large number of granule cells. Figures 9(e)–(h) show the result of the same analysis with the parameter g GJ = 0. All the other settings are identical. Without diffusion coupling, the model dynamics converges to an all-synchronized periodic orbit through the interaction via the chemical synapses. The similarity indices (Fig. 9(g)) and the distribution of the non-diagonal elements clearly show that changing the parameter g GJ to 0 abolishes the model's ability to represent the passage of time. These results suggest that a stable all-synchronized orbit exists in the system without gap junctions, but that it is destroyed by the chaotic dynamics when diffusion current flows through the gap junctions.
Figure 9: The dynamics of the model incorporating the granule cells in the reservoir. (a-d) The model behavior with gap junctions under a random initial input. The parameters are µ = 1. , g GJ = 0. , N gr = 1000, N go = 100, I gr,tonic = 0. , I go,tonic = 0. , θ = 0.7. (a) The spike raster plot of the granule cells. Only the 1st–500th cells out of N gr = 1000 cells are shown. (b) The spike raster plot of all the Golgi cells. (c) The similarity indices calculated between the instantaneous membrane potentials of the granule cells at different time points in the time series shown in (a). (d) Histogram of the upper triangular part of the matrix of similarity indices shown in (c). Normalized with total counts. (e-h) The model dynamics without gap junctions. All of the simulation conditions and model properties are the same as in (a), except that the parameter value g GJ = 0 is used. (e) The spike raster plot of the granule cells of the model without gap junctions. (f) The spike raster plot of all the Golgi cells. (g) The similarity indices take high values across different time points because of the rapid relaxation to synchronized activity. (h) Histogram of the upper triangular part of the matrix of similarity indices shown in (g).
4. Discussion
In the current study, we investigated the computational role of the gap junctions between Golgi cells in the cerebellar granular layer. Specifically, we evaluated the computational performance of a model of the cerebellar cortex using the reservoir computing framework. First, we showed that introducing gap junctions into the model induces chaotic dynamics that enables the reservoir to output complex patterns containing a wide range of frequency components (Figs. 2-5). Second, we showed that the chaotic dynamics has long non-recursive time series that are capable of representing the passage of time (Fig. 6). These properties of the chaotic dynamics underlie the reservoir's ability to output desired temporal patterns (Figs. 7, 8). Yamazaki and colleagues have proposed a model with these abilities based on a different mechanism, namely random connections by chemical synapses between granule cells and Golgi cells (Yamazaki & Tanaka (2007)). In the current study, we pointed out another possible mechanism (diffusion through gap junctions) that is capable of reproducing the aforementioned abilities of the model. Because gap junctions connect neighboring neurons, the connections they realize must inevitably be local rather than long-range. Thus, the average number of cells a neuron can contact cannot be as large as in the case of chemical synapses, and the average degree of the network realized by gap junctions may therefore be small. In the literature, it has been argued that sparse connectivity is a desirable feature of a reservoir (Jaeger (2001)). This is in line with our hypothesis that diffusion coupling by the gap junctions constitutes the reservoir in the cerebellum. On the other hand, one report showed that small-worldness of the reservoir contributes to better performance (Kawai et al. (2019)).
In the granular layer of the real brain, the chemical synapses and the gap junctions may work in concert to realize preferable properties as a reservoir.

The ISI of a neuron in the chaotic dynamics induced by diffusion coupling through gap junctions shows a broad distribution over a wide range of periods, unlike that of an isolated neuron model (Fig. 2). Our result may explain the fact that a Golgi cell exhibits periodic activity in some experimental settings such as in vitro recordings (Forti et al. (2006); Solinas et al. (2007); Vervaeke et al. (2010)), but shows irregular activity with a broad ISI distribution in vivo (Holtzman et al. (2006)).

Reservoir computing has drawn considerable attention in recent years, as it is expected to be a suitable and powerful framework for processing temporal sequences. However, the dynamical properties that a reservoir should have, and their proper implementation, have not been well characterized and remain an open question (Boyd & Chua (1985); Jaeger (2001); Maass et al. (2002); Bertschinger & Natschläger (2004); Natschläger et al. (2005); Yildiz et al. (2012)). A variety of systems, including both physical systems and mathematical models, have been used as implementations of the reservoir (Tanaka et al. (2019)). The current study investigated a reservoir consisting of a reaction-diffusion system (i.e., neurons coupled with gap junctions) and obtained results suggesting that chaos in a reaction-diffusion system may contribute to the performance of the model. Further investigation of reservoir machines using chaotic dynamics in reaction-diffusion systems may be an interesting direction for future study.

We found that the spatiotemporal pattern of the chaotic dynamics of µ-models coupled with diffusion in a one-dimensional chain shows the Sierpinski gasket (Fig. 2). It has been reported that some other nonlinear reaction-diffusion systems also show the Sierpinski gasket (Hayase & Ohta (1998, 2000)).
The simple model in our current study (Fig. 1(a)) also belongs to the class of reaction-diffusion systems, as it consists of a one-dimensional chain of neurons with nearest-neighbor connections. There may be a common mathematical structure behind our model and these former studies. It is well known that the Sierpinski gasket appears in the spatiotemporal patterns of cellular automata (CA) such as Wolfram's Rule 90 (Wolfram (1994)). Some previous studies attempted to implement a reservoir with a CA (Yilmaz (2014); Morán et al. (2018)). Morán et al. used a CA as the reservoir, constructed a classifier for the handwritten characters of the MNIST dataset (Lecun et al. (1998)), and compared the performance across CA rules. They reported that Wolfram's Rule 90 gives the best performance on the test data of the cross-validation (Morán et al. (2018)). In the current study, we did not construct a classifier with our model; thus, it is difficult to directly compare the current results with Morán and colleagues' work. However, evaluating the performance of a classifier with a reaction-diffusion system as the reservoir should be an interesting direction. Additionally, studies with CA may contribute to the elucidation of cerebellar computation.

An important issue we did not consider in depth in the current study is the generalization ability of the model, namely, the ability of the model to generate the same output from similar inputs. We showed that the chaotic dynamics realizes the specificity of the response to the input (Fig. 6). This can be attributed to the high sensitivity of chaotic dynamics to initial conditions. However, this high sensitivity may cause poor generalization ability, because a small noise or deviation in the input signal grows rapidly over time. Thus, a systematic evaluation of the generalization ability of the current model should be an important issue for future study.
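Wolfram's Rule 90, mentioned above, updates each cell to the XOR of its two neighbors; starting from a single active cell, it generates the Sierpinski gasket (Pascal's triangle modulo 2). A self-contained sketch:

```python
def rule90_step(row):
    # Rule 90: each cell becomes the XOR of its two neighbors
    # (cells outside the lattice are treated as 0).
    n = len(row)
    return [(row[i - 1] if i > 0 else 0) ^ (row[i + 1] if i < n - 1 else 0)
            for i in range(n)]

width, steps = 33, 16
row = [0] * width
row[width // 2] = 1          # a single active cell in the center
history = [row]
for _ in range(steps):
    row = rule90_step(row)
    history.append(row)

for r in history:            # prints the Sierpinski gasket
    print("".join("#" if c else "." for c in r))
```

Row n of the resulting pattern contains the binomial coefficients C(n, k) mod 2, the same self-similar triangle observed in the spatiotemporal patterns of the diffusively coupled µ-models (Fig. 2).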
The aforementioned study by Morán et al. showed that Wolfram's Rule 90, which produces the same Sierpinski gasket pattern as the model used in the current study, shows the best performance on the test data of the cross-validation (i.e., the highest generalization ability). In their study, the MNIST data are used as the initial state of the reservoir, and the spatiotemporal pattern of the evolution of the CA over a specific number of steps is used as the feature vector for the subsequent classification task. Similarly, one could modify our current model so that the activity caused by the input decays within a specific time scale, before a small deviation in the input grows to the system size. This situation is actually similar to the real cerebellum, because the cerebellum is believed to be able to maintain input information for a fixed time of approximately 500 ms (Thompson (2005); Kotani et al. (2003)). It should be an interesting issue to elucidate the relationship to previously proposed properties of the reservoir such as the echo state property (Jaeger (2001); Yildiz et al. (2012)) or the edge of chaos (Bertschinger & Natschläger (2004); Natschläger et al. (2005)). Another important issue to be elucidated is the relationship between the strength of the gap junctions and the generalization ability of the model. As shown in Fig. 4, the Lyapunov dimension of the system takes large values in a specific range of gap-junction strengths. Some previous studies pointed out the possibility that the strength of gap junctions changes the degrees of freedom of the network dynamics, which is crucial for the generalization ability (Kawato et al. (2011); Schweighofer et al. (2013); Tokuda et al. (2017); Hoang et al. (2019)). Additionally, it should be noted that cerebellum-dependent motor learning requires repetitive training (Thompson (2005)), which would help generalization. This is very different from hippocampus-dependent learning, where an episode is learned in one shot.

In the last part of the Results section, we confirmed that the non-recursiveness of the system is inherited by the model incorporating the excitatory granule cells (Fig. 9). In fact, the similarity index between different time points within a time series (Fig. 9(d)) shows higher specificity to the passage of time than in the simple model consisting only of the Golgi cells (Fig. 6(c)). This suggests that the large number of granule cells contributes to the specificity. In this study, we incorporated granule neurons with a population only 10 times larger than that of the Golgi cells (N gr = 1000, N go = 100) because of the computational cost. However, the ratio of the number of granule cells to Golgi cells in the real brain is reported to be even larger, as much as 430 times (Korbo et al. (1993)).
The numerous granule cells may serve to multiply the representational ability of the Golgi neurons.

In conclusion, we proposed the hypothesis that the massive gap junctions between the Golgi cells in the cerebellar granular layer contribute to expressivity by inducing chaotic dynamics.
5. Acknowledgements
This work was supported by JSPS KAKENHI (Nos. 17K16365, 19K12235, 20H04258 and 20K19882). This study was also partially supported by the JST Strategic Basic Research Programs (Symbiotic Interaction: Creation and Development of Core Technologies Interfacing Human and Information Environments, CREST Grant Number JPMJCR17A4). This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
6. Declaration of Conflict of Interest
The authors declare that there is no conflict of interest.

References

Albus, J. S. (1971). A theory of cerebellar function.
Mathematical Biosciences, , 25–61.
Badura, A., & De Zeeuw, C. I. (2017). Cerebellar granule cells: Dense, rich and evolving representations. Current Biology, , R415–R418.
Bertschinger, N., & Natschläger, T. (2004). Real-time computation at the edge of chaos in recurrent neural networks. Neural Computation, , 1413–1436.
Boyd, S., & Chua, L. (1985). Fading memory and the problem of approximating nonlinear operators with Volterra series. IEEE Transactions on Circuits and Systems, , 1150–1161.
Bullock, D., Fiala, J. C., & Grossberg, S. (1994). A neural model of timed response learning in the cerebellum. Neural Networks, , 1101–1114.
Buonomano, D. V., & Mauk, M. (1994). Neural network model of the cerebellum: Temporal discrimination and the timing of motor responses. Neural Computation, , 38–55.
Dugué, G. P., Brunel, N., Hakim, V., Schwartz, E., Chat, M., Lévesque, M., Courtemanche, R., Léna, C., & Dieudonné, S. (2009). Electrical coupling mediates tunable low-frequency oscillations and resonance in the cerebellar Golgi cell network. Neuron, , 126–139.
Forti, L., Cesana, E., Mapelli, J., & D'Angelo, E. (2006). Ionic mechanisms of autorhythmic firing in rat cerebellar Golgi cells. The Journal of Physiology, , 711–729.
Fujii, H., & Tsuda, I. (2004). Neocortical gap junction-coupled interneuron systems may induce chaotic behavior itinerant among quasi-attractors exhibiting transient synchrony. Neurocomputing, , 151–157.
Green, J. T., & Steinmetz, J. E. (2005). Purkinje cell activity in the cerebellar anterior lobe after rabbit eyeblink conditioning. Learning & Memory, , 260–269.
Hayase, Y., & Ohta, T. (1998). Sierpinski gasket in a reaction-diffusion system. Phys. Rev. Lett., , 1726–1729.
Hayase, Y., & Ohta, T. (2000). Self-replicating pulses and Sierpinski gaskets in excitable media. Phys. Rev. E, , 5998–6003.
Hoang, H., Lang, E. J., Hirata, Y., Tokuda, I. T., Aihara, K., Toyama, K., Kawato, M., & Schweighofer, N. (2019).
Electrical coupling controls dimensionality and chaotic firing of inferior olive neurons. bioRxiv, .
Holtzman, T., Rajapaksa, T., Mostofi, A., & Edgley, S. A. (2006). Different responses of rat cerebellar Purkinje cells and Golgi cells evoked by widespread convergent sensory inputs. The Journal of Physiology, , 491–507.
Ito, M. (1970). Neurophysiological aspects of the cerebellar motor control system. Int. J. Neurol., , 162–176.
Ito, M. (1984). The cerebellum and neural control. New York: Raven Press.
Jaeger, H. (2001).
The "echo state" approach to analysing and training recurrent neural networks. GMD Report 148. German National Research Institute for Computer Science.
Kaplan, J. L., & Yorke, J. A. (1979). Chaotic behavior of multidimensional difference equations. In H.-O. Peitgen, & H.-O. Walther (Eds.),
Functional Differential Equations and Approximation of Fixed Points (pp. 204–227). Berlin, Heidelberg: Springer Berlin Heidelberg.
Katori, Y., Lang, E. J., Onizuka, M., Mitsuo, K., & Aihara, K. (2010). Quantitative modeling of spatio-temporal dynamics of inferior olive neurons with a simple conductance-based model.
International Journal of Bifurcation and Chaos, , 583–603.
Kawai, Y., Park, J., & Asada, M. (2019). A small-world topology enhances the echo state property and signal propagation in reservoir computing. Neural Networks, , 15–23.
Kawato, M., Furukawa, K., & Suzuki, R. (1987). A hierarchical neural-network model for control and learning of voluntary movement. Biological Cybernetics, , 169–185.
Kawato, M., Kuroda, S., & Schweighofer, N. (2011). Cerebellar supervised learning revisited: biophysical modeling and degrees-of-freedom control. Current Opinion in Neurobiology, , 791–800.
Kehoe, E. J., Horne, P. S., & Horne, A. J. (1993). Discrimination learning using different CS-US intervals in classical conditioning of the rabbit's nictitating membrane response. Psychobiology, , 277–285.
Korbo, L., Andersen, B. B., Ladefoged, O., & Møller, A. (1993). Total numbers of various cell types in rat cerebellar cortex estimated using an unbiased stereological method. Brain Research, , 262–268.
Kotani, S., Kawahara, S., & Kirino, Y. (2003). Trace eyeblink conditioning in decerebrate guinea pigs. European Journal of Neuroscience, , 1445–1454.
Laje, R., & Buonomano, D. V. (2013). Robust timing and motor patterns by taming chaos in recurrent neural networks. Nature Neuroscience, , 925–933.
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. In Proceedings of the IEEE (pp. 2278–2324).
Li, W.-K., Hausknecht, M. J., Stone, P., & Mauk, M. D. (2013). Using a million cell simulation of the cerebellum: Network scaling and task generality.
Neural Networks, , 95–102.
Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, , 2531–2560.
Mandelbrot, B. B. (1983). The fractal geometry of nature (3rd ed.). New York: W. H. Freeman and Comp.
Marr, D. (1969). A theory of cerebellar cortex.
The Journal of Physiology, , 437–470.
Medina, J. F., Garcia, K. S., Nores, W. L., Taylor, N. M., & Mauk, M. D. (2000). Timing mechanisms in the cerebellum: Testing predictions of a large-scale computer simulation. Journal of Neuroscience, , 5516–5525.
Morán, A., Frasser, C. F., & Rosselló, J. L. (2018). Reservoir computing hardware with cellular automata. arXiv, .
Natschläger, T., Bertschinger, N., & Legenstein, R. (2005). At the edge of chaos: Real-time computations and self-organized criticality in recurrent neural networks. In Advances in Neural Information Processing Systems (pp. 145–152). MIT Press.
Raymond, J. L., & Medina, J. F. (2018). Computational principles of supervised learning in the cerebellum.
Annual Review of Neuroscience, , 233–253.
Rössert, C., Dean, P., & Porrill, J. (2015). At the edge of chaos: How cerebellar granular layer network dynamics can provide the basis for temporal filters. PLOS Computational Biology, , 1–28.
Ruigrok, T. J., Sillitoe, R. V., & Voogd, J. (2015). Chapter 9 - Cerebellum and cerebellar connections. In G. Paxinos (Ed.), The Rat Nervous System (Fourth Edition) (pp. 133–205). San Diego: Academic Press.
Schweighofer, N., Doya, K., Fukai, H., Chiron, J. V., Furukawa, T., & Kawato, M. (2004). Chaos may enhance information transmission in the inferior olive.
Proceedings of the National Academy of Sciences, , 4655–4660.
Schweighofer, N., Lang, E., & Kawato, M. (2013). Role of the olivo-cerebellar complex in motor learning and control. Frontiers in Neural Circuits, , 94.
Shimada, I., & Nagashima, T. (1979). A numerical approach to ergodic problem of dissipative dynamical systems. Progress of Theoretical Physics, , 1605–1616.
Solinas, S., Forti, L., Cesana, E., Mapelli, J., De Schutter, E., & D'Angelo, E. (2007). Computational reconstruction of pacemaking and intrinsic electroresponsiveness in cerebellar Golgi cells. Frontiers in Cellular Neuroscience, , 2.
Sudhakar, S. K., Hong, S., Raikov, I., Publio, R., Lang, C., Close, T., Guo, D., Negrello, M., & De Schutter, E. (2017). Spatiotemporal network coding of physiological mossy fiber inputs by the cerebellar granular layer. PLOS Computational Biology, , 1–35, e1005754.
Sussillo, D., & Abbott, L. F. (2009). Generating coherent patterns of activity from chaotic neural networks. Neuron, , 544–557.
Tadokoro, S., Yamaguti, Y., Fujii, H., & Tsuda, I. (2011). Transitory behaviors in diffusively coupled nonlinear oscillators. Cognitive Neurodynamics, , 1–12.
Tanaka, G., Yamane, T., Héroux, J. B., Nakane, R., Kanazawa, N., Takeda, S., Numata, H., Nakano, D., & Hirose, A. (2019). Recent advances in physical reservoir computing: A review. Neural Networks, , 100–123.
Thompson, R. F. (2005). In search of memory traces. Annual Review of Psychology, , 1–23.
Tokuda, I. T., Han, C. E., Aihara, K., Kawato, M., & Schweighofer, N. (2010). The role of chaotic resonance in cerebellar learning. Neural Networks, , 836–842.
Tokuda, I. T., Hoang, H., & Kawato, M. (2017). New insights into olivo-cerebellar circuits for learning from a small training sample. Current Opinion in Neurobiology, , 58–67.
Tokuda, I. T., Hoang, H., Schweighofer, N., & Kawato, M. (2013). Adaptive coupling of inferior olive neurons in cerebellar learning. Neural Networks, , 42–50.
Tokuda, K., Katori, Y., & Aihara, K. (2019).
Chaotic dynamics as a mechanism of rapid transition of hippocampal local field activity between theta and non-theta states. Chaos: An Interdisciplinary Journal of Nonlinear Science, , 113115.
Tsuda, I., Fujii, H., Tadokoro, S., Yasuoka, T., & Yamaguchi, Y. (2004). Chaotic itinerancy as a mechanism of irregular changes between synchronization and desynchronization in a neural network. Journal of Integrative Neuroscience, , 159–182.
Turing, A. M. (1952). The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, , 37–72.
Vervaeke, K., Lörincz, A., Gleeson, P., Farinella, M., Nusser, Z., & Silver, R. A. (2010). Rapid desynchronization of an electrically coupled interneuron network with sparse excitatory synaptic input. Neuron, , 435–451.
Watanabe, A. (1958). The interaction of electrical activity among neurons of lobster cardiac ganglion. The Japanese Journal of Physiology, , 305–318.
Wolfram, S. (1994). Cellular automata and complexity. Boca Raton: CRC Press.
Wolpert, D. M., Miall, R., & Kawato, M. (1998). Internal models in the cerebellum.
Trends in Cognitive Sciences, , 338–347.
Yamada, T., & Kuramoto, Y. (1976). A reduced model showing chemical turbulence. Progress of Theoretical Physics, , 681–683.
Yamazaki, T., & Igarashi, J. (2013). Realtime cerebellum: A large-scale spiking network model of the cerebellum that runs in realtime using a graphics processing unit. Neural Networks, , 103–111.
Yamazaki, T., & Nagao, S. (2012). A computational mechanism for unified gain and timing control in the cerebellum. PLOS ONE, , 1–12, e33319.
Yamazaki, T., & Tanaka, S. (2007). The cerebellum as a liquid state machine. Neural Networks, , 290–297.
Yildiz, I. B., Jaeger, H., & Kiebel, S. J. (2012). Re-visiting the echo state property. Neural Networks, , 1–9.
Yilmaz, Ö. (2014). Reservoir computing using cellular automata. arXiv, 1410.0162.