A Neurorobotic Embodiment for Exploring the Dynamical Interactions of a Spiking Cerebellar Model and a Robot Arm During Vision-based Manipulation Tasks
Omar Zahra, David Navarro-Alarcon and Silvia Tolu
Abstract—While the original goal for developing robots is to replace humans in dangerous and tedious tasks, the ultimate target is to fully mimic human cognitive and motor behaviour. Hence, building detailed computational models of the human brain is one reasonable way to attain this. As concluded from lesion studies, the cerebellum is one of the key players in our neural system for guaranteeing dexterous manipulation and coordinated movements. Studies suggest that it acts as a forward model providing anticipatory corrections for the sensory signals based on observed discrepancies from the reference values. While most studies provide the teaching signal as an error in joint space, few consider the error in task space, and even fewer consider the spiking nature of the cerebellum at the cellular level. In this study, a detailed cellular-level forward cerebellar model is developed, including models of the Golgi and basket cells, which are usually neglected in previous studies. To preserve the biological features of the cerebellum in the developed model, a hyperparameter optimization method tunes the network accordingly. The efficiency and biological plausibility of the proposed cerebellar-based controller are then demonstrated under different robotic manipulation tasks, reproducing motor behaviour observed in human reaching experiments.
Index Terms—Spiking neural networks, Cerebellum, Robotic manipulation, Sensor-based control, Robotics, Control
I. INTRODUCTION

Neurorobotics is getting more attention nowadays, not only for improving the performance of robot controllers but also to reveal some of the mysteries of how our brains work [1], [2]. The limitations of state-of-the-art techniques for monitoring the spiking activity of all neurons in the brain make it necessary to develop accurate computational models to verify theories about neural brain mechanisms [3]. Also, the mutual benefit that derives from joint research in the neuroscience and robotics fields [4] enables the development of adaptive, biologically inspired controllers that can be used as a basis to explain mechanisms of learning and to enhance the performance of robots. Hence, the motor system lies among the most studied, given the common interest of both fields. Each of the brain regions contributing to motor control has distinctive features, leading to different roles in the control process. This study is concerned with modeling an essential and complex region of our motor system, the cerebellum [5], [6]. The cerebellum is well known to help achieve fine motor control and precise timing and coordination of the movement of the
joints to achieve dexterous motion [7]. This was proven by studies of patients with cerebellar lesions [8], who suffer from clumsy, staggering movements similar to drunken behaviour. In the robotics field, incorporating a cerebellar model contributes to enhancing the accuracy and precision of robot movements, which is critical in many robotic applications such as surgery [9], [10].
Various models [11] have been built for the cerebellum based upon different theories of its role in our movements. One theory is that the cerebellum acts as an inverse model that provides corrections for the motor commands [12]. Another is that it acts instead as a forward model to improve sensory predictions [13]. Further theories have been introduced to combine the merits of both forward and inverse models [14], [15].
The aim of this work is to develop a controller based on a cerebellar-like model, developed at the cellular level, to guide the motion of robots with real-time sensory information. The cerebellar model developed in this study is more detailed from the biological perspective than the previously developed models [16], [17], in order to demonstrate the effect of these additional features on the performance.

O. Zahra and D. Navarro-Alarcon are with The Hong Kong Polytechnic University, Department of Mechanical Engineering, Kowloon, Hong Kong. Corresponding author e-mail: [email protected]. S. Tolu is with the Technical University of Denmark, Department of Electrical Engineering, Copenhagen, Denmark.
The controller first builds a sensorimotor differential map through motor babbling, and the cerebellum acts as a Smith predictor [18] to correct discrepancies in sensory readings, enhancing the accuracy and precision of the robot movements. While developing a cellular-level model adds challenges for tuning and constructing the network, it provides insight into the real workings of the biological counterparts and allows importing additional features from the wide repository of studies exploiting neural mechanisms to achieve such features. Building a detailed cellular-level model requires utilizing a spiking neural network (SNN), the third generation of artificial neural networks (ANN), to give a more faithful representation of the neuronal dynamics, which provides more complex and realistic firing patterns based on spikes [19]. The SNN adds a temporal dimension compared to the previous generations of ANN, which allows for developing more biologically realistic learning mechanisms and a more efficient representation of the encoded spaces/dimensions [19]. In [17], a controller is developed based on an inverse cellular-level cerebellar model to enhance the robot movement. The controller relies on a trajectory planner and an inverse kinematics model to generate reference signals for angular position and speed. This limits learning to the manipulation of the robot based on the given reference value and is not suitable for learning directly from sensory feedback. In [20], a cerebellar model based on the adaptive filter theory is developed
combining some computational neuroscience techniques along with machine learning. Such a combination aims to achieve sensorimotor adaptation, but it does not model the spiking activity in the cerebellum. In [16], a cellular-level cerebellar model is developed in which the teaching signal is provided based on the error in task space. In all the previous studies, no clear method was identified to tune the network parameters, and the developed networks were used only to move the end-effector along some predefined trajectory or to reach a certain target. This study contributes to building a detailed spiking cellular-level cerebellar model including more biological features compared to the previous studies. The parameters of the network are set using a Bayesian optimization method to preserve several biological features observed in the cerebellum. This is demonstrated by monitoring the activity and firing rates of the different groups of neurons in the cerebellum, and the output from each layer, which is then compared to those obtained from biological counterparts.

Figure 1: (a) A simplified schematic of the hierarchical motor system based on studies from the literature [21], [22]. The Thalamus is not included for clarity. (b) The block diagram of the proposed cerebellar-based control system for the robot R. Based on the motor command u generated by the differential map DM, the forward cerebellar model CB provides sensory predictions (i.e., the robot state) for the next cycle. The discrepancy between the actual state observed by the sensors S and the predicted state is used to correct the desired state signal generated by the target generator TG before it is introduced to DM.
The teaching signal is provided as the error in task space, and the model demonstrates the ability to adapt to executing different tasks and to handle the manipulation of deformable objects.
In the next sections, the preliminaries are introduced (Sec. 2), then the modeling of the cerebellar controller and the hyperparameter optimization method are discussed (Sec. 3). The experimental setup along with the results are explored (Sec. 4). Finally, an analysis of the obtained results is discussed and the main conclusions are derived (Sec. 5).

II. PRELIMINARIES

The motor system is composed of several regions, each of which is responsible for a different function. These areas follow a hierarchical organization, as shown in Fig. 1a. In such a hierarchy [21], [22], both parallel and serial connections provide different behaviours in different situations (i.e., dependent upon sensory feedback). The higher-order areas are responsible for decision making and planning the sequence of motion while coordinating the activity of several limbs. Lower-order areas, on the other hand, control muscles while regulating forces and velocities under changes in posture and various interactions with the environment. The higher-level tasks start from the cortical association areas and the prefrontal cortex (along with the basal ganglia), which receive sensory information (from the sensory cortex) to generate an abstract plan for the motion and the sequence of execution. This plan is then transformed into motor commands in the motor cortex, which sends these commands to the brain stem and the cerebellum (as an efference copy). The cerebellum plays a role in the coordination of movements and the adjustment of timing to attain fine movements. Signals from the motor cortex travel to trigger motor neurons which innervate the skeletal muscles.
The motor neurons in the spinal cord control the limbs and the movement of the body, while those in the brain stem control facial and head movements.
In relation to the critical role of the cerebellum in both the planning and execution of motion, this study focuses on the cerebellar corrections of noisy sensory readings to obtain fine movements while executing the motor commands generated by the motor cortex to reach a target point in space. Thus, a computational model of the cerebellum can help improve the performance of robotic controllers. In this work, a biologically inspired control system is built to guide a robot in a servoing task after performing motor babbling for several iterations, as shown in Fig. 1. Computational spiking models are developed both to form a coarse sensorimotor map through motor babbling and to reproduce the cerebellum. The sensory input to the formed map is modulated through the cerebellum to enhance the movement and reduce the deviation from the desired path. The developed cerebellar controller is capable of guiding the robot based on real-time noisy data.

III. METHODS

A. The Cerebellar-Based Control Architecture
As discussed earlier, various theories have been developed about the formation of cerebellar forward or inverse models. However, more biological evidence supports the theory that the cerebellum acts as a forward model, based on the behaviour observed in cases of cerebellar damage [23]. In this study, the cerebellum acts as a computational forward model to predict the next sensory states based on the desired spatial velocity and the current robot states. This model fits in the designed controller to act as a Smith predictor, which is known to be capable of handling control schemes with long dead time, as shown in Fig. 1. In control systems, dead time is introduced by the time needed for sensing, processing the inputs, computing the control output, and actuation; analogously, in biological systems the dead time is introduced by the time needed for the sensory signals to travel through the nervous system, generate a motor command, and travel back to the muscles for execution. In our system, the dead time (expressed as a delay D) is caused by slow sensory readings (S) and robot (R) action. In this context, the motor cortex acts as a differential map (DM) that can generate commands (u) in joint space based on the desired spatial velocity v∗ and the current joint angles q, and thus acts as an inverse model, as shown in Fig. 2. This can be expressed as:

u(t) = f(q(t), v∗(t))    (1)

where f represents the inverse mapping formed in the motor cortex to correlate the joint space and the task space.

Figure 2: The forward and inverse models correlating task and joint spaces. In this study, DM approximates the inverse model, while CB approximates the forward model.
In case the robot moves from the current end-effector position x to a target position x∗, v∗ can be formulated as:

v∗(t) = (x∗(t) − x(t)) / ‖x∗(t) − x(t)‖    (2)

However, due to the delay τ discussed earlier, along with the imperfection of the map f, the motor command needs a correction before being introduced to the robot. Thus, the cerebellum acts as a forward model to predict the spatial velocities upon applying u. The robot motion can be expressed as:

v(t) = g(q(t), u(t − τ))    (3)

where the map g represents the actual differential kinematics of the robot, and v(t) is the spatial velocity of the end-effector upon executing the command u(t − τ) while the robot joint configuration is q(t). The spatial velocity predictions generated by the cerebellum, ṽ(t), can then be expressed as:

ṽ(t) = g̃(q(t), u(t − τ), v(t), v∗(t))    (4)

where g̃ represents the forward model built by the cerebellum to approximate the map g. The model approximations are improved during training by relying on the feedback error e, where e(t) = v(t) − ṽ(t). Hence, the error in the sensory signals and the discrepancy from the predicted value are used to modulate the desired spatial velocity before it is introduced to the DM, as detailed later in subsection III-C. Thus, DM makes use of the predictions provided by the cerebellum to correct the anticipated error in the sensory readings due to the dead-time effect:

u(t) = f(q(t), v∗(t), ṽ(t), ṽ(t + τ))    (5)

Taking into consideration both the error from previous trials and the sensory prediction, the controller is enabled to correct the next motor command in an indirect way.
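The two quantities introduced so far, the normalized desired velocity of Eq. (2) and the feedback error e(t) = v(t) − ṽ(t), can be sketched numerically as follows. This is a minimal illustration, not the paper's implementation; the helper names (`desired_velocity`, `prediction_error`, `v_tilde`) are chosen here for clarity.

```python
import numpy as np

def desired_velocity(x, x_star):
    """Eq. (2): unit vector pointing from the current end-effector
    position x toward the target position x*."""
    d = np.asarray(x_star, float) - np.asarray(x, float)
    n = np.linalg.norm(d)
    return d / n if n > 0 else np.zeros_like(d)

def prediction_error(v, v_tilde):
    """Feedback error e(t) = v(t) - v~(t): actual minus predicted
    end-effector spatial velocity."""
    return np.asarray(v, float) - np.asarray(v_tilde, float)

# Example: 2-D reach toward a target
x, x_star = np.array([0.2, 0.1]), np.array([0.5, 0.5])
v_star = desired_velocity(x, x_star)              # unit-norm direction
e = prediction_error([0.10, 0.12], [0.08, 0.15])  # discrepancy fed back to CB
```

Note that v∗ only fixes the direction of motion; its magnitude is normalized away, which is why the correction below operates on velocities rather than positions.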
This is carried out by adding the error from previous attempts to the predictions of the robot state, to make up for the expected error:

v̌(t) = ṽ(t + τ) + e(t)    (6)

Finally, the corrected prediction v̌(t) is compared to the desired velocity v∗:

v̂(t) = 2v∗(t) − v̌(t)    (7)

In other words, v̂ is the sensory signal that can be introduced to the DM to give better estimations and make up for the delay in the feedback cycle:

u(t) = f(q(t), v̂(t))    (8)

B. The Differential Mapping SNN
A two-layer SNN, with one input and one output layer connected via all-to-all plastic synapses (i.e., weights that can change), provides the transformation between two correlated spaces. The network encodes the current joint angles and the spatial velocity of the end-effector in the input layer, and encodes the angular velocity of the joints in the output layer. As the network mainly correlates the velocities in the two spaces, it acts as a differential map; it is therefore named the Differential Mapping Spiking Neural Network (DMSNN) [24]. For a robotic manipulator with n degrees of freedom (DOF), the task space is represented by n assemblies of neurons, where each dimension is encoded by a corresponding assembly. Similarly, for m DOF of joint space, m assemblies are needed. Hence, the input layer is made up of n + m assemblies. The assemblies l_vn encode the n-dimensional spatial velocity, while the assemblies l_qm encode the m-dimensional joint angular positions. At the output, the assemblies l_q̇m encode the m-dimensional joint angular velocities, as depicted in Fig. 3. The two layers of the network are connected via all-to-all plastic synapses (both excitatory and inhibitory). The inhibitory synapses regulate the activity of the motor (output) neurons to maintain stable learning and hence avoid an unbounded increase in the weights of the excitatory synapses. At the output layer, local inter-inhibitory connections (i.e., within each assembly) are added, with low inhibition to proximal neurons and higher inhibition to distal neurons.
Modulation of the plastic synapses occurs while training the network, as shown in Fig. 3, where random target angles q_d are generated for motor babbling. The robot joints move towards q_d linearly in joint space, based on the error calculated from the difference between the desired joint angles and the current ones q. The internal (i.e., proprioception) and external (i.e., exteroception) sensors provide the necessary data to both the sensory and motor layers while training DM. These variables are encoded based on the preferred/central value ψ_c defined for each neuron, with a distribution that allows all the neurons in the assembly to contribute, in what is known as population coding [25]. The neurons' firing rates are defined by a Gaussian tuning curve, which can be expressed as:

Θ_i(t) = exp(−‖ψ − ψ_c‖ / σ)    (9)

where ψ is the variable's value, and σ is the radius calculated based on the number of neurons per assembly N_l and the defined range of values of each variable. The synaptic weights are modulated accordingly, forming a proper DM. After training ends, the control phase starts, where DM is ready to execute a coarse robotic servoing task. The corresponding values of the variables are encoded and introduced to the assemblies l_qm and l_vn, and the output is then decoded from the activity of l_q̇m. The decoding scheme in this case is the central neuron [25]:

ψ_est = Σ ψ_i · Θ_i / Σ Θ_i    (10)

where ψ_i is the defined central value of neuron i in the assembly l_Ψ, and ψ_est is the decoded/estimated output value. However, DM is formed as a coarse map, which lacks both the precision and the accuracy needed for fine control of the robot.

Figure 3: Input (sensory) neurons are connected to output (motor) neurons through excitatory and inhibitory plastic synapses. The signals introduced to each neuron assembly are depicted. The motion is guided during the training of the network through motor babbling actions in joint space. After the training phase, the robot is controlled by decoding motor commands from the activity in the output layer. The hollow arrowheads refer to plastic synapses.

C. The Proposed Forward Cerebellar-like Model
The cerebellum is composed of three layers. The Granule layer, which is the innermost layer, is made up mainly of granule cells, which account for up to 80% of the neurons in the brain [26]. It also contains the mossy fiber axons and the Golgi cells. The Purkinje layer is the middle layer and contains the Purkinje cells, which feature a distinctive firing pattern and are considered a key component for learning to occur in the cerebellum. The Molecular layer is the outermost layer, containing the axons extending from the granule cells to the Purkinje cells (known as parallel fibers) and intersecting with the axons extending from the inferior olive (known as climbing fibers). The molecular layer also contains the basket and stellate cells.

Figure 4: (a) A cellular and (b) a functional schematic diagram of the cerebellum.

The developed cerebellar computational model, as shown in Fig. 7, acts to rectify the sensory readings before they are introduced to the differential map (DM), which replicates the function of the motor cortex. This enhances the motor commands generated by DM by accounting for the sensory discrepancies in the previous cycles, as explained in the two previous subsections. The cerebellar forward model is developed based on the cerebellar microcircuit.
The Mossy Fibers (MF) encode the current angular position of the joints and the desired Cartesian velocity. These two variables are chosen to provide the essential information to define the state of the robot in both joint space and task space. Thus,
MF consists of n_MF assemblies of neurons, where each assembly encodes one dimension of one of the variables. The angular position is encoded by n_JS assemblies forming the MF_JS group, where n_JS is the number of degrees of freedom (DOF) studied. The desired Cartesian velocity is encoded by n_TS assemblies forming the MF_TS group, where n_TS is the number of Cartesian DOF studied.
The synapses connecting MF, the granule cells (GC) and the Golgi cells (GgC) shall lead to a sparse coding of the robot states. The inferior olive (IO) provides the teaching signal, which in this study is the task-space error, through the climbing fibers (CF) to the Purkinje cells (PC). The error e is defined as the discrepancy between the actual and the predicted spatial velocities. The plastic synapses connecting GC to PC, known as parallel fibers (PF), are modulated initially under the effect of the teaching signals from CF. These signals evoke activity in PC to provide the desired correction, and thus allow such corrections to be encoded at the corresponding state of the robot.
The MF connect to the deep cerebellar nuclei (DCN) through excitatory synapses to maintain a basal spiking activity. The PC connect to the DCN through inhibitory synapses to allow the right value to be decoded from the activity of these neurons, while IO connects to DCN through excitatory plastic connections, studied to provide fast convergence of learning [27]. The PC, and similarly IO and DCN, consist of n_TS groups of neurons, with each group consisting of two assemblies for the positive and negative change of each DOF.
After the DM training runs for several iterations, i.e., until the coarse control map is formed, the training of the cerebellum starts in order to build the corrective mapping. In [28], a study was conducted on infants between 6 and 12 months of age to monitor the neural activity in the motor cortex (M1) while reaching targets. It was observed that the activity shifts from a state diffused across M1 to a focused one as the age of the infants increases. Additionally, insufficient data is reported about the development of the cerebellum in infants at that age [29]. Thus, it is safe to assume that in infants the development of the cerebellum starts later than that of the motor cortex, and that its contribution increases as it grows, which is reflected in the performance of fine movements and the display of some motor skills [30].
Similar to the encoding in DM, the input values to MF are first encoded employing population coding, with the current introduced to the i-th neuronal unit in the j-th MF assembly, Θ_MF_i,j(t), calculated using the following equation:

Θ_MF_i,j(t) = exp(−‖θ_MF_j − θ_MF_i,j‖ / σ_MF_j)    (11)

where θ_MF_j is the input to the j-th MF assembly, σ_MF_j is the radius of the Gaussian distribution, and the variable ranges from θ_MF_j,min to θ_MF_j,max. θ_MF_i,j is the preferred/central value of the i-th neuron in the j-th MF assembly. Both θ_MF_i,j and σ_MF_j are adjusted using the self-organization algorithm (SOA), where the data previously collected during babbling, Ξ, is used to adjust the values of θ_MF_i,j for all neurons.
SOA: A best-matching unit β is chosen for each central value from the linearly initialized set θ_MF_j (i.e., equally spaced from θ_MF_j,min to θ_MF_j,max). β is picked based on the Euclidean distance from a random data sample ξ (from the set Ξ):

β = argmin_κ (θ_MF_κ,j − ξ)    (12)

The value of θ_MF_β,j and those of the neighbouring units at an instant k are updated such that:

θ_MF_i,j(k + 1) = θ_MF_i,j(k) + ρ(k) ν_iβ(k) (ξ − θ_MF_i,j(k))    (13)

where ρ is the learning rate and ν is the neighbouring/proximity function given by:

ν_iβ(k) = exp(−‖i − β‖ / ϑ(k))    (14)

and ϑ is the radius gauging the proximity of the neighborhood. Both ρ and ϑ decay exponentially over the whole period K:

ρ(k) = ρ∘ exp(−k/K),  ϑ(k) = ϑ∘ exp(−k/K)    (15)

where ρ∘ and ϑ∘ are the initial values of ρ and ϑ, respectively. After running the SOA for K iterations, the radius σ_MF_i,j is set for every neuron in MF separately such that:

σ_MF_i,j = θ_MF_i,j − θ_MF_i+1,j    (16)

with the last radius (i.e., σ_MF_Nl,j) set to the same value as the previous one (i.e., σ_MF_Nl−1,j). The application of the SOA to MF allows for a more distinctive input to the GC and a better encoding of the robot states. Both GC and MF are connected through excitatory synapses to the Golgi cells
(GgC), which are connected to the GC through plastic inhibitory connections. The recurrence through the reciprocal connections between GC and GgC, along with the connections to the PC, allows for a sparse encoding of the robot state space and can be interpreted as a Liquid State Machine (LSM), as argued in [31]. It is also claimed that the inhibitory action of GgC allows a minimum number of GC neurons to be active at the same time, and thus higher sparsity and a better encoding of the states [8].
Each neuron in GC is connected to a randomly picked neuron from each MF assembly, and hence each neuron in GC shall be connected to n_MF neurons. Furthermore, MF connects via excitatory synapses (all-to-all connections) to the DCN. GC connects to PC via plastic excitatory projections (known as parallel fibers, PF). The parameters of the network shall be tuned such that the PF are modulated only when IO neurons are active. The projections from IO to PC, known as climbing fibers (CF), are one-to-one synapses, ensuring that neurons belonging to the same group (i.e., the same DOF) and direction (i.e., positive/negative changes) connect to each other, and hence that the right members of CF are modulated. IO connects through excitatory synaptic connections to DCN. The activity of the neurons in IO is given by:

Θ_IO+_j = { Θ_IO_max if e_pred > Υ;  0 if e_pred ≤ Υ }    (17)

Θ_IO−_j = { 0 if e_pred ≥ −Υ;  Θ_IO_max if e_pred < −Υ }    (18)

where Θ_IO+_j and Θ_IO−_j describe the mean firing rates of the two opposite directions of the j-th DOF encoded by the IO assemblies, while Θ_IO_max describes the maximum firing rate of the neurons in IO. A threshold value Υ is defined to avoid an overlapping oscillatory activity around the reference value. Similarly, the connections between PC and DCN follow the same concept to allow for the activation of the right group of neurons. The cerebellar output is decoded from the activity of the DCN neurons:

ṽ_j = ((Σ Θ_DCN+_i,j − Σ Θ_DCN−_i,j) / (Θ_DCN_max · n_DCN)) · v_max_j    (19)

where ṽ_j gives the anticipated/predicted velocity of the j-th DOF, Θ_DCN+_i,j and Θ_DCN−_i,j are the firing rates of the i-th neuron in the two opposing-direction assemblies of the j-th DOF, and n_DCN is the number of neurons in each DCN assembly. Θ_DCN_max is the maximal firing rate observed in DCN, and v_max_j defines the maximum speed of the j-th DOF.
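At the rate level, the encoding and decoding steps of this subsection can be sketched as follows: the Gaussian receptive fields of Eq. (11), the thresholded IO teaching signal of Eqs. (17)-(18), and the push-pull DCN decoding of Eq. (19). This is an illustrative sketch, not the paper's spiking implementation; the constants (`THETA_IO_MAX`, `UPSILON`) and function names are assumptions made here for the example.

```python
import numpy as np

THETA_IO_MAX = 10.0   # assumed maximum IO firing rate (illustrative)
UPSILON = 0.05        # assumed error threshold Υ (illustrative)

def gaussian_tuning(value, centers, sigmas):
    """Eq. (11): population-coded activation of one MF assembly,
    one Gaussian receptive field per neuron."""
    centers = np.asarray(centers, float)
    sigmas = np.asarray(sigmas, float)
    return np.exp(-np.abs(value - centers) / sigmas)

def io_teaching_signal(e_pred):
    """Eqs. (17)-(18): per direction, the IO assembly fires at its
    maximum rate only when the signed error crosses the threshold Υ."""
    pos = THETA_IO_MAX if e_pred > UPSILON else 0.0
    neg = THETA_IO_MAX if e_pred < -UPSILON else 0.0
    return pos, neg

def decode_dcn(rates_pos, rates_neg, rate_max, v_max):
    """Eq. (19): push-pull decoding of the predicted velocity of one
    DOF from the two opposing DCN assemblies."""
    n_dcn = len(rates_pos)  # neurons per DCN assembly
    return (np.sum(rates_pos) - np.sum(rates_neg)) / (rate_max * n_dcn) * v_max

# Example for a single DOF
act = gaussian_tuning(0.5, centers=[0.0, 0.5, 1.0], sigmas=[0.5, 0.5, 0.5])
pos, neg = io_teaching_signal(0.2)   # error above Υ: positive IO assembly fires
v_tilde = decode_dcn([5.0, 5.0], [0.0, 0.0], rate_max=10.0, v_max=0.3)
```

The dead zone of width 2Υ in the IO signal is what prevents the oscillatory chatter around the reference value mentioned above, and the subtraction in the decoder makes the predicted velocity zero whenever the two opposing assemblies fire equally.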
D. Optimization of the Network Parameters
To have the cerebellar microcircuits employ more features of their biological counterparts, an optimization process is applied to finely tune the parameters. The optimization sets the values of the hyperparameters H of the network to meet specific goals. Such goals are set to tune the cerebellar model layer by layer through defined objective functions f_i(H_i) to be minimized, where H_i ⊂ H. Each objective function consists of a set of objectives O_i weighted by a vector w_iO, such that f_i(H_i) = O_i w_iO. Tuning the model layer by layer allows the expected output of each layer to be properly defined and monitored. Moreover, it simplifies the optimization and avoids the complexities that accompany choosing a large number of hyperparameters to be optimized simultaneously.

Objective 1 (uniform firing in MF): The input signals coming from the DM are first introduced to the MF. Thus, the first step is to make sure that the firing pattern in MF is suitable to be introduced to the next layer. The parameters affecting the firing are the neuron parameters and the amplitude of the input current to the neurons. The criteria selected are the maximum firing rate of the neurons and the pattern of firing. For a Gaussian distribution of the input current, a corresponding Gaussian distribution is expected for the firing rates in the MF layer. Thus, a Gaussian distribution fit, via maximum likelihood estimation [32], is applied to the number of spikes released by the neurons, to compare the mean value of the curve, G_mean, to the expected output (i.e., MF_nearest, the neuron whose central value is the closest to the input value). Hence, the two objectives O_1 = [O_1,1, O_1,2] can be formulated as:

O_1,1 = Σ_{n=1}^{N_test} |fr_MF_max(n) − fr_MF_desired| / N_test
O_1,2 = Σ_{n=1}^{N_test} |G_mean(n) − MF_nearest(n)| / N_test    (20)

With the corresponding weight vector w_1O, the first objective function is defined as f_1(H_1) = O_1 w_1O.

Objective 2 (sparsity in GC): The output from MF is then introduced to the GC. As mentioned in the previous subsection, GC and GgC act together to give a distinctive output for each input from the MF, thus allowing for a sparse coding of the input. Consequently, f_2 is designed to ensure that a minimum number of GC neurons is active, and it also checks the uniqueness of the activity pattern for every input. Additionally, the firing rate is defined to resemble the neuron activity in the biological counterpart. With these four objectives to be satisfied, O_2 = [O_2,1, O_2,2, O_2,3, O_2,4], the firing rates are defined in a way similar to that in MF, where:

O_2,1 = Σ_{n=1}^{N_test} |fr_GC_max(n) − fr_GC_desired| / N_test
O_2,2 = Σ_{n=1}^{N_test} |fr_GgC_max(n) − fr_GgC_desired| / N_test    (21)

The objective O_2,3 targets minimizing the number of active neurons as much as possible, while ensuring that there are still active neurons in GC (i.e., at least one neuron is active). During each iteration, the number of spikes triggered by each neuron is recorded in the set S_GC_n after the n-th test trial, to be used to define the firing rates as well as the active neurons. Neurons with a firing rate greater than one third of the desired firing rate (i.e., greater than fr_GC_desired/3) are kept and the rest are discarded; S_GC_n is then updated accordingly. This allows considering only the neurons that would affect the learning in PC. The difference between the optimal number of firing neurons (chosen as 1 in this case) and the number of active neurons λ_GC_n is recorded for each trial n in a set Λ_GC_n. To penalize the state in which all neurons in GC are inactive, a large number/score φ is returned, which is formulated as follows:

Λ_GC_n = { φ if λ_GC_n = 0;  |λ_GC_n − 1| if λ_GC_n > 0 }    (22)

O_2,3 = Σ_{n=1}^{N_test} Λ_GC_n / N_test    (23)

The objective O_2,4 describes the uniqueness of the output obtained across the N_test trials, where the repeated spiking of a neuron across many trials is penalized. To simplify the computations, the neuron with the maximum firing rate is recorded for each trial n, and then the number of repetitions of each of the neurons in the set is computed. The mean µ_mean and maximum µ_max numbers of repetitions are finally computed, and O_2,4 is formulated as a weighted sum of µ_max and µ_mean (24). With the corresponding weight vector w_2O, the second objective function is defined as f_2(H_2) = O_2 w_2O.

Objective 3 (proper firing rates in PC): The neurons of PC are known to have distinguishable firing patterns in response to different inputs. The input from GC tends to trigger simple spikes (SS), while the input from IO/CF triggers complex spikes (CS) in PC. These two spiking patterns differ from each other in many aspects, but in this study only their respective firing rates are considered. Moreover, the assemblies of neurons within each group tend to have an alternating activity for the different directions of motion. Thus, three objectives are defined, O_3 = [O_3,1, O_3,2, O_3,3]. The first two objectives can be formulated as:

O_3,1 = Σ_{n=1}^{N_test} |fr_PC_SS(n) − fr_SS| / N_test
O_3,2 = Σ_{n=1}^{N_test} |fr_PC_CS(n) − fr_CS| / N_test    (25)

To construct the formula describing the third objective, O_3,3, the activity of the PC is first compared to a threshold value (chosen as the maximum firing frequency of the simple spikes), with those above and below the threshold assigned as active/true and inactive/false, respectively. In this case, the most desirable state is to have only one group of neurons active per DOF, which is represented by an XNOR logic gate:

A(PC±_j) = { 1 if fr_PC±_j > fr_PC_SS;  0 if fr_PC±_j ≤ fr_PC_SS }    (26)

where fr_PC±_j is the mean firing rate of the neurons in either of the two assemblies for each DOF j. O_3,3 can then be expressed as:

O_3,3 = (1/n_TS) Σ_{j=1}^{n_TS} A(PC+_j) ⊙ A(PC−_j)    (27)

where ⊙ denotes XNOR. With the corresponding weight vector w_3O, the third objective function is defined as f_3(H_3) = O_3 w_3O.

Objective 4 (proper output from DCN): After the optimization is carried out for the previous hyperparameters, this last objective function tests the operation of the network while optimizing the parameters affecting the activity in the DCN. The robot, in a simulation environment, repeats a chosen motion towards a target four times, and both the error while moving, e_pred, and the execution time ∆ are recorded. The error in this case is formulated as:

e_pred = | arccos( (ṽ · v) / (‖ṽ‖ ‖v‖) ) |    (28)

Consequently, O_4 consists of six weighted objectives, where O_4 = [O_4,1, ..., O_4,6]. The objective O_4,1 acts to keep the mean value of the error e_pred across the four trials as small as possible, and to obtain an inverted firing pattern in the opposing groups of neurons within the DCN, as well as when compared to PC. The objectives O_4,2 and O_4,3 are set to promote the decrease of the error e_pred and of the execution time ∆, respectively, as the training proceeds. To achieve this, each of these objectives is assigned a value of 1 at the beginning of the training, and the variables (the mean value of e_pred and ∆) are compared to those from the previous trial, with a fixed amount deducted in case of a decrease in the value of the variable. Hence, in the case of a consistent decrease in the error and execution time from one trial to another, reflecting a successful learning process, these objectives return a value of zero, which is the absolute minimum in this case. The objective O_4,4 adjusts the firing rate of the DCN neurons to the desired range such that:

O_4,4 = Σ_{n=1}^{N_test} |fr_DCN_max(n) − fr_DCN_desired| / N_test    (29)

Similar to PC, in
DCN the opposing groups of neurons shallbe set to fire in an alternating manner, which is formulated as: A ( DCN ± j ) = (cid:40) f r DCN ± j > f r DCN f r DCN ± j ≤ f r DCN (30) where f r
DCN is the mean firing rate of
DCN , and f r
DCN ± j is the mean firing rate of neuron in either of the two assembliesof neurons for each DOF j . O can then be expressed as: O = 1 n T S n TS (cid:88) j =1 A ( DCN + j ) (cid:12) A ( DCN − j ) (31)Additionally, the objective O ensures that the activity in theassemblies of DCN opposes that of the corresponding onesfor same DOF in
P C (i.e., fire in an inverted manner): O = 1 n T S n TS (cid:88) j =1 A ( DCN ± j ) (cid:12) A ( P C ± j ) (32)With w O = [0 . , . , . , . , . , . , the fourth objectivefunction is defined as f ( H ) = O w O . E. Bayesian Optimization for The Objectives
To meet these objectives, Bayesian Optimization (BO) is utilized to optimize the defined objective functions [33]. These functions would be very costly and time-consuming to optimize through manual or random search methods due to the high dimensionality and stochasticity of the search space. BO builds a probabilistic model of the objective function to facilitate its evaluation, while making use of the history of previous trials to guide the optimization process, as explained later in this subsection. Thus, the optimal solution is sought by minimizing each objective function to obtain the optimal hyperparameters h*_i, such that:

h*_i = argmin_{h_i ∈ H_i} f_i(h_i)        (33)

The main constituents through which a BO method is identified are the regression model and the acquisition function. The probabilistic regression model surrogates the objective function (and is usually referred to as the surrogate model). This probabilistic model is initiated with some random evaluations to guide the algorithm, starting from complete uncertainty (prior), and develops as more evaluations are stored in the history to give better future evaluations and decrease the uncertainty (posterior). The surrogate model is expressed as S_i = P(L_i | H_i), representing the mapping of the hyperparameters H_i to a probability of a loss/score L_i for an objective function f_i. The acquisition function (also known as the selection function) allows for selecting appropriate candidates to improve the surrogate model while exploring for the optimum values of the hyperparameters. Hence, a proper choice of the acquisition function guarantees a balance between exploration and exploitation without getting trapped in a local minimum. In this study, an Adaptive Tree Parzen Estimator (ATPE) is chosen as the regression model [34], and the Expected Improvement (EI) as the acquisition function.

The standard TPE is known to fit optimization problems where either a mixture of discrete and continuous hyperparameter spaces is to be studied or where the hyperparameters are contingent upon each other, and it is chosen here for the latter reason. Rather than describing the posterior P(L_i | H_i), TPE describes P(H_i | L_i) instead, relying on Bayes' rule such that:

P(L_i | H_i) = P(H_i | L_i) P(L_i) / P(H_i)        (34)

The TPE builds two separate hierarchical processes, P(H_i | L_i ∈ U) and P(H_i | L_i ∈ D), where the sets U and D contain the highest and lowest values of L_i, respectively, observed so far with reference to a defined threshold value L*_i. This threshold value is decided based on a predefined percentage γ such that P(L_i < L*_i) = γ. The likelihoods U and D are modelled via kernel density estimators (in this case, the Parzen estimator). The Parzen estimator allows a function to be represented through a mixture of kernels K, which are continuous distributions, expressed as:

P(H) = (1 / (N_p η)) Σ_{j=1}^{N_p} K((H − H_j) / η)        (35)

where N_p is the number of kernels used for the approximation, η is the bandwidth of each kernel, and K is chosen to be a normal distribution. Modelling U and D gives a way to choose hyperparameters for the next observations that are more likely to return lower values of the objective functions (in the case of minimization).

Although TPE has lower time complexity than other BO methods (such as Gaussian Process BO), it does not model interactions/correlations among the hyperparameters. The ATPE addresses this drawback by judging from the Spearman correlation [35] between the hyperparameters which parameters to vary and which parameters to lock, to achieve a more efficient exploration.
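As a concrete illustration, the kernel mixture of equation 35 can be sketched in a few lines; the sample history and bandwidth η below are illustrative placeholders, not values from the study.

```python
import math

def parzen_density(h, samples, eta):
    """Parzen estimate P(H) = (1 / (N_p * eta)) * sum_j K((H - H_j) / eta),
    using a standard normal kernel K as in equation 35."""
    def gauss(u):
        return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    n_p = len(samples)
    return sum(gauss((h - h_j) / eta) for h_j in samples) / (n_p * eta)

# Hyperparameter values from three hypothetical past trials:
history = [0.10, 0.12, 0.50]
print(parzen_density(0.11, history, eta=0.05))  # high density: near two samples
print(parzen_density(0.90, history, eta=0.05))  # near-zero: far from all samples
```

In the TPE setting, one such density would be fitted to the history subset D (good trials) and another to U (bad trials), with candidates ranked by the ratio of the two.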
Another drawback of TPE is that it uses a fixed value for γ and a fixed number of candidates introduced to the acquisition function to predict the next candidate optimal solution, values that were introduced initially while solving some specific problems [33]. ATPE introduces empirically derived formulas based on the cardinality of the hyperparameter search spaces to give better values for these two variables.

In this study, the Expected Improvement
(EI) is chosen as the acquisition function, to maximize the ratio P(H_i | L_i ∈ D) / P(H_i | L_i ∈ U). The EI yields the probability of obtaining a better solution than the current optimum, together with the amount of expected improvement, and consequently favors bigger improvements. The basic formula for the EI is [33]:

EI_{L*_i}(H_i) = ∫_{−∞}^{L*_i} (L*_i − L_i) P(L_i | H_i) dL_i        (36)

By applying Bayes' rule (equation 34) and substituting into equation 36, the EI can be written as [33]:

EI_{L*_i}(H_i) = (γ L*_i D(H_i) − D(H_i) ∫_{−∞}^{L*_i} P(L_i) dL_i) / (γ D(H_i) + (1 − γ) U(H_i))        (37)

EI_{L*_i}(H_i) ∝ (γ + (U(H_i) / D(H_i)) (1 − γ))^{−1}        (38)

From formula 38, it can be concluded that the EI acts to maximize the ratio D(H_i) / U(H_i), thus introducing better candidates for the next search while still maintaining a trade-off between exploration and exploitation.

F. Neuron Model
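A minimal forward-Euler simulation of the Izhikevich dynamics used in this section is sketched below; the parameters a = 0.02, b = 0.2, c = −65, d = 8 are the standard published regular-spiking values [36], used here purely for illustration rather than the values tuned in this study.

```python
def simulate_izhikevich(I, a=0.02, b=0.2, c=-65.0, d=8.0, t_max=500.0, dt=0.25):
    """Euler integration of the Izhikevich model:
    dV/dt = 0.04 V^2 + 5 V + 140 - U + I,  dU/dt = a (b V - U),
    with the reset V <- c, U <- U + d whenever V >= 30 mV."""
    v, u = c, b * c                 # start from rest
    spike_times = []
    t = 0.0
    while t < t_max:
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * (a * (b * v - u))
        if v >= 30.0:               # spike detected: record time and reset
            spike_times.append(t)
            v, u = c, u + d
        t += dt
    return spike_times

print(len(simulate_izhikevich(I=10.0)))  # tonic firing under constant input
```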
To model the spiking activity of the neurons in the cerebellum in real-time, the simple neuron model developed by Izhikevich is chosen for its ability to reproduce different firing patterns [36]. The model provides decent biological plausibility at a relatively low computational cost. The following differential equations describe the model:

V̇ = f(V, U) = 0.04 V² + 5 V + 140 − U + I        (39)

U̇ = g(V, U) = a (b V − U)        (40)

Additionally, the membrane potential is reset after triggering a spike, such that:

if V ≥ 30 mV, then V ← c, U ← U + d        (41)

where V (in mV) is the membrane potential and U is the recovery variable, which acts to lower the neuron's membrane potential. The parameter a (in ms⁻¹) decides the time scale of U (i.e., the decay rate), and b (dimensionless) describes the sensitivity of U to the sub-threshold fluctuations of V. The parameter c (in mV) gives the reset value of V after a spike is triggered, and d gives the amount added to U after a spike. The external currents introduced to the neurons are described by I.

G. Synaptic Connections
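Before detailing the learning rules of this section, the two STDP window shapes used for the plastic synapses can be sketched as follows; the amplitudes and time constants are illustrative placeholders, not the values tuned in the study.

```python
import math

def antisymmetric_stdp(dt_ms, S_a=0.01, S_b=0.01, tau_a=20.0, tau_b=20.0):
    """Antisymmetric window (used for the CB synapses): depression when the
    post-synaptic spike precedes the pre-synaptic one (dt <= 0), potentiation
    otherwise, with exponentially decaying magnitude. dt_ms = t_post - t_pre."""
    if dt_ms <= 0:
        return -S_a * math.exp(dt_ms / tau_a)
    return S_b * math.exp(-dt_ms / tau_b)

def symmetric_stdp(dt_ms, S=0.01, tau_1=20.0, tau_2=40.0):
    """Symmetric 'Mexican hat' window (used for the DM synapses): potentiation
    for near-coincident spikes, depression once |dt| exceeds tau_1."""
    return S * (1.0 - (dt_ms / tau_1) ** 2) * math.exp(-abs(dt_ms) / tau_2)
```

The weight change Δε_ij returned by either function would be accumulated onto the corresponding synapse for every pre/post spike pairing.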
To model the plastic connections in both DM and CB, the spike-timing-dependent plasticity (STDP) learning rule is chosen. In STDP, the change in the synaptic weight depends on the relative timing of the spikes in the pre- and post-synaptic neurons [37]. For the CB, the antisymmetric STDP [38] modulates the weight of the plastic synapses, and is formulated as:

∆ε_ij = { −S_a exp(∆t / τ_a),   ∆t ≤ 0
        {  S_b exp(−∆t / τ_b),  ∆t > 0        (42)

where the two coefficients S_a and S_b decide the amounts of decrease and increase of the synaptic weight, respectively, and τ_a and τ_b determine the time windows through which the synaptic weights decrease and increase, respectively.

The symmetric STDP is applied for the plastic synapses in DM [39], and can be formulated as:

∆ε_ij = S (1 − (∆t / τ_1)²) exp(−|∆t| / τ_2)        (43)

where S determines the amount of change in the synaptic weights, while the ratio between τ_1 and τ_2 adjusts the time window for increasing and decreasing the synaptic weights. ∆t is the difference between the timing of the spikes at the post-synaptic and pre-synaptic neurons, such that ∆t = t_post − t_pre.

IV. RESULTS

A. Setup
A UR3 Universal Robots arm is used to test the developed cerebellar controller in two experiments. The shoulder and elbow joints are controlled in an experiment to test the manipulation of the end-effector to a desired position, as shown in Fig. 5a, while the elbow and wrist joints are controlled to test the manipulation of a deformable object, as shown in Fig. 5b. The shoulder and elbow joint ranges q_S and q_E for the first experiment, and the elbow and wrist joint ranges q_E and q_W for the second, are restricted to fixed intervals. A camera is fixed on top of the robot to track the end-effector position and the deformable object through color filtration. The proprioceptive readings (i.e., joints' positions and velocities) of the robot joints are obtained from the motor encoders. The information required for motor babbling is recorded by commanding the robot to move to random joint angles linearly in joint-space within the defined ranges, through the script-based programming language developed for Universal Robots.

Figure 5: The robot setups to manipulate (a) the end-effector, and (b) the deformable object.

For the end-effector manipulation task, the robot is instructed to move to only 100 points in the joint space, and the data collected during motion is then used to train DM. For the deformable object case, the robot is instructed to move to 300 points, as small increments in the joint angles may lead to a big change in the centroid of the deformable object, and hence richer data is needed. The spiking neural networks are developed using the NeMo package [40], which allows simulating the network on a GPU. In this study, a computer with an i7-6700K CPU and a GeForce GTX 1080 Ti is used to build the networks and control the robot.

V. OPTIMIZATION RESULTS
After running the optimization described in the two previous subsections, the chosen parameters are obtained as shown in Table I. The optimization is run on a robot in the simulation environment to avoid wear of the mechanical parts and for safety, but the final experiments are conducted on a real robot.

Neurons in MF achieve a maximum firing rate of 62 Hz, which is comparable to the 60 ± 35 Hz spontaneous firing rate observed at excitatory synapses connecting MF to GC, and an average firing rate (for active neurons only) of 40 Hz, compared to 20 ± 21 Hz [41]. The activity in each assembly is shown in Fig. 6a. For the neurons in GC, the firing is sparse, with maximal rates comparable to those reported in [41] for GC during locomotion. The sparse activity is achieved with the aid of the firing in GgC and the plasticity in the synapses between GC and GgC, which suppress the undesirable activity and thus limit the number of active neurons.

The parameters of the PC neurons are close to those of the bistable neurons demonstrated by Izhikevich [42], and the analysis of PC properties in [43] provides indications of bistable behaviour. The neurons in PC fire simple spikes at a rate of 70 Hz (i.e., when no input is introduced from IO through the CF), which is comparable to the average rate of 50 Hz [44]. For complex spikes, a firing rate of 160 Hz is achieved, whereas the biological counterpart reaches rates of up to 400 Hz [44]. While the modeled PC fire slower than the real cells, the large difference between the firing rates of simple and complex spikes facilitates the choice of adequate learning parameters for proper modulation of the PF synapses. It shall be noted from Fig. 7d that higher firing rates are achieved in the beginning, corresponding to bigger errors (and thus higher activity in IO), after which the rate of activity decreases. After several trials, the strength of the PF synapses increases, and hence the firing rate increases. The spikes generated by IO are characterized by a low rate [45], [46], and the CF synapses connect each neuron of IO to a corresponding neuron of PC. This one-to-one connection allows the low-frequency stimulus to trigger high-frequency complex spikes in PC. The maximum firing rate in DCN is 40 Hz, which is comparable to the reported instantaneous firing rates of around 30 Hz.

Figure 6: Firing in (a) MF and (b) GC with and without inhibition from GgC.

Figure 7: (a) The error in cerebellar predictions e_pred, as defined in equation 28, as the learning proceeds for 4 trials, and (b) the average firing rates of IO and DCN. (c) The phase portrait and (d) firing rates of PC.

A. Radial Reaching
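The radial target layout used in this experiment (eight targets 45° apart, each 7 cm from the common start point) can be generated as in the sketch below; the center coordinates are a free parameter.

```python
import math

def radial_targets(n_targets=8, radius_cm=7.0, center=(0.0, 0.0)):
    """Points equally spaced on a circle around the start position,
    reproducing the eight-angled star layout described in the text."""
    cx, cy = center
    step = 2.0 * math.pi / n_targets   # 45 degrees for 8 targets
    return [(cx + radius_cm * math.cos(k * step),
             cy + radius_cm * math.sin(k * step))
            for k in range(n_targets)]

targets = radial_targets()  # 8 (x, y) goals, each 7 cm from the center
```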
In [47], patients with cerebellar damage attempt to reach radially towards targets at the same distance, starting from the same central point, with the aim of testing the smoothness of these patients' movements and their joint motion coordination compared to healthy persons. Similarly, the robot in this study is commanded to reach radially towards target points equally distributed in an eight-angled star shape with a 45° internal angle, each 7 cm away from the center. It can be concluded from the robot movements in the 8 directions while relying only on DM (left panel of Fig. 8b) that the biggest error always appears in the first generated motor commands, as the robot moves from rest. Thus, a small reaching distance is defined to demonstrate the subsequent effects of this from-rest estimation error. The cerebellar network is trained by repeating the reaching motion to each of the 8 targets only six times, with an improvement in the reaching motion, as shown in the right panel of Fig. 8b, compared to what is obtained after only the first trials.

Table I: CB Network Parameters
Each area (MF, GC, GgC, PC, BC, IO, DCN) is assigned values of the neuronal parameters a, b, c, d and a population size N.

Projection   T    V     ω_init   ω_max
MF → GC      Rnd  4     3.6      -
MF → GgC     Rnd  1     0.6      -
MF → DCN     A2A  1.0   9.0      -
GC → GgC     Prb  0.01  0.3      -
GgC → GC     Prb  0.5   -9.8     -15.0
GC → PC      Prb  0.8   0.01     24.0
IO → PC      O2O  -     43.0     -
IO → DCN     A2A  1.0   0.47     -
PC → DCN     O2O  -     -13.0    -
PC → BC      Prb  0.4   2.4      -
GC → BC      Prb  0.3   3.2      -
BC → PC      Prb  0.5   -45.4    -

The 'T' column gives the type of the connection from A to B in 'Projection', with a value 'V' for its parameter. 'Rnd' implies that V neurons randomly picked from A connect to one neuron in B. 'Prb' implies that for each neuron in A there is a probability V of connecting to each neuron in B. 'A2A' denotes all-to-all connections, where all neurons from A connect to all neurons in B. 'O2O' denotes one-to-one connections, where a neuron in A connects to a corresponding neuron in B. N is the total number of neurons of a specific type. ω_init and ω_max are the initial and maximum values of the synaptic weights, respectively.

Figure 8: (a) A representative motion of target reaching. (b) The robot reaches the eight radial targets starting from the center of the drawn circle, with targets 45° apart. The performance is shown for reaching at the beginning, in the middle, and at the end of learning over 6 trials, from left to right. The black and red trails are for reaching without and with the cerebellum, respectively. The blue color refers to the desired path.

Figure 9: (a) The network proposed in [16] and (b) the results of radial reaching after training the network for 8 repetitions.

B. Deformable Object Manipulation
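The visual-feature pipeline of this experiment (image moments, centroid, and back-projection to world coordinates) can be sketched as below; the principal point and focal length are placeholders standing in for the calibrated camera intrinsics.

```python
def centroid_from_moments(image):
    """Moments M_ij = sum_x sum_y x^i y^j * phi(x, y), with phi the pixel
    intensity, and centroid (C_x, C_y) = (M_10 / M_00, M_01 / M_00)."""
    m00 = m10 = m01 = 0.0
    for y, row in enumerate(image):
        for x, phi in enumerate(row):
            m00 += phi
            m10 += x * phi
            m01 += y * phi
    return m10 / m00, m01 / m00

def pixel_to_world(c_x, c_y, depth, principal=(320.0, 240.0), focal=500.0):
    """Pinhole back-projection of the pixel centroid at a known depth."""
    p_x, p_y = principal
    return (c_x - p_x) * depth / focal, (c_y - p_y) * depth / focal

# A uniform 2x2 blob: the centroid lands at its geometric center.
blob = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(centroid_from_moments(blob))  # (1.5, 1.5)
```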
To demonstrate the ability of the developed cerebellar controller to facilitate learning different skills, an experiment is set up for the robot to manipulate a deformable object. The contour of the deformable object is detected based on its color, and the image moments are calculated such that:

M_ij = Σ_x Σ_y x^i y^j φ(x, y)        (44)

where φ(x, y) gives the intensity of the pixels. The centroid C = (C_x, C_y) is then calculated in pixels from the moments, such that C_x = M_10 / M_00 and C_y = M_01 / M_00, as explained in [48]. The centroid of the object is then converted to world coordinates using the intrinsic parameters of the camera [49]:

X = (C_x − P_x) Z / F,   Y = (C_y − P_y) Z / F        (45)

where C_x and C_y denote the components of the centroid position in pixels, Z is the object's depth away from the camera, (P_x, P_y) is the principal point, and F is the focal length.

Similar to the motor babbling in the case of end-effector tracking, motor babbling is carried out and the corresponding centroid values are recorded for the joint values, to be used for training DM. It shall be noted that using the same parameters as those used in the previous experiment fails to develop a proper map to guide the robot; thus, DM alone fails to drive the robot to manipulate the object properly. However, the cerebellar model and the control architecture allow the plastic synapses to develop properly and guide the robot motion to deform the object, as shown in Fig. 10. The robot is given 10 random targets in the studied workspace. Relying on DM alone yields an average final error of around 7 mm, while with the cerebellar model the 10 targets are reached successfully with an average final error of less than 4 mm.

Figure 10: The upper panel shows the shifting of the deformable object centroid (red dot) towards the target point (green cross). The lower panel demonstrates the almost linear decrease in the norm of the distance between the centroid and the target.
While this study considers using the centroid as the single feature for manipulating the deformable object, it may not be the optimal choice for such a task. Other features can be explored in future studies as better candidates for reducing the high-dimensional data needed to characterize a deformable object [50].

VI. DISCUSSION AND CONCLUSIONS
In this study, a biomimetic control system is developed based on the detailed microcircuit of the cerebellum. A spiking cellular-level forward cerebellar model is integrated with a differential map to help improve the motion in terms of speed and deviation from the desired path. The learning is supervised by a teaching signal based on task-based sensory feedback, in contrast to most studies in the literature, which rely on joint-based errors as mentioned earlier. The forward model, acting as a Smith predictor, compares the expected output with the actual one to build an anticipation of the error for the next cycle and provide corrections to the sensory feedback of the motor-cortex-like differential mapping network, where the spatial motion plan is converted into motor commands. Angular and spatial accelerations are not included in this study, as the robot moves at moderate speed with a light structure; hence, the motion dynamics are not a significant factor. The network parameters are optimized using ATPE-based Bayesian Optimization, carried out step by step to avoid the complexity of handling all the parameters at the same time. This achieves the desired performance while still maintaining the biological properties of the network components, and allows future studies to use the detailed computational model to study cases of cerebellar damage in which monitoring the activity of all the neurons simultaneously is not feasible.

The obtained results show that the cerebellum acts to reduce the deviation from the target path and the execution time. Additionally, the system demonstrates the ability to learn new skills, such as deformable object manipulation, based on the error in task performance. The developed model reduces the error in a certain direction in only a few repetitions, indicating fast convergence of learning and suitability for further development towards real-time adaptive robot control in multiple scenarios and applications.
Moreover, it shall be noted from the conducted experiments that the estimation error in DM is not uniform across the map. This is related to the way data are collected and introduced for training, and is analogous to the areas mapping different body parts in the motor cortex having different densities of neurons, depending on how frequent and accurate the motions generated by each body part are. Hence, providing sensory corrections from CB allows DM to improve the quality of the motor output, and points to the possibility of transfer of learning, with the cerebellum correcting the motion and thus improving the quality of the training data introduced to the motor cortex. The radial reaching experiment with the robot arm displays a fair similarity to the observations in [47]. This study considers only planar motion of the end-effector, like the experiments carried out to test the motion of patients suffering from cerebellar damage. Future studies shall include non-planar motions and incorporate the dynamics/forces acting at the end-effector [51]. Moreover, future models would include more features, as developing highly detailed models would allow identifying cerebellar dysfunctions and studying lesions at the cellular level, which may not be possible using current state-of-the-art techniques.

VII. FUNDING
This research work was supported in part by the Research Grants Council (RGC) of Hong Kong under grant number 14203917, in part by the PROCORE-France/Hong Kong Joint Research Scheme sponsored by the RGC and the Consulate General of France in Hong Kong under grant F-PolyU503/18, in part by the Key-Area Research and Development Program of Guangdong Province 2020 under project 76, and in part by The Hong Kong Polytechnic University under grant G-YBYT.

REFERENCES

[1] H. J. Chiel and R. D. Beer, "The brain has a body: adaptive behavior emerges from interactions of nervous system, body and environment,"
Trends in neurosciences , vol. 20, no. 12, pp. 553–557, 1997.[2] M. Rucci, D. Bullock, and F. Santini, “Integrating robotics and neuro-science: brains for robots, bodies for brains,”
Advanced Robotics , vol. 21,no. 10, pp. 1115–1129, 2007.[3] J. L. Krichmar and G. M. Edelman, “Machine psychology: autonomousbehavior, perceptual categorization and conditioning in a brain-baseddevice,”
Cerebral Cortex , vol. 12, no. 8, pp. 818–830, 2002.[4] G. M. Edelman, “Learning in and from brain-based devices,” science ,vol. 318, no. 5853, pp. 1103–1105, 2007.[5] J. C. Eccles,
The cerebellum as a neuronal machine . Springer Science& Business Media, 2013.[6] R. L. Buckner, “The cerebellum and cognitive function: 25 years ofinsight from anatomy and neuroimaging,”
Neuron , vol. 80, no. 3, pp.807–815, 2013.[7] J. Porrill, P. Dean, and S. R. Anderson, “Adaptive filters and internalmodels: multilevel description of cerebellar function,”
Neural Networks ,vol. 47, pp. 134–149, 2013.[8] E. R. Kandel, J. H. Schwartz, T. M. Jessell, D. of Biochemistry, M. B. T.Jessell et al. , Principles of neural science . McGraw-hill New York,2000, vol. 4.[9] N. R. Luque, J. A. Garrido, R. R. Carrillo, S. Tolu, and E. Ros,“Adaptive cerebellar spiking model embedded in the control loop:Context switching and robustness against noise,”
International Journalof Neural Systems , vol. 21, no. 05, pp. 385–401, 2011. [10] S. Tolu, M. Vanegas, J. A. Garrido, N. R. Luque, and E. Ros, “Adaptiveand predictive control of a simulated robot arm,”
International journalof neural systems , vol. 23, no. 03, p. 1350010, 2013.[11] D. M. Wolpert, R. C. Miall, and M. Kawato, “Internal models in thecerebellum,”
Trends in cognitive sciences , vol. 2, no. 9, pp. 338–347,1998.[12] M. Kawato, K. Furukawa, and R. Suzuki, “A hierarchical neural-networkmodel for control and learning of voluntary movement,”
Biologicalcybernetics , vol. 57, no. 3, pp. 169–185, 1987.[13] T. Ishikawa, S. Tomatsu, J. Izawa, and S. Kakei, “The cerebro-cerebellum: Could it be loci of forward models?”
Neuroscience research ,vol. 104, pp. 72–79, 2016.[14] D. M. Wolpert and M. Kawato, “Multiple paired forward and inversemodels for motor control,”
Neural networks , vol. 11, no. 7-8, pp. 1317–1329, 1998.[15] T. Honda, S. Nagao, Y. Hashimoto, K. Ishikawa, T. Yokota et al. ,“Tandem internal models execute motor learning in the cerebellum,”
Proceedings of the National Academy of Sciences, vol. 115, no. 28, pp. 7428–7433, 2018. [16] O. Zahra, D. Navarro-Alarcon, and S. Tolu, "Vision-based control for robots by a fully spiking neural system relying on cerebellar predictive learning," arXiv preprint arXiv:2011.01641, 2020. [17] I. Abadía, F. Naveros, J. A. Garrido, E. Ros, and N. R. Luque, "On robot compliance: A cerebellar control approach,"
IEEE Trans. on cybernetics ,2019.[18] S. Tolu, M. C. Capolei, L. Vannucci, C. Laschi, E. Falotico, and M. V.Hernandez, “A cerebellum-inspired learning approach for adaptive andanticipatory control,”
International Journal of Neural Systems , vol. 30,no. 01, p. 1950028, 2020.[19] W. Maass, “Networks of spiking neurons: the third generation of neuralnetwork models,”
Neural networks , vol. 10, no. 9, pp. 1659–1671, 1997.[20] M. C. Capolei, N. A. Andersen, H. H. Lund, E. Falotico, and S. Tolu, “Acerebellar internal models control architecture for online sensorimotoradaptation of a humanoid robot acting in a dynamic environment,”
IEEERobotics and Automation Letters , vol. 5, no. 1, pp. 80–87, 2020.[21] G. J. Mogenson, D. L. Jones, and C. Y. Yim, “From motivation to action:functional interface between the limbic system and the motor system,”
Progress in neurobiology , vol. 14, no. 2-3, pp. 69–97, 1980.[22] J. Merel, M. Botvinick, and G. Wayne, “Hierarchical motor control inmammals and machines,”
Nature communications , vol. 10, no. 1, pp.1–12, 2019.[23] A. J. Bastian, “Moving, sensing and learning with cerebellar damage,”
Current opinion in neurobiology , vol. 21, no. 4, pp. 596–601, 2011.[24] O. Zahra, S. Tolu, and D. Navarro-Alarcon, “Differential mappingspiking neural network for sensor-based robot control,” arXiv preprintarXiv:2005.10017 , 2020.[25] S. Amari et al. , The handbook of brain theory and neural networks .MIT press, 2003.[26] S. Herculano-Houzel, “The human brain in numbers: a linearly scaled-upprimate brain,”
Frontiers in human neuroscience , vol. 3, p. 31, 2009.[27] N. R. Luque, J. A. Garrido, R. R. Carrillo, E. D’Angelo, and E. Ros,“Fast convergence of learning requires plasticity between inferior oliveand deep cerebellar nuclei in a manipulation task: a closed-loop roboticsimulation,”
Frontiers in computational neuroscience , vol. 8, p. 97, 2014.[28] R. Nishiyori, S. Bisconti, S. K. Meehan, and B. D. Ulrich, “Develop-mental changes in motor cortex activity as infants develop functionalmotor skills,”
Developmental Psychobiology , vol. 58, no. 6, pp. 773–783, 2016.[29] H. T. Chugani, “Imaging brain metabolism in the newborn,”
Journal ofchild neurology , vol. 33, no. 13, pp. 851–860, 2018.[30] R. C. Knickmeyer, S. Gouttard, C. Kang, D. Evans, K. Wilber et al. , “Astructural mri study of human brain development from birth to 2 years,”
Journal of neuroscience , vol. 28, no. 47, pp. 12 176–12 182, 2008.[31] T. Yamazaki and S. Tanaka, “The cerebellum as a liquid state machine,”
Neural Networks , vol. 20, no. 3, pp. 290–297, 2007.[32] D. Cousineau, S. Brown, and A. Heathcote, “Fitting distributions us-ing maximum likelihood: Methods and packages,”
Behavior Research Methods, Instruments, & Computers, vol. 36, no. 4, pp. 742–756, 2004. [33] J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, "Algorithms for hyper-parameter optimization," in Advances in Neural Information Processing Systems, vol. 24. Neural Information Processing Systems Foundation, 2011. [34] B. Arsenault, "Adaptive tree parzen estimator," Oct 2018. [Online]. Available: https://github.com/electricbrainio [35] J. H. Zar, "Spearman rank correlation,"
Encyclopedia of Biostatistics, vol. 7, 2005. [36] E. M. Izhikevich, "Simple model of spiking neurons,"
IEEE Trans. onneural networks , vol. 14, no. 6, pp. 1569–1572, 2003.[37] D. Buonomano and T. Carvalho, “Spike-timing-dependent plasticity(stdp),” in
Encyclopedia of Neuroscience , L. R. Squire, Ed. Oxford:Academic Press, 2009, pp. 265 – 268.[38] K. Buchanan and J. Mellor, “The activity requirements for spiketiming-dependent plasticity in the hippocampus,”
Frontiers in Synaptic Neuroscience. [39] Neuron, vol. 39, no. 5, pp. 807–820, 2003. [40] A. K. Fidjeland, E. B. Roesch, M. P. Shanahan, and W. Luk, "NeMo: a platform for neural modelling of spiking neurons using GPUs," in . IEEE, 2009, pp. 137–144. [41] K. Powell, A. Mathy, I. Duguid, and M. Häusser, "Synaptic representation of locomotion in single cerebellar granule cells,"
Elife , vol. 4, p.e07290, 2015.[42] E. M. Izhikevich, “Which model to use for cortical spiking neurons?”
IEEE Trans. on neural networks , vol. 15, no. 5, pp. 1063–1070, 2004.[43] J. D. Engbers, F. R. Fernandez, and R. W. Turner, “Bistability in purkinjeneurons: ups and downs in cerebellar research,”
Neural networks ,vol. 47, pp. 18–31, 2013.[44] L. Squire, D. Berg, F. E. Bloom, S. Du Lac, A. Ghosh, and N. C. Spitzer,
Fundamental neuroscience . Academic Press, 2012.[45] P. J. Mathews, K. H. Lee, Z. Peng, C. R. Houser, and T. S. Otis, “Effectsof climbing fiber driven inhibition on purkinje neuron spiking,”
Journalof Neuroscience , vol. 32, no. 50, pp. 17 988–17 997, 2012.[46] E. J. Lang and T. A. Blenkinsop, “Control of cerebellar nuclear cells:a direct role for complex spikes?”
The Cerebellum , vol. 10, no. 4, pp.694–701, 2011.[47] P. Fortier and J. Kalaska, “Cerebellar activity during reaching 199 rapidself-paced alternating movements ( schieber and,” 2002.[48] M.-K. Hu, “Visual pattern recognition by moment invariants,”
IRE Trans.on information theory , vol. 8, no. 2, pp. 179–187, 1962.[49] P. Sturm,
Pinhole Camera Model . Boston, MA: Springer US, 2014,pp. 610–613.[50] D. Navarro-Alarcon, Y.-h. Liu, J. G. Romero, and P. Li, “On the visualdeformation servoing of compliant objects: Uncalibrated control meth-ods and experiments,”
The International Journal of Robotics Research ,vol. 33, no. 11, pp. 1462–1480, 2014.[51] M. C. Capolei, E. Angelidis, E. Falotico, H. H. Lund, and S. Tolu, “Abiomimetic control method increases the adaptability of a humanoidrobot acting in a dynamic environment,”