Towards Crossing the Reality Gap with Evolved Plastic Neurocontrollers
Huanneng Qiu, Matthew Garratt, David Howard, Sreenatha Anavatti
Huanneng Qiu
The University of New South Wales Canberra, Canberra, ACT, [email protected]
Matthew Garratt
The University of New South Wales Canberra, Canberra, ACT, [email protected]
David Howard
Robotics and Autonomous Systems Group, CSIRO, Brisbane, QLD, [email protected]
Sreenatha Anavatti
The University of New South Wales Canberra, Canberra, ACT, [email protected]
ABSTRACT
A critical issue in evolutionary robotics is the transfer of controllers learned in simulation to reality. This is especially the case for small Unmanned Aerial Vehicles (UAVs), as the platforms are highly dynamic and susceptible to breakage. Previous approaches often require simulation models with a high level of accuracy, otherwise significant errors may arise when the well-designed controller is being deployed onto the targeted platform. Here we try to overcome the transfer problem from a different perspective, by designing a spiking neurocontroller which uses synaptic plasticity to cross the reality gap via online adaptation. Through a set of experiments we show that the evolved plastic spiking controller can maintain its functionality by self-adapting to model changes that take place after evolutionary training, and consequently exhibit better performance than its non-plastic counterpart.
CCS CONCEPTS
• Computer systems organization → Evolutionary robotics; • Computing methodologies → Neural networks; Evolutionary robotics; • Applied computing → Biological networks;

KEYWORDS
evolutionary robotics, spiking neural networks, Hebbian plasticity, neuroevolution, UAV control
ACM Reference Format:
Huanneng Qiu, Matthew Garratt, David Howard, and Sreenatha Anavatti. 2020. Towards Crossing the Reality Gap with Evolved Plastic Neurocontrollers. In Genetic and Evolutionary Computation Conference (GECCO '20), July 8–12, 2020, Cancún, Mexico. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3377930.3389843
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
GECCO '20, July 8–12, 2020, Cancún, Mexico
© 2020 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery.
ACM ISBN 978-1-4503-7128-5/20/07...$15.00
https://doi.org/10.1145/3377930.3389843
1 INTRODUCTION
Unmanned Aerial Vehicles (UAVs) are challenging platforms for developing and testing advanced control techniques, because they are highly dynamic, with strong couplings between different subsystems [18]. Controller design for these agile platforms is naturally difficult, as a poorly-performing controller can lead to catastrophic consequences, e.g., the UAV crashing. In addition, many learning approaches require large numbers of fitness evaluations. Therefore, there still exist a large group of aerial robotic studies relying on simulations as an intermediate step to develop control algorithms [14].

When simulating, it is not uncommon to derive UAV models mathematically from first principles [2, 19]. However, such models are ill-suited to capturing every aspect of the system dynamics, because some of them cannot easily be modeled analytically, e.g., actuator kinematic nonlinearities, servo dynamics, etc. [6]. Ignoring these effects can significantly deteriorate the performance of the designed controller when being deployed onto the targeted platform. To address this issue, a common practice is to develop control algorithms based on an 'identified' model that is a simulated representation of the real plant. This identified model is obtained by applying a data-driven process called system identification that models the exact dynamics from the measured plant's input and output data. Such implementations have been successful amongst previous research [6, 9, 14, 17, 18].

While a lot of works have pursued a perfect model that well characterizes UAV platforms, a key issue is that loss of performance is still likely to happen when transferring the well-designed (in simulation) controller onto the real platform that has somewhat different dynamics – the well-known reality gap.
In this work we demonstrate a novel approach to compensate for the gap across different platform representations, which works specifically with Spiking Neural Networks (SNNs) that exhibit online adaptation ability through Hebbian plasticity [7]. We propose an evolutionary learning strategy for SNNs, which includes topology and weight evolution as per NEAT [23], and integration of biological plastic learning mechanisms. With the goal of simulation-to-reality transfer, we here prove the concept in a time-efficient manner by transferring from a simpler to a more complex model, a transfer that encapsulates some issues inherent in crossing the reality gap, i.e., incomplete capture of true flight dynamics and oversimplification of true conditions.
In this work, we focus on the development of UAV height control. Our approach to resolve this problem is threefold. First, explicit mathematical modeling of the aircraft is not required. Instead, a simplified linear model is identified based on the measurement of the plant's input and output data. In reality, such models are fast to run and simple to develop. Second, neuroevolution takes place as usual to search through the solution space for the construction of high-performance networks. Finally, Hebbian plasticity is implemented by leveraging evolutionary algorithms to optimize plastic rule coefficients that describe how neural connections are updated. Plasticity evolution has been used in conventional ANNs [22, 24, 25] and SNNs [10], where evolution takes place in the rules that govern synaptic self-organization instead of in the synapses themselves. The evolved controller is able to exhibit online adaptation due to plasticity, which allows successful transfer to a more realistic model and indicates that transfer to reality would be similarly successful.

The organization of the rest of this paper is as follows. Section 2 introduces our SNN package that is utilized to develop our UAV controller, including descriptions of spiking neuron models, the mechanism of plasticity learning and evolutionary learning strategies. Section 3 presents the plant model to be controlled in this work. Sections 4, 5 and 6 describe the controller development process in detail. Results and analysis are given in Section 7. Finally, discussions and conclusions are presented in Sections 8 and 9.
The current widely used Artificial Neural Networks (ANNs) follow a computation cycle of multiply-accumulate-activate. The neuron model consists of two components: a weighted sum of inputs and an activation function generating the output accordingly. Both the inputs and outputs of these neurons are real-valued. While ANN models have shown exceptional performance in the artificial intelligence domain, they are highly abstracted from their biological counterparts in terms of information representation, transmission and computation paradigms.

SNNs, on the other hand, carry out computation based on biological modeling of neurons and synaptic interactions, and have been of great interest in the computational intelligence community in recent decades. Applications have been both non-behavioral [1] and behavioral [20, 26]. Information transmission in SNNs is by means of discrete spikes generated during a potential integration process. Such spatiotemporal dynamics are able to yield more powerful computation compared with non-spiking neural systems [15]. Moreover, neuromorphic hardware implementations of SNNs are believed to provide fast and low-power information processing due to their event-driven sparsity [3], which perfectly suits embedded applications such as UAVs.

As shown in Fig. 1, spikes are fired at certain points in time, whenever the membrane potential of a neuron exceeds its threshold. They travel through synapses from the presynaptic neurons and arrive at all forward-connected postsynaptic neurons. The information measured by spikes is in the form of timing and frequency, rather than amplitude or intensity.

Figure 1: Illustration of spike transmission in SNNs. Membrane potential v accumulates as input spikes arrive and decays with time. Whenever it reaches a given threshold θ, an output spike will be fired, and the potential will be reset to a resting value.
In order to assist the process of designing our spiking controller, we have developed the eSpinn software package. The eSpinn library stands for Evolving Spiking Neural Networks. It is designed to develop controller learning strategies for nonlinear control models by integrating biological learning mechanisms with neuroevolution algorithms. It is able to accommodate different network implementations (ANNs, SNNs and hybrid models) with specific data flow schemes. eSpinn is written in C++ and has abundant interfaces to easily archive data through serialization. It also contains scripts for data visualization and integration with MATLAB and Simulink simulations.

To date, there have been different kinds of spiking neuron models. When implementing a neuron model, trade-offs must be considered between biological reality and computational efficiency. In this work we use the two-dimensional Izhikevich model [11], because of its capability of exhibiting richness and complexity in neuron firing behavior (detailed in [11]) with only two ordinary differential equations:

v̇ = 0.04v² + 5v + 140 − u + I
u̇ = a(bv − u)   (1)

with after-spike resetting following:

if v ≥ v_t, then v ← c and u ← u + d   (2)

Here v represents the membrane potential of the neuron; u represents a recovery variable; v̇ and u̇ denote their derivatives, respectively. I represents the synaptic current that is injected into the neuron. Whenever v exceeds the membrane potential threshold v_t, a spike will be fired and v and u will be reset following Eq. (2). a, b, c and d are dimensionless coefficients that are tunable to form different firing patterns [11]. The membrane potential response of an Izhikevich neuron to an injected current signal is given in Fig. 2.

A spike train is defined as a temporal sequence of firing times:

s(t) = Σ_f δ(t − t^(f))   (3)

where δ(t) is the Dirac δ function; t^(f) represents the firing time, i.e., the moment of v crossing the threshold v_t from below.

Figure 2: Membrane potential response v(t) to an external current signal I(t) of an Izhikevich neuron with the following settings: a = 0.02; b = 0.2; c = -65; d = 2.

We use a three-layer architecture that has hidden-layer recurrent connections, illustrated in Fig. 3. The input layer consists of encoding neurons which act as information converters. Hidden-layer spiking neurons are connected via unidirectional weighted synapses among themselves. Such internal recurrence ensures a history of recent inputs is preserved within the network, which exhibits highly nonlinear functionality. Output neurons can be configured as either activation-based or spiking. In this work a linear unit is used to obtain real-valued outputs from a weighted sum of the outputs of hidden-layer neurons. A bias neuron that has a constant output value is able to connect to any neurons in the hidden and output layers. Connection weights are bounded within [-1, 1]. The NEAT topology and weight evolution scheme is used to form and update network connections and consequently to seek functional network compositions.

In a rate coding scheme, neuron output is defined as the spike train frequency calculated within a given time window. Loss of precision during this process is likely to happen. eSpinn configures a decoding method with high accuracy to derive continuous outputs from discrete spike trains. The implementation involves direct transfer of intermediate membrane potentials as well as decoding of spikes in a rate-based manner.
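Eqs. (1)–(2) can be simulated with a simple forward-Euler loop. The sketch below is a minimal illustration, not eSpinn's implementation: a, b, c and d follow the Fig. 2 caption, while the 30 mV firing cutoff v_t and the 0.5 ms step are conventional Izhikevich values assumed here.

```python
# Forward-Euler sketch of the Izhikevich neuron of Eqs. (1)-(2).
# a, b, c, d follow the Fig. 2 caption; the firing cutoff v_t = 30 mV and
# the 0.5 ms step are assumptions (Izhikevich's conventional values).
def simulate_izhikevich(currents, dt=0.5, a=0.02, b=0.2, c=-65.0, d=2.0,
                        v_t=30.0):
    """Integrate v and u over a list of input currents (one sample per
    dt ms) and return the spike times in ms."""
    v, u = c, b * c                  # start from the resting/reset point
    spike_times = []
    for step, current in enumerate(currents):
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * a * (b * v - u)
        if v >= v_t:                 # threshold crossed: fire and reset
            spike_times.append(step * dt)
            v, u = c, u + d          # after-spike resetting, Eq. (2)
    return spike_times

# A sustained input current yields a regular spike train; no input, no spikes.
spikes = simulate_izhikevich([10.0] * 2000)   # 1 s of constant drive
silent = simulate_izhikevich([0.0] * 2000)
```

With these regular-spiking coefficients the neuron settles to a quiescent rest near -70 mV under zero input and fires periodically under constant drive, matching the behavior shown in Fig. 2.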
In neuroscience, studies have shown that synaptic strength in biological neural systems is not fixed but changes over time [13] – connections between pre- and postsynaptic neurons are updated according to their degree of causality, which involves changes of synaptic weights or even formation/removal of synapses. This phenomenon is often referred to as Hebbian plasticity, as inspired by Hebb's postulate [8]. In our work, plastic behaviors are determined by leveraging evolutionary algorithms to optimize plastic rule coefficients, such that each connection is able to develop its own plastic rule.

Modern Hebbian rules generally describe the weight change ∆w as a function of the joint activity of pre- and postsynaptic neurons:

∆w = f(w_ij, u_j, u_i)   (4)

where w_ij represents the weight of the connection from neuron j to neuron i; u_j and u_i represent the firing activity of j and i, respectively.

Figure 3: Spiking network topology that allows internal recurrence. Network inputs (i) consist of the position error in the z-axis e_z and the vertical velocity v_z. Hidden layer neurons (h) are spiking, whose outputs o_i involve direct transfer of the intermediate membrane potential and decoding of the firing rate. A bias neuron (b) is allowed to connect to any neurons in the hidden and output layers. The output thrust command T is calculated from a weighted sum of incoming neuron activations Σ w_i o_i, which will be fed to the hexacopter plant model. Weights w_i are bounded within [-1, 1].

In a spike-based scheme, we consider the synaptic plasticity at the level of individual spikes.
This has led to a phenomenological temporal Hebbian paradigm: Spike-Timing-Dependent Plasticity (STDP) [7], which modulates synaptic weights between neurons based on the temporal difference of spikes.

While different STDP variants have been proposed [12], the basic principle of STDP is that the change of weight is driven by the causal correlations between the pre- and postsynaptic spikes. Weight change is more significant when the two spikes fire closer together in time. The standard STDP learning window is formulated as:

W(∆t) = A₊ e^(−∆t/τ₊),  ∆t > 0
W(∆t) = A₋ e^(∆t/τ₋),  ∆t < 0   (5)

where A₊ and A₋ are scaling constants for the strength of potentiation and depression; τ₊ and τ₋ represent the time decay constants; ∆t is the time difference between pre- and postsynaptic firing timings:

∆t = t_post − t_pre   (6)

In eSpinn we have introduced a rate-based Hebbian model derived from the nearest-neighbor STDP implementation [12], with two additional evolvable parameters:

ẇ = u_i ( A₊/(τ₊⁻¹ + u_i) + k_m (u_j − u_i + k_c) + A₋/(τ₋⁻¹ + u_i) )   (7)

where k_m is a magnitude term that determines the amplitude of weight changes, and k_c is a correlation term that determines the correlation between pre- and postsynaptic firing activity. These factors are set as evolvable so that the best values can be autonomously located. Fig. 4 shows the resulting Hebbian learning curve. The connection weight has a stable converging equilibrium at u_θ, which is due to the correlation term k_c. This equilibrium corresponds to a balance of the pre- and postsynaptic firing.
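The learning window of Eq. (5) can be sketched directly. Constants follow Fig. 4; the nearest-neighbour pairing below is an illustrative choice for demonstration, not eSpinn's exact scheme.

```python
import math

# The STDP learning window of Eq. (5): potentiation when the presynaptic
# spike precedes the postsynaptic one (dt = t_post - t_pre > 0), depression
# otherwise. Constants follow Fig. 4: A+ = 0.1, A- = -0.1, tau = 0.02 s.
def stdp_window(dt, a_plus=0.1, a_minus=-0.1, tau_plus=0.02, tau_minus=0.02):
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    return a_minus * math.exp(dt / tau_minus)

# Accumulated weight change under a nearest-neighbour pairing of spike
# trains (an illustrative pairing scheme, not eSpinn's implementation).
def weight_change(pre_spikes, post_spikes):
    dw = 0.0
    for t_post in post_spikes:
        t_pre = min(pre_spikes, key=lambda t: abs(t_post - t))
        dw += stdp_window(t_post - t_pre)
    return dw
```

Spikes that fire closer together in time produce larger weight changes, and reversing their order flips the sign, exactly the causal behavior described above.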
Figure 4: Hebbian learning curve with A₊ = 0.1, A₋ = -0.1, τ₊ = 0.02 s, τ₋ = 0.02 s.

While gradient methods have been very successful in training traditional MLPs [4], their implementations on SNNs are not as straightforward because they require the availability of gradient information. Instead, eSpinn has developed its own version of a popular neuroevolution approach – NEAT [23] – which can accommodate different network implementations and integrate with Hebbian plasticity, as the method to learn the best network controller.

NEAT is a popular neuroevolution algorithm that involves network topology and weight evolution. It enables incremental network topological growth to discover the (near) minimal effective network structure. The basis of NEAT is the use of historical markings, which are essentially gene IDs. They are used as a measurement of the genetic similarity of network topology, based on which genomes are clustered into different species. NEAT then uses an explicit fitness sharing scheme [5] to preserve network diversity. Meanwhile, these markings are also used to line up genes from variant topologies and allow crossover of divergent genomes in a rational manner.

eSpinn keeps a global list of innovations (e.g., structural variations), so that when an innovation occurs, we can know whether it already exists. This mechanism ensures networks with the same topology will have exactly the same innovation numbers, which is essential during the process of network structural growth.
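The global innovation list can be sketched as a lookup keyed by the endpoints of a new connection. This is a minimal illustration of the bookkeeping; `InnovationRegistry` and its method names are hypothetical, not eSpinn's API.

```python
# Minimal sketch of NEAT-style historical markings: a structural innovation
# (here, a new connection between two node ids) receives a fresh number only
# the first time it appears anywhere in the population, so identical
# structures always carry identical innovation numbers.
class InnovationRegistry:
    def __init__(self):
        self._seen = {}      # (in_node, out_node) -> innovation number
        self._next = 0

    def innovation(self, in_node, out_node):
        key = (in_node, out_node)
        if key not in self._seen:        # genuinely new structure
            self._seen[key] = self._next
            self._next += 1
        return self._seen[key]           # rediscovery reuses the old number

reg = InnovationRegistry()
a = reg.innovation(1, 3)   # new connection: fresh number
b = reg.innovation(2, 3)   # different connection: another number
c = reg.innovation(1, 3)   # same structure rediscovered: same number as a
```

Because rediscovered structures reuse their original marking, genomes from divergent lineages can be lined up gene-by-gene for crossover and similarity comparison.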
The experimental platform is a commercial hexacopter, a Tarot 680 Pro fitted with a Pixhawk 2 autopilot system. To assist the development and testing of our control paradigms, we have developed a Simulink model based on our previous work [21]. The model is derived from first principles and contains 6-DOF rigid body dynamics and non-linear aerodynamics. Many aspects of the hexacopter dynamics are modeled with C/C++ S-functions, which describe the functionalities of Simulink blocks in C/C++ with MATLAB built-in APIs.

The simulation system is based on a hierarchical architecture. The top-level diagram of the system is given in Fig. 5. The 'Control Mixing' block combines controller commands from the 'Attitude Controller', 'Yaw Controller' and 'Height Controller' to calculate appropriate rotor speed commands using a linear mixing matrix. In the 'Forces & Moments' block we take the rotor speeds and calculate the thrust and torque of each rotor based on the relative airflow through the blades. The yawing torque is then obtained by simply summing up the torque of each rotor. Rolling and pitching torques can also be calculated by multiplying the thrust of each rotor with the corresponding torque arms. Meanwhile, we have also introduced a drag term on the fuselage caused by aircraft climb/descent, whose direction is opposite to the vector sum of the aircraft velocity. The collective thrust is equal to the sum of the thrust of each rotor combined with the drag effect.

Afterwards, the thrust and torques are fed to the 'Hexacopter Dynamics' block. Assuming the UAV is a rigid body, Newton's second law of motion is used to calculate the linear and angular accelerations, and hence the state of the drone is updated.
To convert the local velocities of the UAV to the earth-based coordinate frame we need a rotation matrix, which is parameterized in terms of a quaternion to avoid the singularities caused by reciprocating trigonometric functions (gimbal lock).

Finally, closed-loop simulations have been tested to validate the functionality of the Simulink model. Tuned PID controllers that display fast response and low steady output error are used in both the inner and outer loops as a challenging benchmark.
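The body-to-earth rotation can be sketched with the standard quaternion-to-rotation-matrix formula; this is an illustration of the parameterization itself, not the Simulink block's code.

```python
# Standard quaternion (q0, q1, q2, q3) to rotation-matrix conversion, as
# used to rotate body-frame velocities into the earth frame. Unlike Euler
# angles, a unit quaternion has no singular orientation (no gimbal lock).
def quat_to_matrix(q0, q1, q2, q3):
    return [
        [1 - 2*(q2*q2 + q3*q3), 2*(q1*q2 - q0*q3),     2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3),     1 - 2*(q1*q1 + q3*q3), 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2),     2*(q2*q3 + q0*q1),     1 - 2*(q1*q1 + q2*q2)],
    ]

def rotate(q, v):
    """Apply the rotation encoded by unit quaternion q to vector v."""
    R = quat_to_matrix(*q)
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

# The identity quaternion leaves a body velocity unchanged.
v_earth = rotate((1.0, 0.0, 0.0, 0.0), [1.0, 2.0, 3.0])
```

A 90° yaw quaternion (cos 45°, 0, 0, sin 45°) maps the body x-axis onto the earth y-axis, which is a quick sanity check on the sign conventions.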
In this work, we aim to develop an SNN controller for height control of a hexacopter without explicit modeling of the UAV. Hebbian plasticity that is evolved offline enables online adaptation to cross the gap between the identified model and the targeted plant. The controller takes some known states of the plant model (i.e., the error in the z-axis between the desired and current position as well as the vertical velocity) and learns to generate a functional action selection policy. The output is a thrust command that will be fed into the plant so that its status can be updated.

Our approach to resolve the problem is threefold. First, system identification is carried out to construct a heave model to loosely approximate the dynamics of the hexacopter. Then neuroevolution is used to search for functional SNN controllers to control the identified heave model. Network topology and initial weight configurations are determined. Finally, the fittest controller is selected for further evolution. Hebbian plasticity is activated so that the network is able to adapt connection weights according to local neural activations. An EA is used to determine the best plasticity rules by evolving the two parameters k_m and k_c in Eq. 7. Each connection can develop its own plasticity rule. The above-mentioned processes are offline and only involve the identified model; the dynamics of the hexacopter are unknown to the controller.
Figure 5: Top-level diagram of the hexacopter control model
Figure 6: Nonlinear relationship between vertical velocity v_z (-3 m/s to 3 m/s), thrust command T and vertical acceleration a_z.

On completion of training, the champion network with the best plasticity rules will be deployed to drive the hexacopter model, which is a more true-to-life representation of the real plant.
We first build a loose approximation to resemble the heave dynamics of the hexacopter. Essentially, this is to model the relationship between the vertical velocity v_z, the collective thrust T and the vertical acceleration a_z. Fig. 6 shows the nonlinear response of the vertical acceleration with varying thrust command when the vertical speed is set from -3 m/s to 3 m/s. Note here that the acceleration is actually the net effect of the z-axis forces acting on the body, which are generated from the rotor thrust and the vertical drag caused by rotor downwash and the fuselage. The net acceleration a_n is a_z plus the gravitational acceleration g.

Figure 7: Acceleration curves of the identified model (a_z^id) and the hexacopter model (a_z) with varying thrust command. The identified curve is tangent to that of the hexacopter model at the point where the net acceleration a_n is 0. α is the slope angle of the identified linear curve, from which k_T is obtained. k_v is calculated from the vertical distance between the two nonlinear curves.

In our identified model, the vertical acceleration a_z is approximated as a linear combination of the thrust command T and the vertical speed v_z. v_z, in turn, is obtained by integrating the net acceleration in the z-axis, a_n:

a_z = k_T T + k_v v_z + b
a_n = a_z + g
v_z = ∫ a_n dt   (8)

where k_T and k_v are configurable coefficients; b is a bias that is also tunable to make sure that the linear function is expanded at the point where the net acceleration equals zero, i.e., a_z = −g. We take two of the acceleration curves from Fig. 6 (i.e., for v_z = 0 m/s and v_z = 1 m/s) to model the linear function. The resulting identified linear model is given in Fig. 7.
k_T is identified as the slope of a_z against T when v_z = 0 at the point where a_n = 0. k_v is then calculated from the vertical distance between the two nonlinear curves. Finally, b is set to shift the linear curve vertically, so that the identified model is tangent to the hexacopter curve at the point where a_n = 0.

Figure 8: Validation of the identified heave model. System response of the two models when fed with the same thrust command signal.
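Eq. (8) can be stepped forward in time as below. The coefficient values k_T, k_v and b here are hypothetical placeholders, since the fitted values from Fig. 6 are not reported; only the structure of the model is taken from the text.

```python
G = 9.81  # gravitational acceleration, m/s^2

# Placeholder coefficients for the identified linear heave model, Eq. (8).
# In the paper these are fitted from the v_z = 0 m/s and v_z = 1 m/s
# acceleration curves of Fig. 6; the values below are illustrative only.
K_T, K_V, B = -2.0, -0.3, -5.81

def heave_step(z, v_z, thrust, dt=0.02):
    """One Euler step of: a_z = k_T*T + k_v*v_z + b, a_n = a_z + g,
    v_z = integral of a_n (0.02 s matches the simulation update rate)."""
    a_z = K_T * thrust + K_V * v_z + B   # linearized thrust response
    a_n = a_z + G                        # net acceleration in the z-axis
    v_z = v_z + a_n * dt                 # v_z integrates a_n
    z = z + v_z * dt
    return z, v_z

# At the trim thrust that makes a_n = 0 the identified model hovers in place.
T_hover = -(B + G) / K_T
z, v_z = 0.0, 0.0
for _ in range(100):
    z, v_z = heave_step(z, v_z, T_hover)
```

The hover check mirrors the tangency condition of Fig. 7: at the expansion point the linear model produces zero net acceleration, so height and vertical velocity stay constant.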
With the identified model we have developed according to Eq. 8, we begin to search for optimal network compositions by evolving SNNs using our NEAT implementation. By 'optimal,' we mean the SNN controller is able to drive the plant model to follow a reference signal with minimal error in height during the course of flight. Each simulation (flight) lasts 80 s and is updated every 0.02 s.

To speed up the evolution process, the whole simulation in this part is implemented in C++ with our eSpinn package. At the beginning, a population of non-plastic networks is initialized and categorized into different species. These networks are feed-forward and fully connected, with random connection weights. The initial topology is 2-4-1 (input-hidden-output layer neurons), with an additional bias neuron that is connected to all hidden and output layer neurons. The two inputs consist of the error of the position in the z-axis e_z and the vertical velocity v_z; beyond these, the system's dynamics are unknown to the controller. The output of the controller is the thrust command that will be fed to the plant model.

Encoding of sensing data is done by the encoding neurons in the input layer. Input data are first normalized within the range of [0, 1], so that the standardized signal can be linearly converted into a current value (i.e., I in Eq. 1). This so-called 'current coding' method is a common practice to provide a notional scale to the input metrics.

After initialization, each network is iterated one by one to be evaluated against the plant model. A fitness value is assigned to each of them based on their performance. Afterwards, these networks are ranked within their species according to their fitness values in descending order. A newer generation is then formed from the best parent networks using NEAT: only the top 20% of parents in each species are allowed to reproduce, after which the previous generation is discarded and the newly created children form the next generation.
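The 'current coding' input scheme described above can be sketched as a two-stage map: normalize, then scale to an injected current. The input ranges and the current gain `i_max` are illustrative assumptions, not the paper's values.

```python
# 'Current coding' of a network input: normalize the raw reading to [0, 1]
# and map it linearly to the injected current I of Eq. (1). The ranges and
# the gain i_max below are illustrative assumptions.
def encode(value, lo, hi, i_max=20.0):
    x = (value - lo) / (hi - lo)      # normalize to [0, 1]
    x = min(max(x, 0.0), 1.0)         # clamp out-of-range readings
    return x * i_max                  # linear map to injected current

# e.g. a height error assumed in [-2, 2] m and a vertical velocity assumed
# in [-3, 3] m/s:
I_error = encode(0.5, -2.0, 2.0)
I_vz = encode(-1.0, -3.0, 3.0)
```

The clamp keeps out-of-range sensor readings from driving the encoding neurons with currents outside the calibrated scale.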
During evolution, hidden layer neurons are added with a probability of 0.005 and connections are added with a probability of 0.01. Connection weights are bounded within [-1, 1]. The program terminates when the population's best fitness has been stagnant for 12 generations or when the evolution has reached 50 generations. During the simulation, outputs of the champion are saved to files for later visualization. The best fitness is also saved. Upon completion of the simulation, the data structure of the whole population is archived to a text file, which can be retrieved and reconstructed in our later work.

Note that the control system to be solved is a Constraint Problem [16], because the height of the UAV must be bounded within some certain range in the real world. However, constraint handling is not straightforward in NEAT – invalid solutions that violate the system's boundary can be generated even if their parents satisfy these constraints. Therefore, in this paper we use the feasibility-first principle [16] to handle the constraints. We divide the potential solution space into two disjoint regions, the feasible and the infeasible, by whether the hexacopter stays in the bounded area during the entire simulation. For infeasible candidates, a penalized fitness function is introduced so that their fitness values are guaranteed to be smaller than those of feasible ones.

We define the fitness function of feasible solutions based on the mean normalized absolute error during the simulation:

f = 1 − ē_n   (9)

where ē_n denotes the mean of the normalized absolute error |e_n| between the actual and reference position. Since the error is normalized, a desired solution will have a fitness value close to 1.

For infeasible solutions, we define the fitness based on the time that the hexacopter stays in the bounded region:

f = k (t_i / t_t)   (10)

where t_i is the number of steps that the hexacopter successively stays in the bounded region, and t_t is the total number of steps in the entire simulation.
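Eqs. (9)–(10) combine into a feasibility-first evaluation along the following lines. This is a sketch: flagging feasibility by whether every step stayed in bounds is our reading of the text, and the penalty scalar k follows the value stated with Eq. (10).

```python
# Feasibility-first fitness, Eqs. (9)-(10): feasible flights score
# 1 minus the mean normalized absolute height error; infeasible flights
# score the fraction of steps spent inside the bounded region, scaled by
# k = 0.2 so they always rank below reasonable feasible candidates.
def fitness(norm_abs_errors, steps_in_bounds, total_steps, k=0.2):
    feasible = steps_in_bounds == total_steps   # in bounds the whole flight
    if feasible:
        return 1.0 - sum(norm_abs_errors) / len(norm_abs_errors)  # Eq. (9)
    return k * (steps_in_bounds / total_steps)                    # Eq. (10)
```

Because the infeasible score is capped at k = 0.2, any feasible controller with a mean normalized error below 0.8 automatically outranks every boundary-violating one, which is the point of the feasibility-first principle.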
Penalty is applied using a scalar k of 0.2, empirically determined.

Table 1: Best Networks' Mean Fitness Values in Progress
          Non-plastic on id'd model   Plastic on id'd model   Plastic on hexa model
Fitness   0.9189                      0.9349                  0.9298
Once the above step is done to discover the optimal network topology, we proceed to consider the plasticity rules. The champion network from the previous step is loaded from file, with the Hebbian rule activated. It is spawned into a NEAT population, where each network connection has randomly initialized Hebbian parameters (i.e., k_m and k_c in Eq. 7). Networks are evaluated as previously stated. The best parents are selected to reproduce. During this step, all evolution is disabled except for that of the plasticity rules, i.e., the EA is only used to determine the optimal configuration of the plasticity rules.

Upon completion of the previous steps, the final network controller is obtained and ready for deployment. To construct the controller in the Simulink hexacopter model, it is implemented as a C++ S-function block.
10 runs of the controller development process have been conductedto perform statistical analysis. Data are recorded to files and ana-lyzed offline with MATLAB.
Table 1 shows the fitness changes of the best controller during the course. From left to right are non-plastic networks controlling the identified model, plastic networks controlling the identified model and plastic networks controlling the hexacopter model, respectively. The fitness values are averaged over the 10 runs.

As stated in Section 6.1, evolution is terminated if the performance does not improve for 12 consecutive generations before the threshold of 50. For non-plastic controllers, only one of the 10 runs reached the threshold, and its fitness increased by only 0.0034 in the last 15 generations. This indicates the evolutionary runs of non-plastic controllers have plateaued and further evolution is unlikely to find better solutions. On the other hand, when plasticity is enabled, an increase in fitness can be clearly observed when controlling the same identified model. The plastic controllers demonstrate better performance even when transferred to control the hexacopter model that has different dynamics.
A second comparison is conducted between non-plastic and plastic controllers on the hexacopter model. Results are given in Table 2. For 9 out of the 10 runs, we can see a performance improvement when plasticity is enabled. The only run that is not better still has a close fitness value. Statistical difference is assessed using the two-tailed Mann-Whitney U-test between the two sets of data. The U-value is 21, showing the plastic controllers are significantly better than the non-plastic ones at p < 0.05.

Table 2: Fitness of Non-Plastic vs. Plastic Controllers on the Hexacopter Model
          Non-plastic   Plastic
Run 1     0.9188        0.9350
Run 2     0.9074        0.9271
Run 3     0.9261        0.9396
Run 4     0.9280        0.9465
Run 5     0.9053        0.9162
Run 6     0.9046        0.9166
Run 7     0.9174        0.9338
Run 8     0.9188        0.9256
Run 9     0.9219        0.9366
Run 10    0.9210        0.9207
Mean      0.9169        0.9298
Figure 9: Height control using the non-plastic and plastic SNNs.
Fig. 9 shows a typical run using the non-plastic and plastic controllers. We can see the plastic control system has a faster response as well as a smaller steady error. It is clear that plasticity is a key component to bridge the gap between the two models.
To verify the contribution of the proposed Hebbian plasticity, we extract the best evolved plastic rule and apply it to other networks that have sub-optimal performance. With plasticity enabled, a sub-optimal network is selected to repeatedly drive the hexacopter model to follow the same reference signal. Fig. 10 shows the progress of 4 consecutive runs when a) plasticity is disabled and b)-d) plasticity is enabled. In Fig. 10 a), there is a considerable steady-state output error. When plasticity is turned on, the connection weights begin to adjust themselves gradually, and the system follows the reference signal with a steady-state error that decreases to around 0.005 m. Meanwhile, the fitness increases from a) 0.921296 to b) 0.927559, c) 0.932286 and d) 0.933918.
GECCO ’20, July 8–12, 2020, Cancún, Mexico. H. Qiu et al.

Figure 10: Performance improvement during 4 consecutive runs when a) plasticity is disabled; b)-d) plasticity is enabled (axes: time t (s) vs. height z (m); traces: ref, act)
Similar results are obtained when the rule is assigned to other near-optimal networks; for networks with poor initial performance, however, plasticity learns worse patterns. This analysis justifies our evolutionary approach of searching for the optimal plastic function, and demonstrates that plasticity narrows the reality gap for evolved spiking neurocontrollers.
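The online weight adjustment described above can be illustrated with a generic four-coefficient Hebbian rule of the kind commonly used in evolved plastic networks. The exact rule form used by our controller is defined in the method section of this paper; the coefficients A, B, C, D, the learning rate eta, and the clipping bound below are illustrative stand-ins for the quantities that evolution searches over.

```python
def hebbian_update(w, pre, post, A, B, C, D, eta=0.01, w_max=1.0):
    """Generic Hebbian rule: the weight change is a linear
    combination of the pre/post activity correlation and the
    individual activities. Evolution tunes (A, B, C, D, eta).
    Weights are clipped to [-w_max, w_max] for stability."""
    dw = eta * (A * pre * post + B * pre + C * post + D)
    return max(-w_max, min(w_max, w + dw))

# With a purely correlational rule (A > 0, B = C = D = 0),
# coincident pre/post activity gradually strengthens the synapse:
w = 0.5
for _ in range(10):
    w = hebbian_update(w, pre=1.0, post=1.0, A=1.0, B=0.0, C=0.0, D=0.0)
print(round(w, 2))  # 0.6
```

The point made in the text is that an arbitrary choice of these coefficients can just as easily drive the weights in an unhelpful direction, which is why the rule itself is placed under evolutionary selection.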
PID control is a classic linear control algorithm that has long been dominant in engineering. The aforementioned PID height controller is taken for comparison. Note that the PID controller is designed directly on the hexacopter model, whereas the SNN controller relies only on the identified model and uses Hebbian plasticity to adapt itself to the new plant model. System outputs of the two approaches are given in Fig. 11. Evidently, our controller has smaller overshoot and steady-state output error: the PID controller has a mean absolute error of 0.108 m over the course of the flight, while our plastic SNN controller achieves 0.090 m.
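The metric behind the 0.108 m vs. 0.090 m comparison is the mean absolute tracking error, averaged over the sampled flight. A minimal sketch, with purely illustrative signals:

```python
def mean_abs_error(actual, reference):
    """Mean absolute tracking error: |z(t) - ref(t)| averaged over
    all samples of the flight."""
    assert len(actual) == len(reference)
    return sum(abs(a - r) for a, r in zip(actual, reference)) / len(actual)

# Toy height trace against a square-wave reference (illustrative only).
ref = [1.0, 1.0, 1.0, -1.0, -1.0]
act = [0.9, 1.05, 1.0, -1.1, -0.95]
print(round(mean_abs_error(act, ref), 3))  # 0.06
```

Because the error is taken in absolute value, overshoot and steady-state offset both contribute, which is why the plastic SNN's smaller overshoot shows up directly in this figure of merit.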
When transferring the pseudo-optimal controllers to physical-world applications, one may argue that we can still rely on evolution to tweak the connection configurations. However, one main problem is that learning through evolution cannot be continuous, because the fitness signal is not immediately available during operation. What we propose here is to evolve the adaptive characteristics of the neurocontroller in advance, such that the controller can be self-organizing and adapt to model changes in real time over its entire lifetime. There is no guarantee that an arbitrary Hebbian rule will drive synaptic changes in the desired direction; that is why we use evolution to discover functional Hebbian rules in which synapses build up over time in a meaningful manner.

Figure 11: Height control using PID and plastic SNNs (axes: time t (s) vs. height z (m); traces: ref, pid, snn)
Our work has presented a solution for applied evolutionary aerial robotics, where evolution is used not only for initial network construction, but also to formulate plasticity rules that govern synaptic self-modulation for online adaptation based on local neural activities. We have shown that plasticity can make the controller adaptive to model changes in a way that evolutionary approaches alone cannot accommodate. We are currently in the process of applying this controller development strategy to a real hexacopter platform, and expanding from height control to encompass all degrees of freedom of the UAV.