Parameter Optimization and Learning in a Spiking Neural Network for UAV Obstacle Avoidance targeting Neuromorphic Processors
Llewyn Salt, David Howard, Member, IEEE, Giacomo Indiveri, Senior Member, IEEE, Yulia Sandamirskaya, Member, IEEE
Abstract—The Lobula Giant Movement Detector (LGMD) is an identified neuron of the locust that detects looming objects and triggers the insect's escape responses. Understanding the neural principles and network structure that lead to these fast and robust responses can facilitate the design of efficient obstacle avoidance strategies for robotic applications. Here we present a neuromorphic spiking neural network model of the LGMD driven by the output of a neuromorphic Dynamic Vision Sensor (DVS), which incorporates spiking frequency adaptation and synaptic plasticity mechanisms, and which can be mapped onto existing neuromorphic processor chips. However, as the model has a wide range of parameters, and the mixed-signal analogue-digital circuits used to implement the model are affected by variability and noise, it is necessary to optimise the parameters to produce robust and reliable responses. We propose to use Differential Evolution (DE) and Bayesian Optimisation (BO) techniques to optimise the parameter space, and investigate the use of Self-Adaptive Differential Evolution (SADE) to ameliorate the difficulty of finding appropriate input parameters for the DE technique. We quantify the performance of the proposed methods with a comprehensive comparison of different optimisers applied to the model, and demonstrate the validity of the approach using recordings made from a DVS sensor mounted on a UAV.
Index Terms—Differential Evolution, Bayesian Optimisation, Self-adaptation, STDP, Neuromorphic Engineering
I. INTRODUCTION
State-of-the-art robotic unmanned aerial vehicle (UAV) systems are achieving impressive results for compact and agile flight and manoeuvring [1]. However, these systems are typically less power efficient and robust than their natural counterparts (e.g., bees are capable of robust flight, obstacle avoidance, and cognitive capabilities with a neural processing technology that consumes approximately 10 µW of power and occupies a volume of less than 1 mm³). Using nature as inspiration, neuromorphic engineers have attempted to bridge the power-consumption gap through hardware solutions [2]. Recently, a range of different neuromorphic processors has been proposed to allow for the hardware implementation of spiking neural networks (SNNs) [3]–[8]. These mixed-signal analog/digital chips are ultra low power (on the order of mW) and provide an attractive alternative to current digital hardware used in mobile applications such as robotics.

Another successful neuromorphic example is the recent development of silicon retinas and event-based sensors such as the Dynamic Vision Sensor (DVS) [9], [10]. The DVS operates differently from conventional video cameras: instead of integrating light in a pixel array for a period of time and then converting it to an image, it detects local changes in luminance at each pixel and transmits these change events asynchronously as they are detected, with microsecond latency [11].

L. Salt is with the School of Information Technology and Electrical Engineering, University of Queensland, Queensland, Australia. D. Howard is with the Robotics and Autonomous Systems Group in the Cyberphysical Systems Program, CSIRO, Queensland, Australia. G. Indiveri and Y. Sandamirskaya are with the Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland.
Compared to standard frame-based cameras, a DVS offers (i) faster response times through asynchronous transmission, (ii) much lower bandwidth, and (iii) no motion blur [12].

A system with sensors and image processing in-situ on the UAV is an essential step for autonomous UAV systems. Typically, high-speed agile manoeuvres such as juggling, pole acrobatics, or flying through thrown hoops use external motion sensors and high-powered CPUs to control the UAVs [13]–[15]. A combination of the DVS and neuromorphic spiking networks provides low power consumption and high response rates, together with the potential for adaptation from the SNNs, and as such these are promising technologies for autonomous UAVs.

A model that has shown promise for fast and robust collision avoidance in UAV robotics applications is the locust Lobula Giant Movement Detector (LGMD) [16]–[21]. The LGMD in a locust is capable of responding to an object looming at speeds ranging from . m/s to m/s [22]; our model was tested on stimuli that loomed at rates of 266 pixels per second to 1478 pixels per second. The locust uses the LGMD to escape from predators by detecting whether a stimulus is looming (increasing in size in the field of view) or not [23]. This neuronal looming detection mechanism is robust to translation, which is why it is an ideal candidate for obstacle avoidance. Previous implementations of this model used frame-based cameras and simplified neural models for embedded robotic applications [23]–[25]. Salt et al. [26] modified the LGMD model from [16], first described in [27], to use Adaptive Exponential Integrate and Fire (AEIF) neuron equations [28], which faithfully model the behaviour of the silicon neurons present in hardware neuromorphic processors [3]. The LGMD Neural Network (LGMDNN) was, in particular, modified to make it compatible with the Reconfigurable On-Line Learning Spiking (ROLLS) neuromorphic processor [29]. In this previous work we presented a proof-of-concept demonstration that an
LGMDNN network can be used for obstacle avoidance on a UAV. Coupling the LGMDNN with the AEIF neural equations yields 11 user-defined parameters after making simplifying assumptions based on the constraints of the neuromorphic processor. Optimising this parameter space is challenging as it contains complex inter-dependencies. Moreover, these parameters are shared by neurons and synapses in different parts of the structured neuronal network, and thus influence the overall performance of the network in a way that does not lend itself to any simple description of the role of each parameter. Here we demonstrate the method by which the appropriate parameters were found in [26], and show the extension of this work to incorporate synaptic plasticity and neural spike-frequency adaptation mechanisms.

Identifying acceptable parameter sets for robust functional operation of this model is the focus of this work. Due to the computational resources and time requirements involved (approximately 30 seconds to 4 minutes per evaluation), a brute-force exhaustive search is unfeasible: using a granularity of 1 for parameters with an upper bound greater than 100, 0.1 for an upper bound greater than 2 but less than 100, and 0.01 otherwise, together with the optimistic estimate of 30 seconds per evaluation on an eight-core computer, yields an expected evaluation time many orders of magnitude too long to be practical. We therefore set the goal of investigating efficient stochastic optimisation algorithms. Differential Evolution (DE) [30] is particularly suited to our application. DE is a simple and efficient stochastic vector-based real-valued parameter optimisation algorithm with performance (at least) comparable to other leading optimisation algorithms [31], [32]. DE has only two user-defined rates [30], [33], [34]; however, their optimal values are problem specific and can drastically affect algorithmic performance [35].
This has prompted research into Self-Adaptation (SA), which allows the rates to vary autonomously in a context-sensitive manner throughout an optimisation run. Self-Adaptive DE (SADE) has been shown to perform at least as well as DE on benchmarking problems [35], [36]. Importantly, SA has been shown to reduce the number of evaluations required per optimisation in resource-constrained scenarios with protracted evaluation times [37], compared to non-adaptive solutions. Additionally, we implemented Bayesian Optimisation (BO), which models the optimisation space as a Gaussian process and uses a utility function to determine which points to select for evaluation [38]. We compare these optimisers to a uniform random search as a baseline, as random search has been shown to outperform grid search in some problems [39].

Spiking networks are particularly amenable to a form of unsupervised learning called Spike-Timing Dependent Plasticity (STDP) [40], which allows synaptic weights to change autonomously in response to environmental inputs. STDP has been shown to provide faster responses compared to non-plastic networks in dynamic environments [41], which motivates our investigation into its use in our LGMD networks. The combination of evolutionary optimisation with learning, e.g., the Baldwin effect, is known to be beneficial in artificial and natural systems [42]. In our case, off-line optimisation (DE or BO) sets up network parametrisations for STDP (learning) to exploit on-line.

The specific use of STDP with meta-optimisation is a promising and entirely new area for LGMD networks, which motivates our work in this area. Our hypothesis is that these adaptivity mechanisms are beneficial to the performance of the LGMD network. To test this hypothesis, we evaluate the performance of our optimisers (DE, SADE, and BO, with and without STDP) when optimising looming responses in LGMD networks which are stimulated by (i) simple and (ii) complex DVS recordings on the UAV.
Our finding is, however, that STDP can be both beneficial and detrimental to the performance of the network.

The original contributions of this work are (i) the development of an objective function that accurately describes the desired LGMD behaviour, (ii) a comprehensive statistical comparison of three leading algorithms in optimising LGMD networks, and (iii) the first optimisation-based study on the effect of STDP and spike-frequency adaptation in LGMD networks.

II. THE MODEL
This section describes the background for the model set-up, and the specific equations that were used in the experiments.
A. LGMD
We implement the model as described by Salt et al. [26]. The LGMD model consists of a photoreceptor layer (P), a summing layer (S), an intermediate photoreceptor layer (IP), an intermediate summing layer (IS), and an LGMD neuron layer. The intermediate layers can be seen as analogous to sum-pooling layers in deep convolutional neural networks [43]–[45]. These layers are modelled as populations of AEIF neurons, connected by excitatory (E), slow inhibitory (SI), and fast inhibitory (FI) connections. Fig. 1 shows the topology of the network.

The P to IP to LGMD connection inhibits the spiking response of the LGMD to translational motion across the field of vision, and the inhibitory connections (SI and FI) from the photoreceptor to the summing layer inhibit the output neuron from spiking in response to non-looming stimuli. The weights of the inhibitory connections are assigned based on their distance from the central excitatory neuron, similarly to the scheme described in [16]. This connection configuration spans the P layer like a kernel.

The intermediate layers were added to make the model compatible with the DYNAP-SE neuromorphic processor described in [4], [6]. However, Salt et al. [26] found that the addition of the intermediate (sum-pooling) layer before the LGMD neuron also increased the performance of the network on all but slow circular stimuli.
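As an illustration, the distance-based centre-surround weighting can be sketched as follows. The kernel size and the 1/d fall-off below are illustrative assumptions, not the exact values used in [16].

```python
import numpy as np

def inhibitory_kernel(size=5, inh_scale=1.0):
    """Illustrative (size x size) weight kernel: excitatory centre,
    inhibition that weakens with distance from the centre.

    NOTE: the 1/d fall-off and default size are assumptions for
    illustration; the paper assigns weights 'based on distance from the
    central excitatory neuron' without a formula in this excerpt.
    """
    assert size % 2 == 1, "kernel needs a unique centre"
    c = size // 2
    w = np.zeros((size, size))
    for i in range(size):
        for j in range(size):
            d = max(abs(i - c), abs(j - c))  # Chebyshev distance to centre
            if d == 0:
                w[i, j] = 1.0                # excitatory centre
            else:
                w[i, j] = -inh_scale / d     # inhibition decays with distance
    return w

w = inhibitory_kernel(5)
print(w[2, 2], w[2, 3], w[2, 4])  # -> 1.0 -1.0 -0.5
```

Sliding such a kernel across the P layer reproduces the "spans the P layer like a kernel" connectivity described above.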
1) Adaptive Exponential Integrate and Fire Spiking Networks: We use Adaptive Exponential Integrate and Fire (AEIF) model neurons in the network; the respective neuron equations are given by (1) and (2):

dV/dt = (−g_L(V − E_L) + g_L Δ_T exp((V − V_T)/Δ_T) + I) / C,   (1)
Fig. 1: The neuromorphic LGMD model, which consists of a photoreceptor layer (P), a summing layer (S), an intermediate photoreceptor layer (IP), an intermediate summing layer (IS), and an LGMD neuron layer. Edges show connections that are either Excitatory (E), Slow Inhibitory (SI), or Fast Inhibitory (FI).

I = I_e − I_iA − I_iB − I_adapt,   (2)

where C is the membrane capacitance, g_L is the leak conductance, E_L is the leak reversal potential, V_T is the spike threshold, Δ_T is the slope factor, V is the membrane potential, I_e is an excitatory current, I_adapt is the adaptation current, and I_iA and I_iB describe the fast and slow inhibitory currents, respectively [28]. When a spike is detected (V > V_T) the voltage resets (V = V_r), and the post-synaptic neuron receives a current injection from the pre-synaptic neuron's firing, given by:

I_{e/i,l} = I_{e/i,l} + q_{e/i,l},   (3)

I_adapt = I_adapt + b,   (4)

where the subscript l corresponds to the post-synaptic layer, q_{e/i,l} is the injected current, b is the spike-triggered contribution to adaptation, and the subscript e/i refers to either excitation or inhibition. To simplify the model for embedded implementation, inhibitory currents were set as a ratio of the excitatory current:

q_{i,l} = inh_l · q_{e,l},   (5)

where inh_l is a constant parameter. This equation holds for both types of inhibition (slow and fast). The decay of the excitatory or inhibitory currents is described by:

dI_{e/i}/dt = −I_{e/i}/τ_{e/i},   (6)

where I_{e/i} is the current and τ_{e/i} is the time constant for the decay. Finally, the decay of the adaptation current is described by:

dI_adapt/dt = (a(V − E_L) − I_adapt)/τ_adapt,   (7)

where a is the sub-threshold adaptation and τ_adapt is the time constant for the decay. Section IV-B explains how these were implemented and gives the bounds of all of the values.

B. Spike Timing Dependent Plasticity
Spike Timing Dependent Plasticity (STDP) is a realisation of Hebbian learning based on the temporal correlations between pre- and post-synaptic spikes. This synaptic plasticity is thought to be fundamental to adaptation, learning, and information storage in the brain [46], [47].

The arrival of a pre-synaptic spike closely before a post-synaptic spike increases the efficacy of the synapse, while if a post-synaptic spike occurs in close proximity to and before a pre-synaptic spike, the efficacy of the synapse is decreased. Long-term potentiation (LTP, synaptic weight increase) of the synapse occurs in the former case; long-term depression (LTD, synaptic weight decrease) occurs in the latter case. Fig. 2 shows a typical dependence of the synaptic weight change on the difference in arrival times of the post- and pre-synaptic spikes.

Fig. 2: The impact of STDP on the synaptic weights. If the pre-synaptic spike arrives before the post-synaptic spike, then the strength of the synapse is increased. If the post-synaptic spike arrives first, then the strength of the synapse is weakened.

STDP modifies the synaptic current injection given in (3) by multiplying it by a weight w, which is modified according to the plasticity rule. In particular, if a pre-synaptic spike occurs, then:

I_{e/i,l} = I_{e/i,l} + w·q_{e/i,l},   (8)

A_pre = A_pre + Δ_pre,   (9)

w = w + A_pre.   (10)
If a post-synaptic spike occurs, then:

A_post = A_post + Δ_post,   (11)

w = w − A_post.   (12)

A_{pre|post} are the amounts by which the weight w is strengthened or weakened, and Δ_{pre|post} is a user-defined value by which A_{pre|post} is increased each time a spike occurs. At each spike event, the variables A_{pre|post} decay:

dA_{pre|post}/dt = −A_{pre|post}/τ_{pre|post}.   (13)

This learning rule leads to potentiation of synapses that are supported by a temporal sequence of pre- and post-synaptic spikes, and depression of synapses that connect neurons that fire in the reverse order. In other words, the connection strengths vary depending on the activity of the neurons they are connected to.

III. OPTIMISATION TECHNIQUES
In this section, we describe the three optimisation techniques that we compare, DE, SADE, and BO, and how they are applied to optimising the LGMDNN parameter space. Each individual, x_i, is a parametrisation of the LGMDNN, given by:

x_i = [τ_e, τ_iA, τ_iB, q_eP, q_eS, q_eIP, q_eIS, q_eL, inhA_S, inhB_S, inhA_L, [a, b, τ_adapt], (τ_pre, τ_post, Δ_pre, Δ_post)],

where the square-bracketed and parenthesised terms are present only in the adaptive and plastic model variants, respectively. The bounds for each element of x_i can be found in Table I in Section IV-B, together with a brief explanation of the meaning of each parameter.

A. Differential Evolution
DE is an efficient and high-performing optimiser for real-valued parameters [30], [33]. As it is based on evolutionary computing, it performs well on multi-modal, discontinuous optimisation landscapes. Storn and Price [30] showed that their original DE outperformed several other stochastic optimisation techniques in benchmarking tests whilst requiring the setting of only two parameters: the crossover probability CR and the differential weight F. It also requires a mutation function, which determines how individuals in the population are mixed. Many variants of the mutation function have been suggested; these follow the naming convention DE/x/y/z, where x denotes the vector to be mutated (in this case a random vector), y denotes the number of difference vectors used, and z denotes the crossover method (bin corresponds to binomial). Here we use DE/rand/1/bin.

The initial population, X = {x_{1,0}, x_{2,0}, ..., x_{NP,0}}, where NP is the size of the population and x_{i,0} ∈ R^D is an individual that contains the D parameters to be optimised, is generated from random samples drawn from a uniform probability distribution of the parameter space, bounded to the range of the respective variable. These bounds are shown in Section IV-B3. The fitness of each vector in the population is calculated by the objective function, as described in Section IV-A.

In each generation, each parent generates one offspring by way of a 'donor' vector, created following Eq. (14):

v_{i,G+1} = x_{r1,G} + F · (x_{r2,G} − x_{r3,G}),   (14)

where r1 ≠ r2 ≠ r3 ≠ i ∈ [1, NP] index random unique population members, the subscript G indicates the current generation, and the differential weight F ∈ [0, 2] determines the magnitude of the mutation. The final offspring is generated by probabilistically merging elements of the parent with elements of the donor vector. The new vector u_{i,G+1} = (u¹_{i,G+1}, ..., u^D_{i,G+1}) is found by:

u^j_{i,G+1} = v^j_{i,G+1}, if rand(j) ≤ CR or j = R; x^j_{i,G}, otherwise,   (15)

where j ∈ (1, ..., D), CR ∈ [0, 1] is the crossover rate, rand(j) ∈ [0, 1] is a uniform random number generator, and R ∈ (1, ..., D) is a randomly chosen index that ensures at least one parameter changes. The offspring with index i is then selected by:

x_{i,G+1} = u_{i,G+1}, if f(u_{i,G+1}) > f(x_{i,G}); x_{i,G}, otherwise,   (16)

where f(·) is the fitness function. Once all offspring are generated, they are evaluated on the fitness function and selected into the next generation if they score better than their parent; otherwise, the parent remains in the population.

B. Self-Adaptive DE
Brest et al. [35] presented the first widely-used self-adaptive rate-varying DE, which was expanded by Qin et al. to allow the mutation scheme to be selected (from four predetermined schemes) alongside the rates [36], based on previously successful settings. Different rates/schemes are shown to work better on different problems, or in different stages of a single optimisation run. The strategy for a given candidate is selected based on a probability distribution determined by the success rate of each strategy over a learning period LP. A strategy is considered successful when it improves the individual's value. In the interest of brevity, we refer the interested reader to [36] for a full algorithmic description.

Rates are adapted as follows. Before G > LP (where G is the number of generations, and LP is the number of generations needed before the learned CR values are used), CR is calculated by randomly selecting a number from a normal distribution N(0.5, 0.3), with a mean of 0.5 and a standard deviation of 0.3. Afterwards, it is calculated by drawing a random number from N(CR_mk, 0.1), where CR_mk is the median value of the successful CR values for each strategy k. F is simply selected from a normal distribution N(0.5, 0.3), which will cause it to fall in the interval [−0.4, 1.4] with a probability of 0.997 [36].

C. Bayesian Optimisation
Bayesian Optimisation (BO), e.g. [38], is a probabilistic optimisation process that typically requires relatively few evaluations [48]–[50], although the evaluations themselves are computationally expensive. When parallelised, BO has been shown to locate hyper-parameters within set error bounds significantly faster than other state-of-the-art methods on four challenging machine learning problems [51], in one case displaying 3% improved performance over state-of-the-art expert results. As such, BO can be considered a competitive optimiser to which we can compare DE and SADE.

BO assumes the network hyper-parameters are sampled from a Gaussian process (GP), and updates a prior distribution of the parameterisation based on observations. For the LGMDNN, observations are the measure of generalisation performance under different settings of the hyper-parameters we wish to optimise. BO exploits the prior model to decide the next set of hyper-parameters to sample.

BO comprises three parts: (i) a prior distribution, (ii) an acquisition function, and (iii) a covariance function.
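To make the interplay of these three parts concrete, a minimal one-dimensional sketch of the BO loop is given below. It deliberately simplifies: a squared-exponential kernel stands in for the Matérn kernel used in the paper, the acquisition is UCB with a fixed κ, and the search is over a grid rather than the full LGMDNN parameter space.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.5):
    """Squared-exponential covariance (illustrative stand-in for Matern)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-6):
    """GP posterior mean and standard deviation at the query points."""
    K = sq_exp_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = sq_exp_kernel(x_obs, x_query)
    alpha = np.linalg.solve(K, y_obs)
    v = np.linalg.solve(K, Ks)
    mu = Ks.T @ alpha
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 0.0, None)  # prior var = 1
    return mu, np.sqrt(var)

def bayes_opt(objective, bounds=(0.0, 1.0), n_init=3, n_iter=10, kappa=2.0, seed=0):
    """Maximise `objective`: fit the GP prior to observations, then
    repeatedly evaluate the point that maximises the UCB acquisition."""
    rng = np.random.default_rng(seed)
    xs = list(rng.uniform(*bounds, size=n_init))
    ys = [objective(x) for x in xs]
    grid = np.linspace(*bounds, 200)
    for _ in range(n_iter):
        mu, sigma = gp_posterior(np.array(xs), np.array(ys), grid)
        x_next = grid[np.argmax(mu + kappa * sigma)]  # UCB acquisition
        xs.append(x_next)
        ys.append(objective(x_next))
    return xs[int(np.argmax(ys))]

best = bayes_opt(lambda x: -(x - 0.3) ** 2)  # toy objective, maximum at 0.3
```

In the paper's setting, `objective` would be one evaluation of the LGMDNN fitness function (30 s to 4 min each), which is exactly why an optimiser frugal with evaluations is attractive.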
1) Prior: We use a Gaussian Process (GP) prior, as it is particularly suited to optimisation tasks [48]. A GP is a distribution over functions specified by its mean, m, and covariance, k, which are updated as hyper-parameter sets are evaluated. The GP returns m and k in place of the standard function f:

f(x) ∼ GP(m(x), k(x, x′)).   (17)
2) Covariance Function: The covariance function determines the distribution of samples drawn from the GP [38], [51]. Following [51], we select the 5/2 ARD Matérn kernel (18), where θ is the covariance amplitude:

k_M(x_i, x_j) = η exp(−√(5 r(x_i, x_j))),   (18)

where:

η = θ (1 + √(5 r(x_i, x_j)) + (5/3) r(x_i, x_j)),   (19)

and:

r(x_i, x_j) = Σ_d (x_{i,d} − x_{j,d})² / θ_d².   (20)
3) Acquisition Function: An acquisition function selects which point in the optimisation space to evaluate next. We evaluate three acquisition functions, which select the hyper-parameters for the next experiment: Probability of Improvement (PI), Expected Improvement (EI) [48], and Upper Confidence Bound (UCB) [52]; see [38] for full implementation details. µ(·) and σ(·) refer to the mean and standard deviation functions.

Briefly, the PI can be calculated, given our current maximum observation of the GP, x⁺, by:

PI(x) = P(f(x) ≥ f(x⁺) + ζ) = Φ((µ(x) − f(x⁺) − ζ)/σ(x)),   (21)

where Φ(·) is the normal cumulative distribution function, and ζ ≥ 0 is a user-defined trade-off parameter that balances exploration and exploitation [53].

Similarly, EI is evaluated by [54]:

EI(x) = ei + σ(x)φ(Z), if σ(x) > 0; 0, otherwise,   (22)

ei = (µ(x) − f(x⁺) − ζ)Φ(Z),   (23)

Z = (µ(x) − f(x⁺) − ζ)/σ(x), if σ(x) > 0; 0, otherwise,

where φ and Φ correspond to the probability density and cumulative distribution functions of the normal distribution, respectively.

UCB maximises the upper confidence bound:

UCB(x) = µ(x) + κσ(x),   (24)

where κ ≥ 0 balances exploration and exploitation [51] and is calculated per evaluation as:

κ = √(ν τ_t),   (25)

where ν is a user-tunable variable and:

τ_t = 2 log(t^{d/2+2} π²/(3δ)),   (26)

where δ ∈ (0, 1), d is the number of dimensions of the function, and t is the iteration number.

IV. TEST PROBLEM
This section outlines the rationale of the objective function, the experimental set-up, and the assumptions made. It is important to note that the motivation behind the model simplifications and the objective function is for the work to be directly transferable to the neuromorphic processors described in [29] once they are readily available.
A. Objective Function
Initially, the optimisation function was formulated as:

F_init(λ) = Acc − ||V||,   (27)

where Acc is the accuracy given by Eq. (31), ||V|| is a regularisation term (the l2-norm of the voltage signal, used to regularise it), and λ is a candidate solution. However, this objective function resulted in all optimisers producing 50% accuracy, with looming detected at any time in the experiments.

To improve the accuracy, the function to optimise was reformulated as a weighted multi-objective function [55]. The objective function has three distinct parts: accuracy (Eq. (31)), the sum squared error of the membrane voltage signal (Eq. (36)), and the reward of the spiking trace (Eq. (34)). The accuracy alone could not be used, as there were only eight discrete events in the input stimuli during the optimisation phase, which was not granular enough for optimisation. Eq. (36) acts as a regularising term to prevent the voltage trace from becoming too large. Eq. (34) is used to rate the spiking behaviour with more granularity than is possible with accuracy. Combining Eqs. (36) and (34) resulted in spiking behaviour with a realistic voltage trace. The accuracy was then used to account for sub-optimal regions of (34) that still resulted in a high score.
This resulted in the final formulation of the objective function modified by accuracy, F_Acc(λ), which is calculated by:

F_Acc(λ) =
  2·F(λ),    if F(λ) > 0 and Acc = 1,
  Acc·F(λ),  if F(λ) > 0,
  0,         if Acc = 1 and F(λ) < 0,
  F(λ),      otherwise.   (28)

Here, Acc is the accuracy of the LGMDNN output and F(λ) is the fitness function. The LGMD network is said to have detected a looming stimulus if the output neuron's spike rate exceeds a threshold SL. This can be formalised by:

Looming = True, if SR > SL; False, otherwise,   (29)

where SR can be calculated by:

SR = Σ_{i=t}^{t+ΔT} S_i,   (30)

where ΔT is the time over which the rate is calculated and S_i indicates whether or not there is a spike at time i; a spike is defined to occur if at time i the membrane potential exceeds V_T (V_T has the same meaning as in Eq. (1)).

The looming outputs are categorised into true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Output accuracy is then:

Acc = (TP + TN)/(TP + TN + FP + FN).   (31)

The fitness function without accuracy, F(λ), can be calculated by:

F(λ) = (Score(λ) + SSEOS(λ))/2,   (32)

where Score is a scoring function based on the timing of spiking outputs and SSEOS is the sum squared error of the output signal. The score is calculated as the difference between the reward and penalty functions' sums over the simulation:
Score(λ) = Σ_{i=1}^{N} R_i − Σ_{i=1}^{N} P_i.   (33)

The reward at a given time can be calculated by:

R(t) = k·exp(t/Δt) + 1, if looming and spike; 0, otherwise.   (34)

The punishment can be calculated by:

P(t) =
  (l − c)·t/Δt + c,              if not looming and spike and t < Δt,
  (l − c)·(1 − (t − Δt)/Δt) + c, if not looming and spike and t > Δt,
  0,                             otherwise.   (35)

In these equations, t and Δt remain consistent with the other objective functions, and k, l, and c are all adjustable constants that change the level of punishment or reward.

To calculate SSEOS(λ), the signal was first processed so that every spike had the same value. This was done so that the ideal voltage and the actual voltage would match in looming regions, as the voltage can vary for a given spike; ultimately, the only criterion is that the voltage has crossed the spiking threshold. In the non-looming region, the ideal signal was taken to be the resting potential, which is negative for the AEIF model equation. The signal error was calculated at every time step as:

SSEOS(λ) = −Σ_{i=1}^{N} (V_actual^i − V_ideal^i)².   (36)

V_actual could be obtained directly from the state monitor object of the LGMD output neuron in the SNN simulator (Brian2). N in this case is the length of the simulation, and i indicates each recorded data point at each time step of the simulation. V_ideal was given by:

V_ideal = V_spk, if looming; V_r, otherwise,   (37)

where V_spk is the normalised value given to each spike and V_r is the resting potential.

Overall, this gives an objective function that takes into account the expected spiking behaviour, whilst penalising the system for deviating from plausible voltage values and rewarding it for accurately categorising looming and non-looming stimuli. The voltage signal was kept to realistic bounds because the model was designed to target neuromorphic processors such as the ROLLS chip [4].

B. Experimental Set-up
The model was set up using the Python Brian2 spiking neural network simulator [56].
1) Data Collection: Data was collected using a DVS mounted in-situ on a quadrotor UAV (QUAV). Two types of data were collected: simple and real-world. The simple data was synthesised using PyGame to generate black shapes on a white background that increased in area in the field of view of the DVS. This included: a fast and a slow circle, a fast and a slow square, and a circle that loomed and then translated while increasing in speed (composite). The laptop playing the stimuli was placed in front of the hovering QUAV and the stimuli were recorded. This was done to retain any noise that might be generated by the propellers of the QUAV.

To challenge the model, real stimuli were also recorded: a white ball on a black slope rolled towards the DVS from three different directions; a cup suspended in the air in front of the hovering QUAV and moved towards and away from it; and a hand moved towards and away from the DVS on the hovering QUAV. These increase in complexity in terms of the shapes that are presented.

Four looming and non-looming events (∼ s) from the composite stimulus were used to optimise the model, and the optimised model was then evaluated on the other stimuli. The stimuli were chosen to show that the generated model is both shape and speed invariant.
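The synthesised looming stimuli can be sketched as follows. The geometry below (sensor resolution, initial radius, growth rate) is illustrative and is not taken from the PyGame scripts used for data collection; it only shows the defining property of a looming stimulus, an object whose area in the field of view grows over time.

```python
import numpy as np

def looming_frames(n_frames=10, size=128, r0=4.0, growth=3.0):
    """Illustrative looming stimulus: binary frames of a centred black
    disc (1 = black) whose radius grows linearly from frame to frame.
    All parameter defaults are assumptions for illustration only."""
    yy, xx = np.mgrid[0:size, 0:size]
    c = size / 2.0
    dist2 = (xx - c) ** 2 + (yy - c) ** 2
    frames = []
    for k in range(n_frames):
        r = r0 + growth * k                       # linear radius growth
        frames.append((dist2 <= r ** 2).astype(np.uint8))
    return frames

frames = looming_frames()
areas = [int(f.sum()) for f in frames]
print(all(a2 > a1 for a1, a2 in zip(areas, areas[1:])))  # -> True
```

A non-looming (e.g. translating or shrinking) stimulus would keep the area constant or decreasing, which is what the inhibitory pathways of the LGMDNN are meant to reject.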
2) Experimental Constants: ΔT from Eq. (30) was set to be ms. A loom was said to be detected if SR from Eq. (30) exceeded 13. This had to occur before the last 10% of the looming sequence, to allow enough time for the UAV to react. The clock in Brian2 was set to a granularity of ms. This meant that the model could react after ms if the loom was intense; at most it would take ms.
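The loom-detection rule of Eqs. (29)-(30) with these constants can be sketched as follows. The 50 ms window width is a placeholder assumption (the exact ΔT value did not survive extraction); the threshold SL = 13 is as stated above.

```python
def spike_rate(spike_times_ms, t_ms, delta_t_ms=50.0):
    """SR of Eq. (30): number of output-neuron spikes in [t, t + delta_t).
    delta_t_ms = 50.0 is an assumed placeholder window width."""
    return sum(1 for s in spike_times_ms if t_ms <= s < t_ms + delta_t_ms)

def is_looming(spike_times_ms, t_ms, sl=13, delta_t_ms=50.0):
    """Eq. (29): a loom is detected iff SR > SL (SL = 13 in the text)."""
    return spike_rate(spike_times_ms, t_ms, delta_t_ms) > sl

spikes = [2.0 * i for i in range(20)]  # 20 spikes, 2 ms apart: 0, 2, ..., 38 ms
print(spike_rate(spikes, 0.0), is_looming(spikes, 0.0))  # -> 20 True
```

A sparser spike train (e.g. a handful of spikes from a translating stimulus) would stay below the threshold and be classified as non-looming.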
3) Hyper-parameter Constraints: The hyper-parameters are all continuous and could in principle range from zero to infinity. There were many regions of the parameter space that were not computable, even when using a cluster with 368 GB of RAM. To mitigate some of the computational difficulties, Bayesian optimisation using the expected improvement utility function (BO-EI) was run over 20 eight-hour runs to find feasible regions of the hyper-parameter space, allowing us to constrain the optimisation space.

C, g_L, E_L, V_T, and Δ_T are parameters of the neuron equation, not the model, and were set as constants; performance was not impacted by setting these values and appropriately optimising the other parameters [57]: C = 124. pF, g_L = 60. nS, E_L = −. mV, V_T = −. mV, and Δ_T = 6. mV. Table I shows the constraints found for the rest of the hyper-parameters.

TABLE I: Parameters of the optimisation space and their constraints.

Param.    | Min | Max | Description
τ_e       |     |     | Decay time constant, excitation
τ_iA      |     |     | Decay time constant, A inhibition
τ_iB      |     |     | Decay time constant, B inhibition
q_eP      |     |     | Excitatory current injection to P
q_eS      |     |     | Excitatory current injection to S
q_eIP     |     |     | Excitatory current injection to IP
q_eIS     |     |     | Excitatory current injection to IS
q_eL      |     |     | Excitatory current injection to L
inhA_S    |     |     | Inh. A / exc. ratio for S
inhB_S    |     |     | Inh. B / exc. ratio for S
inhA_L    |     |     | Inh. A / exc. ratio for L
a         |     |     | Sub-threshold adaptation
b         | 40  | 141 | Spike-triggered contribution to adaptation
τ_adapt   |     |     | Time constant, adaptation
τ_pre     |     |     | Time constant of A_pre
τ_post    |     |     | Time constant of A_post
Δ_pre     |     |     | Increment of A_pre at pre-synaptic spike
Δ_post    |     |     | Increment of A_post at post-synaptic spike
4) Comparing optimisers:
SADE, DE, BO with EI (BEI), BO with POI (BPOI), and BO with UCB (BUCB) were evaluated thirty times on the same input stimulus so that they could be statistically compared using a Mann-Whitney U test. A random search (RNG) was also run in the same manner as a benchmark for the algorithms; random search has been shown to be a natural baseline with which to compare other optimisation algorithms, as it can outperform grid searches in terms of results and number of calculations [39]. The input stimulus consisted of a black circle on a white background performing eight looming and eight non-looming events. The non-looming events contained a combination of a shrinking circle and a circle translating from left to right or right to left. The values of the user-defined parameters were selected as:
• BEI and BPOI: ζ = 0.;
• BUCB: κ = 2.;
• DE: NP = dim, F = 0., CR = 0.;
• SADE: LP = 3, NP = dim, where dim is the number of hyper-parameters;
• RNG: individuals were selected from a uniform distribution.
Note that we chose all user-defined parameters based on values previously used in the literature.
The tests were run using the non-adaptive and non-plastic model with the bounds from Table I. They were defined as having converged if they had not improved for × NP evaluations. This meant ten generations for the DE algorithms and the same number of BO or RNG evaluations. The population size was two more than that recommended by [34] for the DE algorithm. This size was chosen as it is relatively small and time was an issue. The short convergence criterion meant that the SADE algorithm needed a short LP. Processor time was not included as a metric because the tests were run on three different computers, so the results would not have been comparable.
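The Mann-Whitney U statistic used for these pairwise comparisons is computed from rank sums of the pooled samples. A minimal sketch follows (our own illustration; in practice a library routine such as SciPy's `mannwhitneyu`, which also supplies the p-value, would be used):

```python
def mann_whitney_u(a, b):
    """Return the Mann-Whitney U statistic (smaller of U1, U2) for two
    independent samples, handling ties with midranks. Suitable for
    comparing, e.g., two optimisers' fitness values over 30 runs each."""
    values = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, n, idx = 0.0, len(values), 0
    while idx < n:
        j = idx
        while j < n and values[j][0] == values[idx][0]:
            j += 1  # extend over a group of tied values
        midrank = (idx + 1 + j) / 2.0  # average of 1-based ranks idx+1..j
        for k in range(idx, j):
            if values[k][1] == 0:
                rank_sum_a += midrank
        idx = j
    n1, n2 = len(a), len(b)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2.0
    return min(u1, n1 * n2 - u1)
```

A small U (relative to its null distribution for the given sample sizes) indicates that the two samples are unlikely to come from the same distribution; the test makes no normality assumption, which is why it is used here.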
5) Comparing Models:
Once the best optimiser was identified (a comparison of optimisers can be found in Subsection V-A), that optimiser, SADE, was used to optimise the following models:
LGMD: neuromorphic LGMD;
A: LGMD with adaptation;
P: LGMD with plasticity;
AP: LGMD with adaptation and plasticity.
The SADE variables were set to LP = 3 and NP = 10. The optimisation process was run 10 times and the best optimiser from these ten runs was selected. The model was then tested on each input case for ten looming to non-looming or non-looming to looming transitions. The performance of each model is reported in Subsection V-C.
Plasticity was sometimes found to degrade performance, so we experimented with clamping it from 0% to 100% of the original synaptic strength. This allowed the weights to range from zero to double the original values at 100% STDP, down to no variation at 0% STDP.

V. RESULTS AND DISCUSSION
The results are split into two subsections. First we compare the optimisers; then we compare the addition of adaptation, plasticity, and their combination to the baseline model. The models are evaluated on their accuracy (Acc), sensitivity (Sen), precision (Pre), and specificity (Spe). Acc is defined in Subsection IV-A. The other metrics can be found in [58].
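The four metrics have the standard confusion-matrix definitions (see [58]); for concreteness, given counts of true/false positives and negatives over the looming/non-looming events:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), precision, and specificity from
    confusion-matrix counts, as used to score the LGMD models. Zero
    denominators are mapped to 0.0 for simplicity."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0  # fraction of looms detected
    pre = tp / (tp + fp) if tp + fp else 0.0  # fraction of detections that were looms
    spe = tn / (tn + fp) if tn + fp else 0.0  # fraction of non-looms rejected
    return acc, sen, pre, spe
```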
A. Optimiser comparison and statistical analysis
Table II shows that the SADE algorithm achieved the best fitness. A good fitness value is one that is greater than 0; this means that it is either 100% accurate or exhibits a desirable voltage trace. The optimisers all achieved negative fitness values due to the decreased population size and convergence conditions used to enable us to run enough experiments for statistical analysis. Section V-C shows the results of SADE when run with a less restrictive convergence condition. The DE algorithm converged on the worst solution in the least number of objective function evaluations; it achieved the best specificity but the worst fitness, precision, and sensitivity. RNG achieved the highest average accuracy, sensitivity, and precision, although BPOI and SADE still had greater fitness. As the fitness function is negative, it is likely that it was rarely modified by the accuracy as per Eq. (28). All of the algorithms were able to find locations in the optimisation space with 100% accuracy/sensitivity given enough time; however, as evaluations are time-expensive, not all of these algorithms may be viable, and a combination of performance metrics should be considered.

TABLE II: Optimisation algorithm metrics.
Meth   Fit                  Eva     Acc   Sen   Pre   Spe
DE     -3779.96  -2653.45   761.29  0.65  0.39  0.70  0.90
SADE   -1749.83
RNG    -2328.15             754.80
BUCB   -2433.38             832.40  0.60  0.43  0.63  0.77
BPOI   -1968.19             794.27  0.63  0.40  0.71  0.86

Table III shows the statistical significance of the results from Table II. The method in the comparison column is compared to each method in the subsequent column. A + indicates statistically significant values and a . indicates no statistical significance. Statistical significance was defined as p ≤ . . The Mann-Whitney U test was used to determine statistical significance because it does not require normally distributed samples.
SADE's fitness was significantly better than that of all other optimisers, but it also performed the most evaluations; this difference is statistically significant. SADE had the second-best accuracy, but this is only significantly different to RNG and DE. Its sensitivity was significantly worse than BUCB and RNG. It had significantly worse precision than RNG, but significantly better specificity.
BPOI was significantly better than DE, BEI, and BUCB for fitness. It also required significantly fewer evaluations than SADE. BEI had significantly worse fitness and sensitivity when compared to BPOI, RNG, and SADE. Interestingly, even though it had the second-best accuracy, the difference is only significant between it and BUCB and SADE.
DE took significantly fewer evaluations to converge than all other algorithms, but also had significantly worse fitness, accuracy, sensitivity, and precision. However, it had significantly better specificity than all other algorithms. DE was statistically different to every algorithm for every metric. BUCB had slightly worse fitness than RNG, but the difference is not statistically significant. Its accuracy was significantly worse than BEI's but significantly better than DE's.

TABLE III: Comparison of the statistical significance of the results.

Meth    Fit  Eva  Acc  Sen  Pre  Spe
BUCB
  DE    +    +    +    +    +    +
  BEI   .    .    +    .    .    +
  SADE  +    +    .    +    .    +
  RNG   .    +    +    .    +    +
  BPOI  +    .    .    .    .    +
DE
  BEI   +    +    +    +    +    +
  SADE  +    +    +    +    +    +
  RNG   +    +    +    +    +    +
  BUCB  +    +    +    +    +    +
  BPOI  +    +    +    +    +    +
BEI
  DE    +    +    +    +    +    +
  SADE  +    +    .    .    .    .
  RNG   +    .    .    +    .    .
  BUCB  .    .    +    .    .    +
  BPOI  +    .    .    .    .    .
SADE
  DE    +    +    +    +    +    +
  BEI   +    +    .    .    .    .
  RNG   +    +    +    +    +    .
  BUCB  +    +    .    +    .    +
  BPOI  +    +    .    .    .    .
BPOI
  DE    +    +    +    +    +    +
  BEI   +    .    .    .    .    .
  SADE  +    +    .    .    .    .
  RNG   .    .    .    +    .    .
  BUCB  +    .    .    .    .    +
RNG
  DE    +    +    +    +    +    +
  BEI   +    .    .    +    .    .
  SADE  +    +    +    +    +    .
  BUCB  .    +    +    .    +    +
  BPOI  .    .    .    +    .    .

A possible reason that DE underperformed is that the F values provided in [34] are not appropriate for this problem. The population size may also have been too small, as populations were truncated to one third of the size recommended in [34]. A smaller population was used for a fair comparison to the Bayesian optimisation algorithms, which have a higher computational overhead than DE and SADE. SADE outperformed DE with the same sized population and may have performed better given a larger one. Increasing the population size meant that BPOI and BUCB were not able to complete one run in the time it took SADE, DE, and RNG to do thirty; BEI was only able to complete 25 of 30 runs with the larger population. The Bayesian optimisers do not have a population size, but their stopping condition was based on it.
SADE removes the need to find control parameters and has been shown to perform as well as or better than DE even when the control parameters are well selected [36]. The generalisability that comes with finding the right control parameters on-the-fly is also appealing.
The addition of the various mutation functions to SADE also seems to help it find better results. This is likely due to the desirable properties of each mutation function cancelling out the undesirable properties of the others.
A surprising result was that, of the BO algorithms, BPOI seemed to perform the best. This contrasts with previous studies [38] that ranked it last compared to BEI and BUCB. BPOI tends to focus more on exploitation rather than exploration, choosing regions in the GP with a higher mean rather than a higher variance.
Fig. 3 also shows that SADE's CR and F values converged on small values, indicating that it also preferred exploitation to exploration.
We postulate that the fitness surface has multiple sharp peaks, making it difficult for the optimisers to find good values, but that finding one of these peaks and climbing it yields better results than being overly exploratory. For example, if BO landed near a sharp peak using UCB or EI, it would never be inclined to evaluate nearby points, because the utility function tends to prioritise regions of higher uncertainty and uncertainty may be lower near evaluated points. RNG offers an unbiased search through the space and performs worse, in terms of fitness, than the exploitative algorithms but better than the exploratory ones.
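The DE strategy names used throughout (DE/Rand/1/Bin and friends) follow a common naming scheme: base vector / number of difference vectors / crossover type. As a concrete illustration, a minimal DE/Rand/1/Bin trial-vector step might look like the following sketch (our own code, not the paper's implementation):

```python
import random

def de_rand_1_bin(pop, i, F, CR, lo, hi):
    """One DE/Rand/1/Bin step for individual i: perturb a random base
    vector r1 by F times the difference of two other random vectors
    (r2, r3), then binomially cross the mutant with pop[i]. Bounds lo/hi
    are applied per dimension to mutated genes."""
    dim = len(pop[i])
    r1, r2, r3 = random.sample([j for j in range(len(pop)) if j != i], 3)
    mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
    j_rand = random.randrange(dim)  # guarantees at least one mutated gene
    trial = [
        min(hi[d], max(lo[d], mutant[d])) if (random.random() < CR or d == j_rand)
        else pop[i][d]
        for d in range(dim)
    ]
    return trial
```

With the small CR values SADE converged on, roughly one gene beyond the forced index `j_rand` changes per trial, which matches the exploitative behaviour discussed above.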
B. SADE Averages
The SADE algorithm performed the best, in terms of fitness, out of all of the algorithms. Fig. 3a shows the average F_acc of the population over 19 generations. The average F_acc converged within five iterations. The maximum F_acc starts off at 0, indicating that a 100% accuracy candidate was found in the initialisation period. The maximum F_acc then rises to 400, which is not visible as the range of the average score is -50000 to -1500.
The average F results in Fig. 3b are quite interesting. They start off at 1, as they are selected from U([0, ]), and then drop down to 0.5 as they are selected from U([0, ]) after the first generation. Once the learning period has finished, all of the F values have converged to less than 0.1. This indicates that the F values having the most success are small, and therefore take advantage of exploitation rather than exploration. It was unexpected that the algorithm would find a min/max within so few generations. This could be why the authors select initial F from N(., .) with range [-0.4, 1.4].
Fig. 3c shows how the crossover probability CR for each function changes over time. For the first nine generations, the CR values are selected from U([0, ]) and so the mean stays at 0.5. However, as with the mean F values, once the learning period is over, all of the CR values drop below 0.1. This means that fewer than 10% of the mutations will generally take place. With a set of 11 hyper-parameters, this means that probabilistically one value will change in addition to the randomly chosen index.
The probability of each function being chosen is shown in Fig. 3d. The probabilities are fixed at 0.25 for the first 9 generations and then vary based on their success. It is interesting to see that, in spite of the F and CR values suggesting that the algorithm is converging on a solution, the DE/Rand-to-Best/2/Bin algorithm is the least successful. The DE/Curr-to-Rand/1 algorithm performs relatively well until about 12

Fig. 3: Averages of F_acc(λ), F, CR, and p for the SADE population over 19 generations. The dotted vertical line indicates the end of the learning period. See the description of the SADE algorithm in Section III-B.
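The regeneration of F and CR each generation can be sketched as below. The normal-distribution parameters (N(0.5, 0.3) for F, standard deviation 0.1 for CR around the median of recently successful values) follow common SADE descriptions [36] and are assumptions here, since the exact ranges are elided above:

```python
import random

def sample_sade_control_params(crm, learning_done):
    """Sketch of SADE control-parameter self-adaptation: during the
    learning period both F and CR are drawn uniformly; afterwards F is
    drawn from a normal clipped to [-0.4, 1.4] and CR from a normal
    centred on crm, the median CR of recently successful trials."""
    if not learning_done:
        return random.uniform(0.0, 1.0), random.uniform(0.0, 1.0)
    F = min(1.4, max(-0.4, random.gauss(0.5, 0.3)))   # assumed N(0.5, 0.3)
    CR = min(1.0, max(0.0, random.gauss(crm, 0.1)))   # assumed sd 0.1
    return F, CR
```

Under this scheme, once successful trials concentrate at small CR (as in Fig. 3c), newly sampled CR values also concentrate near zero, which is the convergence behaviour described above.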
TABLE IV: Parameters used by each model.
Parameter       LGMD  A      P      AP
τ_e (ms)
τ_iA (ms)
τ_iB (ms)
q_eP (pA)
q_eS (pA)
q_eIP (pA)
q_eIS (pA)
q_eL (pA)
inhA_S (1)
inhB_S (1)
inhA_L (1)
a (1)           -     0.79   -      0.79
b (1)           -     14.51  -      14.51
τ_w adapt (ms)  -     30.00  -      30.00
τ_pre (ms)      -     -      1.56   1.56
τ_post (ms)     -     -      10.03  10.03
∆_pre (1)       -     -      0.031  0.031
∆_post (1)      -     -      0.027  0.027
c (1)           -     -      0.05   0.05

generations, where it tapers off. The DE/Rand/2/bin algorithm dips initially but then increases as DE/Curr-to-Rand/1 starts to drop off. DE/Rand/1/bin remains relatively high throughout, only to be overtaken by DE/Rand/2/bin in the last generation.

C. Comparison of models
Table IV shows the selected final parameters of each model. These values were all found by the SADE algorithm, due to the superior quality of its results. The (1) tag in the parameter column indicates that the variable is unit-less. In both models with plasticity, the clamping value c was set to 0.05, or 5%.
As expected, all of the models have τ_iA < τ_iB, which means that the B inhibitions persist for longer and have slower dynamics relative to the A inhibitions. What is unexpected is that the B inhibitions also have stronger current injection than the A inhibitions. On top of this, both of the inhibitory current injections are actually stronger than the excitatory connections, whereas the model in [24] with discrete dynamics had relatively low inhibitory current injections, with inhA_S = 0. and inhB_S = 0. of the excitation strength. Clearly, there is a difference between the neuron models that are used, but this is an interesting outcome nonetheless.
Table V shows the accuracy, sensitivity, precision, and specificity for each LGMD model for a given simple stimulus. The stimuli can be described as follows:
composite: A standard test-bench stimulus consisting of a black circle on a white background that translates and looms at increasing speeds. Fig. 4a shows the composite input.

TABLE V: Quality metrics of the performance of different LGMD models for different simulated looming stimuli.
Stimulus    Model  Acc  Sen  Pre  Spe
comp        LGMD
            A
            P
            AP
circleSlow  LGMD
            A
            P
            AP
circleFast  LGMD
            A
            P
            AP
squareSlow  LGMD
            A
            P
            AP
squareFast  LGMD
            A
            P
            AP

circleFast/Slow: A purely looming black circle on a white background at high or low speeds, collected on the hovering QUAV. Fig. 4b shows the circleFast/Slow stimulus.
squareFast/Slow: A purely looming black square on a white background at high or low speeds. Fig. 4c shows the squareFast/Slow stimulus.
The results in Table V show that the models performed well (Accuracy ≥ .) on most of the stimuli. LGMD and A perform poorly on the circleSlow test, missing two out of five of the looming stimuli. P misses one looming stimulus, and AP detects all stimuli accurately. The plasticity increases the weights of important connections and the adaptation filters out over-excited neurons.
These results show that the models are capable of detecting looming stimuli of varying speeds and, for the most part, of differentiating between translating and looming stimuli. AP scored 100% in every test besides the composite stimulus, where it misclassified the first short translation as a loom. It is likely that this is due to the network not starting in its resting/equilibrium state: inspecting the output trace of LGMD on the composite stimulus, it takes the model ∼150 ms to stop spiking after the looming phase has finished.
After performing the simulated experiments with computer-generated shapes, real objects moving towards and away from the camera were recorded. These stimuli can be described as:
ballRoll[1-3]: Three different runs of a white ball rolling towards the camera on a black platform at different angles and speeds. This is a purely looming stimulus. Fig. 5a shows one of the three ball rolls.
cupQUAV:
A QUAV flying towards a cup suspended in front of it, with a white wall behind it. This is a self-motion stimulus. Fig. 5b shows the QUAV cup stimulus.
Fig. 4: The input layer for the simple stimuli: (a) filtered composite input, (b) filtered circleSlow input, (c) filtered squareFast input (P layer raster plots). The white and coloured backgrounds indicate non-looming and looming respectively.
Hand:
A hand moving towards and away from the hovering QUAV. Fig. 5c shows the looming hand stimulus.
Fig. 5a, Fig. 5b, and Fig. 5c show that the real stimuli tend to contain more noise and do not adhere to a strong pattern when compared to Fig. 4a, Fig. 4b, and Fig. 4c. Table VI shows that the models do not perform as well on real-world stimuli. ballRoll[1-3] is the simplest real stimulus, and as such P and AP achieved full accuracy; LGMD and A missed one roll. Surprisingly good results come from the cupQUAV stimulus: 70% accuracy for all models except AP, which had 80%. It is worth noting that AP performed consistently well when compared with the other models.

Fig. 5: Complex real stimuli: (a) filtered ballRoll2 input, (b) filtered cupQUAV input, (c) filtered hand input (P layer raster plots). The white and coloured backgrounds indicate non-looming and looming respectively.

TABLE VI: Quality metrics of the performance of different LGMD models for different real looming stimuli.
Stim           Model  Acc  Sen  Pre  Spe
ballRoll[1-3]  LGMD
               A
               P
               AP
cupQUAV        LGMD
               A
               P
               AP
hand           LGMD
               A
               P
               AP

The real-world stimuli tended to contain more activations due to the irregularity of the shapes and increased noise. The possibility of detecting the hand by stochastically dropping pixel-events was investigated. Dropping 50% of the DVS events and re-optimising the network gave 100% accuracy for the hand and cupQUAV stimuli. However, in doing this, the network was no longer robust to the speed changes in the composite benchmark test. Indeed, even using all of the pixels, the network could be optimised to work on the real-world stimuli, but it would then no longer be as accurate for inputs that contained fewer activations. The inhibition values went up and the gain values went down, meaning the network struggled to spike on stimuli that were not noisy or event-heavy. Some sort of additional pre-filtering could be useful in making the looming network fully robust in all situations, such as a sub-sampling filter that allows no more than some maximum number of input pixels to be active at a given time step.

Fig. 6: The effect of changing the clamping value c on the learning weight w for (a) the composite stimulus, (b) the circleSlow stimulus, and (c) the hand stimulus.
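The two pre-filtering ideas just described (stochastic event dropping and a per-timestep cap on active pixels) can be sketched as follows; the event-tuple layout and all names are our own assumptions, and the 0.5 default mirrors the 50%-drop experiment:

```python
import random

def subsample_events(events, keep_prob=0.5, max_per_step=None, seed=None):
    """Sketch of DVS event pre-filtering: keep each event with
    probability keep_prob, and optionally pass through at most
    max_per_step events per timestep. `events` is an iterable of
    (t, x, y, polarity) tuples."""
    rng = random.Random(seed)
    kept, count_at_t, last_t = [], 0, None
    for ev in sorted(events, key=lambda e: e[0]):
        if rng.random() >= keep_prob:
            continue  # stochastically dropped
        t = ev[0]
        if t != last_t:
            last_t, count_at_t = t, 0
        if max_per_step is not None and count_at_t >= max_per_step:
            continue  # timestep already saturated
        kept.append(ev)
        count_at_t += 1
    return kept
```

Such a filter caps the total input drive to the P layer, which is the property the text suggests would make the looming network robust to both sparse and event-heavy stimuli.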
1) The effect of changing c on plasticity: Fig. 6a, Fig. 6b, and Fig. 6c show how changing the bounds of the plasticity clamping changes the LGMD (P model) accuracy for the composite, circleSlow, and hand stimuli respectively.
Interestingly, for the two simulated stimuli, increasing the clamping beyond 25% caused the accuracy to drop to 50%. The sensitivity dropped to 0%, indicating that looms were no longer detected and that the synaptic weights were no longer causing the LGMD neuron to fire. Increasing the clamping to 45% increases the accuracy for both the P and AP models on the hand stimulus. This shows that plasticity is a double-edged sword that can both improve and degrade the performance of the model. On the simulated stimuli, lower clamping (c = 0. to 0.) tended to perform better, but on the hand stimulus a larger value (c = 0.) gave better results. This could suggest that too much plasticity causes the weights to deviate too far from their good values on well-formed stimuli, but helps to reduce noise in real-world stimuli. Knowledge about the nature of the input can help to determine the level of plasticity required. In all simulated and real cases except for the hand stimulus, a small contribution of plasticity improved the performance.

VI. CONCLUSIONS
We implemented a neuromorphic model of the locust LGMD network using recordings from a UAV equipped with a DVS sensor as inputs. The neuromorphic LGMDNN was capable of differentiating between looming and non-looming stimuli. It detected the black-and-white simple stimuli correctly regardless of speed and shape. The model performed relatively well on real-world stimuli using the parameters found by the optimiser for synthesised stimuli; when re-optimised, performance on the real-world stimuli was comparable to that on the synthesised stimuli. This was mainly because real-world stimuli tend to contain a higher number of luminance changes, and therefore the magnitude parameters needed to be reduced.
We showed that BO, DE, and SADE are capable of finding parameter values that give the desired performance in the LGMDNN model. SADE statistically significantly outperformed DE on all metrics besides specificity and the number of evaluations, although the only metrics that formed part of the objective function were fitness and accuracy. Once a suitable objective function was found that accurately described the desired output of the LGMDNN, BO, DE, and SADE outperformed hand-crafted attempts, although a uniform random search also performed well. The algorithms were able to achieve 100% accuracy on black-and-white simple stimuli of varying shapes and speeds. SADE performed well in this task, and we have shown that it is suitable for the optimisation of a multi-layered LGMD spiking neural network. This could save time when developing biologically plausible SNNs in related applications.
We have also studied the effect of synaptic plasticity and neuronal spike-frequency adaptation on the performance of the LGMDNN, using the most successful parameter optimisation method.
Our conclusion is that plasticity plays an important role in increasing (and decreasing) performance, depending on how its parameters are selected.
In the future, we plan to apply the optimisation algorithms directly to tuning the neuromorphic processor implementation of the model, with the end goal being a closed-loop control system on a UAV. Showing that the proposed optimisation approach is effective for selecting parameters directly on closed-loop neuromorphic hardware set-ups will greatly increase their usability.

ACKNOWLEDGMENT
We are grateful to Prof. Claire Rind, who provided valuable comments and feedback on the definition of the neuromorphic model, and acknowledge the CapoCaccia Cognitive Neuromorphic Engineering workshop, where these discussions and model developments took place. We would also like to thank iniLabs for use of the DVS sensor, and the Institute of Neuroinformatics (INI), University of Zurich and ETH Zurich, for its neuromorphic processor developments. Part of this work was funded by the EU ERC Grant “NeuroAgents” (724295) and SNSF Ambizione grant (PZOOP2 168183). We would also like to thank Associate Professor Marcus Gallagher, who read the paper and provided some insight as to why the uniform random walk may outperform other optimisers.

REFERENCES

[1] G. Loianno, D. Scaramuzza, and V. Kumar, “Special issue on high-speed vision-based autonomous navigation of UAVs,”
Journal of Field Robotics, vol. 35, no. 1, pp. 3–4. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1002/rob.21773
[2] S. C. Liu and T. Delbruck, “Neuromorphic sensory systems,” Current Opinion in Neurobiology, vol. 20, no. 3, pp. 288–295, 2010.
[3] E. Chicca, F. Stefanini, C. Bartolozzi, and G. Indiveri, “Neuromorphic electronic circuits for building autonomous cognitive systems,” Proceedings of the IEEE, vol. 102, no. 9, pp. 1367–1388, 2014.
[4] G. Indiveri, F. Corradi, and N. Qiao, “Neuromorphic architectures for spiking deep neural networks,” in . IEEE, 2015, pp. 4–2.
[5] M. Davies, N. Srinivasa, T. H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, Y. Liao, C. K. Lin, A. Lines, R. Liu, D. Mathaikutty, S. McCoy, A. Paul, J. Tse, G. Venkataramanan, Y. H. Weng, A. Wild, Y. Yang, and H. Wang, “Loihi: A neuromorphic manycore processor with on-chip learning,” IEEE Micro, vol. 38, no. 1, pp. 82–99, January 2018.
[6] S. Moradi, N. Qiao, F. Stefanini, and G. Indiveri, “A scalable multicore architecture with heterogeneous memory structures for dynamic neuromorphic asynchronous processors (DYNAPs),” Biomedical Circuits and Systems, IEEE Transactions on, pp. 1–17, 2017. [Online]. Available: http://ncs.ethz.ch/pubs/pdf/Moradi etal17.pdf
[7] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S. K. Esser, R. Appuswamy, B. Taba, A. Amir, M. D. Flickner, W. P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science
Proceedings of the IEEE, vol. 102, no. 5, pp. 652–665, May 2014.
[9] P. Lichtsteiner, C. Posch, and T. Delbruck, “A 128x128 120 dB 15 µs latency asynchronous temporal contrast vision sensor,” IEEE Journal of Solid-State Circuits, vol. 43, no. 2, pp. 566–576, Feb 2008.
[10] T. Serrano-Gotarredona and B. Linares-Barranco, “A 128 × 128 1.5% contrast sensitivity 0.9% FPN 3 µs latency 4 mW asynchronous frame-free dynamic vision sensor using transimpedance preamplifiers,” IEEE Journal of Solid-State Circuits, vol. 48, no. 3, pp. 827–838, 2013.
[11] T. Delbruck, “Frame-free dynamic digital vision,” in Proceedings of Intl. Symp. on Secure-Life Electronics, Advanced Electronics for Quality Life and Society, 2008, pp. 21–26.
[12] E. Mueggler, B. Huber, and D. Scaramuzza, “Event-based, 6-dof pose tracking for high-speed maneuvers,” in . IEEE, 2014, pp. 2761–2768.
[13] M. Müller, S. Lupashin, and R. D’Andrea, “Quadrocopter ball juggling,” in . IEEE, 2011, pp. 5113–5120.
[14] D. Brescianini, M. Hehn, and R. D’Andrea, “Quadrocopter pole acrobatics,” in . IEEE, 2013, pp. 3472–3479.
[15] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in
Robotics and Automation (ICRA), 2011 IEEE International Conference on. IEEE, 2011, pp. 2520–2525.
[16] M. Blanchard, F. C. Rind, and P. F. Verschure, “Collision avoidance using a model of the locust LGMD neuron,” Robotics and Autonomous Systems, vol. 30, no. 1, pp. 17–38, 2000.
[17] S. Yue and F. C. Rind, “A collision detection system for a mobile robot inspired by the locust visual system,” Proceedings - IEEE International Conference on Robotics and Automation, vol. 2005, no. April, pp. 3832–3837, 2005.
[18] S. Yue, F. C. Rind, M. S. Keil, J. Cuadri, and R. Stafford, “A bio-inspired visual collision detection mechanism for cars: Optimisation of a model of a locust neuron to a novel environment,” Neurocomputing, vol. 69, no. 13-15, pp. 1591–1598, 2006.
[19] S. Yue and F. C. Rind, “Collision detection in complex dynamic scenes using an LGMD-based visual neural network with feature enhancement,” IEEE Transactions on Neural Networks, vol. 17, no. 3, pp. 705–716, 2006.
[20] C. Hu, F. Arvin, C. Xiong, and S. Yue, “A Bio-inspired Embedded Vision System for,” IEEE Transactions on Autonomous Mental Development, vol. 9, no. 3, p. 241, 2017.
[21] M. Hartbauer, “Simplified bionic solutions: A simple bio-inspired vehicle collision detection system,” Bioinspiration and Biomimetics, vol. 12, no. 2, 2017.
[22] F. C. Rind, S. Wernitznig, P. Pölt, A. Zankel, D. Gütl, J. Sztarker, and G. Leitinger, “Two identified looming detectors in the locust: ubiquitous lateral connections among their inputs contribute to selective responses to looming objects,” Scientific Reports, vol. 6, p. 35525, 2016.
[23] R. D. Santer, R. Stafford, and F. C. Rind, “Retinally-generated saccadic suppression of a locust looming-detector neuron: investigations using a robot locust,” Journal of The Royal Society Interface, vol. 1, no. 1, pp. 61–77, 2004.
[24] S. Yue, R. D. Santer, Y. Yamawaki, and F. C. Rind, “Reactive direction control for a mobile robot: a locust-like control of escape direction emerges when a bilateral pair of model locust visual neurons are integrated,” Autonomous Robots, vol. 28, no. 2, pp. 151–167, 2010.
[25] R. Stafford, R. D. Santer, and F. C. Rind, “A bio-inspired visual collision detection mechanism for cars: combining insect inspired neurons to create a robust system,”
BioSystems, vol. 87, no. 2, pp. 164–171, 2007.
[26] L. Salt, G. Indiveri, and Y. Sandamirskaya, “Obstacle avoidance with LGMD neuron: towards a neuromorphic UAV implementation,” in International Symposium on Circuits and Systems, 2017.
[27] F. C. Rind and D. Bramwell, “Neural network based on the input organization of an identified neuron signaling impending collision,” Journal of Neurophysiology, vol. 75, no. 3, pp. 967–985, 1996.
[28] R. Brette and W. Gerstner, “Adaptive exponential integrate-and-fire model as an effective description of neuronal activity,” Journal of Neurophysiology, vol. 94, no. 5, pp. 3637–3642, 2005.
[29] N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, and G. Indiveri, “A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses,” Frontiers in Neuroscience, vol. 9, pp. 1–17, 2015.
[30] R. Storn and K. Price, “Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces,” Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
[31] R. Dong, “Differential evolution versus particle swarm optimization for PID controller design,” in Natural Computation, 2009. ICNC ’09. Fifth International Conference on, vol. 3, Aug 2009, pp. 236–240.
[32] N. Karaboga and B. Cetinkaya, “Performance comparison of genetic and differential evolution algorithms for digital FIR filter design,” in Advances in Information Systems, ser. Lecture Notes in Computer Science, T. Yakhno, Ed. Springer Berlin Heidelberg, 2005, vol. 3261, pp. 482–488.
[33] S. Das and P. N. Suganthan, “Differential evolution: A survey of the state-of-the-art,” IEEE Transactions on Evolutionary Computation, vol. 15, no. 1, pp. 4–31, 2011.
[34] M. E. H. Pedersen, “Good parameters for differential evolution,”
Magnus Erik Hvass Pedersen, 2010.
[35] J. Brest, S. Greiner, B. Boskovic, M. Mernik, and V. Zumer, “Self-adapting control parameters in differential evolution: a comparative study on numerical benchmark problems,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 6, pp. 646–657, 2006.
[36] A. K. Qin, V. L. Huang, and P. N. Suganthan, “Differential evolution algorithm with strategy adaptation for global numerical optimization,” IEEE Transactions on Evolutionary Computation, vol. 13, no. 2, pp. 398–417, 2009.
[37] G. D. Howard, “On self-adaptive mutation restarts for evolutionary robotics with real rotorcraft,” in Proceedings of the 17th Annual Conference on Genetic and Evolutionary Computation. ACM, 2017, in press.
[38] E. Brochu, V. M. Cora, and N. De Freitas, “A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” arXiv preprint arXiv:1012.2599, 2010.
[39] J. Bergstra and Y. Bengio, “Random search for hyper-parameter optimization,” Journal of Machine Learning Research, vol. 13, no. Feb, pp. 281–305, 2012.
[40] G.-Q. Bi and M.-M. Poo, “Synaptic modifications in cultured hippocampal neurons: Dependence on spike timing, synaptic strength, and postsynaptic cell type,” J. Neurosc, vol. 77, no. 1, pp. 551–555, 1998.
[41] G. Howard, E. Gale, L. Bull, B. de Lacy Costello, and A. Adamatzky, “Evolution of plastic learning in spiking networks via memristive connections,” IEEE Transactions on Evolutionary Computation, vol. 16, no. 5, pp. 711–729, 2012.
[42] P. Turney, D. Whitley, and R. W. Anderson, “Evolution, learning, and instinct: 100 years of the Baldwin effect,” Evolutionary Computation, vol. 4, no. 3, pp. iv–viii, 1996.
[43] L. Liu, C. Shen, and A. van den Hengel, “The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4749–4757.
[44] A. Babenko and V. Lempitsky, “Aggregating local deep features for image retrieval,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1269–1277.
[45] B. Fernando, E. Gavves, J. Oramas, A. Ghodrati, and T. Tuytelaars, “Rank pooling for action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 773–787, 2017.
[46] S. Song, K. D. Miller, and L. F. Abbott, “Competitive hebbian learning through spike-timing-dependent synaptic plasticity,”
Nature neuro-science , vol. 3, no. 9, pp. 919–926, 2000.[47] J. Sj¨ostr¨om and W. Gerstner, “Spike-timing dependent plasticity,”
Spike-timing dependent plasticity , p. 35, 2010.[48] J. Mockus, “Application of bayesian approach to numerical methodsof global and stochastic optimization,”
Journal of Global Optimization ,vol. 4, no. 4, pp. 347–365, 1994.[49] D. R. Jones, “A taxonomy of global optimization methods based onresponse surfaces,”
Journal of global optimization , vol. 21, no. 4, pp.345–383, 2001.[50] M. J. Sasena, “Flexibility and efficiency enhancements for constrainedglobal design optimization with kriging approximations,” Ph.D. disser-tation, General Motors, 2002.[51] J. Snoek, H. Larochelle, and R. P. Adams, “Practical bayesian optimiza-tion of machine learning algorithms,” in
Advances in neural informationprocessing systems , 2012, pp. 2951–2959.[52] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, “Gaussian processoptimization in the bandit setting: No regret and experimental design,” arXiv preprint arXiv:0912.3995 , 2009.[53] D. J. Lizotte,
Practical bayesian optimization . University of Alberta,2008.[54] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimiza-tion of expensive black-box functions,”
Journal of Global optimization ,vol. 13, no. 4, pp. 455–492, 1998.[55] K. Deb,
Multi-Objective Optimization Using Evolutionary Algorithms .Wiley, 2001.[56] D. F. Goodman, M. Stimberg, P. Yger, and R. Brette, “Brian 2:neural simulations on a variety of computational hardware,”
BMCNeuroscience , vol. 15, no. 1, p. 1, 2014.[57] L. Salt, “Optimising a Neuromorphic Locust Looming Detector for UAVObstacle Avoidance,” Master’s thesis, The University of Queensland,2016.[58] E. Alpaydin,
Introduction to Machine Learning. , ser. Adaptive Compu-tation and Machine Learning. The MIT Press, 2014, vol. Third edition.
Llewyn Salt is a PhD candidate in Reinforcement Learning at the University of Queensland, Australia. His research focuses on utilising goals in continuous control problems in reinforcement learning. He received a bachelor's degree (first class honours) and a master's degree in mechatronic engineering from the University of Queensland. The work in this paper was undertaken at the Institute of Neuroinformatics at the University of Zurich and ETH Zurich as part of his master's degree program.
David Howard
David Howard is a Senior Research Scientist at CSIRO, Australia's national science body. He leads multiple projects at the intersection of robotics, evolutionary machine learning, and robotic materials. His interests include nature-inspired algorithms, learned autonomy, soft robotics, the reality gap, and the evolution of form. His work has been featured in local and national media, TechXplore, and Wired. He received his BSc in Computing from the University of Leeds in 2005 and completed an MSc in Cognitive Systems at the same institution in 2006. In 2011 he received his PhD from the University of the West of England. He is a member of the IEEE and ACM, and an avid proponent of education, STEM, and outreach activities.
Giacomo Indiveri
Giacomo Indiveri is a Professor at the University of Zurich and ETH Zurich, Switzerland. He obtained an M.Sc. degree in Electrical Engineering and a Ph.D. degree in Computer Science from the University of Genoa, Italy. Indiveri was a post-doctoral research fellow in the Division of Biology at the California Institute of Technology (Caltech) and at the Institute of Neuroinformatics of the University of Zurich and ETH Zurich. He was awarded three ERC grants and is an IEEE Senior Member. His research interests lie in the study of real and artificial neural processing systems, and in the hardware implementation of neuromorphic cognitive systems using full-custom analog and digital VLSI technology.