Evolving to learn: discovering interpretable plasticity rules for spiking networks
Jakob Jordan, Maximilian Schmidt, Walter Senn, Mihai A. Petrovici
Jakob Jordan∗, Maximilian Schmidt∗, Walter Senn, and Mihai A. Petrovici

Department of Physiology, University of Bern, Bern, Switzerland
Ascent Robotics, Tokyo, Japan
RIKEN Center for Brain Science, Tokyo, Japan
Kirchhoff-Institute for Physics, Heidelberg University, Heidelberg, Germany
Abstract
Continuous adaptation allows survival in an ever-changing world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hard-wired organisms. How these adjustments come about is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for discovering biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. This approach makes the relative weighting of guiding factors explicit, explores large search spaces, encourages diverse sets of hypotheses, and can discover domain-specific solutions. By evolving compact symbolic expressions we ensure the discovered plasticity rules are amenable to intuitive understanding. This is fundamental for successful communication and human-guided generalization, for example to different network architectures or task domains. We demonstrate the flexibility of our approach by discovering efficient plasticity rules in typical learning scenarios.
Keywords: metalearning, learning to learn, synaptic plasticity, spiking neuronal networks, genetic programming
How do we learn? Whether we are memorizing the way to the lecture hall at a conference or mastering a new sport, somehow our central nervous system is able to retain the relevant information over extended periods of time, sometimes with ease, other times only after intense practice. This acquisition of new memories and skills manifests at various levels of the system, with changes of the interaction strength between neurons being a key ingredient. Uncovering the mechanisms behind this synaptic plasticity is a key challenge in understanding brain function. Most studies approach this monumental task by searching for symbolic expressions that map local biophysical quantities to changes of the connection strength between cells (Fig. 1A, B).

Approaches to deciphering synaptic plasticity can be broadly categorized into bottom-up and top-down. Bottom-up approaches typically rely on experimental data (e.g., Artola et al., 1990; Dudek and Bear, 1993; Bi and Poo, 1998; Ngezahayo et al., 2000) to derive dynamic equations for synaptic parameters that lead to functional emergent macroscopic behavior if appropriately embedded in networks (e.g., Gütig et al., 2003; Izhikevich, 2007; Clopath et al., 2010). Top-down approaches proceed in the opposite direction: from a high-level description of network function, e.g., in terms of an objective function (e.g., Toyoizumi et al., 2005; Deneve, 2008; Kappel et al., 2015; Kutschireiter et al., 2017; Sacramento et al., 2018; Göltz et al., 2019), dynamic equations for synaptic changes are derived and biophysically plausible implementations suggested. Evidently, this demarcation is not strict, as most approaches seek some balance between experimental evidence,

∗ These authors contributed equally to this work.
Figure 1:
Artificial evolution of synaptic plasticity rules in spiking neuronal networks. (A)
Sketch of cortical microcircuits consisting of pyramidal cells (orange) and inhibitory interneurons (blue). Stimulation elicits action potentials in pre- and postsynaptic cells, which, in turn, influence synaptic plasticity. (B)
Synaptic plasticity leads to a weight change (∆w) between the two cells, here measured by the change in the amplitude of post-synaptic potentials. The change in synaptic weight can be expressed by a function f that in addition to spike timings (t_pre, t_post) can take into account additional local quantities, such as the concentration of neuromodulators (ρ, green dots in A) or postsynaptic membrane potentials. (C) For a specific experimental setup, an evolutionary algorithm searches for individuals representing functions f that maximize the corresponding fitness function F. An offspring is generated by modifying the genome of a parent individual. Several runs of the evolutionary algorithm can discover phenomenologically different solutions (f_1, f_2, f_3) with comparable fitness. (D) An offspring is generated from a single parent via mutation. Mutations of the genome can, for example, exchange mathematical operators, resulting in a different function f.

functional considerations and model complexity. However, the relative weighting of each of these aspects is usually not made explicit in the communication of scientific results, making it difficult to track by other researchers. Furthermore, the selection of specific tasks to illustrate the effect of a suggested learning rule is usually made only after the rule was derived based on other considerations. Hence, this typically does not consider competing alternative solutions, as an exhaustive comparison would require significant additional investment of human resources. A related problem is that researchers, in a reasonable effort to use resources efficiently, tend to focus on promising parts of the search space around known solutions, leaving large parts of the search space unexplored (Radi and Poli, 2003).

We suggest an automated approach to discover learning rules in spiking neuronal networks that explicitly addresses these issues.
Automated procedures interpret the search for biological plasticity mechanisms as an optimization problem (Bengio et al., 1992), an idea typically referred to as meta-learning or learning-to-learn. These approaches make the emphasis of particular aspects that guide this search explicit and place the researcher at the very end of the process, supporting much larger search spaces and the generation of a diverse set of hypotheses. Furthermore, they have the potential to discover domain-specific solutions that are more efficient than general-purpose algorithms. Early experiments focusing on learning in artificial neural networks (ANNs) made use of gradient descent or genetic algorithms to optimize parameterized learning rules (Bengio et al., 1990, 1992, 1993) or genetic programming to evolve less constrained learning rules (Bengio et al., 1994; Radi and Poli, 2003), rediscovering mechanisms resembling the backpropagation of errors (Ivakhnenko, 1971; Rumelhart et al., 1985). Recent experiments demonstrate how optimization methods can design optimization algorithms for recurrent ANNs (Andrychowicz et al., 2016).

We extend these meta-learning ideas to spiking neuronal networks. The discrete nature of spike-based neuronal interactions endows these networks with rich dynamical and functional properties (e.g., Dold et al., 2019; Jordan et al., 2019; Keup et al., 2020). In addition, with the advent of non-von Neumann computing systems based on spiking neuronal networks with online learning capabilities (Moradi et al., 2017; Davies et al., 2018; Billaudelle et al., 2019), efficient learning algorithms for spiking systems become increasingly relevant for non-conventional computing. Here, we employ genetic programming (Fig. 1C, D; Koza, 2010) as a search algorithm for two main reasons. First, genetic programming can operate on analytically tractable mathematical expressions describing synaptic weight changes that are interpretable.
Second, an evolutionary search does not need to compute gradients in the search space, thereby circumventing the need to estimate a gradient in non-differentiable systems.

We apply our approach to three different learning paradigms for spiking neuronal networks: reward-driven, error-driven and correlation-driven learning. We discover efficient plasticity rules for these task families that perform competitively with known solutions. For the reward-driven task, we reliably reproduce a previously suggested solution that achieves high performance. In the error-driven learning scenario, the evolutionary search discovers a variety of solutions which, with appropriate analysis of the corresponding expressions, can be shown to effectively implement stochastic gradient descent. Finally, in the correlation-driven task, our method generates various hypotheses for STDP-like plasticity rules with diverse weight dynamics that match the performance of a rule derived under optimality considerations. Our results demonstrate the significant potential of automated procedures in the search for plasticity rules in spiking neuronal networks, analogous to the transition from hand-designed to learned features that lies at the heart of modern machine learning.

We introduce the following recipe to search for biophysically plausible plasticity rules in spiking neuronal networks. First we determine a task family of interest and an associated experimental setup which includes specification of the network architecture, e.g., neuron types and connectivity, as well as stimulation protocols or training data sets. Crucially, this step involves defining a fitness function to guide the evolutionary search towards promising regions of the search space. It assigns high fitness to those individuals, i.e., learning rules, that solve the task well and low fitness to others. The fitness function may additionally contain constraints implied by experimental data or arising from computational considerations.
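The evaluation step of this recipe can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: `run_task` and the quadratic `toy_task` below are hypothetical stand-ins for a full spiking-network simulation of one task instance.

```python
import random

def evaluate_fitness(f, run_task, n_tasks=5, seed=0):
    """Average the performance of a candidate plasticity rule f over
    several instances of a task family (e.g., different input
    spike-train realizations) to favor rules that generalize."""
    rng = random.Random(seed)
    scores = [run_task(f, rng.randrange(2**31)) for _ in range(n_tasks)]
    return sum(scores) / n_tasks

def toy_task(f, seed):
    """Hypothetical task instance: negative mean squared deviation of f
    from a 'true' weight update on random inputs (a placeholder for a
    network simulation returning a scalar performance measure)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    return -sum((f(x) - 2.0 * x) ** 2 for x in xs) / len(xs)
```

A candidate that matches the target update on all task instances attains the maximal fitness of zero here; worse candidates receive more negative values.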
We determine each individual's fitness on various examples from the given task family, e.g., different input spike train realizations, to discover plasticity rules that generalize well (Chalmers, 1991; Soltoggio et al., 2018). Finally, we specify the neuronal variables available to the plasticity rule, such as low-pass-filtered traces of pre- and postsynaptic spiking activity or neuromodulator concentrations. This choice is guided by biophysical considerations, e.g., which quantities are locally available at a synapse, as well as by the task family, e.g., whether reward or error signals are provided by the environment. We write the plasticity rule in the general form ∆w = ηf(...), where η is a fixed learning rate, and employ an evolutionary search to discover functions f that lead to high fitness.

We propose to use genetic programming (GP) as an evolutionary algorithm to discover plasticity rules in spiking neuronal networks. GP applies mutations and selection pressure to an initially random population of computer programs to artificially evolve algorithms with desired behaviors (e.g., Koza, 1992, 2010). Here we consider the evolution of mathematical expressions. We employ a specific form of GP, Cartesian genetic programming (CGP; e.g., Miller and Thomson, 2000; Miller, 2011), that uses an indexed graph representation of programs. The genotype of an individual is a two-dimensional Cartesian graph (Fig. 2A, top). Over the course of an evolutionary run, this graph has a fixed number of input, output, and internal nodes. The operation of each internal node is fully described by a single function gene and a fixed number of input genes. A function table maps function genes to mathematical operations (Fig. 2A, bottom), while input genes determine from where this node receives data. A given genotype is decoded into a corresponding computational graph (the phenotype, Fig. 2B) which defines a function f.
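Decoding such a genotype into a callable function can be sketched as follows. This is a simplified toy, not the implementation used in this work; the genome layout and function table are assumptions, loosely mirroring the example of Fig. 2.

```python
# Function table: function gene -> (arity, operation)
FUNCS = {
    0: (2, lambda a, b: a + b),  # addition
    1: (2, lambda a, b: a * b),  # multiplication
    2: (2, lambda a, b: a - b),  # subtraction
}

def evaluate_node(idx, genome, inputs, cache):
    """Output of node idx; nodes 0..n_in-1 are the input nodes."""
    n_in = len(inputs)
    if idx < n_in:
        return inputs[idx]
    if idx not in cache:
        func_gene, *input_genes = genome[idx - n_in]
        arity, op = FUNCS[func_gene]
        args = [evaluate_node(g, genome, inputs, cache)
                for g in input_genes[:arity]]
        cache[idx] = op(*args)
    return cache[idx]

def decode(genome, output_gene, n_in):
    """Decode a genotype (one (function_gene, input_gene, input_gene)
    tuple per internal node) into a callable phenotype f."""
    def f(*inputs):
        return evaluate_node(output_gene, genome, inputs, {})
    return f

# Toy genome: node 4 multiplies inputs 0 and 2; nodes 3 and 5 are
# present in the genotype but unused when the output gene selects
# node 4, illustrating variable-length phenotypes.
genome = [(0, 0, 1),  # node 3: x0 + x1
          (1, 0, 2),  # node 4: x0 * x2
          (2, 4, 3)]  # node 5: node4 - node3
f = decode(genome, output_gene=4, n_in=3)
```

With the output gene selecting node 4, `f(x0, x1, x2)` evaluates only the active subgraph and returns `x0 * x2`.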
During the evolutionary run, mutations
Figure 2:
Representation and mutation of mathematical expressions in Cartesian genetic programming. (A)
The genotype of an individual is a two-dimensional Cartesian graph (top). In this example, the graph contains three input nodes (0–2), internal nodes (3–8) and a single output node (9). In each node the genes of a specific genotype are shown, encoding the operator used to compute the node's output and its inputs. Each operator gene maps to a specific mathematical function (bottom). Special values (−1, −2) represent input and output nodes. For example, node 4 uses the operator 1, the multiplication operation "∗", and receives input from nodes 0 and 2. This node's output is hence given by x_0 ∗ x_2. The number of input genes per node is determined by the operator with the maximal arity (here two). Red genes highlight fixed genes that can not be mutated. ∅ denotes non-coding genes. (B) The computational graph (phenotype) generated by the genotype in A. Input nodes (x_0, x_1, x_2) represent the arguments of the function f. Each output node selects one of the other nodes as a return value of the computational graph, thus defining a function from input x to output y = f(x). Here, the output node selects node 4 as a return value. Some nodes defined in the genotype are not used by a particular realization of the computational graph (light gray nodes, e.g., node 6). (C) Mutations in the genome either lead to a change in graph connectivity (top, green arrow) or alter the operators used by an internal node (bottom, green node). Mutations that do not affect the phenotype, e.g., in node 6, are called silent mutations. Here, both mutations affect the phenotype and are hence not silent.

of the genotype alter connectivity and node operations, which can modify the encoded function (Fig. 2C). The indirect encoding of the computational graph via the genotype supports variable-length phenotypes, since some internal nodes may not be used to compute the output (Fig. 2B). The size of the genotype, in contrast, is fixed, thereby restricting the maximal size of the computational graph and ensuring compact, interpretable mathematical expressions. Furthermore, the separation into genotype and phenotype allows for "silent mutations", i.e., mutations in the genotype that do not alter the phenotype. These silent mutations can lead to more efficient search as they can accumulate and in combination lead to an increase in fitness once affecting the phenotype (Miller and Thomson, 2000).
A (µ + λ) evolution strategy (Beyer and Schwefel, 2002) drives evolution by creating the next generation of individuals from the current one via tournament selection, mutation and selection of the fittest individuals (Sec. 4.1). Prior to starting the search, we choose the mathematical operations that can appear in the plasticity rule and other (hyper)parameters of the Cartesian graph and evolutionary algorithm. For simplicity, we consider a restricted set of mathematical operations and additionally make use of nodes with constant output. After the search has completed, e.g., by reaching a target fitness value or a maximal number of generations, we
Figure 3:
Cartesian genetic programming evolves an efficient reward-driven learning rule. (A)
Network sketch. Multiple input neurons with Poisson activity project to a single output unit. Pre- and postsynaptic activity generate an eligibility trace in each synapse. Comparison between the output activity and the target activity generates a reward signal. Both of these are provided to the plasticity rule that determines the weight update. (B)
Raster plot of the activity of input neurons (small black dots) and output neuron (large red dots). Gray (white) background indicates patterns for which the output should be active (inactive). Check marks indicate correct classifications. We show 8 trials at the beginning (left) and the end of training (right) using the evolved plasticity rule: ∆w_j = η E^r_j(T)(R − 1). (C) Fitness of best individual per generation as a function of the generation index for multiple runs of the evolutionary algorithm with different initial conditions. Labels represent the rule at the end of the respective run for the four runs with the highest final fitness (colored lines). Dark gray lines represent runs with low final fitness. Light gray line indicates the fitness of Eq. 2. Blue marker with error bars represents fitness of the discovered rule ∆w_j = η E^r_j(T)(R − 1) on 10 validation tasks not used during the evolutionary search. (D)
Function f encoded in the best individual as a function of the generation index for one run of the evolutionary algorithm. Expressions are only shown after the generation in which a change in highest fitness occurred.

analyze the discovered set of solutions. In the following, we describe the results of three experiments following the recipe outlined above.

We consider a reinforcement-learning task for spiking neurons similar to Urbanczik and Senn (2009). The experiment can be succinctly described as follows: N inputs project to a single readout modeled by a leaky integrator neuron with exponential postsynaptic currents and stochastic spike generation (for details see Sec. 4.5). We generate a finite number M of frozen-Poisson-noise patterns of duration T and assign each of these randomly to one of two classes. The output neuron should learn to classify each of these spatio-temporal input patterns into the corresponding class using a spike/no-spike code (Fig. 3A, B).

The fitness F(f) of an individual encoding the function f is measured by the average cumulative reward obtained over a certain number of experiments n_exp, each consisting of n classification tasks:

F(f) := (1/n_exp) Σ_{k=1}^{n_exp} R_k(f),   (1)

where R_k(f) = Σ_{i=1}^{n} R_{k,i}(f) is the cumulative reward obtained in experiment k. The reward R_{k,i} is 1 if the classification is correct and −1 otherwise. A previously derived plasticity rule for this type of task (Urbanczik and Senn, 2009) reads:

∆w_j = η E^r_j(t)(R − 1).   (2)

Here η represents a fixed learning rate, and j represents the presynaptic neuron index. E^r_j is an eligibility trace that contains contributions from the spiking activity of pre- and post-synaptic neurons which is updated over the course of a single trial (for details see Sec. 4.5). Similarly to Urbanczik and Senn (2009), reward and eligibility trace values at the end of each trial (t = T) are used to determine synaptic weight changes.
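The end-of-trial update of Eq. 2 can be sketched as a single function (a minimal illustration; the eligibility-trace values are assumed to be supplied by the network simulation and are hypothetical stand-ins here):

```python
def reward_driven_update(weights, eligibility, R, eta=0.05):
    """End-of-trial update Delta w_j = eta * E_j(T) * (R - 1).

    With R in {-1, 1}, weights stay unchanged after correct trials
    (R = 1) and change by -2 * eta * E_j(T) after incorrect ones."""
    return [w + eta * e * (R - 1) for w, e in zip(weights, eligibility)]
```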
The plasticity rule can be written in the following general form where, for comparability, we choose the same learning rate as for the original rule (Eq. 2):

∆w_j = η f(R, E^r_j(T)).   (3)

Using CGP with two inputs (R, E^r_j(T)) we search for plasticity rules that maximize the fitness F(f) (Eq. 1).

In about half of the runs the evolutionary algorithm evolves a plasticity rule identical to the one derived in previous work (Eq. 2), which achieves high fitness (Fig. 3C). As the reward is constrained to R ∈ {−1, 1} the task includes a degeneracy and different functions f can lead to identical weight changes. Some of these phenomenologically identical expressions were found by the evolutionary algorithm, such as E^r_j(R − R²), R E^r_j(1 − R), and −E^r_j + E^r_j R. An essential benefit of our approach is that we can identify such classes of solutions that lead to identical weight changes thanks to a crucial property of genetic programming: since we obtain analytically tractable expressions for the plasticity rule, we can analyze them with conventional methods, in contrast to approaches representing plasticity rules with ANNs (e.g., Risi and Stanley, 2010; Orchard and Wang, 2016; Bohnstingl et al., 2019), for which it is challenging to fully understand their macroscopic computation. A further advantage of obtaining analytical expressions is the possibility to interpret the evolutionary steps by which the search progresses. We observe that already after a few generations the algorithm discovers an important basic ingredient for learning from rewards: the factor R E^r_j (Fig. 3D). This product, for example, increases weights of projections which have contributed to the correct spiking of the output neuron, causing these neurons to be even more active the next time this specific pattern is encountered.
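The degeneracy induced by R ∈ {−1, 1} is straightforward to verify numerically; a sketch (the variant expressions paraphrase those listed in the text):

```python
def canonical(E, R):
    """The stable evolved rule (up to the learning rate): E_j * (R - 1)."""
    return E * (R - 1)

# Superficially different expressions discovered in other runs
variants = [
    lambda E, R: E * (R - R**2),
    lambda E, R: R * E * (1 - R),
    lambda E, R: -E + E * R,
]

def equivalent_on_binary_reward(f, g, E_values=(-1.0, -0.5, 0.5, 1.0)):
    """Check that f and g prescribe identical weight changes for all
    tested eligibility values and both possible rewards R in {-1, 1}."""
    return all(f(E, R) == g(E, R) for E in E_values for R in (-1, 1))
```

Since R² = 1 on the binary reward set, all variants collapse algebraically to E^r_j(R − 1), while an expression such as E·R does not.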
However, this rule is not stable: due to randomness in spike generation, weight changes do not subside even if the neuron always responds correctly and can thus lead to unlearning of good weight configurations. Over the next few hundred generations, refinements of this solution evolve without losing the basic structure (Fig. 3D). Finally, a stable solution for this task is found: evolution discovers that weights should only change if the classification was wrong and remain unchanged if it was correct, i.e., the product is replaced by E^r_j(R − 1).

We next consider a supervised learning task in which a neuron receives information about how far its output is from the desired behavior, instead of just a scalar reward signal as in the previous task. The widespread success of this approach in machine learning highlights the efficacy of learning
Figure 4:
Cartesian genetic programming evolves efficient error-driven learning rules. (A)
Network sketch. Multiple input neurons with Poisson activity project to two neurons. One of the neurons (the teacher) generates a target for the other (the student). The membrane potentials of teacher and student as well as the filtered pre-synaptic spike trains are provided to the plasticity rule that determines the weight update. (B)
Root mean squared error between the teacher and student membrane potential over the course of learning using the evolved plasticity rule: ∆w_j(t) = η [v(t) − u(t)] s̄_j(t). (C) Synaptic weights over the course of learning corresponding to panel B. Horizontal dashed lines represent target weights, i.e., the fixed synaptic weights onto the teacher. (D)
Fitness of the best individual per generation as a function of the generation index for multiple runs of the evolutionary algorithm with different initial conditions. Labels represent the rule at the end of the respective run. Colored markers with error bars represent fitness of the respective plasticity rule on 15 validation tasks not used during the evolutionary search.

from errors compared to correlation- or reward-driven learning (Goodfellow et al., 2016). It has therefore often been hypothesized that evolution has installed similar capabilities in biological nervous systems (see, e.g., Marblestone et al., 2016; Whittington and Bogacz, 2019).

Urbanczik and Senn (2014) introduce an efficient, flexible and biophysically plausible implementation of error-driven learning via multi-compartment neurons. For simplicity, we consider an equivalent formulation of this learning principle in terms of two point neurons modeled as leaky integrator neurons with exponential postsynaptic currents and stochastic spike generation. One neuron mimics the somatic compartment and provides a teaching signal to the other neuron acting as the dendritic compartment. Here, the difference between the teacher and student membrane potentials drives learning:

∆w_j(t) = η [v(t) − u(t)] s̄_j(t),   (4)

where v is the teacher potential, u the student membrane potential, and η a fixed learning rate. s̄_j(t) = (κ ∗ s_j)(t) represents the presynaptic spike train s_j filtered by the synaptic kernel κ. Eq. 4 can be interpreted as stochastic gradient descent on an appropriate cost function (Urbanczik and Senn, 2014) and can be extended to support credit assignment in hierarchical neuronal networks (Sacramento et al., 2018). For simplicity we assume the student has direct access to the teacher's membrane potential, but in principle one could also employ proxies such as firing rates as suggested in Pfister et al.
(2010); Urbanczik and Senn (2014).

We consider a one-dimensional regression task in which we feed random Poisson spike trains into the two neurons (Fig. 4A). The teacher maintains fixed input weights while the student's weights should be adapted over the course of learning such that its membrane potential follows the teacher's (Fig. 4B, C). The fitness F(f) of an individual encoding the function f is measured by the root mean-squared error between the teacher and student membrane potential over the simulation duration T, excluding the initial 10%, averaged over n_exp experiments:

F(f) := −(1/n_exp) Σ_{k=1}^{n_exp} √( Σ_{t=0.1T}^{T} [v_k(t) − u_k(t)]² ).   (5)

Each experiment consists of different input spike trains and different teacher weights. The general form of the plasticity rule for this error-driven learning task is given by:

∆w_j = η f(v, u, s̄_j).   (6)

Using CGP with three inputs (v, u, s̄_j), we search for plasticity rules that maximize the fitness F(f).

Starting from low fitness, about half of the evolutionary runs evolve efficient plasticity rules (Fig. 4D) closely matching the performance of the stochastic gradient descent rule of Urbanczik and Senn (2014). While two runs evolve exactly Eq. 4, other solutions with comparable fitness are discovered, such as

∆w_j = η (v − u) s̄_j (2u − v)/v,   (7)

and

∆w_j = η s̄_j (v + u) (v(v − u) − s̄_j)/v.   (8)

At first sight, these rules may appear more complex, but for the considered parameter regime under the assumptions v ≈ u; v, u ≪ 1, we can write them as:

∆w_j = η c_1 (v − u) s̄_j + c_2,   (9)

with a multiplicative constant c_1 = O(1) and a negligible additive constant c_2. Elementary manipulations of the expressions found by CGP thus demonstrate the similarity of these superficially different rules to Eq. 4. Consequently, they can be interpreted as approximations of the gradient-descent plasticity rule. The constants generally fall into two categories: fine-tuning of the learning rate for the set of task samples encountered during evolution (c_1) and factors that have negligible influence and would most likely be pruned over longer evolutionary timescales (c_2). This pruning could be accelerated, for example, by imposing a penalty on the model complexity in the fitness function, thus preferentially selecting simpler solutions.

In conclusion, the evolutionary search rediscovers variations of a human-designed efficient gradient-descent-based learning rule for the considered error-driven learning task. Due to the compact, interpretable representation of the plasticity rules we are able to analyze the set of high-performing solutions and thereby identify phenomenologically identical rules despite their superficial differences.

We now consider a task in which neurons do not receive any feedback from the environment about their performance but instead only have access to correlations between pre- and postsynaptic activity. For this paradigm Masquelier (2017) proposes a correlation-driven plasticity rule that allows neurons to discover a repeating frozen-noise pattern interrupted by random background spikes. Their experimental setup is briefly described as follows: N inputs project to a single output neuron (Fig. 5A). The activity of all inputs is determined by a Poisson process with a fixed rate. A frozen-noise activity pattern of duration T_pattern is generated once and replayed every T_inter ms (Fig.
5B) while inputs are randomly spiking in between.

We define the fitness F(f) of an individual encoding the function f by the minimal average signal-to-noise ratio SNR across n_exp experiments:

F(f) := min_k { SNR_k, k ∈ [1, n_exp] }.   (10)
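Generating such stimuli can be sketched as follows (an illustrative helper with made-up parameter values, not the protocol parameters of the paper): a frozen Poisson pattern is drawn once and then replayed periodically, with independent Poisson background spiking in between.

```python
import random

def generate_input(n_inputs=10, rate=5.0, pattern_steps=50,
                   inter_steps=100, n_steps=1000, dt=0.001, seed=42):
    """Binary spike raster (n_inputs x n_steps): a frozen Poisson
    pattern of `pattern_steps` time steps is replayed periodically,
    separated by `inter_steps` steps of independent Poisson background
    spiking (rate in Hz, dt in s; all values here are illustrative)."""
    rng = random.Random(seed)
    p = rate * dt                        # spike probability per step
    pattern = [[rng.random() < p for _ in range(pattern_steps)]
               for _ in range(n_inputs)]
    period = pattern_steps + inter_steps
    raster = [[False] * n_steps for _ in range(n_inputs)]
    for step in range(n_steps):
        phase = step % period
        for j in range(n_inputs):
            raster[j][step] = (pattern[j][phase] if phase < pattern_steps
                               else rng.random() < p)
    return raster, pattern
```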
Figure 5:
Cartesian genetic programming evolves diverse correlation-driven learning rules. (A)
Network sketch. Multiple inputs project to a single output neuron. The current synaptic weight w_j and the eligibility trace E^c_j are provided to the plasticity rule that determines the weight update. (B) Activity of input units. A frozen-noise pattern of duration T_pattern (gray background) is interleaved with periods of random background spiking of duration T_inter. Spike trains are generated by a stationary Poisson process. (C) Membrane potential u of the output neuron over the course of learning using Eq. 12. Gray boxes indicate presentation of the frozen-noise pattern. (D) Fitness (Eq. 10) of the best individual per generation as a function of the generation index for multiple runs of the evolutionary algorithm with different initial conditions. Blue and red curves correspond to the two representative plasticity rules selected for detailed analysis. Blue and red markers with error bars represent fitness of the two representative rules on 20 validation tasks not used during the evolutionary search. The orange marker indicates the fitness of the homeostatic STDP rule (Eq. 12; Masquelier, 2017) on 20 validation tasks. (E, F) Learning rules evolved by two runs of CGP (E: Eq. 14, F: Eq. 15): weight change ∆w_j as a function of spike timing differences ∆t_j for three different weights w_j. (G) Same for the homeostatic STDP rule (Eq. 12; Masquelier, 2017). (H, I) Effective learning rules: LR1 (H, Eq. 17) and LR2 (I, Eq. 18). The learning rate is η = 0.01.

The signal-to-noise ratio SNR_k, following Masquelier (2017), is defined as the difference between the maximal free membrane potential during pattern presentation averaged over multiple presentations (⟨u_{k,i,max}⟩) and the mean of the free membrane potential in between pattern presentations (⟨u_{k,inter}⟩), divided by its variance (Var(u_{k,inter})):

SNR_k := (⟨u_{k,i,max}⟩ − ⟨u_{k,inter}⟩) / Var(u_{k,inter}).   (11)

The free membrane potential is obtained in a separate simulation with frozen weights by disabling the spiking mechanism for the output neuron. This removes measurement noise in the signal-to-noise ratio arising from spiking and subsequent membrane-potential reset. Each experiment consists of different realizations of a frozen-noise pattern and background spiking.

The rule described by Masquelier (2017) is a simple additive spike-timing-dependent plasticity (STDP) rule that replaces the depression branch of traditional STDP variants with a homeostatic term w_hom < 0. The synaptic weight w_j changes at each postsynaptic spike according to (Fig. 5G):

∆w_j = η { w_hom            if ∆t_j < 0
         { E^c_j + w_hom    if ∆t_j ≥ 0.   (12)

Here, E^c_j := e^{−|∆t_j|/τ} represents an eligibility trace that depends on the relative timing of post- and presynaptic spiking (∆t_j = t_post − t_pre,j) and is represented locally in each synapse (e.g., Morrison et al., 2008). η represents a fixed learning rate and w_hom a constant. The synaptic weight is bounded such that w_j ∈ [0, 1]. We consider plasticity rules of the general form

∆w_j = η { f_dep(w_j, E^c_j)    if ∆t_j < 0
         { f_fac(w_j, E^c_j)    if ∆t_j ≥ 0.   (13)

Using CGP with two inputs (w_j, E^c_j), we search for functions f_dep and f_fac that maximize the fitness F(f_dep, f_fac) (Eq. 10).

After 2000 generations, half of the runs of the evolutionary algorithm discover high-fitness solutions (Fig. 5D). These plasticity rules lead to synaptic weight configurations for which the signal-to-noise ratio of the free membrane potential is at least ten.
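For concreteness, the homeostatic STDP rule of Eq. 12, including the weight bound, can be sketched as a single update function (a minimal sketch; the parameter values are illustrative, not taken from the paper):

```python
import math

def homeostatic_stdp(w, dt_j, eta=0.01, w_hom=-0.01, tau=0.02):
    """Weight update at a postsynaptic spike following Eq. 12:
    anticausal interactions (dt_j < 0) apply only the homeostatic
    term w_hom < 0; causal interactions (dt_j >= 0) additionally add
    the eligibility trace E_c = exp(-|dt_j| / tau). The resulting
    weight is clipped to the bound [0, 1]."""
    e_c = math.exp(-abs(dt_j) / tau)
    dw = w_hom if dt_j < 0 else e_c + w_hom
    return min(1.0, max(0.0, w + eta * dw))
```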
Empirical investigation shows that such a signal-to-noise ratio causes the neuron to reliably detect the frozen-noise pattern. From these well-performing learning rules, we pick two representative examples (Fig. 5E,F) to analyze in detail. Learning rule 1 (LR1, Fig. 5E) is defined by the following equation:

Δw_j = η { w_j − (w_j − 1) E^c_j if Δt_j < 0; E^c_j − w_j if Δt_j ≥ 0 } .   (14)

Learning rule 2 (LR2, Fig. 5F) is defined by the following equation:

Δw_j = η { w_j − E^c_j/w_j if Δt_j < 0; (w_j E^c_j)^{w_j} if Δt_j ≥ 0 } .   (15)

The direct availability of these rules as closed-form expressions enables an interesting line of inquiry. While both rules reliably enable the neuron to solve its task, they both exhibit nonzero facilitation and depression even for large time lags Δt_j, which immediately raises questions about their stability. To address this issue, we consider, for simplicity, a nearest-neighbor spike-pairing scheme (e.g., Morrison et al., 2008) in which each postsynaptic spike can almost always be assigned a causal interaction with the preceding presynaptic spike and an anticausal interaction with the trailing presynaptic spike. Weight changes therefore come in pairs: for each causal update there must exist an anticausal one and vice versa. In the limit of small firing rates, we can therefore make the simplifying assumption that each causal (anticausal) spike interaction is accompanied by an anticausal (causal) interaction with a large time difference Δt_j (see Appendix Sec. A for a mathematical argument).
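The stability question can be made concrete by evaluating both branches for large |Δt|. The sketch below assumes the closed-form branches of the two rules with η factored out, i.e. w − (w − 1)E^c and E^c − w for LR1, and w − E^c/w and (w E^c)^w for LR2; the weight value is arbitrary:

```python
import math

def e_trace(delta_t, tau=20.0):
    return math.exp(-abs(delta_t) / tau)

def lr1(w, delta_t):
    """LR1: anticausal branch for dt < 0, causal branch for dt >= 0."""
    E = e_trace(delta_t)
    return w - (w - 1.0) * E if delta_t < 0 else E - w

def lr2(w, delta_t):
    """LR2, same branch convention."""
    E = e_trace(delta_t)
    return w - E / w if delta_t < 0 else (w * E) ** w

# for |dt| -> infinity the eligibility trace vanishes, yet the updates do not:
w = 0.4
large = 1e6
print(lr1(w, -large), lr1(w, large))  # nonzero asymptotes: +w and -w
print(lr2(w, -large), lr2(w, large))  # asymptotes: +w and 0
```

Under the nearest-neighbor pairing argument, every update is accompanied by a partner update at one of these asymptotes, which is what motivates folding the asymptotic values into effective rules.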
We can therefore construct effective learning rules by adding theasymptotic weight change for anticausal (causal) interactions to all weight changes on the causal(anticausal) branch:∆ ˜ w j → η (cid:40) ∆ w j (∆ t j ) + lim ∆ t (cid:48) j →−∞ ∆ w j (∆ t (cid:48) j ) ∆ t j < w j (∆ t j ) + lim ∆ t (cid:48) j →∞ ∆ w j (∆ t (cid:48) j ) ∆ t j ≥ / w j = η (cid:40) (1 − w j ) E c j ∆ t j < E c j ∆ t j ≥ , (17)decays to zero for large negative and positive time differences and causes facilitation for small timedifferences, regardless whether they are causal or anticausal. This learning rule favors correlationbetween presynaptic and postsynaptic firing with a small weight-dependent facilitation bias towardscausal interactions. In contrast, effective LR2 (Fig. 5I)∆ w j = η (cid:40) − E c j /w j + w j − t j < w j E c j ) w j + w j − t j ≥ , (18)implements a similar strategy as the learning rule of Masquelier (2017): it depresses the synapse foralmost all interactions except for a narrow time window for small, positive (causal) time differences.Additionally, however, it more pronouncedly punishes anticausal interactions, thus being able toachieve even higher fitness. Note how both rules reproduce important components of experimentallyestablished STDP traces (e.g., Caporale and Dan, 2008).In conclusion, for the correlation-driven task, the evolutionary search is able to discover notonly rules similar to those previously suggested but additionally proposes rules that are markedlydifferent, yet still perform competitively, thus generating new hypotheses for learning in biologicalsubstrates. We note, again, that our insight into the dynamics of these rules was permitted, in thefirst place, by their interpretable representation in our evolutionary framework. Uncovering the mechanisms of learning via synaptic plasticity is a critical step towards understandingbrain (dys)function and building truly intelligent, adaptive machines. 
We introduce a novel approach to discover biophysically plausible plasticity rules in spiking neuronal networks. Our meta-learning framework uses genetic programming to search for plasticity rules by optimizing a fitness function specific to the respective task family. Our approach discovers high-performing solutions in various learning paradigms: reward-driven, error-driven, and correlation-driven learning. This is, to the best of our knowledge, the first demonstration of the power of genetic programming methods in the search for plasticity mechanisms in spiking neuronal networks.

A key point of our approach is the compact representation of the plasticity rules. We restrict the complexity of the expressions by three considerations. First, we assume that effective descriptions of weight changes can be found that are not unique to each individual synapse. This is a common assumption in computational neuroscience and is based on the observation that nature must have found a parsimonious encoding of brain structure, as not every connection in the brain can be specified in the DNA of the organism (Zador, 2019); rather, genes encode general principles by which neuronal networks and subnetworks are organized and reorganized (cf. Risi and Stanley, 2010). Our approach aims at discovering such general principles for synaptic plasticity. Second, physical considerations restrict the information available to the plasticity rule to local quantities, such as pre- and postsynaptic activity traces or specific signals delivered via neuromodulators (e.g., Miconi et al., 2020; Cox and Witten, 2019). Third, we limit the maximal size of the expressions to keep the resulting learning rules interpretable and to avoid overfitting. We explicitly want to avoid constructing an opaque system that has high task performance but does not allow us to understand how the network structure is shaped over the course of learning.
The low complexity of the resulting expressions generates intuitive understanding, facilitating communication and human-guided generalization from a set of solutions to different network architectures or task domains. In the search for plasticity rules suitable for physical implementation in biological systems, these insights are crucial, as the identified plasticity mechanisms can serve as building blocks for learning rules that generalize to the actual challenges faced by biological agents. Moreover, simple expressions highlight the key interactions between the local variables giving rise to plasticity, thus providing insights into the underlying biophysical processes and potentially suggesting new experimental approaches.

Furthermore, the automated search can discover plasticity rules for a given problem that exploit implicit assumptions in the task. It thereby highlights underconstrained searches, be it due to scarcity of biological data, the simplicity of the chosen tasks, or the omission of critical features in the task design. For instance, without asserting equal average spike rates of background and pattern neurons in the correlation-driven task, one could discover plasticity rules that exploit the rate difference rather than the spatio-temporal structure of the input.

Future work needs to address a general issue of any optimization method: how can we systematically counter overfitting to reveal general solutions? A simple approach would increase the number of sample tasks during a single fitness evaluation. However, computational costs increase linearly with the number of samples. Another technique penalizes the complexity of the resulting expressions, e.g., proportionally to the size of the computational graph. Since it is a priori unclear how this complexity penalty should be weighted against the original fitness measures, one should consider multi-objective optimization algorithms (e.g., Deb, 2001).
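The core of such a multi-objective selection, Pareto dominance, can be sketched compactly; here each candidate rule is scored by a tuple (task fitness, negative expression size), both to be maximized, and the numbers are purely illustrative:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of a list of objective tuples."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (task_fitness, -expression_size) for four hypothetical rules
rules = [(0.9, -12), (0.8, -5), (0.9, -20), (0.6, -4)]
front = pareto_front(rules)  # the large rule (0.9, -20) is dominated
```

Selecting parents from the front, instead of by a fixed weighted sum, sidesteps the question of how to weight the complexity penalty a priori.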
Another issue to be addressed in future work is the choice of the learning rate. Currently, this value is not part of the optimization process and all tasks assume a fixed learning rate. One possibility is to allow for more constants in CGP such that the search can easily scale the learning rate. However, the dimensionality of the search space scales exponentially with the number of operators and constants, and the feasibility of such an approach needs to be carefully evaluated. One possibility to mitigate this combinatorial explosion is to combine the evolutionary search with gradient-based optimization methods that can fine-tune constants in the expressions (Topchy and Punch, 2001; Izzo et al., 2017).

Evolved Plastic Artificial Neural Networks (EPANNs; e.g., Soltoggio et al., 2018), and in particular adaptive HyperNEAT (Risi and Stanley, 2010), represent an alternative approach to designing plastic neural networks. In contrast to our method, however, these approaches include the network architecture itself in the evolutionary search, alongside the synaptic plasticity rules. While this can lead to high-performing solutions due to a synergy between network architecture and plasticity, this interplay has an important drawback: in general, it is difficult to tease apart the contribution of plasticity from that of network structure to high task performance (cf. Gaier and Ha, 2019). In addition, the distributed, implicit representation of plasticity rules in HyperNEAT can be difficult to interpret, which hinders a deeper understanding of the learning mechanisms. In machine-learning-oriented applications, this lack of credit assignment is less of an issue.
For research into plasticity rules employed by biological systems, however, it presents a significant obstacle.

The evolutionary search for plasticity rules requires a large number of simulations, as each candidate solution needs to be evaluated on a sufficiently large number of samples from the task family to encourage generalization (e.g., Chalmers, 1991; Bengio et al., 1992). Due to silent mutations in CGP, i.e., modifications of the genotype that do not alter the phenotype, we use caching methods to significantly reduce computational cost, as only new solutions need to be evaluated. However, even employing such methods, the number of required simulations per evolutionary run remains large. Neuromorphic systems, dedicated hardware specifically designed to emulate neuronal networks, provide an attractive way to speed up the evolutionary search. To serve as suitable substrates for the approach presented here, these systems should be able to emulate spiking neuronal networks in an accelerated fashion with respect to real time and provide on-chip plasticity with a flexible specification of plasticity mechanisms (e.g., Davies et al., 2018; Billaudelle et al., 2019; Mayr et al., 2019).

While the learning rules found in the experiments described above were all evolved from random expressions, one can also view the presented framework as a hypothesis-testing machine. Starting from a known plasticity rule, our framework would allow researchers to address questions like: assuming the learning rule additionally had access to variable x, could this be incorporated into the weight updates such that learning would improve? The automated procedure makes answering such questions much more efficient than a human-guided manual search. Additionally, the framework is suitable for finding robust, biophysically plausible approximations of complex learning rules containing quantities that might be non-local, difficult to compute, and/or hard to implement in physical substrates.
In particular, multi-objective optimization is suitable for evolving a known, complex rule into simpler versions while maintaining high task performance. Similarly, one could search for modifications of general rules that are purposefully tuned to learn quickly within a specific task family, outperforming more general solutions. In each of these cases, prior knowledge about effective learning algorithms provides a starting point from which the evolutionary search can discover powerful extensions.

The experiments considered here were mainly chosen due to their simplicity and due to prior knowledge about corresponding plasticity rules, which provided us with a high-performance reference for comparison. Additionally, in each experiment, we have restricted ourselves to a constrained set of possible inputs to the plasticity rule. Future work may involve less preprocessed data as inputs while considering more diverse mathematical operators. In the correlation-driven task, one could for example provide the raw times of pre- and postsynaptic spikes to the graph instead of the exponential of their difference, leaving more freedom for the evolutionary search to discover creative solutions. We expect particularly interesting applications of our framework to involve more complex tasks that are challenging for contemporary algorithms, such as life-long learning, which needs to tackle the issue of catastrophic forgetting (French, 1999), or learning in recurrent spiking neuronal networks. In order to yield insights into information processing in the nervous system, the design of the network architecture should be guided by known anatomical features, while the considered task families should fall within the realm of ecologically relevant problems.

We view the presented methods as a machinery for generating, testing, and extending hypotheses on learning in spiking neuronal networks, driven by problem instances and prior knowledge and constrained by experimental evidence.
We believe this approach holds significant potential to accelerate progress towards deep insights into information processing in physical systems, both biological and biologically inspired, with immanent potential for the development of powerful artificial learning machines.

4 Methods

4.1 Evolutionary algorithm

We use a µ + λ evolution strategy (Beyer and Schwefel, 2002) to evolve a population of individuals towards high fitness. In each generation, λ new offspring are created from µ parents via tournament selection (e.g., Miller et al., 1995) and subsequent mutation. From these µ + λ individuals, the best µ are selected as parents for the next generation (Alg. 1). In this work we use a tournament size of one and a fixed mutation probability p_mutate for each gene in an offspring individual. Since in CGP crossover of individuals can lead to significant disruption of the search process due to major changes in the computational graphs (Miller, 1999), we avoid it here; in other words, new offspring are modified only by mutations. We use neutral search (Miller and Thomson, 2000), in which an offspring is preferred over a parent with equal fitness, to allow the accumulation of silent mutations that can jointly lead to an increase in fitness.
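The selection scheme just described can be condensed into a short sketch; `fitness`, `random_individual`, and `mutate` are task-specific callables, and the toy usage at the bottom is only a stand-in for the expensive network simulations used in the actual experiments:

```python
import random

def evolve(fitness, random_individual, mutate, mu=4, lam=8, n_generations=100):
    """Minimal mu + lambda evolution strategy in the spirit of Alg. 1:
    tournament size one (i.e. a uniform random pick), mutation-only offspring,
    and neutral search (offspring win ties against equally fit parents because
    they are listed first and the sort is stable)."""
    parents = [random_individual() for _ in range(mu)]
    for _ in range(n_generations):
        offspring = [mutate(random.choice(parents)) for _ in range(lam)]
        combined = sorted(offspring + parents, key=fitness, reverse=True)
        parents = combined[:mu]
    return max(parents, key=fitness)

# toy problem: maximize -(x - 3)^2 over the reals
random.seed(1234)  # reproducible toy run
best = evolve(fitness=lambda x: -(x - 3.0) ** 2,
              random_individual=lambda: random.uniform(-10.0, 10.0),
              mutate=lambda x: x + random.gauss(0.0, 0.5))
```

In the actual framework the individuals are CGP genomes and each fitness evaluation runs spiking network simulations, which is why caching and parallel evaluation matter.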
As it is computationally infeasible to exhaustively evaluate an individual on all possible tasks from a task family, we evaluate individuals only on a limited number of sample tasks and aggregate the results into a scalar fitness, either by choosing the minimal result or by averaging. We manually select the number of sample tasks to balance computational costs and sampling noise for each task. In each generation, we use the same initial conditions to allow a meaningful comparison of results across generations. If an expression is encountered that cannot be meaningfully evaluated, such as a division by zero, the corresponding individual is assigned a fitness of −∞.

Data: Initial random parent population P_0 = {p_i} of size µ
t ← 0
while t < n_generations do
    Create new offspring population Q_t = CreateOffspringPopulation(P_t)
    Combine parent and offspring populations R_t = P_t ∪ Q_t
    Evaluate fitness of each individual in R_t
    Pick the best µ individuals P_{t+1} ⊂ R_t as new parents
    t ← t + 1
end
Function CreateOffspringPopulation(P):
    Q = {}
    while |Q| < λ do
        Choose random subset of P of size N_tournament
        Choose best individual in the subset and append it to Q
    end
    for q_i ∈ Q do
        Mutate each gene of q_i with mutation probability p_mutation
    end
    return Q
Algorithm 1: Variant of the µ + λ evolution strategy used in this study. Note the absence of a crossover step.

4.2 HAL-CGP

HAL-CGP (Schmidt and Jordan, 2020) (https://github.com/Happy-Algorithms-League/hal-cgp) is an extensible pure Python library implementing Cartesian genetic programming to represent, mutate and evaluate populations of individuals encoding symbolic expressions, targeting applications with computationally expensive fitness evaluations. It supports the translation from a CGP genotype, a two-dimensional Cartesian graph, into the corresponding phenotype, a computational graph implementing a particular mathematical expression.
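The genotype-to-phenotype translation can be illustrated with a toy decoder, independent of HAL-CGP's actual API; the genome layout below (a flat list of operator/input-address triples) is a simplification of the full two-dimensional Cartesian layout:

```python
import operator

PRIMITIVES = [operator.add, operator.sub, operator.mul]  # arity-2 primitives

def decode(genome, n_inputs, output_node):
    """Translate a linear Cartesian genome into a callable phenotype.
    Addresses 0..n_inputs-1 refer to the inputs; address n_inputs + k
    refers to the k-th internal node, in feed-forward order."""
    def phenotype(*inputs):
        values = list(inputs)
        for op_idx, a, b in genome:
            values.append(PRIMITIVES[op_idx](values[a], values[b]))
        return values[output_node]
    return phenotype

# genome encoding f(x, y) = (x + y) * y: node 2 = x + y, node 3 = node2 * y
f = decode([(0, 0, 1), (2, 2, 1)], n_inputs=2, output_node=3)

# a "silent" mutation: adding a node the output does not depend on
# changes the genotype but leaves the phenotype unchanged
g = decode([(0, 0, 1), (2, 2, 1), (1, 0, 0)], n_inputs=2, output_node=3)
```

Such silent mutations are exactly why caching pays off: genotypes may differ while the evaluated phenotype, and hence its fitness, is identical.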
These computational graphs can be exported as pure Python functions, NumPy-compatible functions (Walt et al., 2011), SymPy expressions (Meurer et al., 2017) or PyTorch modules (Paszke et al., 2019). Users define the structure of the two-dimensional graph from which the computational graph is generated. This includes the number of inputs, columns, rows, and outputs, as well as the computational primitives, i.e., the mathematical operators and constants, that compose the mathematical expressions. Due to the modular design of the library, users can easily implement new operators to be used as primitives. The library implements a µ + λ evolution strategy to evolve individuals (see Sec. 4.1). Users need to specify hyperparameters for the evolutionary algorithm, such as the size of parent and offspring populations and the maximal number of generations. To avoid reevaluating phenotypes that have been previously evaluated, the library provides a mechanism for caching results on disk. Exploiting the wide availability of multi-core architectures, the library can parallelize the evaluation of all individuals in a single generation via separate processes.

4.3 NEST simulator

Spiking neuronal network simulations are based on the 2.16.0 release of the NEST simulator (Gewaltig and Diesmann, 2007) (https://github.com/nest/nest-simulator; commit 3c6f0f3). NEST is an open-source simulator for spiking neuronal networks with a focus on large networks with simple neuron models. The computationally intensive propagation of network dynamics is implemented in C++ while the network model can be specified using a Python API (PyNEST; Eppler et al., 2009; Zaytsev and Morrison, 2014). NEST profits from modern multi-core and multi-node systems by combining local parallelization with OpenMP threads and inter-node communication via the Message Passing Interface (MPI) (Jordan et al., 2018).
The standard distribution offers a variety of established neuron and plastic synapse models, including variants of spike-timing-dependent plasticity, reward-modulated plasticity and structural plasticity. New models can be implemented via a domain-specific language (Plotnikov et al., 2016) or custom C++ code. For the purpose of this study, we implemented a reward-driven (Eq. 2; Urbanczik and Senn, 2009) and an error-driven learning rule (Eq. 4; Urbanczik and Senn, 2014), as well as a homeostatic STDP rule (Eq. 12; Masquelier, 2017) via custom C++ code. Due to the specific implementation of spike delivery in NEST, we introduce a constant in the STDP rule that is added at each facilitation call instead of using a separate depression term. To support arbitrary mathematical expressions in the error-driven (Eq. 6) and correlation-driven synapse models (Eq. 13), we additionally implemented variants in which the weight update can be specified via SymPy-compatible strings (Meurer et al., 2017) that are parsed by SymEngine (https://github.com/symengine/symengine), a C++ library for symbolic computation. All custom synapse models and necessary kernel patches are available as NEST modules in the repository accompanying this study (https://github.com/Happy-Algorithms-League/e2l-cgp-snn).

4.4 Computing systems

All experiments were performed on JUWELS (Jülich Wizard for European Leadership Science), an HPC system at the Jülich Research Centre, Jülich, Germany, with 12 petaflop peak performance. The system contains 2271 general-purpose compute nodes, each equipped with two Intel Xeon Platinum 8168 processors (2 × 24 cores) and 12 × 8 GB main memory. Compute nodes are connected via an EDR-InfiniBand fat-tree network and run CentOS 7. Each experiment employed a single compute node.
4.5 Reward-driven learning task

We consider a reinforcement learning task for spiking neurons similar to Urbanczik and Senn (2009). Spiking activity of the output neuron is generated by an inhomogeneous Poisson process with instantaneous rate φ determined by its membrane potential u (Pfister et al., 2006; Urbanczik and Senn, 2009):

φ(u) := ρ e^{(u − u_th)/Δu} .   (19)

Here, ρ is the firing rate at threshold, u_th the threshold potential, and Δu a parameter governing the noise amplitude. In contrast to Urbanczik and Senn (2009), we consider an instantaneous reset of the membrane potential after a spike instead of a hyperpolarization kernel. The output neuron receives spike trains from sources randomly drawn from an input population of size N with randomly initialized weights (w_initial ∼ N(0, σ_w)). Before each pattern presentation, the output neuron's membrane potential and synaptic currents are reset. The eligibility trace in every synapse is updated in continuous time according to the following differential equation (Urbanczik and Senn, 2009):

τ_M Ė^r_j = −E^r_j + (1/Δu) [ Σ_{s∈y} δ(t − s) − φ(u(t)) ] s̄_j(t) ,   (20)

where τ_M governs the time scale of the eligibility trace, Δu is a parameter of the postsynaptic cell governing its noise amplitude, y represents the postsynaptic spike train, and s̄_j(t) = (κ ∗ s_j)(t) the presynaptic spike train s_j filtered by the synaptic kernel κ. The learning rate η was manually tuned to obtain high performance with Eq. 2.

4.6 Error-driven learning task

We consider an error-driven learning task for spiking neurons similar to Urbanczik and Senn (2014). N Poisson inputs with constant rates (r_i ∼ U[r_min, r_max], i ∈ [1, N]) project to a teacher neuron and, with the same connectivity pattern, to a student neuron.
Spiking activity of the output neuron is generated by an inhomogeneous Poisson process with instantaneous rate φ determined by its membrane potential u (Pfister et al., 2006; Urbanczik and Senn, 2009):

φ(u) := ρ e^{(u − u_th)/Δu} .   (21)

Here, ρ is the firing rate at threshold, u_th the threshold potential, and Δu a parameter governing the noise amplitude. In contrast to Sec. 4.5, the membrane potential is not reset after spike emission. Fixed synaptic weights from the inputs to the teacher are uniformly sampled from the interval [w_min, w_max], while weights to the student are all initialized to a fixed value w. In each trial we randomly shift all teacher weights by a global value w_shift to avoid a bias in the error signal that may arise if the teacher membrane potential is initially always larger or always smaller than the student membrane potential. Target potentials are read out from the teacher every δt and provided instantaneously to the student. The learning rate η was chosen via grid search on a single example task for high performance with Eq. 4.

4.7 Correlation-driven learning task

We consider a correlation-driven learning task for spiking neurons similar to Masquelier (2017): a spiking neuron, modeled as a leaky integrate-and-fire neuron with delta-shaped post-synaptic currents, receives stochastic spike trains from N inputs via plastic synapses.

To construct the input spike trains, we first create a frozen-noise pattern by drawing random spike times S^pattern_i ∈ [0, T_pattern], i ∈ [0, N −
1], from a Poisson process with rate ν. Neurons that fire at least once in this pattern are in the following called "pattern neurons"; the remaining ones are called "background neurons". We alternate this frozen-noise pattern with random spike trains of length T_inter generated by a Poisson process with rate ν (Fig. 5B). To balance the average rates of pattern neurons and background neurons, we reduce the spike rate of pattern neurons in between patterns by a factor α. Background neurons have an average rate of ν_inter = ν T_inter / (T_inter + T_pattern). We assume that pattern neurons spike only once during the pattern. Thus, they have an average rate of α ν_inter + 1/(T_inter + T_pattern) = α ν_inter + ν_pattern, which we require to equal ν_inter. Plugging in the previous expression for ν_inter and solving for α yields α = 1 − ν_pattern/ν_inter. We choose the same learning rate as Masquelier (2017).

Competing interests

The authors declare no competing interests.
References
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., and de Freitas,N. (2016). Learning to learn by gradient descent by gradient descent. In
Advances in Neural Information Processing Systems, pages 3981–3989.

Artola, A., Brocher, S., and Singer, W. (1990). Different voltage-dependent thresholds for inducing long-term depression and long-term potentiation in slices of rat visual cortex.
Nature , 347(6288):69.Bengio, S., Bengio, Y., and Cloutier, J. (1994). Use of genetic programming for the search of a newlearning rule for neural networks. In
Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, pages 324–327. IEEE.

Bengio, S., Bengio, Y., Cloutier, J., and Gecsei, J. (1992). On the optimization of a synaptic learning rule. In Preprints of the Conference on Optimality in Artificial and Biological Neural Networks, pages 6–8. Univ. of Texas.

Bengio, S., Bengio, Y., Cloutier, J., and Gecsei, J. (1993). Generalization of a parametric learning rule. In
ICANN93 , pages 502–502. Springer.Bengio, Y., Bengio, S., and Cloutier, J. (1990).
Learning a synaptic learning rule. Université de Montréal, Département d'informatique et de recherche opérationnelle.

Beyer, H.-G. and Schwefel, H.-P. (2002). Evolution strategies – a comprehensive introduction.
Naturalcomputing , 1(1):3–52.Bi, G.-q. and Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: depen-dence on spike timing, synaptic strength, and postsynaptic cell type.
J. Neurosci., 18(24):10464–10472.

Billaudelle, S., Stradmann, Y., Schreiber, K., Cramer, B., Baumbach, A., Dold, D., Göltz, J., Kungl, A. F., Wunderlich, T. C., Hartel, A., et al. (2019). Versatile emulation of spiking neural networks on an accelerated neuromorphic substrate. arXiv preprint arXiv:1912.12980.

Bohnstingl, T., Scherr, F., Pehle, C., Meier, K., and Maass, W. (2019). Neuromorphic hardware learns to learn.
Frontiers in neuroscience , 13.Caporale, N. and Dan, Y. (2008). Spike timing–dependent plasticity: a hebbian learning rule.
Annu.Rev. Neurosci. , 31:25–46.Chalmers, D. J. (1991). The evolution of learning: An experiment in genetic connectionism. In
Connectionist Models, pages 81–90. Elsevier.

Clopath, C., Büsing, L., Vasilaki, E., and Gerstner, W. (2010). Connectivity reflects coding: a model of voltage-based STDP with homeostasis.
Nat. Neurosci. , 13(3):344.Cox, J. and Witten, I. B. (2019). Striatal circuits for reward learning and decision-making.
NatureReviews Neuroscience , page 1.Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P.,Imam, N., Jain, S., et al. (2018). Loihi: A neuromorphic manycore processor with on-chiplearning.
IEEE Micro , 38(1):82–99.Deb, K. (2001).
Multi-objective optimization using evolutionary algorithms , volume 16. John Wiley& Sons.Deneve, S. (2008). Bayesian spiking neurons i: inference.
Neural computation, 20(1):91–117.

Dold, D., Bytschok, I., Kungl, A. F., Baumbach, A., Breitwieser, O., Senn, W., Schemmel, J., Meier, K., and Petrovici, M. A. (2019). Stochasticity from function – why the Bayesian brain may need no noise.
Neural networks , 119:200–213.Dudek, S. M. and Bear, M. F. (1993). Bidirectional long-term modification of synaptic effectivenessin the adult and immature hippocampus.
J. Neurosci. , 13(7):2910–2918.Eppler, J. M., Helias, M., Muller, E., Diesmann, M., and Gewaltig, M.-O. (2009). PyNEST: aconvenient interface to the NEST simulator.
Frontiers in neuroinformatics, 2:12.

French, R. M. (1999). Catastrophic forgetting in connectionist networks.
Trends in cognitivesciences , 3(4):128–135.Gaier, A. and Ha, D. (2019). Weight agnostic neural networks. arXiv preprint arXiv:1906.04358 .Gewaltig, M.-O. and Diesmann, M. (2007). NEST (Neural Simulation Tool).
Scholarpedia, 2(4):1430.

Göltz, J., Baumbach, A., Billaudelle, S., Breitwieser, O., Dold, D., Kriener, L., Kungl, A. F., Senn, W., Schemmel, J., Meier, K., et al. (2019). Fast and deep neuromorphic learning with time-to-first-spike coding. arXiv preprint arXiv:1912.11443.

Goodfellow, I., Bengio, Y., and Courville, A. (2016).
Deep learning. MIT Press.

Gütig, R., Aharonov, R., Rotter, S., and Sompolinsky, H. (2003). Learning input correlations through nonlinear temporally asymmetric Hebbian plasticity.
J. Neurosci. , 23(9):3697–3714.Ivakhnenko, A. G. (1971). Polynomial theory of complex systems.
IEEE transactions on Systems,Man, and Cybernetics , 4:364–378.Izhikevich, E. M. (2007). Solving the distal reward problem through linkage of stdp and dopaminesignaling.
Cereb. Cortex , 17(10):2443–2452.Izzo, D., Biscani, F., and Mereta, A. (2017). Differentiable genetic programming. In
EuropeanConference on Genetic Programming , pages 35–51. Springer.Jordan, J., Ippen, T., Helias, M., Kitayama, I., Sato, M., Igarashi, J., Diesmann, M., and Kunkel,S. (2018). Extremely scalable spiking neuronal network simulation code: from laptops to exascalecomputers.
Frontiers in neuroinformatics , 12:2.Jordan, J., Petrovici, M. A., Breitwieser, O., Schemmel, J., Meier, K., Diesmann, M., and Tetzlaff,T. (2019). Deterministic networks for probabilistic computing.
Scientific reports , 9(1):1–17.Kappel, D., Habenschuss, S., Legenstein, R., and Maass, W. (2015). Network plasticity as bayesianinference.
PLoS Comput. Biol. , 11(11):e1004485.Kempter, R., Gerstner, W., and van Hemmen, J. L. (1999). Hebbian learning and spiking neurons.
Phys. Rev. E, 59:4498–4514.

Keup, C., Kühn, T., Dahmen, D., and Helias, M. (2020). Transient chaotic dimensionality expansion by recurrent networks. arXiv preprint arXiv:2002.11006.

Koza, J. R. (1992).
Genetic programming: on the programming of computers by means of naturalselection , volume 1. MIT press.Koza, J. R. (2010). Human-competitive results produced by genetic programming.
GeneticProgramming and Evolvable Machines , 11(3-4):251–284.Kutschireiter, A., Surace, S. C., Sprekeler, H., and Pfister, J.-P. (2017). Nonlinear bayesian filteringand learning: a neuronal dynamics for perception.
Scientific reports , 7(1):8722.Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an integration of deep learningand neuroscience.
Frontiers in computational neuroscience , 10:94.Masquelier, T. (2017). STDP allows close-to-optimal spatiotemporal spike pattern detection bysingle coincidence detector neurons.
Neuroscience, pages 1–8.

Mayr, C., Hoeppner, S., and Furber, S. (2019). SpiNNaker 2: A 10 million core processor system for brain simulation and machine learning. arXiv preprint arXiv:1911.02385.

Meurer, A., Smith, C. P., Paprocki, M., Čertík, O., Kirpichev, S. B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J. K., Singh, S., et al. (2017). SymPy: symbolic computing in Python.
PeerJ Computer Science, 3:e103.

Miconi, T., Rawal, A., Clune, J., and Stanley, K. O. (2020). Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity. arXiv preprint arXiv:2002.10585.

Miller, B. L., Goldberg, D. E., et al. (1995). Genetic algorithms, tournament selection, and the effects of noise.
Complex systems , 9(3):193–212.Miller, J. and Thomson, P. (2000). Cartesian genetic programming. In
Proc. European Conferenceon Genetic Programming , volume 1802, pages 121–132. Springer.Miller, J. F. (1999). An empirical study of the efficiency of learning boolean functions using acartesian genetic programming approach. In
Proceedings of the 1st Annual Conference on Geneticand Evolutionary Computation-Volume 2 , pages 1135–1142. Morgan Kaufmann Publishers Inc.Miller, J. F. (2011). Cartesian genetic programming. In
Cartesian Genetic Programming , pages17–34. Springer.Moradi, S., Qiao, N., Stefanini, F., and Indiveri, G. (2017). A scalable multicore architecture withheterogeneous memory structures for dynamic neuromorphic asynchronous processors (dynaps).
IEEE transactions on biomedical circuits and systems , 12(1):106–122.Morrison, A., Diesmann, M., and Gerstner, W. (2008). Phenomenological models of synapticplasticity based on spike timing.
Biological cybernetics , 98(6):459–478.Ngezahayo, A., Schachner, M., and Artola, A. (2000). Synaptic activity modulates the induction ofbidirectional synaptic changes in adult mouse hippocampus.
J. Neurosci. , 20(7):2451–2458.Nordlie, E., Gewaltig, M.-O., and Plesser, H. E. (2009). Towards reproducible descriptions ofneuronal network models.
PLoS computational biology , 5(8).Orchard, J. and Wang, L. (2016). The evolution of a generalized neural learning rule. In
NeuralNetworks (IJCNN), 2016 International Joint Conference on , pages 4688–4694. IEEE.Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z.,Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A.,Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperativestyle, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A.,d’ Alch´e-Buc, F., Fox, E., and Garnett, R., editors,
Advances in Neural Information ProcessingSystems 32 , pages 8024–8035. Curran Associates, Inc.Pfister, J.-P., Dayan, P., and Lengyel, M. (2010). Synapses with short-term plasticity are optimalestimators of presynaptic membrane potentials.
Nature neuroscience , 13(10):1271.Pfister, J.-P., Toyoizumi, T., Barber, D., and Gerstner, W. (2006). Optimal spike-timing-dependentplasticity for precise action potential firing in supervised learning.
Neural computation , 18(6):1318–1348.Plotnikov, D., Rumpe, B., Blundell, I., Ippen, T., Eppler, J. M., and Morrison, A. (2016). NESTML:a modeling language for spiking neurons. arXiv preprint arXiv:1606.02882 .Radi, A. and Poli, R. (2003). Discovering efficient learning rules for feedforward neural networksusing genetic programming. In
Recent advances in intelligent paradigms and applications , pages133–159. Springer.Risi, S. and Stanley, K. O. (2010). Indirectly encoding neural plasticity as a pattern of local rules.In
International Conference on Simulation of Adaptive Behavior , pages 533–543. Springer.Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1985). Learning internal representationsby error propagation. Technical report, California Univ San Diego La Jolla Inst for CognitiveScience.Sacramento, J., Costa, R. P., Bengio, Y., and Senn, W. (2018). Dendritic cortical microcircuitsapproximate the backpropagation algorithm. In
Advances in Neural Information ProcessingSystems , pages 8721–8732. 19chmidt, M. and Jordan, J. (2020). HAL-CGP: Cartesian genetic programming in pure Python.
Zenodo Zenodo:3889163 .Soltoggio, A., Stanley, K. O., and Risi, S. (2018). Born to learn: the inspiration, progress, andfuture of evolved plastic artificial neural networks.
Neural Networks .Topchy, A. and Punch, W. F. (2001). Faster genetic programming based on local gradient search ofnumeric leaf values. In
Proceedings of the 3rd Annual Conference on Genetic and EvolutionaryComputation , pages 155–162. Morgan Kaufmann Publishers Inc.Toyoizumi, T., Pfister, J.-P., Aihara, K., and Gerstner, W. (2005). Generalized bienenstock–cooper–munro rule for spiking neurons that maximizes information transmission.
Proceedings of theNational Academy of Sciences of the United States of America , 102(14):5239–5244.Urbanczik, R. and Senn, W. (2009). Reinforcement learning in populations of spiking neurons.
Nature neuroscience , 12(3):250.Urbanczik, R. and Senn, W. (2014). Learning by the dendritic prediction of somatic spiking.
Neuron ,81(3):521–528.Walt, S. v. d., Colbert, S. C., and Varoquaux, G. (2011). The numpy array: a structure for efficientnumerical computation.
Computing in Science & Engineering , 13(2):22–30.Whittington, J. C. and Bogacz, R. (2019). Theories of error back-propagation in the brain.
Trendsin cognitive sciences .Zador, A. M. (2019). A critique of pure learning and what artificial neural networks can learn fromanimal brains.
Nature communications , 10(1):1–7.Zaytsev, Y. V. and Morrison, A. (2014). CyNEST: a maintainable cython-based interface for theNEST simulator.
Frontiers in neuroinformatics , 8:23.20 B p r e p o s t Figure 6:
Derivation of effective learning rules. (A) A postsynaptic spike at t = t_1 paired with a causal presynaptic spike at t = 0 and an anticausal presynaptic spike at t = t_1 + t_2. (B) The general form of the learning rules considered in the derivation of Eq. 22. The learning rule varies only on a restricted interval of time differences between −t* and t*, where it is bounded from above and below, while it remains constant outside this interval.

A Derivation of effective learning rules
We consider the effective weight change resulting from a pair of causal (t_post > t_pre) and anticausal (t_post < t_pre) spikes. Note that the approximation of considering only nearest-neighbor spikes becomes more accurate as the firing of pre- and postsynaptic neurons becomes more similar. Without loss of generality we assume that a causal presynaptic spike at t = 0 and a postsynaptic spike at t = t_1 are accompanied by an anticausal presynaptic spike at t = t_1 + t_2, with arbitrary t_2 ≥ 0. For presynaptic Poisson firing with rate ν, the probability of observing the anticausal spike at t = t_1 + t_2 is given by p(t_1 + t_2 | t_1) = ν e^(−ν t_2). We consider a plasticity rule Δw(Δt) which depends only on the time difference Δt of pre- and postsynaptic spikes. Additionally, we assume that Δw(Δt) varies only within |Δt| < t* and is constant outside these bounds (Fig. 6B). Note that this is an approximation to the learning rules described in the main text, which are not constant outside specific bounds but rather decay exponentially over a characteristic time scale τ. We account for each presynaptic spike entering two weight updates, once preceding and once trailing a postsynaptic spike, by introducing the factor 1/2. The resulting weight change from a causal and an anticausal spike can hence be written as:

Δw̃(t_1) = (1/2) Δw(t_1) + (1/2) ⟨Δw(−t_2)⟩_{p(t_1 + t_2 | t_1)}
        = (1/2) Δw(t_1) + (1/2) ∫_0^∞ dt_2 p(t_1 + t_2 | t_1) Δw(−t_2)
        = (1/2) Δw(t_1) + (1/2) ∫_0^{t*} dt_2 ν e^(−ν t_2) Δw(−t_2) + (1/2) Δw_− ∫_{t*}^∞ dt_2 ν e^(−ν t_2)
        ≈ (1/2) Δw(t_1) + (1/2) Δw_−.    (22)

On line three we used the premise that Δw(Δt) is bounded from above and below by Δw_max and Δw_min, respectively: the first integral lies between Δw_min (1 − e^(−ν t*)) and Δw_max (1 − e^(−ν t*)) and therefore vanishes for ν t* ≪ 1, while the second integral equals e^(−ν t*) ≈ 1 in the same limit. We thus considered a regime in which presynaptic firing rates are small compared to the inverse of the timescale on which Δw(Δt) varies significantly (ν t* ≪ 1). The characteristic time scale of the learning rules in the main text is τ = 20 ms; with a presynaptic firing rate of 3 Hz, this approximation holds in our experiments.
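The final approximation in Eq. 22 can be verified numerically. The sketch below assumes an illustrative STDP-like rule satisfying the conditions of Fig. 6B (its functional form, t*, and Δw_− are arbitrary choices for this check, not one of the evolved rules) and compares the exact expected pair update with (1/2) Δw(t_1) + (1/2) Δw_− at ν = 3 Hz:

```python
import numpy as np

# Numerical sanity check of Eq. 22: for nu * t_star << 1, the expected
# weight change from a causal/anticausal spike pair reduces to
# 0.5 * dw(t1) + 0.5 * dw_minus. The rule below is an assumed example
# satisfying the conditions of Fig. 6B, not one of the evolved rules.
nu = 3.0         # presynaptic firing rate (Hz), as in the experiments
tau = 0.02       # characteristic timescale of the rule (s), tau = 20 ms
t_star = 0.1     # bound beyond which the rule is constant (s)
dw_minus = -0.5  # constant value of the rule for dt <= -t_star

def delta_w(dt):
    """Example STDP-like rule varying only within |dt| < t_star."""
    dt = np.asarray(dt, dtype=float)
    causal = np.exp(-np.maximum(dt, 0.0) / tau)
    anticausal = dw_minus * (1.0 - np.exp(np.minimum(dt, 0.0) / tau))
    anticausal = np.where(dt <= -t_star, dw_minus, anticausal)
    return np.where(dt >= 0.0, causal, anticausal)

t1 = 0.005                           # causal spike-pair interval (s)
t2 = np.linspace(0.0, 3.0, 300_001)  # anticausal intervals (s)
p = nu * np.exp(-nu * t2)            # p(t1 + t2 | t1)
expectation = float(np.sum(p * delta_w(-t2)) * (t2[1] - t2[0]))

exact = 0.5 * float(delta_w(t1)) + 0.5 * expectation
approx = 0.5 * float(delta_w(t1)) + 0.5 * dw_minus
print(f"exact: {exact:.4f}, approximation (Eq. 22): {approx:.4f}")
```

Shrinking ν t* tightens the agreement between the two values, consistent with the bounds used in line three of the derivation.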
B Simulation details

Table 1, Table 2, and Table 3 summarize the network models used in the experiments.
A Model summary
Populations
Topology                —
Connectivity            Feedforward with fixed connection probability
Neuron model            Leaky integrate-and-fire (LIF) with exponential post-synaptic currents
Plasticity              Reward-driven
Measurements            Spikes

B Populations
Name     Elements                                                        Size
Input    Spike generators with pre-defined spike trains (see Sec. 4.5)   N
Output   LIF neuron                                                      1

C Connectivity
Source   Target   Pattern
Input    Output   Fixed pairwise connection probability p; synaptic delay d; random initial weights from N(0, σ_w)

D Neuron model
Type                    LIF neuron with exponential post-synaptic currents
Subthreshold dynamics   du(t)/dt = −(u(t) − E_L)/τ_m + I_s(t)/C_m if not refractory; u(t) = u_r else
                        I_s(t) = Σ_{i,k} w_k e^(−(t − t_i^k)/τ_s) Θ(t − t_i^k); k: neuron index, i: spike index
Spiking                 Stochastic spike generation via inhomogeneous Poisson process with intensity φ(u) = ρ e^((u − u_th)/Δu); reset of u to u_r after spike emission and refractory period of τ_r

E Synapse model
Plasticity   Reward-driven with episodic update (Eq. 2, Eq. 3)
Other        Each synapse stores an eligibility trace (Eq. 20)

F Simulation parameters
Populations     N = 50
Connectivity    p = 0., σ_w = 10 pA
Neuron model    ρ = 0.01 Hz, Δu = 0., E_L = −70 mV, u_r = −70 mV, u_th = −55 mV, τ_m = 10 ms, C_m = 250 pF, τ_r = 2 ms, τ_s = 2 ms
Synapse model   η = 10, τ_M = 500 ms, d = 1 ms
Input           M = 30, r = 6 Hz, T = 500 ms, n_training = 500, n_exp = 10
Other           h = 0.01 ms

G CGP parameters
Population   µ = 4, p_mutation = 0.
Genome       n_inputs = 2, n_outputs = 1, n_rows = 1, n_columns = 5, l_max = 5
Primitives   Add, Sub, Mul, Div, Const(1.0)
EA           λ = 4, n_breeding = 4, n_tournament = 1
Other        max generations = 500, minimal fitness = 300

Table 1: Description of the network model used in the reward-driven learning task (Sec. 4.5), according to Nordlie et al. (2009).

A Model summary
Populations
Topology                —
Connectivity
Feedforward with all-to-all connections
Neuron model            Leaky integrate-and-fire (LIF) with exponential post-synaptic currents
Plasticity              Error-driven
Measurements            Spikes, membrane potentials

B Populations
Name      Elements                                                        Size
Input     Spike generators with pre-defined spike trains (see Sec. 4.6)   N
Teacher   LIF neuron                                                      1
Student   LIF neuron                                                      1

C Connectivity
Source   Target    Pattern
Input    Teacher   All-to-all; synaptic delay d; random weights w ∼ U[w_min, w_max]; weights randomly shifted by w_shift on each trial
Input    Student   All-to-all; synaptic delay d; fixed initial weights w

D Neuron model
Type                    LIF neuron with exponential post-synaptic currents
Subthreshold dynamics   du(t)/dt = −(u(t) − E_L)/τ_m + I_s(t)/C_m
                        I_s(t) = Σ_{i,k} J_k e^(−(t − t_i^k)/τ_s) Θ(t − t_i^k); k: neuron index, i: spike index
Spiking                 Stochastic spike generation via inhomogeneous Poisson process with intensity φ(u) = ρ e^((u − u_th)/Δu); no reset after spike emission

E Synapse model
Plasticity   Error-driven with continuous update (Eq. 4, Eq. 6)

F Simulation parameters
Populations     N = 5
Connectivity    w_min = −, w_max = 20, w_shift ∼ {−, }, w = 5
Neuron model    ρ = 0., Δu = 1., E_L = −70 mV, u_th = −55 mV, τ_m = 10 ms, C_m = 250 pF, τ_s = 2 ms
Synapse model   η = 1., d = 1 ms
Input           r_min = 150 Hz, r_max = 850 Hz, T = 10,000 ms, n_exp = 15
Other           h = 0.01 ms, δt = 5 ms

G CGP parameters
Population   µ = 4, p_mutation = 0.
Genome       n_inputs = 3, n_outputs = 1, n_rows = 1, n_columns = 12, l_max = 12
Primitives   Add, Sub, Mul, Div, Const(1.0)
EA           λ = 4, n_breeding = 4, n_tournament = 1
Other        max generations = 1000, minimal fitness = 0.

Table 2: Description of the network model used in the error-driven learning task (Sec. 4.6), according to Nordlie et al. (2009).

A Model summary
Populations
Topology                —
Connectivity
Feedforward with all-to-all connections
Neuron model            Leaky integrate-and-fire (LIF) with delta-shaped post-synaptic currents
Plasticity              Correlation-driven
Measurements            Spikes, membrane potentials

B Populations
Name     Elements                                                        Size
Input    Spike generators with pre-defined spike trains (see Sec. 4.7)   N
Output   LIF neuron                                                      1

C Connectivity
Source   Target   Pattern
Input    Output   All-to-all; synaptic delay d; initial weights chosen to yield an approximate initial firing rate of 20 Hz in the output neuron, w = u_th (Nν − √(Nν)) / (10 − τ_m)

D Neuron model
Type                    LIF neuron with delta-shaped post-synaptic currents
Subthreshold dynamics   du/dt = −(u − E_L)/τ_m + I_s(t)/C_m if t > t* + τ_r; u(t) = u_r else
                        I_s(t) = Σ_{i,k} w_k δ(t − t_i^k); k: neuron index, i: spike index
Spiking                 If u(t−) < θ ∧ u(t+) ≥ θ: 1. set t* = t, 2. emit spike with time stamp t*

E Synapse model
Plasticity   Correlation-driven (Eq. 12, Eq. 14, Eq. 15)

F Simulation parameters
Neuron model    u_r = 0., E_L = 0., u_th = 20., τ_m = 18.
Synapse model   d = 1., W_max = 1., τ = 20 ms, η = 0.
Experiment      N = 1000, T = 100 s, T_pattern = 100., T_inter = 400., ν = 3 Hz, n_exp = 8

G CGP parameters
Population   µ = 8, p_mutation = 0.
Genome       n_inputs = 2, n_outputs = 1, n_rows = 1, n_columns = 5, l_max = 5
Primitives   Add, Sub, Mul, Div, Pow, Const(1.0)
EA           λ = 8, n_breeding = 8, n_tournament = 1
Other        max generations = 2000, minimal fitness = 10.

Table 3: Description of the network model used in the correlation-driven learning task (Sec. 4.7), according to Nordlie et al. (2009).
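The stochastic LIF dynamics summarized in panel D of Table 1 can be sketched with a simple forward-Euler integration. This is a minimal illustration, not the simulation code used for the experiments; the input weight w, the threshold softness Δu, and the homogeneous Poisson input are assumptions standing in for the pre-defined spike trains of panel B and the partly unspecified values of panel F:

```python
import numpy as np

# Forward-Euler sketch of the LIF neuron with exponential post-synaptic
# currents and stochastic (escape-noise) spiking from panel D of Table 1.
# Parameter values follow panel F where available; w, delta_u, and the
# homogeneous Poisson input are illustrative assumptions.
rng = np.random.default_rng(42)

h = 0.01e-3          # integration step (s), panel F
T = 0.5              # simulated time (s), one episode of T = 500 ms
tau_m, tau_s, tau_r = 10e-3, 2e-3, 2e-3  # membrane, synaptic, refractory (s)
C_m = 250e-12        # membrane capacitance (F)
E_L = u_r = -70e-3   # resting and reset potential (V)
u_th = -55e-3        # soft threshold (V)
rho = 0.01           # escape rate at threshold (Hz)
delta_u = 1e-3       # threshold softness (V), assumed
N, rate, w = 50, 6.0, 1e-9  # inputs, input rate (Hz), weight (A); w assumed

u, I_s, refr, spikes = E_L, 0.0, 0.0, []
steps = int(round(T / h))
for step in range(steps):
    I_s += rng.poisson(N * rate * h) * w  # incoming Poisson spikes
    I_s -= h * I_s / tau_s                # exponential PSC decay
    if refr > 0.0:                        # refractory clamp to u_r
        refr -= h
        u = u_r
    else:
        u += h * (-(u - E_L) / tau_m + I_s / C_m)
        phi = rho * np.exp((u - u_th) / delta_u)   # escape-noise intensity
        if rng.random() < 1.0 - np.exp(-phi * h):  # spike in this step?
            spikes.append(step * h)
            u, refr = u_r, tau_r

print(f"{len(spikes)} output spikes in {T * 1e3:.0f} ms")
```

The same loop covers the teacher/student neurons of Table 2 by dropping the reset and refractory branch, and the neuron of Table 3 by replacing the exponential current with instantaneous jumps of u and the escape noise with a hard threshold crossing.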