Evolving to learn: discovering interpretable plasticity rules for spiking networks
Jakob Jordan, Maximilian Schmidt, Walter Senn, Mihai A. Petrovici
Jakob Jordan∗, Maximilian Schmidt∗, Walter Senn, and Mihai A. Petrovici

Department of Physiology, University of Bern, Bern, Switzerland
Ascent Robotics, Tokyo, Japan
RIKEN Center for Brain Science, Tokyo, Japan
Kirchhoff-Institute for Physics, Heidelberg University, Heidelberg, Germany
Abstract
Continuous adaptation allows survival in an ever-changing world. Adjustments in the synaptic coupling strength between neurons are essential for this capability, setting us apart from simpler, hard-wired organisms. How these adjustments come about is essential both for understanding biological information processing and for developing cognitively performant artificial systems. We suggest an automated approach for discovering biophysically plausible plasticity rules based on the definition of task families, associated performance measures and biophysical constraints. This approach makes the relative weighting of guiding factors explicit, explores large search spaces, encourages diverse sets of hypotheses, and can discover domain-specific solutions. By evolving compact symbolic expressions we ensure the discovered plasticity rules are amenable to intuitive understanding. This is fundamental for successful communication and human-guided generalization, for example to different network architectures or task domains. We demonstrate the flexibility of our approach by discovering efficient plasticity rules in typical learning scenarios.
Keywords: metalearning, learning to learn, synaptic plasticity, spiking neuronal networks, genetic programming
How do we learn? Whether we are memorizing the way to the lecture hall at a conference or mastering a new sport, somehow our central nervous system is able to retain the relevant information over extended periods of time, sometimes with ease, other times only after intense practice. This acquisition of new memories and skills manifests at various levels of the system, with changes of the interaction strength between neurons being a key ingredient. Uncovering the mechanisms behind this synaptic plasticity is a key challenge in understanding brain function. Most studies approach this monumental task by searching for symbolic expressions that map local biophysical quantities to changes of the connection strength between cells (Fig. 1A, B).

Approaches to deciphering synaptic plasticity can be broadly categorized into bottom-up and top-down. Bottom-up approaches typically rely on experimental data (e.g., Artola et al., 1990; Dudek and Bear, 1993; Bi and Poo, 1998; Ngezahayo et al., 2000) to derive dynamic equations for synaptic parameters that lead to functional emergent macroscopic behavior if appropriately embedded in networks (e.g., Gütig et al., 2003; Izhikevich, 2007; Clopath et al., 2010). Top-down approaches proceed in the opposite direction: from a high-level description of network function, e.g., in terms of an objective function (e.g., Toyoizumi et al., 2005; Deneve, 2008; Kappel et al., 2015; Kutschireiter et al., 2017; Sacramento et al., 2018; Göltz et al., 2019), dynamic equations for synaptic changes are derived and biophysically plausible implementations suggested. Evidently, this demarcation is not strict, as most approaches seek some balance between experimental evidence,

∗ These authors contributed equally to this work.
Figure 1:
Artificial evolution of synaptic plasticity rules in spiking neuronal networks. (A)
Sketch of cortical microcircuits consisting of pyramidal cells (orange) and inhibitory interneurons (blue). Stimulation elicits action potentials in pre- and postsynaptic cells, which, in turn, influence synaptic plasticity. (B)
Synaptic plasticity leads to a weight change (∆w) between the two cells, here measured by the change in the amplitude of post-synaptic potentials. The change in synaptic weight can be expressed by a function f that in addition to spike timings (t_pre, t_post) can take into account additional local quantities, such as the concentration of neuromodulators (ρ, green dots in A) or postsynaptic membrane potentials. (C) For a specific experimental setup, an evolutionary algorithm searches for individuals representing functions f that maximize the corresponding fitness function F. An offspring is generated by modifying the genome of a parent individual. Several runs of the evolutionary algorithm can discover phenomenologically different solutions (f_1, f_2, f_3) with comparable fitness. (D) An offspring is generated from a single parent via mutation. Mutations of the genome can, for example, exchange mathematical operators, resulting in a different function f.

functional considerations and model complexity. However, the relative weighting of each of these aspects is usually not made explicit in the communication of scientific results, making it difficult to track by other researchers. Furthermore, the selection of specific tasks to illustrate the effect of a suggested learning rule is usually made only after the rule was derived based on other considerations. Hence, this typically does not consider competing alternative solutions, as an exhaustive comparison would require significant additional investment of human resources. A related problem is that researchers, in a reasonable effort to use resources efficiently, tend to focus on promising parts of the search space around known solutions, leaving large parts of the search space unexplored (Radi and Poli, 2003).

We suggest an automated approach to discover learning rules in spiking neuronal networks that explicitly addresses these issues.
Automated procedures interpret the search for biological plasticity mechanisms as an optimization problem (Bengio et al., 1992), an idea typically referred to as meta-learning or learning-to-learn. These approaches make the emphasis of particular aspects that guide this search explicit and place the researcher at the very end of the process, supporting much larger search spaces and the generation of a diverse set of hypotheses. Furthermore, they have the potential to discover domain-specific solutions that are more efficient than general-purpose algorithms. Early experiments focusing on learning in artificial neural networks (ANNs) made use of gradient descent or genetic algorithms to optimize parameterized learning rules (Bengio et al., 1990, 1992, 1993) or genetic programming to evolve less constrained learning rules (Bengio et al., 1994; Radi and Poli, 2003), rediscovering mechanisms resembling the backpropagation of errors (Ivakhnenko, 1971; Rumelhart et al., 1985). Recent experiments demonstrate how optimization methods can design optimization algorithms for recurrent ANNs (Andrychowicz et al., 2016).

We extend these meta-learning ideas to spiking neuronal networks. The discrete nature of spike-based neuronal interactions endows these networks with rich dynamical and functional properties (e.g., Dold et al., 2019; Jordan et al., 2019; Keup et al., 2020). In addition, with the advent of non-von Neumann computing systems based on spiking neuronal networks with online learning capabilities (Moradi et al., 2017; Davies et al., 2018; Billaudelle et al., 2019), efficient learning algorithms for spiking systems become increasingly relevant for non-conventional computing. Here, we employ genetic programming (Fig. 1C, D; Koza, 2010) as a search algorithm for two main reasons. First, genetic programming can operate on analytically tractable mathematical expressions describing synaptic weight changes that are interpretable.
Second, an evolutionary search does not need to compute gradients in the search space, thereby circumventing the need to estimate a gradient in non-differentiable systems.

We apply our approach to three different learning paradigms for spiking neuronal networks: reward-driven, error-driven and correlation-driven learning. We discover efficient plasticity rules for these task families that perform competitively with known solutions. For the reward-driven task, we reliably reproduce a previously suggested solution that achieves high performance. In the error-driven learning scenario, the evolutionary search discovers a variety of solutions which, with appropriate analysis of the corresponding expressions, can be shown to effectively implement stochastic gradient descent. Finally, in the correlation-driven task, our method generates various hypotheses for STDP-like plasticity rules with diverse weight dynamics that match the performance of a rule derived under optimality considerations. Our results demonstrate the significant potential of automated procedures in the search for plasticity rules in spiking neuronal networks, analogous to the transition from hand-designed to learned features that lies at the heart of modern machine learning.

We introduce the following recipe to search for biophysically plausible plasticity rules in spiking neuronal networks. First we determine a task family of interest and an associated experimental setup which includes specification of the network architecture, e.g., neuron types and connectivity, as well as stimulation protocols or training data sets. Crucially, this step involves defining a fitness function to guide the evolutionary search towards promising regions of the search space. It assigns high fitness to those individuals, i.e., learning rules, that solve the task well and low fitness to others. The fitness function may additionally contain constraints implied by experimental data or arising from computational considerations.
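The evaluation step of this recipe can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: `run_task` and the quadratic `toy_task` below are hypothetical stand-ins for a full spiking-network simulation of one task instance.

```python
import random

def evaluate_fitness(f, run_task, n_tasks=5, seed=0):
    """Average the performance of a candidate plasticity rule f over
    several instances of a task family (e.g., different input
    spike-train realizations) to favor rules that generalize."""
    rng = random.Random(seed)
    scores = [run_task(f, rng.randrange(2**31)) for _ in range(n_tasks)]
    return sum(scores) / n_tasks

def toy_task(f, seed):
    """Hypothetical task instance: negative mean squared deviation of f
    from a 'true' weight update on random inputs (a placeholder for a
    network simulation returning a scalar performance measure)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(100)]
    return -sum((f(x) - 2.0 * x) ** 2 for x in xs) / len(xs)
```

A candidate that matches the target update on all task instances attains the maximal fitness of zero here; worse candidates receive more negative values.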
We determine each individual's fitness on various examples from the given task family, e.g., different input spike train realizations, to discover plasticity rules that generalize well (Chalmers, 1991; Soltoggio et al., 2018). Finally, we specify the neuronal variables available to the plasticity rule, such as low-pass-filtered traces of pre- and postsynaptic spiking activity or neuromodulator concentrations. This choice is guided by biophysical considerations, e.g., which quantities are locally available at a synapse, as well as by the task family, e.g., whether reward or error signals are provided by the environment. We write the plasticity rule in the general form ∆w = ηf(...), where η is a fixed learning rate, and employ an evolutionary search to discover functions f that lead to high fitness.

We propose to use genetic programming (GP) as an evolutionary algorithm to discover plasticity rules in spiking neuronal networks. GP applies mutations and selection pressure to an initially random population of computer programs to artificially evolve algorithms with desired behaviors (e.g., Koza, 1992, 2010). Here we consider the evolution of mathematical expressions. We employ a specific form of GP, Cartesian genetic programming (CGP; e.g., Miller and Thomson, 2000; Miller, 2011), that uses an indexed graph representation of programs. The genotype of an individual is a two-dimensional Cartesian graph (Fig. 2A, top). Over the course of an evolutionary run, this graph has a fixed number of input, output, and internal nodes. The operation of each internal node is fully described by a single function gene and a fixed number of input genes. A function table maps function genes to mathematical operations (Fig. 2A, bottom), while input genes determine from where this node receives data. A given genotype is decoded into a corresponding computational graph (the phenotype, Fig. 2B) which defines a function f.
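Decoding such a genotype into a callable function can be sketched as follows. This is a simplified toy, not the implementation used in this work; the genome layout and function table are assumptions, loosely mirroring the example of Fig. 2.

```python
# Function table: function gene -> (arity, operation)
FUNCS = {
    0: (2, lambda a, b: a + b),  # addition
    1: (2, lambda a, b: a * b),  # multiplication
    2: (2, lambda a, b: a - b),  # subtraction
}

def evaluate_node(idx, genome, inputs, cache):
    """Output of node idx; nodes 0..n_in-1 are the input nodes."""
    n_in = len(inputs)
    if idx < n_in:
        return inputs[idx]
    if idx not in cache:
        func_gene, *input_genes = genome[idx - n_in]
        arity, op = FUNCS[func_gene]
        args = [evaluate_node(g, genome, inputs, cache)
                for g in input_genes[:arity]]
        cache[idx] = op(*args)
    return cache[idx]

def decode(genome, output_gene, n_in):
    """Decode a genotype (one (function_gene, input_gene, input_gene)
    tuple per internal node) into a callable phenotype f."""
    def f(*inputs):
        return evaluate_node(output_gene, genome, inputs, {})
    return f

# Toy genome: node 4 multiplies inputs 0 and 2; nodes 3 and 5 are
# present in the genotype but unused when the output gene selects
# node 4, illustrating variable-length phenotypes.
genome = [(0, 0, 1),  # node 3: x0 + x1
          (1, 0, 2),  # node 4: x0 * x2
          (2, 4, 3)]  # node 5: node4 - node3
f = decode(genome, output_gene=4, n_in=3)
```

With the output gene selecting node 4, `f(x0, x1, x2)` evaluates only the active subgraph and returns `x0 * x2`.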
During the evolutionary run, mutations
Figure 2:
Representation and mutation of mathematical expressions in Cartesian genetic programming. (A)
The genotype of an individual is a two-dimensional Cartesian graph (top). In this example, the graph contains three input nodes (0–2), internal nodes (3–8) and a single output node (9). In each node the genes of a specific genotype are shown, encoding the operator used to compute the node's output and its inputs. Each operator gene maps to a specific mathematical function (bottom). Special values (−1, −2) represent input and output nodes. For example, node 4 uses the operator 1, the multiplication operation "∗", and receives input from nodes 0 and 2. This node's output is hence given by x_0 ∗ x_2. The number of input genes per node is determined by the operator with the maximal arity (here two). Red genes highlight fixed genes that can not be mutated. ∅ denotes non-coding genes. (B) The computational graph (phenotype) generated by the genotype in A. Input nodes (x_0, x_1, x_2) represent the arguments of the function f. Each output node selects one of the other nodes as a return value of the computational graph, thus defining a function from input x to output y = f(x). Here, the output node selects node 4 as a return value. Some nodes defined in the genotype are not used by a particular realization of the computational graph (light gray nodes, e.g., node 6). (C) Mutations in the genome either lead to a change in graph connectivity (top, green arrow) or alter the operators used by an internal node (bottom, green node). Mutations that do not affect the phenotype, e.g., in node 6, are called silent mutations. Here, both mutations affect the phenotype and are hence not silent.

of the genotype alter connectivity and node operations, which can modify the encoded function (Fig. 2C). The indirect encoding of the computational graph via the genotype supports variable-length phenotypes, since some internal nodes may not be used to compute the output (Fig. 2B). The size of the genotype, in contrast, is fixed, thereby restricting the maximal size of the computational graph and ensuring compact, interpretable mathematical expressions. Furthermore, the separation into genotype and phenotype allows for "silent mutations", i.e., mutations in the genotype that do not alter the phenotype. These silent mutations can lead to more efficient search as they can accumulate and in combination lead to an increase in fitness once affecting the phenotype (Miller and Thomson, 2000).
A (µ + λ) evolution strategy (Beyer and Schwefel, 2002) drives evolution by creating the next generation of individuals from the current one via tournament selection, mutation and selection of the fittest individuals (Sec. 4.1). Prior to starting the search, we choose the mathematical operations that can appear in the plasticity rule and other (hyper)parameters of the Cartesian graph and evolutionary algorithm. For simplicity, we consider a restricted set of mathematical operations and additionally make use of nodes with constant output. After the search has completed, e.g., by reaching a target fitness value or a maximal number of generations, we
Figure 3:
Cartesian genetic programming evolves an efficient reward-driven learning rule. (A)
Network sketch. Multiple input neurons with Poisson activity project to a single output unit. Pre- and postsynaptic activity generate an eligibility trace in each synapse. Comparison between the output activity and the target activity generates a reward signal. Both of these are provided to the plasticity rule that determines the weight update. (B)
Raster plot of the activity of input neurons (small black dots) and output neuron (large red dots). Gray (white) background indicates patterns for which the output should be active (inactive). Check marks indicate correct classifications. We show 8 trials at the beginning (left) and the end of training (right) using the evolved plasticity rule: ∆w_j = η E^r_j(T)(R − 1). (C) Fitness of best individual per generation as a function of the generation index for multiple runs of the evolutionary algorithm with different initial conditions. Labels represent the rule at the end of the respective run for the four runs with the highest final fitness (colored lines). Dark gray lines represent runs with low final fitness. Light gray line indicates the fitness of Eq. 2. Blue marker with error bars represents fitness of the discovered rule ∆w_j = η E^r_j(T)(R − 1) on 10 validation tasks not used during the evolutionary search. (D)
Function f encoded in the best individual as a function of the generation index for one run of the evolutionary algorithm. Expressions are only shown after the generation in which a change in highest fitness occurred.

analyze the discovered set of solutions. In the following, we describe the results of three experiments following the recipe outlined above.

We consider a reinforcement-learning task for spiking neurons similar to Urbanczik and Senn (2009). The experiment can be succinctly described as follows: N inputs project to a single readout modeled by a leaky integrator neuron with exponential postsynaptic currents and stochastic spike generation (for details see Sec. 4.5). We generate a finite number M of frozen-Poisson-noise patterns of duration T and assign each of these randomly to one of two classes. The output neuron should learn to classify each of these spatio-temporal input patterns into the corresponding class using a spike/no-spike code (Fig. 3A, B).

The fitness F(f) of an individual encoding the function f is measured by the average cumulative reward obtained over a certain number of experiments n_exp, each consisting of n classification tasks:

F(f) := (1/n_exp) Σ_{k=1}^{n_exp} R_k(f),   (1)

where R_k(f) = Σ_{i=1}^{n} R_{k,i}(f) is the cumulative reward obtained in experiment k. The reward R_{k,i} is 1 if the classification is correct and −1 otherwise. A previously derived plasticity rule for this type of task (Urbanczik and Senn, 2009) reads:

∆w_j = η E^r_j(t)(R − 1).   (2)

Here η represents a fixed learning rate, and j represents the presynaptic neuron index. E^r_j is an eligibility trace that contains contributions from the spiking activity of pre- and post-synaptic neurons which is updated over the course of a single trial (for details see Sec. 4.5). Similarly to Urbanczik and Senn (2009), reward and eligibility trace values at the end of each trial (t = T) are used to determine synaptic weight changes.
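The end-of-trial update of Eq. 2 can be sketched as a single function (a minimal illustration; the eligibility-trace values are assumed to be supplied by the network simulation and are hypothetical stand-ins here):

```python
def reward_driven_update(weights, eligibility, R, eta=0.05):
    """End-of-trial update Delta w_j = eta * E_j(T) * (R - 1).

    With R in {-1, 1}, weights stay unchanged after correct trials
    (R = 1) and change by -2 * eta * E_j(T) after incorrect ones."""
    return [w + eta * e * (R - 1) for w, e in zip(weights, eligibility)]
```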
The plasticity rule can be written in the following general form where, for comparability, we choose the same learning rate as for the original rule (Eq. 2):

∆w_j = η f(R, E^r_j(T)).   (3)

Using CGP with two inputs (R, E^r_j(T)) we search for plasticity rules that maximize the fitness F(f) (Eq. 1).

In about half of the runs the evolutionary algorithm evolves a plasticity rule identical to the one derived in previous work (Eq. 2), which achieves high fitness (Fig. 3C). As the reward is constrained to R ∈ {−1, 1} the task includes a degeneracy and different functions f can lead to identical weight changes. Some of these phenomenologically identical expressions were found by the evolutionary algorithm, such as E^r_j(R − R²), R E^r_j(1 − R), and −E^r_j + E^r_j R. An essential benefit of our approach is that we can identify such classes of solutions that lead to identical weight changes thanks to a crucial property of genetic programming: since we obtain analytically tractable expressions for the plasticity rule, we can analyze them with conventional methods, in contrast to approaches representing plasticity rules with ANNs (e.g., Risi and Stanley, 2010; Orchard and Wang, 2016; Bohnstingl et al., 2019), for which it is challenging to fully understand their macroscopic computation. A further advantage of obtaining analytical expressions is the possibility to interpret the evolutionary steps by which the search progresses. We observe that already after a few generations the algorithm discovers an important basic ingredient for learning from rewards: the factor R E^r_j (Fig. 3D). This product, for example, increases weights of projections which have contributed to the correct spiking of the output neuron, causing these neurons to be even more active the next time this specific pattern is encountered.
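The degeneracy induced by R ∈ {−1, 1} is straightforward to verify numerically; a sketch (the variant expressions paraphrase those listed in the text):

```python
def canonical(E, R):
    """The stable evolved rule (up to the learning rate): E_j * (R - 1)."""
    return E * (R - 1)

# Superficially different expressions discovered in other runs
variants = [
    lambda E, R: E * (R - R**2),
    lambda E, R: R * E * (1 - R),
    lambda E, R: -E + E * R,
]

def equivalent_on_binary_reward(f, g, E_values=(-1.0, -0.5, 0.5, 1.0)):
    """Check that f and g prescribe identical weight changes for all
    tested eligibility values and both possible rewards R in {-1, 1}."""
    return all(f(E, R) == g(E, R) for E in E_values for R in (-1, 1))
```

Since R² = 1 on the binary reward set, all variants collapse algebraically to E^r_j(R − 1), while an expression such as E·R does not.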
However, this rule is not stable: due to randomness in spike generation, weight changes do not subside even if the neuron always responds correctly and can thus lead to unlearning of good weight configurations. Over the next few hundred generations, refinements of this solution evolve without losing the basic structure (Fig. 3D). Finally, a stable solution for this task is found: evolution discovers that weights should only change if the classification was wrong and remain unchanged if it was correct, i.e., the product is replaced by E^r_j(R − 1).

We next consider a supervised learning task in which a neuron receives information about how far its output is from the desired behavior, instead of just a scalar reward signal as in the previous task. The widespread success of this approach in machine learning highlights the efficacy of learning
Figure 4:
Cartesian genetic programming evolves efficient error-driven learning rules. (A)
Network sketch. Multiple input neurons with Poisson activity project to two neurons. One of the neurons (the teacher) generates a target for the other (the student). The membrane potentials of teacher and student as well as the filtered pre-synaptic spike trains are provided to the plasticity rule that determines the weight update. (B)
Root mean squared error between the teacher and student membrane potential over the course of learning using the evolved plasticity rule: ∆w_j(t) = η [v(t) − u(t)] s̄_j(t). (C) Synaptic weights over the course of learning corresponding to panel B. Horizontal dashed lines represent target weights, i.e., the fixed synaptic weights onto the teacher. (D)
Fitness of the best individual per generation as a function of the generation index for multiple runs of the evolutionary algorithm with different initial conditions. Labels represent the rule at the end of the respective run. Colored markers with error bars represent fitness of the respective plasticity rule on 15 validation tasks not used during the evolutionary search.

from errors compared to correlation- or reward-driven learning (Goodfellow et al., 2016). It has therefore often been hypothesized that evolution has installed similar capabilities in biological nervous systems (see, e.g., Marblestone et al., 2016; Whittington and Bogacz, 2019).

Urbanczik and Senn (2014) introduce an efficient, flexible and biophysically plausible implementation of error-driven learning via multi-compartment neurons. For simplicity, we consider an equivalent formulation of this learning principle in terms of two point neurons modeled as leaky integrator neurons with exponential postsynaptic currents and stochastic spike generation. One neuron mimics the somatic compartment and provides a teaching signal to the other neuron acting as the dendritic compartment. Here, the difference between the teacher and student membrane potentials drives learning:

∆w_j(t) = η [v(t) − u(t)] s̄_j(t),   (4)

where v is the teacher potential, u the student membrane potential, and η a fixed learning rate. s̄_j(t) = (κ ∗ s_j)(t) represents the presynaptic spike train s_j filtered by the synaptic kernel κ. Eq. 4 can be interpreted as stochastic gradient descent on an appropriate cost function (Urbanczik and Senn, 2014) and can be extended to support credit assignment in hierarchical neuronal networks (Sacramento et al., 2018). For simplicity we assume the student has direct access to the teacher's membrane potential, but in principle one could also employ proxies such as firing rates as suggested in Pfister et al.
(2010); Urbanczik and Senn (2014).

We consider a one-dimensional regression task in which we feed random Poisson spike trains into the two neurons (Fig. 4A). The teacher maintains fixed input weights while the student's weights should be adapted over the course of learning such that its membrane potential follows the teacher's (Fig. 4B, C). The fitness F(f) of an individual encoding the function f is measured by the root mean-squared error between the teacher and student membrane potential over the simulation duration T, excluding the initial 10%, averaged over n_exp experiments:

F(f) := −(1/n_exp) Σ_{k=1}^{n_exp} √( Σ_{t=0.1T}^{T} [v_k(t) − u_k(t)]² ).   (5)

Each experiment consists of different input spike trains and different teacher weights. The general form of the plasticity rule for this error-driven learning task is given by:

∆w_j = η f(v, u, s̄_j).   (6)

Using CGP with three inputs (v, u, s̄_j), we search for plasticity rules that maximize the fitness F(f).

Starting from low fitness, about half of the evolutionary runs evolve efficient plasticity rules (Fig. 4D) closely matching the performance of the stochastic gradient descent rule of Urbanczik and Senn (2014). While two runs evolve exactly Eq. 4, other solutions with comparable fitness are discovered, such as

∆w_j = η (v − u) s̄_j (2u − v)/v,   (7)

and

∆w_j = η s̄_j (v + u) (v(v − u) − s̄_j)/v.   (8)

At first sight, these rules may appear more complex, but for the considered parameter regime under the assumptions v ≈ u; v, u ≪ 1, we can write them as:

∆w_j = η c_1 (v − u) s̄_j + c_2,   (9)

with a multiplicative constant c_1 = O(1) and a negligible additive constant c_2. Elementary manipulations of the expressions found by CGP thus demonstrate the similarity of these superficially different rules to Eq. 4. Consequently, they can be interpreted as approximations of the gradient-descent plasticity rule. The constants generally fall into two categories: fine-tuning of the learning rate for the set of task samples encountered during evolution (c_1) and factors that have negligible influence and would most likely be pruned over longer evolutionary timescales (c_2). This pruning could be accelerated, for example, by imposing a penalty on the model complexity in the fitness function, thus preferentially selecting simpler solutions.

In conclusion, the evolutionary search rediscovers variations of a human-designed efficient gradient-descent-based learning rule for the considered error-driven learning task. Due to the compact, interpretable representation of the plasticity rules we are able to analyze the set of high-performing solutions and thereby identify phenomenologically identical rules despite their superficial differences.

We now consider a task in which neurons do not receive any feedback from the environment about their performance but instead only have access to correlations between pre- and postsynaptic activity. For this paradigm Masquelier (2017) proposes a correlation-driven plasticity rule that allows neurons to discover a repeating frozen-noise pattern interrupted by random background spikes. Their experimental setup is briefly described as follows: N inputs project to a single output neuron (Fig. 5A). The activity of all inputs is determined by a Poisson process with a fixed rate. A frozen-noise activity pattern of duration T_pattern is generated once and replayed every T_inter ms (Fig.
5B) while inputs are randomly spiking in between.

We define the fitness F(f) of an individual encoding the function f by the minimal average signal-to-noise ratio SNR across n_exp experiments:

F(f) := min_k { SNR_k, k ∈ [1, n_exp] }.   (10)
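Generating such stimuli can be sketched as follows (an illustrative helper with made-up parameter values, not the protocol parameters of the paper): a frozen Poisson pattern is drawn once and then replayed periodically, with independent Poisson background spiking in between.

```python
import random

def generate_input(n_inputs=10, rate=5.0, pattern_steps=50,
                   inter_steps=100, n_steps=1000, dt=0.001, seed=42):
    """Binary spike raster (n_inputs x n_steps): a frozen Poisson
    pattern of `pattern_steps` time steps is replayed periodically,
    separated by `inter_steps` steps of independent Poisson background
    spiking (rate in Hz, dt in s; all values here are illustrative)."""
    rng = random.Random(seed)
    p = rate * dt                        # spike probability per step
    pattern = [[rng.random() < p for _ in range(pattern_steps)]
               for _ in range(n_inputs)]
    period = pattern_steps + inter_steps
    raster = [[False] * n_steps for _ in range(n_inputs)]
    for step in range(n_steps):
        phase = step % period
        for j in range(n_inputs):
            raster[j][step] = (pattern[j][phase] if phase < pattern_steps
                               else rng.random() < p)
    return raster, pattern
```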
Figure 5:
Cartesian genetic programming evolves diverse correlation-driven learning rules. (A)
Network sketch. Multiple inputs project to a single output neuron. The current synaptic weight w_j and the eligibility trace E^c_j are provided to the plasticity rule that determines the weight update. (B) Activity of input units. A frozen-noise pattern of duration T_pattern (gray background) is interleaved with periods of random background spiking of duration T_inter. Spike trains are generated by a stationary Poisson process. (C) Membrane potential u of the output neuron over the course of learning using Eq. 12. Gray boxes indicate presentation of the frozen-noise pattern. (D) Fitness (Eq. 10) of the best individual per generation as a function of the generation index for multiple runs of the evolutionary algorithm with different initial conditions. Blue and red curves correspond to the two representative plasticity rules selected for detailed analysis. Blue and red markers with error bars represent fitness of the two representative rules on 20 validation tasks not used during the evolutionary search. The orange marker indicates the fitness of the homeostatic STDP rule (Eq. 12; Masquelier, 2017) on 20 validation tasks. (E, F) Learning rules evolved by two runs of CGP (E: Eq. 14, F: Eq. 15): weight change ∆w_j as a function of spike timing differences ∆t_j for three different weights w_j. (G) Same for the homeostatic STDP rule (Eq. 12; Masquelier, 2017). (H, I) Effective learning rules: LR1 (H, Eq. 17) and LR2 (I, Eq. 18). The learning rate is η = 0.01.

The signal-to-noise ratio SNR_k, following Masquelier (2017), is defined as the difference between the maximal free membrane potential during pattern presentation averaged over multiple presentations (⟨u_{k,i,max}⟩) and the mean of the free membrane potential in between pattern presentations (⟨u_{k,inter}⟩), divided by its variance (Var(u_{k,inter})):

SNR_k := (⟨u_{k,i,max}⟩ − ⟨u_{k,inter}⟩) / Var(u_{k,inter}).   (11)

The free membrane potential is obtained in a separate simulation with frozen weights by disabling the spiking mechanism for the output neuron. This removes measurement noise in the signal-to-noise ratio arising from spiking and subsequent membrane-potential reset. Each experiment consists of different realizations of a frozen-noise pattern and background spiking.

The rule described by Masquelier (2017) is a simple additive spike-timing-dependent plasticity (STDP) rule that replaces the depression branch of traditional STDP variants with a homeostatic term w_hom < 0. The synaptic weight w_j changes at each postsynaptic spike according to (Fig. 5G):

∆w_j = η { w_hom            if ∆t_j < 0
         { E^c_j + w_hom    if ∆t_j ≥ 0.   (12)

Here, E^c_j := e^{−|∆t_j|/τ} represents an eligibility trace that depends on the relative timing of post- and presynaptic spiking (∆t_j = t_post − t_pre,j) and is represented locally in each synapse (e.g., Morrison et al., 2008). η represents a fixed learning rate and w_hom a constant. The synaptic weight is bounded such that w_j ∈ [0, 1]. We consider plasticity rules of the general form

∆w_j = η { f_dep(w_j, E^c_j)    if ∆t_j < 0
         { f_fac(w_j, E^c_j)    if ∆t_j ≥ 0.   (13)

Using CGP with two inputs (w_j, E^c_j), we search for functions f_dep and f_fac that maximize the fitness F(f_dep, f_fac) (Eq. 10).

After 2000 generations, half of the runs of the evolutionary algorithm discover high-fitness solutions (Fig. 5D). These plasticity rules lead to synaptic weight configurations for which the signal-to-noise ratio of the free membrane potential is at least ten.
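For concreteness, the homeostatic STDP rule of Eq. 12, including the weight bound, can be sketched as a single update function (a minimal sketch; the parameter values are illustrative, not taken from the paper):

```python
import math

def homeostatic_stdp(w, dt_j, eta=0.01, w_hom=-0.01, tau=0.02):
    """Weight update at a postsynaptic spike following Eq. 12:
    anticausal interactions (dt_j < 0) apply only the homeostatic
    term w_hom < 0; causal interactions (dt_j >= 0) additionally add
    the eligibility trace E_c = exp(-|dt_j| / tau). The resulting
    weight is clipped to the bound [0, 1]."""
    e_c = math.exp(-abs(dt_j) / tau)
    dw = w_hom if dt_j < 0 else e_c + w_hom
    return min(1.0, max(0.0, w + eta * dw))
```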
Empirical investigation shows that such a signal-to-noise ratio causes the neuron to reliably detect the frozen-noise pattern. From these well-performing learning rules, we pick two representative examples (Fig. 5E,F) to analyze in detail. Learning rule 1 (LR1, Fig. 5E) is defined by the following equation:

Δw_j = η { w_j − (w_j − 1) E^c_j if Δt_j < 0; E^c_j − w_j if Δt_j ≥ 0 } .   (14)

Learning rule 2 (LR2, Fig. 5F) is defined by the following equation:

Δw_j = η { w_j − E^c_j/w_j if Δt_j < 0; (w_j E^c_j)^{w_j} if Δt_j ≥ 0 } .   (15)

The direct availability of these rules as closed-form expressions enables an interesting line of inquiry. While both rules reliably enable the neuron to solve its task, they both exhibit nonzero facilitation and depression even for large time lags Δt_j, which immediately raises questions about their stability. To address this issue, we consider, for simplicity, a nearest-neighbor spike-pairing scheme (e.g., Morrison et al., 2008) in which each postsynaptic spike can almost always be assigned a causal interaction with the preceding presynaptic spike and an anticausal interaction with the trailing presynaptic spike. Weight changes therefore come in pairs: for each causal update there must exist an anticausal one and vice versa. In the limit of small firing rates, we can therefore make the simplifying assumption that each causal (anticausal) spike interaction is accompanied by an anticausal (causal) interaction with a large time difference Δt_j (see Appendix Sec. A for a mathematical argument).
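The stability question can be made concrete by evaluating both branches for large |Δt|. The sketch below assumes the closed-form branches of the two rules with η factored out, i.e. w − (w − 1)E^c and E^c − w for LR1, and w − E^c/w and (w E^c)^w for LR2; the weight value is arbitrary:

```python
import math

def e_trace(delta_t, tau=20.0):
    return math.exp(-abs(delta_t) / tau)

def lr1(w, delta_t):
    """LR1: anticausal branch for dt < 0, causal branch for dt >= 0."""
    E = e_trace(delta_t)
    return w - (w - 1.0) * E if delta_t < 0 else E - w

def lr2(w, delta_t):
    """LR2, same branch convention."""
    E = e_trace(delta_t)
    return w - E / w if delta_t < 0 else (w * E) ** w

# for |dt| -> infinity the eligibility trace vanishes, yet the updates do not:
w = 0.4
large = 1e6
print(lr1(w, -large), lr1(w, large))  # nonzero asymptotes: +w and -w
print(lr2(w, -large), lr2(w, large))  # asymptotes: +w and 0
```

Under the nearest-neighbor pairing argument, every update is accompanied by a partner update at one of these asymptotes, which is what motivates folding the asymptotic values into effective rules.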
We can therefore construct effective learning rules by adding theasymptotic weight change for anticausal (causal) interactions to all weight changes on the causal(anticausal) branch:∆ ˜ w j → η (cid:40) ∆ w j (∆ t j ) + lim ∆ t (cid:48) j →−∞ ∆ w j (∆ t (cid:48) j ) ∆ t j < w j (∆ t j ) + lim ∆ t (cid:48) j →∞ ∆ w j (∆ t (cid:48) j ) ∆ t j ≥ / w j = η (cid:40) (1 − w j ) E c j ∆ t j < E c j ∆ t j ≥ , (17)decays to zero for large negative and positive time differences and causes facilitation for small timedifferences, regardless whether they are causal or anticausal. This learning rule favors correlationbetween presynaptic and postsynaptic firing with a small weight-dependent facilitation bias towardscausal interactions. In contrast, effective LR2 (Fig. 5I)∆ w j = η (cid:40) − E c j /w j + w j − t j < w j E c j ) w j + w j − t j ≥ , (18)implements a similar strategy as the learning rule of Masquelier (2017): it depresses the synapse foralmost all interactions except for a narrow time window for small, positive (causal) time differences.Additionally, however, it more pronouncedly punishes anticausal interactions, thus being able toachieve even higher fitness. Note how both rules reproduce important components of experimentallyestablished STDP traces (e.g., Caporale and Dan, 2008).In conclusion, for the correlation-driven task, the evolutionary search is able to discover notonly rules similar to those previously suggested but additionally proposes rules that are markedlydifferent, yet still perform competitively, thus generating new hypotheses for learning in biologicalsubstrates. We note, again, that our insight into the dynamics of these rules was permitted, in thefirst place, by their interpretable representation in our evolutionary framework. Uncovering the mechanisms of learning via synaptic plasticity is a critical step towards understandingbrain (dys)function and building truly intelligent, adaptive machines. 
We introduce a novel approach to discover biophysically plausible plasticity rules in spiking neuronal networks. Our meta-learning framework uses genetic programming to search for plasticity rules by optimizing a fitness function specific to the respective task family. Our approach discovers high-performing solutions in various learning paradigms: reward-driven, error-driven, and correlation-driven learning. This is, to the best of our knowledge, the first demonstration of the power of genetic programming methods in the search for plasticity mechanisms in spiking neuronal networks.

A key point of our approach is the compact representation of the plasticity rules. We restrict the complexity of the expressions by three considerations. First, we assume that effective descriptions of weight changes can be found that are not unique to each individual synapse. This is a common assumption in computational neuroscience and is based on the observation that nature must have found a parsimonious encoding of brain structure, as not every connection in the brain can be specified in the DNA of the organism (Zador, 2019); rather, genes encode general principles by which neuronal networks and subnetworks are organized and reorganized (cf. Risi and Stanley, 2010). Our approach aims at discovering such general principles for synaptic plasticity. Second, physical considerations restrict the information available to the plasticity rule to local quantities, such as pre- and postsynaptic activity traces or specific signals delivered via neuromodulators (e.g., Miconi et al., 2020; Cox and Witten, 2019). Third, we limit the maximal size of the expressions to keep the resulting learning rules interpretable and to avoid overfitting. We explicitly want to avoid constructing an opaque system that has high task performance but does not allow us to understand how the network structure is shaped over the course of learning.
The low complexity of the resulting expressions generates intuitive understanding, facilitating communication and human-guided generalization from a set of solutions to different network architectures or task domains. In the search for plasticity rules suitable for physical implementation in biological systems, these insights are crucial, as the identified plasticity mechanisms can serve as building blocks for learning rules that generalize to the actual challenges faced by biological agents. Moreover, simple expressions highlight the key interactions between the local variables giving rise to plasticity, thus providing insights into the underlying biophysical processes and potentially suggesting new experimental approaches.

Furthermore, the automated search can discover plasticity rules for a given problem that exploit implicit assumptions in the task. It thereby highlights underconstrained searches, be it due to scarcity of biological data, the simplicity of the chosen tasks, or the omission of critical features in the task design. For instance, without asserting equal average spike rates of background and pattern neurons in the correlation-driven task, one could discover plasticity rules that exploit the rate difference rather than the spatio-temporal structure of the input.

Future work needs to address a general issue of any optimization method: how can we systematically counter overfitting to reveal general solutions? A simple approach would increase the number of sample tasks during a single fitness evaluation. However, computational costs increase linearly with the number of samples. Another technique penalizes the complexity of the resulting expressions, e.g., proportionally to the size of the computational graph. Since it is a priori unclear how this complexity penalty should be weighted against the original fitness measures, one should consider multi-objective optimization algorithms (e.g., Deb, 2001).
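The core of such a multi-objective selection, Pareto dominance, can be sketched compactly; here each candidate rule is scored by a tuple (task fitness, negative expression size), both to be maximized, and the numbers are purely illustrative:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the non-dominated subset of a list of objective tuples."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

# (task_fitness, -expression_size) for four hypothetical rules
rules = [(0.9, -12), (0.8, -5), (0.9, -20), (0.6, -4)]
front = pareto_front(rules)  # the large rule (0.9, -20) is dominated
```

Selecting parents from the front, instead of by a fixed weighted sum, sidesteps the question of how to weight the complexity penalty a priori.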
Another issue to be addressed in future work is the choice of the learning rate. Currently, this value is not part of the optimization process and all tasks assume a fixed learning rate. One possibility is to allow for more constants in CGP such that the search can easily scale the learning rate. However, the dimensionality of the search space scales exponentially with the number of operators and constants, and the feasibility of such an approach needs to be carefully evaluated. One possibility to mitigate this combinatorial explosion is to combine the evolutionary search with gradient-based optimization methods that can fine-tune constants in the expressions (Topchy and Punch, 2001; Izzo et al., 2017).

Evolved Plastic Artificial Neural Networks (EPANNs; e.g., Soltoggio et al., 2018), and in particular adaptive HyperNEAT (Risi and Stanley, 2010), represent an alternative approach to designing plastic neural networks. In contrast to our method, however, these approaches include the network architecture itself in the evolutionary search, alongside the synaptic plasticity rules. While this can lead to high-performing solutions due to a synergy between network architecture and plasticity, this interplay has an important drawback: in general, it is difficult to tease apart the contribution of plasticity from that of network structure to high task performance (cf. Gaier and Ha, 2019). In addition, the distributed, implicit representation of plasticity rules in HyperNEAT can be difficult to interpret, which hinders a deeper understanding of the learning mechanisms. In machine-learning-oriented applications, this lack of credit assignment is less of an issue.
For research into plasticity rules employed by biological systems, however, it presents a significant obstacle.

The evolutionary search for plasticity rules requires a large number of simulations, as each candidate solution needs to be evaluated on a sufficiently large number of samples from the task family to encourage generalization (e.g., Chalmers, 1991; Bengio et al., 1992). Due to silent mutations in CGP, i.e., modifications of the genotype that do not alter the phenotype, we use caching methods to significantly reduce computational cost, as only new solutions need to be evaluated. However, even employing such methods, the number of required simulations per evolutionary run remains large. Neuromorphic systems, dedicated hardware specifically designed to emulate neuronal networks, provide an attractive way to speed up the evolutionary search. To serve as suitable substrates for the approach presented here, these systems should be able to emulate spiking neuronal networks in an accelerated fashion with respect to real time and provide on-chip plasticity with a flexible specification of plasticity mechanisms (e.g., Davies et al., 2018; Billaudelle et al., 2019; Mayr et al., 2019).

While the learning rules found in the experiments described above were all evolved from random expressions, one can also view the presented framework as a hypothesis-testing machine. Starting from a known plasticity rule, our framework would allow researchers to address questions like: assuming the learning rule additionally had access to variable x, could this be incorporated into the weight updates such that learning would improve? The automated procedure makes answering such questions much more efficient than a human-guided manual search. Additionally, the framework is suitable for finding robust, biophysically plausible approximations of complex learning rules containing quantities that might be non-local, difficult to compute, and/or hard to implement in physical substrates.
In particular, multi-objective optimization is suitable for evolving a known, complex rule into simpler versions while maintaining high task performance. Similarly, one could search for modifications of general rules that are purposefully tuned to learn quickly within a specific task family, outperforming more general solutions. In each of these cases, prior knowledge about effective learning algorithms provides a starting point from which the evolutionary search can discover powerful extensions.

The experiments considered here were mainly chosen due to their simplicity and due to prior knowledge about corresponding plasticity rules, which provided us with a high-performance reference for comparison. Additionally, in each experiment, we have restricted ourselves to a constrained set of possible inputs to the plasticity rule. Future work may involve less preprocessed data as inputs while considering more diverse mathematical operators. In the correlation-driven task, one could for example provide the raw times of pre- and postsynaptic spikes to the graph instead of the exponential of their difference, leaving more freedom for the evolutionary search to discover creative solutions. We expect particularly interesting applications of our framework to involve more complex tasks that are challenging for contemporary algorithms, such as life-long learning, which needs to tackle the issue of catastrophic forgetting (French, 1999), or learning in recurrent spiking neuronal networks. In order to yield insights into information processing in the nervous system, the design of the network architecture should be guided by known anatomical features, while the considered task families should fall within the realm of ecologically relevant problems.

We view the presented methods as a machinery for generating, testing, and extending hypotheses on learning in spiking neuronal networks, driven by problem instances and prior knowledge and constrained by experimental evidence.
We believe this approach holds significant potential to accelerate progress towards deep insights into information processing in physical systems, both biological and biologically inspired, with immanent potential for the development of powerful artificial learning machines.

4 Methods

4.1 Evolutionary algorithm

We use a µ + λ evolution strategy (Beyer and Schwefel, 2002) to evolve a population of individuals towards high fitness. In each generation, λ new offspring are created from µ parents via tournament selection (e.g., Miller et al., 1995) and subsequent mutation. From these µ + λ individuals, the best µ are selected as parents for the next generation (Alg. 1). In this work we use a tournament size of one and a fixed mutation probability p_mutate for each gene in an offspring individual. Since in CGP crossover of individuals can lead to significant disruption of the search process due to major changes in the computational graphs (Miller, 1999), we avoid it here; in other words, new offspring are modified only by mutations. We use neutral search (Miller and Thomson, 2000), in which an offspring is preferred over a parent with equal fitness, to allow the accumulation of silent mutations that can jointly lead to an increase in fitness.
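The selection scheme just described can be condensed into a short sketch; `fitness`, `random_individual`, and `mutate` are task-specific callables, and the toy usage at the bottom is only a stand-in for the expensive network simulations used in the actual experiments:

```python
import random

def evolve(fitness, random_individual, mutate, mu=4, lam=8, n_generations=100):
    """Minimal mu + lambda evolution strategy in the spirit of Alg. 1:
    tournament size one (i.e. a uniform random pick), mutation-only offspring,
    and neutral search (offspring win ties against equally fit parents because
    they are listed first and the sort is stable)."""
    parents = [random_individual() for _ in range(mu)]
    for _ in range(n_generations):
        offspring = [mutate(random.choice(parents)) for _ in range(lam)]
        combined = sorted(offspring + parents, key=fitness, reverse=True)
        parents = combined[:mu]
    return max(parents, key=fitness)

# toy problem: maximize -(x - 3)^2 over the reals
random.seed(1234)  # reproducible toy run
best = evolve(fitness=lambda x: -(x - 3.0) ** 2,
              random_individual=lambda: random.uniform(-10.0, 10.0),
              mutate=lambda x: x + random.gauss(0.0, 0.5))
```

In the actual framework the individuals are CGP genomes and each fitness evaluation runs spiking network simulations, which is why caching and parallel evaluation matter.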
As it is computationally infeasible to exhaustively evaluate an individual on all possible tasks from a task family, we evaluate individuals only on a limited number of sample tasks and aggregate the results into a scalar fitness, either by choosing the minimal result or by averaging. We manually select the number of sample tasks to balance computational costs and sampling noise for each task. In each generation, we use the same initial conditions to allow a meaningful comparison of results across generations. If an expression is encountered that cannot be meaningfully evaluated, such as a division by zero, the corresponding individual is assigned a fitness of −∞.

Data: Initial random parent population P_0 = {p_i} of size µ
t ← 0
while t < n_generations do
    Create new offspring population Q_t = CreateOffspringPopulation(P_t)
    Combine parent and offspring populations R_t = P_t ∪ Q_t
    Evaluate fitness of each individual in R_t
    Pick the best µ individuals P_{t+1} ⊂ R_t as new parents
    t ← t + 1
end
Function CreateOffspringPopulation(P):
    Q = {}
    while |Q| < λ do
        Choose random subset of P of size N_tournament
        Choose best individual in the subset and append it to Q
    end
    for q_i ∈ Q do
        Mutate each gene of q_i with mutation probability p_mutation
    end
    return Q
Algorithm 1: Variant of the µ + λ evolution strategy used in this study. Note the absence of a crossover step.

4.2 HAL-CGP

HAL-CGP (Schmidt and Jordan, 2020) (https://github.com/Happy-Algorithms-League/hal-cgp) is an extensible pure Python library implementing Cartesian genetic programming to represent, mutate and evaluate populations of individuals encoding symbolic expressions, targeting applications with computationally expensive fitness evaluations. It supports the translation from a CGP genotype, a two-dimensional Cartesian graph, into the corresponding phenotype, a computational graph implementing a particular mathematical expression.
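The genotype-to-phenotype translation can be illustrated with a toy decoder, independent of HAL-CGP's actual API; the genome layout below (a flat list of operator/input-address triples) is a simplification of the full two-dimensional Cartesian layout:

```python
import operator

PRIMITIVES = [operator.add, operator.sub, operator.mul]  # arity-2 primitives

def decode(genome, n_inputs, output_node):
    """Translate a linear Cartesian genome into a callable phenotype.
    Addresses 0..n_inputs-1 refer to the inputs; address n_inputs + k
    refers to the k-th internal node, in feed-forward order."""
    def phenotype(*inputs):
        values = list(inputs)
        for op_idx, a, b in genome:
            values.append(PRIMITIVES[op_idx](values[a], values[b]))
        return values[output_node]
    return phenotype

# genome encoding f(x, y) = (x + y) * y: node 2 = x + y, node 3 = node2 * y
f = decode([(0, 0, 1), (2, 2, 1)], n_inputs=2, output_node=3)

# a "silent" mutation: adding a node the output does not depend on
# changes the genotype but leaves the phenotype unchanged
g = decode([(0, 0, 1), (2, 2, 1), (1, 0, 0)], n_inputs=2, output_node=3)
```

Such silent mutations are exactly why caching pays off: genotypes may differ while the evaluated phenotype, and hence its fitness, is identical.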
These computational graphs can be exported as pure Python functions, NumPy-compatible functions (Walt et al., 2011), SymPy expressions (Meurer et al., 2017) or PyTorch modules (Paszke et al., 2019). Users define the structure of the two-dimensional graph from which the computational graph is generated. This includes the number of inputs, columns, rows, and outputs, as well as the computational primitives, i.e., the mathematical operators and constants, that compose the mathematical expressions. Due to the modular design of the library, users can easily implement new operators to be used as primitives. The library implements a µ + λ evolution strategy to evolve individuals (see Sec. 4.1). Users need to specify hyperparameters for the evolutionary algorithm, such as the size of parent and offspring populations and the maximal number of generations. To avoid reevaluating phenotypes that have been previously evaluated, the library provides a mechanism for caching results on disk. Exploiting the wide availability of multi-core architectures, the library can parallelize the evaluation of all individuals in a single generation via separate processes.

4.3 NEST simulator

Spiking neuronal network simulations are based on the 2.16.0 release of the NEST simulator (Gewaltig and Diesmann, 2007) (https://github.com/nest/nest-simulator; commit 3c6f0f3). NEST is an open-source simulator for spiking neuronal networks with a focus on large networks with simple neuron models. The computationally intensive propagation of network dynamics is implemented in C++ while the network model can be specified using a Python API (PyNEST; Eppler et al., 2009; Zaytsev and Morrison, 2014). NEST profits from modern multi-core and multi-node systems by combining local parallelization with OpenMP threads and inter-node communication via the Message Passing Interface (MPI) (Jordan et al., 2018).
The standard distribution offers a variety of established neuron and plastic synapse models, including variants of spike-timing-dependent plasticity, reward-modulated plasticity and structural plasticity. New models can be implemented via a domain-specific language (Plotnikov et al., 2016) or custom C++ code. For the purpose of this study, we implemented a reward-driven (Eq. 2; Urbanczik and Senn, 2009) and an error-driven learning rule (Eq. 4; Urbanczik and Senn, 2014), as well as a homeostatic STDP rule (Eq. 12; Masquelier, 2017) via custom C++ code. Due to the specific implementation of spike delivery in NEST, we introduce a constant in the STDP rule that is added at each facilitation call instead of using a separate depression term. To support arbitrary mathematical expressions in the error-driven (Eq. 6) and correlation-driven synapse models (Eq. 13), we additionally implemented variants in which the weight update can be specified via SymPy-compatible strings (Meurer et al., 2017) that are parsed by SymEngine (https://github.com/symengine/symengine), a C++ library for symbolic computation. All custom synapse models and necessary kernel patches are available as NEST modules in the repository accompanying this study (https://github.com/Happy-Algorithms-League/e2l-cgp-snn).

4.4 Computing systems

All experiments were performed on JUWELS (Jülich Wizard for European Leadership Science), an HPC system at the Jülich Research Centre, Jülich, Germany, with 12 petaflop peak performance. The system contains 2271 general-purpose compute nodes, each equipped with two Intel Xeon Platinum 8168 processors (2 × 24 cores) and 12 × 8 GB main memory. Compute nodes are connected via an EDR-InfiniBand fat-tree network and run CentOS 7. Each experiment employed a single compute node.
4.5 Reward-driven learning task

We consider a reinforcement learning task for spiking neurons similar to Urbanczik and Senn (2009). Spiking activity of the output neuron is generated by an inhomogeneous Poisson process with instantaneous rate φ determined by its membrane potential u (Pfister et al., 2006; Urbanczik and Senn, 2009):

φ(u) := ρ e^{(u − u_th)/Δu} .   (19)

Here, ρ is the firing rate at threshold, u_th the threshold potential, and Δu a parameter governing the noise amplitude. In contrast to Urbanczik and Senn (2009), we consider an instantaneous reset of the membrane potential after a spike instead of a hyperpolarization kernel. The output neuron receives spike trains from sources randomly drawn from an input population of size N with randomly initialized weights (w_initial ∼ N(0, σ_w)). Before each pattern presentation, the output neuron's membrane potential and synaptic currents are reset. The eligibility trace in every synapse is updated in continuous time according to the following differential equation (Urbanczik and Senn, 2009):

τ_M Ė^r_j = −E^r_j + (1/Δu) [ Σ_{s∈y} δ(t − s) − φ(u(t)) ] s̄_j(t) ,   (20)

where τ_M governs the time scale of the eligibility trace, Δu is a parameter of the postsynaptic cell governing its noise amplitude, y represents the postsynaptic spike train, and s̄_j(t) = (κ ∗ s_j)(t) the presynaptic spike train s_j filtered by the synaptic kernel κ. The learning rate η was manually tuned to obtain high performance with Eq. 2.

4.6 Error-driven learning task

We consider an error-driven learning task for spiking neurons similar to Urbanczik and Senn (2014). N Poisson inputs with constant rates (r_i ∼ U[r_min, r_max], i ∈ [1, N]) project to a teacher neuron and, with the same connectivity pattern, to a student neuron.
Spiking activity of the output neuron is generated by an inhomogeneous Poisson process with instantaneous rate φ determined by its membrane potential u (Pfister et al., 2006; Urbanczik and Senn, 2009):

φ(u) := ρ e^{(u − u_th)/Δu} .   (21)

Here, ρ is the firing rate at threshold, u_th the threshold potential, and Δu a parameter governing the noise amplitude. In contrast to Sec. 4.5, the membrane potential is not reset after spike emission. Fixed synaptic weights from the inputs to the teacher are uniformly sampled from the interval [w_min, w_max], while weights to the student are all initialized to a fixed value w. In each trial we randomly shift all teacher weights by a global value w_shift to avoid a bias in the error signal that may arise if the teacher membrane potential is initially always larger or always smaller than the student membrane potential. Target potentials are read out from the teacher every δt and provided instantaneously to the student. The learning rate η was chosen via grid search on a single example task for high performance with Eq. 4.

4.7 Correlation-driven learning task

We consider a correlation-driven learning task for spiking neurons similar to Masquelier (2017): a spiking neuron, modeled as a leaky integrate-and-fire neuron with delta-shaped post-synaptic currents, receives stochastic spike trains from N inputs via plastic synapses.

To construct the input spike trains, we first create a frozen-noise pattern by drawing random spike times S^pattern_i ∈ [0, T_pattern], i ∈ [0, N −
1], from a Poisson process with rate ν. Neurons that fire at least once in this pattern are in the following called "pattern neurons"; the remaining ones are called "background neurons". We alternate this frozen-noise pattern with random spike trains of length T_inter generated by a Poisson process with rate ν (Fig. 5B). To balance the average rates of pattern neurons and background neurons, we reduce the spike rate of pattern neurons in between patterns by a factor α. Background neurons have an average rate of ν_inter = ν T_inter / (T_inter + T_pattern). We assume that pattern neurons spike only once during the pattern. Thus, they have an average rate of α ν_inter + 1/(T_inter + T_pattern) = α ν_inter + ν_pattern, which we require to equal ν_inter. Plugging in the previous expression for ν_inter and solving for α yields α = 1 − ν_pattern/ν_inter. We choose the same learning rate as Masquelier (2017).

Competing interests

The authors declare no competing interests.
References
Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., and de Freitas,N. (2016). Learning to learn by gradient descent by gradient descent. In
Advances in Neural Information Processing Systems, pages 3981–3989.

Artola, A., Brocher, S., and Singer, W. (1990). Different voltage-dependent thresholds for inducing long-term depression and long-term potentiation in slices of rat visual cortex.
Nature , 347(6288):69.Bengio, S., Bengio, Y., and Cloutier, J. (1994). Use of genetic programming for the search of a newlearning rule for neural networks. In
Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence, Proceedings of the First IEEE Conference on, pages 324–327. IEEE.

Bengio, S., Bengio, Y., Cloutier, J., and Gecsei, J. (1992). On the optimization of a synaptic learning rule. In Preprints of the Conference on Optimality in Artificial and Biological Neural Networks, pages 6–8. Univ. of Texas.

Bengio, S., Bengio, Y., Cloutier, J., and Gecsei, J. (1993). Generalization of a parametric learning rule. In
ICANN93 , pages 502–502. Springer.Bengio, Y., Bengio, S., and Cloutier, J. (1990).
Learning a synaptic learning rule. Université de Montréal, Département d'informatique et de recherche opérationnelle.

Beyer, H.-G. and Schwefel, H.-P. (2002). Evolution strategies – a comprehensive introduction.
Naturalcomputing , 1(1):3–52.Bi, G.-q. and Poo, M.-m. (1998). Synaptic modifications in cultured hippocampal neurons: depen-dence on spike timing, synaptic strength, and postsynaptic cell type.
J. Neurosci., 18(24):10464–10472.

Billaudelle, S., Stradmann, Y., Schreiber, K., Cramer, B., Baumbach, A., Dold, D., Göltz, J., Kungl, A. F., Wunderlich, T. C., Hartel, A., et al. (2019). Versatile emulation of spiking neural networks on an accelerated neuromorphic substrate. arXiv preprint arXiv:1912.12980.

Bohnstingl, T., Scherr, F., Pehle, C., Meier, K., and Maass, W. (2019). Neuromorphic hardware learns to learn.
Frontiers in neuroscience , 13.Caporale, N. and Dan, Y. (2008). Spike timing–dependent plasticity: a hebbian learning rule.
Annu.Rev. Neurosci. , 31:25–46.Chalmers, D. J. (1991). The evolution of learning: An experiment in genetic connectionism. In
Connectionist Models, pages 81–90. Elsevier.

Clopath, C., Büsing, L., Vasilaki, E., and Gerstner, W. (2010). Connectivity reflects coding: a model of voltage-based STDP with homeostasis.
Nat. Neurosci. , 13(3):344.Cox, J. and Witten, I. B. (2019). Striatal circuits for reward learning and decision-making.
NatureReviews Neuroscience , page 1.Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., Dimou, G., Joshi, P.,Imam, N., Jain, S., et al. (2018). Loihi: A neuromorphic manycore processor with on-chiplearning.
IEEE Micro , 38(1):82–99.Deb, K. (2001).
Multi-objective optimization using evolutionary algorithms , volume 16. John Wiley& Sons.Deneve, S. (2008). Bayesian spiking neurons i: inference.
Neural computation, 20(1):91–117.

Dold, D., Bytschok, I., Kungl, A. F., Baumbach, A., Breitwieser, O., Senn, W., Schemmel, J., Meier, K., and Petrovici, M. A. (2019). Stochasticity from function – why the Bayesian brain may need no noise.
Neural networks , 119:200–213.Dudek, S. M. and Bear, M. F. (1993). Bidirectional long-term modification of synaptic effectivenessin the adult and immature hippocampus.
J. Neurosci. , 13(7):2910–2918.Eppler, J. M., Helias, M., Muller, E., Diesmann, M., and Gewaltig, M.-O. (2009). PyNEST: aconvenient interface to the NEST simulator.
Frontiers in neuroinformatics, 2:12.

French, R. M. (1999). Catastrophic forgetting in connectionist networks.
Trends in cognitivesciences , 3(4):128–135.Gaier, A. and Ha, D. (2019). Weight agnostic neural networks. arXiv preprint arXiv:1906.04358 .Gewaltig, M.-O. and Diesmann, M. (2007). NEST (Neural Simulation Tool).
Scholarpedia, 2(4):1430.

Göltz, J., Baumbach, A., Billaudelle, S., Breitwieser, O., Dold, D., Kriener, L., Kungl, A. F., Senn, W., Schemmel, J., Meier, K., et al. (2019). Fast and deep neuromorphic learning with time-to-first-spike coding. arXiv preprint arXiv:1912.11443.

Goodfellow, I., Bengio, Y., and Courville, A. (2016).
Deep learning. MIT Press.

Gütig, R., Aharonov, R., Rotter, S., and Sompolinsky, H. (2003). Learning input correlations through nonlinear temporally asymmetric Hebbian plasticity.
J. Neurosci. , 23(9):3697–3714.Ivakhnenko, A. G. (1971). Polynomial theory of complex systems.
IEEE transactions on Systems,Man, and Cybernetics , 4:364–378.Izhikevich, E. M. (2007). Solving the distal reward problem through linkage of stdp and dopaminesignaling.
Cereb. Cortex , 17(10):2443–2452.Izzo, D., Biscani, F., and Mereta, A. (2017). Differentiable genetic programming. In
EuropeanConference on Genetic Programming , pages 35–51. Springer.Jordan, J., Ippen, T., Helias, M., Kitayama, I., Sato, M., Igarashi, J., Diesmann, M., and Kunkel,S. (2018). Extremely scalable spiking neuronal network simulation code: from laptops to exascalecomputers.
Frontiers in neuroinformatics , 12:2.Jordan, J., Petrovici, M. A., Breitwieser, O., Schemmel, J., Meier, K., Diesmann, M., and Tetzlaff,T. (2019). Deterministic networks for probabilistic computing.
Scientific reports , 9(1):1–17.Kappel, D., Habenschuss, S., Legenstein, R., and Maass, W. (2015). Network plasticity as bayesianinference.
PLoS Comput. Biol. , 11(11):e1004485.Kempter, R., Gerstner, W., and van Hemmen, J. L. (1999). Hebbian learning and spiking neurons.
Phys. Rev. E, 59:4498–4514.

Keup, C., Kühn, T., Dahmen, D., and Helias, M. (2020). Transient chaotic dimensionality expansion by recurrent networks. arXiv preprint arXiv:2002.11006.

Koza, J. R. (1992).
Genetic programming: on the programming of computers by means of naturalselection , volume 1. MIT press.Koza, J. R. (2010). Human-competitive results produced by genetic programming.
GeneticProgramming and Evolvable Machines , 11(3-4):251–284.Kutschireiter, A., Surace, S. C., Sprekeler, H., and Pfister, J.-P. (2017). Nonlinear bayesian filteringand learning: a neuronal dynamics for perception.
Scientific reports , 7(1):8722.Marblestone, A. H., Wayne, G., and Kording, K. P. (2016). Toward an integration of deep learningand neuroscience.
Frontiers in computational neuroscience , 10:94.Masquelier, T. (2017). STDP allows close-to-optimal spatiotemporal spike pattern detection bysingle coincidence detector neurons.
Neuroscience, pages 1–8.

Mayr, C., Hoeppner, S., and Furber, S. (2019). SpiNNaker 2: A 10 million core processor system for brain simulation and machine learning. arXiv preprint arXiv:1911.02385.

Meurer, A., Smith, C. P., Paprocki, M., Čertík, O., Kirpichev, S. B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J. K., Singh, S., et al. (2017). SymPy: symbolic computing in Python.
PeerJ Computer Science, 3:e103.

Miconi, T., Rawal, A., Clune, J., and Stanley, K. O. (2020). Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity. arXiv preprint arXiv:2002.10585.

Miller, B. L., Goldberg, D. E., et al. (1995). Genetic algorithms, tournament selection, and the effects of noise.
Complex systems , 9(3):193–212.Miller, J. and Thomson, P. (2000). Cartesian genetic programming. In
Proc. European Conferenceon Genetic Programming , volume 1802, pages 121–132. Springer.Miller, J. F. (1999). An empirical study of the efficiency of learning boolean functions using acartesian genetic programming approach. In
Proceedings of the 1st Annual Conference on Geneticand Evolutionary Computation-Volume 2 , pages 1135–1142. Morgan Kaufmann Publishers Inc.Miller, J. F. (2011). Cartesian genetic programming. In
Cartesian Genetic Programming , pages17–34. Springer.Moradi, S., Qiao, N., Stefanini, F., and Indiveri, G. (2017). A scalable multicore architecture withheterogeneous memory structures for dynamic neuromorphic asynchronous processors (dynaps).
IEEE transactions on biomedical circuits and systems , 12(1):106–122.Morrison, A., Diesmann, M., and Gerstner, W. (2008). Phenomenological models of synapticplasticity based on spike timing.
Biological cybernetics , 98(6):459–478.Ngezahayo, A., Schachner, M., and Artola, A. (2000). Synaptic activity modulates the induction ofbidirectional synaptic changes in adult mouse hippocampus.
J. Neurosci. , 20(7):2451–2458.Nordlie, E., Gewaltig, M.-O., and Plesser, H. E. (2009). Towards reproducible descriptions ofneuronal network models.
PLoS computational biology , 5(8).Orchard, J. and Wang, L. (2016). The evolution of a generalized neural learning rule. In
NeuralNetworks (IJCNN), 2016 International Joint Conference on , pages 4688–4694. IEEE.Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z.,Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A.,Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperativestyle, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A.,d’ Alch´e-Buc, F., Fox, E., and Garnett, R., editors,
Advances in Neural Information ProcessingSystems 32 , pages 8024–8035. Curran Associates, Inc.Pfister, J.-P., Dayan, P., and Lengyel, M. (2010). Synapses with short-term plasticity are optimalestimators of presynaptic membrane potentials.
Nature neuroscience , 13(10):1271.Pfister, J.-P., Toyoizumi, T., Barber, D., and Gerstner, W. (2006). Optimal spike-timing-dependentplasticity for precise action potential firing in supervised learning.
Neural computation , 18(6):1318–1348.Plotnikov, D., Rumpe, B., Blundell, I., Ippen, T., Eppler, J. M., and Morrison, A. (2016). NESTML:a modeling language for spiking neurons. arXiv preprint arXiv:1606.02882 .Radi, A. and Poli, R. (2003). Discovering efficient learning rules for feedforward neural networksusing genetic programming. In
Recent advances in intelligent paradigms and applications , pages133–159. Springer.Risi, S. and Stanley, K. O. (2010). Indirectly encoding neural plasticity as a pattern of local rules.In
International Conference on Simulation of Adaptive Behavior , pages 533–543. Springer.Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1985). Learning internal representationsby error propagation. Technical report, California Univ San Diego La Jolla Inst for CognitiveScience.Sacramento, J., Costa, R. P., Bengio, Y., and Senn, W. (2018). Dendritic cortical microcircuitsapproximate the backpropagation algorithm. In
Advances in Neural Information ProcessingSystems , pages 8721–8732. 19chmidt, M. and Jordan, J. (2020). HAL-CGP: Cartesian genetic programming in pure Python.
Zenodo Zenodo:3889163 .Soltoggio, A., Stanley, K. O., and Risi, S. (2018). Born to learn: the inspiration, progress, andfuture of evolved plastic artificial neural networks.
Neural Networks .Topchy, A. and Punch, W. F. (2001). Faster genetic programming based on local gradient search ofnumeric leaf values. In
Proceedings of the 3rd Annual Conference on Genetic and EvolutionaryComputation , pages 155–162. Morgan Kaufmann Publishers Inc.Toyoizumi, T., Pfister, J.-P., Aihara, K., and Gerstner, W. (2005). Generalized bienenstock–cooper–munro rule for spiking neurons that maximizes information transmission.
Proceedings of theNational Academy of Sciences of the United States of America , 102(14):5239–5244.Urbanczik, R. and Senn, W. (2009). Reinforcement learning in populations of spiking neurons.
Nature neuroscience , 12(3):250.Urbanczik, R. and Senn, W. (2014). Learning by the dendritic prediction of somatic spiking.
Neuron ,81(3):521–528.Walt, S. v. d., Colbert, S. C., and Varoquaux, G. (2011). The numpy array: a structure for efficientnumerical computation.
Computing in Science & Engineering , 13(2):22–30.Whittington, J. C. and Bogacz, R. (2019). Theories of error back-propagation in the brain.
Trendsin cognitive sciences .Zador, A. M. (2019). A critique of pure learning and what artificial neural networks can learn fromanimal brains.
Nature communications , 10(1):1–7.Zaytsev, Y. V. and Morrison, A. (2014). CyNEST: a maintainable cython-based interface for theNEST simulator.
Frontiers in neuroinformatics , 8:23.20 B p r e p o s t Figure 6:
Derivation of effective learning rules. (A) A postsynaptic spike at t = t_1 paired with a causal presynaptic spike at t = 0 and an anticausal presynaptic spike at t = t_1 + t_2. (B) The general form of the learning rules considered in the derivation of Eq. 22. The learning rule varies only on a restricted interval of time differences between −t* and t*, where it is bounded from above and below, while it remains constant outside this interval.

A Derivation of effective learning rules
We consider the effective weight change resulting from a pair of causal (t_post > t_pre) and anticausal (t_post < t_pre) spikes. Note that the approximation of considering only nearest-neighbor spikes becomes more accurate as the firing of pre- and postsynaptic neurons becomes more similar. Without loss of generality we assume that a causal presynaptic spike at t = 0 and a postsynaptic spike at t = t_1 are accompanied by an anticausal presynaptic spike at t = t_1 + t_2, with arbitrary t_2 ≥ 0. For presynaptic Poisson firing with rate ν, the probability of observing the anticausal spike at t = t_1 + t_2 is given by p(t_1 + t_2 | t_1) = ν e^(−ν t_2). We consider a plasticity rule Δw(Δt) which depends only on the time difference Δt of pre- and postsynaptic spikes. Additionally, we assume that Δw(Δt) varies only within |Δt| < t* and is constant outside these bounds (Fig. 6B). Note that this is an approximation to the learning rules described in the main text, which are not constant outside specific bounds but rather decay exponentially over a characteristic time scale τ. We account for each presynaptic spike entering two weight updates, once preceding and once trailing a postsynaptic spike, by introducing the factor 1/2. The resulting weight change from a causal and an anticausal spike can hence be written as:

Δw̃(t_1) = (1/2) Δw(t_1) + (1/2) ⟨Δw(−t_2)⟩_{p(t_1 + t_2 | t_1)}
        = (1/2) Δw(t_1) + (1/2) ∫_0^∞ dt_2 p(t_1 + t_2 | t_1) Δw(−t_2)
        = (1/2) Δw(t_1) + (1/2) ∫_0^{t*} dt_2 ν e^(−ν t_2) Δw(−t_2) + (1/2) Δw_− ∫_{t*}^∞ dt_2 ν e^(−ν t_2)
        ≈ (1/2) Δw(t_1) + (1/2) Δw_−.    (22)

On line three we used the premise that Δw(Δt) is bounded from above and below by Δw_max and Δw_min, respectively: the first integral lies between Δw_min (1 − e^(−ν t*)) and Δw_max (1 − e^(−ν t*)) and therefore vanishes for ν t* ≪ 1, while the second integral equals e^(−ν t*) ≈ 1 in the same limit. We thus considered a regime in which presynaptic firing rates are small compared to the inverse of the timescale on which Δw(Δt) varies significantly (ν t* ≪ 1). The characteristic time scale of the learning rules in the main text is τ = 20 ms; with a presynaptic firing rate of 3 Hz, this approximation holds in our experiments.
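The final approximation in Eq. 22 can be verified numerically. The sketch below assumes an illustrative STDP-like rule satisfying the conditions of Fig. 6B (its functional form, t*, and Δw_− are arbitrary choices for this check, not one of the evolved rules) and compares the exact expected pair update with (1/2) Δw(t_1) + (1/2) Δw_− at ν = 3 Hz:

```python
import numpy as np

# Numerical sanity check of Eq. 22: for nu * t_star << 1, the expected
# weight change from a causal/anticausal spike pair reduces to
# 0.5 * dw(t1) + 0.5 * dw_minus. The rule below is an assumed example
# satisfying the conditions of Fig. 6B, not one of the evolved rules.
nu = 3.0         # presynaptic firing rate (Hz), as in the experiments
tau = 0.02       # characteristic timescale of the rule (s), tau = 20 ms
t_star = 0.1     # bound beyond which the rule is constant (s)
dw_minus = -0.5  # constant value of the rule for dt <= -t_star

def delta_w(dt):
    """Example STDP-like rule varying only within |dt| < t_star."""
    dt = np.asarray(dt, dtype=float)
    causal = np.exp(-np.maximum(dt, 0.0) / tau)
    anticausal = dw_minus * (1.0 - np.exp(np.minimum(dt, 0.0) / tau))
    anticausal = np.where(dt <= -t_star, dw_minus, anticausal)
    return np.where(dt >= 0.0, causal, anticausal)

t1 = 0.005                           # causal spike-pair interval (s)
t2 = np.linspace(0.0, 3.0, 300_001)  # anticausal intervals (s)
p = nu * np.exp(-nu * t2)            # p(t1 + t2 | t1)
expectation = float(np.sum(p * delta_w(-t2)) * (t2[1] - t2[0]))

exact = 0.5 * float(delta_w(t1)) + 0.5 * expectation
approx = 0.5 * float(delta_w(t1)) + 0.5 * dw_minus
print(f"exact: {exact:.4f}, approximation (Eq. 22): {approx:.4f}")
```

Shrinking ν t* tightens the agreement between the two values, consistent with the bounds used in line three of the derivation.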
B Simulation details

Table 1, Table 2, and Table 3 summarize the network models used in the experiments.
A Model summary
Populations
Topology                —
Connectivity            Feedforward with fixed connection probability
Neuron model            Leaky integrate-and-fire (LIF) with exponential post-synaptic currents
Plasticity              Reward-driven
Measurements            Spikes

B Populations
Name     Elements                                                        Size
Input    Spike generators with pre-defined spike trains (see Sec. 4.5)   N
Output   LIF neuron                                                      1

C Connectivity
Source   Target   Pattern
Input    Output   Fixed pairwise connection probability p; synaptic delay d; random initial weights from N(0, σ_w)

D Neuron model
Type                    LIF neuron with exponential post-synaptic currents
Subthreshold dynamics   du(t)/dt = −(u(t) − E_L)/τ_m + I_s(t)/C_m if not refractory; u(t) = u_r else
                        I_s(t) = Σ_{i,k} w_k e^(−(t − t_i^k)/τ_s) Θ(t − t_i^k); k: neuron index, i: spike index
Spiking                 Stochastic spike generation via inhomogeneous Poisson process with intensity φ(u) = ρ e^((u − u_th)/Δu); reset of u to u_r after spike emission and refractory period of τ_r

E Synapse model
Plasticity   Reward-driven with episodic update (Eq. 2, Eq. 3)
Other        Each synapse stores an eligibility trace (Eq. 20)

F Simulation parameters
Populations     N = 50
Connectivity    p = 0., σ_w = 10 pA
Neuron model    ρ = 0.01 Hz, Δu = 0., E_L = −70 mV, u_r = −70 mV, u_th = −55 mV, τ_m = 10 ms, C_m = 250 pF, τ_r = 2 ms, τ_s = 2 ms
Synapse model   η = 10, τ_M = 500 ms, d = 1 ms
Input           M = 30, r = 6 Hz, T = 500 ms, n_training = 500, n_exp = 10
Other           h = 0.01 ms

G CGP parameters
Population   µ = 4, p_mutation = 0.
Genome       n_inputs = 2, n_outputs = 1, n_rows = 1, n_columns = 5, l_max = 5
Primitives   Add, Sub, Mul, Div, Const(1.0)
EA           λ = 4, n_breeding = 4, n_tournament = 1
Other        max generations = 500, minimal fitness = 300

Table 1: Description of the network model used in the reward-driven learning task (Sec. 4.5), according to Nordlie et al. (2009).

A Model summary
Populations
Topology                —
Connectivity
Feedforward with all-to-all connections
Neuron model            Leaky integrate-and-fire (LIF) with exponential post-synaptic currents
Plasticity              Error-driven
Measurements            Spikes, membrane potentials

B Populations
Name      Elements                                                        Size
Input     Spike generators with pre-defined spike trains (see Sec. 4.6)   N
Teacher   LIF neuron                                                      1
Student   LIF neuron                                                      1

C Connectivity
Source   Target    Pattern
Input    Teacher   All-to-all; synaptic delay d; random weights w ∼ U[w_min, w_max]; weights randomly shifted by w_shift on each trial
Input    Student   All-to-all; synaptic delay d; fixed initial weights w

D Neuron model
Type                    LIF neuron with exponential post-synaptic currents
Subthreshold dynamics   du(t)/dt = −(u(t) − E_L)/τ_m + I_s(t)/C_m
                        I_s(t) = Σ_{i,k} J_k e^(−(t − t_i^k)/τ_s) Θ(t − t_i^k); k: neuron index, i: spike index
Spiking                 Stochastic spike generation via inhomogeneous Poisson process with intensity φ(u) = ρ e^((u − u_th)/Δu); no reset after spike emission

E Synapse model
Plasticity   Error-driven with continuous update (Eq. 4, Eq. 6)

F Simulation parameters
Populations     N = 5
Connectivity    w_min = −, w_max = 20, w_shift ∼ {−, }, w = 5
Neuron model    ρ = 0., Δu = 1., E_L = −70 mV, u_th = −55 mV, τ_m = 10 ms, C_m = 250 pF, τ_s = 2 ms
Synapse model   η = 1., d = 1 ms
Input           r_min = 150 Hz, r_max = 850 Hz, T = 10,000 ms, n_exp = 15
Other           h = 0.01 ms, δt = 5 ms

G CGP parameters
Population   µ = 4, p_mutation = 0.
Genome       n_inputs = 3, n_outputs = 1, n_rows = 1, n_columns = 12, l_max = 12
Primitives   Add, Sub, Mul, Div, Const(1.0)
EA           λ = 4, n_breeding = 4, n_tournament = 1
Other        max generations = 1000, minimal fitness = 0.

Table 2: Description of the network model used in the error-driven learning task (Sec. 4.6), according to Nordlie et al. (2009).

A Model summary
Populations
Topology                —
Connectivity
Feedforward with all-to-all connections
Neuron model            Leaky integrate-and-fire (LIF) with delta-shaped post-synaptic currents
Plasticity              Correlation-driven
Measurements            Spikes, membrane potentials

B Populations
Name     Elements                                                        Size
Input    Spike generators with pre-defined spike trains (see Sec. 4.7)   N
Output   LIF neuron                                                      1

C Connectivity
Source   Target   Pattern
Input    Output   All-to-all; synaptic delay d; initial weights chosen to yield an approximate initial firing rate of 20 Hz in the output neuron, w = u_th (Nν − √(Nν)) / (10 − τ_m)

D Neuron model
Type                    LIF neuron with delta-shaped post-synaptic currents
Subthreshold dynamics   du/dt = −(u − E_L)/τ_m + I_s(t)/C_m if t > t* + τ_r; u(t) = u_r else
                        I_s(t) = Σ_{i,k} w_k δ(t − t_i^k); k: neuron index, i: spike index
Spiking                 If u(t−) < θ ∧ u(t+) ≥ θ: 1. set t* = t, 2. emit spike with time stamp t*

E Synapse model
Plasticity   Correlation-driven (Eq. 12, Eq. 14, Eq. 15)

F Simulation parameters
Neuron model    u_r = 0., E_L = 0., u_th = 20., τ_m = 18.
Synapse model   d = 1., W_max = 1., τ = 20 ms, η = 0.
Experiment      N = 1000, T = 100 s, T_pattern = 100., T_inter = 400., ν = 3 Hz, n_exp = 8

G CGP parameters
Population   µ = 8, p_mutation = 0.
Genome       n_inputs = 2, n_outputs = 1, n_rows = 1, n_columns = 5, l_max = 5
Primitives   Add, Sub, Mul, Div, Pow, Const(1.0)
EA           λ = 8, n_breeding = 8, n_tournament = 1
Other        max generations = 2000, minimal fitness = 10.

Table 3: Description of the network model used in the correlation-driven learning task (Sec. 4.7), according to Nordlie et al. (2009).
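The stochastic LIF dynamics summarized in panel D of Table 1 can be sketched with a simple forward-Euler integration. This is a minimal illustration, not the simulation code used for the experiments; the input weight w, the threshold softness Δu, and the homogeneous Poisson input are assumptions standing in for the pre-defined spike trains of panel B and the partly unspecified values of panel F:

```python
import numpy as np

# Forward-Euler sketch of the LIF neuron with exponential post-synaptic
# currents and stochastic (escape-noise) spiking from panel D of Table 1.
# Parameter values follow panel F where available; w, delta_u, and the
# homogeneous Poisson input are illustrative assumptions.
rng = np.random.default_rng(42)

h = 0.01e-3          # integration step (s), panel F
T = 0.5              # simulated time (s), one episode of T = 500 ms
tau_m, tau_s, tau_r = 10e-3, 2e-3, 2e-3  # membrane, synaptic, refractory (s)
C_m = 250e-12        # membrane capacitance (F)
E_L = u_r = -70e-3   # resting and reset potential (V)
u_th = -55e-3        # soft threshold (V)
rho = 0.01           # escape rate at threshold (Hz)
delta_u = 1e-3       # threshold softness (V), assumed
N, rate, w = 50, 6.0, 1e-9  # inputs, input rate (Hz), weight (A); w assumed

u, I_s, refr, spikes = E_L, 0.0, 0.0, []
steps = int(round(T / h))
for step in range(steps):
    I_s += rng.poisson(N * rate * h) * w  # incoming Poisson spikes
    I_s -= h * I_s / tau_s                # exponential PSC decay
    if refr > 0.0:                        # refractory clamp to u_r
        refr -= h
        u = u_r
    else:
        u += h * (-(u - E_L) / tau_m + I_s / C_m)
        phi = rho * np.exp((u - u_th) / delta_u)   # escape-noise intensity
        if rng.random() < 1.0 - np.exp(-phi * h):  # spike in this step?
            spikes.append(step * h)
            u, refr = u_r, tau_r

print(f"{len(spikes)} output spikes in {T * 1e3:.0f} ms")
```

The same loop covers the teacher/student neurons of Table 2 by dropping the reset and refractory branch, and the neuron of Table 3 by replacing the exponential current with instantaneous jumps of u and the escape noise with a hard threshold crossing.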