Fast simulations of highly-connected spiking cortical models using GPUs
Bruno Golosio, Gianmarco Tiddia, Chiara De Luca, Elena Pastorelli, Francesco Simula, Pier Stanislao Paolucci
Department of Physics, University of Cagliari, Italy; Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Cagliari, Italy; Ph.D. Program in Behavioural Neuroscience, "Sapienza" University of Rome, Italy; Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Roma, Italy
* [email protected]
ABSTRACT
Over the past decade there has been a growing interest in the development of parallel hardware systems for simulating large-scale networks of spiking neurons. Compared to other highly-parallel systems, GPU-accelerated solutions have the advantage of a relatively low cost and a great versatility, thanks also to the possibility of using the CUDA-C/C++ programming languages. NeuronGPU is a GPU library for large-scale simulations of spiking neural network models, written in the C++ and CUDA-C++ programming languages, based on a novel spike-delivery algorithm. This library includes simple LIF (leaky-integrate-and-fire) neuron models as well as several multisynapse AdEx (adaptive-exponential-integrate-and-fire) neuron models with current- or conductance-based synapses, user-definable models and different devices. The numerical solution of the differential equations of the dynamics of the AdEx models is performed through a parallel implementation, written in CUDA-C++, of the fifth-order Runge-Kutta method with adaptive step-size control. In this work we evaluate the performance of this library on the simulation of a well-known cortical microcircuit model, based on LIF neurons and current-based synapses, and on a balanced network of excitatory and inhibitory neurons, using AdEx neurons and conductance-based synapses. On these models, we show that the proposed library achieves state-of-the-art performance in terms of simulation time per second of biological activity. In particular, using a single NVIDIA GeForce RTX 2080 Ti GPU board, the full-scale cortical-microcircuit model, which includes about 77,000 neurons and 3 · 10^8 connections, can be simulated at a speed very close to real time, while the simulation time of a balanced network of 1,000,000 AdEx neurons with 1,000 connections per neuron was about 70 s per second of biological activity.

I. INTRODUCTION
The human brain is an extremely complex system, with a number of neurons in the order of 100 billion, an average number of connections per neuron in the order of 10 thousand, hundreds of different neuron types, and several types of neurotransmitters and receptors. Because of this complexity, the simulation of brain activity at the level of signals produced by individual neurons is extremely demanding, even if it is limited to relatively small regions of the brain. Therefore there is a growing interest in the development of high-performance hardware and software tools for efficient simulations of large-scale networks of spiking neuron models. Some simulators, as for instance NEST, NEURON and Brian, combine flexibility and simplicity of use with the possibility to simulate a wide range of spiking neuron and synaptic models. All three of these simulators offer support for multithread parallel computation on a single computer and MPI support for distributed simulations on computer clusters. On the other hand, a fertile field of research in recent decades has investigated the use of highly parallel hardware systems for simulating large-scale networks of spiking neurons. Such systems include custom-made neuromorphic very-large-scale-integration (VLSI) circuits, field programmable gate arrays (FPGAs) and systems based on graphical processing units (GPUs). Compared to other highly-parallel systems, the latter have the advantage of a relatively low cost, a great versatility, thanks also to the possibility of using the CUDA (Compute Unified Device Architecture) platform, and a sustained technological development driven by the consumer market. General purpose computing on graphical processing units (GPGPU) is widely employed for massively parallel computing. GPGPUs can significantly reduce the processing time compared to multi-core CPU systems for tasks that require a high degree of parallelism, because a single GPU can perform thousands of core computations in parallel. However, in order to derive maximum benefit from GPGPU, the applications must be carefully designed taking into account the hardware architecture. CUDA is a parallel computing platform and programming model created by NVIDIA to allow software developers to take full advantage of the GPU capabilities. Over the past decade, several GPU-based spiking neural network simulators have been developed (see Brette and Goodman (2012) for a review). EDLUT is a hybrid CPU/GPU spiking neural network simulator which combines time-driven (on GPU) and event-driven (on CPU) simulation methods to achieve real-time simulation of medium-size networks, which can be exploited in real-time experiments, as for instance the control of a robotic arm. ANNarchy is a simulator for distributed rate-coded or spiking neural networks, which provides a Python interface for the definition of the networks and generates optimized C++ code to run the simulation in parallel, using either OpenMP on CPU architectures or CUDA on GPUs. CARLsim is a GPU-accelerated library for simulating large-scale spiking neural networks (SNNs) based on the Izhikevich neuron model, which provides a PyNN-like programming interface in C/C++. Recently, the GeNN simulator achieved cutting-edge performance in GPU-based simulation of spiking neural networks, with better performance than CPU-based clusters and neuromorphic systems in the simulation of the full-scale cortical microcircuit model proposed by Potjans and Diesmann.
In this work we present a comprehensive GPU library for fast simulation of large-scale networks of spiking neurons, called NeuronGPU, which uses a novel GPU-optimized algorithm for spike delivery. This library can be used either in Python or in C/C++. The Python interface is very similar to that of the NEST simulator and allows interactive use of the library. NeuronGPU was recently proposed for integration with the NEST neural simulator. In the following sections, after a general description of the library and of the spike-delivery algorithm, we evaluate the library on two types of spiking neural network models:

• the Potjans-Diesmann cortical microcircuit model, based on the leaky-integrate-and-fire (LIF) neuron model, which describes the behavior of a region of the cerebral cortex having a surface of 1 mm^2 and includes about 77,000 neurons and 3 · 10^8 connections;

• a balanced network of excitatory and inhibitory neurons, based on the adaptive-exponential-integrate-and-fire (AdEx) neuron model, with up to 1,000,000 neurons and 10^9 connections.

We will show that, although the building time is larger compared to other simulators, NeuronGPU achieves state-of-the-art performance in terms of simulation time per unit time of biological activity.

II. MATERIALS AND METHODS

A. The NeuronGPU library
NeuronGPU is a GPU library for simulation of large-scale networks of spiking neurons, written in the C++ and CUDA-C++ programming languages. Currently it can simulate LIF models, different multisynapse AdEx models with current- or conductance-based synapses, and two user-definable neuron models. The LIF model subthreshold dynamics is integrated by the exact integration scheme described by Rotter and Diesmann, on the time grid given by the simulation time resolution. On the other hand, the numerical solution of the differential equations of the AdEx dynamics is performed through a parallel implementation, written in CUDA C++, of the fifth-order Runge-Kutta method with adaptive control of the step size. In general, NeuronGPU can simulate networks of any neuron model and synaptic current model whose dynamics can be described by a system of ordinary differential equations (ODEs). The computations are carried out using mainly 32-bit floating point numerical precision, with the exception of some parts of the code for which double precision calculations are more appropriate, e.g. those in which a very large number of terms can be added. Neuron parameters and connection weights and delays can be initialized either using fixed values or through arrays or probability distributions. Neuron groups can be connected either using predefined connection rules (one-to-one, all-to-all, fixed indegree, fixed outdegree, fixed total number) or by user-defined connections. In addition to the standard synapse model, nearest-neighbor spike-timing-dependent plasticity (STDP) is also available. Different types of devices can be simulated, including Poisson signal generators, spike generators, multimeters and parrot neurons. NeuronGPU includes an efficient implementation of GPU-MPI communication among different nodes of a GPU cluster; however, the performance of the proposed library on GPU clusters has not yet been thoroughly evaluated, therefore this feature is not described in the present work. The Python interface is very similar to that of NEST in its main commands, use of dictionaries, connection rules, device and neuron model names and parameters. The following Python code sample illustrates this strong similarity:

import neurongpu as ngpu
# create Poisson generator with rate poiss_rate
pg = ngpu.Create("poisson_generator")
poiss_rate = 12000.0
ngpu.SetStatus(pg, "rate", poiss_rate)

About 30 test scripts and C++ programs have been designed to check the correctness of neuron model dynamics, devices, spike delivery and connection rules. Many of these tests use analogous NEST simulations as reference. Several examples in C++ and in Python are also available. NeuronGPU is an open-source library, freely available on GitHub at https://github.com/golosio/NeuronGPU under the terms of the GNU General Public License v3.0.
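To give a more complete picture of the workflow, the sketch below extends the snippet above to a small network. Only Create and SetStatus appear in the snippet; the Connect and Simulate calls, the neuron model name and the dictionary keys are assumptions made here on the basis of the stated similarity to the NEST interface, and may differ in detail from the actual API.

import neurongpu as ngpu

# Create a small population of multisynapse AdEx neurons and a Poisson generator.
# The model name and the dictionary keys below are illustrative assumptions based
# on the NEST-like naming conventions described in the text.
neurons = ngpu.Create("aeif_cond_beta_multisynapse", 1000)
pg = ngpu.Create("poisson_generator")
ngpu.SetStatus(pg, "rate", 12000.0)

# Connect the Poisson generator to all neurons, and the neurons to each other
# using one of the predefined connection rules (fixed indegree in this case).
syn_dict = {"weight": 0.05, "delay": 0.5}
ngpu.Connect(pg, neurons, {"rule": "all_to_all"}, syn_dict)
ngpu.Connect(neurons, neurons, {"rule": "fixed_indegree", "indegree": 100}, syn_dict)

# Run one second of biological time.
ngpu.Simulate(1000.0)

Apart from the module name, scripts of this kind closely mirror their NEST counterparts, which is the similarity exploited in the cortical microcircuit simulations described below.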
B. The spike-delivery algorithm

NeuronGPU uses one (output) spike buffer per neuron, which holds the spikes that have been fired by the neuron. The output connections of each neuron are organized in groups, where all connections in the same group have the same delay. Only three values per spike are stored in the buffer: a multiplicity, a time index t_s, which starts from 0 and is incremented by 1 at every time step, and a connection-group index i_g, which also starts from zero and is incremented by 1 every time the spike reaches a connection group, i.e. when the time index t_s matches the connection-group delay. Figure 1(a) represents the structure of the spike buffer and illustrates an example of how a spike is delivered from the neuron that fired it to the target neurons of different connection groups. Keeping a connection-group index and having the output-connection groups ordered according to their delays is useful for reducing the computational cost, because with this approach there is no need for a nested loop comparing the time index of the spike with all the connection delays. When the time index of a spike t_s matches a connection-group delay, the spike is sent to the spike array, as shown in Fig. 1(b). Finally, spikes are sent from this array to the target neurons. This final delivery is done directly by a CUDA kernel, so no additional memory is required. The maximum size of the global spike array is equal to the number of nodes, so the maximum GPU memory required by this algorithm is well defined.

For MPI connections, when a source node is connected to target nodes of another host, a spike buffer, similar to the local one, is created in the remote host. When the source node fires a spike, the spike is sent to its buffer on the remote host, which delivers it to all target neurons after the proper delays.
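To make the bookkeeping concrete, the following sketch mimics, in serial Python, the per-neuron spike buffer described above (multiplicity, time index t_s and connection-group index i_g, with connection groups sorted by increasing delay). It is only an illustrative model of the logic: the actual implementation is a parallel CUDA-C++ kernel, and details such as spike pruning may differ.

# Illustrative (serial, Python) model of the per-neuron spike buffer described
# in the text; the real implementation is a parallel CUDA-C++ kernel.
class SpikeBuffer:
    def __init__(self, conn_group_delays):
        # connection groups are ordered by increasing delay (in time steps)
        self.delays = sorted(conn_group_delays)
        self.spikes = []                    # each spike: [multiplicity, t_s, i_g]

    def fire(self, multiplicity=1.0):
        # a new spike starts with time index 0 and connection-group index 0
        self.spikes.append([multiplicity, 0, 0])

    def step(self, spike_array, neuron_id):
        for spike in self.spikes:
            spike[1] += 1                   # advance the spike time index t_s
            mult, t_s, i_g = spike
            # no nested loop over all groups: only the next group's delay has
            # to be compared with the spike time index
            if i_g < len(self.delays) and t_s == self.delays[i_g]:
                spike_array.append((neuron_id, i_g, mult))   # send to spike array
                spike[2] += 1               # advance the connection-group index i_g
        # spikes that have reached all connection groups can be discarded
        self.spikes = [s for s in self.spikes if s[2] < len(self.delays)]

# one neuron with connection groups at delays of 2, 5 and 7 time steps
buf, spike_array = SpikeBuffer([2, 5, 7]), []
buf.fire()
for step in range(8):
    buf.step(spike_array, neuron_id=0)      # spike_array lists (neuron, group, multiplicity)

Because the groups are sorted by delay, each stored spike only has to be compared against the delay of its next connection group at every time step, which is the cost reduction mentioned above.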
C. The Potjans-Diesmann cortical microcircuit model

The cortical microcircuit model used in this work was developed in 2014 by T. C. Potjans and M. Diesmann and describes a portion of 1 mm^2 of sensory cortex, comprising approximately 77,000 LIF neurons organized into layers 2/3, 4, 5, and 6. Each layer contains an excitatory and an inhibitory population of LIF neurons with current-based synapses, for a total of 8 populations: 2/3I, 2/3E, 4I, 4E, 5I, 5E, 6I and 6E. The number of neurons in each population, the connection probability matrix and the rates of the external Poisson inputs are based on the integration of anatomical and physiological data, mainly from cat V1 and rat S1.

FIG. 1: (a) Example of spike delivery through the spike buffer. At time t, the i-th neuron emits a spike, which is inserted in the spike buffer. In this example, the buffer also contains another spike emitted previously. At each time step, the spike time index is incremented by 1. When it becomes equal to the delay of some connection group, the spike is delivered to that group and its connection-group index is incremented by 1. (b) The spike array. When the time index of a spike matches the delay of a connection group, the spike is sent to the spike array, which is used for delivering the spike to all neurons of the connection group.

FIG. 2: Schematic diagram of the Potjans-Diesmann cortical microcircuit model.
The total number of connections is about 3 · 10^8. Figure 2 shows a diagram of the model with a schematic representation of the connections having probabilities > 0.04.

The LIF neuron model, used in the cortical microcircuit, is one of the simplest spiking neuron models. The neuron dynamics is modeled by the following differential equation:

τ_m dV_i/dt = −(V_i − E_L) + R_m I_syn,i    (1)

where V_i(t) represents the membrane potential of neuron i at time t, τ_m is the membrane time constant, E_L is the resting membrane potential, R_m is the membrane resistance and I_syn,i is the synaptic input current. In the exponentially shaped postsynaptic current (PSC) model, which is used to simulate the Potjans-Diesmann cortical microcircuit model, the input current is described by the following equation:

τ_syn dI_syn,i/dt = −I_syn,i + Σ_j w_ij Σ_{t_j^f} δ(t − t_j^f)    (2)

where τ_syn is the synaptic time constant, w_ij are the connection weights and t_j^f are the spike times from presynaptic neuron j.
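As a concrete illustration of the exact-integration scheme mentioned above (not the library's actual code), a single update step for Eqs. (1)-(2) can be written with propagators that are precomputed once for the fixed time grid. The parameter values below are placeholders, threshold, reset and refractory handling are omitted, and the exact point at which incoming spikes are added is an implementation detail.

import numpy as np

# Exact-integration update (Rotter & Diesmann scheme) for the LIF neuron with
# exponential PSCs of Eqs. (1)-(2); parameter values are placeholders.
tau_m, tau_syn, R_m, E_L = 10.0, 0.5, 40.0, -65.0   # ms, ms, Mohm, mV (placeholders)
h = 0.1                                              # simulation time step (ms)

# Propagators, computed once for the fixed time grid
P_syn = np.exp(-h / tau_syn)                         # decay of the synaptic current
P_mem = np.exp(-h / tau_m)                           # decay of V towards E_L
P_21 = R_m * tau_syn / (tau_m - tau_syn) * (P_mem - P_syn)  # coupling I_syn -> V

def update(V, I_syn, weighted_spikes):
    """Advance (V, I_syn) by one step h; weighted_spikes is the sum of the
    w_ij of the spikes arriving at this grid point (Eq. 2)."""
    V_new = E_L + (V - E_L) * P_mem + P_21 * I_syn
    I_new = I_syn * P_syn + weighted_spikes
    return V_new, I_new

Because the propagators are exact for the linear subthreshold dynamics, the accuracy of this update does not depend on the step size, which is why the LIF models only need the fixed simulation time grid rather than an adaptive solver.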
FIG. 3: Schematic diagram of the balanced network used in the simulations.
D. The balanced network model
The performance of the library was also assessed on a balanced network of sparsely connected excitatory and inhibitory neurons, using the AdEx neuron model with conductance-based synapses and synaptic conductances modeled by an alpha function. Both the excitatory and the inhibitory populations are stimulated by an external Poissonian signal, as shown in Fig. 3. Simulations have been made with a variable number of neurons and connections, with up to 1,000,000 neurons and 10^9 connections. Table I reports the parameters used for the balanced network simulations. The AdEx model is an attractive neuron model for use in large-scale network simulations, because it is relatively simple compared to biologically detailed spiking neuron models; nonetheless, it provides a good level of realism in representing the spiking behavior of biological neurons in many conditions, in the sense that it fits well the response of neurons as measured from electrophysiological recordings. This model is described by a system of two differential equations. The first equation describes the dynamics of the membrane potential V(t) and includes an activation term with an exponential voltage dependence:

C dV/dt = −g_L (V − E_L) + g_L Δ_T exp((V − V_T)/Δ_T) + I_syn(V, t) − ω + I_e    (3)

where the synaptic current is

I_syn(V, t) = Σ_i g_i(t) (V − E_rev,i)    (4)

Here C is the membrane capacitance, g_L is the leak conductance, E_L is the leak reversal potential, Δ_T is a slope factor, V_T is the spike initiation threshold, ω is the spike-adaptation current, I_e is an external input current, g_i(t) are the synaptic conductances and E_rev,i are the reversal potentials. The voltage is coupled to a second equation which describes the adaptation:

τ_ω dω/dt = a (V − E_L) − ω    (5)

where τ_ω is the adaptation time constant and a is the subthreshold adaptation parameter. When the neuron fires a spike, the adaptation current ω changes as ω → ω + b, where b is a spike-triggered adaptation parameter, while the membrane potential is reset as V → V_r. Table II reports the AdEx parameter values that have been used for the balanced network simulations.

TABLE I: Values of the parameters used for the balanced network simulations.

Parameter                                              Value
N_ex (number of excitatory neurons)                    variable
N_in (number of inhibitory neurons)                    N_ex / 4
CE (number of input excitatory synapses per neuron)    variable
CI (number of input inhibitory synapses per neuron)    CE / 4
W_ex (excitatory connection weight)                    0.05
W_in (inhibitory connection weight)                    0.35
Mean delay                                             0.5 ms
Delay STD                                              0.25 ms
W_poisson (Poisson signal weight)                      0.37
Rate_poisson (Poisson signal rate)                     20,000 Hz
Neuron average firing rate                             30.7 Hz

TABLE II: Values of the AdEx parameters used in the balanced network simulations.

Parameter                                   Value
C (membrane capacitance)                    281 pF
g_L (leak conductance)                      30 nS
E_L (leak reversal potential)               -70.6 mV
V_T (spike initiation threshold)            -50.4 mV
Δ_T (slope factor)                          2 mV
τ_w (adaptation time constant)              144 ms
a (subthreshold adaptation)                 4 nS
b (spike-triggered adaptation)              80.5 pA
V_r (reset value of V_m after a spike)      -60 mV
E_ex (excitatory reversal potential)        0 mV
E_in (inhibitory reversal potential)        -85 mV
τ_syn (synaptic time constant)              1 ms
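For reference, the short script below integrates Eqs. (3)-(5) for a single neuron with the Table II parameters and a constant injected current of 700 pA (the test case used later in Fig. 9). It is a sketch using SciPy's adaptive Runge-Kutta solver rather than the library's CUDA-C++ fifth-order Runge-Kutta implementation; the synaptic current is omitted and the 0 mV spike-detection threshold is an assumption made for illustration.

import numpy as np
from scipy.integrate import solve_ivp

# Single AdEx neuron (Eqs. 3-5), Table II parameters, constant injected current.
C, g_L, E_L = 281.0, 30.0, -70.6            # pF, nS, mV
V_T, Delta_T = -50.4, 2.0                   # mV, mV
tau_w, a, b, V_r = 144.0, 4.0, 80.5, -60.0  # ms, nS, pA, mV
I_e = 700.0                                 # injected current (pA)
V_peak = 0.0                                # numerical spike-detection threshold (mV), assumed

def adex(t, y):
    V, w = y
    dV = (-g_L * (V - E_L) + g_L * Delta_T * np.exp((V - V_T) / Delta_T)
          - w + I_e) / C                    # synaptic input omitted in this test
    dw = (a * (V - E_L) - w) / tau_w
    return [dV, dw]

def spike(t, y):                            # event: membrane potential reaches V_peak
    return y[0] - V_peak
spike.terminal = True
spike.direction = 1

t, y0, t_end = 0.0, [E_L, 0.0], 300.0
times, voltages = [], []
while t < t_end:
    sol = solve_ivp(adex, (t, t_end), y0, events=spike, max_step=0.1)
    times.append(sol.t)
    voltages.append(sol.y[0])
    if sol.status == 1:                     # a spike occurred: reset V and add b to w
        t = sol.t[-1]
        y0 = [V_r, sol.y[1][-1] + b]
    else:
        break

With pA, pF, nS, mV and ms as units, the right-hand sides are directly in mV/ms and pA/ms, so no unit conversion is needed; this is the same convention used in the parameter tables above.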
III. RESULTS AND DISCUSSION
The cortical microcircuit model and the balanced network described in the previous section were used both to verify the correctness of the simulations performed using NeuronGPU and to compare the performance of the proposed library with those of NEST version 2.20.0 and GeNN version 3.2.0. For this purpose, we used a PC with an Intel Core i9-9900K CPU with a frequency of 3.6 GHz and 8 cores featuring hyperthreading with two threads per core, for a total of 16 hardware threads, 64 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti GPU card with 11 GB of GDDR6 VRAM, 4352 CUDA cores and a boost clock of 1635 MHz. NeuronGPU and GeNN simulations were also performed on a system equipped with an NVIDIA Tesla V100 GPU with 16 GB of GPU memory and 5120 CUDA cores.
A. Simulation of the cortical microcircuit model
Following the procedure proposed by van Albada et al. and by Knight and Nowotny, in this section we verify the correctness of the simulations by comparing some relevant statistical distributions extracted from the simulations of the Potjans-Diesmann cortical microcircuit model made using NeuronGPU with the analogous distributions obtained using the NEST simulator. Subsequently, still following the same references, the cortical microcircuit model is used as a benchmark to evaluate the performance of NeuronGPU in terms of building time and simulation time per unit of time of biological activity. The Python code used for the simulations, available at https://github.com/golosio/NeuronGPU/tree/master/python/Potjans_2014, is almost identical to the NEST implementation of the model (https://nest-simulator.readthedocs.io/en/stable/microcircuit/). Figure 4 shows a raster plot of the spike times of neurons from each population of the model, simulated using NEST and NeuronGPU, in a time window of 200 ms.

In order to verify the correctness of the simulations, we simulated 11 s of biological activity of the full-scale Potjans-Diesmann model both with NeuronGPU and with NEST, with a time step of 0.1 ms. For both simulators we performed 10 simulations, differing from each other only in the initial seed for random number generation. As in the references above, the first second was discarded in order to eliminate transient trends. The spike times of all neurons were recorded during the simulations, and subsequently they were used to extract three distributions for each population, namely:

• the average firing rate of the single neurons;

• the coefficient of variation of the inter-spike intervals (CV ISI), defined as the ratio between the standard deviation and the average of the inter-spike time intervals;

• the Pearson correlation between the spike trains.

The latter has been computed on a subset of 200 neurons for each population, as in the references above. This number represents a compromise between statistical precision and computation time. The spike trains of those neurons have first been rebinned to a time step of 2 ms, equal to the refractory time. Denoting the binned spike trains as b_i and their mean values as μ_i, the correlation coefficient between the spike trains b_i and b_j is defined as

C[i, j] = <b_i − μ_i, b_j − μ_j> / sqrt(<b_i − μ_i, b_i − μ_i> · <b_j − μ_j, b_j − μ_j>)

where <,> represents the scalar product. For 200 spike trains, a 200x200 correlation matrix is obtained. The Pearson correlation distribution is evaluated as the distribution of the off-diagonal elements of this matrix.
FIG. 4: Raster plot showing spike times (dots) of neurons from each population of the cortical microcircuit model, simulated using NEST (top) and NeuronGPU (bottom), in a time window of 200 ms (excitatory neurons in blue, inhibitory in red). Due to the high number of neurons in the model, only the spikes of one neuron out of ten are shown.

All distributions have been evaluated from the spike time recordings using the Elephant (Electrophysiology Analysis Toolkit) package, dedicated to the analysis of electrophysiological data in the Python environment. The distributions have been smoothed using the KDE (Kernel Density Estimation) method, available in the sklearn Python library through the function sklearn.neighbors.KernelDensity. The KDE method allows one to estimate the probability density of a random variable with a reduced dependence on the random fluctuations linked to individual simulations. In particular, each of the N points belonging to a sample is represented by a Gaussian function of suitable width, called the kernel bandwidth. The integral of each of these functions is normalized to 1/N; the overall distribution is therefore estimated as the sum of all these Gaussians, and thus it has an integral normalized to one. The kernel bandwidth has been optimized using the so-called Silverman's rule, which prescribes a bandwidth value of

b = 0.9 · min(σ̂, IQR/1.34) · N^(−1/5)    (6)

where σ̂ is the standard deviation of the samples, N is the sample size and IQR is the interquartile range. It should be observed that the distributions obtained through the KDE method are continuous functions, since they are evaluated as the sum of a set of Gaussian functions. Figures 5, 6 and 7 show the distributions of the firing rate, the CV ISI and the Pearson correlation coefficient, respectively, for the 8 populations of the Potjans-Diesmann model, averaged over 10 simulations made using NEST or NeuronGPU. As can be seen in the graphs, the distributions obtained from the two simulators are very similar to each other. In order to compare the distributions obtained using NeuronGPU to those obtained using NEST, we evaluated the Kullback-Leibler (KL) divergence, defined as

D_KL(p_1, p_2) = −Σ_i p_{1,i} log(p_{2,i} / p_{1,i})

where p_1 and p_2 are two distributions and the index i runs over the sampling points of the two distributions. For this purpose, we used 10 pairs of simulations (NeuronGPU-NEST and NEST-NEST) with different seeds for random number generation. The KL divergence was then calculated for each pair, and its average and standard deviation were computed over the 10 pairs. Since the KDE method provides a smooth continuous function, the result is not sensitive to the sampling step as long as this is small enough. The KL divergence was evaluated using the Python scientific library, in particular the scipy.stats.entropy function. Table III shows the average and standard deviation of the KL divergences between the distributions of firing rates, CV ISI and Pearson correlation, obtained from NEST and from NeuronGPU simulations, for the eight populations of the cortical microcircuit model. It can be observed that the KL divergences between distributions obtained from NEST and from NeuronGPU are fully compatible with the divergences between distributions obtained from NEST simulations with different seeds.
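The sketch below illustrates this comparison pipeline with the tools named above (sklearn's KernelDensity with the Silverman bandwidth of Eq. (6) and scipy.stats.entropy). The rates_nest and rates_ngpu arrays are placeholders standing in for the per-neuron firing rates extracted from the recorded spike trains; the script is an illustration of the method, not the analysis code used for the paper.

import numpy as np
from sklearn.neighbors import KernelDensity
from scipy.stats import entropy, iqr

def silverman_bandwidth(x):
    # Eq. (6): b = 0.9 * min(sigma, IQR/1.34) * N^(-1/5)
    return 0.9 * min(np.std(x), iqr(x) / 1.34) * len(x) ** (-1.0 / 5.0)

def kde_density(x, grid):
    # Gaussian KDE evaluated on a common sampling grid
    kde = KernelDensity(kernel="gaussian", bandwidth=silverman_bandwidth(x))
    kde.fit(x.reshape(-1, 1))
    p = np.exp(kde.score_samples(grid.reshape(-1, 1)))
    return p / p.sum()

# Placeholder samples of single-neuron firing rates for one population
rng = np.random.default_rng(0)
rates_nest = rng.gamma(2.0, 2.0, size=1000)
rates_ngpu = rng.gamma(2.0, 2.0, size=1000)

grid = np.linspace(0.0, 30.0, 1000)
p1 = kde_density(rates_nest, grid)
p2 = kde_density(rates_ngpu, grid)
D_kl = entropy(p1, p2)          # D_KL(p1 || p2) = sum_i p1_i log(p1_i / p2_i)
print(f"KL divergence: {D_kl:.4f}")

Because the KDE produces a smooth density, the value of D_kl is essentially independent of the grid spacing once the grid is fine with respect to the bandwidth, which is the property noted in the text.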
The executiontime of the simulations can be divided into building time and simulation time of biologi-14 IG. 5:
FIG. 5: Firing rate distributions for the eight populations of the cortical microcircuit model, averaged over 10 simulations made using NEST (blue) or NeuronGPU (red).

FIG. 6: Distribution of the coefficient of variation of interspike intervals (CV ISI) for the eight populations of the cortical microcircuit model, averaged over 10 simulations made using NEST (blue) or NeuronGPU (red).

FIG. 7: Distribution of the Pearson correlation coefficient of the spike trains for the eight populations of the cortical microcircuit model, averaged over 10 simulations made using NEST (blue) or NeuronGPU (red).

TABLE III:
Kullback-Leibler divergence between the distributions of the firing rate, coefficient of variation of interspike intervals (CV ISI) and Pearson correlation coefficient, extracted from NEST and NeuronGPU simulations, for the populations L2/3E to L6I. The columns labeled NEST-NeuronGPU show the average values and the standard deviations of the divergence between NEST and NeuronGPU, while the columns labeled NEST-NEST show the same values for NEST simulations with different seeds.

The building time includes the time needed to allocate memory for neurons, devices and connections, to build the connections and to initialize the values of state variables and parameters. Figure 8(a) shows the building times for NEST and NeuronGPU on a system equipped with an Intel Core i9-9900K CPU. Figure 8(b) shows the time for code generation, compilation and initialization of the cortical microcircuit model for GeNN. The building time of NeuronGPU is comparable to that of NEST, because in NeuronGPU the connections are initially created in RAM, and are copied from RAM to GPU memory only immediately before the simulation. Compared to most GPU-based simulators, NeuronGPU offers a wide range of choices for connection rules and connection parameter distributions, which can be exploited at runtime and interactively through the Python interface. It is easier to manage these connection rules and distributions on the CPU side, also thanks to the functions provided by the standard C++ library. In both NEST and NeuronGPU the model parameters, the neuron populations and the network architecture are defined at runtime, and the memory they need is allocated dynamically. On the other hand, GeNN uses a code-generation approach. The model parameters, neuron populations and architecture are defined using code fragments similar to C/C++, from which the CUDA/C++ code of the model is generated. This code must be compiled before execution. Any change in the parameters, neuron populations or network architecture requires a new generation and compilation of the code. Once the code is generated and compiled, the initialization is very fast, as it is carried out directly by the GPU with parallel computing algorithms.

FIG. 8: (a) Building time of the cortical microcircuit model simulated with NEST and with NeuronGPU on a system equipped with an Intel Core i9-9900K CPU. (b) Times for code generation, compilation and initialization of the cortical microcircuit model for GeNN. The first two phases are performed by the CPU, and the times refer to a system equipped with an Intel Core i9-9900K. The third phase is mainly performed by the GPU, and the figure shows the time for an NVIDIA Tesla V100. (c), (d) Simulation times per second of biological time of the cortical microcircuit model simulated with NEST, NeuronGPU and GeNN on various CPU and GPU hardware. The horizontal line represents the biological time.

Figures 8(c) and (d) show the simulation times per unit time of biological activity for NeuronGPU, NEST and GeNN on different CPU and GPU platforms. On a system equipped with an NVIDIA Tesla V100 GPU card, the simulation time per second of biological time with GeNN was 2.16 s, while NeuronGPU was 31.6% faster, with a simulation time of 1.64 s on the same GPU. On an NVIDIA RTX 2080 Ti GPU card, NeuronGPU was 32.5% faster than GeNN, with a simulation time of 1.055 s per second of biological time.

B. Simulation of the balanced network model
Figure 9(a) shows the time course of the membrane voltage of an AdEx neuron with the parameter values reported in Table II and an injected current of 700 pA, simulated with NeuronGPU and with NEST. With the exception of the peaks, the two plots appear to be perfectly superimposed on this scale. Figure 9(b) shows the difference between the two signals simulated with NEST and with NeuronGPU; apart from the peaks, the difference is a negligible fraction of a millivolt. Figure 10 shows the time course of the membrane voltage of an AdEx neuron stimulated by three input spikes on three different receptor ports in a subthreshold condition, simulated with NeuronGPU and with NEST. In the remaining part of this section we present the results of simulations of the balanced network illustrated in Fig. 3, with the parameters shown in Table I.

FIG. 9: Membrane voltage of an AdEx neuron with the parameter values reported in Table II and an injected current of 700 pA, simulated with NeuronGPU and with NEST (a), and difference between the two simulated signals (b).

Figure 11(a) shows the building time for the balanced network simulations as a function of the number of neurons, for a fixed number of 1000 input connections per neuron, while Fig. 11(b) shows the corresponding simulation time. GPU simulations are faster than the CPU ones by a factor that grows with the network size, up to 30.4x for the largest networks. Figure 12(a) shows the building time as a function of the number of connections per neuron for a fixed total number of neurons, which was set to 30,000, and Fig. 12(b) shows the corresponding simulation time. It can be observed that in this case GPU simulations are faster than the CPU ones by a factor ranging from about 16x to about 27x, depending on the number of connections per neuron.

IV. CONCLUSION
In this work we evaluated the performance of the NeuronGPU library on the simulation of the full-scale Potjans-Diesmann cortical microcircuit model, based on the LIF neuron model, and on the simulation of balanced networks of excitatory and inhibitory neurons, with variable numbers of neurons and connections, using the AdEx neuron model.

FIG. 10: Membrane voltage of an AdEx neuron stimulated by three input spikes in a subthreshold condition, simulated using NeuronGPU and NEST.

The building time of the cortical microcircuit model simulated using NeuronGPU was comparable to that of NEST, mainly because in the current version of NeuronGPU the connections are created in RAM and are copied to GPU memory only immediately before the simulation loop. On the other hand, NeuronGPU achieved a simulation time per second of biological activity of 1.64 s on an NVIDIA Tesla V100 GPU and of 1.055 s on an NVIDIA RTX 2080 Ti GPU, about 32% faster than GeNN and very close to biological time. The building time of the balanced network simulated using NeuronGPU was about twice as large as that of NEST. However, NeuronGPU was faster than NEST in terms of simulation time per second of biological activity, by a factor ranging from about 16x for the smaller networks to about 30x for the networks with the largest number of connections. In future releases of the library, the building time could be significantly reduced by creating the connections directly in GPU memory, exploiting the parallel computing capabilities of the GPU and avoiding the bottleneck of the memory transfer from RAM to GPU memory. On the other hand, the high simulation speed demonstrated by the proposed library, significantly higher than that of other CPU- and GPU-based simulators, combined with the availability of a wide range of neuron models, devices and connection rules, makes this library particularly useful for simulations of large spiking neural networks over relatively long biological times.
FIG. 11: Building time (a) and simulation time (b) for the balanced network simulations with a variable number of neurons and a fixed number of 1000 input connections per neuron, simulated using NeuronGPU and NEST, and simulation time for NeuronGPU shown on a different scale (c).
ACKNOWLEDGMENTS
We are grateful to Prof. Hans Ekkehard Plesser and to Dr. Tanguy Fardet for their revision of the aeif_cond_beta_multisynapse model in the NEST simulator, which was the basis for the implementation of the same model in NeuronGPU. We would also like to thank Prof. Plesser, Prof. Markus Diesmann, Dr. Alexander Peyser and Dr. Wouter Klijn for the useful discussions on the dynamics of spiking neural networks, the use of CPU and GPU clusters and the spike-delivery algorithms.
FIG. 12: Building time (a) and simulation time (b) for the balanced network simulations with a fixed number of 30,000 neurons and a variable number of input connections per neuron, simulated using NeuronGPU and NEST, and simulation time for NeuronGPU shown on a different scale (c).
This work has been supported by the European Union Horizon 2020 Research and Innovation program under the FET Flagship Human Brain Project (grant agreement SGA3 n. 945539 and grant agreement SGA2 n. 785907) and by the INFN APE Parallel/Distributed Computing laboratory.
REFERENCES

Tanguy Fardet et al. NEST 2.20.0, 2020.

Nicholas T. Carnevale and Michael L. Hines. The NEURON Book. Cambridge University Press, 2006.

Romain Brette and Dan Goodman. Brian: a simple and flexible simulator for spiking neural networks. The Neuromorphic Engineer, July 2009.

Giacomo Indiveri, Bernabé Linares-Barranco, Tara Julia Hamilton, André van Schaik, Ralph Etienne-Cummings, Tobi Delbruck, Shih-Chii Liu, Piotr Dudek, Philipp Häfliger, Sylvie Renaud, Johannes Schemmel, Gert Cauwenberghs, John Arthur, Kai Hynna, Fopefolu Folowosele, Sylvain Saighi, Teresa Serrano-Gotarredona, Jayawan Wijekoon, Yingxue Wang, and Kwabena Boahen. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience, 5, 2011.

Runchun M. Wang, Chetan S. Thakur, and André van Schaik. An FPGA-based massively parallel neuromorphic cortex simulator. Frontiers in Neuroscience, 12, April 2018.

Romain Brette and Dan F. M. Goodman. Simulating spiking neural networks on GPU. Network: Computation in Neural Systems, 23(4):167-182, October 2012.

Jason Sanders and Edward Kandrot. CUDA by Example: An Introduction to General-Purpose GPU Programming. Addison-Wesley, Upper Saddle River, NJ, 2010.

Jesus A. Garrido, Richard R. Carrillo, Niceto R. Luque, and Eduardo Ros. Event and time driven hybrid simulation of spiking neural networks. In Advances in Computational Intelligence, pages 554-561. Springer Berlin Heidelberg, 2011.

Julien Vitay, Helge Ü. Dinkelbach, and Fred H. Hamker. ANNarchy: a code generation approach to neural simulations on parallel hardware. Frontiers in Neuroinformatics, 9, July 2015.

Ting-Shuo Chou, Hirak J. Kashyap, Jinwei Xing, Stanislav Listopad, Emily L. Rounds, Michael Beyeler, Nikil Dutt, and Jeffrey L. Krichmar. CARLsim 4: An open source library for large scale, biologically detailed spiking neural network simulation using heterogeneous clusters. IEEE, July 2018.

James C. Knight and Thomas Nowotny. GPUs outperform current HPC and neuromorphic solutions in terms of speed and energy when simulating a highly-connected cortical model. Frontiers in Neuroscience, 12, December 2018.

Tobias C. Potjans and Markus Diesmann. The cell-type specific cortical microcircuit: Relating structure and activity in a full-scale spiking network model. Cerebral Cortex, 24(3):785-806, December 2014.

Bruno Golosio, Chiara De Luca, Elena Pastorelli, Francesco Simula, Gianmarco Tiddia, and Pier Stanislao Paolucci. Toward a possible integration of NeuronGPU in NEST. In NEST Conference 2020, page 7, 2020.

Nicolas Brunel. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8(3):183-208, 2000.

Romain Brette and Wulfram Gerstner. Adaptive exponential integrate-and-fire model as an effective description of neuronal activity. Journal of Neurophysiology, 94(5):3637-3642, November 2005.

Stefan Rotter and Markus Diesmann. Exact digital simulation of time-invariant linear systems with applications to neuronal modeling. Biological Cybernetics, 81(5-6):381-402, November 1999.

William H. Press and Saul A. Teukolsky. Adaptive stepsize Runge-Kutta integration. Computers in Physics, 6(2):188, 1992.

Abigail Morrison, Markus Diesmann, and Wulfram Gerstner. Phenomenological models of synaptic plasticity based on spike timing. Biological Cybernetics, 98(6):459-478, May 2008.

A. Roth and M. C. W. van Rossum. Modeling synapses. In E. D. Schutter, editor, Computational Modeling Methods for Neuroscientists, chapter 6, pages 266-290. MIT Press, Cambridge, MA, 2013.

Sacha J. van Albada, Andrew G. Rowley, Johanna Senk, Michael Hopkins, Maximilian Schmidt, Alan B. Stokes, David R. Lester, Markus Diesmann, and Steve B. Furber. Performance comparison of the digital neuromorphic hardware SpiNNaker and the neural network simulation software NEST for a full-scale cortical microcircuit model. Frontiers in Neuroscience, 12, May 2018.

Alper Yegenoglu et al. Elephant (Electrophysiology Analysis Toolkit) 0.7.0, 2020.

Murray Rosenblatt. Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27(3):832-837, September 1956.

Emanuel Parzen. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065-1076, September 1962.

B. W. Silverman. Density estimation for statistics and data analysis. Chapman and Hall, London, 1986.

S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, 22(1):79-86, 1951.