An Online Unsupervised Structural Plasticity Algorithm for Spiking Neural Networks
Subhrajit Roy, Student Member, IEEE, and Arindam Basu, Member, IEEE
Abstract—In this article, we propose a novel Winner-Take-All (WTA) architecture employing neurons with nonlinear dendrites and an online unsupervised structural plasticity rule for training it. Further, to aid hardware implementations, our network employs only binary synapses. The proposed learning rule is inspired by spike time dependent plasticity (STDP) but differs for each dendrite based on its activation level. It trains the WTA network through formation and elimination of connections between inputs and synapses. To demonstrate the performance of the proposed network and learning rule, we employ it to solve two, four and six class classification of random Poisson spike time inputs. The results indicate that, by proper tuning of the inhibitory time constant of the WTA, a trade-off between specificity and sensitivity of the network can be achieved. We use the inhibitory time constant to set the number of subpatterns per pattern we want to detect. We show that while the percentage of successful trials is 92%, 88% and 82% for two, four and six class classification when no pattern subdivisions are made, it increases to 100% when each pattern is subdivided into 5 or 10 subpatterns. However, the former scenario of no pattern subdivision is more jitter resilient than the latter ones.
I. INTRODUCTION AND MOTIVATION
The WTA is a computational framework in which a group of recurrent neurons cooperate and compete with each other for activation. The computational power of WTA [1]–[3] and its function in cortical processing [1], [4] have been studied in detail. Various models and hardware implementations of WTA have been proposed for both rate [5]–[12] and spike based [13]–[15] neural networks. In the recent past, researchers have looked into the application of the STDP learning rule to WTA circuits. The performance of competitive spiking neurons trained with STDP has been studied for different types of input, such as discrete spike volleys [16]–[18], periodic inputs [19], [20] and inputs with random intervals [15], [21], [22].

In this paper, for the first time, we propose a Winner-Take-All (WTA) network which uses neurons with nonlinear dendrites (NNLD) and binary synapses as the basic computational units. This architecture, which we refer to as Winner-Take-All employing Neurons with NonLinear Dendrites (WTA-NNLD), uses a novel branch-specific Spike Timing Dependent Plasticity based Network Rewiring (STDP-NRW) learning rule for its training. We have earlier presented [23] a branch-specific STDP rule for batch learning of a supervised classifier constructed of NNLDs. The primary differences of our current approach from [23] are:
• We present an unsupervised learning rule for training a WTA network.
• We propose an online learning scheme where connection modifications occur after the presentation of each pattern.

In this article we consider spike train inputs with patterns occurring in random order, the same type of input presented in [15]. The primary differences between our work and the one proposed in [15] are:

• Our WTA network is composed of neurons with nonlinear dendrites instead of traditional neurons with no dendrites.
• Unlike the network proposed in [15], which requires high-resolution weights, the proposed network uses low-resolution non-negative integer weights and trains itself by modifying the connections of inputs to dendrites. Hence, a change of the 'morphology' or structure of the neuron (in terms of connectivity pattern) reflects the learning. This results in easier hardware implementation, since a low-resolution non-negative integer weight of W can be implemented by activating a shared binary synapse W times through time multiplexing schemes like address event representation (AER) [24], [25].
• In [15], though the neurons were allowed to learn and respond to subpatterns, there was no actual guideline or control parameter to set the number of subpatterns to be learned. Here we utilize the slow time constant of the inhibitory signal to select the number of subpatterns we want to divide a pattern into.

In the following section, we will present an overview of NNLD, propose the WTA-NNLD architecture and the STDP-NRW learning rule, and show how the inhibitory slow time constant can be used to select subpatterns within a pattern. Then we shall provide guidelines on selecting the parameters associated with WTA-NNLD and STDP-NRW. In Section IV we will describe the classification task considered in this article, followed by the results. We will also present the robustness of the proposed method to variations of parameters in Section V, a quality that is essential for its implementation in low-power, subthreshold neuromorphic designs that are plagued with mismatch. We will conclude the paper by discussing the implications of our work and future directions in the last section.

Manuscript received Nov. 15, 2014. The authors are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: [email protected]). This work was supported by MOE through grants RG 21/10 and ARC 8/13.

II. BACKGROUND AND THEORY
In this section, we shall first present the working principle of a NNLD. This will be followed by a description of the WTA-NNLD architecture and the STDP-NRW learning rule. Lastly, we will throw some light on the role of the inhibitory time constant in balancing the specificity and sensitivity of the network.

Fig. 1: A neuronal cell with nonlinear dendrites.
A. Neuron with nonlinear dendrites (NNLD)
The computational model of NNLD was first proposed by Mel et al. in [26]. They showed that such neurons have higher storage capacity than their non-dendritic counterparts. They used two such NNLDs to construct a supervised classifier and demonstrated its performance in pattern memorization. Recently, NNLD has also been employed to develop computationally powerful rate [27] and spike based [28], [29] supervised classifiers. The structure of NNLD is isomorphic to a feedforward spiking neural network with a single layer of hidden neurons and one output neuron [30]. The lumped dendritic nonlinearities b() are equivalent to the hidden neurons interposed between the input and output layers. However, spiking neurons implement nonlinear thresholding, integration, refractory period etc.; hence a spiking neuron is typically a much larger circuit than the square law nonlinearity of a dendrite, which makes the NNLD an area efficient architecture.

As depicted in Fig. 1, a NNLD consists of m dendritic branches having lumped nonlinearities, with each branch containing k excitatory synaptic contact points of weight 1. If we consider a d dimensional input pattern, then each synapse is driven by any one of these input dimensions, where d >> k. We use the Leaky Integrate-and-Fire (LIF) model to generate output spikes. Thus, the neuronal membrane voltage is governed by the following differential equation:

$$C\,\frac{dV(t)}{dt} + \frac{V(t)}{R} = I_{in}(t); \quad \text{if } V(t) \ge V_{thr}:\; V(t) \to 0 \text{ and a spike is emitted in } f(t), \text{ else } f(t) = 0 \quad (1)$$

where V(t), V_thr, I_in(t) and f(t) are the membrane voltage, threshold voltage, input current and output spikes of the NNLD respectively. Let us denote the input spike train arriving at the i-th input line as e_i(t), which is given by:

$$e_i(t) = \sum_g \delta(t - t_{ig}) \quad (2)$$

where g = 1, 2, ... is the label of the spike. Then, the input current I_in(t) to the neuron can be calculated as:

$$I_{in}(t) = \sum_{j=1}^{m} I^j_{b,out}(t) \quad (3)$$

$$I^j_{b,out}(t) = b\big(I^j_{b,in}(t)\big) \quad (4)$$

$$I^j_{b,in}(t) = \sum_{i=1}^{d} w_{ij} \Big(\sum_{t_{ig} < t} K(t - t_{ig})\Big) \quad (5)$$

where w_ij is the non-negative integer number of synaptic connections from the i-th input line to the j-th branch and K(t) is the excitatory post-synaptic current (PSC) kernel with normalization constant I_0 and fast and slow time constants τ_f and τ_s.
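To make the dynamics concrete, the following minimal NumPy sketch simulates Equations (1)–(5) for a single NNLD, assuming a square-law nonlinearity b(z) = z²/x_thr and a double-exponential PSC kernel. All parameter values and helper names here are illustrative assumptions, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed values, not taken from the paper)
d, m, k = 100, 25, 4                 # input lines, dendrites, synapses/dendrite
dt, T_p = 1e-4, 0.5                  # simulation step and pattern duration (s)
tau_s, tau_f, I0 = 0.01, 0.001, 1.0  # excitatory PSC kernel constants
C_m, R, V_thr, x_thr = 1.0, 1.0, 1.0, 2.0

def psc_kernel(t):
    """Double-exponential excitatory PSC kernel K(t), zero for t < 0."""
    t = np.maximum(t, 0.0)
    return I0 * (np.exp(-t / tau_s) - np.exp(-t / tau_f))

def b(z):
    """Lumped square-law dendritic nonlinearity, b(z) = z^2 / x_thr."""
    return z * z / x_thr

# Random structural connectivity: conn[j, p] = input line driving synapse p of branch j
conn = rng.integers(0, d, size=(m, k))

# One pattern: a Poisson spike train per input line (Eq. 2)
rate = 20.0
spikes = [np.sort(rng.uniform(0, T_p, rng.poisson(rate * T_p))) for _ in range(d)]

t_axis = np.arange(0.0, T_p, dt)
# Filtered trace of every input line: sum_g K(t - t_ig)
traces = np.stack([psc_kernel(t_axis[:, None] - s[None, :]).sum(axis=1)
                   for s in spikes])

V, out_spikes = 0.0, []
for step, t in enumerate(t_axis):
    I_branch = traces[conn, step].sum(axis=1)  # Eq. (5): branch input currents
    I_in = b(I_branch).sum()                   # Eqs. (3)-(4): nonlinear branch outputs
    V += dt * (I_in - V / R) / C_m             # Eq. (1): leaky integration
    if V >= V_thr:                             # threshold crossing: spike and reset
        out_spikes.append(t)
        V = 0.0

print(f"NNLD fired {len(out_spikes)} spike(s) in {T_p} s")
```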
B. The WTA-NNLD architecture

We propose a spike based WTA network, depicted in Fig. 2, which is composed of N such NNLDs. Each NNLD is composed of m dendrites, where each dendrite chooses (repetition allowed) k of the d available input lines and connects to them through k synapses having weight 1. The membrane voltage, threshold voltage and input current of the n-th NNLD are denoted by V^n(t), V_thr and I^n_in(t) respectively, and their dynamics is governed by Equation (1). For the n-th NNLD, while the pre-synaptic spike train arriving at the i-th input line is denoted by e_i(t) as before, the emitted output spike train is given by f^n(t) = Σ_a δ(t − t^n_a). Note that for any applied input pattern, t^n_a is measured from its beginning.

We have modelled the effect of lateral inhibition by providing each NNLD with a global inhibitory current signal I_inh(t) supplied by a single inhibitory neuron N_inh through synapses. The signal I_inh(t) is provided by the inhibitory neuron to all the NNLDs whenever any one of them fires an output spike. I_inh(t) is modeled as I_inh(t) = K_inh(t − t^n_last), where the last post-synaptic spike is produced by the n-th NNLD at t^n_last. The inhibitory post-synaptic kernel, K_inh, is given by:

$$K_{inh}(t) = I_{0,inh}\left(e^{-t/\tau_{s,inh}} - e^{-t/\tau_{f,inh}}\right) \quad (8)$$

where τ_f,inh and τ_s,inh are the fast and slow time constants dictating the rise and fall times of the inhibitory current respectively, and I_0,inh sets its amplitude.

Fig. 2: A spike based WTA network employing neurons with lumped dendritic nonlinearities as the competing entities. For implementing lateral inhibition, an inhibitory neuron has been included which, upon activation, provides a global inhibition signal to all the NNLDs.
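A minimal sketch of the shared inhibition signal of Equation (8) follows; the gating logic and all constants are our assumptions for illustration.

```python
import numpy as np

# Assumed kernel constants, for illustration only
I0_inh, tau_s_inh, tau_f_inh = 5.0, 0.1, 0.001

def K_inh(t):
    """Global inhibitory post-synaptic kernel of Eq. (8), zero for t < 0."""
    t = np.maximum(t, 0.0)
    return I0_inh * (np.exp(-t / tau_s_inh) - np.exp(-t / tau_f_inh))

def net_current(I_exc_n, t, t_last):
    """Net drive to the n-th NNLD: its excitatory input minus the inhibition
    triggered by the most recent spike fired anywhere in the WTA (t_last)."""
    return I_exc_n - (K_inh(t - t_last) if t_last is not None else 0.0)

# Whenever any NNLD crosses threshold at time t, t_last is updated to t for
# every neuron, so a large tau_s_inh silences the whole network afterwards.
```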
C. Spike Timing Dependent Plasticity based Network Rewiring learning rule (STDP-NRW)

Since we consider binary synapses with weight 0 or 1, we do not have the provision to keep real valued weights associated with them. Hence, to guide the unsupervised learning, we define a correlation coefficient based fitness value c^n_pj(t) for the p-th synaptic contact point on the j-th dendrite of the n-th NNLD of the WTA network, as a substitute for its weight. In the proposed algorithm, structural plasticity or connection modification happens on longer timescales (at the end of patterns) and is guided by the fitness function c^n_pj(t), which is updated by an STDP inspired rule on shorter timescales (at each pre- and post-synaptic spike). The operation of the network and the learning process comprise the following steps whenever a pattern is presented (a condensed simulation sketch is given at the end of this subsection):

• c^n_pj(t) is initialized as c^n_pj(t = 0) = 0 ∀ p = 1, 2, ..., k; j = 1, 2, ..., m and n = 1, 2, ..., N.
• The value of c^n_pj(t) is depressed at pre-synaptic and potentiated at post-synaptic spikes according to the following rule:
1) Depression: If a pre-synaptic spike occurs at the p-th synapse on the j-th dendritic branch of the n-th NNLD at time t_pre, then the value of c^n_pj(t) at t = t_pre is updated by a quantity Δc^n_pj(t = t_pre) given by:

$$\Delta c^n_{pj}(t) = -\,b'_j(t)\,\bar{f}^n(t)\,\Big|_{t=t_{pre}} \quad (9)$$

where f̄^n(t) = K(t) ∗ f^n(t) is the post-synaptic trace of the n-th NNLD and b'() denotes the derivative of the nonlinear function b().
2) Potentiation: If the n-th NNLD of the WTA-NNLD network fires a post-synaptic spike at time t_post, then c^n_pj(t) at t = t_post ∀ p = 1, 2, ..., k; j = 1, 2, ..., m, i.e. for each synapse connected to the n-th NNLD, is updated by Δc^n_pj(t = t_post) given by:

$$\Delta c^n_{pj}(t) = b'_j(t)\,\bar{e}_i(t)\,\Big|_{t=t_{post}} \quad (10)$$

where ē_i(t) = K(t) ∗ e_i(t) is the pre-synaptic trace of the corresponding input line connected to it. A pictorial explanation of this update rule of c^n_pj(t) is shown in Fig. 3. Note that for a square law nonlinearity, b'(z) ∝ z and can hence be easily computed in hardware without requiring any extra circuitry to calculate the derivative.
• During the presentation of the pattern, whenever a spike is produced by any of the N excitatory NNLDs, the inhibitory neuron N_inh sends an inhibitory signal to all the NNLDs of the WTA.
• After the network has been integrated over the current pattern of duration T_p, the synaptic connections of the NNLDs which have produced at least one spike are modified.
• If we consider that Q out of N NNLDs have produced post-synaptic spike(s) for the current pattern, then the connectivity of the q-th NNLD ∀ q = 1, ..., Q is updated by tagging the synapse (s^q_min) having the lowest value of correlation coefficient at t = T_p out of the m × k synapses connected to it for possible replacement.
• To aid the unsupervised learning process, randomly chosen sets R^q containing n_R of the d input dimensions are forced to make silent synapses of weight 1 on the dendritic branch of s^q_min ∀ q = 1, ..., Q. We term these synapses "silent" since they do not contribute to the computation of V^n(t), so they do not alter the classification when the same pattern set is re-applied. The value of c^q_pj(t = T_p) is calculated for the synapses in R^q and the synapse having the maximum c^q_pj(t = T_p) in R^q, denoted by r^q_max ∀ q = 1, ..., Q, is identified. Next, the input line connected to s^q_min is swapped with the input line connected to r^q_max. Hence, instead of the traditional method of training by changing high-resolution synaptic weights, our learning rule modifies the connections between the inputs and dendrites based on the fitness values.
• All the c^n_pj(t) values are reset to zero and the above mentioned steps are repeated whenever a pattern is presented. Here, we define an epoch for C class classification as a set of patterns consisting of one pattern from each of the C classes, in random order. We define another term l_mean as the average of the latencies of the post-synaptic spikes in the network over the last epoch, given by:

$$l_{mean} = \Big\langle \sum_n \sum_a t^n_a \Big\rangle \quad (11)$$

where ⟨·⟩ denotes averaging over one epoch. We note the value of l_mean for every epoch, and the learning is considered to have converged when the value of a 'Convergence Measure' (CM) based on l_mean reaches saturation. We define our Convergence Measure in Section III.

Fig. 3: An example of the update rule of the fitness value c^n_pj(t). When a post-synaptic spike occurs at t_post, the value of c^n_pj(t) increases by b'_j(t_post) ē_i(t_post). Due to the appearance of a pre-synaptic spike at t_pre, c^n_pj(t) reduces by b'_j(t_pre) f̄^n(t_pre).
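The sketch below condenses the fitness update (Eqs. 9–10) and the end-of-pattern rewiring step for one NNLD. We approximate the fitness of the silent synapses in the replacement set R by a per-candidate score array; helper names, shapes and this approximation are ours, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(1)

m, k, d, n_R, x_thr = 25, 4, 100, 10, 2.0
conn = rng.integers(0, d, size=(m, k))   # input line feeding each synapse
c = np.zeros((m, k))                     # fitness values c_pj for one NNLD

def b_prime(z):
    """Derivative of the square law b(z) = z^2/x_thr; b'(z) is proportional
    to z, so no extra hardware is needed to compute it."""
    return 2.0 * z / x_thr

def on_pre_spike(j, p, branch_in, f_trace):
    """Eq. (9): depress c at a pre-synaptic spike on synapse (j, p).
    branch_in[j] is that branch's input current and f_trace the neuron's
    filtered output trace, both sampled at t_pre."""
    c[j, p] -= b_prime(branch_in[j]) * f_trace

def on_post_spike(branch_in, e_traces):
    """Eq. (10): potentiate every synapse of the NNLD at a post-synaptic
    spike; e_traces holds the filtered trace of each input line at t_post."""
    c[:] += b_prime(branch_in)[:, None] * e_traces[conn]

def rewire_after_pattern(candidate_fitness):
    """Structural update at t = T_p: tag the synapse s_min with the lowest
    fitness and swap its input line for the best of n_R random candidate
    lines (the silent synapses of set R) evaluated on the same branch."""
    j, p = np.unravel_index(np.argmin(c), c.shape)
    R = rng.choice(d, size=n_R, replace=False)
    conn[j, p] = R[np.argmax(candidate_fitness[R])]
    c[:] = 0.0    # reset all fitness values before the next pattern
```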
D. Specificity and Sensitivity: Role of Inhibitory Time Constant

When a pattern is presented to the WTA-NNLD and any one of the N NNLDs produces an output spike, a global inhibition current I_inh(t) is injected into all the N NNLDs. The slow time constant τ_s,inh of this signal controls the output firing activity of the WTA-NNLD. Typically, a large value of τ_s,inh (w.r.t. T_p) is set, and only one NNLD produces an output spike, i.e. patterns of the same class are encoded by a single NNLD. The post-synaptic spike latency for a pattern P is defined as the time difference between the start of the pattern and the first spike produced by any one of the N neurons of WTA-NNLD. During training of WTA-NNLD for this case, different NNLDs get locked onto different classes of pattern and the latency gradually decreases until the end of the training. Thus, after completion of training, the unique NNLDs which have learned different classes of pattern rely only on the first few spikes (determined by the latency at the end of training) to predict the pattern's class, thereby significantly reducing the prediction time [15]. So, the sensitivity of the network is increased. However, the problems with this approach are:

• The percentage of successful classifications can be low, due to the strict requirement of different neurons firing based only on the first few spikes of different patterns (shown in Section IV).
• Though the prediction time of a pattern's class is significantly reduced, this method neglects most of the pattern after the first few spikes, which may lead to a lot of false detections.

Fig. 4: Specificity is reduced if only one NNLD encodes a pattern based on its first few spikes. As shown, a different pattern with a section resembling the beginning of a class 1 pattern may cause neuron N_f to respond.

We demonstrate the limitation mentioned in the second point by a simple example in Fig. 4. Let us consider that we are performing C class classification and assume that after the training phase is complete, NNLD N_f responds to patterns belonging to Class 1. NNLD N_f has trained itself to provide an output spike depending on the position of the first few spikes (red spikes in the dashed box of Fig. 4) of the pattern. It neglects the rest of the pattern while providing a prediction. However, for longer patterns there is a chance that this spike set can occur anywhere inside a random pattern (not belonging to any class, or belonging to another class). The same NNLD N_f responds to such patterns by producing a post-synaptic spike. Thus, we see that though the trained WTA-NNLD is very sensitive in this case, it loses specificity. On the other hand, if we set a moderate value of τ_s,inh, then for a single pattern multiple NNLDs are capable of producing output spikes.
Hence, patterns of the same class are now encoded by a sequence of successive firings of a few NNLDs, where each NNLD fires for one subpattern. Let n_sub be the number of subpatterns, which is set by a proper choice of τ_s,inh. Thus the original case of one NNLD firing for each pattern corresponds to n_sub = 1. In this article, for C class classification we define a successful trial as one in which (a) during the training phase WTA-NNLD learns different unique representations for patterns of different classes, and (b) after completion of training and achieving success in (a), the network produces the same representation when presented with testing patterns corresponding to classes that it had learned during the training phase. When n_sub = 1, i.e. no pattern subdivisions are made, this unique representation is a different neuron firing for different classes of patterns. When n_sub > 1, the unique representation is a different sequence of successive NNLD firings for different classes of patterns; in this case we allow the NNLDs to detect subpatterns within patterns. Since in this approach the WTA-NNLD gives weightage to the entire pattern before predicting its class, the number of false detections can be largely reduced. However, this method has the limitation of being less jitter resilient: any one of the many subpatterns can easily be corrupted by noisy spike jitter (shown in Section IV) and fail to produce a unique identifier during the testing phase. Hence, the choice of n_sub, and consequently the inhibitory time constant, depends on the amount of temporal jitter in the application.

III. CHOICE OF PARAMETERS

The following is an exhaustive list of the parameters used by WTA-NNLD and STDP-NRW:
1) T_p: Duration of a pattern
2) d: Dimension of the input
3) m: Number of dendrites per NNLD
4) k: Number of synapses per dendrite
5) n_R: Number of input dimensions in the replacement set
6) τ_s and τ_f: Slow and fast time constants of the excitatory current kernel
7) I_0: Normalization constant of the excitatory current kernel
8) τ_s,inh and τ_f,inh: Slow and fast time constants of the inhibitory current kernel
9) I_0,inh: Normalization constant of the inhibitory current kernel
10) x_thr: Threshold of the dendritic nonlinearity
11) V_thr: Firing threshold voltage of an NNLD
12) N: Number of NNLDs in the WTA
13) C: Number of classes of patterns

We will now provide some guidelines on choosing the key parameters:

a) Total number of synapses per NNLD (s): The number of synapses allocated to each neuronal cell of WTA-NNLD is kept equal to the dimension (d) of the input patterns. This is done to ensure the NNLD uses the same amount of synaptic resources as the simplest neuron, a perceptron. Thus, if the proposed network is comprised of N such neuronal cells, then the total number of synaptic resources required is d × N.

b) Number of dendrites per NNLD (m): In [26] a measure of the pattern memorization capacity, B_N, of the NNLD (Fig. 1) has been provided by counting all possible functions realizable as:

$$B_N = \log_2 \binom{\binom{k+d-1}{k} + m - 1}{m} \text{ bits} \quad (12)$$

where m, k and d are the number of dendrites, the number of synapses per dendrite and the dimension of the input respectively for this neuronal cell. When a new classification problem is encountered, we first note down the value of d, which in turn sets our s since we have considered s = d. Since s = m × k, for a fixed s all possible values which m can take are factors of s. We calculate B_N for these values of m by Equation (12). The value of m for which B_N attains its maximum is set as m in our experiments (a sketch of this selection follows below). As an example, we show in Fig. 5 the variation of B_N with m when d = 100. It is evident from the curve that the capacity is maximum when m = 25, and so in our simulations for classifying 100 dimensional patterns we employ neuronal cells having 25 dendrites.

Fig. 5: The pattern memorization capacity of a NNLD (B_N) plotted as a function of the number of dendrites (m) for a fixed number of input dimensions (d = 100) and synapses (s = 100).
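The selection of m via Equation (12) can be scripted directly with exact integer arithmetic; a short sketch:

```python
from math import comb, log2

def capacity_bits(m, k, d):
    """Eq. (12): B_N = log2( C( C(k+d-1, k) + m - 1, m ) ) bits, where
    C(k+d-1, k) counts the distinct input combinations a k-synapse branch
    can form (repetition allowed)."""
    f = comb(k + d - 1, k)
    return log2(comb(f + m - 1, m))

def best_m(s, d):
    """Try every m that divides s (so that k = s/m is an integer) and
    return the one maximizing B_N."""
    divisors = [m for m in range(1, s + 1) if s % m == 0]
    return max(divisors, key=lambda m: capacity_bits(m, s // m, d))

print(best_m(100, 100))   # -> 25, matching the maximum shown in Fig. 5
```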
c) Number of synapses per branch (k): After s and m have been set, the value of k can be computed as k = s/m.

d) The normalization constant (I_0), slow (τ_s) and fast (τ_f) time constants of the excitatory PSC kernel: The fast time constant (τ_f) and slow time constant (τ_s) have been defined in Section II-C. In hardware implementations of a synapse [29], τ_f usually takes a small positive value and is typically not tuned. The slow time constant, τ_s, is responsible for integration across temporally correlated spikes, and the performance of the network is dependent on its value. If τ_s takes too small a value, then the post-synaptic current due to individual spikes dies down rapidly and thus temporal integration of separated inputs does not take place. On the other hand, large values of τ_s render all spikes effectively simultaneous. So, in both extremes the extraction of temporal features from the input pattern is hampered. In [29] we have provided a mathematical formula for calculating τ_s,opt, the optimal value of τ_s, with respect to the inter spike interval (ISI) of the input pattern for which optimal performance of the network is obtained. If we are considering d dimensional patterns and the mean firing rate of each dimension is µ_f, then the mean ISI across the entire pattern is given by µ_ISI = 1/(d × µ_f). We can then set τ_s,opt according to the linear formula:

$$\tau_{s,opt} = 52.\,\mu_{ISI} - . \quad (13)$$

In our simulations, we keep τ_f as a fixed small fraction of τ_s. Since the weights of all the active synapses are 1, we set I_0 so as to normalize the amplitude of the PSC.

e) Threshold of nonlinearity (x_thr): During the training of WTA-NNLD, the STDP-NRW rule preferentially selects those connection topologies where correlated inputs are connected to the same branch. Thus, the lumped dendritic nonlinearity b(z) = z²/x_thr should give a supra-linear output only when correlated input dimensions are connected to the dendrite. To ensure this, we keep the value of x_thr equal to the average input to the nonlinear function in the case of random connections. We create numerous instances of dendrites having k synapses and calculate the average input to the nonlinear function, b_in,avg, for the pattern set at hand. Then we set x_thr = b_in,avg.

f) V_thr of NNLD: The NNLD should provide a post-synaptic spike only when correlated inputs have been connected to its dendrites. We consider a NNLD having m dendrites and k synapses per dendrite and create numerous instances of random connections to these synapses. We measure the average value of the maximum membrane voltage ((V_max)_av) produced when this NNLD is integrated over the pattern duration for all these instances, and set V_thr = (V_max)_av.
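The two averaging procedures in e) and f) can be sketched as below, reusing the filtered input traces from the earlier NNLD sketch; the instance count and all names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def calibrate_thresholds(traces, m=25, k=4, dt=1e-4, C_m=1.0, R=1.0,
                         n_inst=100):
    """traces: (d, n_steps) filtered input traces for the pattern set.
    Returns x_thr (mean branch input under random wiring) and V_thr
    (mean peak membrane voltage of random, threshold-free NNLDs)."""
    d, n_steps = traces.shape
    # e) x_thr = average input to b() over random k-synapse dendrites
    branch_in = np.stack([traces[rng.integers(0, d, k)].sum(axis=0)
                          for _ in range(n_inst)])
    x_thr = float(branch_in.mean())
    # f) V_thr = average peak of V(t) over random NNLD instances
    peaks = []
    for _ in range(n_inst):
        conn = rng.integers(0, d, size=(m, k))
        I_in = (traces[conn].sum(axis=1) ** 2 / x_thr).sum(axis=0)
        V, V_peak = 0.0, 0.0
        for I in I_in:                     # integrate Eq. (1) w/o threshold
            V += dt * (I - V / R) / C_m
            V_peak = max(V_peak, V)
        peaks.append(V_peak)
    return x_thr, float(np.mean(peaks))
```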
g) The normalization constant (I_0,inh), slow (τ_s,inh) and fast (τ_f,inh) time constants of I_inh(t): The post-synaptic firing activity of the WTA-NNLD network is dependent on τ_s,inh and I_0,inh. To simulate the hardware scenario, we set τ_f,inh to a small fraction of τ_s,inh. To set I_0,inh and τ_s,inh, we first excite WTA-NNLD with ep_ini epochs of patterns prior to training and calculate the average excitatory current (I_e,av) to the NNLDs as:

$$I_{e,av} = \left\langle \frac{1}{N}\sum_{n=1}^{N} I^n_{in}(t) \right\rangle_{ep_{ini}} \quad (14)$$

where ⟨·⟩_{ep_ini} denotes averaging over ep_ini epochs. The idea is to generate an I_inh(t) which, if provided by N_inh at the beginning of a subpattern, decays exponentially to I_e,av by the end of the subpattern, i.e. after a time T_sub has elapsed. This ensures that once a post-synaptic spike is generated by an NNLD in a particular T_sub time window, other NNLDs are unable to fire during that same window. Assuming τ_f,inh << τ_s,inh, we can derive that the required I_inh(t) is implemented by setting τ_s,inh as:

$$\tau_{s,inh} = \frac{T_{sub}}{\ln\left(I_{0,inh}/I_{e,av}\right)} \quad (15)$$

Note that τ_s,inh has an inverse logarithmic relation to I_0,inh.

h) Convergence Measure (CM): The formula for calculating CM, applicable to both n_sub = 1 and n_sub > 1, for detecting the convergence of learning is given by:

$$CM = \frac{l_{mean}}{n_{sub}} - \frac{(n_{sub}-1)}{2}\,T_{sub} \quad (16)$$

Note that for n_sub = 1, CM = l_mean, and so it computes the time-to-first-spike for patterns averaged over an epoch. For n_sub > 1, CM calculates the average time-to-first-spike from the beginning of each subpattern of the C patterns of an epoch. We consider that the learning has converged when the value of CM saturates.

Fig. 6: (a) The percentage of successful trials plotted against N/C for two-class, four-class and six-class classification. As N/C increases, the percentage of successful trials also increases and becomes constant after N/C = 11. (b) The evolution of CM (averaged over 50 trials) with the number of epochs for two-class, four-class and six-class classification for n_sub = 1. (c) The percentage of successful trials plotted against σ_jitter/τ_s for n_sub = 1, 5 and 10. As the number of subpattern divisions increases, the jitter/noise robustness of the network decreases.

IV. EXPERIMENTS AND RESULTS

In this section, we will describe the classification task considered in this article. To show how the classification performance generalizes to multiple classes, we will consider two, four and six class classification. We will show the performance of WTA-NNLD and STDP-NRW for three values of n_sub, given by n_sub = 1, 5 and 10.

A. Problem Description

The benchmark task we have selected to analyze the performance of the proposed method is the Spike Train Classification problem [31]. In the generalized Spike Train Classification problem, C arrays of h Poisson spike trains having frequency f and length T_p are present, which are labeled as classes 1 to C. Jittered versions of these templates are created by altering the position of each spike within the templates by a random amount drawn from a Gaussian distribution with zero mean and standard deviation σ_jitter. The network is trained on these jittered versions of the spike trains, and the task is to correctly identify a pattern's class. In this article, unless otherwise mentioned, we have considered h = 100 (with Poisson spike trains present in each afferent), f = 20 Hz and T_p = 0.5 sec, and varied C and σ_jitter. Inspired by [15], we also consider the scenario where h/2 randomly chosen afferents do not contain any spikes, while the remaining h/2 afferents are Poisson spike trains.
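A sketch of the benchmark generation described above, including the variant with half the afferents silent; clipping the jittered spikes to the pattern window is our assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_templates(C=4, h=100, f=20.0, T_p=0.5, empty_frac=0.0):
    """C class templates of h Poisson spike trains at rate f and length T_p.
    With empty_frac = 0.5, half the afferents (chosen at random) are silent."""
    templates = []
    for _ in range(C):
        silent = rng.permutation(h) < int(empty_frac * h)
        templates.append([np.array([]) if silent[i] else
                          np.sort(rng.uniform(0, T_p, rng.poisson(f * T_p)))
                          for i in range(h)])
    return templates

def jittered(template, sigma_jitter, T_p=0.5):
    """Displace every spike by zero-mean Gaussian noise of s.d. sigma_jitter,
    clipped to the pattern window."""
    return [np.clip(s + rng.normal(0.0, sigma_jitter, s.shape), 0.0, T_p)
            for s in template]

templates = make_templates(C=4)               # the class templates
train_pattern = jittered(templates[0], 1e-3)  # one jittered training pattern
```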
B. Case 1: n_sub = 1

In this case we have T_sub = T_p, so one NNLD is capable of firing only once when a pattern is presented. Considering σ_jitter = 0, we have varied the number of NNLDs and noted the percentage of successful trials, which is depicted in Fig. 6(a). To make the horizontal axis invariant to the number of classes, we have taken N/C as the horizontal axis. From the figure we can conclude that the percentage of successful trials gradually increases with an increase in N/C and finally becomes constant after N/C = 11. Thus, unless otherwise mentioned, we will keep N = 11 × C when n_sub = 1. It can also be seen from Fig. 6(a) that the percentage of successful trials cannot go beyond 92%, 88% and 82% for two, four and six class classification respectively, even after CM saturates. For n_sub = 1, CM = l_mean is the average of the time-to-first-spikes for patterns in an epoch. As an example, we consider a particular trial of four-class classification and show in Fig. 7 that during training, the latencies of the four NNLDs (N_f1, N_f2, N_f3 and N_f4) which uniquely recognize the four classes of patterns gradually reduce until reaching a saturation point. Moreover, in Fig. 6(b) we show the epochwise evolution of CM averaged over 50 trials for two-class, four-class and six-class classification. It is evident from the figure that the value of CM decreases, thereby showing that the algorithm is favoring correlated inputs such that the post-synaptic spikes can occur faster, and finally saturates after some epochs have passed. We denote the number of epochs taken by the algorithm for saturation of CM as ep_sat and note its value for 50 trials. The average value of ep_sat over 50 trials, ep_sat,avg, is computed to be 149, 157 and 165 for two-class, four-class and six-class classification respectively. Moreover, this phenomenon clearly indicates that while the WTA-NNLD network is being trained by the STDP-NRW learning rule, the C unique NNLDs which have locked onto the C different classes of pattern are trying to recognize the start of repeating patterns of different classes. Fig. 6(b) also suggests that after the training has stopped, these C unique NNLDs, instead of looking at the whole pattern of duration T_p = 500 ms, can now look only at the starting portion of the patterns (given by the converged latencies for C = 2, 4 and 6) to predict the class.

Fig. 7: For four class classification, out of 44 NNLDs, four unique NNLDs lock onto the four pattern classes (one panel per class), and the latency of all four of them decreases over epochs until saturation.

Let us now consider the effect of jitter: we show in Fig. 6(c) the performance of the proposed method when the intensity of jitter is varied. Next, we look into the performance of the proposed method when patterns with 50% empty afferents are considered.
Fig. 8(a) depicts the results obtained by our network for this case with varying amounts of jitter.

Fig. 8: (a) The percentage of successful trials plotted against σ_jitter/τ_s for n_sub = 1, 5 and 10 for patterns having spikes in only 50% of the afferents. (b) The evolution of CM (averaged over 50 trials) with the number of epochs for two-class, four-class and six-class classification when n_sub = 5. (c) The number of epochs needed for saturation of CM (averaged over 50 trials) against the number of subpatterns per pattern (n_sub). As n_sub increases, WTA-NNLD has to train itself on more subpatterns and thus ep_sat,avg increases.

C. Case 2: n_sub > 1

Next, we consider n_sub = 5, i.e. we divide each pattern into 5 subpatterns by setting τ_s,inh and I_0,inh as per Equation (15). For C class classification, the maximum number of subpatterns can be C × n_sub, so we set N = C × n_sub in this case, i.e. we keep N = 10, 20 and 30 for two, four and six class classification respectively. Considering σ_jitter = 0, the evolution of CM with epochs for two, four and six class classification averaged over 50 trials is shown in Fig. 8(b). Moreover, the value of ep_sat,avg (averaged over 50 trials) is found to be 210, 221 and 230 when C = 2, 4 and 6 respectively. Unlike Case 1, here we consider the response to a pattern to be a unique firing sequence of a few NNLDs (a sketch of this readout is given at the end of this subsection). As an example, we consider a particular trial of four class classification and look into the first and last 3 epochs of its training. It is evident from Fig. 9 that during the first 3 epochs, WTA-NNLD produces arbitrary sequences of spikes. However, after the training of the network is complete, WTA-NNLD produces different firing sequences for different patterns while producing the same sequence when the same pattern is encountered. WTA-NNLD trained by this method produces 100% accuracy in recognizing different patterns by producing their unique firing sequences for two, four and six class classification. The performance of the network with varying intensity of jitter is provided in Fig. 6(c) (spikes present in all afferents) and Fig. 8(a) (spikes present in only half of the afferents), which depict that the n_sub = 5 case is less jitter resilient than the n_sub = 1 case.

Fig. 9: The input and output of WTA-NNLD for a particular trial of four class classification when n_sub = 5. P1, P2, P3 and P4 represent patterns of the four classes. Before learning, WTA-NNLD produces arbitrary spikes whenever a pattern is presented; after learning, the network produces a unique sequence of NNLD spikes for patterns of different classes, and this unique sequence acts as an identifier of the pattern class.

We further increase the resolution of pattern subdivision by decreasing τ_s,inh. We consider n_sub = 10 and, following the principle of n_sub = 5, the number of NNLDs employed for n_sub = 10 is 20, 40 and 60 for two-class, four-class and six-class classification. This approach also provides 100% accuracy in producing a unique sequence of firing whenever a particular pattern is encountered when σ_jitter = 0. However, the performance of the network falls rapidly with an increase in σ_jitter, as shown in Fig. 6(c) and Fig. 8(a). We also show the evolution of CM with epochs in Fig. 8(b). Furthermore, the number of epochs needed for convergence of CM in this case is much larger than in the previous cases, as depicted in Fig. 8(c). We conclude that dividing a pattern into too many subpatterns hampers the network performance.
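For n_sub > 1, the class identifier is the ordered sequence of winning NNLDs, one per subpattern. A sketch of this readout follows; the example sequences are hypothetical, not measured results.

```python
import numpy as np

def firing_sequence(spike_times, neuron_ids):
    """Collapse a pattern's response into the ordered tuple of NNLD indices,
    keeping one winner per subpattern (consecutive repeats are merged)."""
    order = np.argsort(spike_times)
    seq = []
    for n in np.asarray(neuron_ids)[order]:
        if not seq or seq[-1] != int(n):
            seq.append(int(n))
    return tuple(seq)

# After training, each class is associated with one learned sequence, and a
# test pattern is accepted only on an exact match (hypothetical sequences):
learned = {(3, 7, 1, 9, 4): "class 1", (2, 8, 0, 6, 5): "class 2"}
resp = firing_sequence([0.03, 0.12, 0.21, 0.33, 0.42], [3, 7, 1, 9, 4])
print(learned.get(resp, "unrecognized / random"))   # -> class 1
```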
Next, we delve a bit further and show the statistics of the causes of failure of the system in producing successful trials. A trial may fail if either condition (a) or (b) (described in Section II-D) is not satisfied. We denote the failure of condition (a) as F1. Note that for a trial, condition (b) can fail if a pattern is misclassified as a pattern of another class (denoted as F2) or as a random pattern (denoted as F3). Table I shows the statistics of failed trials for n_sub = 1, 5 and 10 for a fixed moderate value of σ_jitter/τ_s. Note that F1 is high for n_sub = 1, reduces for n_sub = 5 and rises again for n_sub = 10, since for n_sub = 10 the network sometimes fails to produce a unique 10-index-long representation for all patterns of the same class.

TABLE I: Analysis of failure statistics

Case | n_sub = 1 | n_sub = 5 | n_sub = 10
F1   | 12%       | 2%        | 6%
F2   | 2%        | 2%        | 2%
F3   | 2%        | 4%        | 8%

Moreover, we test our network with random patterns and note the cases where a learnt unique representation is produced for a random input pattern, i.e. a false positive error occurs. The percentages of false positive errors produced for n_sub = 1, 5 and 10 at the same value of σ_jitter/τ_s are 8%, 0% and 0% respectively. Note that no false positive errors occur for n_sub = 5 and 10.

V. VLSI IMPLEMENTATION: EFFECT OF STATISTICAL VARIATION

In this section we analyze the stability of our algorithm against hardware nonidealities by incorporating the statistical variations of the key subcircuits. The primary subcircuits needed to implement our architecture are the synapse, the dendritic squaring block, the neuron and the c^n_pj calculator. While the variabilities of the synapse circuit are modeled by mismatch in the amplitude (I_0) and time constant (τ_s) of the synaptic kernel function, the variabilities of the squaring block are captured by a multiplicative constant (cb_ni) [29]. We do not consider the variation of the inhibitory current kernel since it is global and only a single instance is present in the architecture. In our earlier work [29], [32], we proposed the circuits for implementing the synapse and squaring block of the NNLD and performed Monte Carlo analysis to find their variabilities. We reported that the σ/µ of I_0, τ_s and cb_ni for the worst case scenario is 13%, 10.1% and 18% respectively. The mismatch of the LIF neuron circuit proposed in [32] was captured by variations in the firing threshold V_thr, the σ/µ of which was computed to be 12.5%. Lastly, the nonidealities of the c^n_pj calculator block, described in [33], are modeled as a multiplicative constant (cc_ni). Monte Carlo analysis of the c^n_pj calculator block revealed that its σ/µ for the worst case is 18%.

Fig. 10 shows the performance of the proposed method when these nonidealities are included in the model for n_sub = 1 and n_sub = 5, keeping σ_jitter/τ_s fixed at the value used in Table I. The bars labeled I_0, τ_s, cb_ni and cc_ni denote the performance degradation when the statistical variations of I_0, τ_s, cb_ni and cc_ni are included individually. The results of Fig. 10(a) and Fig. 10(b) depict that the performance of the proposed algorithm is most affected by τ_s and cc_ni and least by cb_ni. Finally, to mimic the proper hardware scenario, we consider the simultaneous inclusion of all the nonidealities, which is marked by (...). The (...) bars show that there is an 8% and 6% decrease in performance for n_sub = 1 and n_sub = 5 respectively.

Fig. 10: Stability of WTA-NNLD trained by STDP-NRW plotted with respect to different hardware nonidealities for n_sub = 1 (a) and n_sub = 5 (b). The constant red line indicates the percentage of successful trials obtained by our method without any nonidealities at the chosen σ_jitter/τ_s. The bars represent the percentage of successful trials obtained after inclusion of nonidealities. The rightmost bar, marked (...), represents the performance when all the nonidealities are included simultaneously.
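The variability analysis can be reproduced in simulation by scaling each nominal circuit parameter with a Gaussian mismatch factor of the reported σ/µ. In hardware the factor is drawn per device (per synapse, branch and neuron); the per-instance sketch below shows the idea, and the simulation hook is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(4)

# Worst-case sigma/mu values reported in Section V
SPREAD = {"I0": 0.13, "tau_s": 0.101, "cb_ni": 0.18,
          "V_thr": 0.125, "cc_ni": 0.18}
NOMINAL = {"I0": 1.0, "tau_s": 0.01, "cb_ni": 1.0, "V_thr": 1.0, "cc_ni": 1.0}

def mismatched_instance(vary=SPREAD.keys()):
    """One Monte Carlo draw: every parameter in `vary` is scaled by a
    Gaussian factor N(1, sigma/mu); the rest stay at their nominal values."""
    return {name: val * (rng.normal(1.0, SPREAD[name]) if name in vary else 1.0)
            for name, val in NOMINAL.items()}

for trial in range(3):
    params = mismatched_instance(vary=["tau_s"])   # e.g. tau_s variation only
    # run_wta_nnld(params) would retrain/test the network here (placeholder)
    print(params)
```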
VI. CONCLUSION

We have proposed a new neuro-inspired Winner-Take-All architecture (WTA-NNLD) and an STDP inspired, dendrite-specific structural plasticity based learning rule (STDP-NRW) for its training. Motivated by recent biological evidence and models suggesting nonlinear processing properties of neuronal dendrites, we employ neurons with nonlinear dendrites to construct our WTA architecture. Moreover, we consider binary synapses instead of high resolution synaptic weights. Thus our learning rule, instead of performing weight updates, trains the network by modifying the connections between inputs and synapses. We have also provided a method by which the number of subpatterns per pattern learned by WTA-NNLD can be controlled. WTA-NNLD encodes patterns of different classes either by the activity of distinct NNLDs or by a distinct sequence of NNLD firings. To demonstrate the performance of WTA-NNLD and STDP-NRW, we have considered two, four and six class classification of 100 dimensional Poisson spike trains. We can conclude from the results that the slow time constant of the inhibitory signal (τ_s,inh) can be properly set to obtain a tradeoff between specificity and sensitivity of the network. Our immediate future work will include studying the effects of connection changes made after the network is integrated over multiple patterns; this will reduce the number of required computations. On another note, we will look into the classification of spike based MNIST [34], [35] datasets by our method. Our network can be immediately scaled to learn the digits of the MNIST dataset, the only requirements being additional simulation time and computational memory compared to the tasks considered in this article. Furthermore, to achieve invariance to scaling and rotation during image classification, we will construct NNLD based convolutional neural networks [36] trained by structural plasticity. We will also implement the proposed network in hardware and apply it to real time online unsupervised classification of spatio-temporal spike trains.

REFERENCES

[1] M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, vol. 2, no. 11, pp. 1019–1025, 1999.
[2] W. Maass, "Neural computation with winner-take-all as the only nonlinear operation," in Advances in Neural Information Processing Systems 12, pp. 293–299, MIT Press, 2000.
[3] W. Maass, "On the computational power of winner-take-all," Neural Computation, vol. 12, no. 11, pp. 2519–2535, 2000.
[4] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254–1259, Nov. 1998.
[5] S. Kaski and T. Kohonen, "Winner-take-all networks for physiological models of competitive learning," Neural Networks, vol. 7, no. 6-7, pp. 973–984, 1994.
[6] J. A. Barnden and K. Srinivas, "Temporal winner-take-all networks: a time-based mechanism for fast selection in neural networks," IEEE Transactions on Neural Networks, vol. 4, no. 5, pp. 844–853, Sep. 1993.
[7] Y. He and E. Sanchez-Sinencio, "Min-net winner-take-all CMOS implementation," Electronics Letters, vol. 29, no. 14, pp. 1237–1239, Jul. 1993.
[8] J. A. Starzyk and X. Fang, "CMOS current mode winner-take-all circuit with both excitatory and inhibitory feedback," Electronics Letters, vol. 29, no. 10, pp. 908–910, May 1993.
[9] T. Serrano and B. L. Barranco, "A modular current-mode high-precision winner-take-all circuit," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), May 1994, vol. 5, pp. 557–560.
[10] G. Indiveri, "Winner-take-all networks with lateral excitation," Analog Integrated Circuits and Signal Processing, vol. 13, no. 1-2, pp. 185–193, 1997.
[11] G. Indiveri, "A current-mode hysteretic winner-take-all network, with excitatory and inhibitory coupling," Analog Integrated Circuits and Signal Processing, vol. 28, no. 3, pp. 279–291, 2001.
[12] S. C. Liu, "A normalizing aVLSI network with controllable winner-take-all properties," Analog Integrated Circuits and Signal Processing, vol. 31, no. 1, pp. 47–53, 2002.
[13] M. Oster, R. Douglas, and S. C. Liu, "Computation with spikes in a winner-take-all network," Neural Computation, vol. 21, no. 9, pp. 2437–2465, Sep. 2009.
[14] J. L. McKinstry and G. M. Edelman, "Temporal sequence learning in winner-take-all networks of spiking neurons demonstrated in a brain-based device," Frontiers in Neurorobotics, vol. 7, no. 10, 2013.
[15] T. Masquelier, R. Guyonneau, and S. J. Thorpe, "Competitive STDP-based spike pattern learning," Neural Computation, vol. 21, no. 5, pp. 1259–1276, May 2009.
[16] A. Delorme, L. Perrinet, and S. J. Thorpe, "Networks of integrate-and-fire neurons using rank order coding B: Spike timing dependent plasticity and emergence of orientation selectivity," Neurocomputing, vol. 38-40, pp. 539–545, 2001.
[17] R. Guyonneau, R. VanRullen, and S. J. Thorpe, "Temporal codes and sparse representations: A key to understanding rapid processing in the visual system," Journal of Physiology-Paris, vol. 98, no. 4-6, pp. 487–497, 2004.
[18] T. Masquelier and S. J. Thorpe, "Unsupervised learning of visual features through spike timing dependent plasticity," PLoS Computational Biology, vol. 3, no. 2, pp. e31, doi:10.1371/journal.pcbi.0030031, Feb. 2007.
[19] W. Gerstner, R. Ritz, and J. L. van Hemmen, "Why spikes? Hebbian learning and retrieval of time-resolved excitation patterns," Biological Cybernetics, vol. 69, no. 5-6, pp. 503–515, 1993.
[20] M. Yoshioka, "Spike-timing-dependent learning rule to encode spatiotemporal patterns in a network of spiking neurons," Physical Review E, vol. 65, pp. 011903, Dec. 2001.
[21] B. Nessler, M. Pfeiffer, L. Buesing, and W. Maass, "Bayesian computation emerges in generic cortical microcircuits through spike-timing-dependent plasticity," PLoS Computational Biology, vol. 9, no. 4, pp. e1003037, 2013.
[22] D. Kappel, B. Nessler, and W. Maass, "STDP installs in winner-take-all circuits an online approximation to hidden Markov model learning," PLoS Computational Biology, vol. 10, no. 3, pp. e1003511, Mar. 2014.
[23] S. Hussain, S. C. Liu, and A. Basu, "Hardware-amenable structural learning for spike-based pattern classification using a simple model of active dendrites," Neural Computation, vol. 27, no. 4, pp. 845–897, 2015.
[24] K. A. Boahen, "Point-to-point connectivity between neuromorphic chips using address events," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 5, pp. 416–434, May 2000.
[25] S. Brink, S. Nease, P. Hasler, S. Ramakrishnan, R. Wunderlich, A. Basu, and B. Degnan, "A learning-enabled neuron array IC based upon transistor channel models of biological phenomena," IEEE Transactions on Biomedical Circuits and Systems, vol. 7, no. 1, pp. 71–81, Feb. 2013.
[26] P. Poirazi and B. W. Mel, "Impact of active dendrites and structural plasticity on the memory capacity of neural tissue," Neuron, vol. 29, no. 3, pp. 779–796, Mar. 2001.
[27] S. Hussain, R. Gopalakrishnan, A. Basu, and S. C. Liu, "Morphological learning: Increased memory capacity of neuromorphic systems with binary synapses exploiting AER based reconfiguration," in IEEE Intl. Joint Conference on Neural Networks (IJCNN), Aug. 2013, pp. 1–7.
[28] S. Roy, A. Basu, and S. Hussain, "Hardware efficient, neuromorphic dendritically enhanced readout for liquid state machines," in Proceedings of the IEEE Biomedical Circuits and Systems Conference (BioCAS), Nov. 2013, pp. 302–305.
[29] S. Roy, A. Banerjee, and A. Basu, "Liquid state machine with dendritically enhanced readout for low-power, neuromorphic VLSI implementations," IEEE Transactions on Biomedical Circuits and Systems, vol. 8, no. 5, pp. 681–695, Oct. 2014.
[30] M. P. Jadi, B. F. Behabadi, A. Poleg-Polsky, J. Schiller, and B. W. Mel, "An augmented two-layer model captures nonlinear analog spatial integration effects in pyramidal neuron dendrites," Proceedings of the IEEE, vol. 102, no. 5, pp. 782–798, May 2014.
[31] T. Natschläger, H. Markram, and W. Maass, Computer Models and Analysis Tools for Neural Microcircuits, chapter 9, Kluwer Academic Publishers (Boston), 2002.
[32] A. Banerjee, A. Bhaduri, S. Roy, S. Kar, and A. Basu, "A current-mode spiking neural classifier with lumped dendritic nonlinearity," in Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), May 2015.
[33] S. Roy, S. K. Kar, and A. Basu, "Architectural exploration for on-chip, online learning in spiking neural networks," Dec. 2014, pp. 128–131.
[34] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.