PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning
Tao Liu∗, Lei Jiang†, Yier Jin‡, Gang Quan∗ and Wujie Wen∗
∗Florida International University, †Indiana University, ‡University of Florida
∗{tliu023, gang.quan, wwen}@fiu.edu, †[email protected], ‡[email protected]

Abstract—One of the most exciting advancements in Artificial Intelligence (AI) over the last decade is the wide adoption of Artificial Neural Networks (ANNs), such as the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN), in real-world applications. However, the underlying massive amounts of computation and storage requirements greatly challenge their applicability in resource-limited platforms like drones, mobile phones, IoT devices, etc. The third generation of neural network model, the Spiking Neural Network (SNN), inspired by the working mechanism and efficiency of the human brain, has emerged as a promising solution for achieving more impressive computing and power efficiency within light-weight devices (e.g. a single chip). However, the relevant research activities have been narrowly carried out on conventional rate-based spiking system designs for fulfilling practical cognitive tasks, underestimating SNN's energy efficiency, throughput, and system flexibility. Although the time-based SNN is conceptually more attractive, its potential is not unleashed in realistic applications due to the lack of efficient coding and practical learning schemes. In this work, a Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely "PT-Spike", is developed to bridge this gap. Three constituent hardware-favorable techniques are proposed accordingly: precise single-spike temporal encoding, efficient supervised temporal learning, and fast asymmetric decoding, which boost the energy efficiency and data processing capability of the time-based SNN at a more compact neural network model size when executing real cognitive tasks. Simulation results show that "PT-Spike" demonstrates significant improvements in network size, processing efficiency, and power consumption with marginal classification accuracy degradation when compared with the rate-based SNN and ANN under a similar network configuration.
I. INTRODUCTION
Deep-learning-enabled neural network systems, i.e. deep neural networks (DNNs) and convolutional neural networks (CNNs), have found broad applications in realistic cognitive tasks such as speech recognition, image processing, machine translation, and object detection [1], [2]. However, performing high-accuracy testing with complex DNNs or CNNs requires massive amounts of computation and memory resources, leading to limited energy efficiency. For instance, the recognition implementation of the CNN "AlexNet" [3] involves not only huge volumes of parameters (61 million), generating intensive off-chip memory accesses, but also a large number of computing-intensive high-precision floating-point operations (1.5 billion) [4]. Such a weakness makes these solutions less attractive for many emerging applications of mobile autonomous systems like smart devices, Internet-of-Things (IoT), wearable devices, robotics, etc., where very tight power budgets, hardware resources, and footprints are enforced [5], [6].

(This work is supported in part by NSF under project CNS-1423137 and the 2016-2017 Collaborative Seed Award Program of the Florida Center for Cybersecurity (FC).)

Different from the CNN and DNN designs, spiking-based neuromorphic computing, which is inspired by the biological spiking neural network (SNN), features tremendous computing efficiency at much lower power on small-footprint platforms, e.g. the famous IBM TrueNorth chip with its 1 million neurons and an operating power of ∼70 mW [7]. Compared with such rate-based designs, the more biologically plausible time-based SNN may offer better energy efficiency and system throughput [10], since theoretically the information can be flexibly embedded in the time (temporal) domain of short and sparse spikes instead of the spike count represented by a group of dense spikes in rate coding, where, e.g., the spike occurrence frequency is proportional to the intensity of the input, like each pixel density of the image [7], [11].
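As a rough illustration of the two coding schemes just described, consider a single pixel encoded both ways. The window length, slot size, and intensity-to-delay map below are illustrative assumptions for this sketch, not parameters taken from the paper:

```python
# Toy contrast of rate coding vs. time coding for one pixel.
# Assumed scheme (illustrative only): a 16 ms window with 1 ms slots;
# rate coding fires a number of spikes proportional to intensity, while
# time coding fires a single spike whose delay shrinks as intensity
# grows (brighter pixel -> earlier spike).

T_WINDOW_MS = 16

def rate_encode(intensity, max_intensity=255):
    """Return spike times: one spike per slot, count ~ intensity."""
    n_spikes = round(intensity / max_intensity * T_WINDOW_MS)
    return list(range(n_spikes))          # spikes at t = 0, 1, 2, ...

def time_encode(intensity, max_intensity=255):
    """Return a single-spike train: one spike, delayed for dim pixels."""
    delay = round((1 - intensity / max_intensity) * (T_WINDOW_MS - 1))
    return [delay]

pixel = 200
rate_train = rate_encode(pixel)
time_train = time_encode(pixel)
print(len(rate_train), "spike events (rate) vs.",
      len(time_train), "spike event (time)")
```

The gap in event counts between the two trains is exactly what drives the power argument that follows.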
As a result, the rate-based SNN is naturally more power-hungry than the time-based SNN due to the increased number of spikes and the associated spike operations, such as synaptic weighting and Integrate-and-Fire (IFC), etc. Meanwhile, the processing efficiency of the time-based SNN can be further enhanced by performing an early decision based on the temporal information extracted from early fired spikes, while in rate coding the classification cannot be concluded until the last moment, e.g. a winner-takes-all rule that sorts the number of spikes fired by each output neuron during the entire decoding time window [12].

However, the potential of such an emerging architecture is significantly underestimated due to the lack of efficient, hardware-favorable solutions for time-based information representation and for the complex spike-timing-dependent (temporal) training of biological synapses towards practical cognitive applications [13]. On one hand, translating the input stimulus (i.e. image pixels) to spike delays, namely time-based encoding, is non-trivial because the coding efficiency can be easily degraded by biased spike delays distributed over the limited coding intervals. Also, the hardware realization of time coding is usually expensive, as the time-based spike kernel needs to be carefully designed to provide accurate time information (e.g. pre-synaptic/post-synaptic times [10]) for time-based training. On the other hand, realizing more biologically plausible spiking-time-based training, i.e. unsupervised spike-timing-dependent plasticity (STDP), is very complex and costly due to the exponential time dependence of the weight change and the difficult convergence of learning [14]. In real-world applications, training of the rate-based SNN can usually be performed off-line by directly borrowing the standard back-propagation algorithm from the artificial neural network (ANN) [11].
However, this time-independent learning rule does not fit the time-dependent SNN because of a fundamentally different learning mechanism.

In this work, we investigate the possibility of unleashing the potential of the time-based single-spike SNN architecture in realistic applications by orchestrating efficient time-based coding/decoding and learning algorithms. A Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely "PT-Spike", is proposed to facilitate cognitive tasks like MNIST digit recognition. Our "PT-Spike" incorporates three integrated techniques: precise single-spike temporal encoding, efficient supervised temporal learning, and fast asymmetric decoding. Our major contributions are:

1) We develop a precise-temporal encoding approach to efficiently translate the information into the temporal domain of a single spike. The single-spike solution dramatically reduces the energy while offering efficient model size reduction;
2) We propose a supervised temporal learning algorithm to facilitate synaptic plasticity on this single-spike system. The proposed algorithm significantly improves the learning capability and achieves comparable accuracy when compared with the ANN and the rate-based SNN under a similar configuration;
3) We design a novel asymmetric decoding to relieve the unique and serious weight-competition issue existing in this single-spike system, and to significantly improve the efficacy and efficiency of synaptic weight updating.

Fig. 1: The conceptual view of rate coding and time coding in SNNs.

II. BACKGROUNDS AND MOTIVATIONS
A. Neural Coding in SNNs
The neural coding in SNNs can be generally categorized as rate coding, time coding, rank coding, population coding, etc. [15]. In particular, the first two are the most attractive, since each piece of coded information is only associated with the spikes generated by a single input neuron, offering simplified encoding/decoding procedures and design complexity.

Fig. 1 demonstrates a conceptual comparison between rate coding and time coding in SNNs. T_e and T_i (R_e and R_i) denote two types of input neurons: the time-coded (rate-coded) excitatory and inhibitory neurons, respectively. An excitatory neuron exhibits an active response to the stimulus while an inhibitory neuron tends to keep silent. T_1 and T_2 (R_1 and R_2) denote two time-coded (rate-coded) output neurons for the classification. The rate-based SNN generates far more spikes than the time-based SNN at both types of input neurons. After the input spikes are processed by the two different SNNs, a single spike firing at a specific time interval can perform an inference task in the output layer of the time-based SNN, whereas a considerable number of spikes is needed to fulfill a classification in the rate-based SNN, indicating a much higher power consumption. Moreover, the rate-based SNN may exhibit a slower processing speed than the time-based SNN, since the output neuron of the former needs to count the spikes (i.e. through Integrate-and-Fire [16]) over the whole predefined time window, while that of the latter may quickly suspend its computations once a spike is detected.

B. Limitation of Existing Spiking Neuromorphic Computing Research
Neuromorphic Designs:
Many studies have been conducted to facilitate spiking-based Neuromorphic Computing System (NCS) designs in real hardware implementations, including CMOS VLSI circuits [7], [17], [18], [19], reconfigurable FPGAs [8], and emerging memristor crossbars [20], [11]. However, these works mainly focus on rate- or time-based SNN model mapping and hardware implementation, rather than SNN architecture optimization, i.e. the coding, decoding, and learning approaches.
Temporal Coding:
The concept of temporal coding, which relies on the arrival time or delay of a spike train for information representation, has been widely explored and proven in the development of the time-based SNN [21], [22]. These theoretical studies, however, mainly emphasize the biological explanations of time-based SNN models on simple cognitive benchmarks (e.g. a two-input XOR gate), which are far from complicated real-world problems such as image recognition. Recently, Zhao et al. [23] proposed an encoding circuit to handle temporal coding; however, this type of work still concentrates on component-level hardware implementation with simple case studies, and hence lacks a holistic architecture-level solution set capable of handling realistic tasks. In [24], a complete time-based SNN design is proposed. However, their solution suffers from limited accuracy, fundamentally constrained by the existing coding and temporal learning rule, and is not optimized towards hardware-based neuromorphic system designs.
Temporal Learning:
Since the popular learning approaches such as back-propagation [25], widely used in ANNs and rate-based SNNs, are unable to handle precise-time-dependent information due to a fundamentally different neural processing, many proposals dedicated to time-based learning have been developed [14], [26], [27]. However, these learning algorithms are neither hardware-favorable nor applicable to realistic tasks due to their expensive convergence and theoretical limitations. For example, under the unsupervised spike-timing-dependent plasticity (STDP) learning rule, the neural network structure and synaptic computation increase exponentially because of the expensive convergence and clustering. The proposed "Tempotron" and "Remote Supervised Method (ReSuMe)" can use a teaching spike to adjust the desired spiking time for temporal learning; however, they are not applicable to complicated patterns.

Our proposed "PT-Spike" is substantially different from previous studies: we explore how the time-based single-spike SNN architecture can be designed to perform realistic tasks through a holistic set of efficient techniques spanning time-based coding, learning, and decoding. A low-cost and efficient temporal learning rule named "PT-Learning" is augmented from "Tempotron" learning by considering a synthesized contribution of the cost function and a hardware-favorable time-dependent kernel for weight updating. By integrating with the proposed "Precise Temporal Encoding" and "Asymmetric Decoding", "PT-Spike" can significantly improve the accuracy, power, learning efficiency, and model size reduction through spatial-temporal information conversion.

III. DESIGN DETAILS
A. System Architecture
Fig. 2 shows the comprehensive data processing flow of the proposed "PT-Spike". First, the stimulus is captured by the temporal perceptors to generate a sparse spike train (i.e. a single spike) through "Precise Temporal Encoding". Each spike train is further modulated in the temporal domain by a linear-decayed spiking kernel to form a time-dependent voltage pulse. Second, these voltage pulses are sent to the synaptic network for a weighting process; e.g. a memristor crossbar with an IFC design can be employed for parallel processing. The output neurons exhibit time-varying weighting responses due to the time-dependent input information. An output neuron then fires a spike if the weighted post-synaptic voltage crosses a threshold voltage. The spike trains from the output layer are transmitted to the "Asymmetric Decoding". Finally, the target pattern is classified by analyzing the synchronized output spikes with a predefined asymmetric rule. During the learning procedure, desired spike patterns are coded following the same asymmetric rule used in decoding. The detected errors are sent back for synaptic plasticity through "PT-Learning", a supervised temporal learning algorithm.
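The end-to-end flow just described can be condensed into a small sketch. The kernel shape matches the linear decay mentioned above, but the frame length, decay constant, threshold, weights, and network sizes below are illustrative assumptions rather than the paper's tuned parameters:

```python
# Minimal end-to-end sketch of the PT-Spike data flow:
# stimulus -> single-spike delays -> linear-decay kernel -> synaptic
# weighting -> threshold firing -> first-fire (early) decoding.
# All numeric parameters here are illustrative assumptions.

import random

T = 16          # encoding/processing time frame (ms), 1 ms resolution
TAU = 16.0      # linear-decay constant of the spiking kernel
V_TH = 1.5      # firing threshold of the output neurons (assumed)

def encode(patch_means):
    """Map each perceived patch mean (0..1) to a single-spike delay."""
    return [round((1.0 - m) * (T - 1)) for m in patch_means]

def kernel(dt):
    """Linear-decay spiking kernel: peaks at the spike, fades over TAU."""
    return max(0.0, 1.0 - dt / TAU) if dt >= 0 else 0.0

def first_fire(delays, weights):
    """Integrate weighted kernels over time; return (neuron, t) of the
    first output spike, emulating the early 'winner-take-all' decode."""
    n_out = len(weights[0])
    for step in range(T):
        for n in range(n_out):
            v = sum(w_row[n] * kernel(step - d)
                    for d, w_row in zip(delays, weights))
            if v >= V_TH:
                return n, step         # stop all other integration early
    return None, T

random.seed(0)
patch_means = [0.9, 0.1, 0.6, 0.3]                  # 4 input "perceptors"
weights = [[random.uniform(0, 1) for _ in range(3)] # 4 inputs x 3 outputs
           for _ in range(4)]
neuron, t = first_fire(encode(patch_means), weights)
print("first firing neuron:", neuron, "at t =", t, "ms")
```

Note how decoding can stop at the first output spike instead of integrating over the whole time frame, which is the early-decision advantage argued for above.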
B. Precise Temporal Encoding
As discussed in Section II, in traditional rate coding, a large number of spikes within a proper time window is needed to precisely indicate the amplitude of an input signal, i.e. the pixel density of a visual stimulus. To maximize the power efficiency with a minimized number of spikes, the input information is instead represented as an extremely sparse train, a single spike and its occurrence delay, in the aforementioned coding approach. However, such a "one-to-one" mapping between each stimulus and the spike train of each input neuron can lead to a significant energy overhead. Meanwhile, the time (temporal) information of those spike trains is not fully leveraged by each neuron, resulting in limited coding efficiency and thus a dramatic accuracy reduction. As we shall present later, our results on the MNIST benchmark show that the "one-to-one" mapping achieves a very unacceptable training accuracy (∼ %) even under a large model size, that is, 784 input neurons for a 28 × 28 image.

Fig. 2: The overview of the "PT-Spike" system architecture.
In "PT-Spike", we further propose "Precise Temporal Encoding". As shown in Fig. 2, "Precise Temporal Encoding" is inspired by the human visual cortex and the Convolutional Neural Network (CNN): a Temporal Kernel (i.e. a unit square matrix) is applied over the full image to capture the spatial information, which is then translated into a single-spike delay in the temporal domain as one neuron input by perceiving the localized information from multiple interested pixels, i.e. the spiking delay equals the average density among several selected pixels. In practice, by selecting a proper stride with which we slide the Temporal Kernel, e.g. smaller than the dimensionality of the Temporal Kernel, a portion of the localized spatial information is shared by adjacent kernel slides. Consequently, the spatial localities can be further transformed into temporal localities, which uniformly allocates the spiking delays assigned to the input neurons in the time domain, translating into improved coding efficiency and classification accuracy.

Another unique advantage of the proposed "Precise Temporal Encoding" is that it offers flexible model size reduction. Different from the traditional "one-to-one" mapping, various degrees of model size reduction can be easily achieved by reconfiguring the size of the Temporal Kernel. Fig. 3 illustrates this concept. Increasing the Temporal Kernel size enriches the temporal information (see the encoding time frame growing from T = 16 ms to T = 256 ms in Fig. 3) and hence reduces the needed spatial information, or input neurons, e.g. 169 input neurons for "PT-Spike(16)" vs. 49 input neurons for "PT-Spike(256)". The training and inference accuracies change slightly according to the selected Temporal Kernel size (see Section IV).

C. Synaptic Processing and Linearized Spiking Kernel
Fig. 3: Model size reduction through an adjustable Temporal Kernel (spatial vs. temporal information over the encoding time frame T, 0-300 ms, for "PT-Spike(8)", "PT-Spike(16)", and "PT-Spike(256)").

Once the delay for the single spike is determined, as shown in Fig. 2, a spiking kernel K is applied to shape the associated spike of each input neuron. The kernel plays an important role in the subsequent synaptic weighting for the output voltage V_n(t), as shown in Eq. (1):

V_n(t) = Σ_{m=1}^{M} w_{mn} Σ_{t_s ≤ t} K(t − t_s)    (1)

where V_n(t) represents the voltage of output neuron n, w_{mn} denotes the synaptic efficacy between input neuron X_m and output neuron A_n, and t_s is the decoded spiking delay of X_m. To provide sufficient and accurate temporal information for the classification, the exponentially decayed post-synaptic potential in the biological spike response neural model [28] can be expressed as:

K(t − t_s) = µ (exp[−(t − t_s)/τ_1] − exp[−(t − t_s)/τ_2])    (2)

where τ_1 and τ_2 denote the decay time constants and µ is the normalizing constant. However, such an exponentially decaying function requires expensive computation and hardware resources. In "PT-Spike", we employ a more hardware-favorable kernel function, a linear decaying function (see the comparison of the two kernels in Fig. 2), to simplify the costly dual-exponential function:

K(t − t_s) = 1 − (t − t_s)/τ    (3)

As we shall show in Section IV, this linear approximation causes only a very marginal classification accuracy degradation. Besides, the linear kernel function is also applied to detect the input voltage contributions to the output spike in our proposed "PT-Learning".

D. Asymmetric Decoding
In "PT-Spike", a novel asymmetric decoding scheme, namely "A-Decoding", is proposed for the classification. As the error signal critical for the proposed supervised temporal learning is also generated through asymmetric decoding, we discuss the "A-Decoding" technique first.

In the rate-based SNN, the target pattern can be determined by the output neuron with the highest spike count. The costly weight updating is performed on all synapses at each learning iteration. The resulting neural competition (weight conflict) among different patterns can be rectified by the rich information provided by the large number of input spikes; hence a good classification accuracy may be achieved for all patterns. However, the same cannot occur in our proposed "PT-Spike", since its weight updating relies solely on a very limited number of sparse spikes (e.g. a single spike) in the temporal domain. In "PT-Spike", we therefore propose "A-Decoding" to alleviate the neural competition for accuracy improvement.

Fig. 4: An overview of the proposed "A-Decoding" (pattern readout with the "Fire&Cut" order; "firing", "not firing", and "independent" statuses for error detection).

Fig. 4 illustrates the key idea of the proposed "A-Decoding", including pattern readout and error detection. A pattern {P_i} can be decoded based on the firing statuses of the output neurons {N_i}. In our asymmetric decoding, an output neuron can take one of three statuses: "firing", "not firing", and "independent", as shown in Fig. 4. Note that "independent" means the associated neuron does not participate in the learning process of a certain pattern; this status only occurs in learning mode.

In testing mode, an output neuron can only take two statuses: {firing / not firing}. The target pattern is scanned according to the order of the first firing neuron. Assuming a binary code Ñ_1Ñ_2Ñ_3···Ñ_i is generated by the output neurons {N_i}, a Huffman-style decoding procedure can be performed (see Fig. 4, left part). For example, if the first firing neuron is N_3, the corresponding code will be 001, and thus the target pattern is P_3. In "PT-Spike", early detection in testing, namely "Fire&Cut", is realized based on the temporal "winner-take-all" rule: once the IFC of neuron N_i triggers a spike, all the remaining IFCs of the other neurons are shut down following the "Fire&Cut" order, which saves the additional power consumed by the IFCs.

In learning mode, a desired spike pattern is reversely generated according to the Huffman-style decoding of pattern {P_i} (see Fig. 4, right part). Once a participating neuron N_i triggers an unexpected firing or misses an expected firing, an error is detected and only the synaptic weights of N_i are modified according to our proposed "PT-Learning". Note that only "partial" output neurons (those NOT in the "independent" status) are involved during the learning of pattern {P_i}, namely "Partial Learning". Such a mechanism significantly accelerates the learning procedure and saves the power consumed by unnecessary neural processing.
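The readout and error-detection rules above can be sketched as follows. The 4-neuron/4-pattern setup and the independence set are illustrative assumptions, not the paper's actual decoding table:

```python
# Sketch of the "A-Decoding" readout and error detection. The
# first-firing output neuron selects the pattern ("Fire&Cut"), and in
# learning mode a neuron marked "independent" of the current pattern is
# skipped ("Partial Learning"). Neuron count, pattern labels, and the
# independence set below are illustrative assumptions.

FIRING, NOT_FIRING, INDEPENDENT = "firing", "not firing", "independent"

def decode(statuses):
    """Testing mode: the index of the first firing neuron picks P_i;
    later neurons are never examined, emulating the Fire&Cut order."""
    for neuron, status in enumerate(statuses):
        if status == FIRING:
            return f"P{neuron}"          # Fire&Cut: stop scanning here
    return None

def detect_errors(actual, desired, independent_of):
    """Learning mode: report (neuron, error) pairs, skipping neurons
    that are independent of the currently learned pattern."""
    errors = []
    for n, (got, want) in enumerate(zip(actual, desired)):
        if n in independent_of:
            continue                     # Partial Learning
        if got == NOT_FIRING and want == FIRING:
            errors.append((n, "false missing"))
        elif got == FIRING and want == NOT_FIRING:
            errors.append((n, "false fire"))
    return errors

print(decode([NOT_FIRING, NOT_FIRING, FIRING, FIRING]))  # first firer wins
print(detect_errors([FIRING, NOT_FIRING, NOT_FIRING],
                    [NOT_FIRING, FIRING, NOT_FIRING],
                    independent_of={2}))
```

Only the neurons reported by `detect_errors` would have their synapses touched, which is where the weight-update savings quantified later come from.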
Meanwhile, {N_i} is "asymmetrically" correlated with {P_i}, which eases the neural competition. For example, neuron N_i only engages in the synaptic plasticity of pattern P_i and is ignored during the learning of all other patterns. As we shall show later, by taking advantage of "Fire&Cut", "Partial Learning", and this eased competition, our proposed "A-Decoding" can significantly enhance the weighting efficiency and learning accuracy.

E. PT-Learning
Our proposed "PT-Learning" coordinates with the aforementioned "A-Decoding" to capture the errors needed for synaptic weight updating. An error detected by "A-Decoding" is processed by "PT-Learning" to generate the corresponding weight changes, which are sent back for synapse updating. As shown in Fig. 2, based on the actual and expected spiking patterns, two types of errors may occur at an output neuron: "false missing" and "false fire". Here "false missing" means that the integrated voltage cannot reach the threshold of the output neuron to trigger the expected output spike, while "false fire" is defined as an undesired spike firing.

As shown in Algorithm 1, once an error is detected, the error spiking time (T_fal) and the cost function (Err) are extracted from T_max and V_th − V_max, where V_max and T_max are the maximum voltage amplitude and its occurrence time, respectively. A negative (positive) Err means a false fire (false missing). Hence, the gradient of Err with respect to each weight w_c at pre-synaptic spiking time T_c can be calculated as:

−dErr/dw_c = Err Σ_{T_c ≤ T_max} K(T_max − T_c) + (∂V(T_max)/∂T_max)(dT_max/dw_c)    (4)

Algorithm 1: Post-Synaptic Processing
// Pseudocode of Asymmetric Decoding and PT-Learning
Detecting:
  foreach output neuron N_i in [N_1 .. N_I] do
    if testing mode then
      if firing then return P_i            // "Fire&Cut"
    else  // learning mode
      if N_i is independent of P_i then
        return                             // "Partial Learning" and "Ease Competition"
      else if actual firing pattern ≠ desired pattern then
        call Learning(V_max, T_max)

Learning:  // change the synaptic weights of N_i
  Err ← V_th − V_max
  T_fal ← T_max
  foreach input neuron X_c in [X_1 .. X_M] do
    if K(T_fal − T_c) ≤ 0 then
      continue                             // "Partial Updating"
    else  // the pre-spike at T_c contributed to the post-spike
      ∆w ← λ · Err · K(T_fal − T_c)
      w_ci ← w_ci + ∆w

Here K is the linear decayed spike kernel defined in Eq. (3). As the pre-synaptic spikes are weighted through the synaptic efficacies w_c before T_max, ∂V(T_max)/∂T_max = 0. By further considering Err in the change of w_c, ∆w_c can be expressed as:

∆w_c = λ Err Σ_{T_c ≤ T_fal} K(T_fal − T_c)    (5)

where λ denotes the learning rate, and the spike kernel K is used again to calculate the contributions from input neuron X_c at time T_c.

As discussed for "A-Decoding", only partial output neurons are involved during the learning of a certain pattern, meaning that only partial synaptic weights are updated. This dual-level acceleration, contributed by both "A-Decoding" and "PT-Learning", improves the learning efficiency significantly. As we shall show later, the synaptic computation can be reduced by more than 200% compared with the standard learning approach without these accelerations. Moreover, "PT-Learning" together with "A-Decoding" significantly boosts the accuracy for realistic recognition tasks.

IV. EVALUATIONS
To evaluate the accuracy, processing efficiency, and power consumption of our proposed "PT-Spike" neuromorphic architecture, extensive experiments are conducted on platforms including MATLAB and a heavily modified version of the open-source simulator Brian [29].
A. Simulation Setup
In our evaluation, the full MNIST database is adopted as the benchmark [30]. A set of "PT-Spike" designs, "PT-Spike(R)", is implemented to demonstrate the leveraged temporal encoding, where "R" denotes the number of interested pixels per input neuron, i.e. the size of the Temporal Kernel in the proposed "Precise Temporal Encoding". We also assume the encoding time frame T = τ × R (ms), where τ = 1 ms is the fixed minimum time interval to fire a spike; the maximum temporal information T can thus be adjusted by tuning the parameter R. The number of input neurons (spatial domain) can be expressed as M = ⌈(P − √R + 1)/S⌉², where P and S represent the width of an input image and the stride with which we slide the Temporal Kernel. P = 28 and S = 2 are selected in our evaluations on the MNIST dataset. Two representative baselines under similar network configurations, the rate-coded SNN "Diehl-15" [31] and the ANN "Lecun-98" [32], are also implemented for energy and performance comparisons with the proposed "PT-Spike".

TABLE I: Structural Parameters of Selected Candidates.

Candidate     | Input neurons | Output neurons | Synaptic weights | Time frame T
PT-Spike(4)   | 196           | 10             | 1960             | 4 ms
PT-Spike(16)  | 169           | 10             | 1690             | 16 ms
PT-Spike(25)  | 144           | 10             | 1440             | 25 ms
PT-Spike(100) | 100           | 10             | 1000             | 100 ms
Diehl-15      | 784           | 100            | 78400            | 500 ms
Lecun-98      | 784           | 10             | 7840             | -

Fig. 5: Accuracy evaluations for different candidates and design optimizations. (a) Training and testing accuracies of selected candidates. (b) Training accuracy with different designs ("Exponential Kernel", non-"A-Decoding", "Tempotron").

Table I presents the detailed structural parameters of the selected candidates. Compared with "Diehl-15" and "Lecun-98", our proposed temporal encoding achieves significant model size reduction for all "PT-Spike" designs, i.e. ∼40× ("PT-Spike(4)" vs. "Diehl-15") and ∼4× ("PT-Spike(4)" vs. "Lecun-98").

B. Accuracy
Fig. 5a shows the accuracy comparison among the different "PT-Spike(R)" designs, "Lecun-98", and "Diehl-15". "PT-Spike(25)" achieves very comparable accuracy at a much lower cost (∼ %, 1440 synaptic weights) when compared with "Diehl-15" (∼ %, 78400 synaptic weights) and "Lecun-98" (∼ %, 7840 synaptic weights). Meanwhile, "PT-Spike(16)" and "PT-Spike(25)" also show very close accuracies (∼ % and ∼ %), much better than "PT-Spike(4)" and "PT-Spike(100)" (∼ % and ∼ %).

We also evaluated the individual training accuracy improvement contributed by each of the proposed techniques, i.e. the "linearized spiking kernel", "Precise Temporal Encoding", "A-Decoding", and "PT-Learning", respectively. Here, we choose "PT-Spike(16)" as the baseline design that employs all of the aforementioned techniques. "Exponential Kernel", "one-to-one mapping", "non A-Decoding", and "Tempotron" denote designs that substitute exactly one of the four techniques. As shown in Fig. 5b, "PT-Spike(16)" shows a very marginal accuracy degradation (∼ %) from the "linearized spiking kernel" (K in Eq. (3)) when compared with the original costly "Exponential Kernel" design (∼ %, K in Eq. (2)). Furthermore, "PT-Spike(16)" boosts the accuracy by ∼ %, ∼ %, and ∼ % when compared with the designs of "one-to-one mapping" (∼ %), "non A-Decoding" (∼ %), and the theoretical "Tempotron" learning rule (∼ %), respectively, which clearly demonstrates the effectiveness of the proposed "Precise Temporal Encoding", "A-Decoding", and "PT-Learning".

C. Processing Efficiency
The occurrence frequency of synaptic events, including both weighting and weight updating, is calculated to evaluate the system processing efficiency. Fig. 6a compares the number of weighting operations among three designs in the feed-forward pass. Unlike the other candidates, the amount of weighting operations of "PT-Spike(16)" differs between training and testing due to the "Fire&Cut" mechanism in "A-Decoding"; hence the weighting of the first testing iteration is also included for "PT-Spike(16)". Even the "non A-Decoding" design, i.e. "PT-Spike(16)" without the "A-Decoding" technique, gains a ∼ × weighting-operation reduction compared with "Diehl-15", since a rate-coded SNN requires a long time window to process the spikes with an enlarged neuron model size, causing tremendous weighting processes at each time slot. Compared with "non A-Decoding", the weighting operations of "PT-Spike(16)" can be further reduced by ∼ % and ∼ % in the first training iteration and the testing iteration, respectively. As expected, the "early-detection" mechanism in "A-Decoding" removes many unnecessary weighting operations on both "initialized" weights and "well-trained" weights.

We also characterize the occurrence frequency of weight updating during the first training iteration to evaluate the processing efficiency of the feed-back pass. As Fig. 6b shows, even the "Worst Case" (i.e. "PT-Spike(16)" without employing "A-Decoding" and "PT-Learning") achieves ∼ × and ∼ × reductions in weight updates per image and per error, respectively, when compared with "Diehl-15". This impressive improvement comes from the significantly compressed model size. Moreover, compared with the "Worst Case", "PT-Learning" and "A-Decoding" contribute ∼ × and ∼ × weight-updating reductions per error and per image for "PT-Spike(16)", respectively, demonstrating the effectiveness of the "dual-level acceleration" from decoding and learning.

Fig. 6: Processing efficiency and power consumption. (a) Feed-forward efficiency per input image. (b) Feed-back efficiencies. (c) Power consumption (α Joules/spike).
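A toy accounting model makes the feed-forward savings from "Fire&Cut" concrete. Each output neuron is assumed to perform one weighting pass per input neuron per time step until it is shut down; the network sizes echo "PT-Spike(16)" from Table I, but the firing time and the one-op-per-step cost model are made-up assumptions, not measurements:

```python
# Toy accounting of feed-forward weighting operations with and without
# the "Fire&Cut" early cut-off. Each output neuron performs one
# weighting operation per input neuron per time step until it stops.
# The firing time and cost model are illustrative assumptions.

M, N, T = 169, 10, 16       # inputs, outputs, time steps (PT-Spike(16)-like)

def weighting_ops(fire_time=None):
    """Total weighting operations; with Fire&Cut, every IFC stops once
    the first output spike appears at `fire_time`."""
    steps = T if fire_time is None else fire_time + 1
    return M * N * steps

full = weighting_ops()              # no early detection
early = weighting_ops(fire_time=5)  # assumed first spike at t = 5 ms
print(f"without Fire&Cut: {full} ops, with: {early} ops "
      f"({100 * (full - early) / full:.1f}% saved)")
```

The earlier the first output spike, the larger the cut, which is why the savings differ between "initialized" and "well-trained" weights.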
D. Power Consumption
To roughly evaluate the power efficiency contributed by the proposed architecture, we adopt a methodology similar to that used in [7], [18]. A new candidate, "Minitaur" [8], is introduced for a fair comparison, since it is a more hardware-oriented rate-coded SNN. As Fig. 6c shows, "PT-Spike(16)" saves ∼ × and ∼ × power for each input neuron and each input image over "Diehl-15", respectively, indicating the efficiency of our proposed single-spike coding technique. Compared with the hardware-oriented rate-coded SNN design "Minitaur", "PT-Spike(16)" can still achieve a ∼ × (∼ ×) power reduction for each input neuron (input image).

E. Discussions
Research on time-based SNNs represented by extremely sparse spikes, i.e., the single-spike design, is still in its infancy, and to the best of our knowledge, no exemplar large networks have been successfully demonstrated for performing realistic cognitive tasks. Due to the unique time-based learning and information representation, research in this area is quite challenging and unique. In this work, we adopt a proof-of-concept simple design, i.e., a Single-Layer Perceptron, to illustrate the design optimizations of the time-based SNN and to demonstrate its potential for realistic applications, though the classification accuracy is still lower than that of state-of-the-art DNNs and CNNs.

Extending our design to a multi-layered network would enhance its capability to handle more complicated cognitive tasks; however, this is non-trivial, as a multi-layer learning rule needs to be developed to facilitate spatial information transfer among different layers. While our proposed approach cannot be directly applied to a multi-layered network in its current form, the novel techniques proposed in this paper, i.e., "Temporal Kernel Coding", "PT-Learning" and "A-Decoding", form the basis for a time-based multi-layer network. We believe the initial architecture developed in this paper will serve as a basic framework for multi-layer network design, and may encourage more interesting research in this domain.

V. CONCLUSION
As the rate-based spiking neural network (SNN) is subject to power and speed challenges due to processing a large number of spikes, in this work we systematically studied the possibility of utilizing the more power-efficient time-based SNN in real-world cognitive tasks. Three integrated techniques, precise temporal encoding, efficient supervised temporal learning and fast asymmetric decoding, were proposed to construct the Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely, "PT-Spike". The single-spike temporal encoding offers an energy-efficient information representation solution with the potential for model size reduction. The supervised learning and asymmetric decoding work cooperatively to deliver more effective and efficient synaptic weight updating and classification. Our evaluations on the MNIST database well demonstrate the advantages of "PT-Spike" over the rate-based SNN in terms of network size, speed and power, with comparable accuracy.

REFERENCES
[1] Y. LeCun et al., "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] C. Szegedy, "An overview of deep learning," AITP 2016, 2016.
[3] A. Krizhevsky et al., "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[4] A. Farmahini-Farahani et al., "NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on. IEEE, 2015, pp. 283–295.
[5] R. Andri et al., "YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights," in VLSI (ISVLSI), 2016 IEEE Computer Society Annual Symposium on. IEEE, 2016, pp. 236–241.
[6] S. Han et al., "MCDNN: An approximation-based execution framework for deep stream processing under resource constraints," in Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2016, pp. 123–136.
[7] F. Akopyan et al., "TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 10, pp. 1537–1557, 2015.
[8] D. Neil et al., "Minitaur, an event-driven FPGA-based spiking network accelerator," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 12, pp. 2621–2628, 2014.
[9] F. Corradi et al., "A neuromorphic event-based neural recording system for smart brain-machine-interfaces," IEEE Transactions on Biomedical Circuits and Systems, vol. 9, no. 5, pp. 699–709, 2015.
[10] S. Thorpe et al., "Spike-based strategies for rapid processing," Neural Networks, vol. 14, no. 6, pp. 715–725, 2001.
[11] C. Liu et al., "A memristor crossbar based computing engine optimized for high speed and accuracy," in VLSI (ISVLSI), 2016 IEEE Computer Society Annual Symposium on. IEEE, 2016, pp. 110–115.
[12] W. Maass, "On the computational power of winner-take-all," Neural Computation, vol. 12, no. 11, pp. 2519–2535, 2000.
[13] Y. Wang et al., "Energy efficient RRAM spiking neural network for real time classification," in Proceedings of the 25th Edition on Great Lakes Symposium on VLSI. ACM, 2015, pp. 189–194.
[14] J. Sjöström et al., "Spike-timing dependent plasticity," Spike-timing dependent plasticity, p. 35, 2010.
[15] A. Borst et al., "Information theory and neural coding," Nature Neuroscience, vol. 2, no. 11, pp. 947–957, 1999.
[16] A. N. Burkitt, "A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input," Biological Cybernetics, vol. 95, no. 1, pp. 1–19, 2006.
[17] J. Seo et al., "A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons," in Custom Integrated Circuits Conference (CICC), 2011 IEEE. IEEE, 2011, pp. 1–4.
[18] Y. Cao et al., "Spiking deep convolutional neural networks for energy-efficient object recognition," International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
[19] S. K. Esser et al., "Convolutional networks for fast, energy-efficient neuromorphic computing," Proceedings of the National Academy of Sciences, p. 201604850, 2016.
[20] M. Chu et al., "Neuromorphic hardware system for visual pattern recognition with memristor array and CMOS neuron," IEEE Transactions on Industrial Electronics, vol. 62, no. 4, pp. 2410–2419, 2015.
[21] R. Kempter et al., "Temporal coding in the sub-millisecond range: Model of barn owl auditory pathway," in Advances in Neural Information Processing Systems, 1996, pp. 124–130.
[22] D. A. Butts et al., "Temporal precision in the neural code and the timescales of natural vision," Nature, vol. 449, no. 7158, pp. 92–95, 2007.
[23] C. Zhao et al., "Energy efficient spiking temporal encoder design for neuromorphic computing systems," IEEE Transactions on Multi-Scale Computing Systems, vol. 2, no. 4, pp. 265–276, 2016.
[24] Q. Yu et al., "Precise-spike-driven synaptic plasticity: Learning hetero-association of spatiotemporal spike patterns," PLoS ONE, vol. 8, no. 11, p. e78318, 2013.
[25] D. E. Rumelhart et al., "Learning representations by back-propagating errors," Cognitive Modeling, vol. 5, no. 3, p. 1, 1988.
[26] R. Gütig et al., "The tempotron: a neuron that learns spike timing-based decisions," Nature Neuroscience, vol. 9, no. 3, pp. 420–428, 2006.
[27] F. Ponulak, "ReSuMe-new supervised learning method for spiking neural networks," Institute of Control and Information Engineering, Poznan University of Technology. (Available online at: http://d1.cie.put.poznan.pl/~fp/research.html), 2005.
[28] W. Gerstner, "A framework for spiking neuron models: The spike response model," Handbook of Biological Physics, vol. 4, pp. 469–516, 2001.
[29] D. F. Goodman et al., "The brian simulator," Frontiers in Neuroscience, vol. 3, p. 26, 2009.
[30] Y. LeCun et al., "The mnist database of handwritten digits," 1998.
[31] P. U. Diehl et al., "Unsupervised learning of digit recognition using spike-timing-dependent plasticity," Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015.
[32] Y. LeCun et al., "Gradient-based learning applied to document recognition,"