PT-Spike: A Precise-Time-Dependent Single Spike Neuromorphic Architecture with Efficient Supervised Learning
Tao Liu∗, Lei Jiang†, Yier Jin‡, Gang Quan∗ and Wujie Wen∗
∗Florida International University, †Indiana University, ‡University of Florida
∗{tliu023, gang.quan, wwen}@fiu.edu, †[email protected], ‡[email protected]

Abstract—One of the most exciting advancements in Artificial Intelligence (AI) over the last decade is the wide adoption of Artificial Neural Networks (ANNs), such as the Deep Neural Network (DNN) and the Convolutional Neural Network (CNN), in real-world applications. However, the underlying massive amounts of computation and storage requirements greatly challenge their applicability in resource-limited platforms like drones, mobile phones, IoT devices, etc. The third generation of neural network model, the Spiking Neural Network (SNN), inspired by the working mechanism and efficiency of the human brain, has emerged as a promising solution for achieving more impressive computing and power efficiency within light-weight devices (e.g. a single chip). However, the relevant research activities have been narrowly carried out on conventional rate-based spiking system designs for fulfilling practical cognitive tasks, underestimating SNN's energy efficiency, throughput, and system flexibility. Although the time-based SNN is conceptually more attractive, its potential is not unleashed in realistic applications due to the lack of efficient coding and practical learning schemes. In this work, a Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely "PT-Spike", is developed to bridge this gap. Three constituent hardware-favorable techniques are proposed accordingly: precise single-spike temporal encoding, efficient supervised temporal learning, and fast asymmetric decoding, which boost the energy efficiency and data processing capability of the time-based SNN at a more compact neural network model size when executing real cognitive tasks. Simulation results show that "PT-Spike" demonstrates significant improvements in network size, processing efficiency, and power consumption with marginal classification accuracy degradation when compared with the rate-based SNN and ANN under a similar network configuration.
I. INTRODUCTION
Deep-learning-enabled neural network systems, i.e. deep neural networks (DNNs) and convolutional neural networks (CNNs), have found broad applications in realistic cognitive tasks such as speech recognition, image processing, machine translation, and object detection [1], [2]. However, performing high-accuracy testing with complex DNNs or CNNs requires massive amounts of computation and memory resources, leading to limited energy efficiency. For instance, the recognition implementation of the CNN "AlexNet" [3] involves not only huge volumes of parameters (61 million), generating intensive off-chip memory accesses, but also a large number of computing-intensive high-precision floating-point operations (1.5 billion) [4]. Such a weakness makes these solutions less attractive for many emerging applications of mobile autonomous systems like smart devices, Internet-of-Things (IoT), wearable devices, robotics, etc., where very tight power budgets, hardware resources, and footprints are enforced [5], [6].

(This work is supported in part by NSF under project CNS-1423137 and the 2016-2017 Collaborative Seed Award Program of the Florida Center for Cybersecurity (FC).)

Different from the CNN and DNN designs, spiking-based neuromorphic computing, which is inspired by the biological spiking neural network (SNN), features tremendous computing efficiency at much lower power on small-footprint platforms, e.g. the famous IBM TrueNorth chip with its 1 million neurons and an operating power of ∼70 mW [7]. Compared with such rate-based designs, the more biologically plausible time-based SNN may offer better energy efficiency and system throughput [10], since theoretically the information can be flexibly embedded in the time (temporal) domain of short and sparse spikes instead of the spike count represented by a group of dense spikes in rate coding, where, e.g., the spike occurrence frequency is proportional to the intensity of the input, like each pixel density of the image [7], [11].
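As a rough illustration of the two coding schemes just described, consider a single pixel encoded both ways. The window length, slot size, and intensity-to-delay map below are illustrative assumptions for this sketch, not parameters taken from the paper:

```python
# Toy contrast of rate coding vs. time coding for one pixel.
# Assumed scheme (illustrative only): a 16 ms window with 1 ms slots;
# rate coding fires a number of spikes proportional to intensity, while
# time coding fires a single spike whose delay shrinks as intensity
# grows (brighter pixel -> earlier spike).

T_WINDOW_MS = 16

def rate_encode(intensity, max_intensity=255):
    """Return spike times: one spike per slot, count ~ intensity."""
    n_spikes = round(intensity / max_intensity * T_WINDOW_MS)
    return list(range(n_spikes))          # spikes at t = 0, 1, 2, ...

def time_encode(intensity, max_intensity=255):
    """Return a single-spike train: one spike, delayed for dim pixels."""
    delay = round((1 - intensity / max_intensity) * (T_WINDOW_MS - 1))
    return [delay]

pixel = 200
rate_train = rate_encode(pixel)
time_train = time_encode(pixel)
print(len(rate_train), "spike events (rate) vs.",
      len(time_train), "spike event (time)")
```

The gap in event counts between the two trains is exactly what drives the power argument that follows.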
As a result, the rate-based SNN is naturally more power-hungry than the time-based SNN due to the increased number of spikes and the associated spike operations, such as synaptic weighting and Integrate-and-Fire (IFC), etc. Meanwhile, the processing efficiency of the time-based SNN can be further enhanced by performing an early decision based on the temporal information extracted from early fired spikes, while in rate coding the classification cannot be concluded until the last moment, e.g. a winner-takes-all rule that sorts the number of spikes fired by each output neuron during the entire decoding time window [12].

However, the potential of such an emerging architecture is significantly underestimated due to the lack of efficient, hardware-favorable solutions for time-based information representation and for the complex spike-timing-dependent (temporal) training of biological synapses towards practical cognitive applications [13]. On one hand, translating the input stimulus (i.e. image pixels) to spike delays, namely time-based encoding, is non-trivial because the coding efficiency can be easily degraded by biased spike delays distributed over the limited coding intervals. Also, the hardware realization of time coding is usually expensive, as the time-based spike kernel needs to be carefully designed to provide accurate time information (e.g. pre-synaptic/post-synaptic times [10]) for time-based training. On the other hand, realizing more biologically plausible spiking-time-based training, i.e. unsupervised spike-timing-dependent plasticity (STDP), is very complex and costly due to the exponential time dependence of the weight change and the difficult convergence of learning [14]. In real-world applications, training of the rate-based SNN can usually be performed off-line by directly borrowing the standard back-propagation algorithm from the artificial neural network (ANN) [11].
However, this time-independent learning rule does not fit the time-dependent SNN because of a fundamentally different learning mechanism.

In this work, we investigate the possibility of unleashing the potential of the time-based single-spike SNN architecture in realistic applications by orchestrating efficient time-based coding/decoding and learning algorithms. A Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely "PT-Spike", is proposed to facilitate cognitive tasks like MNIST digit recognition. Our "PT-Spike" incorporates three integrated techniques: precise single-spike temporal encoding, efficient supervised temporal learning, and fast asymmetric decoding. Our major contributions are:

1) We develop a precise-temporal encoding approach to efficiently translate the information into the temporal domain of a single spike. The single-spike solution dramatically reduces the energy while offering efficient model size reduction;
2) We propose a supervised temporal learning algorithm to facilitate synaptic plasticity on this single-spike system. The proposed algorithm significantly improves the learning capability and achieves comparable accuracy when compared with the ANN and the rate-based SNN under a similar configuration;
3) We design a novel asymmetric decoding to relieve the unique and serious weight-competition issue existing in this single-spike system, and to significantly improve the efficacy and efficiency of synaptic weight updating.

Fig. 1: The conceptual view of rate coding and time coding in SNNs.

II. BACKGROUNDS AND MOTIVATIONS
A. Neural Coding in SNNs
The neural coding in SNNs can be generally categorized as rate coding, time coding, rank coding, population coding, etc. [15]. In particular, the first two are the most attractive, since each piece of coded information is only associated with the spikes generated by a single input neuron, offering simplified encoding/decoding procedures and design complexity.

Fig. 1 demonstrates a conceptual comparison between rate coding and time coding in SNNs. T_e and T_i (R_e and R_i) denote two types of input neurons: the time-coded (rate-coded) excitatory and inhibitory neurons, respectively. An excitatory neuron exhibits an active response to the stimulus while an inhibitory neuron tends to keep silent. T_1 and T_2 (R_1 and R_2) denote two time-coded (rate-coded) output neurons for the classification. The rate-based SNN generates far more spikes than the time-based SNN at both types of input neurons. After the input spikes are processed by the two different SNNs, a single spike firing at a specific time interval can perform an inference task in the output layer of the time-based SNN, whereas a considerable number of spikes is needed to fulfill a classification in the rate-based SNN, indicating a much higher power consumption. Moreover, the rate-based SNN may exhibit a slower processing speed than the time-based SNN, since the output neuron of the former needs to count the spikes (i.e. through Integrate-and-Fire [16]) over the whole predefined time window, while that of the latter may quickly suspend its computations once a spike is detected.

B. Limitation of Existing Spiking Neuromorphic Computing Research
Neuromorphic Designs:
Many studies have been conducted to facilitate spiking-based Neuromorphic Computing System (NCS) designs in real hardware implementations, including CMOS VLSI circuits [7], [17], [18], [19], reconfigurable FPGAs [8], and emerging memristor crossbars [20], [11]. However, these works mainly focus on rate- or time-based SNN model mapping and hardware implementation, rather than SNN architecture optimization, i.e. the coding, decoding, and learning approaches.
Temporal Coding:
The concept of temporal coding, which relies on the arrival time or delay of a spike train for information representation, has been widely explored and proven in the development of the time-based SNN [21], [22]. These theoretical studies, however, mainly emphasize the biological explanations of time-based SNN models on simple cognitive benchmarks (e.g. a two-input XOR gate), which are far from complicated real-world problems such as image recognition. Recently, Zhao et al. [23] proposed an encoding circuit to handle temporal coding; however, this type of work still concentrates on component-level hardware implementation with simple case studies, and hence lacks a holistic architecture-level solution set capable of handling realistic tasks. In [24], a complete time-based SNN design is proposed. However, their solution suffers from limited accuracy, fundamentally constrained by the existing coding and temporal learning rule, and is not optimized towards hardware-based neuromorphic system designs.
Temporal Learning:
Since the popular learning approaches such as back-propagation [25], widely used in ANNs and rate-based SNNs, are unable to handle precise-time-dependent information due to a fundamentally different neural processing, many proposals dedicated to time-based learning have been developed [14], [26], [27]. However, these learning algorithms are neither hardware-favorable nor applicable to realistic tasks due to their expensive convergence and theoretical limitations. For example, under the unsupervised spike-timing-dependent plasticity (STDP) learning rule, the neural network structure and synaptic computation increase exponentially because of the expensive convergence and clustering. The proposed "Tempotron" and "Remote Supervised Method (ReSuMe)" can use a teaching spike to adjust the desired spiking time for temporal learning; however, they are not applicable to complicated patterns.

Our proposed "PT-Spike" is substantially different from previous studies: we explore how the time-based single-spike SNN architecture can be designed to perform realistic tasks through a holistic set of efficient techniques spanning time-based coding, learning, and decoding. A low-cost and efficient temporal learning rule named "PT-Learning" is augmented from "Tempotron" learning by considering a synthesized contribution of the cost function and a hardware-favorable time-dependent kernel for weight updating. By integrating with the proposed "Precise Temporal Encoding" and "Asymmetric Decoding", "PT-Spike" can significantly improve the accuracy, power, learning efficiency, and model size reduction through spatial-temporal information conversion.

III. DESIGN DETAILS
A. System Architecture
Fig. 2 shows the comprehensive data processing flow of the proposed "PT-Spike". First, the stimulus is captured by the temporal perceptors to generate a sparse spike train (i.e. a single spike) through "Precise Temporal Encoding". Each spike train is further modulated in the temporal domain by a linear-decayed spiking kernel to form a time-dependent voltage pulse. Second, these voltage pulses are sent to the synaptic network for a weighting process; e.g. a memristor crossbar with an IFC design can be employed for parallel processing. The output neurons exhibit time-varying weighting responses due to the time-dependent input information. An output neuron then fires a spike if the weighted post-synaptic voltage crosses a threshold voltage. The spike trains from the output layer are transmitted to the "Asymmetric Decoding". Finally, the target pattern is classified by analyzing the synchronized output spikes with a predefined asymmetric rule. During the learning procedure, desired spike patterns are coded following the same asymmetric rule used in decoding. The detected errors are sent back for synaptic plasticity through "PT-Learning", a supervised temporal learning algorithm.
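The end-to-end flow just described can be condensed into a small sketch. The kernel shape matches the linear decay mentioned above, but the frame length, decay constant, threshold, weights, and network sizes below are illustrative assumptions rather than the paper's tuned parameters:

```python
# Minimal end-to-end sketch of the PT-Spike data flow:
# stimulus -> single-spike delays -> linear-decay kernel -> synaptic
# weighting -> threshold firing -> first-fire (early) decoding.
# All numeric parameters here are illustrative assumptions.

import random

T = 16          # encoding/processing time frame (ms), 1 ms resolution
TAU = 16.0      # linear-decay constant of the spiking kernel
V_TH = 1.5      # firing threshold of the output neurons (assumed)

def encode(patch_means):
    """Map each perceived patch mean (0..1) to a single-spike delay."""
    return [round((1.0 - m) * (T - 1)) for m in patch_means]

def kernel(dt):
    """Linear-decay spiking kernel: peaks at the spike, fades over TAU."""
    return max(0.0, 1.0 - dt / TAU) if dt >= 0 else 0.0

def first_fire(delays, weights):
    """Integrate weighted kernels over time; return (neuron, t) of the
    first output spike, emulating the early 'winner-take-all' decode."""
    n_out = len(weights[0])
    for step in range(T):
        for n in range(n_out):
            v = sum(w_row[n] * kernel(step - d)
                    for d, w_row in zip(delays, weights))
            if v >= V_TH:
                return n, step         # stop all other integration early
    return None, T

random.seed(0)
patch_means = [0.9, 0.1, 0.6, 0.3]                  # 4 input "perceptors"
weights = [[random.uniform(0, 1) for _ in range(3)] # 4 inputs x 3 outputs
           for _ in range(4)]
neuron, t = first_fire(encode(patch_means), weights)
print("first firing neuron:", neuron, "at t =", t, "ms")
```

Note how decoding can stop at the first output spike instead of integrating over the whole time frame, which is the early-decision advantage argued for above.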
B. Precise Temporal Encoding
As discussed in Section II, in traditional rate coding, a large number of spikes within a proper time window is needed to precisely indicate the amplitude of an input signal, i.e. the pixel density of a visual stimulus. To maximize the power efficiency with a minimized number of spikes, the input information is instead represented as an extremely sparse train, a single spike and its occurrence delay, in the aforementioned coding approach. However, such a "one-to-one" mapping between each stimulus and the spike train of each input neuron can lead to a significant energy overhead. Meanwhile, the time (temporal) information of those spike trains is not fully leveraged by each neuron, resulting in limited coding efficiency and thus a dramatic accuracy reduction. As we shall present later, our results on the MNIST benchmark show that the "one-to-one" mapping achieves a very unacceptable training accuracy (∼ %) even under a large model size, that is, 784 input neurons for a 28 × 28 image.

Fig. 2: The overview of the "PT-Spike" system architecture.
In "PT-Spike", we further propose "Precise Temporal Encoding". As shown in Fig. 2, "Precise Temporal Encoding" is inspired by the human visual cortex and the Convolutional Neural Network (CNN): a Temporal Kernel (i.e. a unit square matrix) is applied over the full image to capture the spatial information, which is then translated into a single-spike delay in the temporal domain as one neuron input by perceiving the localized information from multiple interested pixels, i.e. the spiking delay equals the average density among several selected pixels. In practice, by selecting a proper stride with which we slide the Temporal Kernel, e.g. smaller than the dimensionality of the Temporal Kernel, a portion of the localized spatial information is shared by adjacent kernel slides. Consequently, the spatial localities can be further transformed into temporal localities, which uniformly allocates the spiking delays assigned to the input neurons in the time domain, translating into improved coding efficiency and classification accuracy.

Another unique advantage of the proposed "Precise Temporal Encoding" is that it offers flexible model size reduction. Different from the traditional "one-to-one" mapping, various degrees of model size reduction can be easily achieved by reconfiguring the size of the Temporal Kernel. Fig. 3 illustrates this concept. Increasing the Temporal Kernel size enriches the temporal information (see the encoding time frame growing from T = 16 ms to T = 256 ms in Fig. 3) and hence reduces the needed spatial information, or input neurons, e.g. 169 input neurons for "PT-Spike(16)" vs. 49 input neurons for "PT-Spike(256)". The training and inference accuracies change slightly according to the selected Temporal Kernel size (see Section IV).

C. Synaptic Processing and Linearized Spiking Kernel
Fig. 3: Model size reduction through an adjustable Temporal Kernel (spatial vs. temporal information over the encoding time frame T, 0-300 ms, for "PT-Spike(8)", "PT-Spike(16)", and "PT-Spike(256)").

Once the delay for the single spike is determined, as shown in Fig. 2, a spiking kernel K is applied to shape the associated spike of each input neuron. The kernel plays an important role in the subsequent synaptic weighting for the output voltage V_n(t), as shown in Eq. (1):

V_n(t) = Σ_{m=1}^{M} w_{mn} Σ_{t_s ≤ t} K(t − t_s)    (1)

where V_n(t) represents the voltage of output neuron n, w_{mn} denotes the synaptic efficacy between input neuron X_m and output neuron A_n, and t_s is the decoded spiking delay of X_m. To provide sufficient and accurate temporal information for the classification, the exponentially decayed post-synaptic potential in the biological spike response neural model [28] can be expressed as:

K(t − t_s) = µ (exp[−(t − t_s)/τ_1] − exp[−(t − t_s)/τ_2])    (2)

where τ_1 and τ_2 denote the decay time constants and µ is the normalizing constant. However, such an exponentially decaying function requires expensive computation and hardware resources. In "PT-Spike", we employ a more hardware-favorable kernel function, a linear decaying function (see the comparison of the two kernels in Fig. 2), to simplify the costly dual-exponential function:

K(t − t_s) = 1 − (t − t_s)/τ    (3)

As we shall show in Section IV, this linear approximation causes only a very marginal classification accuracy degradation. Besides, the linear kernel function is also applied to detect the input voltage contributions to the output spike in our proposed "PT-Learning".

D. Asymmetric Decoding
In "PT-Spike", a novel asymmetric decoding scheme, namely "A-Decoding", is proposed for the classification. As the error signal critical for the proposed supervised temporal learning is also generated through asymmetric decoding, we discuss the "A-Decoding" technique first.

In the rate-based SNN, the target pattern can be determined by the output neuron with the highest spike count. The costly weight updating is performed on all synapses at each learning iteration. The resulting neural competition (weight conflict) among different patterns can be rectified by the rich information provided by the large number of input spikes; hence a good classification accuracy may be achieved for all patterns. However, the same cannot occur in our proposed "PT-Spike", since its weight updating relies solely on a very limited number of sparse spikes (e.g. a single spike) in the temporal domain. In "PT-Spike", we therefore propose "A-Decoding" to alleviate the neural competition for accuracy improvement.

Fig. 4: An overview of the proposed "A-Decoding" (pattern readout with the "Fire&Cut" order; "firing", "not firing", and "independent" statuses for error detection).

Fig. 4 illustrates the key idea of the proposed "A-Decoding", including pattern readout and error detection. A pattern {P_i} can be decoded based on the firing statuses of the output neurons {N_i}. In our asymmetric decoding, an output neuron can take one of three statuses: "firing", "not firing", and "independent", as shown in Fig. 4. Note that "independent" means the associated neuron does not participate in the learning process of a certain pattern; this status only occurs in learning mode.

In testing mode, an output neuron can only take two statuses: {firing / not firing}. The target pattern is scanned according to the order of the first firing neuron. Assuming a binary code Ñ_1Ñ_2Ñ_3···Ñ_i is generated by the output neurons {N_i}, a Huffman-style decoding procedure can be performed (see Fig. 4, left part). For example, if the first firing neuron is N_3, the corresponding code will be 001, and thus the target pattern is P_3. In "PT-Spike", early detection in testing, namely "Fire&Cut", is realized based on the temporal "winner-take-all" rule: once the IFC of neuron N_i triggers a spike, all the remaining IFCs of the other neurons are shut down following the "Fire&Cut" order, which saves the additional power consumed by the IFCs.

In learning mode, a desired spike pattern is reversely generated according to the Huffman-style decoding of pattern {P_i} (see Fig. 4, right part). Once a participating neuron N_i triggers an unexpected firing or misses an expected firing, an error is detected and only the synaptic weights of N_i are modified according to our proposed "PT-Learning". Note that only "partial" output neurons (those NOT in the "independent" status) are involved during the learning of pattern {P_i}, namely "Partial Learning". Such a mechanism significantly accelerates the learning procedure and saves the power consumed by unnecessary neural processing.
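The readout and error-detection rules above can be sketched as follows. The 4-neuron/4-pattern setup and the independence set are illustrative assumptions, not the paper's actual decoding table:

```python
# Sketch of the "A-Decoding" readout and error detection. The
# first-firing output neuron selects the pattern ("Fire&Cut"), and in
# learning mode a neuron marked "independent" of the current pattern is
# skipped ("Partial Learning"). Neuron count, pattern labels, and the
# independence set below are illustrative assumptions.

FIRING, NOT_FIRING, INDEPENDENT = "firing", "not firing", "independent"

def decode(statuses):
    """Testing mode: the index of the first firing neuron picks P_i;
    later neurons are never examined, emulating the Fire&Cut order."""
    for neuron, status in enumerate(statuses):
        if status == FIRING:
            return f"P{neuron}"          # Fire&Cut: stop scanning here
    return None

def detect_errors(actual, desired, independent_of):
    """Learning mode: report (neuron, error) pairs, skipping neurons
    that are independent of the currently learned pattern."""
    errors = []
    for n, (got, want) in enumerate(zip(actual, desired)):
        if n in independent_of:
            continue                     # Partial Learning
        if got == NOT_FIRING and want == FIRING:
            errors.append((n, "false missing"))
        elif got == FIRING and want == NOT_FIRING:
            errors.append((n, "false fire"))
    return errors

print(decode([NOT_FIRING, NOT_FIRING, FIRING, FIRING]))  # first firer wins
print(detect_errors([FIRING, NOT_FIRING, NOT_FIRING],
                    [NOT_FIRING, FIRING, NOT_FIRING],
                    independent_of={2}))
```

Only the neurons reported by `detect_errors` would have their synapses touched, which is where the weight-update savings quantified later come from.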
Meanwhile, {N_i} is "asymmetrically" correlated with {P_i}, which eases the neural competition. For example, neuron N_i only engages in the synaptic plasticity of pattern P_i and is ignored during the learning of all other patterns. As we shall show later, by taking advantage of "Fire&Cut", "Partial Learning", and this eased competition, our proposed "A-Decoding" can significantly enhance the weighting efficiency and learning accuracy.

E. PT-Learning
Our proposed "PT-Learning" coordinates with the aforementioned "A-Decoding" to capture the errors needed for synaptic weight updating. An error detected by "A-Decoding" is processed by "PT-Learning" to generate the corresponding weight changes, which are sent back for synapse updating. As shown in Fig. 2, based on the actual and expected spiking patterns, two types of errors may occur at an output neuron: "false missing" and "false fire". Here "false missing" means that the integrated voltage cannot reach the threshold of the output neuron to trigger the expected output spike, while "false fire" is defined as an undesired spike firing.

As shown in Algorithm 1, once an error is detected, the error spiking time (T_fal) and the cost function (Err) are extracted from T_max and V_th − V_max, where V_max and T_max are the maximum voltage amplitude and its occurrence time, respectively. A negative (positive) Err means a false fire (false missing). Hence, the gradient of Err with respect to each weight w_c at pre-synaptic spiking time T_c can be calculated as:

−dErr/dw_c = Err Σ_{T_c ≤ T_max} K(T_max − T_c) + (∂V(T_max)/∂T_max)(dT_max/dw_c)    (4)

Algorithm 1: Post-Synaptic Processing
// Pseudocode of Asymmetric Decoding and PT-Learning
Detecting:
  foreach output neuron N_i in [N_1 .. N_I] do
    if testing mode then
      if firing then return P_i            // "Fire&Cut"
    else  // learning mode
      if N_i is independent of P_i then
        return                             // "Partial Learning" and "Ease Competition"
      else if actual firing pattern ≠ desired pattern then
        call Learning(V_max, T_max)

Learning:  // change the synaptic weights of N_i
  Err ← V_th − V_max
  T_fal ← T_max
  foreach input neuron X_c in [X_1 .. X_M] do
    if K(T_fal − T_c) ≤ 0 then
      continue                             // "Partial Updating"
    else  // the pre-spike at T_c contributed to the post-spike
      ∆w ← λ · Err · K(T_fal − T_c)
      w_ci ← w_ci + ∆w

Here K is the linear decayed spike kernel defined in Eq. (3). As the pre-synaptic spikes are weighted through the synaptic efficacies w_c before T_max, ∂V(T_max)/∂T_max = 0. By further considering Err in the change of w_c, ∆w_c can be expressed as:

∆w_c = λ Err Σ_{T_c ≤ T_fal} K(T_fal − T_c)    (5)

where λ denotes the learning rate, and the spike kernel K is used again to calculate the contributions from input neuron X_c at time T_c.

As discussed for "A-Decoding", only partial output neurons are involved during the learning of a certain pattern, meaning that only partial synaptic weights are updated. This dual-level acceleration, contributed by both "A-Decoding" and "PT-Learning", improves the learning efficiency significantly. As we shall show later, the synaptic computation can be reduced by more than 200% compared with the standard learning approach without these accelerations. Moreover, "PT-Learning" together with "A-Decoding" significantly boosts the accuracy for realistic recognition tasks.

IV. EVALUATIONS
To evaluate the accuracy, processing efficiency, and power consumption of our proposed "PT-Spike" neuromorphic architecture, extensive experiments are conducted on platforms including MATLAB and a heavily modified version of the open-source simulator Brian [29].
A. Simulation Setup
In our evaluation, the full MNIST database is adopted as the benchmark [30]. A set of "PT-Spike" designs, "PT-Spike(R)", is implemented to demonstrate the leveraged temporal encoding, where "R" denotes the number of interested pixels per input neuron, i.e. the size of the Temporal Kernel in the proposed "Precise Temporal Encoding". We also assume the encoding time frame T = τ × R (ms), where τ = 1 ms is the fixed minimum time interval to fire a spike; the maximum temporal information T can thus be adjusted by tuning the parameter R. The number of input neurons (spatial domain) can be expressed as M = ⌈(P − √R + 1)/S⌉², where P and S represent the width of an input image and the stride with which we slide the Temporal Kernel. P = 28 and S = 2 are selected in our evaluations on the MNIST dataset. Two representative baselines under similar network configurations, the rate-coded SNN "Diehl-15" [31] and the ANN "Lecun-98" [32], are also implemented for energy and performance comparisons with the proposed "PT-Spike".

TABLE I: Structural Parameters of Selected Candidates.

Candidate     | Input neurons | Output neurons | Synaptic weights | Time frame T
PT-Spike(4)   | 196           | 10             | 1960             | 4 ms
PT-Spike(16)  | 169           | 10             | 1690             | 16 ms
PT-Spike(25)  | 144           | 10             | 1440             | 25 ms
PT-Spike(100) | 100           | 10             | 1000             | 100 ms
Diehl-15      | 784           | 100            | 78400            | 500 ms
Lecun-98      | 784           | 10             | 7840             | -

Fig. 5: Accuracy evaluations for different candidates and design optimizations. (a) Training and testing accuracies of selected candidates. (b) Training accuracy with different designs ("Exponential Kernel", non-"A-Decoding", "Tempotron").

Table I presents the detailed structural parameters of the selected candidates. Compared with "Diehl-15" and "Lecun-98", our proposed temporal encoding achieves significant model size reduction for all "PT-Spike" designs, i.e. ∼40× ("PT-Spike(4)" vs. "Diehl-15") and ∼4× ("PT-Spike(4)" vs. "Lecun-98").

B. Accuracy
Fig. 5a shows the accuracy comparison among the different "PT-Spike(R)" designs, "Lecun-98", and "Diehl-15". "PT-Spike(25)" achieves very comparable accuracy at a much lower cost (∼ %, 1440 synaptic weights) when compared with "Diehl-15" (∼ %, 78400 synaptic weights) and "Lecun-98" (∼ %, 7840 synaptic weights). Meanwhile, "PT-Spike(16)" and "PT-Spike(25)" also show very close accuracies (∼ % and ∼ %), much better than "PT-Spike(4)" and "PT-Spike(100)" (∼ % and ∼ %).

We also evaluated the individual training accuracy improvement contributed by each of the proposed techniques, i.e. the "linearized spiking kernel", "Precise Temporal Encoding", "A-Decoding", and "PT-Learning", respectively. Here, we choose "PT-Spike(16)" as the baseline design that employs all of the aforementioned techniques. "Exponential Kernel", "one-to-one mapping", "non A-Decoding", and "Tempotron" denote designs that substitute exactly one of the four techniques. As shown in Fig. 5b, "PT-Spike(16)" shows a very marginal accuracy degradation (∼ %) from the "linearized spiking kernel" (K in Eq. (3)) when compared with the original costly "Exponential Kernel" design (∼ %, K in Eq. (2)). Furthermore, "PT-Spike(16)" boosts the accuracy by ∼ %, ∼ %, and ∼ % when compared with the designs of "one-to-one mapping" (∼ %), "non A-Decoding" (∼ %), and the theoretical "Tempotron" learning rule (∼ %), respectively, which clearly demonstrates the effectiveness of the proposed "Precise Temporal Encoding", "A-Decoding", and "PT-Learning".

C. Processing Efficiency
The occurrence frequency of synaptic events, including both weighting and weight updating, is calculated to evaluate the system processing efficiency. Fig. 6a compares the number of weighting operations among three designs in the feed-forward pass. Unlike the other candidates, the amount of weighting operations of "PT-Spike(16)" differs between training and testing due to the "Fire&Cut" mechanism in "A-Decoding"; hence the weighting of the first testing iteration is also included for "PT-Spike(16)". Even the "non A-Decoding" design, i.e. "PT-Spike(16)" without the "A-Decoding" technique, gains a ∼ × weighting-operation reduction compared with "Diehl-15", since a rate-coded SNN requires a long time window to process the spikes with an enlarged neuron model size, causing tremendous weighting processes at each time slot. Compared with "non A-Decoding", the weighting operations of "PT-Spike(16)" can be further reduced by ∼ % and ∼ % in the first training iteration and the testing iteration, respectively. As expected, the "early-detection" mechanism in "A-Decoding" removes many unnecessary weighting operations on both "initialized" weights and "well-trained" weights.

We also characterize the occurrence frequency of weight updating during the first training iteration to evaluate the processing efficiency of the feed-back pass. As Fig. 6b shows, even the "Worst Case" (i.e. "PT-Spike(16)" without employing "A-Decoding" and "PT-Learning") achieves ∼ × and ∼ × reductions in weight updates per image and per error, respectively, when compared with "Diehl-15". This impressive improvement comes from the significantly compressed model size. Moreover, compared with the "Worst Case", "PT-Learning" and "A-Decoding" contribute ∼ × and ∼ × weight-updating reductions per error and per image for "PT-Spike(16)", respectively, demonstrating the effectiveness of the "dual-level acceleration" from decoding and learning.

Fig. 6: Processing efficiency and power consumption. (a) Feed-forward efficiency per input image. (b) Feed-back efficiencies. (c) Power consumption (α Joules/spike).
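A toy accounting model makes the feed-forward savings from "Fire&Cut" concrete. Each output neuron is assumed to perform one weighting pass per input neuron per time step until it is shut down; the network sizes echo "PT-Spike(16)" from Table I, but the firing time and the one-op-per-step cost model are made-up assumptions, not measurements:

```python
# Toy accounting of feed-forward weighting operations with and without
# the "Fire&Cut" early cut-off. Each output neuron performs one
# weighting operation per input neuron per time step until it stops.
# The firing time and cost model are illustrative assumptions.

M, N, T = 169, 10, 16       # inputs, outputs, time steps (PT-Spike(16)-like)

def weighting_ops(fire_time=None):
    """Total weighting operations; with Fire&Cut, every IFC stops once
    the first output spike appears at `fire_time`."""
    steps = T if fire_time is None else fire_time + 1
    return M * N * steps

full = weighting_ops()              # no early detection
early = weighting_ops(fire_time=5)  # assumed first spike at t = 5 ms
print(f"without Fire&Cut: {full} ops, with: {early} ops "
      f"({100 * (full - early) / full:.1f}% saved)")
```

The earlier the first output spike, the larger the cut, which is why the savings differ between "initialized" and "well-trained" weights.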
D. Power Consumption
To roughly evaluate the power efficiency contributed by the proposed architecture, we adopt a methodology similar to that used in [7], [18]. A new candidate, "Minitaur" [8], is introduced for a fair comparison, since it is a more hardware-oriented rate-coded SNN. As Fig. 6c shows, "PT-Spike(16)" saves ∼ × and ∼ × power for each input neuron and each input image over "Diehl-15", respectively, indicating the efficiency of our proposed single-spike coding technique. Compared with the hardware-oriented rate-coded SNN design "Minitaur", "PT-Spike(16)" can still achieve a ∼ × (∼ ×) power reduction for each input neuron (input image).

E. Discussions
Research on time-based SNNs represented by extremely sparse spikes, i.e., the single-spike design, is still in its infancy, and to the best of our knowledge, no exemplar large networks have been successfully demonstrated for performing realistic cognitive tasks. Due to the unique time-based learning and information representation, research in this area is quite challenging and unique. In this work, we adopt a proof-of-concept simple design, i.e., a Single-Layer Perceptron, to illustrate the design optimizations of the time-based SNN and to demonstrate its potential for realistic applications, though the classification accuracy is still lower than that of state-of-the-art DNNs and CNNs.

Extending our design to a multi-layered network would enhance its capability to handle more complicated cognitive tasks; however, this is non-trivial, as a multi-layer learning rule needs to be developed to facilitate spatial information transfer among different layers. While our proposed approach cannot be directly applied to a multi-layered network in its current form, the novel techniques proposed in this paper, i.e., "Temporal Kernel Coding", "PT-Learning" and "A-Decoding", form the basis for a time-based multi-layer network. We believe the initial architecture developed in this paper will serve as a basic framework for multi-layer network design, and may encourage more interesting research in this domain.

V. CONCLUSION
As the rate-based spiking neural network (SNN) is subject to power and speed challenges due to processing a large number of spikes, in this work we systematically studied the possibility of utilizing the more power-efficient time-based SNN in real-world cognitive tasks. Three integrated techniques, precise temporal encoding, efficient supervised temporal learning and fast asymmetric decoding, were proposed to construct the Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely, "PT-Spike". The single-spike temporal encoding offers an energy-efficient information representation solution with the potential for model size reduction. The supervised learning and asymmetric decoding work cooperatively to deliver more effective and efficient synaptic weight updating and classification. Our evaluations on the MNIST database well demonstrate the advantages of "PT-Spike" over the rate-based SNN in terms of network size, speed and power, with comparable accuracy.

REFERENCES
[1] Y. LeCun et al., "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] C. Szegedy, "An overview of deep learning," AITP 2016, 2016.
[3] A. Krizhevsky et al., "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[4] A. Farmahini-Farahani et al., "NDA: Near-DRAM acceleration architecture leveraging commodity DRAM devices and standard memory modules," in High Performance Computer Architecture (HPCA), 2015 IEEE 21st International Symposium on. IEEE, 2015, pp. 283–295.
[5] R. Andri et al., "YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights," in VLSI (ISVLSI), 2016 IEEE Computer Society Annual Symposium on. IEEE, 2016, pp. 236–241.
[6] S. Han et al., "MCDNN: An approximation-based execution framework for deep stream processing under resource constraints," in Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services. ACM, 2016, pp. 123–136.
[7] F. Akopyan et al., "TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 34, no. 10, pp. 1537–1557, 2015.
[8] D. Neil et al., "Minitaur, an event-driven FPGA-based spiking network accelerator," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 12, pp. 2621–2628, 2014.
[9] F. Corradi et al., "A neuromorphic event-based neural recording system for smart brain-machine-interfaces," IEEE Transactions on Biomedical Circuits and Systems, vol. 9, no. 5, pp. 699–709, 2015.
[10] S. Thorpe et al., "Spike-based strategies for rapid processing," Neural Networks, vol. 14, no. 6, pp. 715–725, 2001.
[11] C. Liu et al., "A memristor crossbar based computing engine optimized for high speed and accuracy," in VLSI (ISVLSI), 2016 IEEE Computer Society Annual Symposium on. IEEE, 2016, pp. 110–115.
[12] W. Maass, "On the computational power of winner-take-all," Neural Computation, vol. 12, no. 11, pp. 2519–2535, 2000.
[13] Y. Wang et al., "Energy efficient RRAM spiking neural network for real time classification," in Proceedings of the 25th Edition on Great Lakes Symposium on VLSI. ACM, 2015, pp. 189–194.
[14] J. Sjöström et al., "Spike-timing dependent plasticity," Spike-timing dependent plasticity, p. 35, 2010.
[15] A. Borst et al., "Information theory and neural coding," Nature Neuroscience, vol. 2, no. 11, pp. 947–957, 1999.
[16] A. N. Burkitt, "A review of the integrate-and-fire neuron model: I. Homogeneous synaptic input," Biological Cybernetics, vol. 95, no. 1, pp. 1–19, 2006.
[17] J. Seo et al., "A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons," in Custom Integrated Circuits Conference (CICC), 2011 IEEE. IEEE, 2011, pp. 1–4.
[18] Y. Cao et al., "Spiking deep convolutional neural networks for energy-efficient object recognition," International Journal of Computer Vision, vol. 113, no. 1, pp. 54–66, 2015.
[19] S. K. Esser et al., "Convolutional networks for fast, energy-efficient neuromorphic computing," Proceedings of the National Academy of Sciences, p. 201604850, 2016.
[20] M. Chu et al., "Neuromorphic hardware system for visual pattern recognition with memristor array and CMOS neuron," IEEE Transactions on Industrial Electronics, vol. 62, no. 4, pp. 2410–2419, 2015.
[21] R. Kempter et al., "Temporal coding in the sub-millisecond range: Model of barn owl auditory pathway," in Advances in Neural Information Processing Systems, 1996, pp. 124–130.
[22] D. A. Butts et al., "Temporal precision in the neural code and the timescales of natural vision," Nature, vol. 449, no. 7158, pp. 92–95, 2007.
[23] C. Zhao et al., "Energy efficient spiking temporal encoder design for neuromorphic computing systems," IEEE Transactions on Multi-Scale Computing Systems, vol. 2, no. 4, pp. 265–276, 2016.
[24] Q. Yu et al., "Precise-spike-driven synaptic plasticity: Learning hetero-association of spatiotemporal spike patterns," PLoS ONE, vol. 8, no. 11, p. e78318, 2013.
[25] D. E. Rumelhart et al., "Learning representations by back-propagating errors," Cognitive Modeling, vol. 5, no. 3, p. 1, 1988.
[26] R. Gütig et al., "The tempotron: a neuron that learns spike timing-based decisions," Nature Neuroscience, vol. 9, no. 3, pp. 420–428, 2006.
[27] F. Ponulak, "ReSuMe-new supervised learning method for spiking neural networks," Institute of Control and Information Engineering, Poznan University of Technology. (Available online at: http://d1.cie.put.poznan.pl/~fp/research.html), 2005.
[28] W. Gerstner, "A framework for spiking neuron models: The spike response model," Handbook of Biological Physics, vol. 4, pp. 469–516, 2001.
[29] D. F. Goodman et al., "The brian simulator," Frontiers in Neuroscience, vol. 3, p. 26, 2009.
[30] Y. LeCun et al., "The mnist database of handwritten digits," 1998.
[31] P. U. Diehl et al., "Unsupervised learning of digit recognition using spike-timing-dependent plasticity," Frontiers in Computational Neuroscience, vol. 9, p. 99, 2015.
[32] Y. LeCun et al., "Gradient-based learning applied to document recognition,"