Combining Spiking Neural Network and Artificial Neural Network for Enhanced Image Classification

A Preprint

Naoya Muramatsu
Graduate School of Library, Information and Media Studies, University of Tsukuba
Kasuga 1-2, Tsukuba-shi, Ibaraki, Japan
[email protected]

Hai-Tao Yu
Graduate School of Library, Information and Media Studies, University of Tsukuba
Kasuga 1-2, Tsukuba-shi, Ibaraki, Japan
[email protected]

March 2, 2021

Abstract
With the continued innovations of deep neural networks, spiking neural networks (SNNs), which more closely resemble biological brain synapses, have attracted attention owing to their low power consumption. However, for continuous data values, they must employ a coding process to convert the values to spike trains. Thus, they have not yet exceeded the performance of artificial neural networks (ANNs), which handle such values directly. To this end, we combine an ANN and an SNN to build versatile hybrid neural networks (HNNs) that improve the concerned performance. To quantify this performance, the MNIST and CIFAR-10 image datasets are used for various classification tasks in which the training and coding methods change. In addition, we present simultaneous and separate methods to train the artificial and spiking layers, considering the coding methods of each. We find that increasing the number of artificial layers at the expense of spiking layers improves HNN performance. For straightforward datasets such as MNIST, it is easy to achieve the same performance as ANNs by using duplicate coding and separate learning. However, for more complex tasks, the use of Gaussian coding and simultaneous learning is found to improve the accuracy of the HNN while utilizing a smaller number of artificial layers.
Keywords: spiking neural network · artificial neural network · machine learning

Over the years, deep-learning methods have provided dramatic performance improvements for various tasks (e.g., image recognition [He et al., 2016, 2015], natural language processing [Hu et al., 2014, Young et al., 2018], and perfect-information gaming [Silver et al., 2016]). In fact, their performance has surpassed humans on certain tasks [Silver et al., 2016, He et al., 2015]. However, deep learning continues to face challenges in terms of energy efficiency. For example, in recent machine-learning models, graphical processing units (e.g., NVIDIA Tesla V100 and A100) have been used to solve single tasks while consuming 250 W. The human brain requires only 20 W [Drubach, 2000] for the same task while also managing related and routine activities. For this reason, spiking neural networks (SNNs), which represent information between neurons as spikes that more closely resemble the synaptic activity of biological neurons, have attracted attention.

However, SNNs are inferior to artificial neural networks (ANNs) in terms of performance. There are several reasons for this. First, back-propagation cannot be directly applied to SNNs, because the spike generation mechanism is not differentiable. The main workaround is to use surrogate gradients, and a few studies have shown performance comparable to ANNs for limited tasks [Shrestha and Orchard, 2018, Wu et al., 2018, Neftci et al., 2019]. In this work, we apply such a method, optimizing weights using state-of-the-art approximated back-propagation for SNNs [Fang et al., 2020]. The second reason for the SNN inferiority is that most of the data handled by SNNs consist of continuous values, such as those represented by floating-point numbers, whereas SNNs represent and process information via synaptic spikes. This type of input/output would be ideal for processing information from sensors that output similar spike signals, such as event-based cameras [Orchard et al., 2015, Gallego et al., 2020]. Such a task would require an SNN-oriented visual image dataset, such as N-MNIST [Orchard et al., 2015], so that higher accuracy and lower computation could be attained [Deng et al., 2020]. In contrast, many datasets (e.g., MNIST and CIFAR-10) and, for that matter, data handled in the real world often have normalized continuous values in (0, 1). Therefore, for SNNs to process the most realistic data, coding is required to convert the continuous values into spike trains. Many methods for doing this have been proposed, but their performance remains inferior to that of ANNs, which process the original data directly [Deng et al., 2020].

As a novel method to address this problem, we propose the concept of hybrid neural networks (HNNs), in which an SNN is combined with an ANN. We use the ANN parts for the input and hidden layers and the SNN parts for the middle and output layers. In this way, the network can directly receive continuous-valued data via the ANN and compute them via the SNN, which is expected to realize a lower-energy and higher-accuracy neural network. However, this network structure still requires the aforementioned coding process between the artificial layers (ALs) and the spiking layers (SLs), owing to the continuous values treated in the ALs and the spike signals in the SLs. The HNNs are evaluated using image classification tasks to determine whether ALs can be trained simultaneously with SLs. Furthermore, the ratio of ALs to SLs in a network and the coding methods needed between the ALs and SLs are evaluated. The results show that using more ALs improves the classification accuracy, and that the most straightforward coding method (i.e., duplicate coding) is effective in terms of network performance and computational cost.

The following summarizes the major contributions of this work:

1. Because SNNs cannot handle continuous-valued data directly, the proposed HNN combines an ANN and an SNN; their ALs and SLs are connected via coding methods and can be trained separately or simultaneously via back-propagation.
2. We show that increasing the ratio of ALs in the network improves HNN performance, which can surpass that of pure SNNs.
3. We make a technical contribution by proposing the use of differentiable Gaussian coding. HNNs that use this method can achieve higher accuracy, even when the percentage of ALs is small.

In this section, we explain the theory behind SNNs while focusing on their learning algorithms and their method of information representation.
As shown in Figure 1(a), a biological neuron changes its membrane potential, the internal voltage of the soma, in response to input signals from the synapses. When the membrane potential exceeds the threshold voltage, a spike signal is generated from the soma and is transmitted to the next neuron through the axon and its terminals.

As shown in Figure 1(b), in many SNN studies, synapses and neurons are treated as separate modules. The synapses receive spike signals from the pre-synaptic neurons and generate a postsynaptic potential (PSP). The PSP is weighted for each synapse and flows into the postsynaptic neuron; biologically, this is the input current to the neuron. The membrane potential of each neuron varies with the input current (i.e., the weighted PSP) over time. When this value exceeds the threshold voltage, an output spike is generated from the neuron. Immediately afterward, its membrane potential plummets to the resting voltage. Because each neuron repeats this temporal behavior, information flows only via the spike signals, ultimately realizing lower power consumption over time.
In this study, we use the leaky integrate-and-fire (LIF) neuron model [Stein, 1965], which is the most basic and widely used phenomenological neuron model [Schuman et al., 2017]. The synapses are based on the infinite impulse response (IIR) equations [Fang et al., 2020]. Because the model takes the following form in the discrete-time domain, SNNs can be interpreted as networks of IIR filters:

[Figure 1: (a) Biological-neuron and (b) spiking-neuron models as infinite impulse response (IIR) filters.]

$$V_i^l[t] = \lambda V_i^l[t-1] + I_i^l[t] - V_{th} R_i^l[t] \quad (1a)$$

$$I_i^l[t] = \sum_j^{N^{l-1}} w_{i,j}^l F_j^l[t] \quad (1b)$$

$$R_i^l[t] = \theta R_i^l[t-1] + O_i^l[t-1] \quad (1c)$$

$$F_j^l[t] = \sum_{p=1}^{P} \alpha_{j,p}^l F_j^l[t-p] + \sum_{q=0}^{Q} \beta_{j,q}^l O_j^{l-1}[t-q] \quad (1d)$$

$$O_i^l[t] = U(V_i^l[t] - V_{th}) \quad (1e)$$

$$U(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{otherwise,} \end{cases} \quad (1f)$$

where $l$ and $i$ denote the indices of the layer and the neuron, respectively, $j$ denotes the input index, $t$ is the time step, and $N^l$ is the number of neurons in the $l$-th layer. $V_i^l[t]$ is the neuron membrane potential, and $V_{th}$ is the threshold; a neuron fires when its membrane potential exceeds it. $I_i^l[t]$ is the weighted PSP input. $R_i^l[t]$ is the reset voltage used to decrease the membrane potential to the resting voltage after the neuron fires. $F_j^l[t]$ is the PSP. $O_i^l[t]$ is a spike function that describes the conditions of firing, and $U(x)$ is a Heaviside step function. $P$ and $Q$ denote the feedback and feed-forward orders, respectively. $\lambda$, $\theta$, $\alpha_{j,p}^l$, and $\beta_{j,q}^l$ are the coefficients of the neuron filter, reset filter, and synapse filter, respectively. By changing these coefficients, this model can represent various types of neurons.
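To make the discrete-time dynamics of Eqs. (1a)-(1f) concrete, the following is a minimal NumPy sketch of a single LIF neuron driven through one first-order synapse (i.e., $P = 0$, $Q = 0$, so Eq. (1d) reduces to $F = \beta O$); the coefficient values and the input spike train are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def simulate_lif(in_spikes, w=0.5, lam=0.9, theta=0.5, beta=1.0, v_th=1.0):
    """Simulate one LIF neuron fed through one synapse, per Eqs. (1a)-(1f).

    in_spikes: binary array O^{l-1}[t] from the presynaptic neuron.
    lam, theta: leak coefficients of the neuron and reset filters.
    beta: feed-forward synapse coefficient (first-order filter, P = Q = 0).
    """
    T = len(in_spikes)
    v, r = 0.0, 0.0                 # membrane potential and reset trace
    out = np.zeros(T)
    for t in range(T):
        f = beta * in_spikes[t]                          # Eq. (1d): PSP
        i_t = w * f                                      # Eq. (1b): weighted PSP current
        r = theta * r + (out[t - 1] if t > 0 else 0.0)   # Eq. (1c): reset filter
        v = lam * v + i_t - v_th * r                     # Eq. (1a): membrane dynamics
        out[t] = float(v >= v_th)                        # Eqs. (1e)-(1f): Heaviside spike
    return out

rng = np.random.default_rng(0)
spikes = (rng.random(100) < 0.6).astype(float)   # illustrative input train
print(int(simulate_lif(spikes).sum()), "output spikes over 100 steps")
```

Raising the leak $\lambda$ or the synapse gain $\beta$ in this sketch makes the neuron integrate faster and fire more often, which is the sense in which the filter coefficients select the neuron type.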
Because this study focuses on image classification tasks, we utilize the cross-entropy loss $E$ with the firing probability of each neuron in the last layer within a fixed-length time window $T$:

$$E = -\sum_i^{N^L} y_i \ln p_i \quad (2a)$$

$$p_i = \frac{\exp\left(\sum_t^T O_i^L[t]\right)}{\sum_{j=1}^{N^L} \exp\left(\sum_t^T O_j^L[t]\right)}, \quad (2b)$$

where $y_i$ is a one-hot vector representing the truth label, $L$ is the number of layers of the SNN, and $O_i^L[t]$ denotes the output of the last layer.

Using the chain rule, the gradient can be represented as

$$\frac{\partial E}{\partial w^l} = \sum_{t=1}^{T} \delta^l[t]\, \epsilon^l[t] \left( F^l[t] + \sum_{i=1}^{t-1} F^l[i] \prod_{j=i}^{t-1} \kappa^l[j] \right), \quad (3)$$

where

$$\delta_i^l[t] = \frac{\partial E}{\partial O_i^l[t]} = \sum_{q=0}^{Q} \sum_j^{N^{l+1}} \frac{\partial E}{\partial O_j^{l+1}[t+q]} \frac{\partial O_j^{l+1}[t+q]}{\partial O_i^l[t]} + \frac{\partial E}{\partial O_i^l[t+1]} \frac{\partial O_i^l[t+1]}{\partial O_i^l[t]} \quad (4a)$$

$$\kappa_i^l[t] = \frac{\partial V_i^l[t+1]}{\partial V_i^l[t]} = \lambda - V_{th}\, \epsilon_i^l[t] \quad (4b)$$

$$\epsilon_i^l[t] = \frac{\partial U(V_i^l[t] - V_{th})}{\partial V_i^l[t]}. \quad (4c)$$

Here, $U(x)$ is not differentiable, which is the main obstacle to SNN learning. Fang et al. assumed that when Gaussian noise $\mathcal{N}(0, \sigma^2)$ is injected, LIF neurons can be approximated by Poisson neurons [Fang et al., 2020], such that the firing rate becomes

$$P(v) = \frac{1}{2} \operatorname{erfc}\!\left(\frac{V_{th} - v}{\sqrt{2}\,\sigma}\right), \quad (5)$$

where $\operatorname{erfc}(\cdot)$ is the complementary error function. Thus, the derivative of $U(x)$ can be approximated as [Neftci et al., 2019]

$$\frac{\partial U(x)}{\partial v} \approx \frac{\partial P(v)}{\partial v} = \frac{e^{-\frac{(V_{th} - v)^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}. \quad (6)$$

Although the form of the spiking signal varies per neuron type, SNNs are designed based on the assumption that information is represented by spikes. Therefore, we must consider how to represent the concerned information in this fashion.
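As a sketch of how such a surrogate can be used in practice, the following PyTorch fragment applies the Heaviside step of Eqs. (1e)-(1f) in the forward pass and substitutes the Gaussian derivative of Eq. (6) in the backward pass, together with the spike-count cross-entropy of Eqs. (2a)-(2b). The value of `sigma` is an assumed hyperparameter; this is not the authors' released code.

```python
import math
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike with the Gaussian surrogate gradient of Eq. (6)."""
    sigma = 0.4  # assumed noise scale; a tunable hyperparameter

    @staticmethod
    def forward(ctx, v, v_th=1.0):
        ctx.save_for_backward(v)
        ctx.v_th = v_th
        return (v >= v_th).float()                    # Eqs. (1e)-(1f)

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        s = SurrogateSpike.sigma
        # dP/dv = exp(-(v_th - v)^2 / (2 s^2)) / (sqrt(2 pi) s), Eq. (6)
        grad = torch.exp(-(ctx.v_th - v) ** 2 / (2 * s ** 2)) \
               / (math.sqrt(2 * math.pi) * s)
        return grad_out * grad, None

def spike_count_loss(out_spikes, target):
    """Cross-entropy over spike counts in the time window, Eqs. (2a)-(2b).

    out_spikes: (T, batch, N_L) last-layer spike trains; target: (batch,) labels.
    """
    counts = out_spikes.sum(dim=0)                    # sum_t O_i^L[t]
    return torch.nn.functional.cross_entropy(counts, target)  # softmax of counts
```

In a training loop, `SurrogateSpike.apply(v)` replaces the raw threshold comparison so that gradients of Eq. (3) can flow through the firing events.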
There are two types of information coding used for this purpose: rate and time. Rate coding expresses information in terms of the frequency of spike firing over time; the time interval between individual spikes carries no meaning. Conversely, time coding represents information based on the time interval of each spike.

Although the time-coding representation is valid [Kumar et al., 2010, Rullen and Thorpe, 2001], we apply rate coding in this research, because it is more commonly used in this field to convert continuous-valued input data to event-driven data prior to the first layer of an SNN.
HNNs have a coding module and two types of neural layers. Owing to the mismatch between the data formats handled by the ALs, in which the data are continuous values, and by the SLs, in which spike trains exist along the time dimension, the data must be converted from continuous values into spike trains. In addition, there are two types of ALs that can be incorporated into an SNN: those with fixed weights and those that are trained simultaneously with the SLs. In the following sections, we detail these two points.
There are two learning methods regarding the ALs and SLs of HNNs: training them separately or simultaneously.

In the separate method, the ANN is trained in advance. Then, when building the HNN, the corresponding ALs are extracted and connected to the SLs while maintaining the AL weights. As shown in Figure 2(b), when training the HNN, the weights of the extracted ALs are fixed, and only the weights of the SLs are updated. In this type of network, because the ALs do not need to learn with the SLs, non-differentiable coding functions can be used to convert the latent vectors from the ALs into spikes. Notably, this method requires less training time.

In the simultaneous method, the ALs and SLs are combined from the beginning, and all layers are trained from scratch via back-propagation (Figure 2(a)). Between the last AL and the first SL, coding is required, and it must be differentiable to support back-propagation. Therefore, some coding methods are not available with the spatio-temporal back-propagation method (see Section 2.1.3).
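To illustrate the separate method, here is a hedged PyTorch-style sketch in which a pretrained ANN supplies frozen ALs and only the SL parameters reach the optimizer; `SpikingLayer` and `duplicate_coding` are hypothetical stand-ins for the paper's components, not a released implementation.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the paper's spiking layer and coding module.
from hybrid_snn import SpikingLayer, duplicate_coding  # assumed module

ann = nn.Sequential(nn.Flatten(), nn.Linear(784, 500), nn.ReLU(),
                    nn.Linear(500, 10))
# ... pretrain `ann` on MNIST in the usual way ...

# Separate method: reuse the pretrained early layers as fixed ALs.
als = ann[:3]                       # Flatten + Linear(784, 500) + ReLU
for p in als.parameters():
    p.requires_grad = False         # ALs stay frozen while the SLs train

sls = SpikingLayer(500, 10)         # trainable spiking output layer

def hybrid_forward(x, T=20):
    a = als(x)                      # continuous latent vector
    train = duplicate_coding(a, T)  # (T, batch, 500) continuous-valued train
    return sls(train)               # (T, batch, 10) output spikes

opt = torch.optim.Adam(sls.parameters(), lr=1e-3)   # only SLs are updated
# Simultaneous method: train ALs and SLs jointly from scratch instead:
# opt = torch.optim.Adam([*als.parameters(), *sls.parameters()], lr=1e-3)
```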
Rate-based coding is utilized at the interface between the ALs and SLs, as shown in Figures 2(a) and 2(b). Network performance in this study is evaluated by feeding continuous-valued image data directly to the ALs. Hence, the spiking layers require an additional time dimension. We explore three methods to handle this: duplicate, Gaussian, and Poisson coding. This section describes the details.
Expanding the output vector $a$ along the time axis is the most straightforward way to perform rate-based coding. The input for the first SL, which is also the coded output of the last AL, is $O_i^{AL}[t] = a_i$. Instead of a spike train, this continuous-valued train is the input for the spiking neurons in the first SL.

The duplicating method deterministically codes a continuous value to a continuous-valued train, whereas biological networks are stochastic, owing to uncertainty factors in nature (e.g., noise). The Gaussian method therefore stochastically converts the continuous values generated from the artificial-layer output, using the reparameterization trick [Kingma and Welling, 2014] to remain compatible with back-propagation. The last AL outputs $\mu$ and $\ln \sigma$ instead of a single vector, and the first SL receives re-parameterized continuous-valued trains, $O_i^{AL}[t] \sim \mathcal{N}(\mu, \sigma)$ (Figure 2(c)). Via the reparameterization trick, the input of the first SL is defined as

$$O_i^{AL}[t] = \mu_i + \sigma_i \varepsilon, \quad (7)$$

where $\varepsilon$ is an auxiliary noise variable, $\varepsilon \sim \mathcal{N}(0, 1)$.

The reparameterization trick makes the boundary between the ALs and SLs differentiable. However, compared with the other coding methods, it requires twice the number of neurons in the last AL in order to represent both parameters. Thus, although this coding method can theoretically be used in any form of network, SLs cannot be connected to pretrained ALs in this way.
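Both differentiable coding methods can be sketched in a few lines of PyTorch; the tensor shapes and the convention of resampling the noise at every time step are assumptions consistent with the description above.

```python
import torch

def duplicate_coding(a, T):
    """Duplicate coding: repeat the AL output a_i along a new time axis,
    so O_i^{AL}[t] = a_i for every t."""
    return a.unsqueeze(0).expand(T, *a.shape)

def gaussian_coding(mu, log_sigma, T):
    """Gaussian coding via the reparameterization trick, Eq. (7):
    O_i^{AL}[t] = mu_i + sigma_i * eps with eps ~ N(0, 1), drawn per step.
    The last AL outputs (mu, log_sigma), doubling its neuron count."""
    sigma = log_sigma.exp()
    eps = torch.randn(T, *mu.shape)        # fresh noise at each time step
    return mu.unsqueeze(0) + sigma.unsqueeze(0) * eps
```

Because both functions are composed of differentiable tensor operations (the noise enters only additively and multiplicatively), gradients flow from the SLs back into $\mu$ and $\ln \sigma$, which is what enables simultaneous training.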
Poisson coding, based on Poisson-process probability theory, is a widely used coding method [Diehl and Cook, 2015, Querlioz et al., 2013, Bing, 2019] that stochastically converts a continuous value into a spike train along the time dimension. The converting function outputs a sequence of spikes such that the time differences between them follow a Poisson distribution. Unlike the above two methods (Sections 3.2.1 and 3.2.2), this process is not differentiable; thus, Poisson coding cannot be used with the simultaneous learning method.
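A common way to realize this coding in discrete time, sketched below, draws a Bernoulli spike at each step with probability given by the normalized input, which yields Poisson-like spike counts over the window; as noted above, the sampling step blocks gradients.

```python
import torch

def poisson_coding(a, T):
    """Poisson (rate) coding: a in (0, 1) sets the per-step firing probability.

    Returns a binary (T, *a.shape) spike train. torch.bernoulli is a sampling
    operation with no gradient, so this coding only suits fixed/pretrained ALs.
    """
    probs = a.clamp(0.0, 1.0).unsqueeze(0).expand(T, *a.shape)
    return torch.bernoulli(probs)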
As described, the two types of ALs (i.e., fixed and trainable) and the three coding methods (i.e., duplicating, Gaussian, and Poisson) have limited combinations, owing to their traits. In this study, we investigate the possibilities of HNNs that use the four combinations described below. For simplicity, we assume that all networks in this study have a feed-forward architecture. Figures 2(a)-(d) show all of the neural networks implemented in this paper.

The main limiting factor in developing an HNN is whether or not the coding method is differentiable. When fixed (pretrained) ALs are incorporated into the HNN, all coding methods can be adopted. Conversely, when trainable ALs are used, only differentiable methods can be adopted. Moreover, for fair evaluation, Gaussian coding implemented via the reparameterization trick cannot be combined with HNNs having pretrained ALs, because the reparameterization trick requires more neurons than are available in the basic network architecture.

Trainable ALs and SLs in a duplicate-coding network are shown in Figure 2(a). Both the ALs and the SLs are trained simultaneously via back-propagation, using the duplicating method for coding after the last AL.

A network with fixed ALs and trainable SLs using duplicate coding (Figure 2(b)) optimizes the weights only for the SLs during training. To add the time dimension between the ALs and SLs, the network merely copies the output from the last AL over a certain number of time steps.

As shown in Figure 2(c), in the network with trainable ALs and SLs using Gaussian coding, all layers are trained together, and, even with stochastic coding, the training phase follows a Gaussian distribution. This network takes the longest time to train because, unlike networks having fixed ALs, all layers must be trained, and the reparameterization trick used with Gaussian coding is more computationally complex than straightforward duplication.

In the network with fixed ALs and trainable SLs using Poisson coding (Figure 2(d)), Poisson coding generates spike trains from the latent vector rendered by the pretrained ALs. Owing to the non-differentiable coding method, only the SLs are trained.
In this study, two datasets (i.e., MNIST [LeCun et al., 1998] and CIFAR-10 [Krizhevsky et al., 2009]) were used to measure network performance. Each dataset was divided into three subsets for training, validation, and testing. Each network was trained on the training subset for a fixed number of epochs on MNIST and CIFAR-10. At the end of each epoch, network performance was evaluated using the validation subset, and the network achieving the best performance on the validation subset was used to compare performance on the testing subset.
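A sketch of this evaluation protocol for MNIST is given below; the 55k/5k split size is a placeholder, since the paper's exact ratio is not legible in this copy.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

tf = transforms.ToTensor()
full = datasets.MNIST("data", train=True, download=True, transform=tf)
test_set = datasets.MNIST("data", train=False, download=True, transform=tf)

# Placeholder split sizes; the paper's exact train/validation ratio
# is not legible in this copy.
train_set, val_set = random_split(
    full, [55_000, 5_000], generator=torch.Generator().manual_seed(0))

# Protocol: after each training epoch, evaluate on val_set and checkpoint
# the best-performing model; report accuracy on test_set only for that
# checkpoint, never tuning against the test subset.
```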
[Figure 2: Hybrid neural networks. (a) Trainable AL and SL network with duplicate coding; (b) fixed AL and trainable SL network with duplicate coding; (c) trainable AL and SL network with Gaussian coding; (d) fixed AL and trainable SL network with Poisson coding.]
There are two ways in which a continuous-valued train can be fed into a spiking neuron: as a spike train through the axon, or directly.

The first was used in [Fang et al., 2020], utilizing back-propagation: the continuous-valued input is converted to a PSP through the axon and is treated as a spike train, and the spiking neurons in the first layer receive the PSP. In the second way, the spiking neurons in the first layer directly receive the continuous input values instead of the PSP. This is more biologically plausible, because the sensory-nerve endings in animals are sensory neurons rather than axons.

Using MNIST with a multi-layer perceptron (MLP) (S784-S500-S10), the axon-input network achieved .56% accuracy, and the direct-input network achieved .71%. These results reveal that the axon-input method does not improve network performance. Therefore, we used the direct-input method in the following experiments.
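The two input options can be contrasted in a small sketch; `synapse_filter` is a hypothetical stand-in for the Eq. (1d) IIR filter, not a named function from the paper.

```python
import torch

def first_layer_current(x, weights, synapse_filter=None):
    """Input current I[t] for the first-layer spiking neurons.

    Axon input [Fang et al., 2020]: pass x through the synaptic filter first,
    treating the continuous image like a spike train. Direct input: inject x
    as the current itself, the way a sensory neuron receives a stimulus.
    `synapse_filter` is a hypothetical stand-in for the Eq. (1d) IIR filter.
    """
    f = synapse_filter(x) if synapse_filter is not None else x
    return torch.nn.functional.linear(f, weights)   # Eq. (1b): weighted input
```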
HNNs can be configured with a rate-based coding method (duplicate, Poisson, or Gaussian) and with trainable layers, where the SLs are combined with fixed or trainable ALs. Notably, the fixed-AL and trainable-SL network with Gaussian coding is not compared here, because it relies on the reparameterization trick and a different architecture. In the networks having fixed ALs and trainable SLs, the fixed ALs were taken from the pure ANN that achieved the highest accuracy when trained on the training subset and evaluated on the validation subset.

In Tables 1(a), 1(b), 2(a), 2(b), 3(a), and 3(b), the network architectures are shown using a notation in which An and Sm indicate a layer having n artificial neurons and m spiking neurons, respectively. Additionally, AnCk or SnCk indicates an artificial or spiking convolutional layer with a k × k kernel, and P2 denotes a pooling layer. The coding method is implicitly inserted between An-Sm.

A three-layer MLP having ALs, SLs, and a coding layer at the interface between them was used to classify images from the MNIST dataset, to examine the effect of combining ALs and SLs. As the baseline, a pure SNN using a dual-exponential post-synaptic potential kernel with the S784-S500-S10 architecture was trained. Moreover, the last row of Table 1(a) shows the result of the pure ANN comprising only ALs. Here, the separate-learning results of the pure SNN and ANN are not reported in any table, because these networks do not train their layers separately.

As shown in Table 1(a), the network having fixed ALs and trainable SLs using duplicate coding achieved the highest score. Notably, the A784-A500-S10 architecture recorded higher performance than the ANN, although both networks shared weights in the first and second layers.
To compare separate and simultaneous learning in terms of classification accuracy, we conducted comparison experiments using the above networks with duplicate coding, which can be used for both types of learning.

Tables 1(a) and 1(b) show the results of MLPs and convolutional neural networks (CNNs), respectively, each having the same number of neurons but differing ratios of ALs to SLs. In each table, the separate-learning accuracy of the pure SNN and of the pure ANN is not listed, because neither has pretrained layers. Notably, the results of the pure SNN and ANN are similar to those in Table 2, because the same architectures were used with duplicate coding.

The results in both tables show that, generally, pure networks achieve higher accuracy. Additionally, in HNNs, increasing the percentage of ALs improves performance, revealing a trade-off between computational accuracy and energy efficiency: with rate-based coding, an artificial neuron can represent more accurate information, whereas a spiking neuron can reduce power consumption. Furthermore, although the separately trained networks showed higher accuracy than those trained simultaneously, performance on the more complex task (CIFAR-10) dropped when the ratio of SLs in the trainable network was too small. Thus, for complex tasks in large networks, training the ALs and SLs at the same time gives better accuracy than combining fixed ALs with trainable SLs.
The results of MLPs and CNNs for MNIST and CIFAR-10, respectively, with Gaussian coding are shown in Tables 2(a) and 2(b). As mentioned, Gaussian coding can only be employed with trainable ALs, owing to the additional neurons needed for reparameterization.
(a)

Network architecture    Separate    Simultaneous
S784-S500-S10           -           .13 %
A784-S500-S10           .22 %       97.78 %
A784-A500-S10           .17 %       98.07 %
A784-A500-A10           -           .23 %

(b)

Network architecture                        Separate    Simultaneous
S32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10      -           .96 %
A32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10      .01 %       53.23 %
A32C3-A32C3-A64C3-P2-S64C3-P2-S512-S10      .30 %       55.22 %
A32C3-A32C3-A64C3-P2-A64C3-P2-S512-S10      .78 %       66.08 %
A32C3-A32C3-A64C3-P2-A64C3-P2-A512-A10      -           .09 %

Table 1: (a) MLPs with duplicate coding for MNIST and (b) CNNs with duplicate coding for CIFAR-10.
Here, the S784-S500-S10 architecture (Table 2(a)) and the S32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10 network (Table 2(b)) coded the input data into spike trains using the duplicating method, because it was not possible to use Gaussian coding without varying the network architecture. The accuracies of both networks relative to the baseline are similar to those shown in Tables 1(a) and 1(b). Moreover, the A784-A500-A10 architecture (Table 2(a)) and the A32C3-A32C3-A64C3-P2-A64C3-P2-A512-A10 network (Table 2(b)), listed only for comparison, do not have a coding layer; these networks inferred directly from the original data, and their results in both tables are the same as those in Tables 1(a) and 1(b).

From the results (Tables 2(a) and 2(b)), except for the A32C3-A32C3-A64C3-P2-S64C3-P2-S512-S10 network, the trend of improving accuracy with an increasing AL ratio observed with duplicate coding (Section 4.3.1) was maintained. Additionally, compared with the models with trainable ALs and SLs using duplicate coding (Tables 1(a) and 1(b)), the accuracies of the models with Gaussian coding (Tables 2(a) and 2(b)) were higher at lower AL ratios.
(a)

Network architecture        Accuracy
S784-S500-S10 (baseline)    .13 %
A784-S500-S10               .02 %
A784-A500-S10               .03 %
A784-A500-A10               .23 %

(b)

Network architecture                                  Accuracy
S32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10 (baseline)     .96 %
A32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10                .35 %
A32C3-A32C3-A64C3-P2-S64C3-P2-S512-S10                .30 %
A32C3-A32C3-A64C3-P2-A64C3-P2-S512-S10                .82 %
A32C3-A32C3-A64C3-P2-A64C3-P2-A512-A10                .09 %

Table 2: (a) MLPs with Gaussian coding for MNIST and (b) CNNs with Gaussian coding for CIFAR-10.
Poisson coding is a mathematical technique used to generate spike trains that follow a Poisson distribution from continuous values. Because the Poisson process is not differentiable, simultaneously trainable ALs and SLs cannot be used. To apply Poisson coding, the continuous values must be normalized to (0, 1). In this experiment, for simplicity, we used the sigmoid function, whose outputs lie in (0, 1), as the activation function of the pretrained ALs instead of the rectified linear unit.
As shown in Tables 3(a) and 3(b), using Poisson coding instead of duplicate or Gaussian coding dramatically decreased the performance of every network architecture. Unlike in the previous results, the continuous-valued data were converted into spike trains with Poisson coding even for the pure SNNs, i.e., the S784-S500-S10 architecture (Table 3(a)) and the S32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10 network (Table 3(b)). The last row of each table shows the result of the pretrained ANN from which the ALs of the HNNs were taken.

The results in Table 3(a) show that even these low accuracies follow the usual tendency: more ALs lead to higher accuracy. In contrast, the results in Table 3(b) show the opposite, and it appears that Poisson coding generates too much noise, decreasing accuracy rather than improving generalization; these noisy latent vectors reduce the accuracy of image recognition.
(a)

Network architecture            Accuracy
S784-S500-S10 (baseline)        .09 %
A784-S500-S10                   .67 %
A784-A500-S10                   .93 %
A784-A500-A10 (pretrained)      .67 %

(b)

Network architecture                                    Accuracy
S32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10 (baseline)       .83 %
A32C3-S32C3-S64C3-P2-S64C3-P2-S512-S10                  .28 %
A32C3-A32C3-A64C3-P2-S64C3-P2-S512-S10                  .94 %
A32C3-A32C3-A64C3-P2-A64C3-P2-S512-S10                  .32 %
A32C3-A32C3-A64C3-P2-A64C3-P2-A512-A10 (pretrained)     .43 %

Table 3: (a) MLPs with Poisson coding for MNIST and (b) CNNs with Poisson coding for CIFAR-10.
The results (Tables 1-3) largely delineate a tendency of improving accuracy with an increasing percentage of ALs in the hybrid model. This was true except for the CNNs having two ALs and three SLs (i.e., A32C3-A32C3-A64C3-P2-S64C3-P2-S512-S10). This confirms our expectation that ANNs are better suited to handling continuous values.

The highest scores of the HNNs that used Gaussian coding were not greater than those using duplicate coding. However, the networks having fewer ALs (Tables 2(a) and 2(b)) performed better than the networks with trainable ALs and SLs using duplicate coding, especially on CIFAR-10. Therefore, although Gaussian coding did not show a clear advantage in these experiments, high performance can be expected on more complex tasks in the future.

In these experiments, we did not find any advantage in using Poisson coding. In fact, when used with an HNN, its accuracy was greatly reduced. This is probably because too much information was lost, owing to the noise generated during the coding process. In contrast, pure SNNs showed competitive accuracy with the other coding methods, as demonstrated in previous studies [Deng et al., 2020]. Interestingly, for HNNs having pretrained ALs (Table 3(b)), the CNNs having fewer ALs outperformed those with more. This suggests that using a spiking convolutional layer immediately after coding contributes to performance improvement; this inclination is not seen in the MLP results.
In this study, we proposed hybrid neural networks that combine features of the conventional continuous-input ANN with those of the more bio-plausible, event-driven SNN. The elements comprising the HNN, including the various ALs combined with the SNN features and the respective coding methods, were evaluated on image classification tasks using the MNIST and CIFAR-10 datasets. Several HNNs exceeded the performance of pure SNNs and demonstrated notable effectiveness. Moreover, it was shown that the performance changes depending on the ratio of ALs to SLs and on the coding method used to transform the continuous values into the spike trains connecting the ALs to the SLs. Additionally, even when the most straightforward duplicate coding method was used, the performance was equal to or better than that of the other coding methods.

We believe that our work helps further increase the potential of SNNs, which are more energy-efficient than ANNs but are disadvantaged in handling continuous-valued data input.
References

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. arXiv:1502.01852 [cs], February 2015.

Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. Convolutional neural network architectures for matching natural language sentences. Advances in Neural Information Processing Systems, 27:2042–2050, 2014.

Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. Recent trends in deep learning based natural language processing. arXiv:1708.02709 [cs], November 2018.

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, January 2016. doi:10.1038/nature16961.

Daniel Drubach. The Brain Explained. Prentice Hall, 2000.

Sumit Bam Shrestha and Garrick Orchard. SLAYER: Spike layer error reassignment in time. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31, pages 1412–1421. Curran Associates, Inc., 2018.

Yujie Wu, Lei Deng, Guoqi Li, Jun Zhu, and Luping Shi. Spatio-temporal backpropagation for training high-performance spiking neural networks. Frontiers in Neuroscience, 12:331, 2018. doi:10.3389/fnins.2018.00331.

Emre O. Neftci, Hesham Mostafa, and Friedemann Zenke. Surrogate gradient learning in spiking neural networks. 2019.

Haowen Fang, Amar Shrestha, Ziyi Zhao, and Qinru Qiu. Exploiting neuron and synapse filter dynamics in spatial temporal learning of deep spiking neural network. 2020.

Garrick Orchard, Ajinkya Jayawant, Gregory K. Cohen, and Nitish Thakor. Converting static image datasets to spiking neuromorphic datasets using saccades. Frontiers in Neuroscience, 9:437, 2015. doi:10.3389/fnins.2015.00437.

Guillermo Gallego, Tobi Delbruck, Garrick Michael Orchard, Chiara Bartolozzi, Brian Taba, Andrea Censi, Stefan Leutenegger, Andrew Davison, Jorg Conradt, Kostas Daniilidis, et al. Event-based vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1, 2020. doi:10.1109/tpami.2020.3008413.

Lei Deng, Yujie Wu, Xing Hu, Ling Liang, Yufei Ding, Guoqi Li, Guangshe Zhao, Peng Li, and Yuan Xie. Rethinking the performance comparison between SNNs and ANNs. Neural Networks, 121:294–307, January 2020. doi:10.1016/j.neunet.2019.09.005.

R. B. Stein. A theoretical analysis of neuronal variability. Biophysical Journal, 5:173–194, March 1965. doi:10.1016/s0006-3495(65)86709-1.

Catherine D. Schuman, Thomas E. Potok, Robert M. Patton, J. Douglas Birdwell, Mark E. Dean, Garrett S. Rose, and James S. Plank. A survey of neuromorphic computing and neural networks in hardware. arXiv:1705.06963 [cs], May 2017.

Arvind Kumar, Stefan Rotter, and Ad Aertsen. Spiking activity propagation in neuronal networks: Reconciling different perspectives on neural coding. Nature Reviews Neuroscience, 11(9):615–627, September 2010. doi:10.1038/nrn2886.

Rufin Van Rullen and Simon J. Thorpe. Rate coding versus temporal order coding: What the retinal ganglion cells tell the visual cortex. Neural Computation, 13(6):1255–1283, June 2001. doi:10.1162/08997660152002852.

Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. 2014.

Peter Diehl and Matthew Cook. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 9:99, 2015. doi:10.3389/fncom.2015.00099.

Damien Querlioz, Olivier Bichler, Philippe Dollfus, and Christian Gamrat. Immunity to device variations in a spiking neural network with memristive nanodevices. IEEE Transactions on Nanotechnology, 12(3):288–295, May 2013. doi:10.1109/TNANO.2013.2250995.

Zhenshan Bing. Biological-Inspired Hierarchical Control of a Snake-like Robot for Autonomous Locomotion. Dissertation, Technische Universität München, München, 2019.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.