Semi-supervised learning combining backpropagation and STDP: STDP enhances learning by backpropagation with a small amount of labeled data in a spiking neural network
arXiv preprint [cs.NE]. Journal of the Physical Society of Japan
FULL PAPERS
STDP enhances learning by backpropagation in a spiking neural network
Kotaro Furuya and Jun Ohkubo. School of Computing, Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8552, Japan; Graduate School of Science and Engineering, Saitama University, 255 Shimo-Okubo, Sakura-ku, Saitama-shi 338-8570, Japan; JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
A semi-supervised learning method for spiking neural networks is proposed. The proposed method consists of supervised learning by backpropagation and subsequent unsupervised learning by spike-timing-dependent plasticity (STDP), which is a biologically plausible learning rule. Numerical experiments show that the proposed method improves the accuracy without additional labeling when a small amount of labeled data is used. This feature has not been achieved by existing semi-supervised learning methods for discriminative models. It is possible to implement the proposed learning method for event-driven systems. Hence, it would be highly efficient in real-time problems if it were implemented on neuromorphic hardware. The results suggest that STDP plays an important role other than self-organization when applied after supervised learning, which differs from the previous method of using STDP as pre-training interpreted as self-organization.
1. Introduction
The brain is a large network of neurons. The neurons are connected by synapses, and information is transmitted by spikes. A spike is a short pulse signal, and its waveform does not carry any information; rather, the number and timing of spikes are thought to transmit information. In recent years, artificial neural networks (ANNs) inspired by this structure of the brain have achieved great success when applied to various machine learning problems using techniques such as deep learning. However, the transmission of information in ANNs is analog-valued, such as the outputs of the sigmoid function, and ANNs do not transmit spikes like real neurons in the brain. Furthermore, ANNs do not have the dynamics in which the membrane potential rises over time until it exceeds the threshold to fire. Therefore, they are very different from real brains. As an alternative, spiking neural networks (SNNs), which offer high energy efficiency and performance in real-time problems when used with neuromorphic hardware, have been proposed. SNNs use models that focus on action potentials, such as the integrate-and-fire model, to transmit information by increasing the membrane potential and firing in response to each input at a specific time. It is known that the output spike trains are sparse in time and each spike has high information content. Thus, SNNs are more biologically plausible, capable of detailed behaviors, and energy efficient. In recent years, research that combines fields such as high-energy physics with machine learning, especially deep learning in ANNs, has been actively conducted. Even in fields of physics in which ANNs have shown promising results, SNNs are expected to enable the application of advanced and accurate algorithms owing to their high efficiency in real-time processing.
Although it has been theoretically demonstrated that the computational power of SNNs is at least equal to that of ANNs, their performance has been significantly inferior to that of ANNs in various machine learning problems. The reason for this poor performance is that there has been no suitable learning method for SNNs. The spike is a discontinuous process and is nondifferentiable. Therefore, we cannot straightforwardly use error backpropagation (BP), a very powerful supervised learning algorithm that is widely used in ANNs. However, in recent years, many approximate methods of BP have been proposed for SNNs, and they are beginning to show performance close to that of ANNs. For example, Bohte et al. treated the firing time as a nonlinear function based on the membrane potential, Ledinauskas et al. tuned a surrogate gradient function, Mostafa used a transformation of time variables, and Lee et al. treated nondifferentiable points as noise. In each of these studies, SNNs were trained by BP, and high performance was achieved.

In addition, because neurons in SNNs fire in time series and transmit information using spikes, they can be trained by spike-timing-dependent plasticity (STDP), which is a biologically plausible unsupervised learning mechanism. STDP is a phenomenon discovered by observing biological synapses in vivo that modifies the strength of the synaptic connection between connected neurons depending on pre- and post-synaptic spike timings.
Several studies have reported that unsupervised learning tasks, such as image classification, can be performed by applying STDP to SNNs.14,15) STDP is an unsupervised learning rule, but it can be used for supervised learning with various mechanisms.16,17)
In supervised learning, such as learning by BP, a large amount of labeled data is required to obtain high performance. However, it is not easy to obtain labeled data because manual labeling is expensive. In contrast, in unsupervised learning such as learning by STDP, labels are not required, and unlabeled data are easy to obtain. Hence, in the field of machine learning, semi-supervised learning, which uses both labeled and unlabeled data, has been developed to reduce the cost of data labeling. In semi-supervised learning, a small amount of labeled data and a large amount of unlabeled data are generally used. Self-training is a semi-supervised method for discriminative models in which the accuracy is improved by labeling unlabeled data and thereby increasing the amount of labeled data. In such a methodology, if the labeling is incorrect, the learning will fail.
In ANNs, layer-wise unsupervised pre-training followed by supervised fine-tuning can improve the accuracy, and this approach has achieved high performance. Such a pre-training methodology was introduced by Hinton et al.
In that study, the deep belief network was trained by greedy layer-wise training. In the spiking domain, some methods have been proposed in which STDP is applied as unsupervised pre-training, followed by fine-tuning using BP.22,23) Lee et al. succeeded in improving the robustness and speeding up the learning procedure using pre-training by STDP.
Dorogyy and Kolisnichenko reported that the accuracy is improved by pre-training with STDP when a small amount of labeled data is used.
The flow of supervised learning after STDP-based pre-training is thought to model the situation in which living things observe the world around them and are then taught about a certain thing, such as a name, by their teachers. In contrast, we can also consider the situation in which, after being taught a certain thing, they observe it many times to improve their understanding. In other words, the sequence of STDP-based unsupervised learning following supervised learning also models real-world situations, and such research has not been previously reported.

In this paper, we propose a semi-supervised learning method consisting of STDP-based unsupervised learning after BP-based supervised learning. Numerical experiments solving the task of handwritten digit recognition were performed to evaluate the capability of the proposed method. The results show that the proposed method improves accuracy when a small amount of labeled data is available and solves the problem of self-training because there is no need to label unlabeled data.

The remainder of this paper is organized as follows. In Sect. 2, we present the prior knowledge necessary to explain and provide the context for the proposed method. Section 3 describes our proposed method, Sect. 4 presents the numerical results, Sect. 5 discusses the results, and Sect. 6 summarizes the conclusions of this study.
2. Prior Knowledge
In this study, we use fully connected feed-forward SNNs, in which each neuron is connected to all neurons in the next layer and information is transferred from the input layer to the output layer in one direction. In addition, we assume that each layer has a winner-take-all (WTA) circuit, which is described in Sect. 2.1.2.
Let N be the number of neurons and M be the number of synapses in a layer of the SNN. The number of active neurons that output spikes is denoted by n, and the number of active synapses that receive input spikes is denoted by m. In addition, a variable x in the l-th layer is expressed as x^(l).

The leaky integrate-and-fire (LIF) neuron, one of the best-known spiking neuron models, integrates the input spikes weighted by the strength of synaptic connections and changes its membrane potential accordingly. This model approximates the biological phenomenon in which the membrane potential rises with time integration of the input, and when the membrane potential is above a critical voltage, an action potential is triggered and the membrane potential is reset. Because the state of the LIF model is updated only when an input is received, the update of the membrane potential for a given input spike is expressed as follows:

V_mp(t_p) = V_mp(t_{p-1}) exp( (t_{p-1} - t_p) / τ_mp ) + w_i w_dyn,   (1)

where V_mp is the membrane potential, τ_mp is the membrane time constant, t_p and t_{p-1} are the p-th and (p-1)-th input spike times, and w_i is the weight of the i-th synapse that the input spike passes through. w_dyn is a variable that controls the refractory period and is expressed as follows:

w_dyn = (Δt / T_ref)^2 if Δt < T_ref, and w_dyn = 1 otherwise,

where T_ref is the maximum refractory period. We define Δt = t_out − t_p, where t_out is the most recent firing time of the neuron. The refractory period is a period during which a neuron does not respond to the stimulus (input) immediately after spiking. When the membrane potential V_mp exceeds the threshold V_th, the LIF neuron generates a spike and the membrane potential decreases sharply:

V_mp(t_p^+) = V_mp(t_p) − V_th,   (2)

where t_p^+ is the time immediately after spiking.

Fig. 1. Conceptual diagram of a winner-take-all circuit.

A WTA circuit is a principle generally used in recurrent neural networks that inhibits the firing of other neurons in the same layer when one neuron fires.
A conceptual diagram of a WTA circuit is shown in Fig. 1. Biologically, such circuits play a role in cortical processing models such as a hierarchical model of vision in the cortex. In SNNs, WTA circuits are used to improve accuracy, stability, and learning speed, as well as in unsupervised learning with STDP.
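As a concrete illustration, the event-driven LIF update of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a sketch under stated assumptions: the function name is ours, the quadratic form of w_dyn and the sign convention for the refractory interval are assumptions for illustration; only the leak, the refractory attenuation, and the reset-by-subtraction come from the text.

```python
import math

def lif_update(V, t_prev, t_p, w_i, t_out,
               tau_mp=20.0, T_ref=1.0, V_th=1.0):
    """One event-driven LIF update in the spirit of Eqs. (1)-(2).

    V     : membrane potential just after the previous input at time t_prev
    t_p   : arrival time (ms) of the current input spike with weight w_i
    t_out : most recent output-spike time of this neuron
    Returns (new_potential, fired).
    """
    # Leak between consecutive input events, Eq. (1).
    V = V * math.exp((t_prev - t_p) / tau_mp)
    # Refractory attenuation: inputs arriving soon after the last output
    # spike are damped (w_dyn < 1); the quadratic form is an assumption.
    dt = t_p - t_out                      # time since the last output spike
    w_dyn = (dt / T_ref) ** 2 if dt < T_ref else 1.0
    V += w_i * w_dyn
    fired = V >= V_th
    if fired:
        V -= V_th                         # reset by subtraction, Eq. (2)
    return V, fired
```

Note that the state changes only when an input spike arrives, which is what makes the model suitable for event-driven (neuromorphic) execution.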
Organisms perceive changes in themselves and their environment through stimuli, process the stimuli, and communicate with others. However, a neuron can only respond to stimuli in a specific zone. This limited area is called the receptive field. The receptive field of a visual neuron refers to the area of the retina where cells can react to light. There are various types of receptive fields in the visual system, for example, on-center and off-center receptive fields. An on-center receptive field shows an excitatory response when stimulated at the center of the receptive field and an inhibitory response when stimulated at the peripheral part. In contrast, an off-center receptive field shows an inhibitory response when stimulated centrally and an excitatory response when stimulated peripherally.
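The receptive-field response formalized below in Eq. (3) is simply a window-wise Frobenius inner product of the stimulus with a field-structure matrix. A minimal NumPy sketch follows; the 3×3 on-center structure `F_on` is a hypothetical example (excitatory center, inhibitory surround), not a matrix from the paper.

```python
import numpy as np

def receptive_field_response(S, F):
    """Valid-mode correlation of a stimulus S (H x W) with a field
    structure F (K x K): each output entry is the Frobenius inner
    product of F with the corresponding K x K window of S."""
    H, W = S.shape
    K = F.shape[0]
    R = np.empty((H - K + 1, W - K + 1))
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            R[i, j] = np.sum(S[i:i + K, j:j + K] * F)
    return R

# Hypothetical on-center structure: excitatory center, inhibitory surround.
F_on = np.array([[-1., -1., -1.],
                 [-1.,  8., -1.],
                 [-1., -1., -1.]])
```

An off-center field would simply flip the signs of `F_on`: it responds most strongly to a dark center with a bright surround.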
The elements of the response matrix R of a receptive field can be expressed as the Frobenius inner product of a submatrix of the input stimulus matrix S and the receptive field structure matrix F, and are formulated as follows:

R_ij = Σ_{k1=0}^{K−1} Σ_{k2=0}^{K−1} S_{(i+k1)(j+k2)} F_{k1 k2},   (3)

where S ∈ R^{H×W}, F ∈ R^{K×K}, and R ∈ R^{(H−K+1)×(W−K+1)}. H and W are the height and width of the input stimulus matrix S, respectively, and K is the size of the receptive field structure matrix F.

The spikes that carry information in SNNs are not differentiable because of their discontinuity. Therefore, to calculate the derivatives required for the BP algorithm, an approximation or other method must be used. In this section, we introduce an approximation method that enables the derivation of differentiable transfer functions and their derivatives in SNNs.

First, we define two variables: x_k(t), which is the accumulated effect of the k-th active input synapse on the membrane potential of a target neuron, and a_i(t), which is the generation of spikes in neuron i acting on its own membrane potential. These two variables are defined as sums of terms with exponential decays:

x_k(t) = Σ_p exp( (t_p − t) / τ_mp ),   (4)
a_i(t) = Σ_q exp( (t_q − t) / τ_mp ).   (5)

Note that these two summations have different meanings: the first sum is over all input spike times t_p < t at the k-th input synapse, and the second sum is over the output spike times t_q < t for a_i. Using these definitions, in the method by Lee et al., ignoring the effect of refractory periods, the membrane potential of the i-th LIF neuron at time t is expressed as follows because of the properties of LIF neurons and WTA circuits:

V_mp,i(t) = Σ_{k=1}^{m} w_ik x_k(t) − V_th,i a_i(t) + σ V_th,i Σ_{j=1, j≠i}^{n} κ_ij a_j(t),   (6)

where w_ik is the weight of the synapse between the k-th neuron in the previous layer and the i-th neuron in the current layer. In Eq. (6), the first term represents inputs, the second term represents membrane potential resets, and the third term represents lateral inhibitions by the WTA circuit. κ_ij is the strength of the lateral inhibition from neuron j to neuron i by the WTA mechanism and lies in [−1, 0], and σ is a parameter that controls the effect of lateral inhibition. From Eq. (6), if all layers have the same time constant τ_mp, the output (a_i) of the current layer becomes the input (x_i) of the next layer. This is the basis for deriving an error BP algorithm via the chain rule.

Because there are discontinuous jumps, Eqs. (4), (5), and (6) are not differentiable at these points. However, in the method proposed by Lee et al., these nondifferentiable points are considered as noise, and the equations are treated as differentiable continuous signals. Owing to this approximation, the calculation of gradients includes errors, but their results show that the influence of these errors is very small. The transfer function of the LIF neuron in WTA circuits is approximated as follows by setting the residual V_mp term to zero:

a_i ≈ s_i / V_th,i + σ Σ_{j=1, j≠i}^{n} κ_ij a_j,   (7)

where s_i = Σ_{k=1}^{m} w_ik x_k. We derive the following equations by directly differentiating Eq. (7):

∂a_i/∂s_i ≈ 1 / V_th,i,
∂a_i/∂w_ik ≈ (∂a_i/∂s_i) x_k,
∂a_i/∂V_th,i ≈ (∂a_i/∂s_i) ( −a_i + σ Σ_{j=1, j≠i}^{n} κ_ij a_j ),   (8)
∂a_i/∂κ_ih ≈ (∂a_i/∂s_i) ( σ V_th,i a_h ),

[∂a_1/∂x_k, …, ∂a_n/∂x_k]^T ≈ (1/σ) [ q, −κ_12, …, −κ_1n; …; −κ_n1, …, q ]^{−1} [ w_1k/V_th,1, …, w_nk/V_th,n ]^T,   (9)

where q = 1/σ. Here, assuming that the strengths of lateral inhibitions are all the same, µ (κ_ij = µ, ∀i, j), we can simplify Eq. (9) to

∂a_i/∂x_k ≈ (∂a_i/∂s_i) (1/(1 − µσ)) [ w_ik − ( µσ V_th,i / (1 + µσ(n − 1)) ) Σ_{j=1}^{n} w_jk / V_th,j ].   (10)

By substituting Eqs.
(8) and (10) into the ordinary BP algorithm, it is possible to perform the BP algorithm in SNNs. Further techniques such as initialization, normalization, and regularization have been used in prior research to improve accuracy and efficiency.

To deal with chaotic convergence behavior and facilitate stable training convergence in SNNs, appropriate network initialization and optimization tools are important. In this section, we describe the weight initialization and BP error normalization used in the error BP method introduced in Sect. 2.3.
The thresholds of neurons and the weights of synaptic connections in the l-th layer are initialized as follows:

w^(l) ~ U[ −√(3/M^(l)), √(3/M^(l)) ],  V_th^(l) = α √(3/M^(l)),   (11)

where α > 1, and U[−a, a] denotes a random number drawn from a uniform distribution on (−a, a). The weights initialized by Eq. (11) satisfy the following condition:

E[ Σ_{i=1}^{M^(l)} (w_ji^(l))^2 ] = M^(l) E[ (w_ji^(l))^2 ] = 1.   (12)

This condition is used in BP error normalization. The main purpose of BP error normalization is to adjust the
update magnitudes of the weights and thresholds. In the l-th layer, the error backpropagating through the i-th neuron is defined as follows:

δ_i^(l) = (g_i^(l) / ḡ^(l)) √( M^(l+1) / m^(l+1) ) (1 / m^(l+1)) Σ_j w_ji^(l+1) δ_j^(l+1),   (13)

where g_i^(l) = 1 / V_th,i^(l) and ḡ^(l) = √( E[(g_i^(l))^2] ) ≃ √( (1/n^(l)) Σ_{i=1}^{n^(l)} (g_i^(l))^2 ). From Eq. (12), the expected value of the squared sum of errors is constant for all layers, and thus the updating of the weights and thresholds can be balanced. Therefore, the weight and threshold are updated as

Δw_ij^(l) = −η_w √( M^(l+1) / m^(l) ) δ_i^(l) x_j^(l),
ΔV_th,i^(l) = −η_th ( √( M^(l+1) m^(l) ) / M^(l+1) ) δ_i^(l) â_i^(l),   (14)

where η_w and η_th are the learning rates of the weight and threshold, respectively, â_i = γ( a_i − σ Σ_{j≠i}^{n} κ_ij a_j ), and γ is a parameter. By performing the above normalization, in the initial stage of learning, the magnitude of the updates of the weights and thresholds is determined according to the expected value for each active synapse, regardless of the number of active synapses and neurons. Therefore, the updates of all layers in the SNN can be balanced.

Threshold regularization is applied to improve the firing balance of neurons in SNNs. This regularization has the effect of suppressing the generation of dead neurons and can improve the accuracy. In particular, when the network has WTA mechanisms, lateral inhibition occurs at each layer, and thus threshold regularization is important. The details of the threshold regularization are described in this section.
When N_w neurons in a layer fire after receiving an input, the thresholds of the firing neurons are increased by ρN, and the thresholds of all neurons in that layer are reduced by ρN_w. This process makes highly active neurons less sensitive to inputs because their thresholds increase, while less active neurons become more sensitive to inputs because their thresholds decrease. Therefore, the firing of neurons can be balanced, and the accuracy can be improved.

STDP is a learning rule discovered by observing biological synapses that changes the strength of synaptic connections according to the temporal correlations of spikes between the connected neurons.
The outline of STDP learning is described in this section. Although there are symmetric and asymmetric STDP rules for changing the strength of the synapse, we consider only an asymmetric one in this paper.

Here, we assume that neurons A and B are connected by a synapse and a spike signal is transmitted from neuron A to neuron B. If neuron A fires and then neuron B fires within a certain time window, long-term potentiation (LTP) is triggered, and thus the synaptic connection between them is strengthened. If the order is reversed, long-term depression (LTD) is triggered, and thus the synaptic connection between them is weakened. We define the firing time of neuron A as t_pre and the firing time of neuron B as t_post. Then, the timing difference between the pre- and post-synaptic spikes can be expressed as Δs = t_pre − t_post. The amount of synaptic modification Δw can be written as

Δw = A_+ exp( Δs / τ_plus )  (Δs < 0),
Δw = A_− exp( −Δs / τ_minus )  (Δs ≥ 0),   (15)

where A_+ and A_− are positive and negative constants, respectively, which determine the maximum amount of synaptic change, and τ_plus and τ_minus are time constants. The weight change is described with the STDP learning rate σ_stdp:

w_new = w_old + σ_stdp Δw.   (16)

Fig. 2. The flow of self-training.

We can train SNNs by applying the above unsupervised learning rule with WTA mechanisms. In other words, it is possible to use the temporal correlations of neuronal firing for learning.
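The pair-based rule of Eqs. (15) and (16) can be transcribed directly. The constants below are placeholders for illustration, not the values used in the paper.

```python
import math

def stdp_dw(t_pre, t_post,
            A_plus=0.03, A_minus=-0.03, tau_plus=20.0, tau_minus=20.0):
    """Synaptic modification of Eq. (15) for one pre/post spike pair.
    ds = t_pre - t_post < 0 means the pre-synaptic neuron fired first,
    giving LTP (positive dw); otherwise LTD (A_minus < 0 gives dw < 0)."""
    ds = t_pre - t_post
    if ds < 0:
        return A_plus * math.exp(ds / tau_plus)
    return A_minus * math.exp(-ds / tau_minus)

def apply_stdp(w, t_pre, t_post, sigma_stdp=1.0):
    """Weight update of Eq. (16) with STDP learning rate sigma_stdp."""
    return w + sigma_stdp * stdp_dw(t_pre, t_post)
```

Because both branches decay exponentially in |Δs|, nearly coincident spikes produce the largest updates, which is what lets the rule pick up temporal correlations.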
3. Proposed Semi-supervised Learning Method
In SNNs, some methods of using unsupervised learning by STDP as pre-training have been proposed,22,23) and the performance has been improved in the same way as with pre-training in ANNs. Such a learning flow can correspond to the situation in which organisms observe the world in advance and are then taught specific labels. In contrast, we can assume a situation in which the labels are taught in advance and subsequent observation then deepens understanding. Moreover, in semi-supervised learning methods based on discriminative models, the accuracy is generally improved by labeling unlabeled data and re-learning. This methodology has a problem in that learning does not work properly if incorrect labeling is applied.

The proposed method is a semi-supervised learning method in which unsupervised learning is performed by STDP after supervised learning by BP. Because the proposed method improves accuracy by STDP, we do not need to label additional data. Also, labeling and re-learning are not repeated, and learning is completed in only two steps: learning by BP and learning by STDP. Figures 2 and 3 show the learning flows of self-training and the proposed method, respectively. In this section, we describe the learning scheme of the proposed method and the network architecture used in the numerical experiments.
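Schematically, the two-step flow reduces to the skeleton below. The functions `bp_step` and `stdp_step` are hypothetical placeholders for the per-sample updates of Sects. 2.3 and 2.6; the point of the sketch is that the second stage consumes no labels and there is no pseudo-labeling loop.

```python
def train_semi_supervised(net, labeled, unlabeled, bp_step, stdp_step,
                          bp_epochs=150, stdp_epochs=50):
    """Two-stage learning of the proposed method (schematic).

    Stage 1: supervised BP on the small labeled set.
    Stage 2: unsupervised STDP on the unlabeled set -- no pseudo-labels,
    so the mislabeling failure mode of self-training cannot occur.
    """
    schedule = []
    for _ in range(bp_epochs):
        for x, y in labeled:              # labels used only here
            bp_step(net, x, y)
        schedule.append("bp")
    for _ in range(stdp_epochs):
        for x in unlabeled:               # labels never needed here
            stdp_step(net, x)
        schedule.append("stdp")
    return schedule
```

Contrast this with self-training (Fig. 2), which alternates labeling and re-learning and can therefore propagate labeling mistakes.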
In STDP, the temporal correlations of firing are used for learning. Thus, after learning to some extent by BP, learning by STDP can improve the results of learning using labeled data with learning using unlabeled data. This can model organisms observing their surroundings many times to improve their understanding after being taught something. In our method, we first train the SNN by supervised learning by BP, and then unsupervised learning by STDP is applied.

Fig. 3. The flow of the proposed semi-supervised learning method.

The method of Lee et al. described in Sect. 2.3 was used as the BP method. In addition, weight and threshold initialization and BP error normalization were performed according to Lee et al.
Because the STDP-based unsupervised initialization scheme has an effect on learning equivalent to that of classic regularization techniques such as early stopping, L1/L2 weight decay, and dropout, we can expect a similar effect from STDP-based post-learning. Thus, weight regularization was not performed in our method to clarify the results.

As the STDP learning rule, we use the following modification of the learning rule described in Sect. 2.6:

Δw = A_+ exp( Δs / τ_plus )  (Δs ≤ −1 ms),
Δw = A_− exp( −Δs / τ_minus )  (Δs ≥ 1 ms).   (17)

Considering the synaptic conduction velocity, no update occurs when the time interval is less than 1 ms. By making the above modification, we can expect that the symmetry of the updates will improve the accuracy because the weights are less biased in either the positive or negative direction and the firings of neurons are balanced. In particular, when simulating in discrete time, as in our case, our numerical experiments showed that when |Δs| is less than 1 ms, only positive updates occur, and thus the weights are biased and the accuracy does not improve compared with the case where the updates are symmetric. We use a weight update formula similar to Eq. (16). The time window is [1 ms, ..., 20 ms] in both the positive and negative directions. Based on the results of biological experiments, the modification is applied only when the interval is 20 ms or less in both directions. Threshold regularization is applied during forward propagation in both BP and STDP learning. The threshold regularization is almost the same as that used by Lee et al., but the threshold is reduced only for the neurons that do not fire.

We use a fully connected feed-forward SNN with one hidden layer in the numerical experiments. The input layer has 784 neurons because we input images of 28 pixels × 28 pixels. The hidden layer has 300 neurons, and the output layer has 10 neurons, each corresponding to a label from 0 to 9, because we evaluate the result on the classification of digits 0 to 9.
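The 784-300-10 network with the initialization of Eq. (11) can be set up as follows. The value of `alpha` is a placeholder (the paper's value is in Table I), and the check at the end is the normalization condition of Eq. (12).

```python
import numpy as np

def init_layer(M, N, alpha=2.0, rng=None):
    """Initialize one fully connected layer as in Eq. (11):
    M input synapses per neuron, N neurons; alpha > 1 is a placeholder."""
    rng = rng if rng is not None else np.random.default_rng(0)
    a = np.sqrt(3.0 / M)
    w = rng.uniform(-a, a, size=(N, M))   # w ~ U[-sqrt(3/M), sqrt(3/M)]
    V_th = np.full(N, alpha * a)          # common initial threshold
    return w, V_th

# 784-300-10 architecture used in the experiments.
w1, vth1 = init_layer(784, 300)
w2, vth2 = init_layer(300, 10)

# Eq. (12): E[sum_i w_ji^2] = M * E[w_ji^2] = 1 for each neuron.
mean_sq_sum = (w1 ** 2).sum(axis=1).mean()
```

For a uniform variable on (−a, a), E[w²] = a²/3 = 1/M when a = √(3/M), so the per-neuron expected squared weight sum is 1, which is what the error normalization of Sect. 2.4 relies on.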
Fig. 4. Network architecture.
Table I. Values of parameters used in numerical experiments (the network parameters τ_mp, µ^(0), µ^(1), σ, and T_ref; the BP parameters α, η_w, η_th, γ, and ρ; and the STDP parameters τ_plus, τ_minus, A_+, A_−, and σ_stdp).

Table II. Computing environment.
OS: Ubuntu 18.04.3 LTS
CPU: Intel Core i7-9700K 3.6 GHz
Memory: 64 GB
Language: Python 3.7

The elements of the receptive field structure matrix are set based on the Manhattan distance to the center of the field. The firing frequency of each neuron in the input layer is obtained by scaling the normalized pixel value of the convolved image by the maximum firing frequency. We set the maximum frequency to 150 spikes/s. The hidden and output layers have a WTA mechanism. An outline of the network is shown in Fig. 4.
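The rate coding of the input layer — normalized pixel values scaled to a maximum firing frequency of 150 spikes/s — can be sketched as a Bernoulli process per 1 ms timestep. The Bernoulli sampling itself is our assumption for illustration; the text only fixes the rate scaling.

```python
import numpy as np

def pixels_to_spike_trains(pixels, T=50, dt_ms=1.0, f_max=150.0, rng=None):
    """Rate-code normalized pixel values (in [0, 1]) as Boolean spike
    trains of T timesteps: the firing probability per step is
    pixel * f_max * dt, so a white pixel fires at f_max on average."""
    rng = rng if rng is not None else np.random.default_rng(0)
    p = np.asarray(pixels) * f_max * dt_ms / 1000.0  # per-step probability
    return rng.random((T,) + p.shape) < p            # shape (T, n_pixels)
```

With T = 50 (the 50 ms learning input window) a fully bright pixel produces about 7 or 8 spikes on average, while a dark pixel stays silent.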
4. Experimental Procedures and Results
In this section, we present the numerical results obtained by training an SNN using the proposed method. We evaluated the performance by solving the classification problem of handwritten numeric characters from 0 to 9. We used the MNIST dataset, which is a set of grayscale handwritten character images of 28 pixels × 28 pixels.

Table III. The accuracy and learning time of the proposed method and self-training.
Method | Best accuracy (%) | Learning time to best accuracy (s) | Accuracy after overall learning (%) | Total learning time (s)
Proposed method | | | |
Self-training | 63.2 | 18054.187 | 59.5 | 33781.616
In this study, we simulated in discrete time, and the timestep was 1 ms. The input time was 50 ms for learning and 150 ms for testing. For BP, we used 10 or 30 labeled training data for each number, i.e., 100 or 300 in total. For STDP, we used 500 unlabeled training data for each number, 5000 in total. For the test data, 10 sets of 10 samples per number, 1000 in total, were used, and the mean accuracy over the test datasets was used for evaluation. The training data for BP, the training data for STDP, and the test data were all different. In the proposed method, learning by BP and STDP is performed for a total of 200 epochs: first, we train the network by BP for 150 epochs, and then learning by STDP is performed for 50 epochs. The batch size of BP was 25. Table I shows the value of each parameter used in the experiments. The STDP parameters A_+ and A_− were set to the same values as those in the previous research. Although a grid search was performed, we finally found that the parameters from the previous research are enough to stabilize the learning process. The learning rate of STDP was set small enough for learning to be stable in the examples here. The refractory period is not considered because of the value we set for T_ref.

We evaluate the test data at each epoch and compare the result of learning by only BP with that of learning by the proposed method. In addition, the results of learning by self-training, an existing semi-supervised method, are also compared with those of the proposed method. The mean over 10 sets of test data in each epoch was used as the accuracy, and a standard deviation of 1σ was used for confidence intervals. The computing environment used in the experiments is shown in Table II.
4.1 10 × 10 samples
Figure 5 shows the results when 10 training samples × 10 classes were used as labeled data.

Fig. 5. (Color online) Plot of the accuracy against epoch (10 × 10 samples labeled data).

Fig. 6. (Color online) Number of firings before and after STDP learning. The blue bar denotes the number of firings before STDP learning, and the red bar denotes the number of firings after STDP learning.

Figure 7 shows the result of learning by self-training. We can see that the accuracy does not improve much even if the amount of labeled data increases because of mislabeling. Table III shows the accuracy and the learning time for the proposed method and self-training. The accuracy of the proposed method is higher than that of self-training, and the learning time of the proposed method is shorter than that of self-training.
4.2 30 × 10 samples
Figure 8 shows the results when 30 training samples × 10 classes were used as labeled data. In contrast to the case of 10 × 10 samples, it can be seen that the combination of BP and STDP decreases the accuracy.

Next, we compared the results with those of learning by self-training. Figure 9 shows the result of learning by self-training. Because there is a lot of labeled training data, mislabeling does not occur at such a high rate, and the accuracy is improved.
Fig. 7. (Color online) Plot of the accuracy against epoch of self-training (10 × 10 samples labeled data). The blue dotted lines show the labeling steps.
Fig. 8. (Color online) Plot of the accuracy against epoch (30 × 10 samples labeled data).
Fig. 9. (Color online) Plot of the accuracy against epoch of self-training (30 × 10 samples labeled data). The blue dotted lines show the labeling steps.
5. Discussion
The results of simulations using our proposed semi-supervised learning method demonstrate that it can improve the accuracy of SNNs when there is a small amount of labeled data. Moreover, in this study, we did not need to label unlabeled data. Thus, the problem that learning is not performed well due to mislabeling in existing semi-supervised learning methods for discriminative models is solved. This method can be applied effectively when there are few labeled training data points and the accuracy is low. In addition, the learning time can be reduced because the method does not need to re-learn like self-training and consists of only two steps: BP learning and STDP learning. Because the number of epochs to reach the best accuracy is lower than that in self-training, unlabeled training data can be used for efficient learning. On the other hand, when there is a relatively large amount of labeled data, the accuracy decreases. Hence, this method has the property that when the amount of labeled training data is extremely small, the performance can be improved, but when the amount of labeled training data is relatively large, the accuracy can decrease.

Fig. 10. (Color online) Plot of the improvement rate from applying STDP and the accuracy after applying BP and STDP against the amount of labeled data. Each white circle indicates the mean improvement rate over 10 sets of test data, and a standard deviation of 1σ is used for the error bar. The improvement rate is defined as the ratio of the accuracy after applying STDP to that before it. The blue and orange lines correspond to the accuracy at the end of the BP stage and that after the STDP learning stage, respectively.
Such a feature hasbeen reported in an existing semi-supervised learning methodas well, and this supports the fact that semi-supervisedlearning is being performed in this work.Pre-training by unsupervised learning has been interpretedas a type of self-organization, and STDP is considered tobe involved in self-organization in living organisms. There-fore, the method of pre-training by STDP followed by fine-tuning with BP
22, 23) can be interpreted as the sequence oflearning the structure of input data by self-organization andthen performing supervised learning by BP. In contrast, fromour results, it can be inferred that STDP plays an importantrole in learning other than self-organization, such as fine-tuning or confirmation, and thus the result is interesting froman engineering perspective as well as a biological one.Although we have discussed the proposed method from abiological point of view, there are various arguments aboutthe biological plausibility of BP.
In this research, to focus on the effect of the application of STDP after BP, we used the BP method of Lee et al., which has high performance in SNNs but does not seem to be biologically plausible. Using the supervised STDP 16, 17) would allow us to devise a more biologically plausible learning scheme. In addition, we can develop an end-to-end STDP-based learning method for SNNs by combining STDP-based pre-training 22, 23) with these methods.

Fig. 11. (Color online) Plot of the accuracy for the cases in which the mean improvement rates are greater than or equal to 1. The blue dotted line indicates the epoch to start learning by STDP.

Fig. 12. (Color online) Plot of the accuracy for the cases in which the mean improvement rates are less than 1. The blue dotted line indicates the epoch to start learning by STDP.

From an engineering point of view, it is important to judge whether the proposed method should be applied or not, because it can induce not only an improvement but also a deterioration in some cases. Figure 10 shows the improvement rate before and after applying STDP against the amount of labeled data. Figure 11 shows the change in accuracy for the cases in which the improvement rates are greater than or equal to 1, and Fig. 12 shows the change in accuracy for the cases in which the improvement rates are less than 1. These results show that the degradation of accuracy occurs only when there is a relatively large amount of labeled data. Furthermore, it is easy to detect the degradation after the application of STDP. Therefore, for engineering purposes, we should monitor the change in accuracy for a few epochs after applying STDP, and if a deterioration is observed, we can terminate the training as in early stopping.
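The monitoring rule described above can be sketched as follows. This is a minimal sketch, not our actual training loop: the `train_stdp_epoch` and `evaluate` callables and the `patience` value are hypothetical placeholders.

```python
def run_stdp_with_monitor(train_stdp_epoch, evaluate, patience=3):
    """Apply STDP epochs after the BP stage, stopping early on deterioration.

    train_stdp_epoch: callable running one epoch of unsupervised STDP (assumed).
    evaluate: callable returning the current accuracy on held-out labeled data (assumed).
    patience: number of consecutive non-improving epochs tolerated before stopping.
    """
    best_acc = evaluate()  # accuracy at the end of the BP stage
    epochs_since_best = 0
    while epochs_since_best < patience:
        train_stdp_epoch()
        acc = evaluate()
        if acc > best_acc:
            best_acc = acc
            epochs_since_best = 0
        else:
            epochs_since_best += 1  # deterioration: count toward early stop
    return best_acc

# Toy demo with a stubbed accuracy curve (hypothetical numbers):
_accs = iter([50.0, 52.0, 51.5, 51.0, 50.5])
best = run_stdp_with_monitor(lambda: None, lambda: next(_accs), patience=3)
print(f"best accuracy reached: {best}")
```

When the labeled set is large and STDP degrades accuracy, the monitor triggers within a few epochs and the network from before the STDP stage can be kept instead.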
Although we simulated in discrete time in this study, all the algorithms and architectures used in our work can be event-driven. Thus, the proposed method can be applied in an event-driven manner. We expect that it would be highly efficient in real-world problems if implemented in neuromorphic hardware.
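To illustrate why the STDP stage admits an event-driven formulation, the following is a minimal sketch of a standard pair-based STDP synapse with exponential traces (in the spirit of the trace formulation of Song et al.); the traces are decayed only when a spike event arrives, so no per-time-step computation is required. The parameter values are illustrative, not those used in our experiments.

```python
import math

A_PLUS, A_MINUS = 0.01, 0.012   # potentiation / depression amplitudes (illustrative)
TAU_PLUS = TAU_MINUS = 20.0     # trace time constants in ms (illustrative)

class StdpSynapse:
    def __init__(self, w=0.5):
        self.w = w          # synaptic weight, clipped to [0, 1]
        self.x_pre = 0.0    # presynaptic trace
        self.x_post = 0.0   # postsynaptic trace
        self.t_last = 0.0   # time of the last processed spike event

    def _decay(self, t):
        # Decay both traces lazily, only when an event occurs.
        dt = t - self.t_last
        self.x_pre *= math.exp(-dt / TAU_PLUS)
        self.x_post *= math.exp(-dt / TAU_MINUS)
        self.t_last = t

    def on_pre_spike(self, t):
        self._decay(t)
        # Post fired earlier than this pre spike: depress.
        self.w = max(0.0, self.w - A_MINUS * self.x_post)
        self.x_pre += 1.0

    def on_post_spike(self, t):
        self._decay(t)
        # Pre fired earlier than this post spike: potentiate.
        self.w = min(1.0, self.w + A_PLUS * self.x_pre)
        self.x_post += 1.0

# Causal pairing: pre spike at t = 0 ms, post spike 10 ms later -> potentiation.
syn = StdpSynapse()
syn.on_pre_spike(0.0)
syn.on_post_spike(10.0)
print(f"weight after causal pairing: {syn.w:.4f}")
```

Because the state is touched only at spike events, the same update maps naturally onto neuromorphic hardware that delivers spikes asynchronously.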
6. Conclusions
We proposed a semi-supervised learning method with unsupervised learning by STDP after supervised learning by BP. This method reverses the order of the learning steps in a previously reported semi-supervised learning method. 22, 23) We developed the proposed method based on its parallel to the biological situation of an organism learning a label and subsequently observing its environment to confirm this learning. The results of numerical simulations show that our proposed method displays good accuracy, particularly when only a small amount of labeled data is used. However, when a relatively large amount of labeled data is used, the accuracy decreases. Our results also show that STDP plays an important role in learning other than self-organization, such as fine-tuning or confirmation.

This research was devised from an intuitive sense of organisms and the learning rules that have been discovered in them, and mathematical verification is required to understand why STDP improves accuracy. Moreover, there are now three types of STDP-based learning methods: STDP-based pre-training, 22, 23) supervised STDP learning, 16, 17) and our method. Thus, by combining these methods, we can develop an end-to-end STDP-based learning method for SNNs. Such verification and expansion can lead to an improved understanding of the brain and efficient processing in real time.
1) W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition (Cambridge University Press, Cambridge, 2014).
2) Y. LeCun, Y. Bengio, and G. Hinton, Nature, 436 (2015).
3) S. K. Esser, R. Appuswamy, P. A. Merolla, J. V. Arthur, and D. S. Modha, NIPS'15: Proc. 28th Int. Conf. Neural Information Processing Systems, 2015, Vol. 1, Montreal, p. 1117.
4) W. Maass, Neural Netw., 1659 (1997).
5) A. Tavanaei, M. Ghodrati, S. R. Kheradpisheh, T. Masquelier, and A. Maida, Neural Netw., 47 (2019).
6) D. Guest, K. Cranmer, and D. Whiteson, Annu. Rev. Nucl. Part. Sci., 161 (2018).
7) B. P. Borzyszkowski, presented at Neuromorphic Computing in High Energy Physics, Second CERN openlab summer student lightning talk session, 2019.
8) W. Maass and H. Markram, J. Comput. Syst. Sci., 593 (2004).
9) S. M. Bohte, J. N. Kok, and H. La Poutré, Neurocomputing, 17 (2002).
10) E. Ledinauskas, J. Ruseckas, A. Juršėnas, and G. Buračas, arXiv:2006.04436.
11) H. Mostafa, IEEE Trans. Neural Netw. Learn. Syst., 3227 (2017).
12) J. H. Lee, T. Delbruck, and M. Pfeiffer, Front. Neurosci., 508 (2016).
13) H. Markram, W. Gerstner, and P. J. Sjöström, Front. Synaptic Neurosci., 2 (2012).
14) T. Iakymchuk, A. Rosado-Muñoz, J. F. Guerrero-Martínez, M. Bataller-Mompeán, and J. V. Francés-Víllora, EURASIP J. Image Video Process., 4 (2015).
15) P. U. Diehl and M. Cook, Front. Comput. Neurosci., 99 (2015).
16) Y. Zeng, K. Devincentis, Y. Xiao, Z. I. Ferdous, X. Guo, Z. Yan, and Y. Berdichevsky, 2018 IEEE Int. Conf. Acoustics, Speech and Signal Processing, 2018, Alberta, p. 1154.
17) A. Tavanaei and A. Maida, Neurocomputing, 39 (2019).
18) Y. P. Reddy, P. Viswanath, and B. E. Reddy, Int. J. Eng. Technol. Innov., 81 (2018).
19) J. E. Van Engelen and H. H. Hoos, Machine Learning, 373 (2020).
20) D. Erhan, A. Courville, Y. Bengio, and P. Vincent, Proc. Thirteenth Int. Conf. Artificial Intelligence and Statistics, 2010, Vol. 9, Sardinia, p. 201.
21) G. E. Hinton, S. Osindero, and Y.-W. Teh, Neural Comput., 1527 (2006).
22) C. Lee, P. Panda, G. Srinivasan, and K. Roy, Front. Neurosci., 435 (2018).
23) Y. Dorogyy and V. Kolisnichenko, 2018 IEEE First Int. Conf. System Analysis Intelligent Computing, 2018, Kiev, p. 1.
24) M. Oster, R. Douglas, and S.-C. Liu, Neural Comput., 2437 (2009).
25) W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity (Cambridge University Press, Cambridge, 2002).
26) M. Riesenhuber and T. Poggio, Nat. Neurosci., 1019 (1999).
27) S. Song, K. D. Miller, and L. F. Abbott, Nat. Neurosci., 919 (2000).
28) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proc. IEEE, 2278 (1998).
29) L. I. Zhang, H. W. Tao, C. E. Holt, W. A. Harris, and M. Poo, Nature, 37 (1998).
30) B. Merialdo, Comput. Linguist., 155 (1994).
31) M. Ohzeki, J. Phys. Soc. Jpn., 034003 (2015).
32) F. Effenberger, J. Jost, and A. Levina, PLoS Comput. Biol. (2015).
33) Y. Bengio, D. H. Lee, J. Bornschein, T. Mesnard, and Z. Lin, arXiv:1502.04156.
34) L. Prechelt, Neural Networks: Tricks of the Trade.