Biologically Plausible Learning of Text Representation with Spiking Neural Networks
Marcin Białas¹, Marcin Michał Mirończuk¹, and Jacek Mańdziuk²

¹ National Information Processing Institute, al. Niepodległości 188b, 00-608 Warsaw, Poland, {marcin.bialas,marcin.mironczuk}@opi.org.pl
² Faculty of Mathematics and Information Sciences, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland, [email protected]
Abstract.
This study proposes a novel biologically plausible mechanism for generating low-dimensional spike-based text representation. First, we demonstrate how to transform documents into series of spikes (spike trains) which are subsequently used as input in the training process of a spiking neural network (SNN). The network is composed of biologically plausible elements, and trained according to the unsupervised Hebbian learning rule, Spike-Timing-Dependent Plasticity (STDP). After training, the SNN can be used to generate a low-dimensional spike-based text representation suitable for text/document classification. Empirical results demonstrate that the generated text representation may be effectively used in text classification, leading to an accuracy of 80.19% on the bydate version of the 20 newsgroups data set, which is a leading result amongst approaches that rely on low-dimensional text representations.
Keywords: spiking neural network · STDP · Hebbian learning · text processing · text representation · spike-based representation · representation learning · feature learning · text classification · 20 newsgroups bydate
1 Introduction

Spiking neural networks (SNNs) are an example of biologically plausible artificial neural networks (ANNs). SNNs, like their biological counterparts, process sequences of discrete events occurring in time, known as spikes. Traditionally, spiking neurons, due to their biological validity, have been studied mostly by theoretical neuroscientists, and have become a standard tool for modeling brain processes on a micro scale. However, recent years have shown that spiking computation can also successfully address common machine learning challenges [35]. Another interesting aspect of SNNs is the adaptation of such algorithms to neuromorphic hardware, which is a brain-inspired alternative to the traditional von Neumann machine. Thanks to mimicking processes observed in brain synaptic connections, neuromorphic hardware is a highly fault-tolerant and energy-efficient substitute for classical computation [27].
Recently we have witnessed significant growth in the volume of research into SNNs. Researchers have successfully adapted SNNs for the processing of images [35], audio signals [9,39,40], and time series [18,31]. However, to the best of the authors' knowledge, there is only one work related to text processing with SNNs [37]. This state of affairs is caused by the fact that text, due to its structure and high dimensionality, presents a significant challenge to tackle by the SNN approach. The motivation of this study is to broaden the current knowledge of the application of SNNs to text processing. More specifically, we have developed and evaluated a novel biologically inspired method for generation of spike-based text representation that may be used in text/document classification tasks [25].
This paper proposes a Spike Encoder for Text (SET) which generates spike-based text representation suitable for classification tasks. Text data is highly dimensional (the most common text representation is in the form of a vector with many features) which, due to the curse of dimensionality [20,2,17,26], usually leads to overfitted classification models with poor generalisation [38,32,30,13,4]. Processing highly dimensional data is also computationally expensive. Therefore, researchers have sought text representations which may overcome this drawback [3]. One of the possible approaches is based on transformation of the high-dimensional feature space into a low-dimensional representation [5,36,6].

In the above context we propose the following two-phase approach to SNN-based text classification. Firstly, the text is transformed into spike trains. Secondly, the spike trains representation is used as the input in the SNN training process, performed according to a biologically plausible unsupervised learning rule and generating the spike-based text representation. This representation has significantly lower dimensionality than the spike trains representation and can be used effectively in subsequent SNN text classification. The proposed solution has been empirically evaluated on the publicly available version, bydate [21], of the real data set known as 20 newsgroups, which contains 18 846 text documents from twenty different newsgroups of Usenet, a worldwide distributed discussion system.

Both the input and output of the SNN rely on spike representations, though of very different forms. For the sake of clarity, throughout the paper the former representation (SNN input) will be referred to as spike trains, and the latter one (SNN output) as spike-based, or spiking encoding, or low-dimensional.

The main contribution of this work can be summarized as follows:
– To propose an original approach to document processing using SNNs and its subsequent classification based on generated spike-based text representation;
– To experimentally evaluate the influence of various parameters on the quality of the generated representation, which leads to better understanding of the strengths and limitations of SNN-based text classification approaches;
– To propose an SNN architecture which may potentially contribute to the development of other SNN-based approaches. We believe that the solution presented may serve as a building block for larger SNN architectures, in particular deep spiking neural networks (DSNNs) [35].
As mentioned above, we are aware of only one paper related to text processing in the context of SNNs [37] which, nevertheless, differs significantly from our approach. The authors of [37] focus on transforming word embeddings [23,28] into spike trains, whilst our focus is not only on representation of text in the form of spike trains, but also on training the SNN encoder which generates a low-dimensional text representation. In other words, our goal is to generate a low-dimensional text representation with the use of an SNN, whereas in [37] the transformation of an existing text embedding into spike trains is proposed.

The remainder of the paper is structured as follows. Section 2 presents an overview of the proposed method; Section 3 describes the evaluation process of the method and experimental results; and Section 4 presents the conclusions.
2 The proposed method

The proposed method transforms the input text to a spike code and uses it as training input for the SNN to achieve a meaningful spike-based text representation. The method is schematically presented in Fig. 1.

Fig. 1. A schema of the proposed method for generating spike-based low-dimensional text representation. In phase I (text to spike transformation), raw text documents are vectorized and the resulting text vectors are transformed into spike trains; in phase II the spikes are processed by the SNN (spiking encoder), which produces the spiking encoding.

In phase I, text is transformed into a vector representation and afterwards each vector is encoded as spike trains. Once the text is encoded in the form of neural activity, it can be used as input to the core element of our method: a spiking encoder. The encoder is a two-layered SNN with adaptable synapses. During the learning phase (II), the spike trains are propagated through the encoder in a feed-forward manner and the synaptic weights are modified simultaneously according to an unsupervised learning rule. After the learning process, the output layer of the spiking encoder provides a spike-based representation of the text presented to the system. In the remainder of this section all elements of the system described above are discussed in more detail.
2.1 Text to spike transformation

During the text to spike transformation phase, illustrated in Fig. 1, text is preprocessed for further spiking computation. Text input data (data corpus) is organized as a set D of documents d_i, i = 1, ..., K. In the first step a dictionary T containing all unique words t_j, j = 1, ..., |T| from the corpus data is built. Next, each document d_i is transformed into an M-dimensional (M = |T|) vector W_i, the elements of which, W_i[j] := w_ij, j = 1, ..., M, represent the relevance of words t_j to document d_i. In effect, the corpus data is represented by a real-valued matrix W_{K×M}, also called a document-term matrix.

The typical weighting functions are term frequency (TF), inverse document frequency (IDF), or their combination, TF-IDF [22,12]. In TF the weight w_ij is equal to the number of times the j-th word appears in d_i with respect to the length |d_i| (the number of all non-unique words in d_i). IDF takes into account the whole corpus D and sets w_ij as the logarithm of the ratio between |D| and the number of documents containing word t_j. Consequently, IDF mitigates the impact of words that occur very frequently in a given corpus and are presumably less informative from the point of view of document classification than the words occurring in a small fraction of the documents. TF-IDF sets w_ij as a product of the TF and IDF weights. In this paper we use TF-IDF weighting, which is the most popular approach in the text processing domain.

In order to transform a vector representation into a spike trains one, a presentation time t_p, which establishes for how long each document is presented to the network, and the time gap between two consecutive presentations, Δt_p, must be defined. A time gap period without any input stimuli is necessary to eliminate interference between documents and allow dynamic parameters of the system to decay and "be ready" for the next input.

Technically, for a given document d_i, represented as an M-dimensional vector of weights w_ij, for each weight w_ij in every millisecond of document presentation a spike is generated with probability proportional to w_ij. Thanks to this procedure, we ultimately derive a spiking representation of the text. In our experiments each document is presented for t_p = 600 [ms] with Δt_p = 300 [ms]; the spike-generation probability is scaled by a proportionality coefficient α.

For better clarification, let us consider a simple example and assume that for the word baseball the corresponding weight w_ij in some document d_i equals w. Then, in each millisecond of the presentation time, a spike is emitted with probability P(spike | baseball) = α · w. Hence, on average, α · w · 600 spikes are expected to be generated during the 600 [ms] presentation time.
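For concreteness, the whole phase I pipeline can be sketched in a few lines of Python. This is our illustration rather than the authors' implementation: the toy corpus, the value of α, and all variable names are assumptions, while t_p = 600 [ms] follows the text.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the 20 newsgroups documents.
docs = ["the pitcher threw the baseball",
        "the scsi controller and the hard drive failed"]

# Step 1: build the document-term matrix W (K x M) with TF-IDF weights.
vectorizer = TfidfVectorizer()
W = vectorizer.fit_transform(docs).toarray()

t_p = 600      # presentation time per document [ms], as in the text
alpha = 0.1    # proportionality coefficient (illustrative value only)
rng = np.random.default_rng(seed=42)

def to_spike_train(w_i):
    """For every word j and every millisecond of the presentation, emit a
    spike with probability alpha * w_ij (a Bernoulli draw per millisecond),
    yielding a boolean (t_p x M) spike-train array."""
    p = np.clip(alpha * w_i, 0.0, 1.0)
    return rng.random((t_p, w_i.size)) < p

spike_trains = [to_spike_train(w_i) for w_i in W]
# On average alpha * w_ij * t_p spikes are generated per word, matching
# the expectation derived in the example above.
```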
2.2 Spiking encoder

A spiking encoder is the key element of the proposed method. The encoder, presented in Fig. 2, is a two-layered SNN equipped with an additional inhibitory neuron.

Fig. 2. Spiking encoder architecture: an input layer of M neurons feeds, through excitatory synapses, the N neurons of the encoder layer; a single inhibitory neuron receives inhibition request signals and suppresses the encoder layer.

The first layer contains M neurons (denoted by blue circles in Fig. 2), each of which represents one word t_j from the dictionary T. Neuron dynamics is defined by the spike trains generated based on the weights w_ij corresponding to documents d_i, i = 1, ..., K. Higher numbers of spikes are emitted by neurons representing words which are statistically more relevant for a particular document, according to the chosen TF-IDF measure. The spike trains for each neuron are presented in Fig. 2 as a row of short vertical lines.

In the brain, spikes are transmitted between neurons via synaptic connections. A neuron which generates a spike is called a presynaptic neuron, whilst a target neuron (spike receiver) is a postsynaptic neuron. In the proposed SNN architecture (cf. Fig. 2) two different types of synaptic connections are utilised: excitatory ones and inhibitory ones. Spikes transmitted through excitatory connections (denoted by green circles in Fig. 2) lead to firing of the postsynaptic neuron, while impulses traveling through inhibitory ones (red circles in Fig. 2) hinder postsynaptic neuron activity. Each time an encoder neuron fires, its weights are modified according to the proposed learning rule. The neuron simultaneously sends an inhibition request signal to the inhibitory neuron and activates it. Then the inhibitory neuron suppresses the activity of all encoder output layer neurons using recursive inhibitory connections (red circles). The proposed architecture satisfies the competitive learning paradigm [19] with a winner-takes-all (WTA) strategy.

In this work we consider a biologically plausible neuron model known as leaky integrate and fire (LIF) [11]. The dynamics of such a neuron is described in terms of changes of its membrane potential (MP). If the neuron is not receiving any spikes, its potential stays close to a value u_rest known as the resting membrane potential. When the neuron receives spikes transmitted through excitatory synapses, the MP moves towards the excitatory equilibrium potential, u_exc = 0 [mV]. When many signals are simultaneously transmitted through excitatory synapses, the MP rises and at some point can reach a threshold value u_th, in which case the neuron fires. After firing, the neuron resets its MP to u_rest and becomes inactive for t_ref = 3 [ms] (the refractory period). In the opposite scenario, when the neuron receives spikes through the inhibitory synapse, its MP moves towards the inhibitory equilibrium potential u_inh, i.e. further away from the threshold value, which decreases the chance of firing. The dynamics of the membrane potential u in the LIF model is described by the following equation:

\tau \frac{du}{dt} = (u_{rest} - u) + g_e (u_{exc} - u) + g_i (u_{inh} - u) \quad (1)

where g_e and g_i denote excitatory and inhibitory conductance, respectively, and τ = 100 [ms] is the membrane time constant. The values of g_e and g_i depend on presynaptic activity. Each time a signal is transmitted through a synapse, the conductance is incremented by the value of the weight corresponding to that synapse, and afterwards decays with time according to equation (2):

\tau_e \frac{dg_e}{dt} = -g_e, \qquad \tau_i \frac{dg_i}{dt} = -g_i \quad (2)

where τ_e = 2 [ms] and τ_i = 2 [ms] are decay time constants. In summary, if there is no presynaptic activity, the MP converges to u_rest. Otherwise, its value changes according to the signals transmitted through the neuron's synapses.
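Since the simulations in this study were run in the BRIAN 2 simulator (see Section 3.1), the conductance-based LIF dynamics of eqs. (1)-(2) map almost verbatim onto BRIAN 2 model equations. The sketch below is ours, not the paper's code: the potentials u_rest, u_th, and u_inh carry typical LIF values inserted purely for illustration, and the group sizes, input rates, and initial weights are arbitrary.

```python
from brian2 import NeuronGroup, PoissonGroup, Synapses, run, ms, mV, Hz

tau, tau_e, tau_i = 100*ms, 2*ms, 2*ms         # time constants from the text
u_exc = 0*mV                                   # excitatory equilibrium (text)
u_rest, u_th, u_inh = -65*mV, -52*mV, -90*mV   # assumed, standard LIF values

# Eqs. (1)-(2): conductance-based LIF with dimensionless g_e, g_i.
eqs = '''
du/dt = ((u_rest - u) + g_e*(u_exc - u) + g_i*(u_inh - u)) / tau : volt (unless refractory)
dg_e/dt = -g_e / tau_e : 1
dg_i/dt = -g_i / tau_i : 1
'''
encoder = NeuronGroup(100, eqs, threshold='u > u_th', reset='u = u_rest',
                      refractory=3*ms, method='euler')
encoder.u = u_rest

# Stand-in input layer; each presynaptic spike increments g_e by the
# weight of the corresponding excitatory synapse.
inputs = PoissonGroup(1000, rates=20*Hz)
exc = Synapses(inputs, encoder, model='w : 1', on_pre='g_e_post += w')
exc.connect()
exc.w = '0.05*rand()'

run(100*ms)
```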
2.3 Learning rule

We utilise a modified version of the Spike-Timing-Dependent Plasticity (STDP) learning process [33]. STDP is a biologically plausible unsupervised learning protocol belonging to the family of Hebbian learning (HL) methods [14]. In short, the STDP process results in an increase of the synaptic weight if the postsynaptic spike is observed soon after the presynaptic one ('pre-before-post'), and in a decrease of the synaptic weight in the opposite scenario ('post-before-pre'). The above learning scheme increases the relevance of those synaptic connections which contribute to the activation of the postsynaptic neuron, and decreases the importance of the ones which do not. We modify STDP in a manner similar to [8,29], i.e. by skipping the weight modification in the post-before-pre scenario and introducing an additional scaling mechanism. The plasticity of the excitatory synapse s_ij connecting a presynaptic neuron i from the input layer with a postsynaptic neuron j from the encoder layer can be expressed as follows:

\Delta s_{ij} = \eta \left( A(t) - (R(t) + c)\, s_{ij} \right) \quad (3)

where

A(t) = -\tau_A \frac{dA(t)}{dt}, \qquad R(t) = -\tau_R \frac{dR(t)}{dt} \quad (4)

and η is a small learning constant. In eqs. (3)-(4), A(t) represents a presynaptic trace and R(t) is a scaling factor which depends on the history of postsynaptic neuron activity. Every time the presynaptic neuron i fires, A(t) is set to 1 and exponentially decays in time (τ_A = 5 [ms]). If the postsynaptic neuron fires just after the presynaptic one ('pre-before-post'), A(t) is close to 1 and the weight increase is high. The other component of eq. (3), (R(t) + c) s_ij, is a form of synaptic scaling [24]. Every time the postsynaptic neuron fires, R(t) is incremented by 1 and afterwards decays with time (τ_R = 70 [ms]). The role of the small constant factor c is to maintain scaling even when activity is relatively small. The overall purpose of synaptic scaling is to decrease the weights of the synapses which are not involved in firing the postsynaptic neuron. Another benefit of synaptic scaling is to restrain weights from uncontrolled growth, which can be observed in HL [1].
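Read in discrete 1 ms steps, eqs. (3)-(4) amount to the following update (our sketch; the values of η and of the small offset, here named c, are placeholders for constants not restated above):

```python
import numpy as np

eta, c = 0.01, 0.05                 # assumed learning rate and scaling offset
tau_A, tau_R, dt = 5.0, 70.0, 1.0   # trace time constants [ms] and time step

def stdp_step(A, R, s, pre_spiked, post_spiked):
    """One simulation step for a single postsynaptic neuron: A holds the
    presynaptic traces, R the postsynaptic scaling factor, s the weights."""
    A = A * np.exp(-dt / tau_A)      # both traces decay exponentially
    R = R * np.exp(-dt / tau_R)
    A[pre_spiked] = 1.0              # a presynaptic spike resets its trace to 1
    if post_spiked:                  # weights change only when the neuron fires
        s = s + eta * (A - (R + c) * s)   # eq. (3): potentiation + scaling
        R = R + 1.0
    return A, R, s
```

In the 'pre-before-post' case A is still close to 1 when the postsynaptic spike arrives, so the corresponding weight grows strongly, whereas synapses that did not contribute to the firing are only scaled down.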
2.4 Training procedure

For a given data corpus (set of documents) D, the training procedure is performed as follows. Firstly, we divide D into s subsets u_i, i = 1, ..., s, in the manner described in Section 3.1. Secondly, each subset u_i is transformed to spike trains and used as input for a separate SNN encoder H_i, i = 1, ..., s, composed of N neurons. Please note that each encoder is trained with the use of one subset only. Such a training setup allows the processing of the data in a parallel manner. Another advantage is that this limits the number of excitatory connections per neuron, which reduces computational complexity (the number of differential equations that need to be evaluated for each spike): during training, encoder H_i is exposed only to the respective subset T_i of the training set dictionary T, and the number of its excitatory connections is limited to |T_i| < |T|. Spike trains are presented to the network four times (in four training epochs).

Once the learning process is completed, a connection pruning procedure is applied. Please observe that HL combined with competitive learning should lead to highly specialised neurons which are activated only for some subset of the inputs. The specialisation of a given neuron depends on the set of its connection weights. If the probability of firing should be high for some particular subset of the inputs, the weights representing words from those inputs must be high. The other weights should be relatively low due to the synaptic scaling mechanism. Based on this assumption, after training, for each output layer neuron we prune θ per cent of its incoming connections with the lowest weights. θ is a hyperparameter of the method, empirically evaluated in the experimental section.
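The pruning step itself is a simple per-neuron truncation of the weight matrix; a minimal sketch under our naming conventions:

```python
import numpy as np

def prune_connections(S, theta):
    """Zero out, for each output-layer neuron (one row of S), the theta
    per cent of its incoming connections with the lowest weights."""
    S = S.copy()
    k = int(S.shape[1] * theta / 100)       # connections dropped per neuron
    weakest = np.argsort(S, axis=1)[:, :k]  # indices of the k lowest weights
    np.put_along_axis(S, weakest, 0.0, axis=1)
    return S

# Example: keep only the strongest 20 per cent of each neuron's weights.
# S_pruned = prune_connections(S, theta=80)
```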
3 Experimental evaluation

This section presents an experimental evaluation of the proposed method. In subsection 3.1 the technical issues related to the setup of the experiment and the implementation of the training and evaluation procedures are discussed. The final two subsections focus respectively on the experimental results and their comparison with the literature.
3.1 Experimental setup

The bydate version of 20 newsgroups¹ is a well-known benchmark set in the text classification domain. The set contains newsgroup posts related to different categories (topics) gathered from Usenet, in which each category corresponds to one newsgroup. Categories are organised into a hierarchical structure with the main categories being computers, recreation and entertainment, science, religion, politics, and forsale. The corpus consists of 18 846 documents nearly equally distributed among twenty categories and explicitly divided into two subsets: the training one (11 314 documents) and the test one (7 532 documents).

The dynamics of the spiking neurons (including the plasticity mechanism) was implemented using the BRIAN 2 simulator [34]. The Scikit-learn² Python library was used for processing the text and creating the TF-IDF matrix.

As mentioned in Section 2.4, the training set was divided into s = 11 subsets u_i, each of which, except for u_10, contained 1500 documents. The division was performed randomly. Firstly, the entire training set was shuffled, and then documents were consecutively assigned to the subsets according to the resulting order, with a 500-document redundancy (overlap) between the neighbouring subsets, as described in Table 1. The overlap between subsequent subsets resulted from preliminary experiments which suggested that such an approach improves classification accuracy. While we found the concept of partial data overlap to be reasonably efficient, it by no means should be regarded as an optimal choice. The optimal division of data into training subsets remains an open question and a subject of our future research.

The outputs of all SNNs H_i, i = 1, ..., s, i.e. spike-based encodings represented as sums of spikes per document, were joined to form a single matrix (the final low-dimensional text representation) which was evaluated in the context of a classification task. The joined matrix of spike rates was used as input to the Logistic Regression (LR) [15,17] classifier, with accuracy as the performance measure.

¹ http://qwone.com/~jason/20Newsgroups/20news-bydate.tar.gz
² https://scikit-learn.org/
Table 1. Division of the training set into subsets u_0–u_10.

Subset        u_0   u_1   u_2   u_3   u_4   u_5   u_6   u_7   u_8   u_9   u_10
First index     0  1000  2000  3000  4000  5000  6000  7000  8000  9000  10000
Last index   1500  2500  3500  4500  5500  6500  7500  8500  9500 10500  11314
Size         1500  1500  1500  1500  1500  1500  1500  1500  1500  1500   1314
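The setup described above, the overlapping subsets of Table 1 plus the final classification step, can be summarised by the following sketch (ours; it assumes the per-document spike counts produced by each trained encoder H_i are available as NumPy matrices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def split_into_subsets(shuffled_docs, s=11, stride=1000, size=1500):
    """Overlapping subsets of Table 1: 1500-document windows taken with a
    1000-document stride over the shuffled training set (the last window,
    u_10, is shorter)."""
    return [shuffled_docs[i*stride : i*stride + size] for i in range(s)]

def evaluate(train_counts, test_counts, y_train, y_test):
    """train_counts[i] is a (num_docs x N_i) matrix of per-document output
    spike counts of encoder H_i; the matrices are joined column-wise and
    fed to Logistic Regression."""
    X_train, X_test = np.hstack(train_counts), np.hstack(test_counts)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test))
```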
3.2 Results

In the first experiment we looked more closely at the weights of neurons after training and at the relationship between the inhibition mechanism and the quality/efficacy of the resulting text representation. We trained eleven SNN encoders according to the procedure presented above. After training, five neurons from the first encoder (H_0) were randomly sampled and their weights were used for further analysis. Fig. 3 illustrates, for each sampled neuron, its 200 highest weights sorted in descending order.
Fig. 3. Weights extracted from the encoder's neurons: the 200 highest weights of each of the five sampled neurons, sorted in descending order (one colour per neuron).
The weights of each neuron are presented with a different colour. The plots show that every neuron has a group of dominant connections represented by the weights with the highest values (the first several dozen connections). It means that each neuron will be activated more easily by inputs that contain words corresponding to these weights. For example, one of the sampled neurons will potentially produce more spikes for documents related to religion, because its highest weights correspond to the words 'jesus', 'god', 'paul', 'faith', 'law', 'christians', 'christ', 'sabbath', 'sin', 'jewish'. A different behaviour is expected from another neuron, whose highest weights correspond to the words 'drive', 'scsi', 'disk', 'hard', 'controller', 'ide', 'drives', 'help', 'mac', 'edu'. This one will more likely be activated for computer-related documents. On the other hand, not all neurons can be classified so easily. For instance, the highest weights of a third sampled neuron are linked to the words 'cs', 'serial', 'ac', 'edu', 'key', 'bit', 'university', 'windows', 'caronni', 'uk', hence the designation of this neuron is less obvious. We have repeated the above sampling and weight inspection procedure several times and the observations are qualitatively the same. For the sake of space savings we do not report them in detail.
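The inspection above reduces to sorting each neuron's incoming weights and mapping the strongest ones back to dictionary entries; a short sketch (our naming, reusing a fitted scikit-learn vectorizer as in Section 3.1):

```python
import numpy as np

def top_words(S, vectorizer, neuron, k=10):
    """Return the k dictionary words attached to the highest incoming
    weights of the given encoder neuron (S is the neurons x words
    weight matrix)."""
    vocab = np.asarray(vectorizer.get_feature_names_out())
    strongest = np.argsort(S[neuron])[::-1][:k]
    return vocab[strongest].tolist()
```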
Hence, a question arises: how well can documents be encoded with the use of neurons trained in the manner described above? Intuitively, in practice the quality of encoding may be related to the level of competition amongst neurons in the evaluation phase. If the inhibition value is kept high enough to enforce the WTA strategy, then only a few neurons will be activated and the others will be immediately suppressed. This scenario will lead to highly sparse representations of the input documents, with just a few or (in extreme cases) only one neuron dominating the rest. Since differences between documents belonging to different classes may be subtle, such a sparse representation may not be the optimal setup. In order to check the influence of the inhibition level on the resulting spike-based representation, we tested the performance of the trained SNNs H_0–H_10 for various inhibition levels by adjusting the value of the neurons' inhibitory synapses. The results are illustrated in Fig. 4 (top).

Fig. 4. Accuracy for various inhibition levels (top) and encoder sizes (bottom); the four curves in the bottom plot correspond to removing 50%, 80%, 90%, and 99% of connections.
Clearly, the accuracy strongly depends on the inhibition level. The best outcomes are accomplished with the inhibition disabled (set to 0), and the accuracy rapidly decreases as the inhibition rises. For higher inhibition values the accuracy plot enters a plateau at a visibly lower level. The results show that the most effective representation of documents is generated in the absence of inhibition during the evaluation phase, i.e. when all neurons have the same chance of being activated and contribute to the document representation.

The second series of experiments aimed at exploring the relationship between the efficacy of document representation and the size of the encoders. Furthermore, the sensitivity of the trained encoders to connection pruning, with respect to their efficiency, was verified. The results of both experiments are shown in the bottom plot of Fig. 4. Seven encoders of various sizes (between 110 and 3300 neurons) were trained, and once the training was completed the connection pruning procedure took place.

In the plot, four coloured curves illustrate particular pruning scenarios and their impact on classification accuracy for various encoder sizes: 50%, 80%, 90%, and 99% of the weakest weights were respectively removed in the four discussed cases. Overall, for smaller SNN encoders the accuracy rises rapidly along with the encoder size increase. For larger SNNs, changes in the accuracy are slower and all four curves stay within a narrow range above 77.5%.

In terms of the connection pruning degree, the biggest changes in accuracy are observed when 99% of connections have been deleted (the red curve). In particular, the results of the smaller encoders demonstrate that this level of pruning heavily affects classification accuracy. In larger networks additional neurons compensate for the features removed by the connection pruning mechanism and the results get closer to the other pruning setups.

Interestingly, for the smaller networks, the differences in accuracy between the 50%, 80%, and 90% pruning setups are negligible, which suggests that relatively high redundancy of connections still exists in the networks pruned in the range of 50% to 90%. Apparently, retaining as few as 10% of the weights does not impact the quality of representation and does not cause deterioration of results. This result correlates well with the outcomes of the weight analysis reported above and confirms that a meaningful subset of connections is sufficient for proper encoding of the input. The best overall classification result (80.19%) was achieved with the level of pruning set to 80% (the green curve). It proves that SET can effectively reduce the dimensionality of the text input from the initial ≈130 000 (the size of the 20 newsgroups training vocabulary) to the size of 110–3300, and maintain classification accuracy above 77.5%.

3.3 Comparison with other approaches

Since, to our knowledge, this paper presents the first attempt at using an SNN architecture for text classification, in order to make some comparisons we selected results reported for other neural networks trained with similar input (a document-term matrix) and yielding a low-dimensional text representation as the output.
The results are presented in Table 2. SET achieved 80.19% accuracy and outperformed the remaining shallow approaches. While this result looks promising, we believe that there is still room for improvement with further tuning of the method (in particular the division of samples into training subsets), as well as extension of the SNN encoder by adding more layers. Another interesting direction would be to learn semantic relevance between different words and documents [10,41].

Table 2. Accuracy [%] comparison for several low-dimensional text representation methods on the bydate version of the 20 newsgroups data set.

Method                                                         Accuracy
SET (this paper)                                               80.19
K-competitive Autoencoder for TExt (KATE) [7]                  76.14
Class Preserving Restricted Boltzmann Machine (CPr-RBM) [16]   75.39
Variational Autoencoder [7]                                    74.30
4 Conclusions

This work offers a novel approach to text representation relying on spiking neural networks. Using the proposed low-dimensional text representation, the LR classifier accomplished 80.19% accuracy on a standard benchmark set (the bydate version of 20 newsgroups), which is a leading result among shallow approaches relying on low-dimensional representations.

We have also examined the influence of the inhibition mechanism and synaptic connection sparsity on the quality of the representation, showing that (i) it is recommended that inhibition be disabled during the SNN evaluation phase, and (ii) pruning out as many as 90% of connections with the lowest weights did not affect the representation quality while heavily reducing the SNN computational complexity, i.e. the number of differential equations describing the network.

There are a few lines of potential improvement that we plan to explore in further work. Most notably, we aim to expand the SNN encoder towards a Deep SNN architecture by adding more layers of spiking neurons, which should possibly allow learning more detailed features of the input data.

References
1. Abbott, L.F., Nelson, S.B.: Synaptic plasticity: taming the beast. Nature Neuroscience, 1178–1183 (2000)
2. Aggarwal, C.C.: Data Mining: The Textbook. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-14142-8
3. Aggarwal, C.C.: Machine Learning for Text. Springer Publishing Company, Incorporated, 1st edn. (2018)
4. Asif, M., Ishtiaq, A., Ahmad, H., Aljuaid, H., Shah, J.: Sentiment analysis of extremism in social media from textual information. Telematics and Informatics, 101345 (2020). https://doi.org/10.1016/j.tele.2020.101345
5. Ayesha, S., Hanif, M.K., Talib, R.: Overview and comparative study of dimensionality reduction techniques for high dimensional data. Information Fusion, 44–58 (2020)
6. Bengio, Y., Courville, A., Vincent, P.: Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence (8), 1798–1828 (Aug 2013). https://doi.org/10.1109/tpami.2013.50
7. Chen, Y., Zaki, M.J.: KATE: k-competitive autoencoder for text. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, August 13-17, 2017. pp. 85–94. ACM (2017). https://doi.org/10.1145/3097983.3098017
8. Diehl, P., Cook, M.: Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 99 (2015). https://doi.org/10.3389/fncom.2015.00099
9. Dominguez-Morales, J.P., Liu, Q., James, R., Gutierrez-Galan, D., Jimenez-Fernandez, A., Davidson, S., Furber, S.: Deep spiking neural network model for time-variant signals classification: a real-time speech recognition approach. In: 2018 International Joint Conference on Neural Networks (IJCNN). IEEE (Jul 2018). https://doi.org/10.1109/ijcnn.2018.8489381
10. Gao, Y., Wang, W., Qian, L., Huang, H., Li, Y.: Extending embedding representation by incorporating latent relations. IEEE Access, 52682–52690 (2018). https://doi.org/10.1109/ACCESS.2018.2866531
11. Gerstner, W., Kistler, W.M.: Spiking Neuron Models. Cambridge University Press (Aug 2002). https://doi.org/10.1017/cbo9780511815706
12. Haddoud, M., Mokhtari, A., Lecroq, T., Abdeddaïm, S.: Combining supervised term-weighting metrics for SVM text classification with extended term representation. Knowledge and Information Systems (3), 909–931 (2016). https://doi.org/10.1007/s10115-016-0924-1
13. Hartmann, J., Huppertz, J., Schamp, C., Heitmann, M.: Comparing automated text classification methods. International Journal of Research in Marketing (1), 20–38 (2019). https://doi.org/10.1016/j.ijresmar.2018.09.009
14. Hebb, D.O.: The organization of behavior: A neuropsychological theory. New York (1949)
15. Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression, Second Edition. Wiley (2000). https://doi.org/10.1002/0471722146
16. Hu, J., Zhang, J., Ji, N., Zhang, C.: A new regularized restricted boltzmann machine based on class preserving. Knowledge-Based Systems, 1–12 (2017). https://doi.org/10.1016/j.knosys.2017.02.012
17. James, G., Witten, D., Hastie, T., Tibshirani, R.: An Introduction to Statistical Learning: with Applications in R. Springer (2013)
18. Kasabov, N., Capecci, E.: Spiking neural network methodology for modelling, classification and understanding of EEG spatio-temporal data measuring cognitive processes. Information Sciences, 565–575 (Feb 2015). https://doi.org/10.1016/j.ins.2014.06.028
19. Kaski, S., Kohonen, T.: Winner-take-all networks for physiological models of competitive learning. Neural Networks, 973–984 (12 1994). https://doi.org/10.1016/S0893-6080(05)80154-6
20. Keogh, E., Mueen, A.: Curse of Dimensionality, pp. 314–315. Springer US, Boston, MA (2017). https://doi.org/10.1007/978-1-4899-7687-1_192
21. Lang, K.: Newsweeder: Learning to filter netnews. In: Proceedings of the Twelfth International Conference on Machine Learning. pp. 331–339 (1995)
22. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA (2008)
23. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. pp. 3111–3119 (2013)
24. Miller, K.D., MacKay, D.J.C.: The role of constraints in hebbian learning. Neural Computation (1), 100–126 (1994). https://doi.org/10.1162/neco.1994.6.1.100
25. Mladenić, D., Brank, J., Grobelnik, M.: Document Classification, pp. 372–377. Springer US, Boston, MA (2017). https://doi.org/10.1007/978-1-4899-7687-1_75
26. Murphy, K.P.: Machine learning - a probabilistic perspective. Adaptive computation and machine learning series, MIT Press (2012)
27. Nawrocki, R.A., Voyles, R.M., Shaheen, S.E.: A mini review of neuromorphic architectures and implementations. IEEE Transactions on Electron Devices (10), 3819–3829 (Oct 2016). https://doi.org/10.1109/TED.2016.2598413
28. Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: Moschitti, A., Pang, B., Daelemans, W. (eds.) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. pp. 1532–1543. ACL (2014). https://doi.org/10.3115/v1/d14-1162
29. Querlioz, D., Bichler, O., Dollfus, P., Gamrat, C.: Immunity to device variations in a spiking neural network with memristive nanodevices. IEEE Transactions on Nanotechnology (3), 288–295 (May 2013). https://doi.org/10.1109/TNANO.2013.2250995
30. Raza, M., Hussain, F.K., Hussain, O.K., Zhao, M., ur Rehman, Z.: A comparative analysis of machine learning models for quality pillar assessment of saas services by multi-class text classification of users' reviews. Future Generation Computer Systems, 341–371 (2019)
31. Reid, D., Hussain, A.J., Tawfik, H.: Financial time series prediction using spiking neural networks. PLoS ONE (8), e103656 (Aug 2014). https://doi.org/10.1371/journal.pone.0103656
32. Silva, R.M., Almeida, T.A., Yamakami, A.: Mdltext: An efficient and lightweight text classifier. Knowledge-Based Systems, 152–164 (2017). https://doi.org/10.1016/j.knosys.2016.11.018
33. Song, S., Miller, K., Abbott, L.: Competitive hebbian learning through spike timing-dependent plasticity. Nature Neuroscience, 919–926 (10 2000). https://doi.org/10.1038/78829
34. Stimberg, M., Goodman, D., Benichoux, V., Brette, R.: Equation-oriented specification of neural models for simulations. Frontiers in Neuroinformatics, 6 (2014). https://doi.org/10.3389/fninf.2014.00006
35. Tavanaei, A., Ghodrati, M., Kheradpisheh, S.R., Masquelier, T., Maida, A.: Deep learning in spiking neural networks. Neural Networks, 47–63 (Mar 2019). https://doi.org/10.1016/j.neunet.2018.12.002
36. Vlachos, M.: Dimensionality Reduction, pp. 354–361. Springer US, Boston, MA (2017). https://doi.org/10.1007/978-1-4899-7687-1_71
37. Wang, Y., Zeng, Y., Tang, J., Xu, B.: Biological neuron coding inspired binary word embeddings. Cognitive Computation (5), 676–684 (Jul 2019). https://doi.org/10.1007/s12559-019-09643-1
38. Webb, G.I.: Overfitting, pp. 947–948. Springer US, Boston, MA (2017). https://doi.org/10.1007/978-1-4899-7687-1_960
39. Wu, J., Chua, Y., Zhang, M., Li, H., Tan, K.C.: A spiking neural network framework for robust sound classification. Frontiers in Neuroscience (Nov 2018). https://doi.org/10.3389/fnins.2018.00836
40. Wysoski, S.G., Benuskova, L., Kasabov, N.: Evolving spiking neural networks for audiovisual information processing. Neural Networks 23