IEEE Transactions on Emerging Topics in Computational Intelligence | 2021

STDP Based Unsupervised Multimodal Learning With Cross-Modal Processing in Spiking Neural Networks

Abstract


Spiking neural networks perform reasonably well in recognition applications for a single modality (e.g., images, audio, or text). In this paper, we propose a multimodal spiking neural network that combines two modalities (image and audio). The two unimodal ensembles are connected with cross-modal connections, and the entire network is trained with unsupervised learning. The network receives inputs in both modalities for the same class and predicts the class label. The excitatory connections in the unimodal ensembles and the cross-modal connections are trained with a power-law weight-dependent spike timing dependent plasticity (STDP) learning rule. The cross-modal connections capture the correlation between neurons of different modalities. The multimodal network learns features of both modalities and improves the classification accuracy compared to the unimodal topology, even when one of the modalities is distorted by noise. The cross-modal connections suppress the effect of noise on classification accuracy. The well-learned cross-modal connections invoke additional spiking activity in neurons of the correct label. The cross-modal connections are only excitatory and do not inhibit the normal activity of the unimodal ensembles. We evaluated our multimodal network on images from the MNIST dataset and utterances of digits from the TI46 speech corpus. The multimodal network achieved a classification accuracy of 98% on the combined MNIST and TI46 dataset.
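As a rough illustration of the power-law weight-dependent STDP rule mentioned above, the sketch below applies a weight update on a postsynaptic spike, scaled by the remaining headroom `(w_max - w)**mu` so that weights saturate smoothly. The function name, parameter values (`eta`, `mu`, `w_max`, `x_tar`), and the use of a presynaptic trace are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def power_law_stdp(w, pre_trace, eta=0.01, mu=0.9, w_max=1.0, x_tar=0.4):
    """Sketch of a power-law weight-dependent STDP update, applied when
    the postsynaptic neuron fires.

    w         : current synaptic weights, shape (n_pre,)
    pre_trace : presynaptic spike traces at the postsynaptic spike time;
                high trace (recent pre-spike) -> potentiation,
                low trace -> depression
    eta, mu, w_max, x_tar : illustrative hyperparameters (assumed values)
    """
    # Weight change shrinks as w approaches w_max (power-law dependence).
    dw = eta * (pre_trace - x_tar) * (w_max - w) ** mu
    return np.clip(w + dw, 0.0, w_max)

# Example: a recently active presynaptic input (trace 0.9) is potentiated,
# while an inactive one (trace 0.0) is depressed.
w = np.array([0.5, 0.5])
w_new = power_law_stdp(w, pre_trace=np.array([0.9, 0.0]))
```

The same update form can be used for both the within-ensemble excitatory synapses and the cross-modal connections, since both are trained with the same rule according to the abstract.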

Volume 5
Pages 143-153
DOI 10.1109/TETCI.2018.2872014
Language English
Journal IEEE Transactions on Emerging Topics in Computational Intelligence
