Journal of Circuits, Systems and Computers | 2021

Speech Emotion Recognition on Small Sample Learning by Hybrid WGAN-LSTM Networks

 
 
 

Abstract


The speech emotion recognition based on the deep networks on small samples is often a very challenging problem in natural language processing. The massive parameters of a deep network are much difficult to be trained reliably on small-quantity speech samples. Aiming at this problem, we propose a new method through the systematical cooperation of Generative Adversarial Network (GAN) and Long Short Term Memory (LSTM). In this method, it utilizes the adversarial training of GAN’s generator and discriminator on speech spectrogram images to implement sufficient sample augmentation. A six-layer convolution neural network (CNN), followed in series by a two-layer LSTM, is designed to extract features from speech spectrograms. For accelerating the training of networks, the parameters of discriminator are transferred to our feature extractor. By the sample augmentation, a well-trained feature extraction network and an efficient classifier could be achieved. The tests and comparisons on two publicly available datasets, i.e., EMO-DB and IEMOCAP, show that our new method is effective, and it is often superior to some state-of-the-art methods.

Volume None
Pages None
DOI 10.1142/s0218126622500736
Language English
Journal Journal of Circuits, Systems and Computers

Full Text