Journal of Circuits, Systems and Computers | 2021

Speech Emotion Recognition on Small Sample Learning by Hybrid WGAN-LSTM Networks

Abstract

Speech emotion recognition with deep networks is a challenging problem in natural language processing when only small samples are available, because the massive number of parameters in a deep network is difficult to train reliably on a small quantity of speech samples. To address this problem, we propose a new method that systematically combines a Generative Adversarial Network (GAN) with Long Short-Term Memory (LSTM). The method uses adversarial training of the GAN’s generator and discriminator on speech spectrogram images to provide sufficient sample augmentation. A six-layer convolutional neural network (CNN), followed in series by a two-layer LSTM, is designed to extract features from the speech spectrograms. To accelerate network training, the parameters of the discriminator are transferred to the feature extractor. With the augmented samples, a well-trained feature extraction network and an efficient classifier can be obtained. Tests and comparisons on two publicly available datasets, EMO-DB and IEMOCAP, show that the new method is effective and is often superior to some state-of-the-art methods.
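
The two mechanisms named in the abstract, Wasserstein-style adversarial training and the transfer of discriminator parameters to the feature extractor, can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The loss form and the weight clipping follow the standard WGAN formulation, which is assumed here because the abstract does not give the objective. All function names, the clipping bound `c`, and the dictionary layout of the parameters are hypothetical.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Standard WGAN critic loss (assumed): mean score on generated
    samples minus mean score on real samples. The critic minimizes it."""
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """Standard WGAN generator loss (assumed): the generator tries to
    raise the critic's score on generated spectrograms."""
    return -fmean(fake_scores)


def clip_weights(weights, c=0.01):
    """Weight clipping from the original WGAN recipe; the bound c is an
    illustrative default, not a value taken from the paper."""
    return [max(-c, min(c, w)) for w in weights]


def transfer_parameters(critic_params, extractor_params):
    """Hypothetical transfer step: copy a critic layer into the feature
    extractor when a layer of the same name and size exists there.
    Layers that exist only in the extractor keep their own values."""
    merged = dict(extractor_params)
    for name, values in critic_params.items():
        if name in merged and len(merged[name]) == len(values):
            merged[name] = list(values)
    return merged


# Toy usage with made-up numbers.
critic = {"conv1": [0.1, 0.2], "out": [0.3]}
extractor = {"conv1": [0.0, 0.0], "lstm1": [0.0]}
initialized = transfer_parameters(critic, extractor)
```

In this sketch only the shared convolutional layer is initialized from the critic, while the LSTM layer, which has no counterpart in the critic, keeps its own initial values. Which layers are actually shared in the paper is not stated in the abstract.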

DOI 10.1142/s0218126622500736

Language English

Journal Journal of Circuits, Systems and Computers
