Archive | 2019

Improving Emotion Identification Using Phone Posteriors in Raw Speech Waveform Based DNN


Abstract


We propose to exploit phone posteriors as an additional feature in a deep neural network (DNN) that recognizes emotions from the raw speech waveform. The proposed DNN setup takes a time-domain approach, learning its filters within the network. Frame-level phone posteriors are appended to the feature representation learned by the network, and the combined features are fed to the temporal context modeling layers, which interleave TDNN-LSTM with time-restricted self-attention. Using phone posteriors, we achieve a 16.48% relative error rate improvement on the IEMOCAP categorical problem (with a final weighted accuracy of 75.03%) compared to a DNN setup that uses only the learned time-domain features for temporal context modeling. Further, we study the effect of learning emotion categories by leveraging dimensional primitives in a multi-task learning DNN model.
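
To make the reported "relative error rate improvement" concrete, the following is a minimal sketch of the arithmetic. It assumes that the error rate is simply one minus the weighted accuracy; the abstract does not state this explicitly. The function names are hypothetical, and the baseline accuracy it derives is an inference from the two figures in the abstract, not a number reported there.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Relative reduction in error rate when moving from baseline to improved.

    Assumes error rate = 1 - accuracy (an assumption, not stated in the abstract).
    """
    baseline_err = 1.0 - baseline_acc
    improved_err = 1.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


def implied_baseline_accuracy(improved_acc, rel_reduction):
    """Invert the formula above to recover the baseline accuracy."""
    improved_err = 1.0 - improved_acc
    return 1.0 - improved_err / (1.0 - rel_reduction)


# Figures taken from the abstract: 75.03% final weighted accuracy,
# 16.48% relative error rate improvement.
baseline = implied_baseline_accuracy(0.7503, 0.1648)
print(f"implied baseline weighted accuracy: {baseline:.4f}")
print(f"round-trip relative reduction: {relative_error_reduction(baseline, 0.7503):.4f}")
```

Under that assumption, the two reported figures imply a baseline weighted accuracy of roughly 70.1% for the setup that uses only the learned time-domain features.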

Pages 3925-3929
DOI 10.21437/interspeech.2019-2093
Language English
