2021 IEEE Spoken Language Technology Workshop (SLT) | 2021

Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition

 

Abstract


Domain generalization is a major challenge for cross-corpus speech emotion recognition. The performance of systems trained on seen source corpora inevitably degrades when they are tested on unseen target corpora that differ in speakers, channels, and languages. We present a novel framework based on a triplet network that learns more generalized features of emotional speech that are invariant across multiple corpora. To reduce the intrinsic discrepancies between source and target corpora, an explicit feature transformation based on the triplet network is applied as a preprocessing step. Extensive comparison experiments are carried out on three emotional speech corpora: two English corpora and one Japanese corpus. Improvements of up to 35.61% are achieved in all cross-corpus speech emotion recognition experiments, showing that the proposed framework using the triplet network is effective for obtaining more generalized features across multiple emotional speech corpora.
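
The abstract does not state the loss, distance measure, margin, or triplet sampling strategy used by the authors. As a rough illustration only, the sketch below shows the commonly used hinge-style triplet loss, max(0, d(a, p) - d(a, n) + margin), with a Euclidean distance. The function names, the margin of 1.0, and the toy 2-D embeddings are all assumptions made for this example and are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin` (an assumed hyperparameter).
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings. In a cross-corpus setting one might pick an
# anchor and a positive with the same emotion label from different corpora,
# and a negative with a different emotion label (an assumption, not the
# paper's stated sampling scheme).
anchor = [0.0, 0.0]
positive = [3.0, 4.0]   # distance 5.0 from the anchor
negative = [6.0, 8.0]   # distance 10.0 from the anchor

# Negative is already far enough away: 5 - 10 + 1 < 0, so the loss is 0.
loss_satisfied = triplet_loss(anchor, positive, negative, margin=1.0)

# Roles swapped: 10 - 5 + 1 = 6, so the constraint is violated.
loss_violated = triplet_loss(anchor, negative, positive, margin=1.0)
```

Minimizing a loss of this form pulls same-emotion embeddings together and pushes different-emotion embeddings apart, which is one plausible way a learned transformation could reduce corpus-specific variation.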

Pages 389-396
DOI 10.1109/SLT48900.2021.9383534
Language English
Journal 2021 IEEE Spoken Language Technology Workshop (SLT)
