2021 IEEE Spoken Language Technology Workshop (SLT) | 2021

Domain Generalization with Triplet Network for Cross-Corpus Speech Emotion Recognition

Abstract

Domain generalization is a major challenge for cross-corpus speech emotion recognition. The performance of systems built on seen source corpora inevitably degrades when they are tested against unseen target corpora that have different speakers, channels, and languages. We present a novel framework based on a triplet network to learn more generalized features of emotional speech that are invariant across multiple corpora. To reduce the intrinsic discrepancies between source and target corpora, an explicit feature transformation based on the triplet network is implemented as a preprocessing step. Extensive comparison experiments are carried out on three emotional speech corpora: two English corpora and one Japanese corpus. Remarkable improvements of up to 35.61% are achieved across all cross-corpus speech emotion recognition tasks, and we show that the proposed framework using the triplet network is effective for obtaining more generalized features across multiple emotional speech corpora.

Pages 389-396

DOI 10.1109/SLT48900.2021.9383534

Language English

Journal 2021 IEEE Spoken Language Technology Workshop (SLT)
