IEEE Transactions on Multimedia | 2021

Speech Personality Recognition Based on Annotation Classification Using Log-Likelihood Distance and Extraction of Essential Audio Features

 
 
 
 
 

Abstract


Speech personality recognition relies on training models that require an excessive number of features and are, in most cases, designed specifically for certain databases. As a result, when tested on different datasets, overfitted classifier models are not always reliable because their accuracy changes with changes in the domain of speakers. Moreover, personality annotations are often subjective, which creates variability in raters perception during labeling. These problems inhibit the effectiveness of speech personality recognition applications. To reduce the unexplained variance caused by unknown differences in raters perception, a structure that uses Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm is proposed. Furthermore, a feature extraction method is proposed to filter out undesirable adulterations be it noise, silence, or uncertain pitch segments, while extracting essential audio features, i.e., signal power roll-off, pitch, and pause rate. Experiments on the standard SSPNet dataset records a relative 4% increase in overall accuracy when log-likelihood based annotations are used. Moreover, improved consistency in accuracy is observed when this method is tested on male and female subsets.

Volume 23
Pages 3414-3426
DOI 10.1109/tmm.2020.3025108
Language English
Journal IEEE Transactions on Multimedia

Full Text