Efficient Large Scale Semi-Supervised Learning for CTC Based Acoustic Models


Abstract


Semi-supervised learning (SSL) is an active area of research that aims to utilize unlabeled data to improve the accuracy of speech recognition systems. While previous studies have established the efficacy of various SSL methods on varying amounts of data, this paper presents the largest ASR SSL experiment conducted to date, in which 75K hours of labeled and 1.2 million hours of unlabeled data are used for model training. In addition, the paper introduces two novel techniques to facilitate such a large-scale experiment: 1) a simple, scalable Teacher-Student based SSL method for the connectionist temporal classification (CTC) objective and 2) effective data selection mechanisms for leveraging massive amounts of unlabeled data to boost the performance of student models. Further, we apply SSL in all stages of acoustic model training, including the final stage of sequence discriminative training. Our experiments indicate encouraging word error rate (WER) gains of up to 14% from SSL training in such a large transcribed-data regime.
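The abstract does not spell out the recipe, but a minimal sketch of a Teacher-Student SSL loop for a CTC model, combined with confidence-based data selection, could look as follows in PyTorch. The teacher/student models, the blank index, the confidence threshold, and the greedy-decoding helper are illustrative assumptions, not the authors' implementation; the sketch also assumes the acoustic model preserves the input time resolution.

import torch
import torch.nn.functional as F

BLANK = 0  # CTC blank index (assumption)

@torch.no_grad()
def pseudo_label(teacher, feats, feat_lens, conf_threshold=-0.05):
    """Greedy-decode teacher outputs into pseudo-labels and keep only
    utterances whose average per-frame log-probability clears a threshold,
    a simple stand-in for the paper's data selection mechanisms."""
    log_probs = teacher(feats).log_softmax(dim=-1)   # (B, T, V), assumed shape
    best_lp, best_ids = log_probs.max(dim=-1)        # (B, T)
    keep, labels, label_lens = [], [], []
    for b in range(feats.size(0)):
        T = int(feat_lens[b])
        if best_lp[b, :T].mean().item() < conf_threshold:
            continue  # drop low-confidence utterance
        # Standard CTC greedy decode: collapse repeats, then remove blanks.
        ids = best_ids[b, :T]
        mask = torch.cat([torch.tensor([True], device=ids.device),
                          ids[1:] != ids[:-1]])
        tokens = ids[mask]
        tokens = tokens[tokens != BLANK]
        if tokens.numel() == 0:
            continue
        keep.append(b)
        labels.append(tokens)
        label_lens.append(tokens.numel())
    return keep, labels, label_lens

def student_step(student, optimizer, feats, feat_lens, labels, label_lens):
    """One CTC training step for the student on an already-filtered
    pseudo-labeled batch (feats/feat_lens indexed by the kept utterances)."""
    log_probs = student(feats).log_softmax(dim=-1)   # (B, T, V)
    log_probs = log_probs.transpose(0, 1)            # ctc_loss expects (T, B, V)
    targets = torch.cat(labels)                      # flattened 1-D targets
    loss = F.ctc_loss(log_probs, targets,
                      input_lengths=feat_lens,
                      target_lengths=torch.tensor(label_lens),
                      blank=BLANK, zero_infinity=True)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In the setting the abstract describes, the teacher would first be trained on the 75K hours of labeled data, the loop above would filter and pseudo-label the 1.2 million hours of unlabeled audio, and the student would then be trained on the resulting mix; thresholds and decoding details here are placeholders.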

Pages 148-155
DOI 10.1109/SLT48900.2021.9383536
Language English
Journal 2021 IEEE Spoken Language Technology Workshop (SLT)