Publication


Featured research published by Tom Sercu.


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

Advances in Very Deep Convolutional Neural Networks for LVCSR.

Tom Sercu; Vaibhava Goel

Very deep CNNs with small 3x3 kernels have recently been shown to achieve very strong performance as acoustic models in hybrid NN-HMM speech recognition systems. In this paper we investigate how to efficiently scale these models to larger datasets. Specifically, we address the design choice of pooling and padding along the time dimension, which renders convolutional evaluation of sequences highly inefficient. We propose a new CNN design without time-padding and without time-pooling, which is slightly suboptimal for accuracy but has two significant advantages: it enables sequence training and deployment by allowing efficient convolutional evaluation of full utterances, and it allows batch normalization to be straightforwardly applied to CNNs on sequence data. Through batch normalization, we recover the performance lost by removing time-pooling, while keeping the benefit of efficient convolutional evaluation. We demonstrate the performance of our models both on larger-scale data than before and after sequence training. Our very deep CNN model, sequence trained on the 2000-hour Switchboard dataset, obtains a 9.4% word error rate on the Hub5 test set, matching with a single model the performance of the 2015 IBM system combination, which was the previous best published result.
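
As a concrete illustration of the design choice the abstract describes, below is a minimal, hypothetical PyTorch sketch; the block name, layer counts, and widths are assumptions for illustration, not the paper's architecture. Padding and pooling are applied along frequency only, so the time axis is never padded or strided, which is what makes whole-utterance convolutional evaluation and per-layer batch normalization straightforward.

    import torch
    import torch.nn as nn

    class FreqOnlyConvBlock(nn.Module):
        """Hypothetical block in the spirit of the paper's design:
        3x3 convolutions padded along frequency only, pooling along
        frequency only, so the time axis is never padded or pooled."""
        def __init__(self, in_ch, out_ch):
            super().__init__()
            self.net = nn.Sequential(
                # padding=(0, 1): zero time-padding, 1-frame frequency padding
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=(0, 1)),
                # without time-pooling, batch norm applies per feature map as usual
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=(0, 1)),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                # pool along frequency only; kernel and stride are 1 along time
                nn.MaxPool2d(kernel_size=(1, 2)),
            )

        def forward(self, x):  # x: (batch, channels, time, frequency)
            return self.net(x)

    # A full utterance of 1000 frames with 40 log-mel features is evaluated
    # in one convolutional pass: each unpadded 3x3 conv trims one frame from
    # each end of the time axis, but nothing ever strides over time.
    utterance = torch.randn(1, 1, 1000, 40)
    out = FreqOnlyConvBlock(1, 64)(utterance)
    print(out.shape)  # torch.Size([1, 64, 996, 20])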


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

Knowledge distillation across ensembles of multilingual models for low-resource languages

Jia Cui; Brian Kingsbury; Bhuvana Ramabhadran; George Saon; Tom Sercu; Kartik Audhkhasi; Abhinav Sethy; Markus Nussbaum-Thom; Andrew Rosenberg

This paper investigates the effectiveness of knowledge distillation in the context of multilingual models. We show that with knowledge distillation, Long Short-Term Memory (LSTM) models can be used to train standard feed-forward Deep Neural Network (DNN) models for a variety of low-resource languages. We then examine how the agreement between the teacher's best labels and the original labels affects the student model's performance. Next, we show that knowledge distillation can be easily applied to semi-supervised learning to improve model performance. We also propose a promising data selection method to filter untranscribed data. Then we focus on knowledge transfer among DNN models with multilingual features derived from CNN+DNN, LSTM, VGG, CTC, and attention models. We show that a student model equipped with better input features not only learns better from the teacher's labels but also outperforms the teacher. Further experiments suggest that by learning from each other, the original ensemble of various models is able to evolve into a new ensemble with even better combined performance.
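
As a rough illustration of the distillation setup the abstract builds on, here is a minimal sketch of the standard knowledge-distillation objective (Hinton et al.): the student is trained against a mix of the hard labels and the teacher's temperature-softened posteriors. The temperature, mixing weight, and output size are illustrative defaults, not values from the paper.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, hard_labels,
                          temperature=2.0, alpha=0.5):
        """Weighted mix of cross-entropy on hard labels and KL divergence
        against the teacher's temperature-softened output distribution."""
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # the T^2 factor keeps KL gradients on the same scale as the CE term
        kd_term = F.kl_div(log_student, soft_targets,
                           reduction="batchmean") * temperature ** 2
        ce_term = F.cross_entropy(student_logits, hard_labels)
        return alpha * kd_term + (1 - alpha) * ce_term

    # e.g. a batch of 8 frames over a hypothetical 1000-state output layer
    student = torch.randn(8, 1000, requires_grad=True)
    teacher = torch.randn(8, 1000)
    labels = torch.randint(0, 1000, (8,))
    distillation_loss(student, teacher, labels).backward()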


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2017

Network architectures for multilingual speech representation learning

Tom Sercu; George Saon; Jia Cui; Xiaodong Cui; Bhuvana Ramabhadran; Brian Kingsbury; Abhinav Sethy

Multilingual (ML) representations play a key role in building speech recognition systems for low-resource languages. The IARPA-sponsored BABEL program focuses on building automatic speech recognition (ASR) and keyword search (KWS) systems in over 24 languages with limited training data. The most common mechanism for deriving ML representations in the BABEL program has been a two-stage network: the first stage is a convolutional network (CNN) from which multilingual features are extracted, expanded contextually, and used as input to the second stage, which can be a feed-forward DNN or a CNN. The final multilingual representations are derived from the second network. This paper presents two novel methods for deriving ML representations: the first is based on Long Short-Term Memory (LSTM) networks, and the second on a very deep CNN (VGG-net). We demonstrate that ML features extracted from both models show significant improvement over the baseline CNN-DNN ML representations in terms of both speech recognition and keyword search performance, and we draw a comparison between the LSTM model itself and the ML representations derived from it on Georgian, the surprise language for the OpenKWS evaluation.
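
To make the two-stage pipeline concrete, here is a minimal sketch of the contextual-expansion step between the stages. All dimensions (an 80-dim bottleneck, a +/-5 frame splice) and the toy stand-in networks are assumptions for illustration, not the paper's configuration; the paper's first stage is a CNN (or the proposed LSTM / VGG-net), not an MLP.

    import torch
    import torch.nn as nn

    def splice(feats, context=5):
        """Contextual expansion: concatenate each frame with its +/-context
        neighbors; edges are padded by repeating the first/last frame.
        feats: (time, dim) -> (time, dim * (2*context + 1))"""
        padded = torch.cat([feats[:1].repeat(context, 1),
                            feats,
                            feats[-1:].repeat(context, 1)], dim=0)
        windows = [padded[i:i + feats.size(0)] for i in range(2 * context + 1)]
        return torch.cat(windows, dim=1)

    # Toy stand-ins for the two stages (illustrative sizes only).
    first_stage = nn.Sequential(nn.Linear(40, 512), nn.ReLU(),
                                nn.Linear(512, 80))       # 80-dim ML bottleneck
    second_stage = nn.Sequential(nn.Linear(80 * 11, 1024), nn.ReLU(),
                                 nn.Linear(1024, 1024))   # final ML representation

    acoustics = torch.randn(300, 40)            # 300 frames, 40-dim features
    bottleneck = first_stage(acoustics)         # stage one: multilingual features
    ml_repr = second_stage(splice(bottleneck))  # stage two: after contextual expansion
    print(ml_repr.shape)                        # torch.Size([300, 1024])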


Conference of the International Speech Communication Association (INTERSPEECH) | 2016

The IBM 2016 English Conversational Telephone Speech Recognition System

George Saon; Tom Sercu; Steven J. Rennie; Hong-Kwang Jeff Kuo


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Very deep multilingual convolutional neural networks for LVCSR

Tom Sercu; Christian Puhrsch; Brian Kingsbury; Yann LeCun


Conference of the International Speech Communication Association (INTERSPEECH) | 2017

English Conversational Telephone Speech Recognition by Humans and Machines.

George Saon; Gakuto Kurata; Tom Sercu; Kartik Audhkhasi; Samuel Thomas; Dimitrios Dimitriadis; Xiaodong Cui; Bhuvana Ramabhadran; Michael Picheny; Lynn-Li Lim; Bergul Roomi; Phil Hall


International Conference on Machine Learning (ICML) | 2017

McGan: Mean and Covariance Feature Matching GAN.

Youssef Mroueh; Tom Sercu; Vaibhava Goel


arXiv: Computation and Language | 2016

Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition.

Tom Sercu; Vaibhava Goel


Neural Information Processing Systems (NIPS) | 2017

Fisher GAN

Youssef Mroueh; Tom Sercu


arXiv: Learning | 2018

Improved Image Captioning with Adversarial Semantic Alignment.

Pierre L. Dognin; Igor Melnyk; Youssef Mroueh; Jarret Ross; Tom Sercu

Collaboration


Dive into Tom Sercu's collaborations.

Top Co-Authors


Youssef Mroueh

Massachusetts Institute of Technology

Christian Puhrsch

Courant Institute of Mathematical Sciences
