David Mrva
University of Cambridge
Publications
Featured research published by David Mrva.
International Conference on Acoustics, Speech, and Signal Processing | 2005
Gunnar Evermann; Ho Yin Chan; Mark J. F. Gales; Bin Jia; David Mrva; Philip C. Woodland; Kai Yu
Typical systems for large vocabulary conversational speech recognition (LVCSR) have been trained on a few hundred hours of carefully transcribed acoustic training data. The paper describes an LVCSR system for the conversational telephone speech (CTS) task trained on more than 2000 hours of data for which only approximate transcriptions were available. The challenges of dealing with such a large data set and the accuracy improvements over the small baseline system are discussed. The effect on both acoustic and language modelling performance is studied. Overall, increasing the training data size from 360 h to 2200 h and optimising the training procedure reduced the word error rate on the DARPA/NIST 2003 evaluation set by about 20% relative.
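As a side note on the figure quoted above, a "relative" word error rate reduction is measured against the baseline error rate rather than as an absolute difference. A minimal sketch of that computation, using purely illustrative numbers (not values reported in the paper):

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative word-error-rate reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer

# Illustrative numbers only: a drop from 30.0% to 24.0% absolute WER
# corresponds to a 20% relative reduction.
print(relative_wer_reduction(30.0, 24.0))  # 0.2
```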
International Conference on Acoustics, Speech, and Signal Processing | 2015
Will Williams; Niranjani Prasad; David Mrva; Tom Ash; Tony Robinson
This paper investigates the scaling properties of Recurrent Neural Network Language Models (RNNLMs). We discuss how to train very large RNNs on GPUs and address the questions of how RNNLMs scale with respect to model size, training-set size, computational cost and memory. Our analysis shows that, despite being more costly to train, RNNLMs obtain much lower perplexities on standard benchmarks than n-gram models. We train the largest known RNNs and present relative word error rate gains of 18% on an ASR task. We also present the lowest perplexities to date on the recently released billion-word language modelling benchmark, a 1 BLEU point gain on machine translation and a 17% relative hit rate gain in word prediction.
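For readers unfamiliar with the perplexity metric used above to compare RNNLMs against n-gram models, here is a minimal sketch of how perplexity is computed from the probabilities a model assigns to each word of a test sequence (illustrative only, not the paper's evaluation code):

```python
import math

def perplexity(word_probs):
    """Perplexity over a test sequence: the exponential of the average
    negative log-likelihood of the words under the model."""
    nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(nll)

# Toy example: a model that assigns higher probabilities to the test words
# achieves lower perplexity (i.e. it is less "surprised" by the data).
print(perplexity([0.2, 0.1, 0.25, 0.05]))  # weaker model, higher perplexity
print(perplexity([0.4, 0.3, 0.50, 0.20]))  # stronger model, lower perplexity
```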
IEEE Automatic Speech Recognition and Understanding Workshop | 2003
Do Yeong Kim; Gunnar Evermann; Thomas Hain; David Mrva; S. E. Tranter; Lan Wang; Philip C. Woodland
The paper describes recent advances in the CU-HTK Broadcast News English (BN-E) transcription system and its performance in the DARPA/NIST Rich Transcription 2003 Speech-to-Text (RT-03) evaluation. Heteroscedastic linear discriminant analysis (HLDA) and discriminative training, which were previously developed in the context of the recognition of conversational telephone speech, have been successfully applied to the BN-E task for the first time. A number of new features have also been added. These include gender-dependent (GD) discriminative training and modified discriminative training using lattice regeneration and combination. On the 2003 evaluation set, the system gave an overall word error rate of 10.7% in less than 10 times real time (10×RT).
International Conference on Acoustics, Speech, and Signal Processing | 2005
Do Yeong Kim; Ho Yin Chan; Gunnar Evermann; Mark J. F. Gales; David Mrva; Khe Chai Sim; Philip C. Woodland
The paper describes our recent work on improving broadcast news transcription and presents details of the CU-HTK Broadcast News English (BN-E) transcription system for the DARPA/NIST Rich Transcription 2004 Speech-to-Text (RT04) evaluation. A key focus has been building a system using an order of magnitude more acoustic training data than we have previously attempted. We have also investigated a range of techniques to improve both minimum phone error (MPE) training and the efficient creation of MPE-based narrow-band models. The paper describes two alternative system structures that run in under 10×RT and a further system that runs in less than 1×RT. This final system gives lower word error rates than our 2003 system that ran in 10×RT.
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Mark J. F. Gales; Do Yeong Kim; Philip C. Woodland; Ho Yin Chan; David Mrva; Rohit Sinha; S. E. Tranter
Conference of the International Speech Communication Association | 2006
David Mrva; Philip C. Woodland
Conference of the International Speech Communication Association | 2004
David Mrva; Philip C. Woodland
Archive | 2004
Philip C. Woodland; Ricky Ho Yin Chan; Gunnar Evermann; Mark J. F. Gales; Do Yeong Kim; Xiao Liu; David Mrva; Khe Chai Sim; Liqiang Wang; Kin Man Yu; John Makhoul; Richard M. Schwartz; Luong T. Nguyen; S. Masoukas; Bing Xiang; Mohamed Afify; Sherif M. Abdou; Jean-Luc Gauvain; Lori Lamel; Holger Schwenk; Gilles Adda; Fabrice Lefèvre; Dimitra Vergyri; Wei Wang; Jingfang Zheng; Anand Venkataraman; Ramana Rao Gadde; Andreas Stolcke
Archive | 2004
Gunnar Evermann; Ho Yin Chan; Mark J. F. Gales; Bin Jia; Xunying Liu; David Mrva; Khe Chai Sim; Lan Wang; Philip C. Woodland
Archive | 2004
Do Yeong Kim; Ho Yin Chan; Gunnar Evermann; Mark J. F. Gales; David Mrva; Khe Chai Sim; Philip C. Woodland