
Publication


Featured research published by Tim Ng.


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Recent innovations in speech-to-text transcription at SRI-ICSI-UW

Andreas Stolcke; Barry Y. Chen; H. Franco; Venkata Ramana Rao Gadde; Martin Graciarena; Mei-Yuh Hwang; Katrin Kirchhoff; Arindam Mandal; Nelson Morgan; Xin Lei; Tim Ng; Mari Ostendorf; M. Kemal Sönmez; Anand Venkataraman; Dimitra Vergyri; Wen Wang; Jing Zheng; Qifeng Zhu

We summarize recent progress in automatic speech-to-text transcription at SRI, ICSI, and the University of Washington. The work encompasses all components of speech modeling found in a state-of-the-art recognition system, from acoustic features, to acoustic modeling and adaptation, to language modeling. In the front end, we experimented with nonstandard features, including various measures of voicing, discriminative phone posterior features estimated by multilayer perceptrons, and a novel phone-level macro-averaging for cepstral normalization. Acoustic modeling was improved with combinations of front ends operating at multiple frame rates, as well as by modifications to the standard methods for discriminative Gaussian estimation. We show that acoustic adaptation can be improved by predicting the optimal regression class complexity for a given speaker. Language modeling innovations include the use of a syntax-motivated almost-parsing language model, as well as principled vocabulary-selection techniques. Finally, we address portability issues, such as the use of imperfect training transcripts, and language-specific adjustments required for recognition of Arabic and Mandarin.
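As a rough illustration of the cepstral normalization idea mentioned above, here is a minimal NumPy sketch of plain per-utterance cepstral mean normalization; the phone-level macro-averaging described in the paper is a refinement of this basic scheme, not what is shown here:

```python
import numpy as np

def cepstral_mean_normalize(cepstra):
    """Subtract the per-utterance mean from each cepstral dimension.

    cepstra: (num_frames, num_coeffs) array of e.g. MFCC features.
    Removing the per-dimension mean cancels stationary channel
    effects that add a constant offset in the cepstral domain.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# Toy example: 100 frames of 13-dimensional cepstra with a channel bias.
rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 13)) + 5.0  # constant channel offset of +5
normed = cepstral_mean_normalize(feats)
print(np.allclose(normed.mean(axis=0), 0.0))  # True: mean removed
```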


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Score normalization and system combination for improved keyword spotting

Damianos Karakos; Richard M. Schwartz; Stavros Tsakalidis; Le Zhang; Shivesh Ranjan; Tim Ng; Roger Hsiao; Guruprasad Saikumar; Ivan Bulyko; Long Nguyen; John Makhoul; Frantisek Grezl; Mirko Hannemann; Martin Karafiát; Igor Szöke; Karel Vesely; Lori Lamel; Viet-Bac Le

We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and more closely correspond to the probability of being correct than raw posteriors; and (ii) system combination, where the detections of multiple systems are merged together and their scores are interpolated with weights optimized using MTWV as the maximization criterion. Both score normalization and system combination yield significant gains in ATWV/MTWV, sometimes on the order of 8-10 points (absolute), across five different languages. A variant of these methods achieved the highest performance in the official surprise language evaluation of the IARPA-funded Babel project in April 2013.
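The two techniques can be sketched in a few lines. This is a simplified illustration, not the paper's exact method: the normalization shown is keyword-specific sum-to-one scaling, and the combination weights are simply given rather than optimized against MTWV as in the paper:

```python
def normalize_scores(detections):
    """Keyword-specific sum-to-one score normalization.

    detections: dict mapping keyword -> list of raw posterior scores.
    Dividing by the per-keyword total makes scores of different
    keywords commensurate with each other.
    """
    out = {}
    for kw, scores in detections.items():
        total = sum(scores)
        out[kw] = [s / total for s in scores] if total > 0 else scores
    return out

def combine_systems(score_lists, weights):
    """Interpolate matched detection scores from multiple systems.

    score_lists: one list of scores per system, aligned by detection.
    weights: one interpolation weight per system.
    """
    return [sum(w * s for w, s in zip(weights, scores))
            for scores in zip(*score_lists)]

norm = normalize_scores({"hello": [0.5, 0.3, 0.2]})
combined = combine_systems([[0.9, 0.1], [0.7, 0.3]], [0.6, 0.4])
# combined is approximately [0.82, 0.18]
```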


ACM Transactions on Speech and Language Processing | 2007

Web resources for language modeling in conversational speech recognition

Ivan Bulyko; Mari Ostendorf; Man-Hung Siu; Tim Ng; Andreas Stolcke; Özgür Çetin

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
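The mixture-model idea above, combining in-domain and Web-derived sources, can be sketched with linearly interpolated per-token probabilities and a perplexity check. The probability values and the interpolation weight below are made-up toy numbers, not results from the article:

```python
import math

def interpolate(p_in_domain, p_web, lam):
    """Linear mixture of two LM probabilities: lam*p1 + (1-lam)*p2."""
    return lam * p_in_domain + (1.0 - lam) * p_web

def perplexity(probs):
    """Perplexity of a token sequence given per-token probabilities."""
    n = len(probs)
    return math.exp(-sum(math.log(p) for p in probs) / n)

# Toy per-token probabilities from two models on a 3-token test string.
in_domain = [0.2, 0.1, 0.05]
web = [0.1, 0.3, 0.2]
mixed = [interpolate(p1, p2, 0.6) for p1, p2 in zip(in_domain, web)]
print(perplexity(mixed) < perplexity(in_domain))  # True on this toy data
```

In practice the interpolation weight is tuned on held-out in-domain data rather than fixed by hand.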


International Conference on Acoustics, Speech, and Signal Processing | 2005

Web-data augmented language models for Mandarin conversational speech recognition

Tim Ng; Mari Ostendorf; Mei-Yuh Hwang; Manhung Siu; Ivan Bulyko; Xin Lei

Lack of data is a problem in training language models for conversational speech recognition, particularly for languages other than English. Experiments in English have successfully used Web-based text collection, targeted for a conversational style, to augment small sets of transcribed speech; we look at extending these techniques to Mandarin. In addition, we investigate different techniques for topic adaptation. Experiments in recognizing Mandarin telephone conversations show that the use of filtered Web data leads to a 28% reduction in perplexity and 7% reduction in character error rate, with most of the gain due to the general filtered Web data.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Discriminative semi-supervised training for keyword search in low resource languages

Roger Hsiao; Tim Ng; Frantisek Grezl; Damianos Karakos; Stavros Tsakalidis; Long Nguyen; Richard M. Schwartz

In this paper, we investigate semi-supervised training for low resource languages where the initial systems may have high error rates (≥ 70.0% word error rate). To handle the lack of data, we study semi-supervised techniques including data selection, data weighting, discriminative training, and multilayer perceptron learning to improve system performance. The entire suite of semi-supervised methods presented in this paper was evaluated under the IARPA Babel program for the keyword spotting tasks. Our semi-supervised system had the best performance in the OpenKWS13 surprise language evaluation for the limited condition. We describe our work on the Turkish and Vietnamese systems.


International Conference on Acoustics, Speech, and Signal Processing | 2014

The 2013 BBN Vietnamese telephone speech keyword spotting system

Stavros Tsakalidis; Roger Hsiao; Damianos Karakos; Tim Ng; Shivesh Ranjan; Guruprasad Saikumar; Le Zhang; Long Nguyen; Richard M. Schwartz; John Makhoul

In this paper we describe the Vietnamese conversational telephone speech keyword spotting system under the IARPA Babel program for the 2013 evaluation conducted by NIST. The system contains several recently developed novel methods that significantly improve speech-to-text and keyword spotting performance, such as stacked bottleneck neural network features, white listing, score normalization, and improvements on semi-supervised training methods. These methods resulted in the highest performance for the official IARPA Babel surprise language evaluation of 2013.


International Conference on Acoustics, Speech, and Signal Processing | 2007

Speech Recognition System Combination for Machine Translation

Mark J. F. Gales; Xunying Liu; Rohit Sinha; Philip C. Woodland; Kai Yu; Spyros Matsoukas; Tim Ng; Kham Nguyen; Long Nguyen; Jean-Luc Gauvain; Lori Lamel; Abdelkhalek Messaoudi

The majority of state-of-the-art speech recognition systems make use of system combination. The combination approaches adopted have traditionally been tuned to minimising word error rates (WERs). In recent years there has been a growing interest in taking the output from speech recognition systems in one language and translating it into another. This paper investigates the use of cross-site combination approaches in terms of both WER and impact on translation performance. In addition, the stages involved in modifying the output from a speech-to-text (STT) system to be suitable for translation are described. Two source languages, Mandarin and Arabic, are recognised and then translated using a phrase-based statistical machine translation system into English. Performance of individual systems and cross-site combination using cross-adaptation and ROVER are given. Results show that the best STT combination scheme in terms of WER is not necessarily the most appropriate when translating speech.
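ROVER, one of the combination schemes above, aligns multiple system outputs into a word transition network and votes at each slot. A minimal sketch of just the voting step, assuming the hypotheses are already aligned to equal length and ignoring ROVER's confidence weighting:

```python
from collections import Counter

def rover_vote(aligned_hyps):
    """Majority vote over pre-aligned recognition hypotheses.

    aligned_hyps: list of word sequences of equal length (ROVER's
    dynamic-programming alignment step is omitted here).
    Returns the word with the most votes at each position.
    """
    result = []
    for words in zip(*aligned_hyps):
        result.append(Counter(words).most_common(1)[0][0])
    return result

hyps = [["the", "cat", "sat"],
        ["the", "cap", "sat"],
        ["the", "cat", "sad"]]
print(rover_vote(hyps))  # ['the', 'cat', 'sat']
```

The full algorithm also carries per-word confidence scores into the vote, which matters when systems disagree evenly.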


International Symposium on Chinese Spoken Language Processing | 2004

Progress on Mandarin conversational telephone speech recognition

Mei-Yuh Hwang; Xin Lei; Tim Ng; Ivan Bulyko; Mari Ostendorf; Andreas Stolcke; Wen Wang; Jing Zheng; Venkata Ramana Rao Gadde; Martin Graciarena; Man-Hung Siu; Yan Huang

Over the past decade, there has been good progress on English conversational telephone speech (CTS) recognition, built on the Switchboard and Fisher corpora. In this paper, we present our efforts on extending language-independent technologies into Mandarin CTS, as well as addressing language-dependent issues such as tone. We show the impact of each of the following factors: (a) simplified Mandarin phone set; (b) pitch features; (c) auto-retrieved Web texts for augmenting n-gram training; (d) speaker adaptive training; (e) maximum mutual information estimation; (f) decision-tree-based parameter sharing; (g) cross-word co-articulation modeling; and (h) combining MFCC and PLP decoding outputs using confusion networks. We have reduced the Chinese character error rate (CER) of the BBN-2003 development test set from 53.8% to 46.8% after (a)+(b)+(c)+(f)+(g) are combined. Further reduction in CER is anticipated after integrating all improvements.


Conference of the International Speech Communication Association | 2016

Sage: The New BBN Speech Processing Platform.

Roger Hsiao; Ralf Meermeier; Tim Ng; Zhongqiang Huang; Maxwell Jordan; Enoch Kan; Tanel Alumäe; Jan Silovsky; William Hartmann; Francis Keith; Omer Lang; Man-Hung Siu; Owen Kimball

To capitalize on the rapid development of Speech-to-Text (STT) technologies and the proliferation of open source machine learning toolkits, BBN has developed Sage, a new speech processing platform that integrates technologies from multiple sources, each of which has particular strengths. In this paper, we describe the design of Sage, which allows the easy interchange of STT components from different sources. We also describe our approach for fast prototyping with new machine learning toolkits, and a framework for sharing STT components across different applications. Finally, we report Sage’s state-of-the-art performance on different STT tasks.


Conference of the International Speech Communication Association | 2016

Two-Stage Data Augmentation for Low-Resourced Speech Recognition.

William Hartmann; Tim Ng; Roger Hsiao; Stavros Tsakalidis; Richard M. Schwartz

Low-resourced languages suffer from limited training data and resources. Data augmentation is a common approach to increasing the amount of training data. Additional data is synthesized by manipulating the original data with a variety of methods. Unlike most previous work, which focuses on a single technique, we combine multiple complementary augmentation approaches. The first stage adds noise and perturbs the speed of additional copies of the original audio. The data is further augmented in a second stage, where a novel fMLLR-based augmentation is applied to bottleneck features to further improve performance. A reduction in word error rate is demonstrated on four languages from the IARPA Babel program. We present an analysis exploring why these techniques are beneficial.
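The first-stage augmentation described above (speed perturbation plus additive noise) can be sketched as follows. This is a simplified stand-in, not the paper's pipeline: speed change is done by naive linear interpolation of the time axis, and the noise is white Gaussian scaled to a target SNR, whereas real systems use proper resamplers and recorded noise:

```python
import numpy as np

def augment(waveform, speed=1.1, snr_db=20.0, rng=None):
    """Speed-perturb a waveform and add noise at a target SNR.

    waveform: 1-D float array of audio samples.
    speed > 1 shortens the signal (faster speech); the noise level
    is derived from the perturbed signal's power and snr_db.
    """
    rng = rng or np.random.default_rng(0)
    # Speed perturbation: resample the time axis by the speed factor.
    n_out = int(len(waveform) / speed)
    idx = np.linspace(0, len(waveform) - 1, n_out)
    perturbed = np.interp(idx, np.arange(len(waveform)), waveform)
    # Additive white noise at the requested signal-to-noise ratio.
    sig_power = np.mean(perturbed ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.normal(scale=np.sqrt(noise_power), size=n_out)
    return perturbed + noise

audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of 440 Hz
aug = augment(audio, speed=1.1, snr_db=20.0)
print(len(aug) < len(audio))  # True: faster speech has fewer samples
```

Each original utterance is typically copied several times with different speed factors and noise conditions, multiplying the effective training set size.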

Collaboration


Dive into Tim Ng's collaborations.

Top Co-Authors

Roger Hsiao, Carnegie Mellon University
Mari Ostendorf, University of Washington
Xin Lei, University of Washington