Publications
Featured research published by Bing Xiang.
International Conference on Acoustics, Speech, and Signal Processing | 2006
Bing Xiang; Kham Nguyen; Long Nguyen; Richard M. Schwartz; J. Makhoul
In this paper, we present a novel approach to morphological decomposition for large-vocabulary Arabic speech recognition. It achieved a low out-of-vocabulary (OOV) rate as well as high recognition accuracy in a state-of-the-art Arabic broadcast news transcription system. In this approach, compound words are decomposed into stems and affixes in both the language model training data and the acoustic training data. The decomposed words in the recognition output are re-joined before scoring. Four decomposition algorithms are evaluated and compared in this work. The best system achieved a 1.9% absolute reduction (9.8% relative) in word error rate (WER) compared to the 64K-word baseline. Its recognition performance is also comparable to that of a 300K-word recognition system trained on the original, undecomposed words. At the same time, the decomposed system is much faster and requires less memory than the systems with vocabularies larger than 64K words.
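The decompose-then-rejoin flow described above can be illustrated with a minimal Python sketch, assuming a dictionary-driven splitter. The prefix/suffix lists, the "+" marker convention, and the in-vocabulary check are illustrative assumptions, not the paper's actual affix inventory or algorithms.

# Hedged sketch: split words into marked affixes and stems for training/decoding,
# then re-join marked affixes in the recognition output before scoring.
PREFIXES = ["wa", "al", "bi", "li"]   # assumed example prefixes (romanized)
SUFFIXES = ["hum", "ha", "at"]        # assumed example suffixes

def decompose(word, vocab):
    # Split only if the resulting stem stays in the vocabulary; otherwise keep the word intact.
    for p in PREFIXES:
        if word.startswith(p) and word[len(p):] in vocab:
            return [p + "+", word[len(p):]]
    for s in SUFFIXES:
        if word.endswith(s) and word[:-len(s)] in vocab:
            return [word[:-len(s)], "+" + s]
    return [word]

def rejoin(tokens):
    # Re-attach marked affixes to neighbouring stems before scoring.
    out, pending_prefix = [], ""
    for tok in tokens:
        if tok.endswith("+"):              # prefix marker: glue to the next token
            pending_prefix += tok[:-1]
        elif tok.startswith("+") and out:  # suffix marker: glue to the previous word
            out[-1] += tok[1:]
        else:
            out.append(pending_prefix + tok)
            pending_prefix = ""
    return out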
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Spyridon Matsoukas; Jean-Luc Gauvain; Gilles Adda; Thomas Colthurst; Chia-Lin Kao; Owen Kimball; Lori Lamel; Fabrice Lefèvre; Jeff Z. Ma; John Makhoul; Long Nguyen; Rohit Prasad; Richard M. Schwartz; Holger Schwenk; Bing Xiang
This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated in an effort to produce significant reductions in the word error rate (WER), as directed by the aggressive goals of the Effective, Affordable, Reusable Speech-to-Text [Defense Advanced Research Projects Agency (DARPA) EARS] program. The paper focuses on general modeling techniques that led to recognition accuracy improvements, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, discussing the tradeoff between speed and accuracy for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative WER reductions of 47% and 51% on the BN and CTS domains, respectively.
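As a rough illustration of what word-level system combination involves, the sketch below performs majority voting over hypotheses that are assumed to be already aligned (ROVER-style). Whether the joint BBN/LIMSI system used voting, cross-adaptation, or both at each stage is not stated in this abstract, so treat this as a generic sketch rather than the paper's recipe.

# Hypothetical word-level voting over pre-aligned hypotheses; "<eps>" marks deletions.
from collections import Counter

def vote(aligned_hypotheses):
    combined = []
    for position in zip(*aligned_hypotheses):
        word, _ = Counter(position).most_common(1)[0]   # majority word at this slot
        if word != "<eps>":
            combined.append(word)
    return combined

hyps = [["the", "cat", "<eps>", "sat"],
        ["the", "cat", "has", "sat"],
        ["a",   "cat", "has", "sat"]]
print(vote(hyps))   # -> ['the', 'cat', 'has', 'sat']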
International Conference on Acoustics, Speech, and Signal Processing | 2004
Richard M. Schwartz; Thomas Colthurst; Nicolae Duta; Herbert Gish; Rukmini Iyer; Chia-Lin Kao; Daben Liu; Owen Kimball; Jeff Z. Ma; John Makhoul; Spyros Matsoukas; Long Nguyen; Mohammed Noamany; Rohit Prasad; Bing Xiang; Dongxin Xu; Jean-Luc Gauvain; Lori Lamel; Holger Schwenk; Gilles Adda; Langzhou Chen
We report the results of the first evaluations of the BBN/LIMSI system under the new DARPA EARS program. The evaluations were carried out on conversational telephone speech (CTS) and broadcast news (BN) in three languages: English, Mandarin, and Arabic. In addition to providing system descriptions and evaluation results, the paper highlights methods that worked well across the two domains and the few that worked well on one domain but not the other. For the BN evaluations, which had to run in under 10 times real-time, we demonstrated that a joint BBN/LIMSI system operating under that time constraint achieved better results than either system alone.
International Conference on Acoustics, Speech, and Signal Processing | 2007
Spyros Matsoukas; Ivan Bulyko; Bing Xiang; Kham Nguyen; Richard M. Schwartz; John Makhoul
This paper presents a set of experiments that we conducted in order to optimize the performance of an Arabic/English machine translation system on broadcast news and conversational speech data. Proper integration of speech-to-text (STT) and machine translation (MT) requires special attention to issues such as sentence boundary detection, punctuation, STT accuracy, tokenization, conversion of spoken numbers and dates to written form, optimization of MT decoding weights, and scoring. We discuss these issues, and show that a carefully tuned STT/MT integration can lead to significant translation accuracy improvements compared to simply feeding the regular STT output to a text MT system.
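One of the integration steps listed above, converting spoken numbers to written form before translation, can be sketched as follows. The tiny word tables and the two-token rule are hypothetical placeholders; a real normalizer would also cover dates, ordinals, larger numbers, and language-specific conventions.

# Hedged sketch: rewrite simple spoken numbers ("twenty five" -> "25") in STT output.
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def spoken_numbers_to_digits(tokens):
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i].lower()
        if tok in TENS:
            value = TENS[tok]
            if i + 1 < len(tokens) and tokens[i + 1].lower() in UNITS:
                value += UNITS[tokens[i + 1].lower()]
                i += 1                      # consume the units word too
            out.append(str(value))
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        else:
            out.append(tokens[i])
        i += 1
    return out

# spoken_numbers_to_digits("on may twenty five".split()) -> ['on', 'may', '25']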
International Conference on Acoustics, Speech, and Signal Processing | 2005
Bing Xiang; Long Nguyen; Spyros Matsoukas; Richard M. Schwartz
In this paper, we present cluster-dependent acoustic modeling for large-vocabulary speech recognition. With large amounts of acoustic training data, we build multiple cluster-dependent models (CDMs), each focusing on a group of speakers in order to capture speaker-dependent characteristics. This is motivated by the fact that a sufficiently trained speaker-dependent (SD) model outperforms a speaker-independent (SI) model. During decoding, the data of each test speaker is decoded with CDMs selected under certain criteria to achieve high recognition accuracy. Various speaker clustering and model selection techniques are proposed and compared on the task of broadcast news (BN) transcription. The CDMs provided more than 1% absolute gain in unadapted decoding and a 0.5% gain in adapted decoding compared to our baseline system on the EARS BN 2003 development test set.
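A minimal sketch of the cluster-then-select idea follows: group training speakers by the mean of their acoustic feature vectors with k-means, then route each test speaker to the model of the nearest cluster. The feature choice and the centroid-distance selection criterion are assumptions for illustration; the paper compares several clustering and model selection techniques.

# Hedged sketch of speaker clustering and cluster-dependent model selection.
import numpy as np
from sklearn.cluster import KMeans

def cluster_training_speakers(speaker_feats, n_clusters=4, seed=0):
    # speaker_feats: dict speaker_id -> (num_frames, dim) array of acoustic features.
    ids = sorted(speaker_feats)
    means = np.stack([speaker_feats[s].mean(axis=0) for s in ids])
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(means)
    assignments = dict(zip(ids, km.labels_))   # which CDM each training speaker feeds
    return km, assignments

def select_model_for_test_speaker(km, test_feats):
    # Pick the cluster-dependent model whose centroid is closest to the test speaker.
    mean = test_feats.mean(axis=0, keepdims=True)
    return int(km.predict(mean)[0])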
North American Chapter of the Association for Computational Linguistics | 2007
Antti-Veikko I. Rosti; Necip Fazil Ayan; Bing Xiang; Spyridon Matsoukas; Richard M. Schwartz; Bonnie J. Dorr
Conference of the International Speech Communication Association | 2005
Mohamed Afify; Long Nguyen; Bing Xiang; Sherif M. Abdou; John Makhoul
2004 Rich Transcription Workshop, Palisades, NY | 2003
Long Nguyen; Sherif M. Abdou; Mohamed Afify; John Makhoul; Spyros Matsoukas; Richard M. Schwartz; Bing Xiang; Lori Lamel; Jean-Luc Gauvain; Gilles Adda; Holger Schwenk; Fabrice Lefèvre
Conference of the International Speech Communication Association | 2005
John Makhoul; Alex Baron; Ivan Bulyko; Long Nguyen; Lance A. Ramshaw; David Stallard; Richard M. Schwartz; Bing Xiang
Conference of the International Speech Communication Association | 2005
Long Nguyen; Bing Xiang; Mohamed Afify; Sherif M. Abdou; Spyros Matsoukas; Richard M. Schwartz; John Makhoul