Fu-Hua Liu
IBM
Publications
Featured research published by Fu-Hua Liu.
Journal of the Acoustical Society of America | 2004
Fu-Hua Liu; Michael Picheny
A system and method for voice activity detection, in accordance with the invention, includes the steps of inputting data comprising frames of speech and noise, and deciding whether the frames of the input data contain speech or noise by employing a log-likelihood ratio test statistic together with pitch. Frames of the input data are tagged, based on the log-likelihood ratio test statistic and the pitch characteristics of the input data, as most likely noise or most likely speech. The tags are counted over a plurality of frames to determine whether the input data is speech or noise.
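The tag-and-count scheme described in the abstract can be sketched as follows. The LLR threshold, the pitch range, and the segment-level voting fraction are illustrative assumptions, not values taken from the patent:

```python
def tag_frame(llr, pitch_hz, llr_threshold=0.0):
    """Tag one frame as 'speech' or 'noise' using a log-likelihood
    ratio test statistic combined with pitch evidence.
    A detected pitch in a plausible voicing range counts as
    evidence of speech even when the LLR alone is below threshold."""
    has_pitch = pitch_hz is not None and 50.0 <= pitch_hz <= 400.0
    return "speech" if llr > llr_threshold or has_pitch else "noise"

def classify_segment(frames, min_speech_fraction=0.5):
    """Count per-frame tags over a window of frames and decide
    whether the segment as a whole is speech or noise."""
    tags = [tag_frame(llr, pitch) for llr, pitch in frames]
    speech_count = sum(1 for t in tags if t == "speech")
    return "speech" if speech_count / len(frames) >= min_speech_fraction else "noise"
```

For example, a window in which three of four frames carry either a positive LLR or a valid pitch would be classified as speech, while a window of low-LLR, unpitched frames would be classified as noise.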
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Liang Gu; Yuqing Gao; Fu-Hua Liu; Michael Picheny
The IBM Multilingual Automatic Speech-To-Speech TranslatOR (MASTOR) system is a research prototype developed for the Defense Advanced Research Projects Agency (DARPA) Babylon/CAST speech-to-speech machine translation program. The system consists of cascaded components of large-vocabulary conversational spontaneous speech recognition, statistical machine translation, and concatenative text-to-speech synthesis. To achieve highly accurate and robust conversational spoken language translation, a unique concept-based speech-to-speech translation approach is proposed that performs the translation by first understanding the meaning of the automatically recognized text. A decision-tree-based statistical natural language understanding algorithm extracts the semantic information from the input sentences, while a natural language generation (NLG) algorithm predicts the translated text via maximum-entropy-based statistical models. One critical component in our statistical NLG approach is natural concept generation (NCG). The goal of NCG is not only to generate the correct set of concepts in the target language, but also to produce them in an appropriate order. To improve maximum-entropy-based concept generation, a set of new approaches is proposed. One approach improves concept sequence generation in the target language via forward–backward modeling, which selects the hypothesis with the highest combined conditional probability based on both the forward and backward generation models. This paradigm allows the exploration of both the left and right context information in the source and target languages during concept generation. Another approach selects bilingual features that enable maximum-entropy-based model training on the preannotated parallel corpora. This feature set is augmented with word-level information in order to achieve higher NCG accuracy while minimizing the total number of distinct concepts and, hence, greatly reducing the concept annotation and natural language understanding effort. These features are further expanded to multiple sets to enhance model robustness. Finally, a confidence threshold is introduced to alleviate data sparseness problems in our training corpora. Experiments show a dramatic concept generation error rate reduction of more than 40% in our speech translation corpus within limited domains. Significant improvements of both word error rate and BiLingual Evaluation Understudy (BLEU) score are also achieved in our experiments on speech-to-speech translation.
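The forward–backward hypothesis selection described above can be sketched as follows. The scorer interfaces and the hand-set toy scores are illustrative placeholders for the two trained maximum-entropy generation models, not the MASTOR API:

```python
def select_hypothesis(candidates, forward_logprob, backward_logprob):
    """Keep the candidate concept sequence whose combined forward and
    backward generation log-probabilities are highest, so both left
    and right context influence the choice."""
    return max(candidates,
               key=lambda seq: forward_logprob(seq) + backward_logprob(seq))

# Toy scores standing in for trained forward and backward models.
forward = {("greeting", "request"): -1.0, ("request", "greeting"): -2.5}
backward = {("greeting", "request"): -1.2, ("request", "greeting"): -0.8}

best = select_hypothesis(list(forward),
                         lambda s: forward[s],
                         lambda s: backward[s])
```

Here the forward model alone would already prefer ("greeting", "request"), but in general the two models can disagree, and summing their log-probabilities picks the hypothesis both directions jointly support.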
International Conference on Acoustics, Speech, and Signal Processing | 1996
Fu-Hua Liu; Michael Picheny; Patibandla Srinivasa; Michael Daniel Monkowski; C. Julian Chen
We describe IBM's most recent efforts in speech recognition on a conversational-speech database, the Mandarin Call Home corpus. While similar to the well-known Switchboard corpus, the Call Home task addresses several major challenges in the domain of spoken language systems, including spontaneous dialogue with no pre-specified topics, limited-bandwidth telephone signals, and recognition of languages other than English. We particularly describe the methodology used on the Mandarin Call Home corpus to address language-specific issues. We also examine and compare our results with those of the English Switchboard corpus. Preliminary experiments show that a 58.7% character error rate can be achieved on the April 1995 Mandarin Call Home data set. The experimental results are comparable to those of the state-of-the-art IBM Switchboard system trained with a similar amount of data.
International Conference on Acoustics, Speech, and Signal Processing | 2003
Fu-Hua Liu; Liang Gu; Yuqing Gao; Michael Picheny
Various language modeling issues in a speech-to-speech translation system are described in this paper. First, the language models for the speech recognizer need to be adapted to the specific domain to improve the recognition performance for in-domain utterances, while keeping the domain coverage as broad as possible. Second, when a maximum-entropy-based statistical natural language generation model is used to generate the target-language sentence as the translation output, serious inflection and synonym issues arise, because a compromise solution is used in the semantic representation to avoid the data sparseness problem. We use N-gram models as a post-processing step to enhance the generation performance. When an interpolated language model is applied to a Chinese-to-English translation task, the translation performance, measured by the objective BLEU metric, improves substantially to 0.514 from 0.318 when we use the correct transcription as input. Similarly, the BLEU score improves to 0.300 from 0.194 for the same task when the input is speech data.
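The N-gram post-processing step mentioned above, which resolves inflection and synonym choices in the generated output, can be sketched as follows. The slot structure, the toy bigram scores, and the unseen-bigram penalty are illustrative assumptions, not details from the paper:

```python
from itertools import product

def ngram_rescore(slots, bigram_logprob):
    """Expand every combination of candidate surface forms (e.g.
    inflections or synonyms for each word slot) and keep the word
    sequence that a bigram model scores highest."""
    def score(words):
        return sum(bigram_logprob(a, b) for a, b in zip(words, words[1:]))
    return max(product(*slots), key=score)

# Toy bigram log-probabilities standing in for a trained N-gram model;
# unseen bigrams receive a heavy penalty.
bigram = {("he", "goes"): -0.5, ("goes", "home"): -0.7,
          ("he", "go"): -3.0, ("go", "home"): -1.0}
best = ngram_rescore([["he"], ["go", "goes"], ["home"]],
                     lambda a, b: bigram.get((a, b), -10.0))
```

Here the generation model leaves the inflection of "go" undecided, and the bigram model resolves it in favor of "he goes home" because both of its bigrams score better than those of "he go home".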
International Journal of Speech Technology | 2004
Fu-Hua Liu; Liang Gu; Yuqing Gao; Michael Picheny
This paper describes various language modeling issues in a speech-to-speech translation system. These issues are addressed in the IBM speech-to-speech system we developed for the DARPA Babylon program in the context of two-way translation between English and Mandarin Chinese. First, the language models for the speech recognizer had to be adapted to the specific domain to improve the recognition performance for in-domain utterances, while keeping the domain coverage as broad as possible. This involved considerations of disfluencies and lack of punctuation, as well as domain-specific utterances. Second, we used a hybrid semantic/syntactic representation to minimize the data sparseness problem in a statistical natural language generation framework. Serious inflection and synonym issues arise when words in the target language are to be determined in the translation output. Instead of relying on tedious handcrafted grammar rules, we used N-gram models as a post-processing step to enhance the generation performance. When an interpolated language model was applied to a Chinese-to-English translation task, the translation performance, measured by an objective metric of BLEU, improved substantially to 0.514 from 0.318 when we used the correct transcription as input. Similarly, the BLEU score improved to 0.300 from 0.194 for the same task when the input was speech data.
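The interpolated language model mentioned above combines an adapted in-domain model with a broad-coverage general model. A minimal sketch of linear interpolation follows; the weight and the component probabilities are illustrative placeholders, not values from the paper:

```python
def interpolated_prob(word, history, p_domain, p_general, lam=0.7):
    """Linearly interpolate an adapted in-domain LM with a broad
    general LM: p = lam * p_domain + (1 - lam) * p_general.
    A larger lam trusts the in-domain model more while the general
    model preserves coverage of out-of-domain word sequences."""
    return lam * p_domain(word, history) + (1 - lam) * p_general(word, history)
```

For example, with lam = 0.7, an in-domain probability of 0.2 and a general probability of 0.05 interpolate to 0.155, so words favored by the domain model dominate without zeroing out the general model's mass.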
Archive | 1996
Chengjun Julian Chen; Fu-Hua Liu; Michael Picheny
Journal of the Acoustical Society of America | 1998
Fu-Hua Liu; Michael Picheny
Archive | 2008
Yuqing Gao; Liang Gu; Fu-Hua Liu
Conference of the International Speech Communication Association | 2003
Fu-Hua Liu; Yuqing Gao; Liang Gu; Michael Picheny
Conference of the International Speech Communication Association | 1998
Fu-Hua Liu; Michael Picheny