Fu-Hua Liu
IBM
Publications
Featured research published by Fu-Hua Liu.
Journal of the Acoustical Society of America | 2004
Fu-Hua Liu; Michael Picheny
A system and method for voice activity detection, in accordance with the invention, includes the steps of inputting data comprising frames of speech and noise, and deciding whether the frames of the input data contain speech or noise by employing a log-likelihood ratio test statistic together with pitch. Frames of the input data are tagged, based on the log-likelihood ratio test statistic and the pitch characteristics of the input data, as most likely noise or most likely speech. The tags are counted over a plurality of frames to determine whether the input data is speech or noise.
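The tag-and-count scheme described in the abstract can be sketched as follows. The LLR threshold, the pitch range, and the segment-level voting fraction are illustrative assumptions, not values taken from the patent:

```python
def tag_frame(llr, pitch_hz, llr_threshold=0.0):
    """Tag one frame as 'speech' or 'noise' using a log-likelihood
    ratio test statistic combined with pitch evidence.
    A detected pitch in a plausible voicing range counts as
    evidence of speech even when the LLR alone is below threshold."""
    has_pitch = pitch_hz is not None and 50.0 <= pitch_hz <= 400.0
    return "speech" if llr > llr_threshold or has_pitch else "noise"

def classify_segment(frames, min_speech_fraction=0.5):
    """Count per-frame tags over a window of frames and decide
    whether the segment as a whole is speech or noise."""
    tags = [tag_frame(llr, pitch) for llr, pitch in frames]
    speech_count = sum(1 for t in tags if t == "speech")
    return "speech" if speech_count / len(frames) >= min_speech_fraction else "noise"
```

For example, a window in which three of four frames carry either a positive LLR or a valid pitch would be classified as speech, while a window of low-LLR, unpitched frames would be classified as noise.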
IEEE Transactions on Audio, Speech, and Language Processing | 2006
Liang Gu; Yuqing Gao; Fu-Hua Liu; Michael Picheny
The IBM Multilingual Automatic Speech-To-Speech TranslatOR (MASTOR) system is a research prototype developed for the Defense Advanced Research Projects Agency (DARPA) Babylon/CAST speech-to-speech machine translation program. The system consists of cascaded components of large-vocabulary conversational spontaneous speech recognition, statistical machine translation, and concatenative text-to-speech synthesis. To achieve highly accurate and robust conversational spoken language translation, a unique concept-based speech-to-speech translation approach is proposed that performs the translation by first understanding the meaning of the automatically recognized text. A decision-tree-based statistical natural language understanding algorithm extracts the semantic information from the input sentences, while a natural language generation (NLG) algorithm predicts the translated text via maximum-entropy-based statistical models. One critical component in our statistical NLG approach is natural concept generation (NCG). The goal of NCG is not only to generate the correct set of concepts in the target language, but also to produce them in an appropriate order. To improve maximum-entropy-based concept generation, a set of new approaches is proposed. One approach improves concept sequence generation in the target language via forward–backward modeling, which selects the hypothesis with the highest combined conditional probability based on both the forward and backward generation models. This paradigm allows the exploration of both the left and right context information in the source and target languages during concept generation. Another approach selects bilingual features that enable maximum-entropy-based model training on the preannotated parallel corpora. This feature set is augmented with word-level information in order to achieve higher NCG accuracy while minimizing the total number of distinct concepts and, hence, greatly reducing the concept annotation and natural language understanding effort. These features are further expanded to multiple sets to enhance model robustness. Finally, a confidence threshold is introduced to alleviate data sparseness problems in our training corpora. Experiments show a dramatic concept generation error rate reduction of more than 40% in our speech translation corpus within limited domains. Significant improvements of both word error rate and BiLingual Evaluation Understudy (BLEU) score are also achieved in our experiments on speech-to-speech translation.
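The forward–backward hypothesis selection described above can be sketched as follows. The scorer interfaces and the hand-set toy scores are illustrative placeholders for the two trained maximum-entropy generation models, not the MASTOR API:

```python
def select_hypothesis(candidates, forward_logprob, backward_logprob):
    """Keep the candidate concept sequence whose combined forward and
    backward generation log-probabilities are highest, so both left
    and right context influence the choice."""
    return max(candidates,
               key=lambda seq: forward_logprob(seq) + backward_logprob(seq))

# Toy scores standing in for trained forward and backward models.
forward = {("greeting", "request"): -1.0, ("request", "greeting"): -2.5}
backward = {("greeting", "request"): -1.2, ("request", "greeting"): -0.8}

best = select_hypothesis(list(forward),
                         lambda s: forward[s],
                         lambda s: backward[s])
```

Here the forward model alone would already prefer ("greeting", "request"), but in general the two models can disagree, and summing their log-probabilities picks the hypothesis both directions jointly support.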
International Conference on Acoustics, Speech, and Signal Processing | 1996
Fu-Hua Liu; Michael Picheny; Patibandla Srinivasa; Michael Daniel Monkowski; C. Julian Chen
We describe IBM's most recent efforts in speech recognition on a conversational-speech database, the Mandarin Call Home corpus. While similar to the well-known Switchboard corpus, the Call Home task addresses several major challenges in the domain of spoken language systems, including spontaneous dialogue with no pre-specified topics, limited-bandwidth telephone signals, and recognition of languages other than English. We particularly describe the methodology used on the Mandarin Call Home corpus to address language-specific issues. We also examine and compare our results with those of the English Switchboard corpus. Preliminary experiments show that a 58.7% character error rate can be achieved on the April 1995 Mandarin Call Home data set. The experimental results are comparable to those of the state-of-the-art IBM Switchboard system trained with a similar amount of data.
International Conference on Acoustics, Speech, and Signal Processing | 2003
Fu-Hua Liu; Liang Gu; Yuqing Gao; Michael Picheny
Various language modeling issues in a speech-to-speech translation system are described in this paper. First, the language models for the speech recognizer need to be adapted to the specific domain to improve the recognition performance for in-domain utterances, while keeping the domain coverage as broad as possible. Second, when a maximum-entropy-based statistical natural language generation model is used to generate the target-language sentence as the translation output, serious inflection and synonym issues arise, because a compromise solution is used in the semantic representation to avoid the data sparseness problem. We use N-gram models as a post-processing step to enhance the generation performance. When an interpolated language model is applied to a Chinese-to-English translation task, the translation performance, measured by the objective BLEU metric, improves substantially to 0.514 from 0.318 when we use the correct transcription as input. Similarly, the BLEU score improves to 0.300 from 0.194 for the same task when the input is speech data.
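The N-gram post-processing step mentioned above, which resolves inflection and synonym choices in the generated output, can be sketched as follows. The slot structure, the toy bigram scores, and the unseen-bigram penalty are illustrative assumptions, not details from the paper:

```python
from itertools import product

def ngram_rescore(slots, bigram_logprob):
    """Expand every combination of candidate surface forms (e.g.
    inflections or synonyms for each word slot) and keep the word
    sequence that a bigram model scores highest."""
    def score(words):
        return sum(bigram_logprob(a, b) for a, b in zip(words, words[1:]))
    return max(product(*slots), key=score)

# Toy bigram log-probabilities standing in for a trained N-gram model;
# unseen bigrams receive a heavy penalty.
bigram = {("he", "goes"): -0.5, ("goes", "home"): -0.7,
          ("he", "go"): -3.0, ("go", "home"): -1.0}
best = ngram_rescore([["he"], ["go", "goes"], ["home"]],
                     lambda a, b: bigram.get((a, b), -10.0))
```

Here the generation model leaves the inflection of "go" undecided, and the bigram model resolves it in favor of "he goes home" because both of its bigrams score better than those of "he go home".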
International Journal of Speech Technology | 2004
Fu-Hua Liu; Liang Gu; Yuqing Gao; Michael Picheny
This paper describes various language modeling issues in a speech-to-speech translation system. These issues are addressed in the IBM speech-to-speech system we developed for the DARPA Babylon program in the context of two-way translation between English and Mandarin Chinese. First, the language models for the speech recognizer had to be adapted to the specific domain to improve the recognition performance for in-domain utterances, while keeping the domain coverage as broad as possible. This involved considerations of disfluencies and lack of punctuation, as well as domain-specific utterances. Second, we used a hybrid semantic/syntactic representation to minimize the data sparseness problem in a statistical natural language generation framework. Serious inflection and synonym issues arise when words in the target language are to be determined in the translation output. Instead of relying on tedious handcrafted grammar rules, we used N-gram models as a post-processing step to enhance the generation performance. When an interpolated language model was applied to a Chinese-to-English translation task, the translation performance, measured by an objective metric of BLEU, improved substantially to 0.514 from 0.318 when we used the correct transcription as input. Similarly, the BLEU score improved to 0.300 from 0.194 for the same task when the input was speech data.
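The interpolated language model mentioned above combines an adapted in-domain model with a broad-coverage general model. A minimal sketch of linear interpolation follows; the weight and the component probabilities are illustrative placeholders, not values from the paper:

```python
def interpolated_prob(word, history, p_domain, p_general, lam=0.7):
    """Linearly interpolate an adapted in-domain LM with a broad
    general LM: p = lam * p_domain + (1 - lam) * p_general.
    A larger lam trusts the in-domain model more while the general
    model preserves coverage of out-of-domain word sequences."""
    return lam * p_domain(word, history) + (1 - lam) * p_general(word, history)
```

For example, with lam = 0.7, an in-domain probability of 0.2 and a general probability of 0.05 interpolate to 0.155, so words favored by the domain model dominate without zeroing out the general model's mass.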
Archive | 1996
Chengjun Julian Chen; Fu-Hua Liu; Michael Picheny
Journal of the Acoustical Society of America | 1998
Fu-Hua Liu; Michael Picheny
Archive | 2008
Yuqing Gao; Liang Gu; Fu-Hua Liu
Conference of the International Speech Communication Association | 2003
Fu-Hua Liu; Yuqing Gao; Liang Gu; Michael Picheny
Conference of the International Speech Communication Association | 1998
Fu-Hua Liu; Michael Picheny