Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Hsiao-Wuen Hon is active.

Publication


Featured research published by Hsiao-Wuen Hon.


Computer Speech & Language | 1992

The SPHINX-II Speech Recognition System: An Overview

Xuedong Huang; Fileno A. Alleva; Hsiao-Wuen Hon; Mei-Yuh Hwang; Ronald Rosenfeld

In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition.


International Conference on Acoustics, Speech, and Signal Processing | 1991

CMU robust vocabulary-independent speech recognition system

Hsiao-Wuen Hon; Kai-Fu Lee

Efforts to improve the performance of CMU's robust vocabulary-independent (VI) speech recognition systems on the DARPA speaker-independent resource management task are discussed. The improvements are evaluated on 320 sentences randomly selected from the DARPA June 88, February 89, and October 89 test sets. The first improvement involves more detailed acoustic modeling: the authors incorporated more dynamic features computed from the LPC cepstra, reducing error by 15% over the baseline system. The second improvement comes from a larger training database. The third improvement comes from more detailed subword modeling: the authors incorporated word-boundary context into their VI subword modeling, resulting in a 30% error reduction. Decision-tree allophone clustering was used to find more suitable models for the subword units not covered in the training set and further reduced error by 17%.
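The "dynamic features computed from the LPC cepstra" are, in common speech-recognition practice, regression-based delta coefficients over a short window of frames. The abstract does not give the exact formulation used, so the following is a minimal sketch of the conventional computation, assuming a regression half-width `K` (the function name and parameter are illustrative):

```python
import numpy as np

def delta_features(cepstra, K=2):
    """Regression-based dynamic (delta) features over a cepstral sequence.

    cepstra: array of shape (T, D) -- one cepstral vector per frame.
    K: half-width of the regression window (assumed; 2 is a common choice).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    T, D = cepstra.shape
    # Pad by repeating edge frames so every frame has K neighbors on each side.
    padded = np.concatenate([np.repeat(cepstra[:1], K, axis=0),
                             cepstra,
                             np.repeat(cepstra[-1:], K, axis=0)])
    denom = 2 * sum(k * k for k in range(1, K + 1))
    deltas = np.zeros((T, D))
    for k in range(1, K + 1):
        # k-weighted difference between frames k ahead and k behind.
        deltas += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return deltas / denom
```

For a cepstral sequence of shape (T, D) this yields a (T, D) array of local slopes; static and delta features are then typically concatenated per frame.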


Web Search and Data Mining | 2013

What's in a name?: an unsupervised approach to link users across communities

Jing Liu; Fan Zhang; Xinying Song; Young-In Song; Chin-Yew Lin; Hsiao-Wuen Hon

In this paper, we consider the problem of linking users across multiple online communities. Specifically, we focus on the alias-disambiguation step of this user-linking task, which is meant to differentiate users with the same usernames. We begin by quantitatively analyzing the importance of the alias-disambiguation step through a survey of 153 volunteers and an experimental analysis on a large dataset from About.me (75,472 users). The analysis shows that the alias-disambiguation solution can address a major part of the user-linking problem in terms of the coverage of true pairwise decisions (46.8%). To the best of our knowledge, this is the first study of human behavior with regard to the usage of online usernames. We then cast the alias-disambiguation step as a pairwise classification problem and propose a novel unsupervised approach. The key idea of our approach is to automatically label training instances based on two observations: (a) rare usernames are likely owned by a single natural person, e.g. pennystar88 as a positive instance; and (b) common usernames are likely owned by different natural persons, e.g. tank as a negative instance. We propose using the n-gram probabilities of usernames to estimate the rareness or commonness of usernames. These two observations are verified using the dataset of Yahoo! Answers. The empirical evaluations on 53 forums verify (a) the effectiveness of the classifiers with the automatically generated training data and (b) that the rareness and commonness of usernames can help user linking. We also analyze the cases where the classifiers fail.
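The labeling idea above, rare usernames as positive instances and common ones as negative, rests on an n-gram probability estimate of username rareness. The abstract does not specify the n-gram order or smoothing, so this sketch assumes a character-bigram model with add-alpha smoothing; the function names and the idea of thresholding the log-probability are illustrative, not the paper's exact recipe:

```python
from collections import Counter
from math import log

def char_bigram_model(usernames):
    """Collect character-bigram statistics over a list of usernames.
    '^' and '$' mark the start and end of a name."""
    bigrams, unigrams = Counter(), Counter()
    for name in usernames:
        s = "^" + name + "$"
        for a, b in zip(s, s[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams

def log_prob(name, bigrams, unigrams, alpha=1.0):
    """Add-alpha smoothed log-probability of a username under the bigram model.
    Low values indicate rare usernames (candidate positive instances)."""
    vocab = len(unigrams) + 1
    s = "^" + name + "$"
    lp = 0.0
    for a, b in zip(s, s[1:]):
        lp += log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
    return lp
```

A username scoring below some probability threshold would be treated as rare (positive training pair) and one scoring above it as common (negative pair).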


International Conference on Spoken Language Processing | 1996

Whistler: a trainable text-to-speech system

Xuedong Huang; Alex Acero; Jim Adcock; Hsiao-Wuen Hon; John Goldsmith; Jingsong Liu; Mike Plumpe

We introduce Whistler, a trainable text to speech (TTS) system that automatically learns the model parameters from a corpus. Both prosody parameters and concatenative speech units are derived through the use of probabilistic learning methods that have been successfully used for speech recognition. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style.


European Conference on Information Retrieval | 2008

Viewing term proximity from a different perspective

Ruihua Song; Michael J. Taylor; Ji-Rong Wen; Hsiao-Wuen Hon; Yong Yu

This paper extends the state-of-the-art probabilistic model BM25 to utilize term proximity from a new perspective. Most previous work considers dependencies only between pairs of terms and regards phrases as additional, independent evidence. It is difficult to estimate the importance of a phrase and its extra contribution to a relevance score, because the phrase overlaps with its component terms. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases that may contain one or more query terms. Second, these phrases are not scored independently but are instead treated as providing a context for the component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in the context phrase and how compact they are. Third, term frequency is replaced by the accumulated relevance contribution. Consequently, term proximity is easily integrated into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in top-rank precision.
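The three steps above can be sketched in miniature. The paper's exact phrase-grouping rule and compactness measure are not given in the abstract, so this sketch substitutes simple stand-ins: a greedy grouping of query-term occurrences within `max_gap` positions, and a contribution of (distinct query terms in the phrase) / (phrase span), accumulated into a pseudo term frequency that feeds the standard BM25 weight:

```python
import math

def bm25_weight(pseudo_tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Standard BM25 term weight, with raw tf replaced by an accumulated
    proximity-aware contribution (pseudo_tf)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * pseudo_tf * (k1 + 1) / (pseudo_tf + norm)

def accumulate_contributions(doc_tokens, query_terms, max_gap=2):
    """Greedy, simplified phrase grouping: consecutive query-term occurrences
    within max_gap positions form one phrase; each occurrence contributes
    (#distinct query terms in its phrase) / (phrase span in tokens)."""
    qset = set(query_terms)
    positions = [(i, t) for i, t in enumerate(doc_tokens) if t in qset]
    contrib = {t: 0.0 for t in qset}

    def flush(phrase):
        if not phrase:
            return
        span = phrase[-1][0] - phrase[0][0] + 1
        distinct = len({t for _, t in phrase})
        for _, t in phrase:
            contrib[t] += distinct / span

    phrase = []
    for i, t in positions:
        if phrase and i - phrase[-1][0] > max_gap:
            flush(phrase)
            phrase = []
        phrase.append((i, t))
    flush(phrase)
    return contrib
```

Occurrences surrounded by other query terms in a tight span thus contribute more than the same number of scattered occurrences, which is the intended effect of replacing tf with the accumulated contribution.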


International Conference on Acoustics, Speech, and Signal Processing | 1998

Automatic generation of synthesis units for trainable text-to-speech systems

Hsiao-Wuen Hon; Alex Acero; Xuedong Huang; Jingsong Liu; Mike Plumpe

The Whistler text-to-speech engine was designed so that the model parameters can be constructed automatically from training data. This paper describes in detail the design issues in constructing the synthesis-unit inventory automatically from speech databases. The automatic process includes (1) determining scalable synthesis units that can reflect the spectral variations of different allophones; (2) segmenting the recorded sentences into phonetic segments; and (3) selecting good instances of each synthesis unit to generate the best synthetic sentences at run time. These processes are all derived through the use of probabilistic learning methods aimed at the same optimization criteria. Through this automatic unit generation, Whistler can automatically produce synthetic speech that sounds very natural and resembles the acoustic characteristics of the original speaker.
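Step (3), instance selection, can be illustrated with a deliberately simplified heuristic. The paper's actual selection uses probabilistic criteria tied to the same optimization as segmentation; the centroid-distance rule, function name, and parameters below are illustrative assumptions, not the published method:

```python
import numpy as np

def select_instances(instances, n_keep=1):
    """instances: dict unit_name -> list of feature vectors, one per recorded
    occurrence of that synthesis unit. Keeps, per unit, the n_keep occurrences
    closest to the unit's centroid -- a simple stand-in for selecting 'good'
    (typical, low-distortion) instances for the inventory."""
    kept = {}
    for unit, vecs in instances.items():
        arr = np.asarray(vecs, dtype=float)
        centroid = arr.mean(axis=0)
        # Rank occurrences by distance to the centroid; closest first.
        order = np.argsort(np.linalg.norm(arr - centroid, axis=1))
        kept[unit] = [vecs[i] for i in order[:n_keep]]
    return kept
```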


International Conference on Acoustics, Speech, and Signal Processing | 1997

Recent improvements on Microsoft's trainable text-to-speech system-Whistler

Xuedong Huang; Alex Acero; Hsiao-Wuen Hon; Yun-Cheng Ju; Jingsong Liu; Scott Meredith; Mike Plumpe

The Whistler text-to-speech engine was designed so that the model parameters can be constructed automatically from training data. This paper focuses on improvements in prosody and acoustic modeling, which are all derived through the use of probabilistic learning methods. Whistler can produce synthetic speech that sounds very natural and resembles the acoustic and prosodic characteristics of the original speaker. The underlying technologies used in Whistler can significantly facilitate the process of creating generic TTS systems for a new language, a new voice, or a new speech style. The Whistler TTS engine supports the Microsoft Speech API and requires less than 3 MB of working memory.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2007

Cross-lingual query suggestion using query logs of different languages

Wei Gao; Cheng Niu; Jian-Yun Nie; Ming Zhou; Jian Hu; Kam-Fai Wong; Hsiao-Wuen Hon

Query suggestion aims to suggest relevant queries for a given query, helping users better specify their information needs. Previously, the suggested terms have mostly been in the same language as the input query. In this paper, we extend this to cross-lingual query suggestion (CLQS): for a query in one language, we suggest similar or relevant queries in other languages. This is very important for cross-language information retrieval (CLIR) and cross-lingual keyword bidding in search-engine advertising. Instead of relying on existing query-translation technologies for CLQS, we present an effective means to map the input query in one language to queries in the other language in the query log. Important monolingual and cross-lingual information, such as word-translation relations and word co-occurrence statistics, is used to estimate cross-lingual query similarity with a discriminative model. Benchmarks show that the resulting CLQS system significantly outperforms a baseline system based on dictionary-based query translation. In addition, the resulting CLQS system is tested on French-to-English CLIR tasks with TREC collections. The results demonstrate higher effectiveness than traditional query-translation methods.


International Conference on Acoustics, Speech, and Signal Processing | 1990

Allophone clustering for continuous speech recognition

Kai-Fu Lee; Satoru Hayamizu; Hsiao-Wuen Hon; Cecil Huang; Jonathan Swartz; Robert Weide

Two methods are presented for subword clustering. The first is an agglomerative clustering algorithm; this method is completely data-driven and finds clusters without any external guidance. The second method uses decision trees for clustering; it uses an expert-generated list of questions about contexts and recursively selects the most appropriate question to split the allophones. Preliminary results showed that when the training set has good coverage of the allophonic variations in the test set, both methods are capable of high-performance recognition. However, under vocabulary-independent conditions, the method using tree-based allophones outperformed agglomerative clustering because of its superior generalization capability.
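The agglomerative method amounts to repeatedly merging the two most similar allophone models until a target count remains. The similarity measure actually used (typically a likelihood- or entropy-based criterion) is not stated in the abstract, so this sketch substitutes Euclidean distance between model means with count-weighted merging, an illustrative simplification:

```python
import numpy as np

def agglomerative_cluster(models, n_clusters):
    """models: dict allophone_name -> (mean_vector, count).
    Repeatedly merges the two closest clusters (Euclidean distance between
    means; merged mean is count-weighted) until n_clusters remain."""
    clusters = {k: (np.asarray(m, dtype=float), c) for k, (m, c) in models.items()}
    while len(clusters) > n_clusters:
        names = list(clusters)
        best = None
        # Exhaustive search for the closest pair of clusters.
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                d = np.linalg.norm(clusters[names[i]][0] - clusters[names[j]][0])
                if best is None or d < best[0]:
                    best = (d, names[i], names[j])
        _, a, b = best
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        # Count-weighted merge of the two models.
        clusters[a + "+" + b] = ((ma * ca + mb * cb) / (ca + cb), ca + cb)
    return clusters
```

Because the merge order is driven only by the data, no expert question list is needed, which is exactly the trade-off the paper contrasts with tree-based clustering.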


International Conference on Acoustics, Speech, and Signal Processing | 1990

On semi-continuous hidden Markov modeling

Xuedong Huang; Kai-Fu Lee; Hsiao-Wuen Hon

The semicontinuous hidden Markov model is used in a 1000-word speaker-independent continuous speech recognition system and compared with the continuous mixture model and the discrete model. When the acoustic parameters are not well modeled by the continuous probability density, it is observed that modeling-assumption problems may cause the recognition accuracy of the semicontinuous model to fall below that of the discrete model. A simple method based on the semicontinuous model is investigated to re-estimate the vector-quantization codebook without continuous probability density function assumptions. Preliminary experiments show that such re-estimation methods are as effective as the semicontinuous model, especially when the continuous probability density function assumption is inappropriate.
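The distinguishing feature of the semicontinuous HMM is that all states share one codebook of Gaussian densities, and each state keeps only a discrete weight vector over the codewords; its output probability is the weighted sum over the most likely codewords. A minimal sketch of that emission computation, assuming diagonal covariances and a top-M pruning value (both common simplifications, not specifics from the paper):

```python
import numpy as np

def gaussian_density(x, mean, var):
    """Diagonal-covariance Gaussian density at point x."""
    d = len(mean)
    diff = x - mean
    return np.exp(-0.5 * np.sum(diff * diff / var)) / np.sqrt((2 * np.pi) ** d * np.prod(var))

def semi_continuous_emission(x, codebook, state_weights, top_m=4):
    """Semi-continuous output probability: the state mixes the M most likely
    codewords of a SHARED Gaussian codebook with its own discrete weights.

    codebook: list of (mean, var) pairs shared by all states.
    state_weights: this state's discrete distribution over codewords.
    """
    densities = np.array([gaussian_density(x, m, v) for m, v in codebook])
    top = np.argsort(densities)[-top_m:]  # prune to the best M codewords
    return float(np.sum(densities[top] * np.asarray(state_weights)[top]))
```

The discrete model is the limiting case where only the single best codeword is kept with density replaced by an indicator, which is why the semicontinuous model can degrade toward it when the Gaussian assumption fits poorly.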

Collaboration


Dive into Hsiao-Wuen Hon's collaborations.

Top Co-Authors

Kai-Fu Lee
Carnegie Mellon University