Man-Hung Siu
BBN Technologies
Publications
Featured research published by Man-Hung Siu.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1991
Herbert Gish; Man-Hung Siu; R. Rohlicek
A method for segregating speech from speakers engaged in dialogs is described. The method, assuming no prior knowledge of the speakers, employs a distance measure between speech segments used in conjunction with a clustering algorithm to perform the segregation. Properties of the distance measure are discussed, and an air traffic control application is presented.
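A minimal sketch of the segment-clustering idea: each segment is modeled as a single Gaussian, a generalized-likelihood-ratio distance compares segments, and a greedy agglomerative loop merges the closest pair until the desired number of speakers remains. This is a simplification under assumed details (1-D features, toy data, the `glr_distance` and `cluster` helpers are illustrative names), not the authors' implementation:

```python
import math
import random

def glr_distance(x, y):
    """Generalized-likelihood-ratio distance between two segments, each
    modeled as a single 1-D Gaussian (the real system uses
    multi-dimensional feature sequences)."""
    def nll(seg):  # negative log-likelihood of seg under its own ML Gaussian
        n = len(seg)
        mu = sum(seg) / n
        var = sum((v - mu) ** 2 for v in seg) / n + 1e-9
        return 0.5 * n * (math.log(2 * math.pi * var) + 1)
    # pooled model can only fit worse, so the distance is >= 0
    return nll(x + y) - nll(x) - nll(y)

def cluster(segments, n_clusters=2):
    """Greedy agglomerative clustering on the GLR distance."""
    clusters = [[s] for s in segments]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = sum(clusters[i], [])   # concatenate member segments
                b = sum(clusters[j], [])
                d = glr_distance(a, b)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]         # merge the closest pair
        del clusters[j]
    return clusters

random.seed(0)
# two hypothetical "speakers" with well-separated feature means
spk_a = [[random.gauss(0.0, 1.0) for _ in range(50)] for _ in range(3)]
spk_b = [[random.gauss(4.0, 1.0) for _ in range(50)] for _ in range(3)]
groups = cluster(spk_a + spk_b, n_clusters=2)
```

With no training set, the only supervision is the stopping criterion (here, a known cluster count); the paper's sequential variant reuses decisions from earlier stages instead.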
IEEE Journal on Selected Areas in Communications | 2007
Tao Li; Wai Ho Mow; Vincent Kin Nang Lau; Man-Hung Siu; Roger Shu Kwan Cheng; Ross David Murch
Cognitive radio technology facilitates spectrum reuse and alleviates spectrum crunch. One fundamental problem in cognitive radio is to avoid the interference caused by other communication systems sharing the same frequency band. However, spectrum sensing cannot guarantee accurate detection of the interference in many practical situations. Hence, it is crucial to design robust receivers to combat the in-band interference. In this paper, we first present a simple pilot-aided interference detection method. To combat the residual interference that cannot be detected by the interference detector, we further propose a robust joint interference detection and decoding scheme. By exploiting the code structure in interference detection, the proposed scheme can successfully detect most of the interfered symbols without requiring knowledge of the interference distribution. Our simulation results show that, even without any prior knowledge of the interference distribution, the proposed joint interference detection and decoding scheme is able to achieve a performance close to that of the maximum likelihood decoder with full knowledge of the interference distribution.
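The pilot-aided detection step can be sketched very simply: known pilot symbols are interleaved with data, and any pilot whose received value strays too far from its known value flags an interference hit. Everything below (QPSK signaling, the pilot spacing, the 0.5 threshold, the burst model) is an illustrative assumption, not the paper's setup:

```python
import cmath
import random

random.seed(1)

# QPSK constellation points at 45/135/225/315 degrees
QPSK = [cmath.exp(1j * cmath.pi * (k / 2 + 0.25)) for k in range(4)]

def detect_interference(rx, pilots, pilot_pos, threshold=0.5):
    """Flag pilot positions where the received symbol deviates from the
    known pilot value by more than `threshold` in squared distance.
    The threshold here is illustrative, not a tuned value."""
    return [abs(rx[pos] - p) ** 2 > threshold
            for p, pos in zip(pilots, pilot_pos)]

# hypothetical frame layout: a known pilot every 4th symbol
n = 32
pilot_pos = list(range(0, n, 4))
pilots = [QPSK[0]] * len(pilot_pos)
tx = [random.choice(QPSK) for _ in range(n)]
for pos in pilot_pos:
    tx[pos] = QPSK[0]

# mild AWGN everywhere, plus a strong interference burst over symbols 12..19
rx = [s + complex(random.gauss(0, 0.05), random.gauss(0, 0.05)) for s in tx]
for i in range(12, 20):
    rx[i] += 2.0 * cmath.exp(1j * random.uniform(0, 2 * cmath.pi))

flags = detect_interference(rx, pilots, pilot_pos)
```

Interference falling between pilots is invisible to this detector, which is exactly the residual the paper's joint detection-and-decoding stage is designed to catch via the code structure.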
IEEE Transactions on Speech and Audio Processing | 2000
Man-Hung Siu; Mari Ostendorf
Recent progress in variable n-gram language modeling provides an efficient representation of n-gram models and makes training of higher-order n-grams possible. We apply the variable n-gram design algorithm to conversational speech, extending the algorithm to learn skips and context-dependent classes to handle conversational speech characteristics such as filler words, repetitions, and other disfluencies. Experiments show that using the extended variable n-gram results in a language model that captures 4-gram context with less than half the parameters of a standard trigram while also improving the test perplexity and recognition accuracy.
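The core of a variable n-gram design can be sketched as a keep-or-prune decision: a longer context is retained only if its predictive distribution differs enough from its backoff (shorter) context, measured by a count-weighted KL divergence. This is a generic relative-entropy criterion in the spirit of the algorithm, with an illustrative threshold and toy corpus, not the paper's exact recipe:

```python
import math
from collections import Counter

def kl(p, q):
    """KL divergence in nats; assumes q covers p's support (true here,
    since every observed trigram implies its backoff bigram)."""
    return sum(pv * math.log(pv / q[w]) for w, pv in p.items() if pv > 0)

def conditional(counts, context):
    """ML conditional distribution P(w | context) from tuple-keyed counts."""
    total = sum(c for ng, c in counts.items() if ng[:-1] == context)
    return {ng[-1]: c / total for ng, c in counts.items() if ng[:-1] == context}

def keep_context(counts, context, threshold=0.05):
    """Keep the longer context only when its predictive distribution
    differs enough (count-weighted KL) from the backoff context.
    The threshold is illustrative, not a tuned value."""
    full = conditional(counts, context)
    backoff = conditional(counts, context[1:])
    n_ctx = sum(c for ng, c in counts.items() if ng[:-1] == context)
    return n_ctx * kl(full, backoff) > threshold

corpus = "the cat sat on the mat the cat ate the rat".split()
counts = dict(Counter(zip(corpus, corpus[1:])))            # bigram counts
counts.update(Counter(zip(corpus, corpus[1:], corpus[2:])))  # trigram counts
```

Here "on the" is kept (it pins the next word to "mat", unlike "the" alone), while "the cat" is pruned ("cat" alone predicts the same sat/ate split). The skip and class extensions in the paper enlarge the candidate context set that this same criterion scores.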
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1992
Man-Hung Siu; George Yu; Herbert Gish
The authors present a method for segmenting speech waveforms containing several speakers into utterances, each from one individual, and then identifying each utterance as coming from a specific individual or group of individuals. The procedure is unsupervised in that there is no training set, and sequential in that information obtained in early stages of the process is utilized in later stages.
ACM Transactions on Speech and Language Processing | 2007
Ivan Bulyko; Mari Ostendorf; Man-Hung Siu; Tim Ng; Andreas Stolcke; Özgür Çetin
This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
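Once the filtered web documents are collected, the sources are combined by linear interpolation with weights tuned on held-out in-domain data. A textbook two-source EM sketch (unigram models, add-0.5 smoothing over a nominal vocabulary, and the toy "conversational" and "web" strings are all illustrative assumptions):

```python
from collections import Counter

def unigram(tokens, vocab_size=1000):
    """Add-0.5 smoothed unigram model over a nominal vocabulary size
    (the smoothing and vocabulary size are illustrative choices)."""
    n = len(tokens)
    c = Counter(tokens)
    return lambda w: (c[w] + 0.5) / (n + 0.5 * vocab_size)

def em_weight(model_a, model_b, heldout, iters=20):
    """Textbook EM for the interpolation weight of model_a in a
    two-way linear mixture: E-step computes the posterior that each
    held-out token came from model_a; M-step averages the posteriors."""
    lam = 0.5
    for _ in range(iters):
        post = []
        for w in heldout:
            a = lam * model_a(w)
            b = (1 - lam) * model_b(w)
            post.append(a / (a + b))
        lam = sum(post) / len(post)
    return lam

# hypothetical sources: conversational in-domain text vs. filtered web text
in_domain = "uh i mean you know it was like really good you know".split()
web_text = "the results show that the proposed method is good".split()
heldout = "you know it was good i mean".split()

lam = em_weight(unigram(in_domain), unigram(web_text), heldout)
```

The held-out set's conversational fillers pull the weight toward the in-domain source; in the article's setting the same machinery lets a small amount of costly in-domain data dominate while the web collection fills coverage gaps.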
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1993
Jan Robin Rohlicek; Philippe Jeanrenaud; Kenney Ng; Herbert Gish; B. Musicus; Man-Hung Siu
The authors present a view of HMM (hidden Markov model)-based word spotting systems as described by three main components: the HMM acoustic model; the overall HMM structure, including nonkeyword modeling; and the keyword scoring method. They investigate and present comparative results for various approaches to each of these components and show that design choices for these components can be addressed separately. They also present a novel approach to word spotting that combines phonetic training, large vocabulary modeling, and statistical language modeling with a posterior probability approach to keyword scoring. They perform word spotting experiments using telephone quality conversational speech from the Switchboard corpus to examine the effect of different design choices for the three components and demonstrate that the proposed approach provides superior performance to previously used techniques.
Computer Speech & Language | 1999
Man-Hung Siu; Herbert Gish
Confidence measures enable us to assess the output of a speech recognition system. The confidence measure provides us with an estimate of the probability that a word in the recognizer output is either correct or incorrect. In this paper we discuss ways in which to quantify the performance of confidence measures in terms of their discrimination power and bias. In particular, we analyze two different performance metrics: the classification equal error rate and the normalized mutual information metric. We then report experimental results of using these metrics to compare four different confidence measure estimation schemes. We also discuss the relationship between these metrics and the operating point of the speech recognition system and develop an approach to the robust estimation of normalized mutual information.
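Both metrics are easy to compute from a list of per-word confidence scores and correct/incorrect labels. A minimal sketch (the 4-way binning, the threshold sweep, and the toy scores are illustrative choices, and it assumes both correct and incorrect words are present):

```python
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def normalized_mi(confidences, correct, bins=4):
    """Mutual information between the binned confidence score and the
    correct/incorrect label, normalized by the label entropy, so a
    perfectly informative measure scores 1 and a useless one scores 0."""
    n = len(correct)
    p_c = sum(correct) / n
    h_c = entropy([p_c, 1 - p_c])
    joint = {}
    for s, c in zip(confidences, correct):
        b = min(int(s * bins), bins - 1)
        joint[(b, c)] = joint.get((b, c), 0) + 1
    mi = 0.0
    for (b, c), cnt in joint.items():
        p_bc = cnt / n
        p_b = sum(v for (bb, _), v in joint.items() if bb == b) / n
        p_cc = p_c if c else 1 - p_c
        mi += p_bc * math.log2(p_bc / (p_b * p_cc))
    return mi / h_c

def equal_error_rate(confidences, correct):
    """Sweep an accept threshold over the scores; return the point where
    the false-accept and false-reject rates are (approximately) equal."""
    pairs = sorted(zip(confidences, correct))
    n_pos = sum(correct)
    n_neg = len(correct) - n_pos
    fa, fr = n_neg, 0            # threshold below everything: accept all
    best = 1.0
    for _, c in pairs:
        if c:
            fr += 1              # a correct word is now rejected
        else:
            fa -= 1              # an incorrect word is now rejected
        best = min(best, max(fa / n_neg, fr / n_pos))
    return best

# toy scores: a perfectly discriminating confidence measure
confidences = [0.9, 0.8, 0.85, 0.2, 0.1, 0.3]
correct = [1, 1, 1, 0, 0, 0]
eer = equal_error_rate(confidences, correct)
nmi = normalized_mi(confidences, correct)
```

Note the contrast the paper draws: the equal error rate fixes one operating point, while the mutual-information metric summarizes the whole score distribution, which is why its robust estimation matters.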
2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006
Lu-feng Zhai; Man-Hung Siu; Xi Yang; Herbert Gish
In this paper, we explore the use of support vector machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance related issues of class priors, data imbalance, feature weighting, score normalization and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that the SVM-trained n-grams are effective classifiers but they are sensitive to changes in prior class distributions. Using balanced prior distributions or score normalization procedures, the SVM-trained n-gram outperformed the traditional n-gram in parallel phoneme recognition with language model and GMM-UBM-based language identification systems by more than 30% relative error reduction on the OGI-TS corpus.
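The basic recipe (n-gram count features fed to a linear SVM) can be sketched in pure Python with a Pegasos sub-gradient trainer standing in for a real SVM package, and character bigrams standing in for phone n-grams. The two toy "languages" and every name below are hypothetical, and the balanced three-and-three training set sidesteps the class-prior sensitivity the paper analyzes:

```python
import random
from collections import Counter

def bigram_features(text, vocab):
    """Relative-frequency bigram counts as a fixed-length feature vector."""
    c = Counter(zip(text, text[1:]))
    total = sum(c.values()) or 1
    return [c[b] / total for b in vocab]

def train_svm(X, y, lam=0.01, epochs=200):
    """Pegasos sub-gradient training of a linear SVM (a minimal stand-in
    for the SVM toolkit used in the paper)."""
    random.seed(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in random.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1 - eta * lam) * wj for wj in w]      # regularization shrink
            if margin < 1:                              # hinge-loss step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# toy "languages": vowel-heavy vs. consonant-heavy strings (hypothetical data)
lang_a = ["aeaeoaie", "aoeiaoea", "eaioeaea"]
lang_b = ["strbrkst", "krstbrtk", "brkstrst"]
texts = lang_a + lang_b
labels = [+1] * 3 + [-1] * 3
vocab = sorted({b for t in texts for b in zip(t, t[1:])})
X = [bigram_features(t, vocab) for t in texts]
w = train_svm(X, labels)

def predict(text):
    score = sum(wj * xj for wj, xj in zip(w, bigram_features(text, vocab)))
    return 1 if score > 0 else -1
```

In the paper's setting the raw SVM margin is what needs score normalization before fusing with GMM-UBM scores; this sketch only shows the hard decision.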
Computer Speech & Language | 2014
Man-Hung Siu; Herbert Gish; Arthur Chan; William Belfield; Steve Lowe
We present our approach to unsupervised training of speech recognizers. Our approach iteratively adjusts sound units that are optimized for the acoustic domain of interest. We thus enable the use of speech recognizers for applications in speech domains where transcriptions do not exist. The resulting recognizer is a state-of-the-art recognizer on the optimized units. Specifically we propose building HMM-based speech recognizers without transcribed data by formulating the HMM training as an optimization over both the parameter and transcription sequence space. Audio is then transcribed into these self-organizing units (SOUs). We describe how SOU training can be easily implemented using existing HMM recognition tools. We tested the effectiveness of SOUs on the task of topic classification on the Switchboard and Fisher corpora. On the Switchboard corpus, the unsupervised HMM-based SOU recognizer, initialized with a segmental tokenizer, performed competitively with an HMM-based phoneme recognizer trained with 1h of transcribed data, and outperformed the Brno University of Technology (BUT) Hungarian phoneme recognizer (Schwartz et al., 2004). We also report improvements, including the use of context dependent acoustic models and lattice-based features, that together reduce the topic verification equal error rate from 12% to 7%. In addition to discussing the effectiveness of the SOU approach, we describe how we analyzed some selected SOU n-grams and found that they were highly correlated with keywords, demonstrating the ability of the SOU technology to discover topic relevant keywords.
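The segmental-tokenizer initialization can be caricatured as clustering frames into units and reading off the label sequence; the HMM re-estimation over both parameters and transcriptions then refines these units. The sketch below covers only the initialization step, with 1-D k-means on synthetic "frames" standing in for clustering of real cepstral segments (all data and names are illustrative):

```python
import random

def kmeans(frames, k, iters=25):
    """Plain 1-D k-means, initialized by evenly spaced sorted frames;
    stands in for the segmental clustering that seeds the
    self-organizing units (SOUs)."""
    step = max(1, len(frames) // k)
    centers = sorted(frames)[::step][:k]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in frames:
            j = min(range(k), key=lambda j: (f - centers[j]) ** 2)
            buckets[j].append(f)
        centers = [sum(b) / len(b) if b else centers[j]
                   for j, b in enumerate(buckets)]
    return centers

def tokenize(frames, centers):
    """Label each frame with its nearest unit, collapsing runs of the
    same label into one segment-level SOU token."""
    labels = [min(range(len(centers)),
                  key=lambda j: (f - centers[j]) ** 2) for f in frames]
    tokens = [labels[0]]
    for l in labels[1:]:
        if l != tokens[-1]:
            tokens.append(l)
    return tokens

random.seed(1)
# synthetic "audio": three acoustic regions with well-separated means
frames = ([random.gauss(0, 0.3) for _ in range(20)]
          + [random.gauss(3, 0.3) for _ in range(20)]
          + [random.gauss(6, 0.3) for _ in range(20)])
centers = kmeans(frames, k=3)
tokens = tokenize(frames, centers)
```

The resulting token streams play the role of transcriptions for the next HMM training round, and their n-gram statistics are what feed the topic classifiers described above.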
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 1992
Jan Robin Rohlicek; D. Ayuso; M. Bates; Robert J. Bobrow; Albert Boulanger; Herbert Gish; Philippe Jeanrenaud; Marie Meteer; Man-Hung Siu
A novel system for extracting information from stereotyped voice traffic is described. Off-the-air recordings of commercial air traffic control communications are interpreted in order to identify the flights present and determine the scenario (e.g., takeoff, landing) that they are following. The system combines algorithms from signal segmentation, speaker segregation, speech recognition, natural language parsing, and topic classification into a single system. Initial evaluation of the algorithm on data recorded at Dallas-Fort Worth airport yields performance of 68% detection of flights with 98% precision at an operating point where 76% of the flight identifications are correctly recognized. In tower recordings containing both takeoff and landing scenarios, flights are correctly classified as takeoff or landing 94% of the time.