Herbert Gish
BBN Technologies
Publications
Featured research published by Herbert Gish.
International Conference on Acoustics, Speech, and Signal Processing | 1993
Herbert Gish; Kenney Ng
The authors present a segmental speech model that explicitly models the dynamics in a variable-duration speech segment by using a time-varying trajectory model of the speech features in the segment. Each speech segment is represented by a set of statistics which includes a time-varying trajectory, a residual error covariance around the trajectory, and the number of frames in the segment. These statistics replace the frames in the segment and become the data that are modeled by either HMMs (hidden Markov models) or mixture models. This segment model is used to develop a secondary processing algorithm that rescores putative events hypothesized by a primary HMM word spotter to try to improve performance by discriminating true keywords from false alarms. This algorithm is evaluated on a keyword spotting task using the Road Rally Database, and performance is shown to improve significantly over that of the primary word spotter. The segmental model is also used on a TIMIT vowel classification task to evaluate its modeling capability.
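A minimal sketch of the segment statistics described above, assuming a linear trajectory over normalized time; the function name and this NumPy implementation are illustrative, not the authors' code:

```python
import numpy as np

def segment_statistics(frames: np.ndarray):
    """Summarize a variable-length speech segment (frames: T x D feature
    matrix) by a time-varying trajectory, the residual covariance around
    that trajectory, and the frame count, per the segmental model idea."""
    T, D = frames.shape
    # Design matrix for a linear trajectory over normalized time [0, 1].
    t = np.linspace(0.0, 1.0, T)
    Z = np.column_stack([np.ones(T), t])          # T x 2
    # Least-squares fit: B is 2 x D (intercept and slope per feature dim).
    B, *_ = np.linalg.lstsq(Z, frames, rcond=None)
    residuals = frames - Z @ B                    # T x D
    resid_cov = np.cov(residuals, rowvar=False)   # D x D residual covariance
    return B, resid_cov, T
```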
International Conference on Acoustics, Speech, and Signal Processing | 1992
Man-Hung Siu; George Yu; Herbert Gish
The authors present a method for segmenting speech waveforms containing several speakers into utterances, each from one individual, and then identifying each utterance as coming from a specific individual or group of individuals. The procedure is unsupervised in that there is no training set, and sequential in that information obtained in early stages of the process is utilized in later stages.
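The abstract does not spell out the segmentation criterion; one classic ingredient for such unsupervised, sequential procedures is a Gaussian likelihood-ratio score between adjacent windows, sketched below (NumPy, illustrative only):

```python
import numpy as np

def gaussian_loglik(X: np.ndarray) -> float:
    """Data-dependent part of the log-likelihood of frames X (N x D) under
    a single full-covariance Gaussian fit to X itself; the remaining
    constant terms cancel in the ratio computed below."""
    N, D = X.shape
    cov = np.cov(X, rowvar=False, bias=True) + 1e-6 * np.eye(D)
    sign, logdet = np.linalg.slogdet(cov)
    return -0.5 * N * logdet

def change_score(left: np.ndarray, right: np.ndarray) -> float:
    """Likelihood-ratio score for a speaker change between two adjacent
    windows: high when modeling them separately beats a pooled model."""
    both = np.vstack([left, right])
    return gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(both)
```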
International Conference on Acoustics, Speech, and Signal Processing | 1990
Herbert Gish
The authors present a robust discrimination technique that neither eliminates outliers nor performs robust estimation of model parameters, but instead applies robustness techniques directly in the discrimination process. The performance of robust and nonrobust methods is presented and compared. The database used for the experiments contains 16 speakers, each recorded on 10 different occasions over long-distance telephone circuits.
Computer Speech & Language | 1999
Man-Hung Siu; Herbert Gish
Confidence measures enable us to assess the output of a speech recognition system. The confidence measure provides us with an estimate of the probability that a word in the recognizer output is either correct or incorrect. In this paper we discuss ways in which to quantify the performance of confidence measures in terms of their discrimination power and bias. In particular, we analyze two different performance metrics: the classification equal error rate and the normalized mutual information metric. We then report experimental results of using these metrics to compare four different confidence measure estimation schemes. We also discuss the relationship between these metrics and the operating point of the speech recognition system and develop an approach to the robust estimation of normalized mutual information.
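As a toy illustration of the two metrics, the sketch below thresholds word confidences into accept/reject decisions, then computes the two error rates (whose crossing point defines the equal error rate) and the mutual information between decision and correctness, normalized by the entropy of the correctness labels; all names are hypothetical:

```python
import numpy as np

def confidence_metrics(conf: np.ndarray, correct: np.ndarray, thresh: float):
    """conf: per-word confidence scores; correct: boolean correctness labels.
    Returns (false-alarm rate, miss rate, normalized mutual information)."""
    accept = conf >= thresh
    # False alarm: accepting an incorrect word; miss: rejecting a correct one.
    p_fa = np.mean(accept[~correct]) if (~correct).any() else 0.0
    p_miss = np.mean(~accept[correct]) if correct.any() else 0.0

    # Empirical joint distribution of (decision, truth) for mutual information.
    joint = np.array([[np.mean(accept & correct), np.mean(accept & ~correct)],
                      [np.mean(~accept & correct), np.mean(~accept & ~correct)]])
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    mi = sum(joint[i, j] * np.log(joint[i, j] / (px[i] * py[j]))
             for i in range(2) for j in range(2) if joint[i, j] > 0)
    h_truth = -sum(p * np.log(p) for p in py if p > 0)
    return p_fa, p_miss, (mi / h_truth if h_truth > 0 else 0.0)
```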
International Conference on Acoustics, Speech, and Signal Processing | 2004
Richard M. Schwartz; Thomas Colthurst; Nicolae Duta; Herbert Gish; Rukmini Iyer; Chia-Lin Kao; Daben Liu; Owen Kimball; Jeff Z. Ma; John Makhoul; Spyros Matsoukas; Long Nguyen; Mohammed Noamany; Rohit Prasad; Bing Xiang; Dongxin Xu; Jean-Luc Gauvain; Lori Lamel; Holger Schwenk; Gilles Adda; Langzhou Chen
We report on the results of the first evaluations of the BBN/LIMSI system under the new DARPA EARS program. The evaluations were carried out on conversational telephone speech (CTS) and broadcast news (BN) in three languages: English, Mandarin, and Arabic. In addition to providing system descriptions and evaluation results, the paper highlights methods that worked well across the two domains and the few that worked well in one domain but not the other. For the BN evaluations, which had to run in under 10 times real time, we demonstrated that a joint BBN/LIMSI system built under that time constraint achieved better results than either system alone.
2006 IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006
Lu-feng Zhai; Man-Hung Siu; Xi Yang; Herbert Gish
In this paper, we explore the use of support vector machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance-related issues of class priors, data imbalance, feature weighting, score normalization, and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that the SVM-trained n-grams are effective classifiers but are sensitive to changes in prior class distributions. Using balanced prior distributions or score normalization procedures, the SVM-trained n-gram outperformed the traditional n-gram in both parallel phoneme recognition with language modeling and GMM-UBM-based language identification systems, with more than 30% relative error reduction on the OGI-TS corpus.
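A minimal sketch of the general recipe, using scikit-learn as a stand-in for the authors' tooling: n-gram counts over phone-token strings as features, with a linear SVM on top. The `phone_strings` and `labels` data are hypothetical placeholders, and `class_weight='balanced'` is just one way to address the imbalance issue raised above, not necessarily the paper's exact count modification:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical data: each utterance is a space-separated phone-token string
# (e.g. output of a phone recognizer), labeled with its language.
phone_strings = ["aa b k aa t", "ih ng g ih sh"]   # placeholders
labels = ["english", "mandarin"]

# Bag-of-n-grams over phone tokens (unigrams + bigrams); token_pattern keeps
# single-character phone symbols that the default tokenizer would drop.
model = make_pipeline(
    CountVectorizer(analyzer="word", ngram_range=(1, 2), token_pattern=r"\S+"),
    LinearSVC(class_weight="balanced"),
)
model.fit(phone_strings, labels)
print(model.predict(["aa b k aa t"]))
```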
Computer Speech & Language | 2014
Man-Hung Siu; Herbert Gish; Arthur Chan; William Belfield; Steve Lowe
We present our approach to the unsupervised training of speech recognizers. Our approach iteratively adjusts sound units that are optimized for the acoustic domain of interest, enabling the use of speech recognizers in speech domains where transcriptions do not exist. The resulting recognizer is state of the art on the optimized units. Specifically, we propose building HMM-based speech recognizers without transcribed data by formulating HMM training as an optimization over both the parameter space and the transcription-sequence space. Audio is then transcribed into these self-organizing units (SOUs). We describe how SOU training can be easily implemented using existing HMM recognition tools. We tested the effectiveness of SOUs on the task of topic classification on the Switchboard and Fisher corpora. On the Switchboard corpus, the unsupervised HMM-based SOU recognizer, initialized with a segmental tokenizer, performed competitively with an HMM-based phoneme recognizer trained on one hour of transcribed data, and outperformed the Brno University of Technology (BUT) Hungarian phoneme recognizer (Schwartz et al., 2004). We also report improvements, including the use of context-dependent acoustic models and lattice-based features, that together reduce the topic verification equal error rate from 12% to 7%. In addition to discussing the effectiveness of the SOU approach, we describe how we analyzed selected SOU n-grams and found that they were highly correlated with keywords, demonstrating the ability of the SOU technology to discover topic-relevant keywords.
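As a toy stand-in for the alternating optimization (a k-means-style simplification, not the authors' full HMM machinery), the sketch below treats each unit as a single Gaussian state and alternates between relabeling frames (the transcription step) and re-estimating unit parameters (the training step):

```python
import numpy as np

def toy_sou_training(features: np.ndarray, n_units: int, n_iters: int = 10):
    """Toy illustration of alternating optimization over transcriptions and
    parameters: (1) assign every frame to its best unit (the 'transcription'
    step), then (2) re-estimate each unit's mean from its frames (the
    'training' step). A real SOU system drives full HMM training/decoding."""
    rng = np.random.default_rng(0)
    means = features[rng.choice(len(features), n_units, replace=False)]
    for _ in range(n_iters):
        # Transcription step: label each frame with its closest unit.
        dists = ((features[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Parameter step: re-estimate unit means from the current labels.
        for u in range(n_units):
            if (labels == u).any():
                means[u] = features[labels == u].mean(axis=0)
    return means, labels
```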
International Conference on Acoustics, Speech, and Signal Processing | 1986
Herbert Gish; M. Krasner; W. Russell; Jared J. Wolf
We consider methods for text-independent speaker identification that deal with the variability in the data introduced by unknown telephone channels. The methods investigated include probabilistic channel modeling, a channel-invariant model, and a modified-Gaussian model. The methods are described and then evaluated in experiments conducted on a twenty-speaker database of long-distance telephone calls.
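For concreteness, one classic channel-compensation idea in this vein, though not necessarily among the three methods evaluated in this paper, is cepstral mean subtraction: a linear, time-invariant channel multiplies the spectrum and thus adds a near-constant vector in the cepstral domain, so removing each utterance's cepstral mean cancels it:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra: np.ndarray) -> np.ndarray:
    """cepstra: T x D matrix of cepstral frames for one utterance.
    Subtracting the per-utterance mean removes the additive cepstral
    offset contributed by a fixed linear telephone channel."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```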
International Conference on Acoustics, Speech, and Signal Processing | 1985
Herbert Gish; Kenneth F. Karnofsky; M. Krasner; S. Roucos; Richard M. Schwartz; Jared J. Wolf
In this paper, we examine several methods for text-independent speaker identification of telephone speech with limited-duration data. The issue addressed is the assessment of channel characteristics, especially linear aspects, and methods for improving speaker identification performance when the speaker to be identified is on a different telephone channel from the one used for training. We show experimental evidence illustrating the cross-channel problem and also show that the direct approach of using simple channel-invariant features can discard much speaker-dependent information. The methods we have found to be most effective rely on the training process to incorporate channel variability.
International Conference on Acoustics, Speech, and Signal Processing | 1993
George Yu; Herbert Gish
The approach developed is based on the robust evaluation of likelihoods computed from speech segments. The method shows that speakers can be identified with minimal loss of performance in the presence of large amounts of undesired speech. The authors consider the case where there are models for only one of the two speakers and the case where one is interested in identifying both speakers. The role that clustering can play in this problem is also discussed. It is demonstrated that robust identification methods can be very effective in obtaining high correct-identification rates. The methods presented should be applicable to situations with multiple (more than two) speakers engaged in a conference, as well as to speaker identification situations where interference is a problem. Clustering is seen to be a viable alternative to robust methods and can be effective without normalization.
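The paper's exact robust scoring is not detailed in this abstract; one simple scheme in the same spirit, sketched below with a hypothetical `keep_fraction` knob, sums only the best-scoring fraction of per-segment log-likelihoods so that segments dominated by undesired speech are excluded:

```python
import numpy as np

def robust_speaker_score(segment_logliks: np.ndarray,
                         keep_fraction: float = 0.5) -> float:
    """Score a conversation for a target speaker from per-segment
    log-likelihoods under that speaker's model, keeping only the
    best-scoring fraction of segments (a trimmed-likelihood idea)."""
    k = max(1, int(keep_fraction * len(segment_logliks)))
    top = np.sort(segment_logliks)[-k:]   # the k most model-consistent segments
    return float(top.sum())
```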