Marcia A. Bush
Fairchild Semiconductor International, Inc.
Publications
Featured research published by Marcia A. Bush.
IEEE Transactions on Acoustics, Speech, and Signal Processing | 1985
Gary E. Kopec; Marcia A. Bush
This paper describes a network-based approach to speaker-independent digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher for rating an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra. Recognition involves finding a minimum quantization distortion path through the network by dynamic programming. The system has been evaluated in an extensive series of speaker-independent isolated digit (one-nine, oh and zero) recognition experiments using a 225-talker, multidialect database developed by Texas Instruments (TI). The best recognizer configurations achieved accuracies of 97-99 percent on the TI database.
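The recognition step described above can be sketched in Python. This is a hypothetical toy illustration, not the paper's implementation: the `Arc`, `path_distortion`, and `recognize` names are invented, the distortion is plain squared-Euclidean distance rather than the paper's LPC spectral distortion, and the tiny codebooks stand in for trained VQ codebooks. It shows the core idea of a frame-synchronous dynamic-programming search for the minimum quantization-distortion path through a pronunciation network.

```python
import numpy as np

class Arc:
    """A network arc: a segment class with its VQ codebook."""
    def __init__(self, src, dst, codebook):
        self.src, self.dst = src, dst
        self.codebook = np.asarray(codebook, dtype=float)

    def distortion(self, frame):
        # Rate one input frame against this segment class: distance to the
        # nearest codeword (squared Euclidean here; the paper quantizes
        # LPC spectra with a spectral distortion measure).
        return float(np.min(np.sum((self.codebook - frame) ** 2, axis=1)))

def path_distortion(frames, arcs, n_states, start=0, final=None):
    """Frame-synchronous DP: minimum cumulative VQ distortion of any path
    from `start` to `final` that consumes every input frame.
    Self-loop arcs (src == dst) let a segment absorb several frames."""
    final = n_states - 1 if final is None else final
    INF = float("inf")
    cost = [INF] * n_states
    cost[start] = 0.0
    for frame in frames:
        new = [INF] * n_states
        for a in arcs:
            if cost[a.src] < INF:
                c = cost[a.src] + a.distortion(frame)
                if c < new[a.dst]:
                    new[a.dst] = c
        cost = new
    return cost[final]

def recognize(frames, networks):
    """Pick the digit whose network yields the least total distortion.
    `networks` maps a digit label to an (arcs, n_states) pair."""
    return min(networks, key=lambda d: path_distortion(frames, *networks[d]))
```

With toy two-segment models (segment "a" near [0, 0], segment "b" near [1, 1]), an utterance whose frames move from "a" to "b" scores lowest on the network that concatenates those segments in that order.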
international conference on acoustics, speech, and signal processing | 1985
Marcia A. Bush; Gary E. Kopec
This paper describes a network-based approach to speaker-independent connected digit recognition. The digits are modeled by a pronunciation network whose arcs represent classes of acoustic-phonetic segments. Each arc is associated with a matcher for rating an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra and the use of gross acoustic features for pruning. Recognition involves finding a minimum quantization distortion path through the network by dynamic programming. The system has been evaluated using a portion of a large multi-dialect database developed by Texas Instruments (TI). Using a baseline network of concatenated independent digit models, string and digit accuracies of 86% and 97%, respectively, have been obtained.
international conference on acoustics, speech, and signal processing | 1983
Marcia A. Bush; Gary E. Kopec; Victor W. Zue
A series of experiments was performed in order to select a set of acoustic measurements for use as input to an expert system for stop consonant recognition. In the experiments, a trained human spectrogram reader made six-way (/b,d,g,p,t,k/) classifications of syllable-initial stops using four different data representations: DFT spectrograms, LPC spectrograms, LPC spectral slices and tables of numerical measurements. Percent correct identification was 79%, 81%, 72% and 76%, respectively, for the four data sets. The relatively high performance achieved using the numerical measurements, together with other considerations for selecting input representations for expert systems, suggest that the numerical tables are the most appropriate of the four forms of input.
international conference on acoustics, speech, and signal processing | 1984
Marcia A. Bush; Gary E. Kopec; Niels Lauritzen
Two types of isolated digit recognition systems based on vector quantization were tested in a speaker-independent task. In both types of systems, a digit was modelled as a sequence of codebooks generated from segments of training data. In systems of the first type, the training and unknown utterances were simply partitioned into 1, 2 or 3 equal-length segments. Recognition involved computing the distortion when the input spectra were vector quantized using the codebook sequences. These systems are closely related to recognizers proposed by Burton et al.[1]. In systems of the second type, training segments corresponded to acoustic-phonetic units and were obtained from hand-marked data. Recognition involved generating a minimum-distortion segmentation of the unknown by dynamic programming. Accuracies approaching 96-97% were achieved by both types of systems.
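The first type of system above, in which a digit is modelled as a sequence of codebooks built from equal-length segments, can be sketched as follows. This is a minimal illustration under stated assumptions: squared-Euclidean distortion, and a few rounds of plain k-means in place of a proper LBG codebook design; the function names, codebook size, and iteration count are all invented for the example.

```python
import numpy as np

def equal_segments(frames, k):
    """Partition a frame sequence into k roughly equal-length segments."""
    return np.array_split(np.asarray(frames, dtype=float), k)

def train_codebooks(training_utterances, k, codebook_size=4, iters=10, seed=0):
    """Build one small codebook per segment position by pooling the
    corresponding segments of all training utterances and running a few
    rounds of k-means (a stand-in for LBG codebook design)."""
    rng = np.random.default_rng(seed)
    books = []
    for i in range(k):
        pool = np.vstack([equal_segments(u, k)[i] for u in training_utterances])
        size = min(codebook_size, len(pool))
        centers = pool[rng.choice(len(pool), size, replace=False)]
        for _ in range(iters):
            d = ((pool[:, None, :] - centers[None]) ** 2).sum(-1)
            lab = d.argmin(1)
            for j in range(size):
                if np.any(lab == j):
                    centers[j] = pool[lab == j].mean(0)
        books.append(centers)
    return books

def distortion(frames, books):
    """Total quantization distortion of an utterance against one digit's
    codebook sequence; recognition picks the digit with the smallest."""
    total = 0.0
    for seg, cb in zip(equal_segments(frames, len(books)), books):
        d = ((seg[:, None, :] - cb[None]) ** 2).sum(-1)
        total += d.min(1).sum()
    return total
```

Recognition then simply computes `distortion` against each digit's codebook sequence and takes the minimum, mirroring the first system type; the second type would replace the equal-length partition with a dynamic-programming segmentation into acoustic-phonetic units.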
Journal of the Acoustical Society of America | 1990
Gary N. Tajchman; Marcia A. Bush
Vowels excised from the phonetically and dialectically rich DARPA TIMIT acoustic‐phonetic database [L. Lamel et al., Proc., DARPA Speech Recognition Workshop, Rpt. No. SAIC‐86/1546, 100–109 (1986)] were presented to listeners in an identification task. The identification data were gathered both to serve as a performance benchmark for a set of automatic speech recognition experiments, and to investigate some of the acoustic and linguistic factors that influence the perception of naturally produced vowels. Vowels were presented to listeners in isolation (i.e., no context), in CVC context and in syllabic context. Differences in percent correct were calculated between no‐context and context conditions, where correct was defined as agreement with vowel transcriptions supplied with the TIMIT database. Preliminary analyses of the identification data indicate that: (1) overall improvements in identification accuracy were comparable for both CVC and syllabic contexts (approximately 11%, on average, above the overall no‐context identification accuracy of 60%); (2) improvements were approximately four times larger for lax vowels than for tense vowels; and (3) identification consistency among listeners was greater than between listeners and the “correct” phonetic transcription. Acoustic analyses of the vowel tokens are currently being performed in order to identify spectral and/or durational attributes that correlate with the identification data.
Journal of the Acoustical Society of America | 1984
Marcia A. Bush; Gary E. Kopec; Marie Hamilton
This talk will describe a network‐based system for speaker‐independent, isolated‐digit (one‐nine, oh, and zero) recognition and will discuss the results of an extensive series of system tuning and evaluation experiments. The digits are modeled by pronunciation networks whose arcs represent classes of acoustic‐phonetic segments. Each arc is associated with a matcher for rating an input speech interval as an example of the corresponding segment class. The matchers are based on vector quantization of LPC spectra. Recognition involves finding minimum quantization distortion paths through the networks by dynamic programming. The system has been tested using nearly 6000 tokens of speech by 250 talkers, including a subset of a large database developed by Texas Instruments [G. Leonard, Proc. 1984 IEEE ICASSP]. The best recognizer configurations achieved accuracies of 97–99%. Performance over 21 geographically defined talker groups included in the TI database will be discussed.
Journal of the Acoustical Society of America | 1983
Marcia A. Bush; Marie Hamilton; Kazue Hata
To date, relatively few attempts have been made to explicitly incorporate coarticulatory information in the design of automatic systems for connected digit recognition. One reason for the limited progress in this area has been the lack of a convenient description of the relevant coarticulatory phenomena, culled from systematic analyses of large amounts of real speech data. This paper describes the initial stage of an ongoing project aimed at remedying this situation. The paper is based on computer‐aided analyses of waveforms and spectrograms of 1180 isolated digits and connected digit strings, as produced by two male and two female speakers of American English. The acoustic data were supplemented with phonetic transcriptions by trained linguists. Coarticulatory phenomena are categorized according to: (1) word‐boundary effects (e.g., phoneme insertions and deletions); (2) within‐word effects (e.g., context‐dependent changes in vowel formant frequencies); and (3) consistent speaker‐dependent effects (e.g., ...
Journal of the Acoustical Society of America | 1983
Gary E. Kopec; Marcia A. Bush
Recent experiments [M. Bush et al., Proc. 1983 IEEE ICASSP] have demonstrated the ability of a trained spectrogram reader to identify initial stops in /CVb/ syllables from a table of numerical acoustic measurements with approximately 80% accuracy. This paper discusses an automatic system for discriminating between the voiceless plosives (/p,t,k/) which is based on the features and rules identified in these experiments. Ten binary features are extracted from two linear prediction spectra which are computed during the 35 ms following the consonant release. Typical features include “back‐k‐release‐spectrum” and “compact‐release‐spectrum.” The features are detected by examining the frequencies and amplitudes of the local maxima and minima of the two LPC spectra, in a manner motivated by the actions of the human spectrogram reader. A simple statistical classifier is used to combine the outputs of the ten feature detectors. The classifier was trained on the 108 /p,t,k/ tokens of the multi‐speaker corpus used in t...
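The pipeline described above, binary features detected from the local extrema of LPC spectra and combined by a simple statistical classifier, can be sketched in Python. The abstract does not define the ten features or the classifier, so the two toy features below ("compact", "back") are loose illustrations of the named feature types, and the Bernoulli naive-Bayes model is just one possible "simple statistical classifier"; every name and threshold here is an assumption.

```python
import numpy as np

def spectral_peaks(spectrum, freqs):
    """Frequencies and amplitudes of the local maxima of one spectrum."""
    s = np.asarray(spectrum, dtype=float)
    idx = [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] >= s[i + 1]]
    return [(freqs[i], s[i]) for i in idx]

def binary_features(spectrum, freqs):
    """Two illustrative binary feature detectors (placeholders for the
    paper's ten, e.g. 'compact-release-spectrum')."""
    peaks = spectral_peaks(spectrum, freqs)
    if not peaks:
        return [0, 0]
    top_f, _ = max(peaks, key=lambda p: p[1])
    compact = int(len(peaks) == 1)   # energy concentrated in one peak
    back = int(top_f < 2000.0)       # dominant peak below 2 kHz (toy rule)
    return [compact, back]

class NaiveBinaryClassifier:
    """Per-class Bernoulli model over binary feature vectors."""
    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes = sorted(set(y))
        # Laplace-smoothed probability that each feature fires, per class
        self.p = {c: (X[y == c].sum(0) + 1) / (np.sum(y == c) + 2)
                  for c in self.classes}
        return self

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        def loglik(c):
            p = self.p[c]
            return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))
        return max(self.classes, key=loglik)
```

In use, each token's two release spectra would be reduced to a binary feature vector, and the classifier trained on labelled /p,t,k/ tokens would pick the most likely consonant for a new vector.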
Journal of the Acoustical Society of America | 1982
Gary E. Kopec; Marcia A. Bush; Victor W. Zue
The adequacies of three sets of acoustic measurements for stop consonant recognition were assessed by examining the performance of a trained human spectrogram reader in a stop identification task. The data sets consisted principally of: (1) gray‐level pictures (wideband spectrograms); (2) two‐dimensional line drawings (LPC spectra and short‐time energy contours); and (3) tables of numerical measurements (e.g., frequencies, amplitudes, and bandwidths of spectral peaks). The reader's task was to identify the initial consonant in /CVb/ syllables selected from a corpus containing all combinations of the six stop consonants /b,d,g,p,t,k/ with the six vowels /i,e,ae,a,o,u/ as spoken by three male and three female talkers. The objective of the experiment was to identify an appropriate set of measurements for use in a knowledge‐based expert system for automatic stop consonant identification. Results of the experiment will be presented and discussed in the context of this objective.
Archive | 1992
Ron Cole; Lynette Hirschman; Les E. Atlas; Mary E. Beckman; Alan W. Bierman; Marcia A. Bush; Mark A. Clements; Jordan Cohen; Oscar N. Garcia; Brian A. Hanson; Hynek Hermansky; Steve Levinson; Kathy McKeown; Nelson Morgan; David G. Novick; Mari Ostendorf; Sharon L. Oviatt; Patti Price; Harvey F. Silverman; Judy Spitz; Alex Waibel; Cliff Weinstein; Steve Zahorian; Victor W. Zue