Publication


Featured research published by Mark A. Fanty.


International Conference on Acoustics, Speech, and Signal Processing | 1996

The contribution of consonants versus vowels to word recognition in fluent speech

Ronald A. Cole; Yonghong Yan; Brian Mak; Mark A. Fanty; Troy Bailey

Three perceptual experiments were conducted to test the relative importance of vowels vs. consonants to the recognition of fluent speech. Sentences were selected from the TIMIT corpus to obtain approximately equal numbers of vowels and consonants within each sentence and equal durations across the set of sentences. In experiments 1 and 2, subjects listened to (a) unaltered TIMIT sentences; (b) sentences in which all of the vowels were replaced by noise; or (c) sentences in which all of the consonants were replaced by noise. The subjects listened to each sentence five times and attempted to transcribe what they heard. The results of these experiments show that word recognition depends more on vowels than on consonants: about twice as many words are recognized when the vowels are retained in the speech. The effect was observed both when occurrences of [l], [r], [w], [y], [m], and [n] were included in the sentences (experiment 1) and when they were replaced by noise (experiment 2). Experiment 3 tested the hypothesis that vowel boundaries contain more information about the neighboring consonants than vice versa.
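The noise-replacement conditions (b) and (c) can be sketched as follows. This is a minimal illustration, not the authors' stimulus-preparation procedure: it assumes TIMIT-style phone segments given as (start sample, end sample, label) triples, a hypothetical vowel label set `VOWELS`, and uniform white noise scaled to the replaced segment's RMS level. The abstract does not specify the noise type or level.

```python
import random

# Hypothetical vowel label set (a subset of TIMIT phone labels), assumed for illustration.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw", "er",
          "ey", "ay", "oy", "aw", "ow", "ax", "ix", "axr", "ux"}

def replace_segments_with_noise(samples, segments, target, seed=0):
    """Return a copy of `samples` in which every segment whose label is in
    `target` is replaced by uniform white noise of matching expected RMS.

    `segments` is a list of (start, end, label) tuples with `end` exclusive,
    in the style of a TIMIT .PHN file.
    """
    rng = random.Random(seed)
    out = list(samples)
    for start, end, label in segments:
        if label not in target or end <= start:
            continue  # keep this segment untouched
        seg = samples[start:end]
        rms = (sum(x * x for x in seg) / len(seg)) ** 0.5
        # Uniform noise on [-a, a] has RMS a / sqrt(3), so choose a = rms * sqrt(3).
        a = rms * 3 ** 0.5
        for i in range(start, end):
            out[i] = rng.uniform(-a, a)
    return out
```

For condition (c), pass the set of consonant labels as `target` instead of `VOWELS`; the rest of the signal is copied unchanged.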


International Conference on Spoken Language Processing | 1996

Rapid unsupervised adaptation to children's speech on a connected-digit task

Daniel C. Burnett; Mark A. Fanty

We are exploring ways to rapidly adapt our neural network classifiers to new speakers and conditions using very small amounts of speech, such as one or a few words. Our approach is to perform a speaker-dependent warping of the frequency scale by selecting a Bark offset for each speaker. We choose the offset for a speaker to be the one that maximizes our recognizer's output score on the adaptation utterance. We then use the speaker's offset when evaluating all other utterances by that speaker. To test our approach, we evaluate a recognizer trained on adult speech on children's speech from the same task, both before and after adaptation to each child's voice. Using only a single digit for adaptation, we reduced the word error rate for children's speech from 9.6% to 4.2%. Using a seven-digit utterance further reduced the error rate to 3.5%.
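A minimal sketch of per-speaker offset selection is shown below. It assumes the Traunmüller approximation of the Bark scale, a hypothetical grid `OFFSETS` of candidate offsets, and a hypothetical `score_fn(utterance, offset)` standing in for the recognizer's output score on the warped features; none of these details come from the abstract. The helper `relative_reduction` makes the reported gains concrete: 9.6% to 4.2% is a relative reduction of 56.25%, and 9.6% to 3.5% is about 63.5%.

```python
def hz_to_bark(f_hz):
    # Traunmueller (1990) approximation of the Bark scale (an assumption here).
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z):
    # Exact inverse of hz_to_bark (valid for z < 26.28).
    return 1960.0 * (z + 0.53) / (26.28 - z)

def warp_centers(centers_hz, offset_bark):
    """Shift filterbank center frequencies by a constant offset on the Bark scale."""
    return [bark_to_hz(hz_to_bark(f) + offset_bark) for f in centers_hz]

def select_offset(score_fn, utterance, offsets):
    """Pick the offset whose warped front end gives the highest recognizer score.

    `score_fn` is a hypothetical callable: (utterance, offset) -> score.
    """
    return max(offsets, key=lambda off: score_fn(utterance, off))

def relative_reduction(baseline, adapted):
    """Relative error-rate reduction, e.g. 9.6% -> 4.2% gives 0.5625."""
    return (baseline - adapted) / baseline

# Hypothetical search grid: -2.0 to +2.0 Bark in steps of 0.25.
OFFSETS = [x / 4 for x in range(-8, 9)]
```

With this convention a positive offset moves every filter center up in frequency; the sign and range of offsets that suit children's voices would have to be determined empirically. Once chosen on the adaptation utterance, the same offset is reused for all of that speaker's remaining utterances.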


Human Language Technology | 1990

Spoken letter recognition

Ronald A. Cole; Mark A. Fanty

Automatic recognition of spoken letters is one of the most challenging tasks in the field of computer speech recognition. The difficulty of the task is due to the acoustic similarity of many of the letters. Accurate recognition requires the system to perform fine phonetic distinctions, such as B vs. D, B vs. P, D vs. T, T vs. G, C vs. Z, V vs. Z, M vs. N, and J vs. K. The ability to perform fine phonetic distinctions, that is, to discriminate among the minimal sound units of the language, is a fundamental unsolved problem in computer speech recognition.


International Conference on Acoustics, Speech, and Signal Processing | 1998

Accessible technology for interactive systems: a new approach to spoken language research

Ronald A. Cole; Stephen Sutton; Yonghong Yan; Pieter Vermeulen; Mark A. Fanty

In this paper, we argue for a paradigm shift in spoken language technology, from transcription tasks to interactive systems. The current paradigm evaluates speech recognition technology in terms of word recognition accuracy on large vocabulary transcription tasks, such as telephone conversations or media broadcasts. Systems are evaluated in international competitions, with strict rules for participation and well-defined evaluation metrics. Participation in these competitions is limited to a few elite laboratories that have the resources to develop and field systems. We propose a new, more productive and more accessible paradigm for spoken language research, in which research advances are evaluated in the context of interactive systems that allow people to perform useful tasks, such as accessing information from the World Wide Web while driving a car. These systems are made available for daily use by ordinary citizens through telephone networks or placement in easily accessible kiosks in public institutions. We argue that this new paradigm, which focuses on the goal of universal access to information for all people, better serves the needs of the research community, as well as the welfare of our citizens. We discuss the challenges and rewards of an interactive system approach to spoken language research, and describe our initial attempts to stimulate a paradigm shift and engage a large community of researchers through free distribution of the CSLU toolkit.


International Conference on Acoustics, Speech, and Signal Processing | 1993

City name recognition over the telephone

Mark A. Fanty; Philipp Schmid; Ronald A. Cole

The authors present a neural-network-based speech recognition system for telephone speech. A neural network classifier provides phoneme probabilities for each frame of the utterance. A dynamic programming algorithm finds the most probable sequence of words. The classifier was trained on a spoken name corpus which contained the test vocabulary and many other words. The test set consisted of 262 utterances containing 44 city names and 2 state names. The best result obtained on the test set was 92.9% word accuracy (90.1% on the city names alone). Removing phoneme duration constraints reduced recognition accuracy to 82%. Performance fell to 82.4% using a network trained on a large-vocabulary, fluent-speech corpus. Several other experiments are reported which did not produce significant changes in system performance.
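The dynamic-programming step can be illustrated with a simplified, isolated-word version of Viterbi scoring. This sketch assumes the classifier output is available as per-frame log phoneme probabilities (a list of dicts), a hypothetical `lexicon` mapping each word to its phoneme sequence, and a minimum-duration floor `min_dur` standing in for the duration constraints. The actual system searched over word sequences, and its duration model is not described in the abstract.

```python
import math

NEG_INF = -math.inf

def align_score(log_post, phones, min_dur=1):
    """Best log-probability of a left-to-right alignment of `phones` to all frames.

    log_post[t][p] is the log probability of phoneme p at frame t. Each phoneme
    must occupy at least `min_dur` consecutive frames (a crude duration constraint).
    """
    # Expand each phoneme into `min_dur` states; only its last state may repeat.
    states, can_stay = [], []
    for p in phones:
        for k in range(min_dur):
            states.append(p)
            can_stay.append(k == min_dur - 1)
    T, S = len(log_post), len(states)
    if S == 0 or T < S:
        return NEG_INF  # empty word, or too few frames for the duration floor
    prev = [NEG_INF] * S
    prev[0] = log_post[0].get(states[0], NEG_INF)  # must start in the first state
    for t in range(1, T):
        cur = [NEG_INF] * S
        for s in range(S):
            best = prev[s] if can_stay[s] else NEG_INF  # stay in the same state
            if s > 0:
                best = max(best, prev[s - 1])  # advance from the previous state
            if best > NEG_INF:
                cur[s] = best + log_post[t].get(states[s], NEG_INF)
        prev = cur
    return prev[S - 1]  # must end in the last state of the last phoneme

def recognize(log_post, lexicon, min_dur=1):
    """Return the lexicon word with the highest Viterbi alignment score."""
    return max(lexicon, key=lambda w: align_score(log_post, lexicon[w], min_dur))
```

Raising `min_dur` rules out alignments that pass through a phoneme in too few frames, which is one simple way a duration constraint can reject words that match acoustically but not temporally.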


International Conference on Acoustics, Speech, and Signal Processing | 2007

Multi-Pass Pronunciation Adaptation

Nathan Bodenstab; Mark A. Fanty

A mapping between words and pronunciations (potential phonetic realizations) is a key component of speech recognition systems. Traditionally, this has been encoded in a lexicon where each pronunciation is transcribed by a linguist or generated by a grapheme-to-phoneme algorithm. For large vocabulary recognition systems, this process is highly susceptible to errors. We present an off-line, data-driven algorithm to correct suboptimal pronunciations using transcribed utterances. Unlike previous data-driven algorithms that struggle to balance acoustic representation and multi-speaker generalization, our multi-pass approach maximizes both criteria instead of compromising between the two. We demonstrate on an automated name dialing task that our multi-pass algorithm achieves a 70% error rate reduction when compared to a baseline lexicon generated by grapheme-to-phoneme conversion.
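The abstract does not spell out the multi-pass algorithm, so the sketch below shows only the generic idea it builds on: using transcribed utterances to choose among candidate pronunciations. It is a single-pass voting baseline, not the authors' method, and it assumes a hypothetical `score_fn(audio, pron)` that returns an acoustic match score (for example, a forced-alignment log-likelihood) and pronunciations stored as tuples of phoneme labels.

```python
from collections import Counter

def select_pronunciations(utterances, candidates, score_fn, max_prons=1):
    """Keep, for each word, the candidate pronunciation(s) preferred by the most utterances.

    utterances: list of (word, audio) pairs from transcribed data.
    candidates: dict word -> list of pronunciation tuples (e.g. from G2P).
    score_fn:   hypothetical callable (audio, pron) -> acoustic match score.
    """
    votes = {w: Counter() for w in candidates}
    for word, audio in utterances:
        # Each utterance votes for the pronunciation that best matches its audio.
        best = max(candidates[word], key=lambda pron: score_fn(audio, pron))
        votes[word][best] += 1
    lexicon = {}
    for w, prons in candidates.items():
        winners = [p for p, _ in votes[w].most_common(max_prons)]
        # Words with no training utterances keep their original candidates.
        lexicon[w] = winners or prons[:max_prons]
    return lexicon
```

Keeping only the pronunciation that wins the most utterances favors multi-speaker generalization, while a larger `max_prons` retains more acoustic variants. The abstract's claim is that its multi-pass approach avoids having to trade these two goals off in this way.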


Asilomar Conference on Signals, Systems and Computers | 1991

A comparative study of five spectral representations for speaker-independent phonetic recognition

Joseph W. Creekmore; Mark A. Fanty; Ronald A. Cole

The authors describe a comparative study of five spectral representations for speaker-independent phonetic recognition using the TIMIT database. A feedforward network was trained to classify 20-ms frames of speech as one of 39 phonetic classes derived from the TIMIT database. The five representations investigated include the discrete Fourier transform, three representations based on conventional linear predictive coding (LPC), and the cepstral coefficients derived from perceptual linear predictive (PLP) analysis. The PLP cepstral coefficients outperformed the other representations on the task of assigning the correct phonetic label to individual time frames. It is shown that phonetic context can be exploited by providing spectral information before and after the frame to be classified. The effect of the training set size and distribution is also examined.
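The finding that phonetic context helps can be illustrated by context-window stacking, in which each frame's feature vector is concatenated with those of its neighbors before classification. This is a minimal sketch: the window sizes `left` and `right` and the edge padding by repeating the first and last frames are assumptions, since the abstract does not say how much context was used.

```python
def stack_context(frames, left=1, right=1):
    """Concatenate each frame's features with those of its neighbors.

    frames: list of equal-length feature vectors (e.g. PLP cepstra per frame).
    Frames beyond the utterance edges are padded by repeating the edge frame.
    """
    T = len(frames)
    out = []
    for t in range(T):
        vec = []
        for k in range(t - left, t + right + 1):
            # Clamp the neighbor index into [0, T - 1] for edge padding.
            vec.extend(frames[min(max(k, 0), T - 1)])
        out.append(vec)
    return out
```

With `d`-dimensional frames the classifier input grows to `(left + right + 1) * d` values, which is the cost of giving the network access to the spectral information before and after the frame being classified.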


Conference of the International Speech Communication Association | 1992

A telephone speech database of spelled and spoken names

Ronald A. Cole; Krist Roginski; Mark A. Fanty


Archive | 2001

Segmentation approach for speech recognition systems

Mark A. Fanty; Michael S. Phillips


Conference of the International Speech Communication Association | 1994

A prototype voice-response questionnaire for the U.S. Census

Ronald A. Cole; David G. Novick; Mark A. Fanty; Pieter Vermeulen; Stephen Sutton; Daniel C. Burnett; Johan Schalkwyk

Collaboration


Mark A. Fanty's collaborations.

Top Co-Authors

Ronald A. Cole

University of Colorado Boulder

David G. Novick

University of Texas at El Paso

Etienne Barnard

Carnegie Mellon University