
Publications


Featured research published by Bryan L. Pellom.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Confidence measures for dialogue management in the CU Communicator system

R. San-Segundo; Bryan L. Pellom; Wayne H. Ward; José Manuel Pardo

This paper provides improved confidence assessment for detection of word-level speech recognition errors and out-of-domain user requests using language model features. We consider a combined measure of confidence that utilizes the language model back-off sequence, language model score, and phonetic length of recognized words as indicators of speech recognition confidence. The paper investigates the ability of each feature to detect speech recognition errors and out-of-domain utterances as well as two methods for combining the features contextually: a multi-layer perceptron and a statistical decision tree. We illustrate the effectiveness of the algorithm by considering utterances from the ATIS airline information task as either in-domain or out-of-domain for the DARPA Communicator task. Using this hand-labeled data, it is shown that 27.9% of incorrectly recognized words and 36.4% of out-of-domain phrases are detected at a 2.5% false alarm rate.
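
As a rough illustration of the feature-combination step (a sketch, not the authors' implementation), the snippet below trains a small multi-layer perceptron on three per-word features of the kind described above: language-model back-off level, language-model score, and phonetic length. The feature values and labels are hypothetical.

```python
# Minimal sketch of combining word-level confidence features with an MLP,
# assuming per-word features have already been extracted from the recognizer.
# Feature values and labels here are hypothetical, for illustration only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Each row: [lm_backoff_level, lm_score, phonetic_length]
X = np.array([
    [0, -2.1, 5],   # decoded from a trigram, long phone sequence
    [2, -6.8, 2],   # heavy back-off, poor LM score, short word
    [1, -3.4, 4],
    [2, -7.5, 1],
])
# 1 = correctly recognized / in-domain word, 0 = misrecognition / out-of-domain
y = np.array([1, 0, 1, 0])

clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                  random_state=0))
clf.fit(X, y)

# Confidence for a new recognized word: probability of the "correct" class.
new_word = np.array([[1, -4.0, 3]])
confidence = clf.predict_proba(new_word)[0, 1]
print(f"word confidence: {confidence:.2f}")
```

Thresholding the predicted probability then trades error and out-of-domain detection against the false-alarm rate, as at the 2.5% operating point quoted above.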


Proceedings of the IEEE | 2003

Perceptive animated interfaces: first steps toward a new paradigm for human-computer interaction

Ron Cole; S. Van Vuuren; Bryan L. Pellom; K. Hacioglu; Jiyong Ma; Javier R. Movellan; S. Schwartz; D. Wade-Stein; Wayne H. Ward; Jie Yan

This paper presents a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote, and gesture. These animated agents will converse with people much like people converse effectively with assistants in a variety of focused applications. Despite the research advances required to realize this vision, and the lack of strong experimental evidence that animated agents improve human-computer interaction, we argue that initial prototypes of perceptive animated interfaces can be developed today, and that the resulting systems will provide more effective and engaging communication experiences than existing systems. In support of this hypothesis, we first describe initial experiments using an animated character to teach speech and language skills to children with hearing problems, and classroom subjects and social skills to children with autistic spectrum disorder. We then show how existing dialogue system architectures can be transformed into perceptive animated interfaces by integrating computer vision and animation capabilities. We conclude by describing the Colorado Literacy Tutor, a computer-based literacy program that provides an ideal testbed for research and development of perceptive animated interfaces, and consider next steps required to realize the vision.


International Conference on Acoustics, Speech, and Signal Processing | 2003

Recent improvements in the CU Sonic ASR system for noisy speech: the SPINE task

Bryan L. Pellom; Kadri Hacioglu

We report on recent improvements in the University of Colorado system for the DARPA/NRL Speech in Noisy Environments (SPINE) task. In particular, we describe our efforts on improving acoustic and language modeling for the task and investigate methods for unsupervised speaker and environment adaptation from limited data. We show that the MAPLR adaptation method outperforms single and multiple regression class MLLR on the SPINE task. Our current SPINE system uses the Sonic speech recognition engine that was developed at the University of Colorado. This system is shown to have a word error rate of 31.5% on the SPINE-2 evaluation data. These improvements amount to a 16% reduction in relative word error rate compared to our previous SPINE-2 system fielded in the November 2001 DARPA/NRL evaluation.
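
To make the adaptation step concrete, here is a heavily simplified sketch of a single-regression-class MLLR mean transform. It assumes a hard frame-to-Gaussian alignment and identity covariances (simplifications made here for brevity, not by the paper), under which the transform reduces to a weighted least-squares fit; MAPLR additionally places a prior on the transform, which is what helps when adaptation data are limited.

```python
# Sketch of a single-regression-class MLLR mean update, assuming hard
# frame-to-Gaussian alignments and identity covariances for simplicity.
import numpy as np

def global_mllr_mean_transform(frames, means, assignments):
    """frames:      (T, D) adaptation observations
       means:       (M, D) Gaussian means of the speaker-independent model
       assignments: (T,)   index of the Gaussian aligned to each frame
       Returns W of shape (D, D + 1) such that adapted_mean = W @ [mu; 1]."""
    T, D = frames.shape
    # Extended means [mu; 1] for the Gaussian aligned to each frame.
    xi = np.hstack([means[assignments], np.ones((T, 1))])   # (T, D + 1)
    # Least-squares solution of min_W || frames - xi W^T ||_F^2.
    G = xi.T @ xi                                            # (D + 1, D + 1)
    K = xi.T @ frames                                        # (D + 1, D)
    return np.linalg.solve(G, K).T                           # (D, D + 1)

def adapt_means(means, W):
    xi = np.hstack([means, np.ones((len(means), 1))])
    return xi @ W.T

# Toy usage with random data.
rng = np.random.default_rng(0)
means = rng.normal(size=(4, 13))
frames = rng.normal(size=(200, 13))
assign = rng.integers(0, 4, size=200)
W = global_mllr_mean_transform(frames, means, assign)
print(adapt_means(means, W).shape)   # (4, 13)
```

With multiple regression classes, the same estimation is repeated per class of Gaussians; with very little adaptation data the prior used by MAPLR keeps the transform close to identity.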


IEEE Automatic Speech Recognition and Understanding Workshop | 2003

Children's speech recognition with application to interactive books and tutors

Andreas Hagen; Bryan L. Pellom; Ronald A. Cole

We present initial work towards development of a children's speech recognition system for use within an interactive reading and comprehension training system. We first describe the Colorado Literacy Tutor project and two corpora collected for children's speech recognition research. Next, baseline speech recognition experiments are performed to illustrate the degree of acoustic mismatch for children in grades K through 5. It is shown that an 11.2% relative reduction in word error rate can be achieved through vocal tract normalization applied to children's speech. Finally, we describe our baseline system for automatic recognition of spontaneously spoken story summaries. It is shown that a word error rate of 42.6% is achieved on the presented children's story summarization task after using unsupervised MAPLR (maximum a posteriori linear regression) adaptation and VTLN (vocal tract length normalization) to compensate for inter-speaker acoustic variability. Based on this result, we point to promising directions for further study.
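
As a sketch of the VTLN component (a generic textbook formulation, not the exact system above), each speaker's filterbank frequencies can be warped by a factor alpha using a piecewise-linear function, with alpha chosen by a grid search against a speaker-independent acoustic model; the scoring function below is a placeholder supplied by the caller.

```python
# Sketch of vocal tract length normalization: a piecewise-linear frequency
# warp plus a grid search over the warp factor. score_fn is a placeholder
# (e.g., log-likelihood of features extracted with the warped filterbank
# under a speaker-independent acoustic model).
import numpy as np

def piecewise_linear_warp(freqs, alpha, f_cut=4800.0, f_max=8000.0):
    """Scale frequencies below f_cut by 1/alpha, then continue linearly so
    that f_max still maps to f_max (a common VTLN warping function)."""
    freqs = np.asarray(freqs, dtype=float)
    return np.where(
        freqs <= f_cut,
        freqs / alpha,
        f_cut / alpha + (freqs - f_cut) * (f_max - f_cut / alpha) / (f_max - f_cut),
    )

def select_warp_factor(utterance, score_fn, alphas=np.arange(0.88, 1.13, 0.02)):
    """Pick the warp factor maximizing score_fn(utterance, alpha)."""
    scores = [score_fn(utterance, a) for a in alphas]
    return alphas[int(np.argmax(scores))]
```

Searching a small grid of warp factors per speaker (or per utterance) is cheap relative to decoding and is typically driven by a first-pass transcript.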


Conference of the International Speech Communication Association | 2005

Distinguishing Deceptive from Non-Deceptive Speech

Julia Hirschberg; Stefan Benus; Jason M. Brenier; Frank Enos; Sarah Friedman; Sarah Gilman; Cynthia Girand; Martin Graciarena; Andreas Kathol; Laura A. Michaelis; Bryan L. Pellom; Elizabeth Shriberg; Andreas Stolcke

To date, studies of deceptive speech have largely been confined to descriptive studies and observations from subjects, researchers, or practitioners, with few empirical studies of the specific lexical or acoustic/prosodic features which may characterize deceptive speech. We present results from a study seeking to distinguish deceptive from non-deceptive speech using machine learning techniques on features extracted from a large corpus of deceptive and non-deceptive speech. This corpus employs an interview paradigm that includes subject reports of truth vs. lie at multiple temporal scales. We present current results comparing the performance of acoustic/prosodic, lexical, and speaker-dependent features and discuss future research directions.


IEEE Signal Processing Letters | 1998

An efficient scoring algorithm for Gaussian mixture model based speaker identification

Bryan L. Pellom; John H. L. Hansen

This article presents a novel algorithm for reducing the computational complexity of identifying a speaker within a Gaussian mixture speaker model framework. For applications in which the entire observation sequence is known, we illustrate that rapid pruning of unlikely speaker model candidates can be achieved by reordering the time-sequence of observation vectors used to update the accumulated probability of each speaker model. The overall approach is integrated into a beam-search strategy and shown to reduce the time to identify a speaker by a factor of 140 over the standard full-search method, and by a factor of six over the standard beam-search method when identifying speakers from the 138 speaker YOHO corpus.
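
A minimal sketch of the pruning idea follows, under assumed details: frames are visited in a reordered (coarse-to-fine, decimated) sequence rather than strict time order, each surviving speaker model accumulates its per-frame log-likelihood, and models falling outside a beam of the current best are dropped. The reordering scheme, the beam width, and the use of sklearn GaussianMixture objects are illustrative choices, not the paper's.

```python
# Sketch of beam-pruned GMM speaker identification with reordered observations.
# speaker_models is assumed to be a dict of fitted sklearn GaussianMixture
# objects; the decimation-based reordering stands in for the paper's scheme.
import numpy as np

def coarse_to_fine_order(num_frames, levels=4):
    """Visit every 2**levels-th frame first, then progressively fill in."""
    order, seen, step = [], set(), 2 ** levels
    while step >= 1:
        for t in range(0, num_frames, step):
            if t not in seen:
                order.append(t)
                seen.add(t)
        step //= 2
    return np.array(order)

def identify_speaker(obs, speaker_models, beam=50.0):
    """obs: (T, D) feature vectors; speaker_models: {name: GaussianMixture}."""
    order = coarse_to_fine_order(len(obs))
    active = {name: 0.0 for name in speaker_models}
    for t in order:
        frame = obs[t:t + 1]
        # Accumulate per-frame log-likelihoods for the surviving models only.
        for name in list(active):
            active[name] += speaker_models[name].score_samples(frame)[0]
        best = max(active.values())
        # Prune models falling outside the beam of the current best score.
        active = {n: s for n, s in active.items() if s >= best - beam}
        if len(active) == 1:
            break
    return max(active, key=active.get)
```

Because every surviving model has seen the same reordered frames at each step, comparing accumulated scores is fair, and unlikely speakers can be discarded after scoring only a fraction of the utterance.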


Speech Communication | 2007

Highly accurate children's speech recognition for interactive reading tutors using subword units

Andreas Hagen; Bryan L. Pellom; Ronald A. Cole

Speech technology offers great promise in the field of automated literacy and reading tutors for children. In such applications speech recognition can be used to track the reading position of the child, detect oral reading miscues, assess comprehension of the text being read by estimating whether the prosodic structure of the speech is appropriate to the discourse structure of the story, or engage the child in interactive dialogs to assess and train comprehension. Despite such promise, speech recognition systems exhibit higher error rates for children due to variability in vocal tract length, formant frequencies, pronunciation, and grammar. In the context of recognizing speech while children are reading out loud, these problems are compounded by speech production behaviors caused by difficulty in recognizing printed words, such as pauses, repeated syllables, and other disfluencies. To overcome these challenges, we present advances in speech recognition that improve accuracy and modeling capability in the context of an interactive literacy tutor for children. Specifically, this paper focuses on a novel set of speech recognition techniques which can be applied to improve oral reading recognition. First, we demonstrate that speech recognition error rates for interactive read-aloud can be reduced by more than 50% through a combination of advances in both statistical language and acoustic modeling. Next, we propose extending our baseline system by introducing a novel token-passing search architecture targeting subword unit based speech recognition. The proposed subword unit based speech recognition framework is shown to provide equivalent accuracy to a whole-word based speech recognizer while enabling detection of oral reading events and finer grained speech analysis during recognition. The efficacy of the approach is demonstrated using data collected from children in grades 3-5, for whom 34.6% of partial words with reasonable evidence in the speech signal are detected at a low false alarm rate of 0.5%.


International Conference on Acoustics, Speech, and Signal Processing | 1999

An experimental study of speaker verification sensitivity to computer voice-altered imposters

Bryan L. Pellom; John H. L. Hansen

This paper investigates the relative sensitivity of a Gaussian mixture model (GMM) based voice verification algorithm to computer voice-altered imposters. First, a new trainable speech synthesis algorithm based on trajectory models of the speech line spectral frequency (LSF) parameters is presented in order to model the spectral characteristics of a target voice. A GMM based speaker verifier is then constructed for the 138 speaker YOHO database and shown to have an initial equal-error rate (EER) of 1.45% for the case of casual imposter attempts using a single combination-lock phrase test. Next, imposter voices are automatically altered using the synthesis algorithm to mimic the customer's voice. After voice transformation, the false acceptance rate is shown to increase from 1.45% to over 86% if the baseline EER threshold is left unmodified. Furthermore, at a customer false rejection rate of 25%, the false acceptance rate for the voice-altered imposter remains as high as 34.6%.


Speech Communication | 1998

Automatic segmentation of speech recorded in unknown noisy channel characteristics

Bryan L. Pellom; John H. L. Hansen

This paper investigates the problem of automatic segmentation of speech recorded in noisy channel corrupted environments. Using an HMM-based speech segmentation algorithm, speech enhancement and parameter compensation techniques previously proposed for robust speech recognition are evaluated and compared for improved segmentation in colored noise. Speech enhancement algorithms considered include: Generalized Spectral Subtraction, Nonlinear Spectral Subtraction, Ephraim–Malah MMSE enhancement, and Auto-LSP Constrained Iterative Wiener filtering. In addition, the Parallel Model Combination (PMC) technique is also compared for additive noise compensation. In telephone environments, we compare channel normalization techniques including Cepstral Mean Normalization (CMN) and Signal Bias Removal (SBR) and consider the coupling of channel compensation with front-end speech enhancement for improved automatic segmentation. Compensation performance is assessed for each method by automatically segmenting TIMIT degraded by additive colored noise (i.e., aircraft cockpit, automobile highway, etc.), telephone transmitted NTIMIT, and cellular telephone transmitted CTIMIT databases.
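
Of the channel-compensation methods compared, cepstral mean normalization is the simplest to picture: subtracting the per-utterance mean of each cepstral coefficient removes a stationary convolutional channel, which is additive in the cepstral domain. A minimal sketch (independent of the segmentation system itself):

```python
# Minimal sketch of cepstral mean normalization (CMN): subtract the
# per-utterance mean of each cepstral coefficient, which removes a
# stationary convolutional channel (additive in the cepstral domain).
import numpy as np

def cepstral_mean_normalization(cepstra):
    """cepstra: (T, D) array of cepstral feature vectors for one utterance."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```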


IEEE Transactions on Visualization and Computer Graphics | 2006

Accurate visible speech synthesis based on concatenating variable length motion capture data

Jiyong Ma; Ronald A. Cole; Bryan L. Pellom; Wayne H. Ward; Barbara Wise

We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third-grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective and objective evaluations were conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
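
The search over concatenated units can be pictured as a standard unit-selection dynamic program: every candidate unit carries a target cost against the requested context and a join cost against its predecessor, and a Viterbi pass picks the cheapest sequence. The sketch below is generic; the cost functions are placeholders, not the system's actual visemic and motion-continuity costs.

```python
# Sketch of Viterbi unit selection over variable-length candidate units.
# target_cost and join_cost are placeholder callables; in a visible-speech
# system they would compare visemic context and lip-motion continuity.
import numpy as np

def select_units(candidates, target_cost, join_cost):
    """candidates: list over target positions, each a list of candidate units.
    Returns the index of the chosen candidate at each position."""
    n = len(candidates)
    # best[i][j]: cheapest total cost ending with candidate j at position i.
    best = [np.array([target_cost(0, u) for u in candidates[0]])]
    back = []
    for i in range(1, n):
        costs = np.empty(len(candidates[i]))
        ptrs = np.empty(len(candidates[i]), dtype=int)
        for j, unit in enumerate(candidates[i]):
            trans = best[i - 1] + np.array(
                [join_cost(prev, unit) for prev in candidates[i - 1]])
            ptrs[j] = int(np.argmin(trans))
            costs[j] = trans[ptrs[j]] + target_cost(i, unit)
        best.append(costs)
        back.append(ptrs)
    # Trace back the cheapest path.
    path = [int(np.argmin(best[-1]))]
    for ptrs in reversed(back):
        path.append(int(ptrs[path[-1]]))
    return list(reversed(path))
```

Allowing variable-length units simply means each candidate may span several syllables, which is what lets the search capture long-distance coarticulation when long matching spans exist in the corpus.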

Collaboration


Bryan L. Pellom's most frequent co-authors and their affiliations:

John H. L. Hansen (University of Texas at Dallas)
Wayne H. Ward (University of Colorado Boulder)
Kadri Hacioglu (University of Colorado Boulder)
Andreas Hagen (University of Colorado Boulder)
Ronald A. Cole (University of Colorado Boulder)
Jiyong Ma (University of Colorado Boulder)
Audrey N. Le (National Institute of Standards and Technology)