Uwe Meier
Carnegie Mellon University
Publications
Featured research published by Uwe Meier.
international conference on acoustics, speech, and signal processing | 1996
Uwe Meier; Wolfgang Hürst; Paul Duchnowski
We present work on improving the performance of automated speech recognizers by using additional visual information (lip-reading/speech-reading), achieving error reductions of up to 50%. This paper focuses on different methods of combining the visual and acoustic data to improve recognition performance. We demonstrate this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. We have developed adaptive combination methods at several levels of the recognition network. Additional information such as the estimated signal-to-noise ratio (SNR) is used in some cases. Results of the different combination methods are shown for clean speech and for data with artificial noise (white noise, music, motor noise). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
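The SNR-dependent combination of the two streams lends itself to a small illustration. Below is a minimal sketch, assuming a simple linear mapping from estimated SNR to an acoustic stream weight; the function names and the mapping are illustrative only and do not reproduce the paper's adaptive combination methods, which operate at several levels of the MS-TDNN.

```python
import numpy as np

def estimate_acoustic_weight(snr_db, lo=0.0, hi=30.0):
    # Map an estimated SNR (dB) linearly to a weight in [0, 1]:
    # high SNR -> trust the acoustic stream, low SNR -> trust the visual stream.
    # The linear mapping and the lo/hi bounds are assumptions for illustration.
    return float(np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0))

def combine_log_scores(acoustic, visual, snr_db):
    # Weighted log-linear combination of per-frame class scores.
    w = estimate_acoustic_weight(snr_db)
    return w * acoustic + (1.0 - w) * visual

# Example: log-posteriors over three candidate phone classes for one frame.
acoustic = np.log(np.array([0.7, 0.2, 0.1]))   # acoustic stream
visual   = np.log(np.array([0.4, 0.4, 0.2]))   # lip-reading stream
print(combine_log_scores(acoustic, visual, snr_db=5.0))  # noisy frame: visual gets more weight
```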
international conference on acoustics, speech, and signal processing | 1995
Paul Duchnowski; Martin Hunke; Dietrich Büsching; Uwe Meier; Alex Waibel
We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading), improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. The speaker's face itself is automatically acquired and followed by the face tracker sub-system. Integration of the three functions results in the first bimodal speech recognizer that allows the speaker reasonable freedom of movement within a possibly noisy room while continuing to communicate with the computer via voice. Compared to audio-alone recognition, the combined system achieves a 20 to 50 percent error rate reduction under various signal-to-noise conditions.
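To make the modular structure concrete, here is a minimal sketch of the data flow described above (face tracker, lip locator, bimodal recognizer). All function bodies are placeholders invented for illustration; only the chaining of the modules reflects the system described in the abstract.

```python
import numpy as np

def track_face(frame):
    # Stand-in for the face tracker: return a (x, y, w, h) face box (placeholder logic).
    h, w = frame.shape[:2]
    return (w // 4, h // 4, w // 2, h // 2)

def locate_lips(frame, face_box):
    # Stand-in for the lip locator: crop the lower third of the face box.
    x, y, w, h = face_box
    return frame[y + 2 * h // 3 : y + h, x : x + w]

def recognize(audio_frame, lip_image):
    # Stand-in for the bimodal recognizer: compute toy acoustic and visual features.
    visual_feat = float(lip_image.astype(float).mean())
    acoustic_feat = float(np.abs(audio_frame).mean())
    return {"acoustic": acoustic_feat, "visual": visual_feat}

# One step of the modular pipeline on synthetic inputs.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
audio = np.random.default_rng(0).normal(size=160)
print(recognize(audio, locate_lips(frame, track_face(frame))))
```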
human factors in computing systems | 1998
Jie Yang; Rainer Stiefelhagen; Uwe Meier; Alex Waibel
In this paper, we present visual tracking techniques for multimodal human-computer interaction. First, we discuss techniques for tracking human faces in which human skin color is used as a major feature. An adaptive stochastic model has been developed to characterize the skin-color distributions. Based on the maximum likelihood method, the model parameters can be adapted for different people and different lighting conditions. The feasibility of the model has been demonstrated by the development of a real-time face tracker. The system has achieved a rate of 30+ frames/second using a low-end workstation with a framegrabber and a camera. We also present a top-down approach for tracking facial features such as eyes, nostrils, and lip corners. These real-time visual tracking techniques have been successfully applied to many applications such as gaze tracking and lip-reading. The face tracker has been combined with a microphone array for extracting the speech signal of a specific person. The gaze tracker has been combined with a speech recognizer in a multimodal interface for controlling a panoramic image viewer.
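As an illustration of the adaptive skin-color idea, the sketch below assumes a single Gaussian model in normalized (r, g) chromaticity space whose mean and covariance are blended with maximum-likelihood estimates from recently classified skin pixels. The class, its parameters, and the thresholds are assumptions made for illustration, not the paper's exact stochastic model.

```python
import numpy as np

class SkinColorModel:
    """Gaussian skin-color model in normalized (r, g) chromaticity space (illustrative)."""

    def __init__(self, mean, cov, alpha=0.1):
        self.mean = np.asarray(mean, dtype=float)   # 2-vector: mean (r, g)
        self.cov = np.asarray(cov, dtype=float)     # 2x2 covariance
        self.alpha = alpha                          # adaptation rate (assumption)

    @staticmethod
    def to_chromaticity(rgb):
        # Normalize RGB so that intensity changes cancel out; keep (r, g) only.
        rgb = np.asarray(rgb, dtype=float)
        s = rgb.sum(axis=-1, keepdims=True) + 1e-8
        return (rgb / s)[..., :2]

    def skin_mask(self, image, thresh=6.0):
        # Label pixels whose Mahalanobis distance to the model is below a threshold.
        x = self.to_chromaticity(image) - self.mean
        inv = np.linalg.inv(self.cov)
        d2 = np.einsum('...i,ij,...j->...', x, inv, x)
        return d2 < thresh

    def adapt(self, image, mask):
        # Blend maximum-likelihood estimates from current skin pixels into the model,
        # so the tracker can follow changes in lighting and subject.
        pix = self.to_chromaticity(image)[mask]
        if len(pix) < 10:
            return
        self.mean = (1 - self.alpha) * self.mean + self.alpha * pix.mean(axis=0)
        self.cov = (1 - self.alpha) * self.cov + self.alpha * np.cov(pix.T)

# Example: initialize with a generic prior and adapt on one frame (values are assumptions).
model = SkinColorModel(mean=[0.45, 0.31], cov=[[2e-3, 0], [0, 2e-3]])
frame = np.random.default_rng(1).integers(0, 256, size=(60, 80, 3))
model.adapt(frame, model.skin_mask(frame))
```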
International Journal of Pattern Recognition and Artificial Intelligence | 2000
Uwe Meier; Rainer Stiefelhagen; Jie Yang; Alex Waibel
Lip reading provides useful information in speech perception and language understanding, especially when the auditory speech is degraded. However, many current automatic lip-reading systems impose restrictions on users. In this paper, we present our research efforts in the Interactive Systems Laboratory towards unrestricted lip reading. We first introduce a top-down approach to automatically track and extract lip regions. This technique makes it possible to acquire visual information in real time without limiting the user's freedom of movement. We then discuss normalization algorithms to preprocess images for different lighting conditions (global illumination and side illumination). We also compare different visual preprocessing methods such as the raw image, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA). We demonstrate the feasibility of the proposed methods by the development of a modular system for flexible human-computer interaction via both visual and acoustic speech. The system is based on an extension of an existing state-of-the-art speech recognition system, a modular Multi-State Time Delay Neural Network (MS-TDNN). We have developed adaptive combination methods at several different levels of the recognition network. The system can automatically track a speaker and extract his/her lip region in real time. The system has been evaluated under different noise conditions such as white noise, music, and mechanical noise. The experimental results indicate that the system can achieve up to 55% error reduction using visual information in addition to the acoustic signal.
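A comparison of raw-image, LDA, and PCA preprocessing could be prototyped along the following lines. This is a sketch on synthetic data with a simple k-NN stand-in classifier (using scikit-learn); the paper's actual comparison is performed inside the MS-TDNN recognizer, and all data shapes and labels here are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical data: flattened grey-level lip-region images and viseme labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 24 * 16))      # 600 lip regions of 24x16 pixels (placeholder)
y = rng.integers(0, 10, size=600)        # 10 viseme classes (placeholder labels)

def score(estimator):
    # Rough comparison metric: 5-fold cross-validated k-NN accuracy.
    return cross_val_score(estimator, X, y, cv=5).mean()

results = {
    "raw image":     score(KNeighborsClassifier(5)),
    "PCA (32 dims)": score(make_pipeline(PCA(n_components=32), KNeighborsClassifier(5))),
    "LDA":           score(make_pipeline(LinearDiscriminantAnalysis(), KNeighborsClassifier(5))),
}
print(results)
```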
ACM SIGCAPH Computers and the Physically Handicapped | 1998
Ron Cole; Tim Carmell; Pam Connors; Mike Macon; Johan Wouters; Jacques de Villiers; Alice Tarachow; Dominic W. Massaro; Michael M. Cohen; Jonas Beskow; Jie Yang; Uwe Meier; Alex Waibel; Pat Stone; Alice Davis; Chris Soland; George Fortier
This report describes a three-year project, now eight months old, to develop interactive learning tools for language training with profoundly deaf children. The tools combine four key technologies: speech recognition, developed at the Oregon Graduate Institute; speech synthesis, developed at the University of Edinburgh and modified at OGI; facial animation, developed at the University of California, Santa Cruz; and face tracking and speech reading, developed at Carnegie Mellon University. These technologies are being combined to create an intelligent conversational agent: a three-dimensional face that produces and understands auditory and visual speech. The agent has been incorporated into the CSLU Toolkit, a software environment for developing and researching spoken language systems. We describe our experiences in bringing interactive learning tools to classrooms at the Tucker-Maxon Oral School in Portland, Oregon, and the technological advances required for this project to succeed.
conference of the international speech communication association | 1994
Paul Duchnowski; Uwe Meier; Alex Waibel
AVSP | 1998
Jie Yang; Rainer Stiefelhagen; Uwe Meier; Alex Waibel
Proceedings of the IEEE | 1997
Rainer Stiefelhagen; Uwe Meier; Jie Yang
conference of the international speech communication association | 1998
Petra Geutner; Matthias Denecke; Uwe Meier; Martin Westphal; Alex Waibel
AVSP | 1997
Uwe Meier; Rainer Stiefelhagen; Jie Yang