
Publication


Featured research published by Uwe Meier.


international conference on acoustics, speech, and signal processing | 1996

Adaptive bimodal sensor fusion for automatic speechreading

Uwe Meier; Wolfgang Hürst; Paul Duchnowski

We present work on improving the performance of automated speech recognizers by using additional visual information (lip-/speechreading), achieving error reductions of up to 50%. This paper focuses on different methods of combining the visual and acoustic data to improve recognition performance. We show this on an extension of an existing state-of-the-art speech recognition system, a modular MS-TDNN. We have developed adaptive combination methods at several levels of the recognition network. Additional information, such as the estimated signal-to-noise ratio (SNR), is used in some cases. The results of the different combination methods are shown for clean speech and for data with artificial noise (white, music, motor). The new combination methods adapt automatically to varying noise conditions, making hand-tuned parameters unnecessary.
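As a rough illustration of the SNR-driven combination the abstract describes, the sketch below blends per-class acoustic and visual scores with a weight derived from the estimated SNR. The function name, the linear weighting, and the SNR range are assumptions for illustration; the paper's actual combination weights are learned inside the MS-TDNN recognition network rather than fixed by a formula like this.

```python
import numpy as np

def fuse_scores(acoustic, visual, snr_db, snr_low=0.0, snr_high=30.0):
    """Blend per-class acoustic and visual scores with an SNR-dependent
    weight: clean audio -> trust the acoustic stream, noisy audio -> lean
    on the visual stream. Illustrative sketch only."""
    # Map the SNR estimate into [0, 1]; 1 = rely fully on audio.
    alpha = np.clip((snr_db - snr_low) / (snr_high - snr_low), 0.0, 1.0)
    return alpha * np.asarray(acoustic) + (1.0 - alpha) * np.asarray(visual)

# Example: at 5 dB SNR the visual stream dominates the combined scores.
combined = fuse_scores([0.7, 0.3], [0.2, 0.8], snr_db=5.0)
```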


international conference on acoustics, speech, and signal processing | 1995

Toward movement-invariant automatic lip-reading and speech recognition

Paul Duchnowski; Martin Hunke; Dietrich Büsching; Uwe Meier; Alex Waibel

We present the development of a modular system for flexible human-computer interaction via speech. The speech recognition component integrates acoustic and visual information (automatic lip-reading), improving overall recognition, especially in noisy environments. The image of the lips, constituting the visual input, is automatically extracted from the camera picture of the speaker's face by the lip locator module. Finally, the speaker's face is automatically acquired and followed by the face tracker sub-system. Integration of the three functions results in the first bi-modal speech recognizer allowing the speaker reasonable freedom of movement within a possibly noisy room while continuing to communicate with the computer via voice. Compared to audio-alone recognition, the combined system achieves a 20 to 50 percent error rate reduction for various signal/noise conditions.
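A minimal sketch of how the three modules described above might fit together in code. The names, signatures, and dummy bodies below are hypothetical stand-ins chosen for illustration, not the original system's interfaces.

```python
import numpy as np

def track_face(frame: np.ndarray) -> tuple[int, int, int, int]:
    """Face tracker stub: locate the speaker's face, returning (x, y, w, h)."""
    h, w = frame.shape[:2]
    return w // 4, h // 4, w // 2, h // 2          # dummy: centre of the frame

def locate_lips(frame: np.ndarray, face_box: tuple[int, int, int, int]) -> np.ndarray:
    """Lip locator stub: cut the lip region out of the tracked face area."""
    x, y, w, h = face_box
    return frame[y + 2 * h // 3 : y + h, x : x + w]  # lower third of the face

def recognize(audio_frame: np.ndarray, lip_image: np.ndarray) -> str:
    """Bi-modal recognizer stub combining acoustic and visual input."""
    return "<hypothesis>"

def process(frame: np.ndarray, audio_frame: np.ndarray) -> str:
    """Run one camera/audio frame through the tracker -> locator -> recognizer chain."""
    return recognize(audio_frame, locate_lips(frame, track_face(frame)))
```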


human factors in computing systems | 1998

Visual tracking for multimodal human computer interaction

Jie Yang; Rainer Stiefelhagen; Uwe Meier; Alex Waibel

In this paper, we present visual tracking techniques for multimodal human-computer interaction. First, we discuss techniques for tracking human faces in which human skin color is used as a major feature. An adaptive stochastic model has been developed to characterize the skin-color distributions. Based on the maximum likelihood method, the model parameters can be adapted for different people and different lighting conditions. The feasibility of the model has been demonstrated by the development of a real-time face tracker. The system has achieved a rate of 30+ frames/second using a low-end workstation with a framegrabber and a camera. We also present a top-down approach for tracking facial features such as eyes, nostrils, and lip corners. These real-time visual tracking techniques have been successfully applied to many applications such as gaze tracking and lipreading. The face tracker has been combined with a microphone array for extracting the speech signal from a specific person. The gaze tracker has been combined with a speech recognizer in a multimodal interface for controlling a panoramic image viewer.
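The adaptive skin-color model described above can be pictured as a Gaussian in normalized chromaticity space whose parameters are re-estimated from recently tracked skin pixels. The sketch below assumes a single 2-D Gaussian and a simple exponential blend for the adaptation step; the paper's actual model structure and update schedule may differ.

```python
import numpy as np

def skin_likelihood(frame_rgb, mean, cov):
    """Evaluate a 2-D Gaussian skin-color model in normalized (r, g)
    chromaticity space for every pixel of an RGB frame.
    Illustrative sketch of an adaptive stochastic color model."""
    rgb = frame_rgb.astype(float) + 1e-6
    rg = (rgb / rgb.sum(axis=2, keepdims=True))[..., :2]   # normalized r, g
    diff = rg - mean
    inv = np.linalg.inv(cov)
    maha = np.einsum('...i,ij,...j->...', diff, inv, diff)  # Mahalanobis distance
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * maha)                        # per-pixel likelihood

def update_model(skin_pixels_rg, old_mean, old_cov, rho=0.1):
    """Maximum-likelihood estimates from the current skin pixels, blended
    with the previous model so the tracker adapts gradually to new people
    and lighting (rho is an assumed smoothing factor)."""
    new_mean = skin_pixels_rg.mean(axis=0)
    new_cov = np.cov(skin_pixels_rg, rowvar=False)
    return ((1.0 - rho) * old_mean + rho * new_mean,
            (1.0 - rho) * old_cov + rho * new_cov)
```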


International Journal of Pattern Recognition and Artificial Intelligence | 2000

TOWARDS UNRESTRICTED LIP READING

Uwe Meier; Rainer Stiefelhagen; Jie Yang; Alex Waibel

Lip reading provides useful information for speech perception and language understanding, especially when the auditory speech is degraded. However, many current automatic lip reading systems impose restrictions on users. In this paper, we present our research efforts in the Interactive System Laboratory towards unrestricted lip reading. We first introduce a top-down approach to automatically track and extract lip regions. This technique makes it possible to acquire visual information in real time without limiting the user's freedom of movement. We then discuss normalization algorithms to preprocess images for different lighting conditions (global illumination and side illumination). We also compare different visual preprocessing methods such as raw images, Linear Discriminant Analysis (LDA), and Principal Component Analysis (PCA). We demonstrate the feasibility of the proposed methods through the development of a modular system for flexible human-computer interaction via both visual and acoustic speech. The system is based on an extension of an existing state-of-the-art speech recognition system, a modular Multi-State Time Delay Neural Network (MS-TDNN). We have developed adaptive combination methods at several different levels of the recognition network. The system can automatically track a speaker and extract his/her lip region in real time. The system has been evaluated under different noise conditions such as white noise, music, and mechanical noise. The experimental results indicate that the system can achieve up to 55% error reduction using visual information in addition to the acoustic signal.
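One of the visual preprocessing variants compared above, PCA over the extracted lip regions, can be sketched as follows. The component count and the plain SVD-based projection are assumptions for illustration; the paper additionally evaluates raw images and LDA, and applies illumination normalization before this step.

```python
import numpy as np

def pca_lip_features(lip_images, n_components=32):
    """Project flattened, equally sized lip-region images onto their top
    principal components, yielding a compact visual feature vector per
    image. Minimal sketch; the component count is illustrative."""
    X = np.asarray([img.ravel() for img in lip_images], dtype=float)
    X -= X.mean(axis=0)                        # center the data
    # Principal axes from an SVD of the centered data matrix.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T             # one feature vector per image
```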


ACM SIGCAPH Computers and the Physically Handicapped | 1998

Intelligent animated agents for interactive language training

Ron Cole; Tim Carmell; Pam Connors; Mike Macon; Johan Wouters; Jacques de Villiers; Alice Tarachow; Dominic W. Massaro; Michael M. Cohen; Jonas Beskow; Jie Yang; Uwe Meier; Alex Waibel; Pat Stone; Alice Davis; Chris Soland; George Fortier

This report describes a three-year project, now eight months old, to develop interactive learning tools for language training with profoundly deaf children. The tools combine four key technologies: speech recognition, developed at the Oregon Graduate Institute; speech synthesis, developed at the University of Edinburgh and modified at OGI; facial animation, developed at the University of California, Santa Cruz; and face tracking and speech reading, developed at Carnegie Mellon University. These technologies are being combined to create an intelligent conversational agent: a three-dimensional face that produces and understands auditory and visual speech. The agent has been incorporated into the CSLU Toolkit, a software environment for developing and researching spoken language systems. We describe our experiences in bringing interactive learning tools to classrooms at the Tucker-Maxon Oral School in Portland, Oregon, and the technological advances that are required for this project to succeed.


conference of the international speech communication association | 1994

See me, hear me: integrating automatic speech recognition and lip-reading.

Paul Duchnowski; Uwe Meier; Alex Waibel


AVSP | 1998

Real-Time Face and Facial Feature Tracking and Applications.

Jie Yang; Rainer Stiefelhagen; Uwe Meier; Alex Waibel


Proceedings of the IEEE | 1997

Real-time lip-tracking for lipreading

Rainer Stiefelhagen; Uwe Meier; Jie Yang


conference of the international speech communication association | 1998

Conversational speech systems for on-board car navigation and assistance.

Petra Geutner; Matthias Denecke; Uwe Meier; Martin Westphal; Alex Waibel


AVSP | 1997

Preprocessing of visual speech under real world conditions.

Uwe Meier; Rainer Stiefelhagen; Jie Yang

Collaboration


Dive into Uwe Meier's collaborations.

Top Co-Authors

Alex Waibel
Karlsruhe Institute of Technology

Jie Yang
Carnegie Mellon University

Rainer Stiefelhagen
Karlsruhe Institute of Technology

Paul Duchnowski
Karlsruhe Institute of Technology

Dietrich Büsching
Karlsruhe Institute of Technology

Petra Geutner
Karlsruhe Institute of Technology