Publication


Featured research published by Hans Peter Graf.


International Conference on Image Processing | 1998

An image transform approach for HMM based automatic lipreading

Gerasimos Potamianos; Hans Peter Graf; Eric Cosatto

This paper concentrates on the visual front end for hidden Markov model based automatic lipreading. Two approaches for extracting features relevant to lipreading, given image sequences of the speaker's mouth region, are considered: a lip contour based approach, which first obtains estimates of the speaker's lip contours and subsequently extracts features from them; and an image transform based approach, which obtains a compressed representation of the image pixel values that contain the speaker's mouth. Various possible features are considered in each approach, and experimental results on a number of visual-only recognition tasks are reported. It is shown that the image transform based approach results in superior lipreading performance. In addition, feature mean subtraction is demonstrated to improve performance in multi-speaker and speaker-independent recognition tasks. Finally, the effects of video degradations on image transform based automatic lipreading are studied. It is shown that lipreading performance deteriorates dramatically below a 10 Hz field rate, and that image transform features are robust to noise and compression artifacts.
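As a rough illustration of the image transform approach, the sketch below compresses a grayscale mouth region with a 2-D DCT, keeps the low-order coefficients as features, and applies feature mean subtraction over an utterance. The choice of DCT, the coefficient count, and the ROI handling are assumptions for illustration; the paper's exact transform and configuration may differ.

```python
# Sketch of an image-transform visual front end: a 2-D DCT compresses
# the mouth ROI, low-order coefficients form the feature vector, and
# the per-utterance feature mean is subtracted. The transform choice
# and the number of coefficients are illustrative assumptions.
import numpy as np
from scipy.fftpack import dct

def dct_features(mouth_roi, n_coeffs=24):
    """mouth_roi: 2-D grayscale array covering the speaker's mouth."""
    c = dct(dct(mouth_roi.astype(float), axis=0, norm='ortho'),
            axis=1, norm='ortho')
    k = int(np.ceil(np.sqrt(n_coeffs)))
    return c[:k, :k].ravel()[:n_coeffs]    # keep the low-frequency block

def utterance_features(frames, n_coeffs=24):
    feats = np.stack([dct_features(f, n_coeffs) for f in frames])
    return feats - feats.mean(axis=0)      # feature mean subtraction
```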


IEEE Journal of Solid-State Circuits | 1992

A reconfigurable VLSI neural network

Srinagesh Satyanarayana; Yannis Tsividis; Hans Peter Graf

Due to the variety of architectures that need to be considered when attempting to solve various problems with neural networks, a neural network with programmable topology and programmable weights has been implemented. A new circuit block, the distributed neuron-synapse, has been used to implement a 1024-synapse reconfigurable network on a VLSI chip. To evaluate the chip's performance, a complete test setup was built, consisting of hardware for configuring the chip, programming the synaptic weights, presenting analog input vectors to the chip, and recording its outputs. Following the performance verification of each circuit block on the chip, various sample problems were solved. In each problem, the synaptic weights were determined by training the neural network with a gradient-based learning algorithm incorporated in the experimental test setup. The results of this work indicate that reconfigurable neural networks built from distributed neuron-synapses can be used to solve various problems efficiently.
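The chip itself computes in analog hardware, but the training-loop idea — determining synaptic weights with a gradient-based learning algorithm running in the test setup — can be sketched in software. The network size, sample problem, and learning rate below are illustrative assumptions, not the paper's configuration.

```python
# Software analogue of the training loop described above: weights of a
# small network are found by gradient descent on a sample two-class
# problem. Network size, data, and learning rate are assumptions; the
# real chip evaluates the network with analog distributed neuron-synapses.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 4))        # analog input vectors
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-like sample problem

W1 = rng.normal(0.0, 0.5, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, 8);      b2 = 0.0
lr = 0.5
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                # hidden-neuron outputs
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    g = (p - y) / len(y)                    # cross-entropy gradient
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum()
    gh = np.outer(g, W2) * (1.0 - h ** 2)   # backpropagate to layer 1
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)
print("training accuracy:", ((p > 0.5) == y).mean())
```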


IEEE Communications Magazine | 1989

Handwritten digit recognition: applications of neural network chips and automatic learning

Y. Le Cun; Lawrence D. Jackel; Bernhard E. Boser; John S. Denker; Hans Peter Graf; I. Guyon; D. Henderson; R. E. Howard; W. Hubbard

Two novel methods for achieving handwritten digit recognition are described. The first method is based on a neural network chip that performs line thinning and feature extraction using local template matching. The second method is implemented on a digital signal processor and makes extensive use of constrained automatic learning. Experimental results obtained using isolated handwritten digits taken from postal zip codes, a rather difficult data set, are reported and discussed.
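A toy sketch of the local template matching used for feature extraction in the first method: each small binary template is slid over the thinned digit image, and a feature map records where most template pixels agree. The template and threshold below are illustrative, not the chip's actual kernels.

```python
# Toy local template matching: count agreements between the image patch
# and each template at every position; a match is declared where the
# agreement exceeds a threshold. Templates here are made up for the demo.
import numpy as np

def match_templates(img, templates, thresh):
    """img: 2-D {0,1} array; templates: list of small {0,1} arrays."""
    maps = []
    for t in templates:
        th, tw = t.shape
        out = np.zeros((img.shape[0] - th + 1, img.shape[1] - tw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(img[i:i + th, j:j + tw] == t)
        maps.append(out >= thresh)          # match where agreement is high
    return maps

line_end = np.array([[0, 1, 0], [0, 1, 0], [0, 0, 0]])  # stroke ending
feature_map = match_templates(np.eye(8, dtype=int), [line_end], thresh=8)[0]
```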


International Conference on Acoustics, Speech, and Signal Processing | 1998

Discriminative training of HMM stream exponents for audio-visual speech recognition

Gerasimos Potamianos; Hans Peter Graf

We propose the use of discriminative training by means of the generalized probabilistic descent (GPD) algorithm to estimate hidden Markov model (HMM) stream exponents for audio-visual speech recognition. Synchronized audio and visual features are used to train audio-only and visual-only single-stream HMMs, respectively, of identical topology by maximum likelihood. A two-stream HMM is then obtained by combining the two single-stream HMMs and introducing exponents that weight the log-likelihood of each stream. We present the GPD algorithm for stream exponent estimation, consider a possible initialization, and apply it to the single-speaker connected-letters task of the AT&T bimodal database. We demonstrate the superior performance of the resulting multi-stream HMM over the audio-only, visual-only, and audio-visual single-stream HMMs.
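A minimal sketch of the core idea: stream exponents weight the per-stream log-likelihoods, and a GPD-style gradient step on a smoothed misclassification measure adjusts them to separate the correct class from its strongest rival. The sigmoid loss shape, learning rate, and example scores are assumptions; the paper's exact loss and update schedule may differ.

```python
# GPD-style update sketch for two-stream exponents. The combined state
# score is lam_a*log P_audio + lam_v*log P_visual; the exponents are
# nudged down the gradient of a sigmoid-smoothed 0/1 error.
import numpy as np

def two_stream_logprob(logp_audio, logp_visual, lam):
    # lam = (lam_a, lam_v): exponents weighting the audio/visual streams
    return lam[0] * logp_audio + lam[1] * logp_visual

def gpd_step(lam, correct, rival, lr=0.1, alpha=1.0):
    """correct / rival: (audio, visual) log-likelihood pairs for the
    true class and its strongest competitor (schematic loss form)."""
    d = two_stream_logprob(*rival, lam) - two_stream_logprob(*correct, lam)
    s = 1.0 / (1.0 + np.exp(-alpha * d))    # smoothed misclassification
    grad = alpha * s * (1 - s) * (np.array(rival) - np.array(correct))
    return lam - lr * grad

lam = np.array([0.5, 0.5])                  # one possible initialization
lam = gpd_step(lam, correct=(-10.0, -9.0), rival=(-11.5, -8.0))
```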


IEEE Circuits & Devices | 1989

Analog electronic neural network circuits

Hans Peter Graf; Lawrence D. Jackel

It is argued that the large interconnectivity and the precision required in neural network models present novel opportunities for analog computing. Analog circuits for a wide variety of problems such as pattern matching, optimization, and learning have been proposed, and a few have been built. Most of the circuits built so far are relatively small, exploratory designs. Circuits implementing several different neural algorithms are discussed, namely template matching, associative memory, learning, and two-dimensional resistor networks inspired by the architecture of the retina. The most mature circuits are those for template matching, and chips performing this function are now being applied to pattern-recognition problems. Examples of analog implementations are examined.


International Conference on Automatic Face and Gesture Recognition | 1996

Multi-modal system for locating heads and faces

Hans Peter Graf; Eric Cosatto; David C. Gibbon; Michael Kocheisen; Eric Petajan

We designed a modular system that combines shape analysis, color segmentation, and motion information to reliably locate heads and faces of different sizes and orientations in complex images. The first of the system's three channels performs a shape analysis on gray-level images to determine the locations of individual facial features as well as the outlines of heads. In the second channel, the color space is analyzed with a clustering algorithm to find areas of skin color; the color space is first calibrated using the results from the other channels. In the third channel, motion information is extracted from frame differences, and head outlines are determined by analyzing the shapes of areas with large motion vectors. All three channels produce lists of shapes, each marking an area of the image where a facial feature or part of a head outline may be present. Combinations of such shapes are evaluated with n-gram searches to produce a list of likely head positions and the locations of facial features. We tested the system for tracking the faces of people sitting in front of terminals and video phones, and used it to track people entering through a doorway.
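As an illustration of just the motion channel, the sketch below marks pixels that changed between frames and returns bounding boxes of large connected moving regions as candidate head shapes. The difference threshold and minimum-area test are assumptions, not the paper's parameters.

```python
# Motion channel only: frame differencing marks moving pixels, and
# bounding boxes of large connected moving regions become candidate
# head shapes. Thresholds are illustrative assumptions.
import numpy as np
from scipy import ndimage

def motion_shapes(prev_frame, frame, diff_thresh=20, min_area=400):
    moving = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    labels, n = ndimage.label(moving)       # connected moving regions
    boxes = ndimage.find_objects(labels)
    return [b for b in boxes
            if (b[0].stop - b[0].start) * (b[1].stop - b[1].start) >= min_area]
```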


IEEE Signal Processing Letters | 1995

Lip synchronization using speech-assisted video processing

Tsuhan Chen; Hans Peter Graf; Kuansan Wang

We utilize speech information to improve the quality of audio-visual communications such as videotelephony and videoconferencing. In particular, the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion. Demonstration sequences are presented. Other applications, including speech-assisted video coding, are outlined.


Applied Optics | 1987

Electronic neural network chips

Lawrence D. Jackel; Hans Peter Graf; R. E. Howard

This paper reviews two custom electronic circuits that implement some simple models of neural function. The circuits include a thin-film array of read-only resistive synapses and an array of programmable synapses and amplifiers serving as electronic neurons. Circuit performance and architecture are discussed.


International Conference on Automatic Face and Gesture Recognition | 1996

Robust face feature analysis for automatic speechreading and character animation

Eric D. Petajan; Hans Peter Graf

The robust acquisition of facial features needed for visual speech processing is fraught with difficulties that greatly increase the complexity of the machine vision system, which must extract the inner lip contour from facial images with variations in pose, lighting, and facial hair. This paper describes a face feature acquisition system with robust performance in the presence of extreme lighting variations and moderate variations in pose; system performance is not degraded by facial hair or glasses. To find the position of a face reliably, we search the whole image for facial features. These features are then combined, and tests are applied to determine whether any such combination actually belongs to a face. To find where the lips are, other features of the face, such as the eyes, must be located as well; without this information it is difficult to reliably find the mouth in a complex image, since the mouth by itself is easily missed and other elements in the image can be mistaken for it. If the camera position can be constrained to allow the nostrils to be viewed, then nostril tracking is used both to reduce computation and to provide additional robustness. Once the nostrils are tracked from frame to frame using a tracking window, the mouth area can be isolated and normalized for scale and rotation. A mouth detail analysis procedure is then used to estimate the inner lip contour and the teeth and tongue regions. The inner lip contour and head movements are then mapped to synthetic face parameters to generate a graphical talking head synchronized with the original human voice. This information can also be used as the basis for visual speech features in an automatic speechreading system; similar features were used in our previous automatic speechreading systems.
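The tracking-window idea can be sketched as follows: each new frame is searched only inside a window around the previous nostril position (nostrils appearing as dark pixels), and the mouth region is taken relative to the tracked position. Window size, darkness margin, and ROI geometry below are assumptions, not the paper's values.

```python
# Tracking-window sketch: find the darkest pixels inside a window around
# the previous nostril-pair center, re-center on them, and place the
# mouth ROI below the nostrils. All geometry here is assumed.
import numpy as np

def track_nostrils(frame, prev_center, win=40, margin=15):
    r, c = prev_center                       # window assumed in-frame
    window = frame[r - win:r + win, c - win:c + win]
    dark = np.argwhere(window <= window.min() + margin)
    center = dark.mean(axis=0) + [r - win, c - win]
    return center.astype(int)                # new nostril-pair center

def mouth_roi(frame, nostril_center, h=60, w=100):
    r, c = nostril_center
    return frame[r + 10:r + 10 + h, c - w // 2:c + w // 2]
```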


International Conference on Multimedia and Expo | 2000

Audio-visual unit selection for the synthesis of photo-realistic talking-heads

Eric Cosatto; Gerasimos Potamianos; Hans Peter Graf

This paper investigates audio-visual unit selection for the synthesis of photo-realistic, speech-synchronized talking-head animations. These animations are synthesized from recorded video samples of a subject speaking in front of a camera, resulting in a photo-realistic appearance. Lip synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. Synthesizing a new speech animation from these recorded units starts with audio speech and its phonetic annotation from a text-to-speech synthesizer. Then, optimal image units are selected from the recorded set using a Viterbi search through a graph of candidate image units. Costs, computed from similarities in both the acoustic and visual domains, are attached to the nodes and arcs of the graph. While acoustic similarities are computed by simple phonetic matching, visual similarities are estimated using a hierarchical metric that uses high-level features (positions and sizes of facial parts) and low-level features (projection of the image pixels on principal components of the database). This method preserves coarticulation and temporal coherence, producing smooth, lip-synched animations. Once the database has been prepared, this system can produce animations from ASCII text fully automatically.
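A schematic version of the Viterbi unit selection over the candidate graph, with placeholder cost callables standing in for the phonetic match (node cost) and the hierarchical visual-similarity metric (arc cost); the cost functions themselves are assumptions.

```python
# Schematic Viterbi unit selection: one list of candidate video units
# per target position; node costs score match to the target, arc costs
# score visual continuity between consecutive units. Dynamic programming
# finds the minimum-total-cost path through the candidate graph.
import numpy as np

def select_units(candidates, node_cost, arc_cost):
    """candidates: one list of candidate units per target position.
    Returns the minimum-total-cost sequence of units."""
    best = [node_cost(u, 0) for u in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        cur, ptr = [], []
        for u in candidates[t]:
            costs = [best[j] + arc_cost(candidates[t - 1][j], u)
                     for j in range(len(candidates[t - 1]))]
            j = int(np.argmin(costs))
            cur.append(costs[j] + node_cost(u, t))
            ptr.append(j)
        best, back = cur, back + [ptr]
    path = [int(np.argmin(best))]
    for ptr in reversed(back):               # trace best path backwards
        path.append(ptr[path[-1]])
    return [candidates[t][i] for t, i in enumerate(reversed(path))]
```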
