Eric D. Petajan
Bell Labs
Publication
Featured research published by Eric D. Petajan.
Signal Processing: Image Communication | 1997
Peter Doenges; Tolga K. Çapin; Fabio Lavagetto; Joern Ostermann; Igor S. Pandzic; Eric D. Petajan
MPEG-4 addresses coding of digital hybrids of natural and synthetic, aural and visual (A/V) information. The objective of this synthetic/natural hybrid coding (SNHC) is to facilitate content-based manipulation, interoperability, and wider user access in the delivery of animated mixed media. SNHC will support non-real-time and passive media delivery, as well as more interactive, real-time applications. Integrated spatial-temporal coding is sought for audio, video, and 2D/3D computer graphics as standardized A/V objects. Targets of standardization include mesh-segmented video coding, compression of geometry, synchronization between A/V objects, multiplexing of streamed A/V objects, and spatial-temporal integration of mixed media types. Composition, interactivity, and scripting of A/V objects can thus be supported in client terminals, as well as in content production for servers, also more effectively enabling terminals as servers. Such A/V objects can exhibit high efficiency in transmission and storage, plus content-based interactivity, spatial-temporal scalability, and combinations of transient dynamic data and persistent downloaded data. This approach can lower bandwidth of mixed media, offer tradeoffs in quality versus update for specific terminals, and foster varied distribution methods for content that exploit spatial and temporal coherence over buses and networks. MPEG-4 responds to trends at home and work to move beyond the paradigm of audio/video as a passive experience to more flexible A/V objects which combine audio/video with synthetic 2D/3D graphics and audio.
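The object-based delivery described above can be pictured with a small sketch. The classes below are invented for illustration and are not the MPEG-4 Systems API; they show a terminal-side scene that holds independently streamed A/V objects, keeps persistent downloaded objects alongside transient streamed ones, and composites whatever is current at a given presentation time.

```python
# Illustrative sketch only; object kinds, fields, and methods are assumptions,
# not MPEG-4 syntax.
from dataclasses import dataclass, field

@dataclass
class AVObject:
    object_id: int
    kind: str                  # e.g. "audio", "video", "mesh", "face"
    timestamp_ms: int          # presentation time, used for synchronization
    persistent: bool = False   # downloaded data vs. transient streamed data

@dataclass
class Scene:
    objects: dict = field(default_factory=dict)

    def update(self, obj: AVObject) -> None:
        # A new access unit replaces the previous state of a streamed object;
        # persistent (downloaded) objects simply remain in the scene.
        self.objects[obj.object_id] = obj

    def compose(self, now_ms: int) -> list:
        # Composite every object whose presentation time has arrived.
        return [o for o in self.objects.values() if o.timestamp_ms <= now_ms]

scene = Scene()
scene.update(AVObject(1, "video", timestamp_ms=0))
scene.update(AVObject(2, "face", timestamp_ms=0, persistent=True))  # synthetic talking head
print([o.kind for o in scene.compose(now_ms=40)])
```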
Asilomar Conference on Signals, Systems and Computers | 1994
Alan J. Goldschen; Oscar N. Garcia; Eric D. Petajan
We describe a continuous optical automatic speech recognizer (OASR) that uses optical information from the oral-cavity shadow of a speaker. The system achieves 25.3 percent recognition accuracy on sentences having a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides. We introduce 13 mostly dynamic oral-cavity features used for optical recognition, present phones that appear optically similar (visemes) for our speaker, and present the recognition results for our hidden Markov models (HMMs) using visemes, trisemes, and generalized trisemes. We conclude that future research is warranted for optical recognition, especially when combined with other input modalities.
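The HMM-based recognition step can be sketched as follows; all model parameters and the quantized feature codes are invented, and the recognizer in the paper works on continuous oral-cavity features and full sentences rather than the toy symbols used here. The sketch scores an observation sequence against one small discrete-observation HMM per viseme and picks the best-matching model.

```python
# Toy viseme scoring with discrete-observation HMMs (all numbers invented).
import numpy as np

def viterbi_log_score(obs, log_pi, log_A, log_B):
    # obs: list of symbol indices; log_pi: (S,), log_A: (S, S), log_B: (S, V)
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return delta.max()

rng = np.random.default_rng(0)

def random_hmm(states=3, symbols=8):
    pi = rng.dirichlet(np.ones(states))
    A = rng.dirichlet(np.ones(states), size=states)
    B = rng.dirichlet(np.ones(symbols), size=states)
    return np.log(pi), np.log(A), np.log(B)

viseme_models = {v: random_hmm() for v in ["p/b/m", "f/v", "rounded"]}
observation = [0, 3, 3, 5, 2]   # quantized lip-shape codes (made up)
best = max(viseme_models, key=lambda v: viterbi_log_score(observation, *viseme_models[v]))
print("best-matching viseme model:", best)
```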
International Conference on Automatic Face and Gesture Recognition | 1996
Eric D. Petajan; Hans Peter Graf
The robust acquisition of facial features needed for visual speech processing is fraught with difficulties which greatly increase the complexity of the machine vision system. This system must extract the inner lip contour from facial images with variations in pose, lighting, and facial hair. This paper describes a face feature acquisition system with robust performance in the presence of extreme lighting variations and moderate variations in pose. Furthermore, system performance is not degraded by facial hair or glasses. To find the position of a face reliably, we search the whole image for facial features. These features are then combined and tests are applied to determine whether any such combination actually belongs to a face. In order to find where the lips are, other features of the face, such as the eyes, must be located as well. Without this information it is difficult to reliably find the mouth in a complex image. The mouth by itself is easily missed, or other elements in the image can be mistaken for a mouth. If the camera position can be constrained to allow the nostrils to be viewed, then nostril tracking is used to both reduce computation and provide additional robustness. Once the nostrils are tracked from frame to frame using a tracking window, the mouth area can be isolated and normalized for scale and rotation. A mouth detail analysis procedure is then used to estimate the inner lip contour and the teeth and tongue regions. The inner lip contour and head movements are then mapped to synthetic face parameters to generate a graphical talking head synchronized with the original human voice. This information can also be used as the basis for visual speech features in an automatic speechreading system. Similar features were used in our previous automatic speechreading systems.
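The stages described above can be summarized as a processing skeleton. Every function body below is a placeholder rather than the authors' algorithm; the sketch only fixes the order of operations: full-image feature search, nostril tracking, mouth isolation and normalization, and mouth detail analysis.

```python
# Structural sketch of the acquisition pipeline; detectors are placeholders.
from dataclasses import dataclass

@dataclass
class MouthEstimate:
    inner_lip_contour: list   # list of (x, y) points
    teeth_region: tuple       # bounding box (x, y, w, h)
    tongue_region: tuple      # bounding box (x, y, w, h)

def locate_face_features(frame):
    # Search the whole image for candidate eyes/nostrils/mouth and test whether
    # a combination of candidates is consistent with a face (placeholder values).
    return {"eyes": [(100, 80), (160, 80)], "nostrils": [(125, 130), (140, 130)]}

def track_nostrils(frame, prev_window):
    # Track the nostrils frame to frame inside a small window to cut computation.
    return prev_window  # placeholder: assume the window did not move

def analyze_mouth(frame, nostril_window) -> MouthEstimate:
    # Isolate the mouth below the nostrils, normalize for scale and rotation,
    # then estimate the inner lip contour and the teeth and tongue regions.
    return MouthEstimate([(120, 170), (150, 170)], (118, 165, 40, 12), (125, 172, 20, 6))

def process_frame(frame, prev_window=None):
    window = track_nostrils(frame, prev_window) if prev_window else locate_face_features(frame)["nostrils"]
    mouth = analyze_mouth(frame, window)
    # The contour and head movements would then drive synthetic-face parameters
    # or feed visual speech features to an automatic speechreading system.
    return mouth, window

mouth, window = process_frame(frame=None)
print(mouth)
```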
Archive | 1996
Alan J. Goldschen; Oscar N. Garcia; Eric D. Petajan
We describe a methodology to automatically identify visemes and to determine important oral-cavity features for a speaker-dependent, optical continuous speech recognizer. A viseme, as defined by Fisher (1968), represents phones that contain optically similar sequences of oral-cavity movements. Large vocabulary, continuous acoustic speech recognizers that use Hidden Markov Models (HMMs) require accurate phone models (Lee 1989). Similarly, an optical recognizer requires accurate viseme models (Goldschen 1993). Since no universal agreement exists on a subjective viseme definition, we provide an empirical viseme definition using HMMs. We train a set of phone HMMs using optical information, and then cluster similar phone HMMs to form viseme HMMs. We compare our algorithmic phone-to-viseme mapping with the mappings from human speechreading experts. We start, however, by describing the oral-cavity feature selection process to determine features that characterize the movements of the oral cavity during speech. The feature selection process uses a correlation matrix, principal component analysis, and speechreading heuristics to reduce the number of oral-cavity features from 35 to 13. Our analysis concludes that the dynamic oral-cavity features offer great potential for machine speechreading and for the teaching of human speechreading.
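The dimensionality-reduction step can be illustrated on synthetic data. The sketch below shows only the correlation-matrix and principal-component-analysis portion on made-up feature vectors; the paper additionally applies speechreading heuristics to arrive at the final 13 features.

```python
# PCA/correlation sketch on synthetic oral-cavity feature data (values invented).
import numpy as np

rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 35))          # 500 frames x 35 oral-cavity features

X = frames - frames.mean(axis=0)             # center each feature
corr = np.corrcoef(X, rowvar=False)          # 35 x 35 correlation matrix
redundant = int(np.sum(np.triu(np.abs(corr) > 0.9, k=1)))
print("strongly correlated feature pairs:", redundant)

_, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = (S ** 2) / np.sum(S ** 2)
k = 13
print(f"variance captured by {k} components: {explained[:k].sum():.2%}")

reduced = X @ Vt[:k].T                       # project each frame onto 13 components
print(reduced.shape)                         # (500, 13)
```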
International Conference on Consumer Electronics | 1996
Kobad A. Bugwadia; Eric D. Petajan; Narindra N. Puri
A new frame rate up-conversion technique for utilizing 24/30 Hz source material for HDTV is proposed. The proposed approach utilizes motion information, along with information about the sampling grids of the available digitized frames, to directly interpolate the missing frames necessary to convert the frame rate from 24/30 Hz to 60 Hz.
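A toy version of motion-compensated interpolation makes the idea concrete. The sketch assumes a single global motion vector and nearest-neighbor sampling, which is far simpler than the proposed technique; for 24-to-60 Hz conversion, each output time k/60 falls between two source frames at a fraction alpha = (24k/60) mod 1, and the missing frame is interpolated along the motion trajectory.

```python
# Toy motion-compensated frame interpolation (one global motion vector only).
import numpy as np

def mc_interpolate(f0, f1, v, alpha):
    """Interpolate a frame at fraction alpha between f0 and f1, given motion v = (dy, dx) from f0 to f1."""
    h, w = f0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow the motion trajectory backwards into f0 and forwards into f1.
    y0 = np.clip(np.rint(ys - alpha * v[0]).astype(int), 0, h - 1)
    x0 = np.clip(np.rint(xs - alpha * v[1]).astype(int), 0, w - 1)
    y1 = np.clip(np.rint(ys + (1 - alpha) * v[0]).astype(int), 0, h - 1)
    x1 = np.clip(np.rint(xs + (1 - alpha) * v[1]).astype(int), 0, w - 1)
    return (1 - alpha) * f0[y0, x0] + alpha * f1[y1, x1]

rng = np.random.default_rng(3)
f0 = rng.random((4, 6))
f1 = np.roll(f0, shift=1, axis=1)            # whole frame shifts one pixel to the right
mid = mc_interpolate(f0, f1, v=(0.0, 1.0), alpha=0.4)
print(mid.shape)                             # (4, 6), an in-between frame at alpha = 0.4
```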
IEEE Symposium on Information Visualization | 1995
Vinod Anupam; Shaul Dar; Ted Leibfried; Eric D. Petajan
DataSpace is a system for interactive 3-D visualization and analysis of large databases. DataSpace utilizes the display space by placing panels of information, possibly generated by different visualization applications, in a 3-D graph layout, and providing continuous navigation facilities. Selective rearrangements and transparency can be used to reduce occlusion or to compare or merge a set of images (e.g. line graphs or scatter plots) that are aligned and stacked in depth. A prototype system supporting the basic 3-D graphic operations (layout, zoom, rotation, translation, transparency) has been implemented. We provide several illustrative examples of DataSpace displays taken from the current system. We present the 3-D display paradigm, describe the query, layout and rendering steps required to create a display, and discuss some performance issues.
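The layout and stacking behavior can be sketched without a renderer. The names below are invented and no claim is made about DataSpace's actual implementation; the sketch just arranges panels on a 3-D grid and stacks a subset in depth with reduced opacity so they could be compared or merged.

```python
# Layout-only sketch (no rendering); class and function names are assumptions.
from dataclasses import dataclass

@dataclass
class Panel:
    name: str
    position: tuple        # (x, y, z) in scene units
    alpha: float = 1.0     # 1.0 opaque; lower values let stacked panels show through

def grid_layout(names, cols=3, spacing=2.0):
    # Place panels on a planar grid at z = 0.
    return [Panel(n, ((i % cols) * spacing, -(i // cols) * spacing, 0.0))
            for i, n in enumerate(names)]

def stack_in_depth(panels, step=0.5, alpha=0.5):
    # Align a set of panels and stack them along z for comparison or merging.
    x, y, z = panels[0].position
    for k, p in enumerate(panels):
        p.position = (x, y, z - k * step)
        p.alpha = alpha
    return panels

panels = grid_layout(["line-graph-1994", "line-graph-1995", "scatter-plot"])
stack_in_depth(panels[:2])
for p in panels:
    print(p)
```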
IEEE Transactions on Consumer Electronics | 1992
Arun N. Netravali; Eric D. Petajan; Scott C. Knauer; Alireza Farid Faryar; George J. Kustka; Kim Nigel Matthews; Robert J. Safranek
A high-quality digital video codec has been developed for the Zenith/AT&T HDTV system. It adaptively selects between two transmission modes with differing rates and robustness. The codec works on an image progressively scanned with 1575 scan lines every 1/30th of a second and achieves a compression ratio of approximately 50 to 1. The high compression ratio facilitates robust transmission of the compressed HDTV signal within an NTSC taboo channel. Transparent image quality is achieved using motion compensated transform coding coupled with a perceptual criterion to determine the quantization accuracy required for each transform coefficient. The codec has been designed to minimize complexity and memory in the receiver.
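The role of the perceptual criterion can be illustrated with a minimal transform-and-quantize sketch. The weighting matrix below is made up, and the real codec's perceptual model, motion compensation, and dual transmission modes are not represented; the point is only that quantization accuracy is set per transform coefficient, with coarser steps where the eye is less sensitive.

```python
# 8x8 DCT followed by coefficient-wise quantization (weighting matrix invented).
import numpy as np

def dct2(block):
    # Orthonormal 2-D DCT-II built from the 1-D basis matrix.
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

rng = np.random.default_rng(4)
block = rng.integers(0, 256, size=(8, 8))

u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
steps = 8.0 * (1.0 + 0.5 * (u + v))          # coarser steps at higher spatial frequencies

coeffs = dct2(block.astype(float) - 128.0)   # level-shift then transform
quantized = np.rint(coeffs / steps).astype(int)
print(quantized)
```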
IEEE Communications Magazine | 1996
Eric D. Petajan
Broadcast television in the United States has remained essentially unchanged over the last 50 years except for the addition of color and stereo sound. Today, personal computers are meeting the need for random access to high-resolution images and CD-quality audio. Furthermore, advances in digital video compression and digital communication technology have cleared the way toward offering high-resolution video and audio services to consumers using traditional analog communications channels. In 1987, the US Federal Communications Commission (FCC) chartered an advisory committee to recommend an advanced television system for the United States. From 1990 to 1992, the Advanced Television Test Center (ATTC) tested four all-digital systems, one analog high-definition television (HDTV) system, and one enhanced NTSC system using broadcast and cable television environment simulators. The formation of the HDTV Grand Alliance in May 1993 resulted from the withdrawal of the only analog HDTV system from the competition and a stalemate between the other four all-digital systems. The HDTV Grand Alliance system is composed of the best components from previously competing digital systems demonstrated to the FCC. The MPEG-2 (Moving Picture Experts Group) syntax is used with novel encoding techniques to deliver a set of video scanning formats for a variety of applications. This article describes the important features and concepts embodied in the HDTV Grand Alliance system.
Visual Communications and Image Processing | 1995
Tsuhan Chen; Hans Peter Graf; Homer H. Chen; Wu Chou; Barry G. Haskell; Eric D. Petajan; Yao Wang
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and demonstrate speech-assisted coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.
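One way to picture speech assistance is to derive a per-video-frame mouth parameter directly from the audio. The sketch below is only an illustration of that coupling, not the authors' method: it maps short-time audio energy to a normalized mouth-opening value for each output frame, which interpolated frames could then respect so the lips stay synchronized with the speech.

```python
# Toy audio-to-mouth-opening mapping (method and numbers are assumptions).
import numpy as np

def mouth_openings(audio, sample_rate=8000, video_fps=30):
    hop = sample_rate // video_fps                   # audio samples per video frame
    n_frames = len(audio) // hop
    energy = np.array([np.mean(audio[i * hop:(i + 1) * hop] ** 2) for i in range(n_frames)])
    return energy / (energy.max() + 1e-12)           # 0 = closed, 1 = fully open

rng = np.random.default_rng(2)
speech = rng.normal(size=8000) * np.abs(np.sin(np.linspace(0, 6 * np.pi, 8000)))
print(np.round(mouth_openings(speech)[:10], 2))      # opening value per video frame
```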
International Conference on Image Processing | 1995
Tsuhan Chen; Hans Peter Graf; Barry G. Haskell; Eric D. Petajan; Yao Wang; Homer H. Chen; Wu Chou
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and apply it to coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.