Eric D. Petajan
Bell Labs
Publication
Featured research published by Eric D. Petajan.
Signal Processing: Image Communication | 1997
Peter Doenges; Tolga K. Çapin; Fabio Lavagetto; Joern Ostermann; Igor S. Pandzic; Eric D. Petajan
MPEG-4 addresses coding of digital hybrids of natural and synthetic, aural and visual (A/V) information. The objective of this synthetic/natural hybrid coding (SNHC) is to facilitate content-based manipulation, interoperability, and wider user access in the delivery of animated mixed media. SNHC will support non-real-time and passive media delivery, as well as more interactive, real-time applications. Integrated spatial-temporal coding is sought for audio, video, and 2D/3D computer graphics as standardized A/V objects. Targets of standardization include mesh-segmented video coding, compression of geometry, synchronization between A/V objects, multiplexing of streamed A/V objects, and spatial-temporal integration of mixed media types. Composition, interactivity, and scripting of A/V objects can thus be supported in client terminals, as well as in content production for servers, also more effectively enabling terminals as servers. Such A/V objects can exhibit high efficiency in transmission and storage, plus content-based interactivity, spatial-temporal scalability, and combinations of transient dynamic data and persistent downloaded data. This approach can lower bandwidth of mixed media, offer tradeoffs in quality versus update for specific terminals, and foster varied distribution methods for content that exploit spatial and temporal coherence over buses and networks. MPEG-4 responds to trends at home and work to move beyond the paradigm of audio/video as a passive experience to more flexible A/V objects which combine audio/video with synthetic 2D/3D graphics and audio.
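The object-based delivery described above can be pictured with a small sketch. The classes below are invented for illustration and are not the MPEG-4 Systems API; they show a terminal-side scene that holds independently streamed A/V objects, keeps persistent downloaded objects alongside transient streamed ones, and composites whatever is current at a given presentation time.

```python
# Illustrative sketch only; object kinds, fields, and methods are assumptions,
# not MPEG-4 syntax.
from dataclasses import dataclass, field

@dataclass
class AVObject:
    object_id: int
    kind: str                  # e.g. "audio", "video", "mesh", "face"
    timestamp_ms: int          # presentation time, used for synchronization
    persistent: bool = False   # downloaded data vs. transient streamed data

@dataclass
class Scene:
    objects: dict = field(default_factory=dict)

    def update(self, obj: AVObject) -> None:
        # A new access unit replaces the previous state of a streamed object;
        # persistent (downloaded) objects simply remain in the scene.
        self.objects[obj.object_id] = obj

    def compose(self, now_ms: int) -> list:
        # Composite every object whose presentation time has arrived.
        return [o for o in self.objects.values() if o.timestamp_ms <= now_ms]

scene = Scene()
scene.update(AVObject(1, "video", timestamp_ms=0))
scene.update(AVObject(2, "face", timestamp_ms=0, persistent=True))  # synthetic talking head
print([o.kind for o in scene.compose(now_ms=40)])
```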
Asilomar Conference on Signals, Systems and Computers | 1994
Alan J. Goldschen; Oscar N. Garcia; Eric D. Petajan
We describe a continuous optical automatic speech recognizer (OASR) that uses optical information from the oral-cavity shadow of a speaker. The system achieves 25.3 percent recognition accuracy on sentences having a perplexity of 150 without using any syntactic, semantic, acoustic, or contextual guides. We introduce 13 mostly dynamic oral-cavity features used for optical recognition, present phones that appear optically similar (visemes) for our speaker, and present the recognition results for our hidden Markov models (HMMs) using visemes, trisemes, and generalized trisemes. We conclude that future research is warranted for optical recognition, especially when combined with other input modalities.
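The HMM-based recognition step can be sketched as follows; all model parameters and the quantized feature codes are invented, and the recognizer in the paper works on continuous oral-cavity features and full sentences rather than the toy symbols used here. The sketch scores an observation sequence against one small discrete-observation HMM per viseme and picks the best-matching model.

```python
# Toy viseme scoring with discrete-observation HMMs (all numbers invented).
import numpy as np

def viterbi_log_score(obs, log_pi, log_A, log_B):
    # obs: list of symbol indices; log_pi: (S,), log_A: (S, S), log_B: (S, V)
    delta = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return delta.max()

rng = np.random.default_rng(0)

def random_hmm(states=3, symbols=8):
    pi = rng.dirichlet(np.ones(states))
    A = rng.dirichlet(np.ones(states), size=states)
    B = rng.dirichlet(np.ones(symbols), size=states)
    return np.log(pi), np.log(A), np.log(B)

viseme_models = {v: random_hmm() for v in ["p/b/m", "f/v", "rounded"]}
observation = [0, 3, 3, 5, 2]   # quantized lip-shape codes (made up)
best = max(viseme_models, key=lambda v: viterbi_log_score(observation, *viseme_models[v]))
print("best-matching viseme model:", best)
```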
International Conference on Automatic Face and Gesture Recognition | 1996
Eric D. Petajan; Hans Peter Graf
The robust acquisition of facial features needed for visual speech processing is fraught with difficulties which greatly increase the complexity of the machine vision system. This system must extract the inner lip contour from facial images with variations in pose, lighting, and facial hair. This paper describes a face feature acquisition system with robust performance in the presence of extreme lighting variations and moderate variations in pose. Furthermore, system performance is not degraded by facial hair or glasses. To find the position of a face reliably, we search the whole image for facial features. These features are then combined and tests are applied to determine whether any such combination actually belongs to a face. In order to find where the lips are, other features of the face, such as the eyes, must be located as well. Without this information it is difficult to reliably find the mouth in a complex image. The mouth by itself is easily missed, or other elements in the image can be mistaken for a mouth. If the camera position can be constrained to allow the nostrils to be viewed, then nostril tracking is used to both reduce computation and provide additional robustness. Once the nostrils are tracked from frame to frame using a tracking window, the mouth area can be isolated and normalized for scale and rotation. A mouth detail analysis procedure is then used to estimate the inner lip contour and the teeth and tongue regions. The inner lip contour and head movements are then mapped to synthetic face parameters to generate a graphical talking head synchronized with the original human voice. This information can also be used as the basis for visual speech features in an automatic speechreading system. Similar features were used in our previous automatic speechreading systems.
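The stages described above can be summarized as a processing skeleton. Every function body below is a placeholder rather than the authors' algorithm; the sketch only fixes the order of operations: full-image feature search, nostril tracking, mouth isolation and normalization, and mouth detail analysis.

```python
# Structural sketch of the acquisition pipeline; detectors are placeholders.
from dataclasses import dataclass

@dataclass
class MouthEstimate:
    inner_lip_contour: list   # list of (x, y) points
    teeth_region: tuple       # bounding box (x, y, w, h)
    tongue_region: tuple      # bounding box (x, y, w, h)

def locate_face_features(frame):
    # Search the whole image for candidate eyes/nostrils/mouth and test whether
    # a combination of candidates is consistent with a face (placeholder values).
    return {"eyes": [(100, 80), (160, 80)], "nostrils": [(125, 130), (140, 130)]}

def track_nostrils(frame, prev_window):
    # Track the nostrils frame to frame inside a small window to cut computation.
    return prev_window  # placeholder: assume the window did not move

def analyze_mouth(frame, nostril_window) -> MouthEstimate:
    # Isolate the mouth below the nostrils, normalize for scale and rotation,
    # then estimate the inner lip contour and the teeth and tongue regions.
    return MouthEstimate([(120, 170), (150, 170)], (118, 165, 40, 12), (125, 172, 20, 6))

def process_frame(frame, prev_window=None):
    window = track_nostrils(frame, prev_window) if prev_window else locate_face_features(frame)["nostrils"]
    mouth = analyze_mouth(frame, window)
    # The contour and head movements would then drive synthetic-face parameters
    # or feed visual speech features to an automatic speechreading system.
    return mouth, window

mouth, window = process_frame(frame=None)
print(mouth)
```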
Archive | 1996
Alan J. Goldschen; Oscar N. Garcia; Eric D. Petajan
We describe a methodology to automatically identify visemes and to determine important oral-cavity features for a speaker-dependent, optical continuous speech recognizer. A viseme, as defined by Fisher (1968), represents phones that contain optically similar sequences of oral-cavity movements. Large vocabulary, continuous acoustic speech recognizers that use Hidden Markov Models (HMMs) require accurate phone models (Lee 1989). Similarly, an optical recognizer requires accurate viseme models (Goldschen 1993). Since no universal agreement exists on a subjective viseme definition, we provide an empirical viseme definition using HMMs. We train a set of phone HMMs using optical information, and then cluster similar phone HMMs to form viseme HMMs. We compare our algorithmic phone-to-viseme mapping with the mappings from human speechreading experts. We start, however, by describing the oral-cavity feature selection process to determine features that characterize the movements of the oral cavity during speech. The feature selection process uses a correlation matrix, principal component analysis, and speechreading heuristics to reduce the number of oral-cavity features from 35 to 13. Our analysis concludes that the dynamic oral-cavity features offer great potential for machine speechreading and for the teaching of human speechreading.
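The dimensionality-reduction step can be illustrated on synthetic data. The sketch below shows only the correlation-matrix and principal-component-analysis portion on made-up feature vectors; the paper additionally applies speechreading heuristics to arrive at the final 13 features.

```python
# PCA/correlation sketch on synthetic oral-cavity feature data (values invented).
import numpy as np

rng = np.random.default_rng(1)
frames = rng.normal(size=(500, 35))          # 500 frames x 35 oral-cavity features

X = frames - frames.mean(axis=0)             # center each feature
corr = np.corrcoef(X, rowvar=False)          # 35 x 35 correlation matrix
redundant = int(np.sum(np.triu(np.abs(corr) > 0.9, k=1)))
print("strongly correlated feature pairs:", redundant)

_, S, Vt = np.linalg.svd(X, full_matrices=False)
explained = (S ** 2) / np.sum(S ** 2)
k = 13
print(f"variance captured by {k} components: {explained[:k].sum():.2%}")

reduced = X @ Vt[:k].T                       # project each frame onto 13 components
print(reduced.shape)                         # (500, 13)
```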
International Conference on Consumer Electronics | 1996
Kobad A. Bugwadia; Eric D. Petajan; Narindra N. Puri
A new frame rate up-conversion technique for utilizing 24/30 Hz source material for HDTV is proposed. The proposed approach utilizes motion information, along with information about the sampling grids of the available digitized frames, to directly interpolate the missing frames necessary to convert the frame rate from 24/30 Hz to 60 Hz.
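A toy version of motion-compensated interpolation makes the idea concrete. The sketch assumes a single global motion vector and nearest-neighbor sampling, which is far simpler than the proposed technique; for 24-to-60 Hz conversion, each output time k/60 falls between two source frames at a fraction alpha = (24k/60) mod 1, and the missing frame is interpolated along the motion trajectory.

```python
# Toy motion-compensated frame interpolation (one global motion vector only).
import numpy as np

def mc_interpolate(f0, f1, v, alpha):
    """Interpolate a frame at fraction alpha between f0 and f1, given motion v = (dy, dx) from f0 to f1."""
    h, w = f0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Follow the motion trajectory backwards into f0 and forwards into f1.
    y0 = np.clip(np.rint(ys - alpha * v[0]).astype(int), 0, h - 1)
    x0 = np.clip(np.rint(xs - alpha * v[1]).astype(int), 0, w - 1)
    y1 = np.clip(np.rint(ys + (1 - alpha) * v[0]).astype(int), 0, h - 1)
    x1 = np.clip(np.rint(xs + (1 - alpha) * v[1]).astype(int), 0, w - 1)
    return (1 - alpha) * f0[y0, x0] + alpha * f1[y1, x1]

rng = np.random.default_rng(3)
f0 = rng.random((4, 6))
f1 = np.roll(f0, shift=1, axis=1)            # whole frame shifts one pixel to the right
mid = mc_interpolate(f0, f1, v=(0.0, 1.0), alpha=0.4)
print(mid.shape)                             # (4, 6), an in-between frame at alpha = 0.4
```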
IEEE Symposium on Information Visualization | 1995
Vinod Anupam; Shaul Dar; Ted Leibfried; Eric D. Petajan
DataSpace is a system for interactive 3-D visualization and analysis of large databases. DataSpace utilizes the display space by placing panels of information, possibly generated by different visualization applications, in a 3-D graph layout, and providing continuous navigation facilities. Selective rearrangements and transparency can be used to reduce occlusion or to compare or merge a set of images (e.g. line graphs or scatter plots) that are aligned and stacked in depth. A prototype system supporting the basic 3-D graphic operations (layout, zoom, rotation, translation, transparency) has been implemented. We provide several illustrative examples of DataSpace displays taken from the current system. We present the 3-D display paradigm, describe the query, layout and rendering steps required to create a display, and discuss some performance issues.
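The layout and stacking behavior can be sketched without a renderer. The names below are invented and no claim is made about DataSpace's actual implementation; the sketch just arranges panels on a 3-D grid and stacks a subset in depth with reduced opacity so they could be compared or merged.

```python
# Layout-only sketch (no rendering); class and function names are assumptions.
from dataclasses import dataclass

@dataclass
class Panel:
    name: str
    position: tuple        # (x, y, z) in scene units
    alpha: float = 1.0     # 1.0 opaque; lower values let stacked panels show through

def grid_layout(names, cols=3, spacing=2.0):
    # Place panels on a planar grid at z = 0.
    return [Panel(n, ((i % cols) * spacing, -(i // cols) * spacing, 0.0))
            for i, n in enumerate(names)]

def stack_in_depth(panels, step=0.5, alpha=0.5):
    # Align a set of panels and stack them along z for comparison or merging.
    x, y, z = panels[0].position
    for k, p in enumerate(panels):
        p.position = (x, y, z - k * step)
        p.alpha = alpha
    return panels

panels = grid_layout(["line-graph-1994", "line-graph-1995", "scatter-plot"])
stack_in_depth(panels[:2])
for p in panels:
    print(p)
```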
IEEE Transactions on Consumer Electronics | 1992
Arun N. Netravali; Eric D. Petajan; Scott C. Knauer; Alireza Farid Faryar; George J. Kustka; Kim Nigel Matthews; Robert J. Safranek
A high-quality digital video codec has been developed for the Zenith/AT&T HDTV system. It adaptively selects between two transmission modes with differing rates and robustness. The codec works on an image progressively scanned with 1575 scan lines every 1/30th of a second and achieves a compression ratio of approximately 50 to 1. The high compression ratio facilitates robust transmission of the compressed HDTV signal within an NTSC taboo channel. Transparent image quality is achieved using motion compensated transform coding coupled with a perceptual criterion to determine the quantization accuracy required for each transform coefficient. The codec has been designed to minimize complexity and memory in the receiver.
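The role of the perceptual criterion can be illustrated with a minimal transform-and-quantize sketch. The weighting matrix below is made up, and the real codec's perceptual model, motion compensation, and dual transmission modes are not represented; the point is only that quantization accuracy is set per transform coefficient, with coarser steps where the eye is less sensitive.

```python
# 8x8 DCT followed by coefficient-wise quantization (weighting matrix invented).
import numpy as np

def dct2(block):
    # Orthonormal 2-D DCT-II built from the 1-D basis matrix.
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

rng = np.random.default_rng(4)
block = rng.integers(0, 256, size=(8, 8))

u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
steps = 8.0 * (1.0 + 0.5 * (u + v))          # coarser steps at higher spatial frequencies

coeffs = dct2(block.astype(float) - 128.0)   # level-shift then transform
quantized = np.rint(coeffs / steps).astype(int)
print(quantized)
```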
IEEE Communications Magazine | 1996
Eric D. Petajan
Broadcast television in the United States has remained essentially unchanged over the last 50 years except for the addition of color and stereo sound. Today, personal computers are meeting the need for random access to high-resolution images and CD-quality audio. Furthermore, advances in digital video compression and digital communication technology have cleared the way toward offering high-resolution video and audio services to consumers using traditional analog communications channels. In 1987, the US Federal Communications Commission (FCC) chartered an advisory committee to recommend an advanced television system for the United States. From 1990 to 1992, the Advanced Television Test Center (ATTC) tested four all-digital systems, one analog high-definition television (HDTV) system, and one enhanced NTSC system using broadcast and cable television environment simulators. The formation of the HDTV Grand Alliance in May 1993 resulted from the withdrawal of the only analog HDTV system from the competition and a stalemate between the other four all-digital systems. The HDTV Grand Alliance system is composed of the best components from previously competing digital systems demonstrated to the FCC. The MPEG-2 (Moving Picture Experts Group) syntax is used with novel encoding techniques to deliver a set of video scanning formats for a variety of applications. This article describes the important features and concepts embodied in the HDTV Grand Alliance system.
Visual Communications and Image Processing | 1995
Tsuhan Chen; Hans Peter Graf; Homer H. Chen; Wu Chou; Barry G. Haskell; Eric D. Petajan; Yao Wang
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and demonstrate speech-assisted coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.
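One way to picture speech assistance is to derive a per-video-frame mouth parameter directly from the audio. The sketch below is only an illustration of that coupling, not the authors' method: it maps short-time audio energy to a normalized mouth-opening value for each output frame, which interpolated frames could then respect so the lips stay synchronized with the speech.

```python
# Toy audio-to-mouth-opening mapping (method and numbers are assumptions).
import numpy as np

def mouth_openings(audio, sample_rate=8000, video_fps=30):
    hop = sample_rate // video_fps                   # audio samples per video frame
    n_frames = len(audio) // hop
    energy = np.array([np.mean(audio[i * hop:(i + 1) * hop] ** 2) for i in range(n_frames)])
    return energy / (energy.max() + 1e-12)           # 0 = closed, 1 = fully open

rng = np.random.default_rng(2)
speech = rng.normal(size=8000) * np.abs(np.sin(np.linspace(0, 6 * np.pi, 8000)))
print(np.round(mouth_openings(speech)[:10], 2))      # opening value per video frame
```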
International Conference on Image Processing | 1995
Tsuhan Chen; Hans Peter Graf; Barry G. Haskell; Eric D. Petajan; Yao Wang; Homer H. Chen; Wu Chou
We utilize speech information to improve the quality of audio-visual communications such as video telephony and videoconferencing. We show that the marriage of speech analysis and image processing can solve problems related to lip synchronization. We present a technique called speech-assisted frame-rate conversion, and apply it to coding of talking head video. Demonstration sequences are presented. Extensions and other applications are outlined.