Michele Covell | Researchain

Publication

Featured research published by Michele Covell.

international conference on computer graphics and interactive techniques | 1997

Video Rewrite: driving visual speech with audio

Christoph Bregler; Michele Covell; Malcolm Slaney

Video Rewrite uses existing footage to automatically create new video of a person mouthing words that she did not speak in the original footage. This technique is useful in movie dubbing, for example, where the movie sequence can be modified to sync the actors’ lip motions to the new soundtrack. Video Rewrite automatically labels the phonemes in the training data and in the new audio track. Video Rewrite reorders the mouth images in the training footage to match the phoneme sequence of the new audio track. When particular phonemes are unavailable in the training footage, Video Rewrite selects the closest approximations. The resulting sequence of mouth images is stitched into the background footage. This stitching process automatically corrects for differences in head position and orientation between the mouth images and the background footage. Video Rewrite uses computer-vision techniques to track points on the speaker’s mouth in the training footage, and morphing techniques to combine these mouth gestures into the final video sequence. The new video combines the dynamics of the original actor’s articulations with the mannerisms and setting dictated by the background footage. Video Rewrite is the first facial-animation system to automate all the labeling and assembly tasks required to resync existing footage to a new soundtrack.
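
The fallback to "closest approximations" can be illustrated with a small sketch. It assumes that closeness means membership in the same viseme class (phonemes that look alike on the lips). The class table, the clip identifiers, and the function names are hypothetical and are not taken from the paper, whose own matching criterion may differ.

```python
# Hypothetical viseme classes: phonemes with a similar mouth shape.
VISEME_CLASSES = [
    {"p", "b", "m"},
    {"f", "v"},
    {"t", "d", "s", "z", "n"},
    {"k", "g"},
    {"iy", "ih"},
    {"uw", "ow"},
]


def viseme_class(phoneme):
    """Return the index of the class containing the phoneme, or None."""
    for index, members in enumerate(VISEME_CLASSES):
        if phoneme in members:
            return index
    return None


def select_clips(target_phonemes, training_clips):
    """Pick one training clip per target phoneme.

    training_clips maps a phoneme label to a clip identifier.
    Returns a list of (phoneme, clip_or_None, is_exact_match).
    """
    selection = []
    for phoneme in target_phonemes:
        if phoneme in training_clips:
            selection.append((phoneme, training_clips[phoneme], True))
            continue
        # No exact match: look for a visually similar phoneme instead.
        substitute = None
        cls = viseme_class(phoneme)
        if cls is not None:
            for candidate in sorted(VISEME_CLASSES[cls]):
                if candidate in training_clips:
                    substitute = training_clips[candidate]
                    break
        selection.append((phoneme, substitute, False))
    return selection
```

A target phoneme with no visually similar clip in the training set comes back with `None`, so the caller can decide how to handle the gap.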

international conference on acoustics, speech, and signal processing | 1996

Automatic audio morphing

Malcolm Slaney; Michele Covell; Bud Lassiter

This paper describes techniques to automatically morph from one sound to another. Audio morphing is accomplished by representing the sound in a multi-dimensional space that is warped or modified to produce a desired result. The multi-dimensional space encodes the spectral shape and pitch on orthogonal axes. After matching components of the sound, a morph smoothly interpolates the amplitudes to describe a new sound in the same perceptual space. Finally, the representation is inverted to produce a sound. This paper describes representations for morphing, techniques for matching, and algorithms for interpolating and morphing each sound component. Spectrographic images of a complete morph are shown at the end.
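
A minimal sketch of the interpolation step, assuming the two sounds have already been matched frame by frame and that each frame is summarized by a list of spectral-envelope amplitudes plus one pitch value. The function name, the linear blend of amplitudes, and the log-frequency blend of pitch are illustrative assumptions, not the paper's exact representation.

```python
import math


def morph_frame(amps_a, amps_b, pitch_a, pitch_b, alpha):
    """Blend one matched frame of sound A into sound B.

    alpha = 0.0 returns sound A, alpha = 1.0 returns sound B.
    Amplitudes are blended linearly; pitch is blended on a
    log-frequency scale (an assumption made for this sketch).
    """
    if len(amps_a) != len(amps_b):
        raise ValueError("matched frames must have the same length")
    amps = [(1.0 - alpha) * a + alpha * b for a, b in zip(amps_a, amps_b)]
    log_pitch = (1.0 - alpha) * math.log(pitch_a) + alpha * math.log(pitch_b)
    return amps, math.exp(log_pitch)
```

Keeping spectral shape and pitch as separate arguments mirrors the abstract's point that the two are encoded on orthogonal axes and can be interpolated independently.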

computer vision and pattern recognition | 2017

Full Resolution Image Compression with Recurrent Neural Networks

George Toderici; Damien Vincent; Nick Johnston; Sung Jin Hwang; David Minnen; Joel Shor; Michele Covell

This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding. We compare RNN types (LSTM, associative LSTM) and introduce a new hybrid of GRU and ResNet. We also study one-shot versus additive reconstruction architectures and introduce a new scaled-additive framework. We compare to previous work, showing improvements of 4.3%–8.8% AUC (area under the rate-distortion curve), depending on the perceptual metric used. As far as we know, this is the first neural network architecture that is able to outperform JPEG at image compression across most bitrates on the rate-distortion curve on the Kodak dataset images, with and without the aid of entropy coding.
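
The additive-reconstruction idea, and the way it yields variable rates from a single model, can be sketched without any neural network. In the toy below, the learned encoder and decoder are replaced by a fixed sign binarizer and a halving step size. This is a stand-in chosen for illustration only, not the paper's architecture. Each iteration encodes the residual left by the previous ones, and the decoder sums the contributions.

```python
def encode(pixels, iterations):
    """Encode values in [-1, 1] as one list of +/-1 bits per iteration."""
    residual = list(pixels)
    codes = []
    scale = 0.5
    for _ in range(iterations):
        # Binarizer: keep only the sign of the current residual.
        bits = [1 if r >= 0 else -1 for r in residual]
        codes.append(bits)
        # Subtract what the decoder will add back for this iteration.
        residual = [r - scale * b for r, b in zip(residual, bits)]
        scale /= 2.0
    return codes


def decode(codes):
    """Additive reconstruction: sum the contribution of every iteration."""
    if not codes:
        return []
    recon = [0.0] * len(codes[0])
    scale = 0.5
    for bits in codes:
        recon = [v + scale * b for v, b in zip(recon, bits)]
        scale /= 2.0
    return recon


def max_error(pixels, recon):
    return max(abs(p - r) for p, r in zip(pixels, recon))
```

Decoding only a prefix such as `codes[:3]` gives a lower-rate, higher-distortion reconstruction from the same encoder, which is the sense in which no retraining is needed to change the compression rate.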

international conference on acoustics, speech, and signal processing | 1998

MACH1: nonuniform time-scale modification of speech

Michele Covell; M. Margaret Withgott; Malcolm Slaney

We propose a new approach to nonuniform time compression, called Mach1, designed to mimic the natural timing of fast speech. At identical overall compression rates, listener comprehension for Mach1-compressed speech increased between 5 and 31 percentage points over that for linearly compressed speech, and the response times dropped by 15%. For rates between 2.5 and 4.2 times real time, there was no significant comprehension loss with increasing Mach1 compression rates. In A-B preference tests, Mach1-compressed speech was chosen 95% of the time. This paper describes the Mach1 technique and our listener-test results.
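
The phrase "identical overall compression rates" can be made concrete with a small sketch. The abstract does not say how Mach1 derives its local rates, so the per-segment `keep_weights` below are purely hypothetical placeholders (a larger weight means the segment is compressed less). The sketch only shows how a nonuniform allocation can be normalized to produce the same total duration as linear compression.

```python
def linear_durations(durations, overall_rate):
    """Uniform compression: every segment is sped up by the same factor."""
    return [d / overall_rate for d in durations]


def nonuniform_durations(durations, keep_weights, overall_rate):
    """Distribute the same total output time unevenly across segments.

    durations: input segment lengths in seconds.
    keep_weights: hypothetical positive weights, one per segment.
    overall_rate: target speed-up, e.g. 2.0 for twice real time.
    """
    target_total = sum(durations) / overall_rate
    weighted = [d * w for d, w in zip(durations, keep_weights)]
    norm = sum(weighted)
    return [target_total * x / norm for x in weighted]
```

For example, with segments of 0.2 s, 0.5 s, and 0.3 s, weights of 1.0, 0.2, and 1.0, and an overall rate of 2.0, both functions produce 0.5 s of output, but the nonuniform version shortens the low-weight middle segment far more than the other two.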

international conference on automatic face and gesture recognition | 1996

Eigen-points: control-point location using principal component analyses

Michele Covell

Eigen-points estimates the image-plane locations of fiduciary points on an object. By estimating multiple locations simultaneously, eigen-points exploits the interdependence between these locations. This is done by associating neighboring, inter-dependent control-points with a model of the local appearance. The model of local appearance is used to find the feature in new unlabeled images. Control-point locations are then estimated from the appearance of this feature in the unlabeled image. The estimation is done using an affine manifold model of the coupling between the local appearance and the local shape. Eigen-points uses models aimed specifically at recovering shape from image appearance. The estimation equations are solved non-iteratively, in a way that accounts for noise in the training data and the unlabeled images and that accounts for uncertainty in the distribution and dependencies within these noise sources.
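
In its plainest form, an affine coupling between local appearance and local shape can be written as a single matrix-vector product, which is why no iteration is needed. The notation below is an assumption introduced for illustration: a is the vector of local appearance values, s is the stacked vector of control-point coordinates, the barred quantities are training-set means, and C is a coupling matrix learned from labeled training images. The paper's own estimator additionally models noise in the training data and in the unlabeled images, which this form omits.

```latex
\hat{\mathbf{s}} = \bar{\mathbf{s}} + \mathbf{C}\,\left(\mathbf{a} - \bar{\mathbf{a}}\right)
```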

international conference on acoustics, speech, and signal processing | 1994

Spanning the gap between motion estimation and morphing

Michele Covell; M. Margaret Withgott

Motion estimation is an important and well-studied method for determining correspondences between images in video processing. In contrast with motion estimation, the use of image warping on image sequences is a new and developing field. To provide image warping effects, morphing algorithms rely on mesh points to align two images. Presently, the selection of these mesh points is a labor-intensive, manual process. This paper attempts to automate the selection of mesh points for morphing by recognizing this problem as one of determining image correspondences. Applications include video compression, animation sequence generation and high-quality time dilation of video.

computer vision and pattern recognition | 2000

Articulated-pose estimation using brightness- and depth-constancy constraints

Michele Covell; A. Rahimi; Michael Harville; Trevor Darrell

This paper explores several approaches for articulated-pose estimation, assuming that video-rate depth information is available, from either stereo cameras or other sensors. We use these depth measurements in the traditional linear brightness constraint equation, as well as in a depth constraint equation. To capture the joint constraints, we combine the brightness and depth constraints with twist mathematics. We address several important issues in the formation of the constraint equations, including updating the body rotation matrix without using a first-order matrix approximation and removing the coupling between the rotation and translation updates. The resulting constraint equations are linear on a modified parameter set. After solving these linear constraints, there is a single closed-form non-linear transformation to return the updates to the original pose parameters. We show results for tracking body pose in oblique views of synthetic walking sequences and in moving-camera views of synthetic jumping-jack sequences. We also show results for tracking body pose in side views of a real walking sequence.
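
To show why a first-order matrix approximation is worth avoiding when updating a rotation, the sketch below compares the approximation I + [w]x with the exact rotation given by Rodrigues' formula for the same axis-angle update w. Rodrigues' formula is used here as one standard exact update. The abstract does not state which closed form the paper uses, and all function names are illustrative.

```python
import math

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def skew(w):
    """Cross-product matrix [w]x of a 3-vector."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def first_order_rotation(w):
    """Approximate update I + [w]x; not exactly orthonormal."""
    k = skew(w)
    return [[IDENTITY[i][j] + k[i][j] for j in range(3)] for i in range(3)]


def rodrigues_rotation(w):
    """Exact rotation for axis-angle vector w (Rodrigues' formula)."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [row[:] for row in IDENTITY]
    k = skew([c / theta for c in w])
    k2 = matmul(k, k)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[IDENTITY[i][j] + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]


def orthogonality_error(r):
    """Largest entry of |R^T R - I|; zero for a true rotation."""
    rt = [[r[j][i] for j in range(3)] for i in range(3)]
    p = matmul(rt, r)
    return max(abs(p[i][j] - IDENTITY[i][j])
               for i in range(3) for j in range(3))
```

Applying the first-order update repeatedly lets the matrix drift away from a valid rotation, whereas the exact form stays orthonormal up to floating-point rounding.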

international conference on image processing | 1996

Eigen-points [image matching]

Michele Covell; Christoph Bregler

Eigen-points places control points onto unmarked images. The control points are the image locations corresponding to fiduciary points on an object. For example, we might designate ten points on the outside boundary of the lip as fiduciary points on a face. Then, the control points mark the image locations where those points on the outside lip boundary appear. The control-point locations are estimated using a coupled manifold model, which describes the joint variation of the image appearance and the control-point location. The paper first discusses why this problem is interesting and then reviews previous approaches to placing control points on images of deformable objects. It also outlines our approach to placing control points automatically. Finally, some results from our analysis are presented.

international conference on acoustics, speech, and signal processing | 2001

FastMPEG: time-scale modification of bit-compressed audio information

Michele Covell; Malcolm Slaney; Art Rothstein

This paper describes techniques to change the playback speed of MPEG-compressed audio, without first decompressing the audio file. There are two primary contributions in this paper. (1) We describe three techniques to perform time-scale modification in the maximally decimated domain. (2) We show how to infer the output of the auditory masking model on the new audio stream, using the information in the original file. This new FastMPEG algorithm is more than an order of magnitude more efficient than decompressing the audio, performing time-scale modification in the conventional time-domain, and recompressing.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2001

Correspondence with cumulative similarity transforms

Trevor Darrell; Michele Covell

A local image transform based on cumulative similarity measures is defined and is shown to enable efficient correspondence and tracking near occluding boundaries. Unlike traditional methods, this transform allows correspondences to be found when the only contrast present is the occluding boundary itself and when the sign of contrast along the boundary is possibly reversed. The transform is based on the idea of a cumulative similarity measure which characterizes the shape of local image homogeneity; both the value of an image at a particular point and the shape of the region with locally similar and connected values is captured. This representation is insensitive to structure beyond an occluding boundary but is sensitive to the shape of the boundary itself, which is often an important cue. We show results comparing this method to traditional least-squares and robust correspondence matching.
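
A one-dimensional toy can illustrate why a cumulative similarity measure ignores structure beyond an occluding boundary. The sketch assumes the weight of each sample is the running product, taken outward from the center, of a Gaussian similarity to the center value. The Gaussian form, the `sigma` parameter, and the function name are assumptions made for this sketch, and the transform in the paper operates on two-dimensional image neighborhoods.

```python
import math


def cumulative_similarity(signal, center, radius, sigma=0.1):
    """Return {index: weight} for samples within radius of center.

    Weights are accumulated as a running product while walking away
    from the center, so one dissimilar sample (a boundary) suppresses
    everything that lies beyond it.
    """
    def similarity(value):
        diff = value - signal[center]
        return math.exp(-(diff * diff) / (2.0 * sigma * sigma))

    weights = {center: 1.0}
    for step in (-1, 1):
        running = 1.0
        for offset in range(1, radius + 1):
            index = center + step * offset
            if index < 0 or index >= len(signal):
                break
            running *= similarity(signal[index])
            weights[index] = running
    return weights
```

In a signal such as `[0.2, 0.2, 0.2, 0.9, 0.2, 0.2]` with the center at index 1, the sample at index 4 has the same value as the center but receives a near-zero weight because it lies past the jump at index 3.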
