Malcolm Slaney | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Malcolm Slaney is active.

Explore More

Publication

Featured researches published by Malcolm Slaney.

international conference on acoustics, speech, and signal processing | 1997

Construction and evaluation of a robust multifeature speech/music discriminator

Eric D. Scheirer; Malcolm Slaney

We report on the construction of a real-time computer system capable of distinguishing speech signals from music signals over a wide range of digital audio input. We have examined 13 features intended to measure conceptually distinct properties of speech and/or music signals, and combined them in several multidimensional classification frameworks. We provide extensive data on system performance and the cross-validated training/test setup used to evaluate the system. For the datasets currently in use, the best classifier classifies with 5.8% error on a frame-by-frame basis, and 1.4% error when integrating long (2.4 second) segments of sound.

international conference on computer graphics and interactive techniques | 1997

Video Rewrite: driving visual speech with audio

Christoph Bregler; Michele Covell; Malcolm Slaney

Video Rewrite uses existing footage to create automatically new video of a person mouthing words that she did not speak in the original footage. This technique is useful in movie dubbing, for example, where the movie sequence can be modified to sync the actors’ lip motions to the new soundtrack. Video Rewrite automatically labels the phonemes in the training data and in the new audio track. Video Rewrite reorders the mouth images in the training footage to match the phoneme sequence of the new audio track. When particular phonemes are unavailable in the training footage, Video Rewrite selects the closest approximations. The resulting sequence of mouth images is stitched into the background footage. This stitching process automatically corrects for differences in head position and orientation between the mouth images and the background footage. Video Rewrite uses computer-vision techniques to track points on the speaker’s mouth in the training footage, and morphing techniques to combine these mouth gestures into the final video sequence. The new video combines the dynamics of the original actor’s articulations with the mannerisms and setting dictated by the background footage. Video Rewrite is the first facial-animation system to automate all the labeling and assembly tasks required to resync existing footage to a new soundtrack.

international conference on acoustics, speech, and signal processing | 1990

A perceptual pitch detector

Malcolm Slaney; Richard F. Lyon

A pitch detector based on Lickliders (1979) duplex theory of pitch perception was implemented and tested on a variety of stimuli from human perceptual tests. It is believed that this approach accurately models how people perceive pitch. It is shown that it correctly identifies the pitch of complex harmonic and inharmonic stimuli and that it is robust in the face of noise and phase changes. This perceptual pitch detector combines a cochlear model with a bank of autocorrelators. By performing an independent autocorrelation for each channel, the pitch detector is relatively insensitive to phase changes across channels. The information in the correlogram is filtered, nonlinearly enhanced, and summed across channels. Peaks are identified and a pitch is then proposed that is consistent with the peaks.<<ETX>>

international conference on acoustics speech and signal processing | 1998

Baby Ears: a recognition system for affective vocalizations

Malcolm Slaney; Gerald W. McRoberts

We collected more than 500 utterances from adults talking to their infants. We automatically classified 65% of the strongest utterances correctly as approval, attentional bids, or prohibition. We used several pitch and formant measures, and a multidimensional Gaussian mixture-model discriminator to perform this task. As previous studies have shown, changes in pitch are an important cue for affective messages; we found that timbre or cepstral coefficients are also important. The utterances of female speakers, in this test, were easier to classify than were those of male speakers. We hope this research will allow us to build machines that sense the emotional state of a user.

international conference on acoustics, speech, and signal processing | 1994

Auditory model inversion for sound separation

Malcolm Slaney; Daniel Naar; Richard F. Lyon

Techniques to recreate sounds from perceptual displays known as cochleagrams and correlograms are developed using a convex projection framework. Prior work on cochlear-model inversion is extended to account for rectification and gain adaptation. A prior technique for phase recovery in spectrogram inversion is combined with the synchronized overlap-and-add technique of speech rate modification, and is applied to inverting the short-time autocorrelation function representation in the auditory correlogram. Improved methods of initial phase estimation are explored. A range of computational cost options, with and without iteration, produce a range of quality levels from fair to near perfect.<<ETX>>

international conference on acoustics speech and signal processing | 1996

Automatic audio morphing

Malcolm Slaney; Michele Covell; Bud Lassiter

This paper describes techniques to automatically morph from one sound to another. Audio morphing is accomplished by representing the sound in a multi-dimensional space that is warped or modified to produce a desired result. The multi-dimensional space encodes the spectral shape and pitch on orthogonal axes. After matching components of the sound, a morph smoothly interpolates the amplitudes to describe a new sound in the same perceptual space. Finally, the representation is inverted to produce a sound. This paper describes representations for morphing, techniques for matching, and algorithms for interpolating and morphing each sound component. Spectrographic images of a complete morph are shown at the end.

international conference on acoustics speech and signal processing | 1998

MACH1: nonuniform time-scale modification of speech

Michele Covell; M. Margaret Withgott; Malcolm Slaney

We propose a new approach to nonuniform time compression, called Mach1, designed to mimic the natural timing of fast speech. At identical overall compression rates, listener comprehension for Mach1-compressed speech increased between 5 and 31 percentage points over that for linearly compressed speech, and the response times dropped by 15%. For rates between 2.5 and 4.2 times real time, there was no significant comprehension loss with increasing Mach1 compression rates. In A-B preference tests, Mach1-compressed speech was chosen 95% of the time. This paper describes the Mach1 technique and our listener-test results.

systems man and cybernetics | 1995

Pattern playback from 1950 to 1995

Malcolm Slaney

This paper describes algorithms to convert spectrograms, cochleagrams and correlograms back into sounds. Each of these representations converts sound waves into pictures or movies. Techniques for inversion, known as the pattern playback problem, are important because they allow these representations to be used for analysis and transformations of sound. The algorithms described here use convex projections and intelligent phases guesses to iteratively find the closest waveform consistent with the known information. Reconstructions from the spectrogram and cochleagram are indistinguishable from the original sound. In informal listening tests, the correlogram reconstructions are nearly identical.

Journal of the Acoustical Society of America | 1990

Visual representations of speech—A computer model based on correlation

Malcolm Slaney; Richard F. Lyon

The use of the cochleagram and the correlogram in speech and sound recognition is discussed. The cochleagram represents a sound as a pattern of neural firing probabilities at places along the basilar membrane versus time. It is roughly analogous to the spectrogram and its benefits have been described in several papers. But using the cochleagram as a basis for speech recognition is only a weak way to use knowledge of human auditory processing A richer representation of speech and sound is called the correlogram. Sound is represented as a two‐dimensional picture versus time—the extra dimension allowing several interesting perceptual experiences to be modeled. Assembling correlograms into a movie and synchronizing them to sound allow the auditory and visual percepts to be compared. The correlogram is more useful than a cochleagram (or spectrogram) because it shows an orthogonal dimension that represents the fine‐time structure and pitch in the auditory signal. The extra dimension provides the information nec...

Archive | 1988