Publication


Featured research published by Robert J. McAulay.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1986

Speech transformations based on a sinusoidal representation

Thomas F. Quatieri; Robert J. McAulay

In this paper a new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism which has been shown to produce synthetic speech that preserves the waveform shape and is perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is also capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interferences such as noise and musical backgrounds.


IEEE Transactions on Signal Processing | 1992

Shape invariant time-scale and pitch modification of speech

Thomas F. Quatieri; Robert J. McAulay

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. A time-scale modification system that preserves this shape-invariance property during voicing is developed. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its ability to perform time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.


IEEE Transactions on Aerospace and Electronic Systems | 1973

A Decision-Directed Adaptive Tracker

Robert J. McAulay; E. Denlinger

In the design of a tracking filter for air traffic control (ATC) applications, a maneuvering aircraft can be modelled by a linear system with random noise accelerations. A Kalman filter tracker, designed on the basis of a variance chosen according to the distribution of the potential maneuver accelerations, will maintain track during maneuvers and provide some improvement in position accuracy. However, during those portions of the flight path where the aircraft is not maneuvering, the tracking accuracy will not be as good as if no acceleration noise had been allowed in the tracking filter. In this paper, statistical decision theory is used to derive an optimal test for detecting the aircraft maneuver; a more practical suboptimal test is then deduced from the optimal test. As long as no maneuver is declared, a simpler filter, based on a constant-velocity model, is used to track the aircraft. When a maneuver is detected, the tracker is reinitialized using stored data, updated to the present time, and then normal tracking is resumed as new data arrive. In essence, the tracker performs on the basis of a piecewise linear model in which the breakpoints are defined on-line using the maneuver detector. Simulation results show that there is a significant improvement in tracking capability using the decision-directed adaptive tracker.
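
The detect-then-reinitialize loop described in this abstract can be illustrated with a rough one-dimensional sketch. Everything specific here is an assumption made for illustration: an alpha-beta filter stands in for the constant-velocity Kalman filter, a sliding sum of squared normalized innovations stands in for the paper's suboptimal test (which the abstract does not spell out), and the names `track`, `alpha`, `beta`, `sigma`, `window`, and `threshold` are hypothetical.

```python
def track(measurements, dt=1.0, alpha=0.5, beta=0.2,
          sigma=1.0, window=3, threshold=20.0):
    """Return (estimates, detections); detections holds the sample
    indices at which a maneuver was declared."""
    # Initialize position and velocity from the first two measurements.
    pos = measurements[1]
    vel = (measurements[1] - measurements[0]) / dt
    estimates = [measurements[0], pos]
    detections = []
    recent = []  # squared normalized innovations inside the test window
    for k in range(2, len(measurements)):
        pred = pos + vel * dt              # constant-velocity prediction
        innov = measurements[k] - pred     # innovation (prediction error)
        recent = (recent + [(innov / sigma) ** 2])[-window:]
        if len(recent) == window and sum(recent) > threshold:
            # Maneuver declared: rebuild the state from stored data
            # (here simply the two most recent measurements).
            pos = measurements[k]
            vel = (measurements[k] - measurements[k - 1]) / dt
            detections.append(k)
            recent = []
        else:
            # No maneuver: ordinary alpha-beta (fixed-gain) update.
            pos = pred + alpha * innov
            vel = vel + beta * innov / dt
        estimates.append(pos)
    return estimates, detections
```

With a noise-free straight track followed by a constant acceleration, this sketch declares nothing during the straight segment and flags a maneuver a couple of samples after the acceleration begins. The threshold trades detection delay against false alarms, which is the role the decision-theoretic test plays in the paper.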


International Conference on Acoustics, Speech, and Signal Processing | 1990

Pitch estimation and voicing detection based on a sinusoidal speech model

Robert J. McAulay; Thomas F. Quatieri

A technique for estimating the pitch of a speech waveform is developed. It fits a harmonic set of sine waves to the input data using a mean-squared-error (MSE) criterion. By exploiting a sinusoidal model for the input speech waveform, a pitch estimation criterion is derived that is inherently unambiguous, uses pitch-adaptive resolution, uses small-signal suppression to provide enhanced discrimination, and uses amplitude compression to eliminate the effects of pitch-formant interaction. The normalized minimum mean squared error proves to be a powerful discriminant for estimating the likelihood that a given frame of speech is voiced.
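
As a rough illustration of fitting a harmonic set of sine waves under a squared-error criterion, the sketch below scans candidate fundamentals and measures how much frame energy the harmonics of each candidate capture. It is a plain harmonic fit that treats the harmonics as orthogonal over the frame, not the refined criterion of the paper: pitch-adaptive resolution, small-signal suppression, and amplitude compression are not implemented. The function name `estimate_pitch` and all parameters are hypothetical.

```python
import cmath
import math


def estimate_pitch(frame, sample_rate, f0_min=150.0, f0_max=290.0,
                   step=1.0, max_freq=1000.0):
    """Return (f0_hz, normalized_error) of the best harmonic fit."""
    n = len(frame)
    total = sum(x * x for x in frame)      # total frame energy
    best_f0, best_err = None, float("inf")
    f0 = f0_min
    while f0 <= f0_max:
        captured = 0.0
        k = 1
        while k * f0 <= max_freq:
            w = 2 * math.pi * k * f0 / sample_rate
            # Projection of the frame onto the k-th harmonic of f0.
            proj = sum(x * cmath.exp(-1j * w * i)
                       for i, x in enumerate(frame))
            # Energy of the best-fitting sinusoid at this frequency,
            # assuming the harmonics are orthogonal over the frame.
            captured += 2 * abs(proj) ** 2 / n
            k += 1
        err = max(0.0, 1.0 - captured / total)  # normalized residual
        if err < best_err:
            best_f0, best_err = f0, err
        f0 += step
    return best_f0, best_err
```

The default search range spans less than one octave on purpose: a plain fit like this one is also satisfied by subharmonics of the true pitch, which is the ambiguity the paper's criterion is designed to remove. The returned normalized error plays the role of the voicing discriminant mentioned in the abstract: it is close to 0 for a strongly periodic frame and close to 1 for a frame with no harmonic structure.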


International Conference on Acoustics, Speech, and Signal Processing | 1985

Mid-rate coding based on a sinusoidal representation of speech

Robert J. McAulay; Thomas F. Quatieri

In this paper a sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. The resulting synthetic waveform preserves the waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech and the noise are maintained. Based on this system, a coder operating at 8 kbps is developed that codes the amplitudes and phases of each of the sine wave components and uses a harmonic model to code all of the frequencies. Since not all of the phases can be coded, a high frequency regeneration technique is developed that exploits the properties of the sinusoidal representation of the coded baseband signal. Based on a relatively limited data base, computer simulation has demonstrated that coded speech of good quality can be achieved. A real-time simulation is being developed to provide a more thorough evaluation of the algorithm.


International Conference on Acoustics, Speech, and Signal Processing | 1990

Noise reduction using a soft-decision sine-wave vector quantizer

Thomas F. Quatieri; Robert J. McAulay

Noise reduction is performed in the context of a high-quality harmonic zero-phase sine-wave analysis/synthesis system which is characterized by sine-wave amplitudes, a voicing probability, and a fundamental frequency. Least-squared error estimation of a harmonic sine-wave representation leads to a soft decision template estimate consisting of sine-wave amplitudes and a voicing probability. The least-squares solution is modified to use template-matching with nearest neighbors. The reconstruction is improved by using the modified least-squares solution only in spectral regions with low signal-to-noise ratio. The results, although preliminary, provide evidence that harmonic zero-phase sine-wave analysis/synthesis, combined with effective estimation of sine-wave amplitudes and probability of voicing, offers a promising approach to noise reduction.


International Conference on Acoustics, Speech, and Signal Processing | 1984

Magnitude-only reconstruction using a sinusoidal speech model

Robert J. McAulay; Thomas F. Quatieri

In this paper a sinusoidal model for the speech waveform is used to develop a new synthesis technique that requires specification of only the amplitudes and frequencies of the component sine waves. These parameters are estimated from the short-time spectral magnitude. The resulting synthetic waveform preserves the short-time spectral magnitude during rapid movements of spectral energy such as voiced/unvoiced transitions, and yields speech of very high quality and intelligibility. The approach is sufficiently flexible to also allow for high-quality time-scale modification with the option of time-varying scaling. Finally, results are given for some initial experiments that explore the possibility of magnitude-only waveform coding at 8 kbps.
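
A minimal sketch of synthesis from amplitudes and frequencies alone, with optional time scaling, is given below. It assumes the sine-wave tracks have already been matched from frame to frame (the same list index always refers to the same track) and obtains each phase by accumulating the interpolated frequency, so no measured phase is needed. The names `synthesize`, `hop`, and `rate` are hypothetical, and the sketch leaves out the estimation of the parameters from the short-time spectral magnitude.

```python
import math


def synthesize(frames, sample_rate, hop, rate=1.0):
    """Synthesize from per-frame (amplitude, freq_hz) pairs.

    frames: list of frames, each a list of (amplitude, freq_hz) pairs,
            one pair per sine-wave track (tracks assumed matched).
    hop:    analysis frame spacing in samples.
    rate:   rate > 1 speeds the signal up, rate < 1 slows it down.
    """
    out_hop = max(1, round(hop / rate))    # stretched frame spacing
    n_tracks = len(frames[0])
    phases = [0.0] * n_tracks              # running phase of each track
    out = []
    for cur, nxt in zip(frames, frames[1:]):
        for i in range(out_hop):
            t = i / out_hop                # position between the frames
            sample = 0.0
            for j in range(n_tracks):
                # Linear interpolation of amplitude and frequency.
                amp = (1 - t) * cur[j][0] + t * nxt[j][0]
                freq = (1 - t) * cur[j][1] + t * nxt[j][1]
                sample += amp * math.cos(phases[j])
                # Phase is the running integral of frequency.
                phases[j] += 2 * math.pi * freq / sample_rate
            out.append(sample)
    return out
```

Because only the frame spacing is stretched while the frequencies are left untouched, a call with `rate=0.5` doubles the duration without shifting the pitch. Making `rate` vary from frame to frame would give the time-varying scaling mentioned in the abstract.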


International Conference on Acoustics, Speech, and Signal Processing | 1988

Computationally efficient sine-wave synthesis and its application to sinusoidal transform coding

Robert J. McAulay; Thomas F. Quatieri

A technique for sine-wave synthesis is described that uses the fast Fourier transform overlap-add method at a 100 Hz rate based on sine-wave parameters coded at a 50 Hz rate. This technique leads to an implementation requiring less than one-half the computational power of a digital-signal-processor chip. The synthesis method implicitly introduces a frequency jitter which renders the encoded synthetic speech more natural. For speech corrupted by additive acoustic noise, the synthesizer, in conjunction with straightforward noise suppression, greatly improves the quality of the synthetic speech, rendering the sinusoidal transform coder (STC) algorithm a truly robust system. More recent architecture studies of the STC algorithm suggest that an entire implementation requires no more than two ADSP-2100 chips.


International Conference on Acoustics, Speech, and Signal Processing | 1987

Multirate sinusoidal transform coding at rates from 2.4 kbps to 8 kbps

Robert J. McAulay; Thomas F. Quatieri

It has been shown [1] that an analysis/synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially indistinguishable from the original. By exploiting the peak-to-peak correlation of the sine-wave amplitudes [2], a harmonic model for the sine-wave frequencies, and a predictive model for the sine-wave phases [3], it has also been shown that the sine-wave parameters can be coded at 8 kbps. In this paper a new technique is described for coding the sine-wave amplitudes based on the idea of a pitch-adaptive channel vocoder. Using this amplitude-coding strategy and operating at a total bit rate of 4.8 kbps, it was possible to code and transmit enough phase information so that very intelligible, natural sounding speech could be synthesized. This 4.8 kbps system has been implemented in real-time and has achieved a Diagnostic Rhyme Test (DRT) score of 95. At 2.4 kbps no explicit phase information could be coded, but by phase-locking all of the sine waves to the fundamental, by adding a pitch-adaptive quadratic phase, and by adding a voicing dependent random phase to each sine wave, natural sounding synthetic speech could be obtained. This new system is currently being implemented in real-time so that intelligibility tests can be performed.


Archive | 1991

Sine-Wave Amplitude Coding at Low Data Rates

Robert J. McAulay; Thomas M. Parks; Thomas F. Quatieri; Michael Sabin

An analysis/synthesis system based on the sinusoidal speech model has been developed [1]. In that system, the sine-wave amplitudes and frequencies are located by searching for the peaks of the magnitude of the short-time Fourier transform (STFT) of the input speech. The phases are computed from the real and imaginary parts of the STFT at the measured frequencies. The frequencies on successive frames are matched, used in a cubic phase interpolator and applied to a sine-wave generator. Each sine wave is amplitude-modulated by the linear interpolation of the matched sine-wave amplitudes. At a 10 ms frame rate, this system produces speech that is perceptually indistinguishable from the original [1]. Since it is not possible to code all of the sine-wave parameters at low data rates, a system has been developed that codes the sine-wave frequencies by fitting a harmonic set of sine waves to the input waveform using a modified mean-squared error criterion [2], and codes the phase information implicitly using a voicing adaptive transition frequency to provide for a mixed voiced/unvoiced phase excitation model [3]. Provided a postfilter is used at the synthesizer to attenuate the noise in the formant nulls, the speech synthesized by this system is of quite high quality having achieved a DAM score of 63.0 in the uncoded mode. Since the fundamental frequency can be coded using ≈ 7 bits and the voicing measure can be coded using ≈ 3 bits, then the possibility exists for good speech quality at low data rates provided the sine-wave amplitudes can be coded efficiently. In this paper the zero-phase, harmonic analysis/synthesis system and the post-filter design methodology will be described and then the various techniques that have been examined for coding the sine-wave amplitudes will be discussed.
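
The analysis step described at the start of this abstract (locate the peaks of the STFT magnitude, then read the phase from the real and imaginary parts at those frequencies) can be sketched for a single frame as follows. The Hann window, the slow direct DFT, the zero-padding length, the amplitude threshold, and the names `analyze_frame`, `n_fft`, and `threshold` are all assumptions made for illustration, not details taken from the paper.

```python
import cmath
import math


def analyze_frame(frame, sample_rate, n_fft=512, threshold=0.1):
    """Return (amplitude, frequency_hz, phase) for each spectral peak."""
    n = len(frame)
    # Hann window to limit leakage between neighbouring sine waves.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    gain = sum(window)
    windowed = [x * w for x, w in zip(frame, window)]
    # Direct DFT of the zero-padded frame, bins 0 .. n_fft/2 only.
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = 0j
        for i, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * i / n_fft)
        spectrum.append(acc)
    mags = [abs(c) for c in spectrum]
    peaks = []
    for k in range(1, len(mags) - 1):
        # A peak is a local maximum of the spectral magnitude.
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            amp = 2 * mags[k] / gain       # undo the window gain
            if amp >= threshold:           # ignore window sidelobes
                # Phase from the real and imaginary parts at the peak.
                phase = math.atan2(spectrum[k].imag, spectrum[k].real)
                peaks.append((amp, k * sample_rate / n_fft, phase))
    return peaks
```

Applied to a frame containing two steady sinusoids, the function returns one triple per sinusoid, with the phase referenced to the first sample of the frame. The frequency resolution is limited to the bin spacing `sample_rate / n_fft`; a practical analyzer would use an FFT rather than this direct DFT.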

Collaboration


Dive into Robert J. McAulay's collaborations.

Top Co-Authors

Thomas F. Quatieri

Massachusetts Institute of Technology

Robert B. Dunn

Massachusetts Institute of Technology

Elliot Singer

Massachusetts Institute of Technology

Thomas E. Hanna

Massachusetts Institute of Technology

Bernard Gold

Massachusetts Institute of Technology

Clifford J. Weinstein

Massachusetts Institute of Technology

E. Denlinger

Massachusetts Institute of Technology

G. Neben

Massachusetts Institute of Technology

Joel A. Feldman

Massachusetts Institute of Technology
