Publication


Featured research published by Marvin R. Sambur.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1975

Applications of a nonlinear smoothing algorithm to speech processing

Lawrence R. Rabiner; Marvin R. Sambur; Carolyn E. Schmidt

In this paper a nonlinear smoothing algorithm recently proposed by Tukey is described and evaluated for speech processing applications. Simple linear smoothing routines generally fail to provide adequate smoothing for data which exhibit both local roughness and sharp discontinuities. The proposed nonlinear smoothing algorithm can effectively smooth such data by using a combination of median smoothing routines and linear filtering. The concept of double smoothing is introduced as a refinement on the smoothing algorithm. Examples of the application of the nonlinear smoothing methods to typical speech parameters are included in this paper.
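To make the combination of median smoothing and linear filtering concrete, here is a minimal Python sketch. It assumes a running median of width 3 followed by a three-point Hanning window (weights 1/4, 1/2, 1/4), with edge samples replicated at the boundaries. The function names, the window width, and the boundary handling are illustrative assumptions, not the paper's exact configuration. Double smoothing is interpreted here as smoothing the residual of the first pass and adding it back.

```python
from statistics import median


def _pad(x, half):
    """Replicate the edge samples so every window is full (an assumption)."""
    return [x[0]] * half + list(x) + [x[-1]] * half


def running_median(x, width):
    """Running median with an odd window width."""
    half = width // 2
    p = _pad(x, half)
    return [median(p[i:i + width]) for i in range(len(x))]


def hanning(x):
    """Three-point linear smoother with weights 1/4, 1/2, 1/4."""
    p = _pad(x, 1)
    return [0.25 * p[i] + 0.5 * p[i + 1] + 0.25 * p[i + 2]
            for i in range(len(x))]


def smooth(x, width=3):
    """Median smoothing followed by linear filtering."""
    return hanning(running_median(x, width))


def double_smooth(x, width=3):
    """Smooth the residual of the first pass and add it back."""
    first = smooth(x, width)
    residual = [a - b for a, b in zip(x, first)]
    correction = smooth(residual, width)
    return [a + b for a, b in zip(first, correction)]


# An isolated spike (index 3) is removed, while the step at index 6 survives
# the median stage and is only slightly rounded by the linear stage.
contour = [1, 1, 1, 9, 1, 1, 5, 5, 5, 5]
smoothed = smooth(contour)
```

A purely linear smoother would spread the spike at index 3 into its neighbours; the median stage discards it before the linear filter runs.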


Journal of the Acoustical Society of America | 1974

Algorithm for determining the endpoints of isolated utterances

Lawrence R. Rabiner; Marvin R. Sambur

An important problem in speech processing is to detect the presence of speech in a background of noise. This problem is often referred to as the endpoint location problem. By accurately detecting the beginning and end of an utterance, the amount of processing of speech data can be kept to a minimum. The algorithm proposed for locating the endpoints of an utterance is based on two measures of the signal, zero crossing rate and energy. The algorithm is inherently capable of performing correctly in any reasonable acoustic environment in which the signal-to-noise ratio is on the order of 30 dB or better. The algorithm has been tested over a variety of recording conditions and for a large number of speakers and has been found to perform well across all tested conditions.
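The two measures named in the abstract, energy and zero crossing rate, can be combined roughly as in the following Python sketch. It is a simplified illustration under stated assumptions, not the published algorithm: the thresholds are derived from the first `noise_frames` frames (assumed to be silence), and the constants `energy_factor`, `lookback`, and `min_zc_frames`, as well as all function names, are hypothetical.

```python
from statistics import mean, pstdev


def frame_features(signal, frame_len):
    """Per-frame (sum of magnitudes, zero-crossing count)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(abs(s) for s in frame)
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a >= 0) != (b >= 0))
        feats.append((energy, crossings))
    return feats


def find_endpoints(signal, frame_len=100, noise_frames=10,
                   energy_factor=4.0, lookback=25, min_zc_frames=3):
    """Return (first_frame, last_frame) of the utterance, or None."""
    feats = frame_features(signal, frame_len)
    energy = [e for e, _ in feats]
    zc = [z for _, z in feats]

    # Thresholds estimated from the leading frames, assumed to be silence.
    lower = energy_factor * mean(energy[:noise_frames])
    upper = 2.0 * lower
    zc_limit = mean(zc[:noise_frames]) + 2.0 * pstdev(zc[:noise_frames])

    loud = [i for i, e in enumerate(energy) if e > upper]
    if not loud:
        return None
    start, end = loud[0], loud[-1]

    # Widen the interval while the energy stays above the lower threshold.
    while start > 0 and energy[start - 1] > lower:
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > lower:
        end += 1

    # Low-energy, high zero-crossing frames (e.g. weak fricatives) next to
    # the energy endpoints are pulled into the utterance.
    before = [i for i in range(max(0, start - lookback), start)
              if zc[i] > zc_limit]
    if len(before) >= min_zc_frames:
        start = before[0]
    after = [i for i in range(end + 1, min(len(zc), end + 1 + lookback))
             if zc[i] > zc_limit]
    if len(after) >= min_zc_frames:
        end = after[-1]
    return start, end


# Synthetic check: 20 quiet frames, 3 weak high-crossing frames,
# 10 loud frames, then 20 quiet frames (10 samples per frame).
quiet = [1] * 10
weak_fricative = [2, -2] * 5
voiced = [50] * 10
demo = quiet * 20 + weak_fricative * 3 + voiced * 10 + quiet * 20
endpoints = find_endpoints(demo, frame_len=10)
```

In this synthetic case energy alone would place the start at frame 23; the zero-crossing check moves it back to frame 20, where the weak high-frequency segment begins.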


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1975

Selection of acoustic features for speaker identification

Marvin R. Sambur

The aim of this study was to determine a set of acoustic features in the speech signal that are effective for the identification of a speaker. The investigation examined a large number of theoretically attractive features. The analysis technique of linear prediction was incorporated to examine features that were previously ignored because their measurement was either too time consuming or not easily amenable to automatic measurement. A novel probability of error criterion was used to determine the relative merits of the features. The experimental data base was collected over a 3½-year period and afforded the opportunity to investigate the variation over time of the measurements. The measurements that were found to be the most important were the value of the second resonance (around 1000 Hz) in /n/, the value of the third or fourth resonance (1700-2000 Hz) in /m/, the values of the second, third and fourth formant frequencies in vowels, and the average fundamental frequency of the speaker. A speaker identification experiment using only the best five features was performed. The test data consisted of the multisession data of 11 speakers, and the test data was kept independent of the design data. One error was made in the identification of these speakers for 320 separate identification experiments.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1978

Adaptive noise canceling for speech signals

Marvin R. Sambur

A least mean-square (LMS) adaptive filtering approach has been formulated for removing the deleterious effects of additive noise on the speech signal. Unlike the classical LMS adaptive filtering scheme, the proposed method is designed to cancel out the clean speech signal. This method takes advantage of the quasi-periodic nature of the speech signal to form an estimate of the clean speech signal at time t from the value of the signal at time t minus the estimated pitch period. For additive white noise distortion, preliminary tests indicate that the method improves the perceived speech quality and increases the signal-to-noise ratio (SNR) by 7 dB in a 0 dB environment. The method has also been shown to partially remove the perceived granularity of CVSD coded speech signals and to lead to an improvement in the linear prediction analysis/synthesis of noisy speech.
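The idea of estimating the clean signal at time t from the signal one pitch period earlier can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: the pitch period is taken as known and constant (the abstract refers to an estimated pitch period), a normalized LMS step is swapped in for plain LMS to keep the step size well scaled, and the function name and the parameters `taps`, `mu`, and `eps` are hypothetical.

```python
def pitch_delayed_lms(noisy, pitch_period, taps=2, mu=0.1, eps=1e-8):
    """Predict each sample from samples one pitch period earlier.

    Returns (estimate, weights). The filter output is taken as the speech
    estimate: quasi-periodic speech is predictable across a pitch period,
    whereas additive white noise is not.
    """
    weights = [0.0] * taps
    estimate = [0.0] * len(noisy)
    for n in range(pitch_period + taps - 1, len(noisy)):
        # Reference vector: the signal delayed by one pitch period.
        ref = [noisy[n - pitch_period - k] for k in range(taps)]
        y = sum(w * r for w, r in zip(weights, ref))
        err = noisy[n] - y
        # Normalized LMS update (an assumption; the abstract says LMS).
        norm = eps + sum(r * r for r in ref)
        weights = [w + mu * err * r / norm for w, r in zip(weights, ref)]
        estimate[n] = y
    return estimate, weights


# Noise-free sanity check: for a strictly periodic input the filter should
# learn to copy the sample one period back, i.e. weights close to [1, 0].
period = [3.0, -1.0, 2.0, 0.0, -2.0, 1.0, -3.0, 0.5]
signal = period * 200
estimate, weights = pitch_delayed_lms(signal, pitch_period=len(period))
```

With noisy input the same loop applies unchanged; how much noise is removed depends on the step size and on how accurately the pitch period is tracked, neither of which this sketch addresses.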


International Conference on Acoustics, Speech, and Signal Processing | 1977

Voiced-unvoiced-silence detection using the Itakura LPC distance measure

Lawrence R. Rabiner; Marvin R. Sambur

One of the most difficult problems in speech analysis is reliable discrimination among silence, unvoiced speech, and voiced speech which has been transmitted over a telephone line. Although several methods have been proposed for making this 3-level decision, these schemes have met with only modest success. In this paper a novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the 3 classes of signal is obtained during a training session, and an LPC distance metric and an energy distance are nonlinearly combined to make the final discrimination. This algorithm has been tested over conventional switched telephone lines, across a variety of speakers, and has been found to have an error rate of about 5%, with the majority of the errors (about 2/3) occurring at the boundaries between signal classes. The algorithm is currently being used in a speaker independent word recognition system.
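For readers unfamiliar with the LPC distance named in the title, the following Python sketch computes a log-ratio distance of the Itakura type between a stored reference LPC vector and a test frame. It assumes the autocorrelation method with Levinson-Durbin recursion and the common form d = log(a R aᵀ / â R âᵀ), where R is the autocorrelation matrix of the test frame, â its own LPC vector, and a the reference vector. The function names are hypothetical, and the energy distance and the nonlinear combination mentioned in the abstract are not modelled.

```python
import math


def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    return [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion; returns [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def quad_form(a, r):
    """a R a^T, where R is the Toeplitz matrix built from r."""
    n = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)]
               for i in range(n) for j in range(n))


def itakura_distance(ref_lpc, test_frame):
    """Log ratio of prediction residuals: reference model vs. best fit."""
    order = len(ref_lpc) - 1
    r = autocorr(test_frame, order)
    test_lpc = levinson(r, order)
    return math.log(quad_form(ref_lpc, r) / quad_form(test_lpc, r))


# A frame compared with its own model gives zero; a mismatched reference
# (here the trivial predictor [1, 0, 0]) gives a positive distance.
frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(200)]
own_model = levinson(autocorr(frame, 2), 2)
d_same = itakura_distance(own_model, frame)
d_other = itakura_distance([1.0, 0.0, 0.0], frame)
```

In a classifier of the kind described, one reference vector per class (silence, unvoiced, voiced) would be obtained in training and each test frame assigned using these distances together with an energy measure.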


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

LPC analysis/synthesis from speech inputs containing quantizing noise or additive white noise

Marvin R. Sambur; N. Jayant

An important problem in some communication systems is the performance of linear prediction (LPC) analysis with speech inputs that have been corrupted by (signal-correlated) quantization distortion or additive white noise. To gain a first insight into this problem, a high-quality speech sample was deliberately degraded by using various degrees (bit rates of 16 kbps and more) of differential PCM (DPCM), and delta modulation (DM) quantization, and by the introduction of additive white noise. The resulting speech samples were then analyzed to obtain the LPC control signals: pitch, gain, and the linear prediction coefficients. These control parameters were then compared to the parameters measured in the original, high quality signal. The measurements of pitch perturbations were assessed on the basis of how many points exceeded an appropriate difference limen. A distance measure proposed by Itakura was used to compare the original LPC coefficients with the coefficients measured from the degraded speech. In addition, the measured control signals were used to synthesize speech for perceptual evaluation. Results suggest that LPC analysis/synthesis is fairly immune to the degradation of DPCM quantization. The effects of DM quantization are more severe and the effects of additive white noise are the most serious.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

Speaker recognition using orthogonal linear prediction

Marvin R. Sambur

Recent experiments in speech synthesis have shown that, by an appropriate eigenvector analysis, a set of orthogonal parameters can be obtained that is essentially independent of all linguistic information across an analyzed utterance, but highly indicative of the identity of the speaker. The orthogonal parameters are formed by a linear transformation of the linear prediction parameters, and can achieve their recognition potential without the need of any time-normalization procedure. The speaker discrimination potential of the linear prediction orthogonal parameters was formally tested in both a speaker identification and a speaker verification experiment. The speech data for these experiments consisted of six repetitions of the same sentence spoken by 21 male speakers on six separate occasions. For both identification and verification, the recognition accuracy of the orthogonal parameters exceeded 99 percent for high-quality speech inputs. For telephone inputs, the accuracy exceeded 96 percent. In a separate text-independent speaker identification experiment, an accuracy of 94 percent was achieved for high-quality speech inputs.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1977

Application of an LPC distance measure to the voiced-unvoiced-silence detection problem

Lawrence R. Rabiner; Marvin R. Sambur

One of the most difficult problems in speech analysis is reliable discrimination among silence, unvoiced speech, and voiced speech which has been transmitted over a telephone line. Although several methods have been proposed for making this three-level decision, these schemes have met with only modest success. In this paper, a novel approach to the voiced-unvoiced-silence detection problem is proposed in which a spectral characterization of each of the three classes of signal is obtained during a training session, and an LPC distance measure and an energy distance are nonlinearly combined to make the final discrimination. This algorithm has been tested over conventional switched telephone lines, across a variety of speakers, and has been found to have an error rate of about 5 percent, with the majority of the errors (about 2/3) occurring at the boundaries between signal classes. The algorithm is currently being used in a speaker-independent word recognition system.


Journal of the Acoustical Society of America | 1978

On reducing the buzz in LPC synthesis

Marvin R. Sambur; Aaron E. Rosenberg; Lawrence R. Rabiner; C. A. McGonegal

A method for reducing the characteristic buzz of LPC synthetic speech is presented. The method consists of the use of a nonimpulse source for exciting the LPC synthesizer during voiced sounds. One novel feature is that the temporal parameters of the source are kept in fixed proportion to the pitch period. An extensive perceptual experiment has shown that the resulting quality of the synthesis is significantly preferred over the quality of the standard LPC synthesis.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1977

LPC prediction error--Analysis of its variation with the position of the analysis frame

Lawrence R. Rabiner; Bishnu S. Atal; Marvin R. Sambur

The LPC prediction error provides one measure of the success of linear prediction analysis in modeling a speech signal. Although a great deal is known about the properties of the prediction error, relatively little has been published about its variation as a function of the position of the analysis frame. In this paper it is shown that a fairly substantial variation in the prediction error is obtained within a single frame (i.e., 10 ms), independent of the analysis method (i.e., the covariance, autocorrelation, or lattice method). The implication of this result is that standard methods of LPC analysis may be inadequate for some applications. This is because the error signal is generally uniformly sampled at a low rate (on the order of 100 Hz), and this can lead to aliased results because of the variation of the error signal within the frame. For applications such as word recognition with frame-to-frame distance calculations using the prediction error, the errors due to uniform sampling can accrue. For speech synthesis applications, the effect of uniform sampling of the error signal is a small, but noticeable roughness in the synthetic speech. Various techniques for reducing the intraframe variation of the prediction error are discussed.
