Bishnu S. Atal | Researchain

Publication

Featured research published by Bishnu S. Atal.

IEEE Transactions on Speech and Audio Processing | 1993

Efficient vector quantization of LPC parameters at 24 bits/frame

Kuldip Kumar Paliwal; Bishnu S. Atal

For low bit rate speech coding applications, it is important to quantize the LPC parameters accurately using as few bits as possible. Though vector quantizers are more efficient than scalar quantizers, their use for accurate quantization of linear predictive coding (LPC) information (using 24-26 bits/frame) is impeded by their prohibitively high complexity. A split vector quantization approach is used here to overcome the complexity problem. An LPC vector consisting of 10 line spectral frequencies (LSFs) is divided into two parts, and each part is quantized separately using vector quantization. Using the localized spectral sensitivity property of the LSF parameters, a weighted LSF distance measure is proposed. With this distance measure, it is shown that the split vector quantizer can quantize LPC information in 24 bits/frame with an average spectral distortion of 1 dB and less than 2% of the frames having spectral distortion greater than 2 dB. The effect of channel errors on the performance of this quantizer is also investigated and results are reported.
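The split quantization step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the 4+6 split, the 12 bits (4096 codewords) per part, the untrained random codebooks, and the uniform weights are all assumptions chosen only to make the example runnable. A real coder would train the codebooks on speech and derive the weights from the spectral sensitivity of each LSF.

```python
import numpy as np

def split_vq_encode(lsf, codebooks, splits, weights):
    """Quantize one LSF vector part by part; return one codebook index per part."""
    indices = []
    start = 0
    for codebook, size in zip(codebooks, splits):
        part = lsf[start:start + size]
        w = weights[start:start + size]
        # Weighted squared Euclidean distance from this part to every codeword.
        dist = np.sum(w * (codebook - part) ** 2, axis=1)
        indices.append(int(np.argmin(dist)))
        start += size
    return indices

def split_vq_decode(indices, codebooks):
    """Rebuild the full LSF vector by concatenating the selected codewords."""
    return np.concatenate([cb[i] for cb, i in zip(codebooks, indices)])

# Toy setup (all values hypothetical): two parts, 2**12 codewords each,
# so one frame costs 12 + 12 = 24 bits.
rng = np.random.default_rng(0)
splits = (4, 6)
codebooks = [
    np.sort(rng.uniform(0.0, 1.2, (4096, splits[0])), axis=1),    # lower LSFs
    np.sort(rng.uniform(1.2, np.pi, (4096, splits[1])), axis=1),  # upper LSFs
]
weights = np.ones(10)  # placeholder for a sensitivity-based weighting

# A vector built from two codewords must be recovered exactly.
lsf = np.concatenate([codebooks[0][7], codebooks[1][42]])
indices = split_vq_encode(lsf, codebooks, splits, weights)
decoded = split_vq_decode(indices, codebooks)
```

Searching two codebooks of 4096 entries costs 8192 distance evaluations per frame, whereas a single 24-bit codebook would need about 16.8 million, which is the complexity argument the abstract makes.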

International Conference on Acoustics, Speech, and Signal Processing | 1982

A new model of LPC excitation for producing natural-sounding speech at low bit rates

Bishnu S. Atal; Joel R. Remde

The excitation for LPC speech synthesis usually consists of two separate signals - a delta-function pulse once every pitch period for voiced speech and white noise for unvoiced speech. This manner of representing excitation requires that speech segments be classified accurately into voiced and unvoiced categories and the pitch period of voiced segments be known. It is now well recognized that such a rigid idealization of the vocal excitation is often responsible for the unnatural quality associated with synthesized speech. This paper describes a new approach to the excitation problem that does not require a priori knowledge of either the voiced-unvoiced decision or the pitch period. All classes of sounds are generated by exciting the LPC filter with a sequence of pulses; the amplitudes and locations of the pulses are determined using a non-iterative analysis-by-synthesis procedure. This procedure minimizes a perceptual-distance metric representing subjectively-important differences between the waveforms of the original and the synthetic speech signals. The distance metric takes account of the finite-frequency resolution as well as the differential sensitivity of the human ear to errors in the formant and inter-formant regions of the speech spectrum.
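As a rough illustration of choosing pulse locations and amplitudes by analysis-by-synthesis, the sketch below places pulses one at a time, each time picking the location that most reduces the squared error between the target and the synthetic signal. It is a simplified stand-in, not the procedure from the paper: it uses a plain squared error instead of the perceptual distance metric, and the function names and the toy impulse response are assumptions.

```python
import numpy as np

def shifted(impulse_response, location, n):
    """Response of the synthesis filter to a unit pulse at `location`."""
    out = np.zeros(n)
    span = min(len(impulse_response), n - location)
    out[location:location + span] = impulse_response[:span]
    return out

def multipulse_search(target, impulse_response, num_pulses):
    """Sequentially pick (location, amplitude) pairs that minimize squared error."""
    n = len(target)
    error = np.asarray(target, dtype=float).copy()
    pulses = []
    for _ in range(num_pulses):
        best = None
        for loc in range(n):
            basis = shifted(impulse_response, loc, n)
            energy = np.dot(basis, basis)
            corr = np.dot(error, basis)
            reduction = corr * corr / energy  # error removed by a pulse here
            if best is None or reduction > best[0]:
                best = (reduction, loc, corr / energy)
        _, loc, amp = best
        pulses.append((loc, amp))
        error -= amp * shifted(impulse_response, loc, n)
    return pulses, error

# Toy example (hypothetical filter): a decaying impulse response and a
# target made from two known pulses.
h = 0.5 ** np.arange(20)
target = 2.0 * shifted(h, 10, 80) - 1.0 * shifted(h, 50, 80)
pulses, residual_error = multipulse_search(target, h, num_pulses=2)
```

No voiced-unvoiced decision or pitch estimate enters the search; the same loop is applied to every frame.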

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1976

A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition

Bishnu S. Atal; Lawrence R. Rabiner

In speech analysis, the voiced-unvoiced decision is usually performed in conjunction with pitch analysis. The linking of voiced-unvoiced (V-UV) decision to pitch analysis not only results in unnecessary complexity, but makes it difficult to classify short speech segments which are less than a few pitch periods in duration. In this paper, we describe a pattern recognition approach for deciding whether a given segment of a speech signal should be classified as voiced speech, unvoiced speech, or silence, based on measurements made on the signal. In this method, five different measurements are made on the speech segment to be classified. The measured parameters are the zero-crossing rate, the speech energy, the correlation between adjacent speech samples, the first predictor coefficient from a 12-pole linear predictive coding (LPC) analysis, and the energy in the prediction error. The speech segment is assigned to a particular class based on a minimum-distance rule obtained under the assumption that the measured parameters are distributed according to the multidimensional Gaussian probability density function. The means and covariances for the Gaussian distribution are determined from manually classified speech data included in a training set. The method has been found to provide reliable classification with speech segments as short as 10 ms and has been used for both speech analysis-synthesis and recognition applications. A simple nonlinear smoothing algorithm is described to provide a smooth 3-level contour of an utterance for use in speech recognition applications. Quantitative results and several examples illustrating the performance of the method are included in the paper.
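The classification rule described here can be sketched as follows, under several assumptions. Only three of the five measurements are computed (the two LPC-based ones are omitted for brevity), the distance is taken to be the Mahalanobis distance to each class mean, and the training frames are synthetic stand-ins rather than manually classified speech. All function names are hypothetical.

```python
import numpy as np

def features(frame):
    """Zero-crossing rate, log energy (dB), and adjacent-sample correlation."""
    frame = np.asarray(frame, dtype=float)
    zcr = np.mean(np.signbit(frame[1:]) != np.signbit(frame[:-1]))
    log_energy = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    a, b = frame[1:], frame[:-1]
    corr = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
    return np.array([zcr, log_energy, corr])

def train(examples):
    """Estimate a mean and inverse covariance per class from labelled vectors."""
    model = {}
    for label, vectors in examples.items():
        vectors = np.asarray(vectors)
        inv_cov = np.linalg.inv(np.cov(vectors, rowvar=False))
        model[label] = (vectors.mean(axis=0), inv_cov)
    return model

def classify(x, model):
    """Assign x to the class with the smallest Mahalanobis distance."""
    def distance(label):
        mean, inv_cov = model[label]
        d = x - mean
        return float(d @ inv_cov @ d)
    return min(model, key=distance)

# Synthetic 10 ms frames at an assumed 8 kHz sampling rate (80 samples).
rng = np.random.default_rng(0)

def make_frame(label, n=80):
    if label == "voiced":  # low-frequency sinusoid plus a little noise
        freq = rng.uniform(80.0, 200.0)
        phase = rng.uniform(0.0, 2.0 * np.pi)
        amp = rng.uniform(0.5, 1.0)
        tone = amp * np.sin(2.0 * np.pi * freq * np.arange(n) / 8000.0 + phase)
        return tone + 0.01 * rng.standard_normal(n)
    scale = 0.1 if label == "unvoiced" else 0.001  # noise vs. near-silence
    return scale * rng.standard_normal(n)

labels = ("voiced", "unvoiced", "silence")
model = train({c: [features(make_frame(c)) for _ in range(200)] for c in labels})
predicted = {c: classify(features(make_frame(c)), model) for c in labels}
```

Because the decision uses only per-frame measurements, it does not depend on a pitch detector, which is the decoupling the abstract argues for.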

Proceedings of the IEEE | 1976

Automatic recognition of speakers from their voices

Bishnu S. Atal

This paper presents a survey of automatic speaker recognition techniques. The paper includes a discussion of the speaker-dependent properties of the speech signal, methods for selecting an efficient set of speech measurements, results of experimental studies illustrating the performance of various methods of speaker recognition, and a comparison of the performance of automatic methods with that of human listeners. Both text-dependent and text-independent speaker-recognition techniques are discussed.

International Conference on Acoustics, Speech, and Signal Processing | 1983

Efficient coding of LPC parameters by temporal decomposition

Bishnu S. Atal

This paper describes a method for efficient coding of LPC log area parameters. It is now well recognized that sample-by-sample quantization of LPC parameters is not very efficient in minimizing the bit rate needed to code these parameters. Recent methods for reducing the bit rate have used vector and segment quantization methods. Much of the past work in this area has focussed on efficient coding of LPC parameters in the context of vocoders which put a ceiling on achievable speech quality. The results from these studies cannot be directly applied to synthesis of high quality speech. This paper describes a different approach to efficient coding of log area parameters. Our aim is to determine the extent to which the bit rate of LPC parameters can be reduced without sacrificing speech quality. Speech events occur generally at non-uniformly spaced time intervals. Moreover, some speech events are slow while others are fast. Uniform sampling of speech parameters is thus not efficient. We describe a non-uniform sampling and interpolation procedure for efficient coding of log area parameters. A temporal decomposition technique is used to represent the continuous variation of these parameters as a linearly-weighted sum of a number of discrete elementary components. The location and length of each component is automatically adapted to speech events. We find that each elementary component can be coded as a very low information rate signal.

International Conference on Acoustics, Speech, and Signal Processing | 1984

Improving performance of multi-pulse LPC coders at low bit rates

Sharad Singhal; Bishnu S. Atal

The multi-pulse excitation model provides a method for producing natural-sounding speech at medium to low bit rates. Multi-pulse analysis obtains the all-pole filter excitation by minimizing a spectrally-weighted mean-squared error between the original and synthetic speech signals. Although the method provides high quality speech around 10 kbits/sec, speech quality suffers if the bit rate is lowered. In this paper, we focus on problems encountered in attempting to maintain speech quality while synthesizing speech using multi-pulse excitation at lower bit rates.

International Conference on Acoustics, Speech, and Signal Processing | 1990

Pitch predictors with high temporal resolution

Peter Kroon; Bishnu S. Atal

A first-order pitch predictor is described whose delay is specified as an integer number of samples plus a fraction of a sample at the current sampling rate. This realization has a better performance than conventional multiple coefficient predictors and leads to more efficient coding of the predictor parameters. Also discussed is the application of noninteger delay pitch predictors to low-bit-rate speech coding.
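A first-order predictor with a noninteger delay can be sketched as below. This is an illustration under assumptions, not the paper's realization: the fractionally delayed signal is obtained by linear interpolation between neighbouring samples (a practical design would likely use a better interpolation filter), the search is open-loop on the original signal, and the quarter-sample resolution is arbitrary.

```python
import numpy as np

def delayed(signal, start, length, delay):
    """signal[n - delay] for n in the frame, using linear interpolation."""
    positions = np.arange(start, start + length) - delay
    return np.interp(positions, np.arange(len(signal)), signal)

def best_pitch_predictor(signal, start, length, min_delay, max_delay,
                         steps_per_sample=4):
    """Search delays on a fractional grid; return (delay, gain, error)."""
    target = signal[start:start + length]
    best = (0.0, 0.0, np.inf)
    step = 1.0 / steps_per_sample
    for delay in np.arange(min_delay, max_delay + 1e-9, step):
        past = delayed(signal, start, length, delay)
        energy = np.dot(past, past)
        if energy == 0.0:
            continue
        gain = np.dot(target, past) / energy  # optimal single coefficient
        error = float(np.sum((target - gain * past) ** 2))
        if error < best[2]:
            best = (float(delay), float(gain), error)
    return best

# Toy signal with a pitch period of 40.5 samples, which no integer delay matches.
n = np.arange(200)
signal = np.sin(2.0 * np.pi * n / 40.5)

frac_delay, frac_gain, frac_error = best_pitch_predictor(
    signal, start=100, length=80, min_delay=30, max_delay=60)
int_delay, int_gain, int_error = best_pitch_predictor(
    signal, start=100, length=80, min_delay=30, max_delay=60, steps_per_sample=1)
```

In this toy case the fractional search lands on the true period while the integer-only search has to settle for a neighbouring delay with a larger prediction error, using the same single coefficient in both cases.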

International Conference on Acoustics, Speech, and Signal Processing | 1980

Improved quantizer for adaptive predictive coding of speech signals at low bit rates

Bishnu S. Atal; Manfred R. Schroeder

Adaptive predictive coding of speech signals at bit rates lower than 10 kbits/sec often requires the use of 2-level (1 bit) quantization of the samples of the prediction residual. Such a coarse quantization of the prediction residual can produce audible quantizing noise in the reproduced speech signal at the receiver. This paper describes a new method of quantization for improving the speech quality. The improvement is obtained by center clipping the prediction residual and by fine quantization of the high-amplitude portions of the prediction residual. The threshold of center clipping is adjusted to provide encoding of the prediction residual at a specified bit rate. This method of quantization not only improves the speech quality by accurate quantization of the prediction residual when its amplitude is large but also allows encoding of the prediction residual at bit rates below 1 bit/sample.
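The quantization idea in this abstract can be illustrated with the sketch below. Several details are assumptions made for the example: low-amplitude samples are simply set to zero, the remaining samples go through a uniform quantizer with a hypothetical step size, the bit rate is approximated by the first-order entropy of the quantizer output (as if an ideal entropy coder followed), and the threshold is found by a simple scan over candidate values.

```python
import numpy as np

def center_clip_quantize(residual, threshold, step):
    """Zero samples with |x| <= threshold; uniformly quantize the rest."""
    residual = np.asarray(residual, dtype=float)
    levels = np.rint(residual / step)
    levels[np.abs(residual) <= threshold] = 0
    return levels.astype(int)

def dequantize(levels, step):
    return levels * step

def entropy_bits_per_sample(levels):
    """First-order entropy of the quantizer output, used as a rate estimate."""
    _, counts = np.unique(levels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def threshold_for_rate(residual, step, target_bits, candidates):
    """Return the smallest candidate threshold that meets the target rate."""
    for threshold in candidates:
        levels = center_clip_quantize(residual, threshold, step)
        if entropy_bits_per_sample(levels) <= target_bits:
            return float(threshold)
    return float(candidates[-1])

# Toy residual: unit-variance Gaussian noise standing in for a prediction residual.
rng = np.random.default_rng(0)
residual = rng.standard_normal(8000)
step = 0.5
threshold = threshold_for_rate(residual, step, target_bits=0.5,
                               candidates=np.linspace(0.0, 3.0, 31))
levels = center_clip_quantize(residual, threshold, step)
rate = entropy_bits_per_sample(levels)
kept = np.abs(residual) > threshold
```

Most samples map to the zero level, so the estimated rate falls below 1 bit/sample while the large-amplitude samples are still reproduced to within half a quantizer step.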

IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

Amplitude optimization and pitch prediction in multipulse coders

Sharad Singhal; Bishnu S. Atal

Although the multipulse model is conceptually simple, the problem of locating the pulses is computationally complex. The authors discuss the basic multipulse model and describe a procedure to compute the excitation with optimally adjusted amplitudes. The algorithm provides a framework for computing multipulse excitation with varying degrees of optimization and computational complexity. The authors find that speech quality depends on the pulse rate. They also find that for the same quality, female speech requires a higher pulse rate than male speech. The pitch dependence can be reduced and speech quality improved for high-pitched speakers by incorporating long delay prediction in the multipulse model.
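One way to read "optimally adjusted amplitudes" is that, once the pulse locations are fixed, all amplitudes are re-solved jointly as a least-squares problem. The sketch below shows that step only. It is an assumption-laden illustration: it uses an unweighted squared error, a hypothetical impulse response, and locations that are given rather than searched.

```python
import numpy as np

def optimal_amplitudes(target, impulse_response, locations):
    """Jointly solve for pulse amplitudes at fixed locations (least squares)."""
    n = len(target)
    basis = np.zeros((n, len(locations)))
    for col, loc in enumerate(locations):
        # Column = synthesis-filter response to a unit pulse at this location.
        span = min(len(impulse_response), n - loc)
        basis[loc:loc + span, col] = impulse_response[:span]
    amplitudes, *_ = np.linalg.lstsq(basis, np.asarray(target, dtype=float),
                                     rcond=None)
    return amplitudes, basis

# Toy example: overlapping responses, so the amplitudes interact.
h = 0.9 ** np.arange(40)
locations = [5, 20, 33]
true_amplitudes = np.array([1.0, -0.5, 0.8])

_, basis = optimal_amplitudes(np.zeros(80), h, locations)
target = basis @ true_amplitudes
amplitudes, _ = optimal_amplitudes(target, h, locations)
```

When the filter responses of neighbouring pulses overlap, amplitudes chosen one at a time are generally not the best set for those locations; the joint solve corrects for that at the cost of a small linear system per frame.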

International Conference on Acoustics, Speech, and Signal Processing | 1987

Quantization procedures for the excitation in CELP coders

Peter Kroon; Bishnu S. Atal

Past research on CELP (Code-Excited Linear Predictive) coders has mainly concentrated on the feasibility of the CELP concept and on the reduction of the computational complexity. In this paper we address the problem of finding and encoding the excitation parameters with a limited bit rate, such that high quality speech coding in the 4.8 - 7.2 kb/s range becomes feasible. First, we examine the effect of the various excitation parameters such as code book size, code book population, order of the long-term predictor and update rate on the quality of the reconstructed speech. Second, we investigate procedures for designing and incorporating quantizers for the parameters involved. Finally, using both scalar and vector quantization techniques for the LPC coefficients, we simulated 4.8 kb/s and 7.2 kb/s coders. We also report on the use of postfiltering to further improve the performance of the CELP coder.
