P. V. S. Rao
Tata Institute of Fundamental Research
Publications
Featured research published by P. V. S. Rao.
Signal Processing | 1981
Kuldip Kumar Paliwal; P. V. S. Rao
A modified autocorrelation method of linear prediction is proposed for pitch-synchronous analysis of voiced speech. The method needs one full period of speech data for analysis and assumes periodic extension of the data. This method guarantees the stability of the estimated all-pole filter and is shown to perform better than the covariance and autocorrelation methods of linear prediction.
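The periodic-extension assumption amounts to using the circular autocorrelation of the single pitch period; solving the resulting normal equations with the Levinson-Durbin recursion then yields the all-pole coefficients. A minimal sketch, assuming this reading of the method (function names and the test signal are illustrative, not from the paper):

```python
import numpy as np

def periodic_autocorr(frame, order):
    # Circular autocorrelation: the single pitch period is assumed
    # to repeat, so lags wrap around the end of the frame.
    return np.array([np.dot(frame, np.roll(frame, -k))
                     for k in range(order + 1)])

def levinson_durbin(r, order):
    # Solve the autocorrelation normal equations for the
    # prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i-1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i-1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

Because a circular autocorrelation sequence is positive semi-definite, every reflection coefficient satisfies |k| ≤ 1, which is what guarantees the stability of the estimated filter.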
Speech Communication | 1998
K. Samudravijaya; Sanjeev K. Singh; P. V. S. Rao
The accuracy of speech recognition systems is known to be affected by fast speech. If fast speech can be detected by means of a measure of speaking rate, the acoustic as well as language models of a speech recognition system can be adapted to compensate for fast speech effects. We have studied several measures of speaking rate which have the advantage that they can be computed prior to speech recognition. The proposed measures have been compared with conventional measures, viz., word and phone rate, on the TIMIT database. Some of the proposed measures have significant correlations with phone rate and vowel duration. We have shown that the mismatch between actual and expected durations of test vowels reduces if the vowel duration models are adapted to speaking rate, as estimated by the proposed measures. These measures can be computed from features commonly employed in speech recognition, do not entail significant additional computational load, and do not need labeling or segmentation of the unknown utterance in terms of linguistic units.
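Such recognition-free rate measures operate directly on frame-level spectra. The sketch below is a generic illustrative proxy of this kind (average per-second spectral change), not one of the specific measures proposed in the paper:

```python
import numpy as np

def spectral_change_rate(frames, frame_shift_s):
    # frames: 2-D array, one windowed speech frame per row.
    # Average per-second magnitude of frame-to-frame spectral change;
    # faster articulation tends to produce larger values.
    mags = np.abs(np.fft.rfft(frames, axis=1))
    flux = np.abs(np.diff(mags, axis=0)).sum()
    return flux / (frames.shape[0] * frame_shift_s)
```

Like the measures in the paper, this needs no phone labels or segmentation, only the short-time spectra already computed by the recognizer front end.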
Signal Processing | 1982
Kuldip Kumar Paliwal; P. V. S. Rao
Several alternate linear prediction parametric representations are experimentally compared as to their vowel recognition performance. The speech data used for this purpose consist of 900 utterances of 10 different vowels spoken by 3 speakers in a /b/-vowel-/b/ context. The cepstral coefficients representation is found to be the best linear prediction parametric representation.
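The cepstral representation can be derived from the LP coefficients themselves by a standard recursion, with no further transform of the signal. A sketch, assuming the convention A(z) = 1 + a[1]z⁻¹ + ... (names are illustrative):

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    # Cepstrum of the all-pole model 1/A(z), via the standard recursion
    # c[n] = -a[n] - sum_{k<n} (k/n) c[k] a[n-k].
    p = len(a) - 1
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```

As a sanity check, for a single pole at z = r (A(z) = 1 − r z⁻¹) the recursion reproduces the closed form c[n] = rⁿ/n.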
Journal of the Acoustical Society of America | 1979
Kuldip Kumar Paliwal; P. V. S. Rao
An acoustic phonemic recognition system for continuous speech is presented. The system utilizes both steady-state and transition segments of the speech signal to achieve recognition. The information contained in formant transitions is utilized by the system by using a synthesis-based recognition approach. It is shown that this improves the performance of the system considerably. Recognition of continuous speech is accomplished here in three stages: segmentation, steady-state recognition, and synthesis-based recognition. The system has been tried out on 40 test utterances, each 3-4 s in duration, spoken by a single male speaker and the following results are obtained: 5.4% missed segment error, 8.3% extra segment error, 52.3% correct recognition using only steady-state segments, and 62.0% correct recognition using both steady-state and transition segments.
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1983
Kuldip Kumar Paliwal; P. V. S. Rao
The k-nearest-neighbor decision rule is known to provide a useful nonparametric procedure for pattern classification. This rule is applied here to a vowel recognition problem, and the effects of the number (k) of nearest neighbors, the size of the training set and the type of distance measure on vowel recognition performance are studied. It is shown that the vowel recognition performance remains approximately constant for all values of k. The recognition performance initially improves with the size of the training set and then converges to an asymptotic value. Selection of a better distance measure leads to a significant improvement in vowel recognition performance.
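The rule itself is simple to state: classify a test vector by majority vote among its k closest training vectors, where "closest" depends on the chosen distance measure. A minimal sketch with an optional feature-weighted Euclidean distance standing in for the "better distance measure" (the weighting scheme is an illustrative assumption, not the one studied in the paper):

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_x, train_y, k=5, weights=None):
    # Squared (optionally feature-weighted) Euclidean distance to every
    # training vector, then majority vote among the k nearest.
    d = train_x - x
    if weights is not None:
        d = d * np.sqrt(weights)
    dist = np.einsum('ij,ij->i', d, d)
    nearest = np.argsort(dist)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```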
Speech Communication | 1993
P. V. S. Rao
A Voice Oriented Interactive Computing Environment (VOICE) has been implemented in the Hindi language. The system provides an interactive facility for visual and voice feedback. The 200-word isolated-word recognition system is designed around a railway reservation enquiry task and uses acoustic-phonetic segments as the basic units of recognition. Frame-level classification into broad acoustic-phonetic categories is accomplished by a maximum likelihood classifier, and segmentation by hierarchical clustering of the frame-level likelihood vectors using explicit-duration semi (hidden) Markov models. A more detailed classification of a few categories (vowels, voice bar and nasals in the first instance) is performed by neural nets. String matching using dynamic programming accomplishes lexical access, i.e. conversion of the phonetic category symbol strings into words. Distributed processing of the word recognition task enables recognition at four times real time. A language processor disambiguates between multiple choices given by the recognizer for each word and even corrects some acoustic-level recognition errors. This, the first such system working in any Indian language, gives a recognition performance of 85% at the word level. For comparison, a purely HMM-based word-level recognizer has also been implemented. The performance is expected to improve further as there is still substantial scope for refinement.
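The lexical-access step — matching the recognizer's phonetic-symbol string against lexicon entries by dynamic programming — can be sketched as a standard edit-distance match (equal substitution, insertion and deletion costs here; a real system would weight them by acoustic confusability):

```python
def edit_distance(sym, ref):
    # Dynamic-programming string match: minimum number of insertions,
    # deletions and substitutions turning sym into ref.
    m, n = len(sym), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if sym[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[m][n]

def lexical_access(sym, lexicon):
    # Pick the lexicon entry whose symbol string matches best.
    return min(lexicon, key=lambda w: edit_distance(sym, w))
```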
Signal Processing | 1982
Kuldip Kumar Paliwal; P. V. S. Rao
Burg's method of maximum entropy spectral analysis is used to analyse the voiced speech signal, and its performance is compared with that of the autocorrelation and covariance methods of linear prediction using the following three criteria: (1) normalized total squared linear prediction error, (2) error in estimating the power spectrum, and (3) errors in estimating the first three formant frequencies and bandwidths. Results of pitch-synchronous and pitch-asynchronous analyses applied to synthetic vowel signals are discussed.
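Burg's method estimates the reflection coefficients directly from the data, minimising the sum of forward and backward prediction-error energies at each stage rather than forming an explicit autocorrelation estimate. A compact sketch (illustrative names; no windowing or preprocessing):

```python
import numpy as np

def burg(x, order):
    # Burg recursion: at each stage choose the reflection coefficient k
    # minimising forward + backward error energy, then update the error
    # sequences and the filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.
    x = np.asarray(x, dtype=float)
    a = np.array([1.0])
    f = x.copy()  # forward prediction errors
    b = x.copy()  # backward prediction errors
    for _ in range(order):
        ff, bb = f[1:], b[:-1]
        k = -2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        f, b = ff + k * bb, bb + k * ff
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a
```

By construction |k| ≤ 1 at every stage, so the estimated all-pole filter is stable — one reason the method is attractive for short, pitch-synchronous frames.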
Speech Communication | 1983
Kuldip Kumar Paliwal; P. V. S. Rao
A synthesis-based method for pitch extraction of the speech signal is proposed. The method synthesizes a number of log power spectra for different values of fundamental frequency and compares them with the log power spectrum of the input speech segment. The average magnitude (AM) difference between the two spectra is used for comparison. The value of fundamental frequency that gives the minimum AM difference between the synthesized spectrum and the input spectrum is chosen as the estimated value of fundamental frequency. The voiced/unvoiced decision is made on the basis of the value of the AM difference at the minimum. For synthesizing the log power spectrum, the speech signal is assumed to be the output of an all-pole filter. The transfer function of the all-pole filter is estimated from the input speech segment by using the autocorrelation method of linear prediction. The synthesis-based method is tried out on real speech data and the results are discussed.
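The search described above can be sketched directly: for each candidate F0, drive the estimated all-pole filter with an impulse train and score the candidate by the average magnitude (AM) difference between log power spectra. The mean-level normalisation and all names below are illustrative choices, not taken from the paper:

```python
import numpy as np

def am_pitch(frame, a, f0_grid, fs, n_fft=512):
    # a: prediction-error filter A(z) = 1 + a[1] z^-1 + ... from the
    # autocorrelation method; f0_grid: candidate fundamentals in Hz.
    spec_in = np.log(np.abs(np.fft.rfft(frame, n_fft)) ** 2 + 1e-12)
    n, p = len(frame), len(a) - 1
    best_f0, best_am = None, np.inf
    for f0 in f0_grid:
        exc = np.zeros(n)
        exc[::int(round(fs / f0))] = 1.0   # impulse-train excitation
        y = np.zeros(n)
        for t in range(n):                 # all-pole synthesis 1/A(z)
            y[t] = exc[t] - sum(a[k] * y[t - k]
                                for k in range(1, min(p, t) + 1))
        spec_syn = np.log(np.abs(np.fft.rfft(y, n_fft)) ** 2 + 1e-12)
        spec_syn += spec_in.mean() - spec_syn.mean()  # match overall level
        am = np.mean(np.abs(spec_in - spec_syn))      # AM difference
        if am < best_am:
            best_f0, best_am = f0, am
    return best_f0, best_am
```

The minimum AM difference itself then serves as the voiced/unvoiced statistic: a large minimum suggests the frame is not well modelled by any periodic excitation.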
IEEE Transactions on Pattern Analysis and Machine Intelligence | 1995
P. V. S. Rao
The approach described is based on an empirical parametric model for the handwriting recognition system. The parameters are so chosen and quantized as to retain only broad shape information, ignoring writer-dependent and other variability. Concatenation of character prototypes generates archetypal reference words for recognition, and training is unnecessary. The recognition scores exceed 90%.
Signal Processing | 1981
T.V. Sreenivas; P. V. S. Rao
A satisfactory solution is yet to be found for the problem of estimating the pitch of speech signals. It is difficult even to evolve objective criteria to evaluate existing algorithms. This is usually done on the basis of complexity of the algorithm, speed of computation, ease of implementation, etc. Even the verification of the results of one particular technique is difficult. This paper presents three functional demarcations of pitch estimation methods (based on the linear model of speech production, analysis of the short-time spectrum and examination of the time domain signal, respectively) as used for speech processing. It is shown that evaluation and comparison of different algorithms becomes consistent and easy within each demarcation. Also, methods falling within each demarcation are shown to be suited for a particular area of speech processing.