Publication


Featured research published by P. V. de Souza.


International Conference on Acoustics, Speech, and Signal Processing | 1986

Maximum mutual information estimation of hidden Markov model parameters for speech recognition

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

A method for estimating the parameters of hidden Markov models of speech is described. Parameter values are chosen to maximize the mutual information between an acoustic observation sequence and the corresponding word sequence. Recognition results are presented comparing this method with maximum likelihood estimation.
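The contrast between the two criteria can be illustrated on a toy discrete example (not the paper's training algorithm; all likelihoods and priors below are invented): maximum likelihood maximizes sum log P(O|W) over the training pairs, while the MMI criterion maximizes sum log P(W|O), which also pushes down the probability of competing words.

```python
import math

def ml_objective(samples, likelihoods):
    # Maximum-likelihood criterion: sum over (observation, correct word)
    # of log P(O | W).
    return sum(math.log(likelihoods[o][w]) for o, w in samples)

def mmi_objective(samples, likelihoods, prior):
    # MMI criterion: sum of log P(W | O)
    #   = log [ P(O|W) P(W) / sum_W' P(O|W') P(W') ].
    total = 0.0
    for o, w in samples:
        num = likelihoods[o][w] * prior[w]
        den = sum(likelihoods[o][v] * prior[v] for v in prior)
        total += math.log(num / den)
    return total

likelihoods = {           # hypothetical acoustic likelihoods P(O | W)
    "o1": {"yes": 0.6, "no": 0.3},
    "o2": {"yes": 0.2, "no": 0.5},
}
prior = {"yes": 0.5, "no": 0.5}
samples = [("o1", "yes"), ("o2", "no")]

print(ml_objective(samples, likelihoods))          # log 0.6 + log 0.5
print(mmi_objective(samples, likelihoods, prior))
```

Note how the MMI value depends on the likelihoods of the *incorrect* words through the denominator, which is what distinguishes it from the ML criterion.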


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1989

A tree-based statistical language model for natural language speech recognition

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

The problem of predicting the next word a speaker will say, given the words already spoken, is discussed. Specifically, the problem is to estimate the probability that a given word will be the next word uttered. Algorithms are presented for automatically constructing a binary decision tree designed to estimate these probabilities. At each node of the tree there is a yes/no question relating to the words already spoken, and at each leaf there is a probability distribution over the allowable vocabulary. Ideally, these nodal questions can take the form of arbitrarily complex Boolean expressions, but computationally cheaper alternatives are also discussed. Some results obtained on a 5000-word vocabulary with a tree designed to predict the next word spoken from the preceding 20 words are included. The tree is compared to an equivalent trigram model and shown to be superior.
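The node/leaf structure described above can be sketched with a tiny hand-built tree (the question, vocabulary, and probabilities here are invented for demonstration; the paper grows such trees automatically from data):

```python
# Each internal node asks a yes/no question about the history (the words
# already spoken); each leaf stores a next-word probability distribution.

def question_prev_is_the(history):
    return len(history) > 0 and history[-1] == "the"

tree = {
    "question": question_prev_is_the,
    "yes": {"leaf": {"cat": 0.7, "dog": 0.3}},           # P(next | prev == "the")
    "no":  {"leaf": {"the": 0.6, "cat": 0.2, "dog": 0.2}},
}

def predict(node, history):
    # Walk from the root to a leaf, answering each node's question in turn.
    while "leaf" not in node:
        node = node["yes"] if node["question"](history) else node["no"]
    return node["leaf"]

print(predict(tree, ["see", "the"]))   # {'cat': 0.7, 'dog': 0.3}
```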


International Conference on Acoustics, Speech, and Signal Processing | 1988

Acoustic Markov models used in the Tangora speech recognition system

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Michael Picheny

The Speech Recognition Group at IBM Research has developed a real-time, isolated-word speech recognizer called Tangora, which accepts natural English sentences drawn from a vocabulary of 20,000 words. Despite its large vocabulary, the Tangora recognizer requires only about 20 minutes of speech from each new user for training purposes. The accuracy of the system and its ease of training are largely attributable to the use of hidden Markov models in its acoustic match component. An automatic technique for constructing Markov word models is described and results are included of experiments with speaker-dependent and speaker-independent models on several isolated-word recognition tasks.


International Conference on Acoustics, Speech, and Signal Processing | 1988

Speech recognition with continuous-parameter hidden Markov models

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

The acoustic-modeling problem in automatic speech recognition is examined from an information theoretic point of view. This problem is to design a speech-recognition system which can extract from the speech waveform as much information as possible about the corresponding word sequence. The information extraction process is factored into two steps: a signal-processing step which converts a speech waveform into a sequence of informative acoustic feature vectors, and a step which models such a sequence. The authors are primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space. They explore the trade-off between packing information into such sequences and being able to model them accurately. The difficulty of developing accurate models of continuous-parameter sequences is addressed by investigating a method of parameter estimation which is designed to cope with inaccurate modeling assumptions.
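Scoring a continuous-parameter feature sequence with such a model can be sketched with the standard forward algorithm over a two-state HMM with univariate Gaussian emissions (a minimal illustration of the model class, not the paper's system; all parameters are made up):

```python
import math

def gauss(x, mean, var):
    # Univariate Gaussian density N(x; mean, var).
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def forward_likelihood(obs, pi, trans, means, vars_):
    # Forward algorithm: alpha[j] accumulates P(obs so far, state j).
    n = len(pi)
    alpha = [pi[i] * gauss(obs[0], means[i], vars_[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * gauss(x, means[j], vars_[j])
            for j in range(n)
        ]
    return sum(alpha)

pi = [1.0, 0.0]                     # start in state 0
trans = [[0.7, 0.3], [0.0, 1.0]]    # left-to-right topology
means = [0.0, 3.0]
vars_ = [1.0, 1.0]
print(forward_likelihood([0.1, 2.9], pi, trans, means, vars_))
```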


International Conference on Acoustics, Speech, and Signal Processing | 1988

A new algorithm for the estimation of hidden Markov model parameters

Lalit R. Bahl; Peter F. Brown; P. V. de Souza; Robert L. Mercer

Discusses the problem of estimating the parameter values of hidden Markov word models for speech recognition. The authors argue that maximum-likelihood estimation of the parameters does not lead to values which maximize recognition accuracy and describe an alternative estimation procedure called corrective training which is aimed at minimizing the number of recognition errors. Corrective training is similar to a well-known error-correcting training procedure for linear classifiers and works by iteratively adjusting the parameter values so as to make correct words more probable and incorrect words less probable. There are also strong parallels between corrective training and maximum mutual information estimation. They do not prove that the corrective training algorithm converges, but experimental evidence suggests that it does, and that it leads to significantly fewer recognition errors than maximum likelihood estimation.
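The error-correcting flavor of this procedure can be sketched on a toy discrete model (a hedged illustration of the idea, not the paper's HMM update rule; the step size and scores are invented): when the decoder's top choice is wrong, the score backing the correct word is raised and the score backing the misrecognized word is lowered.

```python
def decode(obs, scores):
    # Return the highest-scoring word hypothesis for this observation.
    return max(scores[obs], key=scores[obs].get)

def corrective_step(samples, scores, step=0.5):
    # One pass of perceptron-style corrective adjustment.
    for obs, correct in samples:
        guess = decode(obs, scores)
        if guess != correct:
            scores[obs][correct] += step   # make the correct word more probable
            scores[obs][guess] -= step     # make the incorrect word less probable
    return scores

scores = {"o1": {"yes": 0.4, "no": 0.6}}   # initially misrecognizes "o1"
samples = [("o1", "yes")]
corrective_step(samples, scores)
print(decode("o1", scores))   # "yes"
```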


IEEE Transactions on Speech and Audio Processing | 1994

The metamorphic algorithm: a speaker mapping approach to data augmentation

Jerome R. Bellegarda; P. V. de Souza; Arthur Nádas; David Nahamoo; Michael Picheny; Lalit R. Bahl

Large vocabulary speaker-dependent speech recognition systems adjust to the acoustic peculiarities of each new speaker based on some enrolment data provided by this speaker. As the amount of data required increases with the sophistication of the underlying acoustic models, the enrolment may get lengthy. To streamline it, it is therefore desirable to make use of previously acquired speech data. The authors describe a data augmentation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker. This speaker-normalizing mapping is used to transform the previously acquired data of the reference speaker onto the space of the new speaker. The performance of the resulting procedure, dubbed the metamorphic algorithm, is illustrated on an isolated utterance speech recognition task with a vocabulary of 20,000 words. Results show that the metamorphic algorithm can substantially reduce the word error rate when only a limited amount of enrolment data is available. Alternatively, it leads to a level of performance comparable to that obtained when a much greater amount of enrolment data is required from the new speaker. In addition, it can also be used for tracking spectral evolution over time, thus providing a possible means for robust speaker self-adaptation.
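The piecewise linear mapping at the heart of this strategy can be sketched in one dimension (an illustrative simplification, not the published algorithm, which operates on full feature vectors; the paired enrolment features and the split point below are synthetic): a separate least-squares line is fitted on each region of the reference speaker's feature axis, and reference data is then pushed through the fitted map into the new speaker's space.

```python
def fit_line(xs, ys):
    # Closed-form least-squares fit of y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def fit_piecewise(pairs, split):
    # Fit one line per region of the reference feature axis.
    lo = [(x, y) for x, y in pairs if x < split]
    hi = [(x, y) for x, y in pairs if x >= split]
    return {"split": split, "lo": fit_line(*zip(*lo)), "hi": fit_line(*zip(*hi))}

def map_feature(model, x):
    # Map one reference-speaker feature into the new speaker's space.
    a, b = model["lo"] if x < model["split"] else model["hi"]
    return a * x + b

# Synthetic (reference, new-speaker) feature pairs: slope 2 below 0, 0.5 above.
pairs = [(-2.0, -4.0), (-1.0, -2.0), (1.0, 0.5), (2.0, 1.0)]
model = fit_piecewise(pairs, split=0.0)
print(map_feature(model, -1.5))   # -3.0
print(map_feature(model, 1.5))    # 0.75
```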


IEEE Transactions on Speech and Audio Processing | 1993

Multonic Markov word models for large vocabulary continuous speech recognition

Lalit R. Bahl; Jerome R. Bellegarda; P. V. de Souza; Ponani S. Gopalakrishnan; David Nahamoo; Michael Picheny

A new class of hidden Markov models is proposed for the acoustic representation of words in an automatic speech recognition system. The models, built from combinations of acoustically based sub-word units called fenones, are derived automatically from one or more sample utterances of a word. Because they are more flexible than previously reported fenone-based word models, they lead to an improved capability of modeling variations in pronunciation. They are therefore particularly useful in the recognition of continuous speech. In addition, their construction is relatively simple, because it can be done using the well-known forward-backward algorithm for parameter estimation of hidden Markov models. Appropriate reestimation formulas are derived for this purpose. Experimental results obtained on a 5000-word vocabulary natural language continuous speech recognition task are presented to illustrate the enhanced power of discrimination of the new models.


International Conference on Acoustics, Speech, and Signal Processing | 1995

Experiments using data augmentation for speaker adaptation

Jerome R. Bellegarda; P. V. de Souza; David Nahamoo; Mukund Padmanabhan; Michael Picheny; Lalit R. Bahl

Speaker adaptation typically involves customizing some existing (reference) models in order to account for the characteristics of a new speaker. This work considers the slightly different paradigm of customizing some reference data for the purpose of populating the new speaker's space, and then using the resulting (augmented) data to derive the customized models. The data augmentation technique is based on the metamorphic algorithm first proposed in Bellegarda et al. [1992], assuming that a relatively modest amount of data (100 sentences) is available from each new speaker. This constraint requires that reference speakers be selected with some care. The performance of this method is illustrated on a portion of the Wall Street Journal task.


International Conference on Acoustics, Speech, and Signal Processing | 1987

Experiments with the Tangora 20,000 word speech recognizer

Amir Averbuch; Lalit R. Bahl; Raimo Bakis; Peter F. Brown; G. Daggett; Subhro Das; K. Davies; S. De Gennaro; P. V. de Souza; Edward A. Epstein; D. Fraleigh; Frederick Jelinek; Burn L. Lewis; Robert L. Mercer; J. Moorhead; Arthur Nádas; David Nahamoo; Michael Picheny; G. Shichman; P. Spinelli; D. Van Compernolle; H. Wilkens

The Speech Recognition Group at IBM Research in Yorktown Heights has developed a real-time, isolated-utterance speech recognizer for natural language based on the IBM Personal Computer AT and IBM Signal Processors. The system has recently been enhanced by expanding the vocabulary from 5,000 words to 20,000 words and by the addition of a speech workstation to support usability studies on document creation by voice. The system supports spelling and interactive personalization to augment the vocabularies. This paper describes the implementation, user interface, and comparative performance of the recognizer.


International Conference on Acoustics, Speech, and Signal Processing | 1994

Robust methods for using context-dependent features and models in a continuous speech recognizer

Lalit R. Bahl; P. V. de Souza; Ponani S. Gopalakrishnan; David Nahamoo; Michael Picheny

In this paper we describe the method we use to derive acoustic features that reflect some of the dynamics of frame-based parameter vectors. Models for such observations must be context dependent. Such models were outlined in an earlier paper. Here we describe a method for using these models in a recognition system. The method is more robust than using continuous parameter models in recognition. At the same time it does not suffer from the possible information loss in vector quantization based systems.
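One common way to capture the dynamics of frame-based parameter vectors is to append a first-difference ("delta") vector to each frame; the sketch below illustrates that general idea on made-up 2-D frames and is not a reproduction of the paper's feature derivation.

```python
def add_deltas(frames):
    # Append, to each frame, a symmetric first difference of its neighbours
    # (edge frames reuse themselves as the missing neighbour).
    out = []
    for t, frame in enumerate(frames):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, len(frames) - 1)]
        delta = [(n - p) / 2.0 for p, n in zip(prev, nxt)]
        out.append(list(frame) + delta)
    return out

frames = [[1.0, 0.0], [2.0, 1.0], [4.0, 3.0]]
print(add_deltas(frames))
```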
