
Publication


Featured research published by Jerome R. Bellegarda.


IEEE Transactions on Acoustics, Speech, and Signal Processing | 1990

Tied mixture continuous parameter modeling for speech recognition

Jerome R. Bellegarda; David Nahamoo

The acoustic-modeling problem in automatic speech recognition is examined with the goal of unifying discrete and continuous parameter approaches. To model a sequence of information-bearing acoustic feature vectors which has been extracted from the speech waveform via some appropriate front-end signal processing, a speech recognizer basically faces two alternatives: (1) assign a multivariate probability distribution directly to the stream of vectors, or (2) use a time-synchronous labeling acoustic processor to perform vector quantization on this stream, and assign a multinomial probability distribution to the output of the vector quantizer. With a few exceptions, these two methods have traditionally been given separate treatment. A class of very general hidden Markov models which can accommodate feature vector sequences lying either in a discrete or in a continuous space is considered; the new class allows one to represent the prototypes in an assumption-limited, yet convenient way, as tied mixtures of simple multivariate densities. Speech recognition experiments, reported for two (5000- and 20000-word vocabulary) office correspondence tasks, demonstrate some of the benefits associated with this technique.
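The tying idea in this abstract can be sketched as follows: all HMM states share a common pool of Gaussian prototypes, and each state keeps only its own mixture weights over that shared pool. The names, shapes, and toy parameters below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Diagonal-covariance multivariate Gaussian density."""
    norm = np.prod(2 * np.pi * var) ** -0.5
    return norm * np.exp(-0.5 * np.sum((x - mean) ** 2 / var))

def tied_mixture_likelihood(x, weights, means, variances):
    """p(x | state) = sum_k w_k N(x; mu_k, Sigma_k), prototypes shared."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Two states sharing the same 3 prototypes, differing only in weights.
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 2))         # shared prototype means
variances = np.ones((3, 2))             # shared (diagonal) variances
w_state_a = np.array([0.7, 0.2, 0.1])   # state-specific tying weights
w_state_b = np.array([0.1, 0.3, 0.6])

x = rng.normal(size=2)
pa = tied_mixture_likelihood(x, w_state_a, means, variances)
pb = tied_mixture_likelihood(x, w_state_b, means, variances)
```

Because the prototype densities are evaluated once and reused by every state, the tied scheme behaves like a discrete labeler when weights are near one-hot, and like a full continuous-density model otherwise.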


international conference on acoustics, speech, and signal processing | 1989

Large vocabulary natural language continuous speech recognition

Lalit R. Bahl; Raimo Bakis; Jerome R. Bellegarda; Peter F. Brown; D. Burshtein; Subhro Das; P. V. de Souza; Ponani S. Gopalakrishnan; Frederick Jelinek; Dimitri Kanevsky; Robert L. Mercer; Arthur Nádas; David Nahamoo; Michael Picheny

A description is presented of the authors' current research on automatic speech recognition of continuously read sentences from a naturally-occurring corpus: office correspondence. The recognition system combines features from their current isolated-word recognition system and from their previously developed continuous-speech recognition system. It consists of an acoustic processor, an acoustic channel model, a language model, and a linguistic decoder. Some new features in the recognizer relative to the isolated-word speech recognition system include the use of a fast match to prune rapidly to a manageable number the candidates considered by the detailed match, multiple pronunciations of all function words, and modeling of interphone coarticulatory behavior. The authors recorded training and test data from a set of ten male talkers. The perplexity of the test sentences was found to be 93; none of the sentences was part of the data used to generate the language model. Preliminary (speaker-dependent) recognition results on these talkers yielded an average word error rate of 11.0%.


IEEE Transactions on Speech and Audio Processing | 1994

The metamorphic algorithm: a speaker mapping approach to data augmentation

Jerome R. Bellegarda; P. V. de Souza; Arthur Nádas; David Nahamoo; Michael Picheny; Lalit R. Bahl

Large vocabulary speaker-dependent speech recognition systems adjust to the acoustic peculiarities of each new speaker based on some enrolment data provided by this speaker. As the amount of data required increases with the sophistication of the underlying acoustic models, the enrolment may get lengthy. To streamline it, it is therefore desirable to make use of previously acquired speech data. The authors describe a data augmentation strategy based on a piecewise linear mapping between the feature space of a new speaker and that of a reference speaker. This speaker-normalizing mapping is used to transform the previously acquired data of the reference speaker onto the space of the new speaker. The performance of the resulting procedure, dubbed the metamorphic algorithm, is illustrated on an isolated utterance speech recognition task with a vocabulary of 20000 words. Results show that the metamorphic algorithm can substantially reduce the word error rate when only a limited amount of enrolment data is available. Alternatively, it leads to a level of performance comparable to that obtained when a much greater amount of enrolment data is required from the new speaker. In addition, it can also be used for tracking spectral evolution over time, thus providing a possible means for robust speaker self-adaptation.
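The piecewise linear mapping described above can be sketched as partitioning the reference speaker's feature space into regions and fitting one affine transform per region from a small amount of paired enrolment data. This is an illustration of the general idea under simplifying assumptions (fixed region centroids, least-squares fits), not the paper's exact algorithm.

```python
import numpy as np

def fit_piecewise_map(ref, new, centroids):
    """For each region (nearest centroid), fit new ~= A @ ref + b."""
    maps = []
    labels = np.argmin(
        np.linalg.norm(ref[:, None, :] - centroids[None], axis=2), axis=1)
    for c in range(len(centroids)):
        r, n = ref[labels == c], new[labels == c]
        r1 = np.hstack([r, np.ones((len(r), 1))])   # affine: append bias
        W, *_ = np.linalg.lstsq(r1, n, rcond=None)  # per-region least squares
        maps.append(W)
    return maps

def apply_piecewise_map(x, centroids, maps):
    """Transform one reference vector into the new speaker's space."""
    c = np.argmin(np.linalg.norm(centroids - x, axis=1))
    return np.append(x, 1.0) @ maps[c]

rng = np.random.default_rng(1)
ref = rng.normal(size=(200, 2))                          # reference speaker data
new = ref @ np.array([[1.1, 0.0], [0.1, 0.9]]) + 0.5     # synthetic target space
centroids = np.array([[-1.0, 0.0], [1.0, 0.0]])
maps = fit_piecewise_map(ref, new, centroids)
y = apply_piecewise_map(ref[0], centroids, maps)
```

Once fitted, the maps let all of the reference speaker's archived data be projected into the new speaker's space, which is the augmentation step the abstract refers to.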


IEEE Transactions on Speech and Audio Processing | 1993

Multonic Markov word models for large vocabulary continuous speech recognition

Lalit R. Bahl; Jerome R. Bellegarda; P. V. de Souza; Ponani S. Gopalakrishnan; David Nahamoo; Michael Picheny

A new class of hidden Markov models is proposed for the acoustic representation of words in an automatic speech recognition system. The models, built from combinations of acoustically based sub-word units called fenones, are derived automatically from one or more sample utterances of a word. Because they are more flexible than previously reported fenone-based word models, they lead to an improved capability of modeling variations in pronunciation. They are therefore particularly useful in the recognition of continuous speech. In addition, their construction is relatively simple, because it can be done using the well-known forward-backward algorithm for parameter estimation of hidden Markov models. Appropriate reestimation formulas are derived for this purpose. Experimental results obtained on a 5000-word vocabulary natural language continuous speech recognition task are presented to illustrate the enhanced power of discrimination of the new models.
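The forward-backward algorithm mentioned above evaluates and reestimates HMM parameters; its forward pass alone already gives the total probability of an observation sequence. A minimal sketch on a toy discrete-output, left-to-right HMM (all parameters here are made up for illustration, standing in for the fenone-based models):

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Total probability of an observation sequence under an HMM.
    pi: initial state probs, A: transition matrix, B[state, symbol]: emissions."""
    alpha = pi * B[:, obs[0]]            # initialize with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate and absorb next symbol
    return alpha.sum()

pi = np.array([1.0, 0.0])
A = np.array([[0.6, 0.4],
              [0.0, 1.0]])              # left-to-right topology
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
p = forward_likelihood(pi, A, B, [0, 1, 1])
```

The backward pass is symmetric, and the product of forward and backward terms yields the posterior counts used in the reestimation formulas the paper derives.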


international conference on acoustics, speech, and signal processing | 1995

Experiments using data augmentation for speaker adaptation

Jerome R. Bellegarda; P. V. de Souza; David Nahamoo; Mukund Padmanabhan; Michael Picheny; Lalit R. Bahl

Speaker adaptation typically involves customizing some existing (reference) models in order to account for the characteristics of a new speaker. This work considers the slightly different paradigm of customizing some reference data for the purpose of populating the new speaker's space, and then using the resulting (augmented) data to derive the customized models. The data augmentation technique is based on the metamorphic algorithm first proposed in Bellegarda et al. [1992], assuming that a relatively modest amount of data (100 sentences) is available from each new speaker. This constraint requires that reference speakers be selected with some care. The performance of this method is illustrated on a portion of the Wall Street Journal task.


international conference on acoustics, speech, and signal processing | 1989

Tied mixture continuous parameter models for large vocabulary isolated speech recognition

Jerome R. Bellegarda; David Nahamoo

The acoustic modeling problem in automatic speech recognition is examined with the specific goal of unifying discrete and continuous parameter approaches. The authors consider a class of very general hidden Markov models which can accommodate sequences of information-bearing acoustic feature vectors lying either in a discrete or in a continuous space. More generally, the new class allows one to represent the prototypes in an assumption-limited, yet convenient, way, as (tied) mixtures of simple multivariate densities. Speech recognition experiments, reported for a large (5000-word) vocabulary office correspondence task, demonstrate some of the benefits associated with this technique.


international conference on acoustics, speech, and signal processing | 1993

On-line handwriting recognition using continuous parameter hidden Markov models

Krishna S. Nathan; Jerome R. Bellegarda; David Nahamoo; Eveline Jeannine Bellegarda

The problem of the automatic recognition of handwritten text is addressed. The text to be recognized is captured online, so the temporal sequence of the data is preserved. The approach is based on a left-to-right hidden Markov model (HMM) for each character that models the dynamics of the written script. A mixture of Gaussian distributions is used to represent the output probabilities at each arc of the HMM. Several strategies for reestimating the model parameters are discussed. Experiments show that this approach results in significant decreases in error rate for the recognition of discretely written characters compared with elastic matching techniques. The HMM outperforms the elastic matching technique for both writer-dependent and writer-independent recognition tasks.


IEEE Transactions on Aerospace and Electronic Systems | 1991

The hit array: an analysis formalism for multiple access frequency hop coding

Jerome R. Bellegarda; Edward L. Titlebaum

A formalism is presented for the analysis of general frequency hop waveforms, such as those suitable for use in coherent active radar and sonar echolocation systems as well as multiple-access spread-spectrum communications. This formalism is based on the concept of coincidence, or hit, between two frequency hopping patterns. The collection of all possible hits, together with their locations, is recorded in time-frequency space, which produces the hit array associated with the two patterns considered. If the code length is sufficiently small with respect to the time-bandwidth product chosen, the hit array can be viewed as a digital representation of the corresponding ambiguity function. Salient properties of the hit array formalism are derived, including simple relationships between hit arrays resulting from basic symmetry-preserving transformations. These properties make it possible to predict the performance of a given set of frequency hop waveforms directly from the associated set of frequency hopping patterns.
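The hit-counting idea can be sketched directly: for two hopping patterns, tally coincidences at every relative time shift and frequency offset. The indexing conventions and the toy pattern below are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def hit_array(a, b):
    """H[tau, f] counts slots where pattern b, shifted by tau time slots,
    lands on pattern a at frequency offset f. tau ranges over -(N-1)..N-1."""
    n = len(a)
    nf = max(max(a), max(b)) + 1
    H = np.zeros((2 * n - 1, 2 * nf - 1), dtype=int)
    for tau in range(-(n - 1), n):
        for k in range(n):
            j = k + tau
            if 0 <= j < n:
                df = a[k] - b[j]                 # frequency offset of this hit
                H[tau + n - 1, df + nf - 1] += 1
    return H

# A pattern compared with itself: the zero-shift, zero-offset bin
# holds N hits (the autocorrelation mainlobe).
pattern = [2, 4, 3, 1]   # toy hopping pattern
H = hit_array(pattern, pattern)
```

Summed over all bins, the array accounts for every overlapping slot pair, which is why it can stand in for a coarsely sampled ambiguity function when the time-bandwidth product is large enough.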


human language technology | 1994

The hub and spoke paradigm for CSR evaluation

Francis Kubala; Jerome R. Bellegarda; Jordan R. Cohen; David S. Pallett; Doug Paul; Michael S. Phillips; R. Rajasekaran; Fred Richardson; Michael Riley; Roni Rosenfeld; Bob Roth; Mitch Weintraub

In this paper, we introduce the new paradigm used in the most recent ARPA-sponsored Continuous Speech Recognition (CSR) evaluation and then discuss the important features of the test design. The 1993 CSR evaluation was organized in a novel fashion in an attempt to accommodate research over a broad variety of important problems in CSR while maintaining a clear program-wide research focus. Furthermore, each test component in the evaluation was designed as an experiment to extract as much information as possible from the results. The evaluation was centered around a large vocabulary speaker-independent (SI) baseline test, which was required of every participating site. This test was dubbed the Hub, since it was common to all sites and formed the basis for controlled inter-system comparisons. The Hub test was augmented with a variety of problem-specific optional tests designed to explore a range of important problems in CSR, mostly involving some kind of mismatch between the training and test conditions. These tests were known as the Spokes, since they all could be informatively compared to the Hub but were otherwise independent. In the first trial of this evaluation paradigm, in November 1993, 11 research groups participated, yielding a rich array of comparative and contrastive results, all calibrated to the current state of the art in large vocabulary CSR.


IEEE Transactions on Aerospace and Electronic Systems | 1988

Time-frequency hop codes based upon extended quadratic congruences

Jerome R. Bellegarda; E.L. Titlebaum

Time-frequency hop codes are developed that can be used for coherent multiuser echolocation and asynchronous spread spectrum communication systems. They represent a compromise between Costas codes, which have nearly ideal autoambiguity but not so good cross-ambiguity properties, and linear congruential codes, which have nearly ideal cross-ambiguity but unattractive autoambiguity properties. Extended quadratic congruential code words are shown to have reasonably good autoambiguity and cross-ambiguity properties across the whole class of code sets considered. A uniform upper bound is placed on the entire cross-ambiguity function surface, and bounds are placed on the position and amplitude of spurious peaks in the autoambiguity function. These bounds depend on the time/bandwidth product and code length exclusively and lead naturally to a discussion of the design tradeoffs for these two parameters. Examples of typical autoambiguity and cross-ambiguity functions are given to illustrate the performance of the new codes.
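The congruential family the abstract refers to can be illustrated with the standard quadratic congruence placement y(k) = a·k(k+1)/2 mod p over a prime p. Note this is a simplified member of the family shown only for illustration; the paper's *extended* quadratic congruence construction differs in detail.

```python
def quadratic_congruence_pattern(a, p):
    """Frequency slot for each time slot k, over a prime p.
    k*(k+1) is always even, so the integer division by 2 is exact."""
    return [(a * k * (k + 1) // 2) % p for k in range(p)]

def max_cross_hits(x, y, p):
    """Worst-case coincidence count over all cyclic time shifts:
    a crude proxy for the peak of the cross-ambiguity surface."""
    worst = 0
    for tau in range(p):
        hits = sum(x[k] == y[(k + tau) % p] for k in range(p))
        worst = max(worst, hits)
    return worst

p = 7
codes = [quadratic_congruence_pattern(a, p) for a in range(1, p)]
```

For distinct multipliers a, the difference of two such patterns is a nondegenerate quadratic mod p, so any cyclic shift coincides in at most two slots, which is the kind of uniform cross-ambiguity bound the abstract describes.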
