Publications


Featured research published by Ponani S. Gopalakrishnan.


IEEE Transactions on Information Theory | 1991

An inequality for rational functions with applications to some statistical estimation problems

Ponani S. Gopalakrishnan; Dimitri Kanevsky; Arthur Nádas; David Nahamoo

The well-known Baum-Eagon inequality (1967) provides an effective iterative scheme for finding a local maximum for homogeneous polynomials with positive coefficients over a domain of probability values. However, in many applications the goal is to maximize a general rational function. In view of this, the Baum-Eagon inequality is extended to rational functions. Some of the applications of this inequality to statistical estimation problems are briefly described.
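The growth transform behind this line of work can be sketched as follows; the auxiliary-polynomial construction below is a paraphrase of the general idea, not a quotation of the paper's exact statement:

```latex
% Baum-Eagon growth transform: for a polynomial P(x) with nonnegative
% coefficients over probability vectors x = (x_{ij}), \sum_j x_{ij} = 1,
% the reestimate
\bar{x}_{ij} = \frac{x_{ij}\,\partial P/\partial x_{ij}}
                    {\sum_{j'} x_{ij'}\,\partial P/\partial x_{ij'}}
% satisfies P(\bar{x}) \ge P(x).
% For a rational objective R(x) = N(x)/D(x), one can instead apply the
% transform to the auxiliary polynomial
S_x(y) = N(y) - R(x)\,D(y) + C ,
% with the constant C chosen large enough that all coefficients are
% nonnegative; increasing S_x at y = x then increases R.
```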


international conference on acoustics, speech, and signal processing | 1998

Clustering via the Bayesian information criterion with applications in speech recognition

Scott Shaobing Chen; Ponani S. Gopalakrishnan

One difficult problem we are often faced with in clustering analysis is how to choose the number of clusters. We propose to choose the number of clusters by optimizing the Bayesian information criterion (BIC), a model selection criterion in the statistics literature. We develop a termination criterion for the hierarchical clustering methods which optimizes the BIC criterion in a greedy fashion. The resulting algorithms are fully automatic. Our experiments on Gaussian mixture modeling and speaker clustering demonstrate that the BIC criterion is able to choose the number of clusters according to the intrinsic complexity present in the data.
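A minimal sketch of the idea, on synthetic 1-D data (the clustering routine, data, and parameter count are illustrative assumptions, not the paper's implementation): fit a hard-assignment Gaussian mixture for each candidate number of clusters and keep the one with the lowest BIC.

```python
import numpy as np

def kmeans_1d(x, k, iters=50):
    """Plain 1-D k-means with deterministic quantile initialization."""
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            pts = x[assign == j]
            if len(pts) > 0:
                centers[j] = pts.mean()
    return centers, assign

def gmm_bic(x, k):
    """BIC = -2 log L + (#params) log n for a k-component Gaussian mixture
    fit by hard assignment (a stand-in for a full EM fit)."""
    n = len(x)
    _, assign = kmeans_1d(x, k)
    logdens = np.full((n, k), -np.inf)
    for j in range(k):
        pts = x[assign == j]
        if len(pts) == 0:
            continue
        w, mu = len(pts) / n, pts.mean()
        var = max(pts.var(), 1e-6)
        logdens[:, j] = (np.log(w)
                         - 0.5 * np.log(2 * np.pi * var)
                         - 0.5 * (x - mu) ** 2 / var)
    loglik = np.logaddexp.reduce(logdens, axis=1).sum()
    n_params = 3 * k - 1          # k means, k variances, k-1 weights
    return -2.0 * loglik + n_params * np.log(n)

# Three well-separated clusters; BIC should recover k = 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(m, 0.5, 100) for m in (0.0, 10.0, 20.0)])
bics = {k: gmm_bic(x, k) for k in range(1, 7)}
best_k = min(bics, key=bics.get)
```

The penalty term `n_params * log(n)` is what stops the likelihood from always preferring more clusters, which is exactly the model-selection role BIC plays in the abstract.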


international conference on acoustics, speech, and signal processing | 1995

Performance of the IBM large vocabulary continuous speech recognition system on the ARPA Wall Street Journal task

Lalit R. Bahl; S. Balakrishnan-Aiyer; J.R. Bellegarda; Martin Franz; Ponani S. Gopalakrishnan; David Nahamoo; Miroslav Novak; Mukund Padmanabhan; Michael Picheny; Salim Roukos

In this paper we discuss various experimental results using our continuous speech recognition system on the Wall Street Journal task. Experiments with different feature extraction methods, varying amounts and type of training data, and different vocabulary sizes are reported.


international conference on acoustics, speech, and signal processing | 1991

Decision trees for phonological rules in continuous speech

Lalit R. Bahl; Peter Vincent Desouza; Ponani S. Gopalakrishnan; David Nahamoo; Michael Picheny

The authors present an automatic method for modeling phonological variation using decision trees. For each phone they construct a decision tree that specifies the acoustic realization of the phone as a function of the context in which it appears. Several thousand sentences from a natural language corpus spoken by several speakers are used to construct these decision trees. Experimental results on a 5000-word vocabulary natural language speech recognition task are presented.
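The core step of such tree building is picking the context question that best purifies the realization labels. A toy sketch of that single step, with invented data (a /t/ that flaps between vowels) and invented question names, not the paper's phone set or question inventory:

```python
import math
from collections import Counter

# Hypothetical contexts (left phone class, right phone class) -> realization of /t/.
data = [
    (("V", "V"), "flap"), (("V", "V"), "flap"), (("V", "V"), "flap"),
    (("V", "C"), "t"),    (("C", "V"), "t"),    (("C", "C"), "t"),
    (("V", "C"), "t"),    (("C", "V"), "t"),
]

VOWELS = {"V"}

# Candidate yes/no questions about the context, as in context-dependent trees.
questions = {
    "left is vowel":  lambda ctx: ctx[0] in VOWELS,
    "right is vowel": lambda ctx: ctx[1] in VOWELS,
    "both vowels":    lambda ctx: ctx[0] in VOWELS and ctx[1] in VOWELS,
}

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def split_entropy(question):
    """Weighted entropy of the realizations after asking one question."""
    yes = [lab for ctx, lab in data if question(ctx)]
    no = [lab for ctx, lab in data if not question(ctx)]
    n = len(data)
    return len(yes) / n * entropy(yes) + len(no) / n * entropy(no)

# Greedily pick the most informative question; a full tree repeats this
# recursively on each branch.
best = min(questions, key=lambda q: split_entropy(questions[q]))
```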


IEEE Transactions on Speech and Audio Processing | 1993

Multonic Markov word models for large vocabulary continuous speech recognition

Lalit R. Bahl; Jerome R. Bellegarda; P. V. de Souza; Ponani S. Gopalakrishnan; David Nahamoo; Michael Picheny

A new class of hidden Markov models is proposed for the acoustic representation of words in an automatic speech recognition system. The models, built from combinations of acoustically based sub-word units called fenones, are derived automatically from one or more sample utterances of a word. Because they are more flexible than previously reported fenone-based word models, they lead to an improved capability of modeling variations in pronunciation. They are therefore particularly useful in the recognition of continuous speech. In addition, their construction is relatively simple, because it can be done using the well-known forward-backward algorithm for parameter estimation of hidden Markov models. Appropriate reestimation formulas are derived for this purpose. Experimental results obtained on a 5000-word vocabulary natural language continuous speech recognition task are presented to illustrate the enhanced power of discrimination of the new models.
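The forward recursion underlying the forward-backward training mentioned above can be sketched on a toy discrete-output HMM (the 2-state model below is invented for illustration, not the paper's fenonic word models):

```python
import numpy as np

A = np.array([[0.7, 0.3],        # state transition probabilities A[i, j] = P(j | i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # emission probabilities over 2 output labels
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

def forward_likelihood(obs):
    """P(obs | model) via the forward recursion over alpha_t(j)."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

p = forward_likelihood([0, 0, 1])
```

The backward pass and the reestimation formulas build on the same `alpha` quantities; summing the likelihood over all possible observation sequences of a given length returns 1, a useful sanity check on the recursion.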


Journal of the Acoustical Society of America | 1995

Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models

Lalit R. Bahl; Peter V. De Souza; Ponani S. Gopalakrishnan; Michael Picheny

A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
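The scoring chain in the claim can be sketched very loosely as follows; every number, prototype, and model below is invented for illustration, and this is a drastic simplification of the apparatus described:

```python
import numpy as np

# Invented prototype vectors for the acoustic space.
prototypes = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])

def prototype_match_scores(feat):
    """Closeness of a feature vector to each prototype (negative squared distance)."""
    return -np.sum((prototypes - feat) ** 2, axis=1)

# Two alternative models representing the SAME speech transition: each
# assigns an output probability to each prototype index (invented values).
model_output_probs = [np.array([0.7, 0.2, 0.1]),
                      np.array([0.1, 0.3, 0.6])]

feat = np.array([0.2, 0.1])                 # one feature vector signal
scores = prototype_match_scores(feat)
best_proto = int(np.argmax(scores))         # best-matching prototype

# Model match score: output probability of the matched prototype under
# each model; the transition match score is the best over those models.
model_scores = [float(p[best_proto]) for p in model_output_probs]
transition_score = max(model_scores)
```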


IEEE Transactions on Speech and Audio Processing | 1993

A fast approximate acoustic match for large vocabulary speech recognition

Lalit R. Bahl; S.V. De Gennaro; Ponani S. Gopalakrishnan; Robert L. Mercer

In a large vocabulary speech recognition system using hidden Markov models, calculating the likelihood of an acoustic signal segment for all the words in the vocabulary involves a large amount of computation. In order to run in real time on a modest amount of hardware, it is important that these detailed acoustic likelihood computations be performed only on words which have a reasonable probability of being the word that was spoken. The authors describe a scheme for rapidly obtaining an approximate acoustic match for all the words in the vocabulary in such a way as to ensure that the correct word is, with high probability, one of a small number of words examined in detail. Using fast search methods, they obtain a matching algorithm that is about a hundred times faster than doing a detailed acoustic likelihood computation on all the words in the IBM Office Correspondence isolated word dictation task, which has a vocabulary of 20,000 words. Experimental results showing the effectiveness of such a fast match for a number of talkers are given.
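The two-pass structure can be sketched with synthetic scores (the score functions below are random stand-ins, not the paper's acoustic models): a cheap coarse score ranks the whole vocabulary, and only a short list gets the expensive detailed computation.

```python
import numpy as np

rng = np.random.default_rng(1)
V = 20000                              # vocabulary size, as in the paper's task
coarse = rng.normal(size=V)            # stand-in for cheap approximate match scores

def detailed_score(word_id):
    """Stand-in for the expensive per-word acoustic likelihood."""
    return coarse[word_id] + 0.01 * rng.normal()

# Examine only the 100 best coarse matches in detail: 100 detailed
# evaluations instead of 20000.
shortlist = np.argsort(coarse)[-100:]
best = max(shortlist, key=detailed_score)
```

The scheme works when the coarse score is an upper bound (or a reliable proxy) for the detailed one, so the correct word lands in the shortlist with high probability.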


international conference on acoustics, speech, and signal processing | 1994

Robust methods for using context-dependent features and models in a continuous speech recognizer

Lalit R. Bahl; P. V. de Souza; Ponani S. Gopalakrishnan; David Nahamoo; Michael Picheny

In this paper we describe the method we use to derive acoustic features that reflect some of the dynamics of frame-based parameter vectors. Models for such observations must be context dependent. Such models were outlined in an earlier paper. Here we describe a method for using these models in a recognition system. The method is more robust than using continuous parameter models in recognition. At the same time it does not suffer from the possible information loss in vector quantization based systems.


international conference on acoustics, speech, and signal processing | 1995

A tree search strategy for large-vocabulary continuous speech recognition

Ponani S. Gopalakrishnan; Lalit R. Bahl; Robert L. Mercer

We describe a tree search strategy, called the Envelope Search, which is a time-asynchronous search scheme that combines aspects of the A* heuristic search algorithm with those of the time-synchronous Viterbi search algorithm. This search technique is used in the large-vocabulary continuous speech recognition system developed at the IBM Research Center.
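A miniature best-first stack search in the A* spirit described above, on an invented four-word "lattice" with invented log-probabilities (this illustrates the admissible-heuristic mechanics only, not the actual Envelope Search):

```python
import heapq

# arcs maps a word to its (next_word, log_prob) successors.
arcs = {
    "<s>":  [("the", -0.4), ("a", -1.2)],
    "the":  [("cat", -0.5), ("cap", -1.0)],
    "a":    [("cat", -0.9), ("cap", -0.7)],
    "cat":  [("</s>", -0.3)],
    "cap":  [("</s>", -0.4)],
}
best_remaining = -0.3   # optimistic bound on any single remaining arc score

def stack_search():
    # Heap entries: (-(score + heuristic), score, path); heapq is a min-heap,
    # so negating pops the highest-estimate partial path first.
    heap = [(0.0, 0.0, ["<s>"])]
    while heap:
        _, score, path = heapq.heappop(heap)
        word = path[-1]
        if word == "</s>":
            return path, score          # first goal popped is optimal (A*)
        for nxt, lp in arcs[word]:
            remaining = 3 - len(path)   # complete paths here have 3 arcs
            h = best_remaining * max(remaining, 0)
            heapq.heappush(heap, (-(score + lp + h), score + lp, path + [nxt]))
    return None, float("-inf")

path, score = stack_search()
```

Because the heuristic never underestimates the remaining score, the first complete hypothesis popped from the stack is guaranteed best, which is what lets a time-asynchronous search terminate early.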


international conference on acoustics, speech, and signal processing | 1998

Compression of acoustic features for speech recognition in network environments

Ganesh N. Ramaswamy; Ponani S. Gopalakrishnan

In this paper, we describe a new compression algorithm for encoding acoustic features used in typical speech recognition systems. The proposed algorithm uses a combination of simple techniques, such as linear prediction and multi-stage vector quantization, and the current version of the algorithm encodes the acoustic features at a fixed rate of 4.0 kbit/s. The compression algorithm can be used very effectively for speech recognition in network environments, such as those employing a client-server model, or to reduce storage in general speech recognition applications. The algorithm has also been tuned for practical implementations, so that the computational complexity and memory requirements are modest. We have successfully tested the compression algorithm against many test sets from several different languages, and the algorithm performed very well, with no significant change in the recognition accuracy due to compression.
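The two named ingredients, linear prediction between frames and multi-stage vector quantization of the residual, can be sketched on synthetic "feature" frames (the codebooks below are random placeholders, not trained ones, and the predictor is a trivial order-1 identity predictor):

```python
import numpy as np

rng = np.random.default_rng(0)
frames = np.cumsum(rng.normal(scale=0.1, size=(50, 4)), axis=0)  # smooth features

# Two stage codebooks: stage 2 refines what stage 1 leaves behind.
cb1 = rng.normal(scale=0.2, size=(16, 4))
cb2 = rng.normal(scale=0.05, size=(16, 4))

def encode_decode(frames):
    """Closed-loop predictive coding: predict each frame from the previous
    RECONSTRUCTED frame, then quantize the residual in two VQ stages."""
    prev = np.zeros(frames.shape[1])
    out = []
    for f in frames:
        residual = f - prev                     # order-1 linear prediction
        i1 = int(np.argmin(np.sum((cb1 - residual) ** 2, axis=1)))
        r2 = residual - cb1[i1]                 # stage-1 quantization error
        i2 = int(np.argmin(np.sum((cb2 - r2) ** 2, axis=1)))
        recon = prev + cb1[i1] + cb2[i2]        # decoder mirrors the encoder
        prev = recon                            # closed loop: no error drift
        out.append(recon)                       # (i1, i2) would be transmitted
    return np.array(out)

recon = encode_decode(frames)
mse = float(np.mean((frames - recon) ** 2))
```

Transmitting only the two 4-bit indices per frame is what yields a fixed, low bit rate; predicting from the reconstructed (not the original) previous frame keeps encoder and decoder in sync.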
