
Publication


Featured research published by Joel Praveen Pinto.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Analysis of MLP-Based Hierarchical Phoneme Posterior Probability Estimator

Joel Praveen Pinto; Sivaram Garimella; Mathew Magimai-Doss; Hynek Hermansky

We analyze a simple hierarchical architecture consisting of two multilayer perceptron (MLP) classifiers in tandem to estimate the phonetic class conditional probabilities. In this hierarchical setup, the first MLP classifier is trained using standard acoustic features. The second MLP is trained using the posterior probabilities of phonemes estimated by the first, but with a long temporal context of around 150-230 ms. Through extensive phoneme recognition experiments, and the analysis of the trained second MLP using Volterra series, we show that 1) the hierarchical system yields higher phoneme recognition accuracies than the conventional single MLP-based system, with an absolute improvement of 3.5% and 9.3% on TIMIT and CTS respectively, 2) there exists useful information in the temporal trajectories of the posterior feature space, spanning around 230 ms of context, 3) the second MLP learns the phonetic temporal patterns in the posterior features, which include the phonetic confusions at the output of the first MLP as well as the phonotactics of the language as observed in the training data, and 4) the second MLP classifier requires fewer parameters and can be trained using less training data.
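The input to the second MLP described in this abstract can be illustrated with a minimal sketch. It assumes a 10 ms frame shift (not stated in the abstract), under which a window of 23 frames covers roughly 230 ms. The function name `stack_context` and the toy posterior values are hypothetical and only show how neighbouring posterior vectors might be concatenated into one long input vector.

```python
def stack_context(posteriors, half_width):
    """Concatenate each frame with its +/- half_width neighbours.

    posteriors: list of per-frame posterior vectors (lists of floats).
    Frames beyond the utterance edges are filled by repeating the
    first or last frame (an assumed padding choice).
    """
    n = len(posteriors)
    stacked = []
    for t in range(n):
        window = []
        for offset in range(-half_width, half_width + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp at the edges
            window.extend(posteriors[idx])
        stacked.append(window)
    return stacked


# Toy example: 5 frames, 3 phoneme classes (made-up values).
posteriors = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]

# half_width=1 gives a 3-frame window; half_width=11 would give the
# 23-frame (about 230 ms at an assumed 10 ms shift) window.
stacked = stack_context(posteriors, half_width=1)
```

Each stacked vector would then serve as one training example for the second classifier, which is how it can see the temporal trajectory of the first MLP's outputs.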


International Conference on Acoustics, Speech, and Signal Processing | 2008

Exploiting contextual information for improved phoneme recognition

Joel Praveen Pinto; B. Yegnanarayana; Hynek Hermansky; Mathew Magimai-Doss

In this paper, we investigate the significance of contextual information in a phoneme recognition system using the hidden Markov model/artificial neural network paradigm. Contextual information is probed at the feature level as well as at the output of the multilayered perceptron. At the feature level, we analyze and compare different methods to model sub-phonemic classes. To exploit the contextual information at the output of the multilayered perceptron, we propose the hierarchical estimation of phoneme posterior probabilities. The best phoneme (excluding silence) recognition accuracy of 73.4% on the TIMIT database is comparable to that of the state-of-the-art systems, but more emphasis is on analysis of the contextual information.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Posterior features for template-based ASR

Serena Soldo; Mathew Magimai.-Doss; Joel Praveen Pinto

This paper investigates the use of phoneme class conditional probabilities as features (posterior features) for template-based ASR. Using 75-word and 600-word task-independent and speaker-independent setups on the Phonebook database, we investigate the use of different posterior distribution estimators, different distance measures that are better suited for posterior distributions, and different training data. The reported experiments clearly demonstrate that posterior features are always superior to, and generalize better than, other classical acoustic features (at the cost of training a posterior distribution estimator).
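A rough sketch of template matching on posterior features follows. The abstract does not name the distance measures it compares, so the symmetric Kullback-Leibler divergence used here is an assumption, chosen only because it is a common way to compare probability distributions. Dynamic time warping (DTW) is likewise assumed as the alignment method, and all function names and toy templates are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """KL divergence between two discrete distributions (eps avoids log(0))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl(p, q):
    """Average of KL(p||q) and KL(q||p), an assumed local distance."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def dtw_distance(template, test, local_dist=symmetric_kl):
    """Accumulated DTW cost between two sequences of posterior vectors."""
    n, m = len(template), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(template[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(test, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(templates[label], test))


# Toy templates over 3 phoneme classes (made-up values).
templates = {
    "word_a": [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
    "word_b": [[0.05, 0.05, 0.9], [0.1, 0.8, 0.1]],
}
# A time-stretched version of word_a: the first frame is repeated.
test_utterance = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]
best = classify(test_utterance, templates)
```

Swapping `local_dist` for another measure is a one-argument change, which is the kind of comparison the abstract describes.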


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

MLP based hierarchical system for task adaptation in ASR

Joel Praveen Pinto; Mathew Magimai-Doss

We investigate a multilayer perceptron (MLP) based hierarchical approach for task adaptation in automatic speech recognition. The system consists of two MLP classifiers in tandem. A well-trained MLP available off-the-shelf is used at the first stage of the hierarchy. A second MLP is trained on the posterior features estimated by the first, but with a long temporal context of around 130 ms. By using an MLP trained on 232 hours of conversational telephone speech, the hierarchical adaptation approach yields a word error rate of 1.8% on the 600-word Phonebook isolated word recognition task. This compares favorably to the error rate of 4% obtained by the conventional single MLP based system trained with the same amount of Phonebook data that is used for adaptation. The proposed adaptation scheme also benefits from the ability of the second MLP to model the temporal information in the posterior features.


International Conference on Acoustics, Speech, and Signal Processing | 2009

Volterra series for analyzing MLP based phoneme posterior estimator

Joel Praveen Pinto; Garimella S. V. S. Sivaram; Hynek Hermansky; Mathew Magimai-Doss

We present a framework to apply Volterra series to analyze multilayered perceptrons trained to estimate the posterior probabilities of phonemes in automatic speech recognition. The identified Volterra kernels reveal the spectro-temporal patterns that are learned by the trained system for each phoneme. To demonstrate the applicability of Volterra series, we take a multilayered perceptron trained using Mel filter bank energy features and analyze its first-order Volterra kernels.
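As an illustration of what a first-order kernel can look like, the sketch below takes the linear term of a Taylor expansion of a one-hidden-layer MLP around zero input. Whether this matches the identification procedure used in the paper is an assumption; the network shape, the sigmoid hidden layer, the use of pre-softmax outputs, and all weights are hypothetical toy choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, W1, b1, W2, b2):
    """Pre-softmax outputs of a one-hidden-layer MLP with sigmoid hidden units."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]


def first_order_kernel(W1, b1, W2):
    """Linear term around x = 0: W2 * diag(sigmoid'(b1)) * W1.

    Row k is the linear pattern over the input that most directly
    drives output k (one row per phoneme class in the assumed setup).
    """
    slopes = [sigmoid(b) * (1.0 - sigmoid(b)) for b in b1]
    n_hidden, n_in = len(W1), len(W1[0])
    return [[sum(W2[k][j] * slopes[j] * W1[j][i] for j in range(n_hidden))
             for i in range(n_in)]
            for k in range(len(W2))]


# Toy network: 2 inputs, 3 hidden units, 2 outputs (made-up weights).
W1 = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
b1 = [0.1, -0.2, 0.0]
W2 = [[1.0, -1.0, 0.5], [0.3, 0.7, -0.4]]
b2 = [0.0, 0.1]

kernel = first_order_kernel(W1, b1, W2)
```

With stacked filter bank inputs, each row of such a kernel could be reshaped into a frames-by-bands grid and viewed as a spectro-temporal pattern for the corresponding output.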


Text, Speech and Dialogue | 2008

Reverse Correlation for Analyzing MLP Posterior Features in ASR

Joel Praveen Pinto; Garimella S. V. S. Sivaram; Hynek Hermansky

In this work, we investigate the reverse correlation technique for analyzing posterior feature extraction using a multilayered perceptron trained on multi-resolution RASTA (MRASTA) features. The filter bank in MRASTA feature extraction is motivated by human auditory modeling. The MLP is trained based on an error criterion and is purely data driven. In this work, we analyze the functionality of the combined system using reverse correlation analysis.


International ACM SIGIR Conference on Research and Development in Information Retrieval | 2008

Fast Approximate Spoken Term Detection from Sequence of Phonemes

Joel Praveen Pinto; Igor Szöke; S. R. Mahadeva Prasanna; Hynek Hermansky


Conference of the International Speech Communication Association | 2007

Exploiting Phoneme Similarities in Hybrid HMM-ANN Keyword Spotting

Joel Praveen Pinto; Andrew Lovitt; Hynek Hermansky


Conference of the International Speech Communication Association | 2008

Combining Evidence from a Generative and a Discriminative Model in Phoneme Recognition

Joel Praveen Pinto; Hynek Hermansky


Archive | 2007

On Confusions in a Phoneme Recognizer

Andrew Lovitt; Joel Praveen Pinto; Hynek Hermansky

Collaboration


Dive into Joel Praveen Pinto's collaborations.

Top Co-Authors


Mathew Magimai.-Doss

École Polytechnique Fédérale de Lausanne


B. Yegnanarayana

International Institute of Information Technology


S. R. Mahadeva Prasanna

Indian Institute of Technology Guwahati


Serena Soldo

Idiap Research Institute


Andrew Lovitt

Idiap Research Institute


Hervé Bourlard

École Polytechnique Fédérale de Lausanne
