Publications


Featured research published by Néstor Becerra Yoma.


IEEE Transactions on Speech and Audio Processing | 2002

Speaker verification in noise using a stochastic version of the weighted Viterbi algorithm

Néstor Becerra Yoma; Miguel Villar

This paper proposes the replacement of the ordinary output probability with its expected value if the addition of noise is modeled as a stochastic process, which in turn is merged with the hidden Markov model (HMM) in the Viterbi algorithm. This new output probability is analytically derived for the generic case of a mixture of Gaussians and can be seen as the definition of a stochastic version of the weighted Viterbi algorithm. Moreover, an analytical expression to estimate the uncertainty in noise canceling is also presented. The method is applied in combination with spectral subtraction to improve the robustness to additive noise of a text-dependent speaker verification system. Reductions as high as 30% or 40% in the error rates and improvements of 50% in the stability of the decision thresholds are reported.


Pattern Recognition Letters | 2008

Confidence based multiple classifier fusion in speaker verification

Fernando Huenupán; Néstor Becerra Yoma; Carlos Molina; Claudio Garretón

A novel framework that applies Bayes-based confidence measure for multiple classifier system fusion is proposed. Compared with ordinary Bayesian fusion, the presented approach can lead to reductions as high as 37% and 35% in EER and ROC curve area, respectively, in speaker verification.
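Confidence-weighted fusion of several classifiers can be sketched as below. This is a hypothetical simplification, not the paper's exact Bayes-based measure: each classifier's score is weighted by a confidence in [0, 1], and with all confidences equal the scheme reduces to ordinary equal-weight sum fusion.

```python
def fuse_confidence_weighted(scores, confidences):
    """Combine per-classifier log-likelihood-ratio scores, weighting
    each classifier by a confidence in [0, 1]. With all confidences
    equal this reduces to ordinary (equal-weight) sum fusion."""
    assert len(scores) == len(confidences)
    total = sum(confidences)
    if total == 0:
        return 0.0
    return sum(c * s for c, s in zip(confidences, scores)) / total
```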


Computer Speech & Language | 2014

Shape-based modeling of the fundamental frequency contour for emotion detection in speech

Juan Pablo Arias; Carlos Busso; Néstor Becerra Yoma

This paper proposes the use of neutral reference models to detect local emotional prominence in the fundamental frequency. A novel approach based on functional data analysis (FDA) is presented, which aims to capture the intrinsic variability of F0 contours. The neutral models are represented by a basis of functions and the testing F0 contour is characterized by the projections onto that basis. For a given F0 contour, we estimate the functional principal component analysis (PCA) projections, which are used as features for emotion detection. The approach is evaluated with lexicon-dependent (i.e., one functional PCA basis per sentence) and lexicon-independent (i.e., a single functional PCA basis across sentences) models. The experimental results show that the proposed system can lead to accuracies as high as 75.8% in binary emotion classification, which is 6.2% higher than the accuracy achieved by a benchmark system trained with global F0 statistics. The approach can be implemented at sub-sentence level (e.g., 0.5s segments), facilitating the detection of localized emotional information conveyed within the sentence. The approach is validated with the SEMAINE database, which is a spontaneous corpus. The results indicate that the proposed scheme can be effectively employed in real applications to detect emotional speech.
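The projection step can be sketched as follows. As a simplification, a fixed cosine basis stands in for the functional-PCA basis that the paper learns from neutral speech; the feature vector is the set of projection coefficients of the (mean-removed) F0 contour onto the first few basis functions.

```python
import math

def contour_features(f0, n_proj=3):
    """Project a voiced-frame F0 contour onto a small set of smooth
    basis functions and return the projection coefficients as features.
    A cosine basis stands in here for the functional-PCA basis learned
    from neutral speech in the paper (an assumption of this sketch)."""
    T = len(f0)
    mean = sum(f0) / T
    centered = [v - mean for v in f0]
    feats = []
    for k in range(1, n_proj + 1):
        basis = [math.cos(math.pi * k * (t + 0.5) / T) for t in range(T)]
        norm = math.sqrt(sum(b * b for b in basis))
        feats.append(sum(c * b / norm for c, b in zip(centered, basis)))
    return feats
```

A flat (neutral-like) contour projects to zero on every basis function, while local emotional prominence shows up as large coefficients.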


Speech Communication | 2009

ASR based pronunciation evaluation with automatically generated competing vocabulary and classifier fusion

Carlos Molina; Néstor Becerra Yoma; Jorge Wuth; Hiram Vivanco

In this paper, the application of automatic speech recognition (ASR) technology to CAPT (computer-aided pronunciation training) is addressed. A method is presented to automatically generate the competing lexicon required by an ASR engine to compare the pronunciation of a target word with its correct and incorrect phonetic realizations. To enable the efficient deployment of CAPT applications, the generation of this competing lexicon requires neither human assistance nor a priori information about mother-tongue-dependent errors. The method leads to averaged subjective-objective score correlations of 0.82 and 0.75, depending on the task. Index terms: second language learning, computer-aided pronunciation training, speech recognition, competing vocabulary.
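One way such a comparison can be scored is sketched below: the ASR log-likelihood of the target pronunciation is normalized against those of the competing entries, giving a pronunciation score in (0, 1). The function and its normalization are this sketch's illustration, not the paper's scoring formula.

```python
import math

def pronunciation_score(target_loglik, competing_logliks):
    """Map the ASR log-likelihood of the target (correct) pronunciation
    and those of automatically generated competing entries to a score
    in (0, 1): the posterior of the target under a flat prior. Values
    near 1 suggest the utterance matches the correct pronunciation."""
    logs = [target_loglik] + list(competing_logliks)
    m = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logs]
    return exps[0] / sum(exps)
```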


Speech Communication | 2010

Automatic intonation assessment for computer aided language learning

Juan Pablo Arias; Néstor Becerra Yoma; Hiram Vivanco

In this paper, the nature and relevance of the information provided by intonation is discussed in the framework of second language learning. On that basis, an automatic intonation assessment system for second language learning is proposed, based on a top-down scheme. A stress assessment system is also presented by combining intonation and energy contour estimation. The utterance pronounced by the student is directly compared with a reference one: the trend similarity of the intonation and energy contours is compared frame by frame using DTW alignment. Moreover, the robustness of the alignment provided by the DTW algorithm to microphone, speaker and pronunciation-quality mismatch is addressed. The intonation assessment system gives an averaged subjective-objective score correlation as high as 0.88. The stress assessment system gives an EER equal to 21.5%, which is similar to the error observed in phonetic quality evaluation schemes. These results suggest that the proposed systems could be employed in real applications. Finally, the schemes presented here are text- and language-independent, since neither the text transcription nor the language of the reference utterance is required.
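The DTW alignment at the core of the comparison can be sketched as the classic dynamic-programming recursion below, here with a simple absolute-difference frame cost (the paper's exact local distance and trend-similarity measure are not reproduced):

```python
def dtw_distance(x, y):
    """Alignment cost between two contours (e.g. F0 or energy) via
    dynamic time warping; smaller means more similar. Frames may be
    stretched or compressed to absorb speaking-rate differences."""
    INF = float("inf")
    n, m = len(x), len(y)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Note that a contour aligned against a time-stretched copy of itself still scores zero, which is what makes the comparison robust to tempo differences between student and reference.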


IEEE Transactions on Speech and Audio Processing | 2001

On including temporal constraints in Viterbi alignment for speech recognition in noise

Néstor Becerra Yoma; Fergus R. McInnes; Mervyn A. Jack; Sandra Dotto Stump; Lee Luan Ling

This paper addresses the problem of temporal constraints in the Viterbi algorithm using conditional transition probabilities. The results presented here suggest that, in a speaker-dependent small-vocabulary task, the statistical modelling of state durations is not relevant if maximum and minimum state duration restrictions are imposed, and that truncated probability densities give better results than a metric previously proposed [1]. Finally, context-dependent and context-independent temporal restrictions are compared in a connected-word speech recognition task, and it is shown that the former leads to better results with the same computational load.


IEEE Signal Processing Letters | 2005

Bayes-based confidence measure in speech recognition

Néstor Becerra Yoma; Jorge Carrasco; Carlos Molina

In this letter, a Bayes-based confidence measure (BBCM) for speech recognition is proposed. BBCM is applicable to any standard word feature and makes use of information about the performance of the speech recognition engine. In contrast to ordinary confidence measures, BBCM is a probability, which is of practical and theoretical interest in itself. When applied with the word density confidence measure (WDCM), BBCM dramatically improves the discrimination ability of the false-acceptance curve compared to WDCM alone.
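The Bayes step can be sketched as follows: a raw confidence feature is turned into a genuine posterior probability of correct recognition by combining its class-conditional likelihoods with the recognizer's prior word accuracy. Names and signature are this sketch's, not the letter's.

```python
def bbcm(lik_correct, lik_wrong, p_correct):
    """Bayes rule turning a raw word-confidence feature into a posterior
    probability that the word was correctly recognized. p_correct is
    the recognizer's prior word accuracy; lik_correct / lik_wrong are
    the feature densities under correct / incorrect recognition."""
    num = lik_correct * p_correct
    den = num + lik_wrong * (1.0 - p_correct)
    return num / den if den > 0 else p_correct
```

An uninformative feature (equal likelihoods under both hypotheses) simply returns the prior accuracy, which is the sensible fallback.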


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Modeling, estimating, and compensating low-bit rate coding distortion in speech recognition

Néstor Becerra Yoma; Carlos Molina; Jorge F. Silva; Carlos Busso

A solution to the problem of speech recognition with signals distorted by low-bit-rate coders is presented in this paper. A model for the coding-decoding distortion, an HMM compensation method to incorporate this model, and an EM-based adaptation algorithm to estimate the distortion are proposed. Medium-vocabulary continuous-speech speaker-independent recognition experiments with the 8 kbps G.729 (CS-CELP), 13 kbps RPE-LTP (GSM), 5.3 kbps G.723.1, 4.8 kbps FS-1016 and 32 kbps G.726 (ADPCM) coders show that the approach is able to dramatically reduce the effect of the coding distortion and, in some cases, gives a word accuracy higher than the baseline system with uncoded speech. Finally, the EM estimation algorithm requires only one adapting utterance, and the approach is well suited to dialogue systems where just a few adapting utterances are available.
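As a rough intuition for model compensation, the sketch below treats the coding distortion as a single additive bias in the feature domain, estimated from a few adaptation frames and added to every HMM mean. This is a one-Gaussian, one-iteration simplification for illustration only; the paper's distortion model and EM algorithm are considerably richer.

```python
def compensate_means(model_means, adapt_frames):
    """Shift every HMM mean by a global feature-domain bias estimated
    as the difference between the average adaptation frame and the
    average model mean (a crude stand-in for EM bias estimation)."""
    dim = len(model_means[0])
    avg = [sum(f[k] for f in adapt_frames) / len(adapt_frames)
           for k in range(dim)]
    gmean = [sum(m[k] for m in model_means) / len(model_means)
             for k in range(dim)]
    bias = [avg[k] - gmean[k] for k in range(dim)]
    return [[m[k] + bias[k] for k in range(dim)] for m in model_means]
```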


IEEE Transactions on Speech and Audio Processing | 2002

MAP speaker adaptation of state duration distributions for speech recognition

Néstor Becerra Yoma; Jorge Silva Sánchez

This paper presents a framework for maximum a posteriori (MAP) speaker adaptation of state duration distributions in hidden Markov models (HMMs). Four key issues of MAP estimation are tackled: analysis and modeling of state duration distributions, the choice of prior distribution, the specification of the parameters of the prior density, and the evaluation of the MAP estimates. Moreover, a comparison with an adaptation procedure based on maximum likelihood (ML) estimation is presented, and the problem of truncation of the state duration distribution is addressed from the statistical point of view. The results shown in this paper suggest that speaker adaptation of temporal restrictions substantially improves the accuracy of speaker-independent (SI) HMMs with both clean and noisy speech. The method requires a low computational load and a small number of adapting utterances, and can be useful for following the dynamics of the speaking rate in speech recognition.
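The general shape of a MAP parameter update can be sketched as the usual conjugate-prior convex combination of the prior mean and the adaptation-data sample mean, shown below for a state-duration mean. This is the standard textbook form under an assumed prior weight tau; the paper's exact prior choice and estimator may differ.

```python
def map_duration_mean(prior_mean, tau, durations):
    """MAP-style update of a state-duration mean: a convex combination
    of the speaker-independent prior mean and the sample mean of the
    adaptation durations, controlled by the prior weight tau. With no
    adaptation data the estimate falls back to the prior; with many
    observations it approaches the ML (sample-mean) estimate."""
    n = len(durations)
    if n == 0:
        return prior_mean
    sample_mean = sum(durations) / n
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```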


Speech Communication | 2002

Robust speaker verification with state duration modeling

Néstor Becerra Yoma; Tarciano Facco Pegoraro

This paper addresses the problem of state duration modeling in the Viterbi algorithm in a text-dependent speaker verification task. The results presented in this paper suggest that temporal constraints can lead to reductions of 10% and 20% in the error rates with signals corrupted by noise at SNRs of 6 and 0 dB, respectively, and that accurate statistical modeling of state duration (e.g. with a gamma probability distribution) does not seem to be very relevant if maximal and minimal state duration restrictions are imposed. In contrast, temporal restrictions do not seem to give any improvement in a speaker verification task with clean speech or high SNR. It is also shown that state duration constraints can easily be applied with likelihood normalization metrics based on speaker-dependent temporal parameters. Finally, the results presented here show that word-position-dependent state duration parameters give no significant improvement over the word-position-independent approach if the coarticulation effect between contiguous words is low.

Collaboration


Dive into Néstor Becerra Yoma's collaborations.

Top Co-Authors
Richard M. Stern

Carnegie Mellon University
