Claudio Garretón
University of Chile
Publications
Featured research published by Claudio Garretón.
Pattern Recognition Letters | 2008
Fernando Huenupán; Néstor Becerra Yoma; Carlos Molina; Claudio Garretón
A novel framework that applies a Bayes-based confidence measure to multiple classifier system fusion is proposed. Compared with ordinary Bayesian fusion, the presented approach can lead to reductions as high as 37% and 35% in EER and ROC curve area, respectively, in speaker verification.
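The fusion idea can be sketched as a confidence-weighted combination of per-classifier verification scores. This is a minimal illustration only; the paper's exact Bayes-based formulation differs, and the weighting scheme below is a hypothetical stand-in.

```python
import numpy as np

def confidence_weighted_fusion(scores, confidences):
    """Fuse per-classifier verification scores, weighting each classifier
    by a confidence estimate (illustrative scheme, not the paper's exact
    Bayesian derivation)."""
    scores = np.asarray(scores, dtype=float)
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()                      # normalize confidences into weights
    return float(np.dot(w, scores))      # confidence-weighted fused score

# Three verifiers: the most confident classifier dominates the fused decision
fused = confidence_weighted_fusion([1.2, -0.3, 0.8], [0.7, 0.1, 0.2])
```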
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Carlos Molina; Néstor Becerra Yoma; Fernando Huenupan; Claudio Garretón; Jorge Wuth
In this paper, a novel confidence-based reinforcement learning (RL) scheme to correct observation log-likelihoods and to address the problem of unsupervised compensation with limited estimation data is proposed. A two-step Viterbi decoding is presented that estimates a correction factor for the observation log-likelihoods, making the recognized and neighboring HMMs more or less likely according to a confidence score. If regions in the output delivered by the recognizer exhibit low confidence scores, the second Viterbi decoding tends to focus the search on neighboring models; in contrast, if recognized regions exhibit high confidence scores, it tends to retain the recognition output obtained at the first step. The proposed RL mechanism is modeled as the linear combination of two metrics or information sources: the acoustic model log-likelihood and the logarithm of a confidence metric. A criterion based on incremental conditional entropy maximization to optimize a linear combination of metrics or information sources online is also presented. The method requires only one utterance, as short as 0.7 s, and can lead to significant reductions in word error rate (WER) of between 3% and 18%, depending on the task, training-testing conditions, and the method used to optimize the proposed RL scheme. In contrast to ordinary feature compensation and model parameter adaptation methods, the confidence-based RL method operates in the frame log-likelihood domain. Consequently, as shown in the results presented here, it is complementary to both feature compensation and model adaptation techniques.
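The core correction, a linear combination of the acoustic model log-likelihood and the log of a confidence metric, can be sketched per frame. The weights `alpha` and `beta` below are illustrative placeholders; the paper optimizes this combination online via incremental conditional entropy maximization.

```python
import numpy as np

def corrected_log_likelihood(acoustic_ll, confidence,
                             alpha=0.8, beta=0.2, eps=1e-10):
    """Re-score one frame as a linear combination of its acoustic-model
    log-likelihood and the log of a confidence metric. alpha and beta are
    illustrative weights, not the entropy-optimized ones from the paper."""
    return alpha * acoustic_ll + beta * np.log(confidence + eps)

# A low-confidence frame is penalized more heavily than a high-confidence one,
# steering the second Viterbi pass toward neighboring models in that region.
low  = corrected_log_likelihood(-5.0, confidence=0.1)
high = corrected_log_likelihood(-5.0, confidence=0.9)
```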
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Claudio Garretón; Néstor Becerra Yoma
This correspondence presents a novel feature-space channel compensation technique that models the convolutional distortion in the log-energy mel-filter domain by means of a polynomial approximation. The proposed parametric distortion model generates appropriate constraints in the spectral domain that help to improve the channel cancelling estimation with limited data. In a text-dependent speaker verification task, the polynomial-based channel estimation scheme can lead to reductions in equal error rate (EER) as great as 22% and 8% when compared with the baseline system and with the standard cepstral bias removal approach, respectively, with no significant increase in computational load.
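A minimal sketch of the underlying idea, assuming a simple least-squares polynomial fit across mel-filter indices as the smooth channel model; the constraint machinery of the paper's parametric distortion model is not reproduced here.

```python
import numpy as np

def estimate_channel_polynomial(log_mel_frames, order=2):
    """Fit a low-order polynomial across mel-filter indices to the
    time-averaged log-energy, yielding a smooth estimate of the
    (convolutional) channel in the log-energy mel-filter domain."""
    mean_log_mel = log_mel_frames.mean(axis=0)        # average over frames
    bins = np.arange(mean_log_mel.size)
    coeffs = np.polyfit(bins, mean_log_mel, order)    # least-squares fit
    return np.polyval(coeffs, bins)                   # smooth channel curve

def compensate(log_mel_frames, order=2):
    """Subtract the polynomial channel estimate from every frame."""
    return log_mel_frames - estimate_channel_polynomial(log_mel_frames, order)
```

The polynomial constraint is what makes the estimate usable with limited data: a handful of coefficients is fitted instead of one free bias per mel channel.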
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Claudio Garretón; Néstor Becerra Yoma; Matias Torres
This correspondence proposes a novel feature transform for channel robustness with short utterances. In contrast to well-known techniques based on feature trajectory filtering, the presented procedure aims to reduce the time-varying component of channel distortion by applying a bandpass filter along the Mel frequency domain on a frame-by-frame basis. By doing so, the channel cancelling effect of conventional feature trajectory filtering methods is enhanced. The filtering parameters are defined by employing a novel version of relative importance analysis based on a discriminant function. Experiments with telephone speech on a text-dependent speaker verification task show that the proposed scheme can lead to reductions of 8.6% in equal error rate when compared with the baseline system. Also, when applied in combination with cepstral mean normalization and RASTA, the presented technique leads to further reductions of 9.7% and 4.3% in equal error rate, respectively, when compared with those methods in isolation.
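Filtering along the mel-frequency axis (rather than along time, as trajectory filtering does) can be sketched as a short FIR convolution over one frame's log-mel vector. The kernel below is only an illustration; the paper derives its filter from a discriminant-based relative importance analysis.

```python
import numpy as np

def mel_axis_bandpass(log_mel_frame, kernel=None):
    """Bandpass-filter one frame's log-mel energies along the mel-frequency
    axis. The default kernel is illustrative, not the discriminant-optimized
    filter from the paper."""
    if kernel is None:
        kernel = np.array([-0.25, 0.5, 1.0, 0.5, -0.25])
        kernel = kernel - kernel.mean()   # zero DC gain: a flat channel
                                          # offset maps to zero output
    return np.convolve(log_mel_frame, kernel, mode='same')
```

Because the kernel has zero DC gain, any constant log-spectral offset (a stationary convolutional channel) is removed within a single frame, without needing a long feature trajectory.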
Speech Communication | 2008
Néstor Becerra Yoma; Claudio Garretón; Carlos Molina; Fernando Huenupán
In this paper, an unsupervised intra-speaker variability compensation (ISVC) method based on Gestalt is proposed to address the problem of limited enrolling data and noise robustness in text-dependent speaker verification (SV). Experiments with two databases show that ISVC can lead to reductions in EER as high as 20% and 40%, respectively, and provides reductions in the integral below the ROC curve between 30% and 60%. Also, the observed improvements are independent of the number of enrolling utterances. In contrast to model adaptation methods, ISVC is memoryless with respect to previous verification attempts. As shown here, unsupervised model adaptation can lead to substantial improvements in EER but is highly dependent on the sequence of client/impostor verification events. In adverse scenarios, such as massive impostor attacks and verification over an alternated telephone line, unsupervised model adaptation might even reduce verification accuracy when compared with the baseline system. In those cases, ISVC can even outperform adaptation schemes. It is worth emphasizing that ISVC and unsupervised model adaptation are compatible, and the combination of both methods always improves the performance of model adaptation, leading to improvements in EER as high as 34%. Due to the restrictions of commercially available databases for text-dependent SV research, the results presented here are based on local databases in Spanish. By doing so, the visibility of research in Iberian languages is highlighted.
Archive | 2007
N. Becerra Yoma; Carlos Molina; Claudio Garretón; Fernando Huenupán
Robustness to noise and low-bit-rate coding distortion is one of the main problems faced by automatic speech recognition (ASR) and speaker verification (SV) systems in real applications. Usually, ASR and SV models are trained with speech signals recorded in conditions that differ from the testing environments. This mismatch between training and testing can lead to unacceptable error rates, and noise and low-bit-rate coding distortion are probably its most important sources. Noise is classified as additive or convolutional depending on whether it corresponds to an additive process in the linear domain or to the insertion of a linear transmission channel function. Low-bit-rate coding distortion, in turn, is produced by the coding-decoding schemes employed in cellular systems and VoIP/ToIP. A popular approach to tackle these problems attempts to estimate the original speech signal before the distortion is introduced. However, the original signal cannot be recovered with 100% accuracy, and there will always be uncertainty in noise canceling. Due to its simplicity, spectral subtraction (SS) (Berouti et al., 1979; Vaseghi & Milner, 1997) has been widely used to reduce the effect of additive noise in speaker recognition (Barger & Sridharan, 1997; Drygajlo & El-Maliki, 1998; Ortega & Gonzalez, 1997), despite the fact that SS loses accuracy at low segmental SNR. Parallel Model Combination (PMC) (Gales & Young, 1993) was applied under noisy conditions in (Rose et al., 1994), where large improvements with additive noise were reported. Nevertheless, PMC requires accurate knowledge about the additive corrupting signal, whose model is estimated from appreciable amounts of noise data, which in turn imposes restrictions on noise stationarity, and about the convolutional distortion, which needs to be estimated a priori (Gales, 1997).
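The spectral subtraction step described above can be sketched in a few lines, in the spirit of Berouti et al. (1979): subtract an (over-)estimated noise power spectrum and clamp the result to a fraction of the noisy power, since the clean signal can never be recovered exactly.

```python
import numpy as np

def spectral_subtraction(noisy_power, noise_power, over=1.0, floor=0.01):
    """Basic power-spectral subtraction with flooring. `over` is an
    over-subtraction factor and `floor` keeps each bin at a small fraction
    of the noisy power, avoiding negative (non-physical) estimates."""
    clean = noisy_power - over * noise_power
    return np.maximum(clean, floor * noisy_power)

# Bin 1: noise is weak, subtraction works; bin 2: noise estimate exceeds
# the noisy power, so the floor takes over
est = spectral_subtraction(np.array([10.0, 1.0]), np.array([2.0, 5.0]))
```

The flooring is exactly where the accuracy loss at low segmental SNR comes from: in low-SNR bins the estimate collapses to the floor rather than the true clean power.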
RASTA filtering (Hermansky et al., 1991) and Cepstral Mean Normalization (CMN) can be very useful to cancel convolutional distortion (Furui, 1982; Reynolds, 1994; van Vuuren, 1996) but, if the speech signal is also corrupted by additive noise, these techniques lose
Conference of the International Speech Communication Association | 2007
Fernando Huenupán; Néstor Becerra Yoma; Carlos Molina; Claudio Garretón
Conference of the International Speech Communication Association | 2008
Carlos Molina; Néstor Becerra Yoma; Fernando Huenupan; Claudio Garretón
Conference of the International Speech Communication Association | 2006
Claudio Garretón; Néstor Becerra Yoma; Carlos Molina; Fernando Huenupan
Conference of the International Speech Communication Association | 2013
Néstor Becerra Yoma; Claudio Garretón; Fernando Huenupan; Ignacio Catalan; Jorge Wuth