José Novoa
University of Chile
Publications
Featured researches published by José Novoa.
Computer Speech & Language | 2018
José Novoa; Josué Fredes; Víctor Poblete; Néstor Becerra Yoma
In this paper an uncertainty weighting scheme for DNN–HMM-based speech recognition is proposed to increase discriminability in the decoding process. To this end, the DNN pseudo-log-likelihoods are weighted according to the uncertainty variance assigned to the acoustic observation. The results presented here suggest that a substantial reduction in WER is achieved with clean training. Moreover, modelling the uncertainty propagation through the DNN is not required, and no approximations for non-linear activation functions are made. The presented method can be applied to any network topology that delivers log-likelihood-like scores, can be combined with any noise removal technique, and adds minimal computational cost. The technique was exhaustively evaluated and combined with uncertainty-propagation-based schemes for computing the pseudo-log-likelihoods and uncertainty variance at the DNN output. Two proposed methods optimize the parameters of the weighting function via grid search, either on a development database representing the given task or on each utterance based on discrimination metrics. Experiments with the Aurora-4 task showed that, with clean training, the proposed weighting scheme can reduce WER by as much as 21% compared with a baseline system using spectral subtraction and uncertainty propagation based on the unscented transform. The uncertainty weighting method also reduced the gap between clean and multi-noise/multi-condition training, which is useful when it is not easy to train a DNN–HMM system in conditions similar to the testing ones. Finally, the results presented here on the use of uncertainty are very competitive with those published elsewhere using the same database.
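The abstract does not give the exact weighting function, which the paper tunes by grid search; the sketch below only illustrates the general idea of shrinking DNN pseudo-log-likelihoods toward zero as the uncertainty variance of a frame grows. The inverse-variance form, the function name, and the `alpha` parameter are illustrative assumptions, not the paper's actual definitions.

```python
import numpy as np

def weight_pseudo_log_likelihoods(log_liks, uncertainty_var, alpha=1.0):
    """Down-weight per-frame DNN pseudo-log-likelihoods by observation
    uncertainty (hypothetical inverse-variance weighting, for illustration).

    log_liks:        (frames, states) pseudo-log-likelihoods from the DNN.
    uncertainty_var: (frames,) uncertainty variance of each acoustic observation.
    alpha:           tunable parameter (in the paper, found by grid search).
    """
    # Weight approaches 1 for certain frames and 0 for very uncertain ones.
    w = 1.0 / (1.0 + alpha * uncertainty_var)
    return log_liks * w[:, None]
```

In decoding, the weighted scores would then replace the raw pseudo-log-likelihoods in the Viterbi search, so uncertain frames contribute less to the final hypothesis.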
Human-Robot Interaction | 2018
José Novoa; Jorge Wuth; Juan Pablo Escudero; Josué Fredes; Rodrigo Mahu; Néstor Becerra Yoma
In this paper, we propose to replace the classical black-box integration of automatic speech recognition technology in HRI applications with the incorporation of the HRI environment representation and modeling, and of the robot and user states and contexts. Accordingly, this paper focuses on environment representation and modeling by training a deep neural network-hidden Markov model (DNN-HMM) based automatic speech recognition engine on clean utterances combined with the acoustic-channel responses and noise obtained from an HRI testbed built with a PR2 mobile manipulation robot. This method avoids recording a training database in all the possible acoustic environments of a given HRI scenario. Moreover, different speech recognition testing conditions were produced by recording two types of acoustic sources, i.e. a loudspeaker and human speakers, with a Microsoft Kinect mounted on top of the PR2 robot while the robot performed head rotations and movements towards and away from the fixed sources. In this generic HRI scenario, the resulting automatic speech recognition engine provided a word error rate at least 26% and 38% lower than publicly available speech recognition APIs on the playback (i.e. loudspeaker) and human testing databases, respectively, with a limited amount of training data.
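A common way to realize this kind of training-data simulation is to convolve clean speech with a measured acoustic-channel impulse response and add recorded environment noise at a chosen SNR. The sketch below shows that generic recipe; the function name and the SNR-mixing details are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def simulate_hri_utterance(clean, rir, noise, snr_db=10.0):
    """Simulate a noisy, reverberated training utterance.

    clean:  1-D clean waveform.
    rir:    1-D acoustic-channel (room/robot) impulse response.
    noise:  1-D environment noise recording, at least as long as `clean`.
    snr_db: target signal-to-noise ratio in dB.
    """
    # Apply the acoustic channel, then truncate to the original length.
    reverbed = np.convolve(clean, rir)[:len(clean)]
    noise = noise[:len(reverbed)]
    # Scale the noise so the mixture hits the requested SNR.
    sig_pow = np.mean(reverbed ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_pow / (noise_pow * 10.0 ** (snr_db / 10.0)))
    return reverbed + scale * noise
```

Sweeping `rir`, `noise`, and `snr_db` over the responses and noises measured on the testbed yields a multi-condition training set without re-recording speech in every acoustic environment.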
IEEE Signal Processing Letters | 2017
Josué Fredes; José Novoa; Simon King; Richard M. Stern; Néstor Becerra Yoma
This letter describes modifications to locally normalized filter banks (LNFB), which substantially improve their performance on the Aurora-4 robust speech recognition task using a Deep Neural Network-Hidden Markov Model (DNN-HMM)-based speech recognition system. The modified coefficients, referred to as LNFB features, are a filter-bank version of locally normalized cepstral coefficients (LNCC), which have been described previously. The performance of the LNFB features is further enhanced through the use of newly proposed dynamic versions of them, which are developed using an approach that differs somewhat from the traditional development of delta and delta–delta features. Further enhancements are obtained through the use of mean normalization and mean–variance normalization, which are evaluated on both a per-speaker and a per-utterance basis. The best performing feature combination (typically LNFB combined with LNFB delta and delta–delta features and mean–variance normalization) provides an average relative reduction in word error rate of 11.4% and 9.4%, respectively, compared to comparable features derived from Mel filter banks when clean and multinoise training are used for the Aurora-4 evaluation. The results presented here suggest that the proposed technique is more robust to channel mismatches between training and testing data than MFCC-derived features and is more effective in dealing with channel diversity.
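For reference, the conventional regression-based delta computation and per-utterance mean-variance normalization mentioned in the abstract can be sketched as follows. Note that the paper's dynamic LNFB features are stated to differ somewhat from this traditional delta scheme; the sketch shows only the standard baseline operations, with edge padding as an implementation assumption.

```python
import numpy as np

def deltas(feats, width=2):
    """Standard regression-based delta features for a (frames, dims) matrix."""
    padded = np.pad(feats, ((width, width), (0, 0)), mode='edge')
    denom = 2 * sum(k * k for k in range(1, width + 1))
    return sum(k * (padded[width + k:len(feats) + width + k]
                    - padded[width - k:len(feats) + width - k])
               for k in range(1, width + 1)) / denom

def mvn(feats, eps=1e-8):
    """Per-utterance mean-variance normalization, one statistic per dimension."""
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + eps)
```

Delta-delta features are obtained by applying `deltas` to the output of `deltas`, and the three streams are typically concatenated before normalization.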
Conference of the International Speech Communication Association | 2016
Víctor Poblete; Juan Pablo Escudero; Josué Fredes; José Novoa; Richard M. Stern; Simon King; Néstor Becerra Yoma
We describe the ability of Locally Normalized Cepstral Coefficients (LNCC) to improve speaker recognition accuracy in highly reverberant environments. We used a realistic test environment in which we changed the number and nature of reflective surfaces in the room, creating four increasingly long reverberation times, from approximately 1 to 9 seconds. In this room, we re-recorded reverberated versions of the YOHO speaker verification corpus. The recordings were made at four speaker-to-microphone distances, from 0.32 m to 2.56 m. Experimental results for a speaker verification task suggest that LNCC features are an attractive alternative to MFCC features under such reverberant conditions, as they improved verification accuracy over baseline MFCC features in all cases where the reverberation time exceeded 1 second or where the speaker-microphone distance was greatest (2.56 m).
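A simplified illustration of the local-normalization idea behind LNCC: dividing each filter-bank channel by the mean energy of its spectral neighbors makes the features largely invariant to slowly varying channel gain, which is what reverberation and microphone distance tend to introduce. This is a toy sketch of the concept, not the exact LNCC definition from the paper; the neighborhood width and the name `locally_normalize` are assumptions.

```python
import numpy as np

def locally_normalize(fbank, half_width=2, eps=1e-10):
    """Normalize each filter-bank channel by the mean energy of its
    spectral neighborhood (simplified sketch of local normalization).

    fbank: (frames, nfilt) non-negative filter-bank energies.
    """
    frames, nfilt = fbank.shape
    out = np.empty_like(fbank)
    for j in range(nfilt):
        lo, hi = max(0, j - half_width), min(nfilt, j + half_width + 1)
        out[:, j] = fbank[:, j] / (fbank[:, lo:hi].mean(axis=1) + eps)
    return out
```

Because each channel is divided by energies from nearby channels, multiplying the whole spectrum by a smooth channel response largely cancels out of the ratio.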
Conference of the International Speech Communication Association | 2015
Josué Fredes; José Novoa; Víctor Poblete; Simon King; Richard M. Stern; Néstor Becerra Yoma
Conference of the International Speech Communication Association | 2017
José Novoa; Jorge Wuth; Juan Pablo Escudero; Josué Fredes; Rodrigo Mahu; Richard M. Stern; Néstor Becerra Yoma
arXiv:eess.AS | 2018
Juan Pablo Escudero; Víctor Poblete; José Novoa; Jorge Wuth; Josué Fredes; Rodrigo Mahu; Richard M. Stern; Néstor Becerra Yoma
arXiv:eess.AS | 2018
José Novoa; Juan Pablo Escudero; Jorge Wuth; Víctor Poblete; Simon King; Richard M. Stern; Néstor Becerra Yoma
arXiv:eess.AS | 2018
Juan Pablo Escudero; José Novoa; Rodrigo Mahu; Jorge Wuth; Fernando Huenupan; Richard M. Stern; Néstor Becerra Yoma
arXiv: Human-Computer Interaction | 2018
José Novoa; Juan Pablo Escudero; Josué Fredes; Jorge Wuth; Rodrigo Mahu; Néstor Becerra Yoma