Pablo Peso Parada | Researchain

Publication

Featured research published by Pablo Peso Parada.

International Conference on Acoustics, Speech, and Signal Processing | 2014

Non-intrusive estimation of the level of reverberation in speech

Pablo Peso Parada; Dushyant Sharma; Patrick A. Naylor

We show corroborating evidence that, among a set of common acoustic parameters, the clarity index C50 provides a measure of reverberation that is well correlated with speech recognition accuracy. We also present a data-driven method for non-intrusive C50 estimation from a single-channel speech signal. The method extracts a number of features from the speech signal and uses a binary regression tree, trained on appropriate training data, to estimate C50. Evaluation is carried out using speech utterances convolved with real and simulated room impulse responses, with additive babble noise. The new method outperforms a baseline approach in our evaluation.
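
The clarity index C50 is the ratio, in dB, of the energy arriving in the first 50 ms of a room impulse response (RIR) to the energy arriving after 50 ms. The Python sketch below computes it intrusively from a known RIR, which is the kind of reference value a non-intrusive estimator is trained to predict. It assumes the RIR has already been trimmed so that sample 0 is the direct-path arrival; the function name is hypothetical.

```python
import numpy as np


def clarity_index_c50(rir: np.ndarray, fs: int) -> float:
    """Return the clarity index C50 (dB) of a room impulse response.

    C50 = 10 * log10(energy in [0, 50 ms) / energy in [50 ms, end]).
    Assumption: rir[0] is the direct-path arrival (onset alignment is
    left to the caller).
    """
    n50 = int(round(0.050 * fs))                 # sample index of the 50 ms boundary
    energy = np.asarray(rir, dtype=float) ** 2   # instantaneous energy h^2(t)
    early = energy[:n50].sum()                   # early (useful) energy
    late = energy[n50:].sum()                    # late (detrimental) energy
    if late == 0.0:
        raise ValueError("RIR has no energy after 50 ms; C50 is unbounded")
    return float(10.0 * np.log10(early / late))
```

Higher C50 means proportionally more early energy, that is, a less reverberant condition; making the early and late energies equal gives 0 dB.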

IEEE Transactions on Audio, Speech, and Language Processing | 2016

A single-channel non-intrusive C50 estimator correlated with speech recognition performance

Pablo Peso Parada; Dushyant Sharma; Jose Lainez; Daniel A. Barreda; Toon van Waterschoot; Patrick A. Naylor

Several intrusive measures of reverberation can be computed from measured and simulated room impulse responses, either over the full frequency band or for each individual mel-frequency subband. It is first shown that, on average, the full-band clarity index C50 is the measure most correlated with reverberant speech recognition performance. This corroborates previous findings, now for the dataset used in this study. We extend these findings to show that C50 also exhibits, on average, the highest mutual information with recognition performance. Motivated by these results, a non-intrusive room acoustic (NIRA) estimation method is proposed to estimate C50 from the reverberant speech signal alone. NIRA is a data-driven approach: it computes a number of features from the speech signal and uses them to train a model that performs the estimation. The choice of features and learning techniques is explored using an evaluation set of approximately 100 000 different reverberant signals (around 93 h of speech), including reverberation from both measured and simulated room impulse responses. The importance of each feature for estimating the target C50 is analysed with two different approaches; in both cases, the newly chosen features show high importance. The best C50 estimator achieves a root-mean-square deviation of around 3 dB on average across all reverberant test environments.

EURASIP Journal on Advances in Signal Processing | 2015

Reverberant speech recognition exploiting clarity index estimation

Pablo Peso Parada; Dushyant Sharma; Patrick A. Naylor; Toon van Waterschoot

We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4 % relative word error rate reduction in comparison to the best baseline of the challenge.
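
The 22.4 % figure is a relative reduction. It is measured against the baseline's own word error rate (WER) rather than in absolute percentage points:

```latex
\Delta\mathrm{WER}_{\mathrm{rel}}
  = \frac{\mathrm{WER}_{\mathrm{baseline}} - \mathrm{WER}_{\mathrm{proposed}}}
         {\mathrm{WER}_{\mathrm{baseline}}} \times 100\,\%
```

For instance, a hypothetical drop from 20 % to 15.52 % WER would be a 22.4 % relative reduction, but only 4.48 points in absolute terms.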

Workshop on Applications of Signal Processing to Audio and Acoustics | 2015

Single-channel speaker diarization based on spatial features

Mathieu Hu; Pablo Peso Parada; Dushyant Sharma; Simon Doclo; Toon van Waterschoot; Mike Brookes; Patrick A. Naylor

Speaker diarization has gained importance over the past five years as a way of overcoming key challenges faced by automatic meeting transcription systems. Current state-of-the-art algorithms can only use spatial information when multi-microphone recordings are available. In this paper, we propose the novel use of reverberation as a source of spatial information, obtained from single-channel recordings, to perform speaker diarization. The proposed system is shown to reduce speaker classification errors by 34% compared with current MFCC-based single-channel systems.

International Workshop on Acoustic Signal Enhancement | 2014

A quantitative comparison of blind C50 estimators

Pablo Peso Parada; Dushyant Sharma; Jose Lainez; Daniel A. Barreda; Patrick A. Naylor; Toon van Waterschoot

This paper addresses the blind estimation of the room acoustic clarity index C50 from single-channel reverberant speech signals. We analyze the performance of several machine learning methods on this regression task, using 309 features derived from the speech signal and modeled with a Deep Belief Network (DBN), a Classification And Regression Tree (CART) and Linear Regression (LR). These techniques are evaluated on a large test database (86 hours) that includes babble noise and reverberation from both artificial and real room impulse responses (RIRs). All methods are trained on a database whose noise, speech and simulated RIRs differ from those in the test set. The results show that the DBN model gives the lowest error for the simulated RIRs, whereas the LR model generalizes best, giving the highest accuracy for real RIRs.
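
Of the three regressors compared in this paper, linear regression is the simplest to illustrate. The Python sketch below fits it by ordinary least squares with a bias term and scores it with the root-mean-square deviation (RMSD) in dB, the error measure quoted in the 2016 journal abstract. The feature matrix only stands in for the 309 speech-derived features, which are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def _with_bias(features: np.ndarray) -> np.ndarray:
    """Append a constant column so the model learns an intercept."""
    features = np.asarray(features, dtype=float)
    return np.column_stack([features, np.ones(len(features))])


def fit_linear_c50(features: np.ndarray, c50_db: np.ndarray) -> np.ndarray:
    """Least-squares weights mapping per-utterance features to C50 (dB)."""
    weights, *_ = np.linalg.lstsq(_with_bias(features), c50_db, rcond=None)
    return weights


def predict_linear_c50(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply the fitted linear model to new feature vectors."""
    return _with_bias(features) @ weights


def rmsd_db(estimated, target) -> float:
    """Root-mean-square deviation between estimated and reference C50, in dB."""
    diff = np.asarray(estimated, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In practice the reference C50 values would come from the RIRs used to build the training set, and RMSD would be computed on a held-out test set whose RIRs differ from the training RIRs, as in the evaluation described above.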

IEEE Global Conference on Signal and Information Processing | 2014

Reverberant speech recognition: A phoneme analysis

Pablo Peso Parada; Dushyant Sharma; Patrick A. Naylor; Toon van Waterschoot

We present a phoneme confusion analysis that models the impact of reverberation on automatic speech recognition performance by formulating the problem in a Bayesian framework. Our analysis under reverberant conditions shows the relative robustness of each phoneme to reverberation, and indicates that substitutions and deletions are the most common errors in a phoneme recognition task. Finally, a model is proposed to estimate the confusability of each phoneme as a function of the reverberation level; this model is evaluated on two independent data sets.
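
A confusion analysis of this kind typically starts from conditional probabilities P(recognized | spoken) estimated from aligned recognizer output. The Python sketch below shows only that generic counting step, not the paper's Bayesian model. It assumes the alignment is already done and represents each deletion as a recognized label of `None`, which gets its own final column. Insertions are not handled, and the function name is hypothetical.

```python
import numpy as np


def confusion_probabilities(pairs, phonemes):
    """Estimate P(recognized | spoken) from aligned (spoken, recognized) pairs.

    Columns follow `phonemes`, plus one extra final column for deletions
    (recognized is None). Each row with observations sums to 1.
    """
    index = {p: i for i, p in enumerate(phonemes)}
    deletion_col = len(phonemes)
    counts = np.zeros((len(phonemes), len(phonemes) + 1))
    for spoken, recognized in pairs:
        col = deletion_col if recognized is None else index[recognized]
        counts[index[spoken], col] += 1
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for phonemes never spoken stay all-zero instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

Each row splits a phoneme's occurrences into correct recognitions (the diagonal), substitutions (the off-diagonal entries) and deletions (the last column). Computing such a matrix separately for each reverberation level shows how robust each phoneme is to reverberation.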

Archive | 2013