Network


Latest external collaborations at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Ramón Fernández Astudillo is active.

Publication


Featured research published by Ramón Fernández Astudillo.


EURASIP Journal on Audio, Speech, and Music Processing | 2010

Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Dorothea Kolossa; Ramón Fernández Astudillo; Eugen Hoffmann; Reinhold Orglmeister

When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs, for the purpose of reducing remaining interferences. In order to improve robustness to the artefacts and loss of information caused by this process, recognition can be greatly enhanced by considering the processed speech feature vector as a random variable with time-varying uncertainty, rather than as deterministic. The aim of this paper is to show the potential to improve recognition of multiple overlapping speech signals through nonlinear postprocessing together with uncertainty-based decoding techniques.
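
As a rough illustration of the nonlinear postprocessing step described above, the following sketch (plain NumPy; the function name, array shapes and the mask heuristic are assumptions, not the paper's implementation) applies a soft time-frequency mask to one ICA output and reuses the amount of masking as a per-bin uncertainty for the recognizer.

    import numpy as np

    def mask_and_uncertainty(ica_outputs, target_idx=0, eps=1e-12):
        # ica_outputs: complex STFTs of the separated sources, shape (n_sources, n_frames, n_bins)
        power = np.abs(ica_outputs) ** 2
        # Soft mask: how dominant the target source is in each time-frequency bin.
        mask = power[target_idx] / (power.sum(axis=0) + eps)
        # Suppress residual interference on the target ICA output.
        masked = mask * ica_outputs[target_idx]
        # Heuristic uncertainty: the more a bin was attenuated, the less reliable it is.
        uncertainty = (1.0 - mask) * power[target_idx]
        return masked, uncertainty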


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Computing MMSE Estimates and Residual Uncertainty Directly in the Feature Domain of ASR using STFT Domain Speech Distortion Models

Ramón Fernández Astudillo; Reinhold Orglmeister

In this paper we demonstrate how uncertainty propagation allows the computation of minimum mean square error (MMSE) estimates in the feature domain for various feature extraction methods using short-time Fourier transform (STFT) domain distortion models. In addition, a measure of estimate reliability is attained which allows either feature re-estimation or the dynamic compensation of automatic speech recognition (ASR) models. The proposed method transforms the posterior distribution associated with a Wiener filter through the feature extraction using the STFT uncertainty propagation formulas. It is also shown that non-linear estimators in the STFT domain, like the Ephraim-Malah filters, can be seen as special cases of a propagation of the Wiener posterior. The method is illustrated by developing two MMSE Mel-frequency cepstral coefficient (MFCC) estimators and combining them with observation uncertainty techniques. We discuss similarities with other MMSE-MFCC estimators and show how the proposed approach outperforms conventional MMSE estimators in the STFT domain on the AURORA4 robust ASR task.
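
To make the starting point concrete, here is a minimal NumPy sketch, assuming per-bin speech and noise power spectral densities are available: a Wiener filter yields a posterior mean and residual variance in the STFT domain, and Monte Carlo sampling is used as a simple stand-in for the paper's closed-form propagation of that posterior to log-Mel features. Function names and array shapes are illustrative.

    import numpy as np

    def wiener_posterior(noisy_stft, speech_psd, noise_psd):
        # Posterior mean and residual variance of the clean STFT coefficients under a Wiener model.
        gain = speech_psd / (speech_psd + noise_psd)
        return gain * noisy_stft, gain * noise_psd

    def propagate_to_log_mel(mean, var, mel_fb, n_samples=100, seed=0):
        # Monte Carlo stand-in for closed-form STFT-to-feature uncertainty propagation.
        rng = np.random.default_rng(seed)
        noise = (rng.standard_normal((n_samples,) + mean.shape)
                 + 1j * rng.standard_normal((n_samples,) + mean.shape))
        draws = mean + np.sqrt(var / 2.0) * noise
        feats = np.log(np.abs(draws) ** 2 @ mel_fb.T + 1e-12)
        # MMSE feature estimate and its residual uncertainty.
        return feats.mean(axis=0), feats.var(axis=0)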


IEEE Journal of Selected Topics in Signal Processing | 2010

An Uncertainty Propagation Approach to Robust ASR Using the ETSI Advanced Front-End

Ramón Fernández Astudillo; Dorothea Kolossa; Philipp Mandelartz; Reinhold Orglmeister

In this paper, we show how uncertainty propagation, combined with observation uncertainty techniques, can be applied to a realistic implementation of robust distributed speech recognition (DSR) to further improve recognition robustness with little increase in computational complexity. Uncertainty propagation, or error propagation, techniques employ a probabilistic description of speech to reflect the information lost during speech enhancement or source separation in the time or frequency domain. This uncertain description is then propagated through the feature extraction process to the domain of the features used in speech recognition. In this domain, the statistical information can be combined with the statistical parameters of the recognition model by employing observation uncertainty techniques. We show that the combination of a piecewise uncertainty propagation scheme with front-end uncertainty decoding or modified imputation improves the baseline of the advanced front-end (AFE), the state-of-the-art algorithm of the European Telecommunications Standards Institute (ETSI), on the AURORA5 database. We compare this method with other observation uncertainty techniques and show how the use of uncertainty propagation reduces the word error rates without the need for any kind of adaptation to noise using stereo data or iterative parameter estimation.
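
The observation uncertainty techniques mentioned above can be illustrated with a few lines of NumPy; the sketch below shows front-end uncertainty decoding in its simplest diagonal-covariance form, where the estimated feature error variance is added to the model variance during likelihood evaluation. It is a schematic, not the AFE or the paper's exact implementation.

    import numpy as np

    def uncertain_log_likelihood(x, x_var, mean, var):
        # Diagonal Gaussian log-likelihood with the feature error variance added to the model variance.
        total_var = var + x_var
        return -0.5 * np.sum(np.log(2.0 * np.pi * total_var) + (x - mean) ** 2 / total_var)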


International Joint Conference on Natural Language Processing | 2015

Learning Word Representations from Scarce and Noisy Data with Embedding Subspaces

Ramón Fernández Astudillo; Silvio Amir; Wang Ling; Mário J. Silva; Isabel Trancoso

We investigate a technique to adapt unsupervised word embeddings to specific applications, when only small and noisy labeled datasets are available. Current methods use pre-trained embeddings to initialize model parameters, and then use the labeled data to tailor them for the intended task. However, this approach is prone to overfitting when the training is performed with scarce and noisy data. To overcome this issue, we use the supervised data to find an embedding subspace that fits the task complexity. All the word representations are adapted through a projection into this task-specific subspace, even if they do not occur in the labeled dataset. This approach was recently used in the SemEval 2015 Twitter sentiment analysis challenge, attaining state-of-the-art results. Here we show results improving those of the challenge, as well as additional experiments in a Twitter Part-Of-Speech tagging task.
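
A hedged PyTorch sketch of the subspace idea (the class and parameter names are invented for illustration, and the actual model and training details may differ): the pre-trained embeddings are frozen, and only a low-dimensional projection plus a small classifier are trained on the scarce labeled data, so every word is adapted through the same projection even if it never appears in the labels.

    import torch
    import torch.nn as nn

    class EmbeddingSubspaceClassifier(nn.Module):
        def __init__(self, pretrained, subspace_dim, n_classes):
            super().__init__()
            # Pre-trained embeddings stay frozen; only the projection and classifier are trained.
            self.emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
            self.proj = nn.Linear(pretrained.size(1), subspace_dim, bias=False)
            self.out = nn.Linear(subspace_dim, n_classes)

        def forward(self, token_ids):
            z = self.proj(self.emb(token_ids))   # every word is mapped through the same projection
            return self.out(z.mean(dim=1))       # average over the tokens of a message

Because only the projection and the output layer are trained, the number of free parameters scales with the subspace dimension rather than with the vocabulary, which is what keeps overfitting in check on small, noisy datasets.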


Asilomar Conference on Signals, Systems and Computers | 2006

Recognition of Convolutive Speech Mixtures by Missing Feature Techniques for ICA

Dorothea Kolossa; Hiroshi Sawada; Ramón Fernández Astudillo; Reinhold Orglmeister; Shoji Makino

One challenging problem for robust speech recognition is the cocktail party effect, where multiple speaker signals are active simultaneously in an overlapping frequency range. In that case, independent component analysis (ICA) can separate the signals even in reverberant environments. However, the incurred feature distortions prove detrimental for speech recognition. To reduce the consequential recognition errors, we describe the use of ICA for the additional estimation of uncertainty information. This information is subsequently used in missing-feature speech recognition, which leads to far more correct and accurate recognition, even in reverberant conditions with an RT60 of 300 ms.
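
A minimal sketch of the missing-feature idea, assuming a diagonal Gaussian acoustic model: feature dimensions flagged as unreliable by the ICA-derived uncertainty are simply marginalized out of the likelihood, which for diagonal covariances amounts to dropping them from the sum. The function below is illustrative, not the system used in the paper.

    import numpy as np

    def marginalized_log_likelihood(x, reliable, mean, var):
        # Score only the dimensions flagged reliable; the unreliable ones are marginalized out.
        r = np.asarray(reliable, dtype=bool)
        return -0.5 * np.sum(np.log(2.0 * np.pi * var[r]) + (x[r] - mean[r]) ** 2 / var[r])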


North American Chapter of the Association for Computational Linguistics | 2015

INESC-ID: A Regression Model for Large Scale Twitter Sentiment Lexicon Induction

Silvio Amir; Ramón Fernández Astudillo; Wang Ling; Bruno Martins; Mário J. Silva; Isabel Trancoso

We present the approach followed by INESC-ID in the SemEval 2015 Twitter Sentiment Analysis challenge, subtask E. The goal was to determine the strength of the association of Twitter terms with positive sentiment. Using two labeled lexicons, we trained a regression model to predict the sentiment polarity and intensity of words and phrases. Terms were represented as word embeddings induced in an unsupervised fashion from a corpus of tweets. Our system was the top-ranking submission, attesting to the general adequacy of the proposed approach.
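
A schematic of the basic recipe, assuming scikit-learn and a dictionary of pre-computed word embeddings (the actual system's regressor, features and hyper-parameters may differ): fit a regression model from a term's embedding to its labeled sentiment intensity, then score every embedded term in the vocabulary, including terms that never appear in the labeled lexicons.

    import numpy as np
    from sklearn.linear_model import Ridge

    def induce_lexicon(embeddings, seed_words, seed_scores, vocab):
        # embeddings: dict mapping term -> vector; seed_words/seed_scores: small labeled lexicon.
        X = np.stack([embeddings[w] for w in seed_words])
        model = Ridge(alpha=1.0).fit(X, np.asarray(seed_scores))
        # Predict an intensity for every vocabulary term that has an embedding.
        return {w: float(model.predict(embeddings[w][None, :])[0]) for w in vocab if w in embeddings}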


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Corpus-Based Speech Enhancement With Uncertainty Modeling and Cepstral Smoothing

Robert M. Nickel; Ramón Fernández Astudillo; Dorothea Kolossa; Rainer Martin

We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. Lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
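
As an illustration of one of the ingredients listed above, cepstral smoothing of a spectral gain can be sketched in a few lines of NumPy; the liftering cutoff and the gain floor below are arbitrary illustrative choices rather than the paper's settings.

    import numpy as np

    def cepstral_smooth_gain(gain, keep_bins=30, floor=1e-3):
        # Smooth one frame of a spectral gain curve by low-pass liftering its real cepstrum.
        cep = np.fft.irfft(np.log(np.maximum(gain, floor)))
        lifter = np.zeros_like(cep)
        lifter[:keep_bins] = 1.0
        lifter[-(keep_bins - 1):] = 1.0   # keep the symmetric (negative-quefrency) part as well
        return np.exp(np.fft.rfft(cep * lifter).real)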


IEEE Signal Processing Letters | 2013

Noise-Adaptive LDA: A New Approach for Speech Recognition Under Observation Uncertainty

Dorothea Kolossa; Steffen Zeiler; Rahim Saeidi; Ramón Fernández Astudillo

Automatic speech recognition (ASR) performance suffers severely from non-stationary noise, precluding widespread use of ASR in natural environments. Recently, so-termed uncertainty-of-observation techniques have helped to recover good performance. These consider the clean speech features as a hidden variable, of which the observable features are only an imperfect estimate. An estimated error variance of the features is therefore used to further guide recognition. Based on the same idea, we introduce a new strategy: reducing the speech feature dimensionality for optimal discriminability under observation uncertainty can yield significantly improved recognition performance, and the transform is derived easily via Fisher's criterion of discriminant analysis.
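
A hedged NumPy sketch of the strategy (exactly how the uncertainty enters the scatter matrices here is an assumption and may differ from the paper's derivation): the summed observation uncertainties inflate the within-class scatter before the usual generalized eigenvalue problem of Fisher's discriminant analysis is solved.

    import numpy as np
    from scipy.linalg import eigh

    def uncertain_lda(X, y, X_var, n_components):
        # X: (n, d) features, y: (n,) class labels, X_var: (n, d) per-feature uncertainty variances.
        mu = X.mean(axis=0)
        Sw = np.zeros((X.shape[1], X.shape[1]))
        Sb = np.zeros_like(Sw)
        for c in np.unique(y):
            Xc = X[y == c]
            d = Xc - Xc.mean(axis=0)
            Sw += d.T @ d
            m = (Xc.mean(axis=0) - mu)[:, None]
            Sb += len(Xc) * (m @ m.T)
        Sw += np.diag(X_var.sum(axis=0))   # observation uncertainty inflates the within-class scatter
        _, vecs = eigh(Sb, Sw)             # generalized eigenproblem: Sb v = lambda Sw v
        return vecs[:, ::-1][:, :n_components]   # directions with the largest Fisher ratio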


IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016

Uncertain LDA: Including Observation Uncertainties in Discriminative Transforms

Rahim Saeidi; Ramón Fernández Astudillo; Dorothea Kolossa

Linear discriminant analysis (LDA) is a powerful technique in pattern recognition to reduce the dimensionality of data vectors. It maximizes discriminability by retaining only those directions that minimize the ratio of within-class to between-class variance. In this paper, using the same principles as for conventional LDA, we propose to employ the uncertainties of the noisy or distorted input data in order to estimate maximally discriminant directions. We demonstrate the efficiency of the proposed uncertain LDA on two applications using state-of-the-art techniques. First, we experiment with an automatic speech recognition task, in which the uncertainty of observations is imposed by real-world additive noise. Next, we examine a full-scale speaker recognition system, considering the utterance duration as the source of uncertainty in authenticating a speaker. The experimental results show that when employing an appropriate uncertainty estimation algorithm, uncertain LDA outperforms its conventional LDA counterpart.
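
Schematically, and with notation that is only assumed here rather than taken from the paper, one way to picture the modified objective is a Fisher ratio in which the within-class scatter is inflated by the per-observation uncertainty covariances \Sigma_i:

    J(\mathbf{w}) = \frac{\mathbf{w}^{\top} \mathbf{S}_b \, \mathbf{w}}{\mathbf{w}^{\top} \big( \mathbf{S}_w + \sum_i \boldsymbol{\Sigma}_i \big) \, \mathbf{w}}

where S_b and S_w are the usual between-class and within-class scatter matrices; the paper's exact treatment of the uncertainties may differ.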


International Conference on Acoustics, Speech, and Signal Processing | 2014

Accounting for the residual uncertainty of multi-layer perceptron based features

Ramón Fernández Astudillo; Alberto Abad; Isabel Trancoso

Multi-layer perceptrons (MLPs) are often interpreted as modeling a posterior distribution over classes given input features using the mean field approximation. This approximation is fast but neglects the residual uncertainty of inference at each layer, making inference less robust. In this paper we introduce a new approximation of MLP inference that takes this residual uncertainty into consideration. The proposed algorithm propagates not only the mean, but also the variance of inference through the network. At the current stage, the proposed method cannot be used with softmax layers. Therefore, we illustrate the benefits of this algorithm in a tandem scheme. We use the residual uncertainty of inference of MLP-based features to compensate a GMM-HMM backend with uncertainty decoding. Experiments on the AURORA4 corpus show consistent improvements in performance over conventional MLPs in all scenarios, in particular for clean speech and multi-style training.
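
A minimal NumPy sketch of propagating a mean and a variance through one hidden layer under diagonal-covariance and sigmoid-activation assumptions; the MacKay-style expectation and the delta-method variance used below are common approximations chosen for illustration and may differ from the paper's.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def propagate_layer(mean, var, W, b):
        # Moment propagation through a linear layer followed by a sigmoid (diagonal covariances).
        a_mean = mean @ W.T + b
        a_var = var @ (W.T ** 2)
        # Approximate E[sigmoid(a)] for Gaussian a (MacKay-style probit approximation).
        h_mean = sigmoid(a_mean / np.sqrt(1.0 + np.pi * a_var / 8.0))
        # Delta-method variance estimate via the sigmoid's local slope.
        h_var = (h_mean * (1.0 - h_mean)) ** 2 * a_var
        return h_mean, h_var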

Collaboration


Dive into Ramón Fernández Astudillo's collaborations.

Top Co-Authors

Isabel Trancoso

Instituto Superior Técnico

Reinhold Orglmeister

Technical University of Berlin

Wang Ling

Carnegie Mellon University
