
Publications


Featured research published by Eduardo Lleida.


International Carnahan Conference on Security Technology | 2011

Preventing replay attacks on speaker verification systems

Jesús Villalba; Eduardo Lleida

In this paper, we describe a system for detecting spoofing attacks on speaker verification systems. By spoofing we mean impersonating a legitimate user. We focus on detecting two types of low-technology spoofs. On the one hand, we try to expose whether the test segment is a far-field microphone recording of the victim that has been replayed on a telephone handset using a loudspeaker. On the other hand, we want to determine whether the recording has been created by cutting and pasting short recordings to forge the sentence requested by a text-dependent system. This kind of attack is of critical importance for security applications such as access to bank accounts. To detect the first type of spoof, we extract several acoustic features from the speech signal; spoof and non-spoof segments are then classified using a support vector machine (SVM). The cut-and-paste attack is detected by comparing the pitch and MFCC contours of the enrollment and test segments using dynamic time warping (DTW). We performed experiments on two databases created for this purpose, which include landline and GSM telephone-channel signals from 20 different speakers. We present performance results separately for each spoofing detection system and for the fusion of both, achieving error rates under 10% for all conditions evaluated. We show the degradation in speaker verification performance in the presence of this kind of attack and how spoofing detection can be used to mitigate that degradation.
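As a rough illustration of the cut-and-paste detector's comparison step, here is a minimal DTW over one-dimensional contours. This is a simplified stand-in for the paper's pitch and MFCC contour comparison; the function names and toy contours are illustrative, not the authors' implementation:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D contours
    (e.g. a frame-level pitch track or one MFCC coefficient track)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # length-normalised alignment cost

# identical contours align with zero cost; an offset copy costs more,
# which is the cue a cut-and-paste forgery would trigger
ref = np.sin(np.linspace(0, 3, 50))
same = dtw_distance(ref, ref)
shifted = dtw_distance(ref, ref + 0.5)
```

A genuine repetition of the enrolled sentence yields a low alignment cost against the enrollment contour, while a spliced forgery produces discontinuities that raise it.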


BioID'11: Proceedings of the COST 2101 European Conference on Biometrics and ID Management | 2011

Detecting replay attacks from far-field recordings on speaker verification systems

Jesús Villalba; Eduardo Lleida

In this paper, we describe a system for detecting spoofing attacks on speaker verification systems. By spoofing we mean an attempt to impersonate a legitimate user. We focus on detecting whether the test segment is a far-field microphone recording of the victim. This kind of attack is of critical importance in security applications such as access to bank accounts. We present experiments on databases created for this purpose, including landline and GSM telephone channels, with spoofing detection EERs between 0% and 9% depending on the condition. We show the degradation in speaker verification performance in the presence of this kind of attack and how spoofing detection can be used to mitigate that degradation.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Cepstral Vector Normalization Based on Stereo Data for Robust Speech Recognition

Luis Buera; Eduardo Lleida; Antonio Miguel; Alfonso Ortega; Oscar Saz

In this paper, a set of feature vector normalization methods based on the minimum mean square error (MMSE) criterion and stereo data is presented. They include multi-environment model-based linear normalization (MEMLIN), polynomial MEMLIN (P-MEMLIN), multi-environment model-based histogram normalization (MEMHIN), and phoneme-dependent MEMLIN (PD-MEMLIN). These methods model the clean and noisy feature vector spaces using Gaussian mixture models (GMMs), and aim to learn a transformation between clean and noisy feature vectors associated with each pair of clean and noisy model Gaussians. The direct way to learn the transformations is with stereo data, that is, noisy feature vectors and their corresponding clean feature vectors; in this paper, however, a non-stereo-data-based training procedure is also presented. The transformations can be modeled simply as a bias vector (MEMLIN), as a first-order polynomial (P-MEMLIN), or as a nonlinear function based on histogram equalization (MEMHIN). Further improvements are obtained with phoneme-dependent bias vector transformations (PD-MEMLIN), in which the clean and noisy feature vector spaces are split into several phonemes, each modeled as a GMM. These methods achieve significant word error rate improvements over others based on similar targets. Experimental results on the SpeechDat Car database show an average word error rate improvement greater than 68% in all cases relative to the baseline when using the original clean acoustic models, and up to 83% when training acoustic models on the new normalized feature space.
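The bias-vector variant (MEMLIN) can be sketched as a posterior-weighted bias subtraction over the noisy-space Gaussians. The toy GMM, bias vectors, and dimensions below are invented for illustration and are not the paper's trained models:

```python
import numpy as np

def gaussian_loglik(y, means, var):
    # diagonal-covariance log-likelihood of y under each Gaussian
    d = y[None, :] - means                      # (G, D)
    return -0.5 * np.sum(d * d / var + np.log(2 * np.pi * var), axis=1)

def memlin_normalize(y, noisy_means, var, biases):
    """MMSE bias correction: subtract the posterior-weighted sum of
    per-Gaussian biases learned (e.g. from stereo data)."""
    logp = gaussian_loglik(y, noisy_means, var)
    post = np.exp(logp - logp.max())
    post /= post.sum()                          # p(g | y)
    return y - post @ biases

# toy 2-Gaussian, 2-dim noisy space; biases map it back toward clean
noisy_means = np.array([[2.0, 2.0], [8.0, 8.0]])
var = np.ones(2)
biases = np.array([[2.0, 2.0], [3.0, 3.0]])     # illustrative values
y = np.array([8.1, 7.9])
x_hat = memlin_normalize(y, noisy_means, var, biases)
```

Since `y` lies near the second Gaussian, essentially only that Gaussian's bias is subtracted.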


International Conference on Acoustics, Speech and Signal Processing | 1998

Robust continuous speech recognition system based on a microphone array

Eduardo Lleida; Julian Fernández; Enrique Masgrau

A robust speech recognition system for videoconference applications, based on a microphone array, is presented. By means of the microphone array, the speech recognition system is able to determine the positions of the users and increase the signal-to-noise ratio (SNR) between the desired speaker signal and the interference from the other users. The user positions are estimated by combining a direction-of-arrival (DOA) estimation method with a speaker identification system. Beamforming is performed using the spatial references of the desired speaker and the interference locations. A minimum variance algorithm with spatial constraints, working in the frequency domain, is used to design the weights of the broadband microphone array. Results of the speech recognition system are reported in a simulated environment with several users querying a geographic database.
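The frequency-domain minimum-variance weight design can be sketched per frequency bin with the textbook MVDR formula w = R⁻¹d / (dᴴR⁻¹d). The array geometry and covariance below are illustrative assumptions, not the paper's exact spatially constrained design:

```python
import numpy as np

def mvdr_weights(R, d):
    """Minimum-variance distortionless-response weights for one bin:
    minimise output power subject to unit gain toward steering vector d."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

# toy 4-microphone array, one frequency bin
n_mics = 4
d = np.exp(-1j * np.pi * np.arange(n_mics) * 0.3)   # desired direction
R = np.eye(n_mics) + 0.1 * np.outer(d, d.conj())    # noise + source cov.
w = mvdr_weights(R, d)
response = w.conj() @ d   # distortionless constraint: unit response
```

The constraint guarantees the desired speaker passes with unit gain while power from interfering directions is minimised.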


IberSPEECH | 2012

Voice Pathology Detection on the Saarbrücken Voice Database with Calibration and Fusion of Scores Using MultiFocal Toolkit

David Martinez; Eduardo Lleida; Alfonso Ortega; Antonio Miguel; Jesús Villalba

The paper presents a set of experiments on pathological voice detection over the Saarbrücken Voice Database (SVD) using the MultiFocal toolkit for discriminative calibration and fusion. The SVD is freely available online and contains a collection of voice recordings of different pathologies, both functional and organic. A generative Gaussian mixture model trained with mel-frequency cepstral coefficients, harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio is used as the classifier. Scores are calibrated to increase performance at the desired operating point. Finally, fusing the different recordings available for each speaker, in which the vowels /a/, /i/, and /u/ are pronounced with normal, low, high, and low-high-low intonations, yields a large increase in performance. Results are compared with the Massachusetts Eye and Ear Infirmary (MEEI) database, which makes it possible to see that the SVD is much more challenging.
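Discriminative affine calibration of the kind MultiFocal performs can be sketched as logistic regression on labelled scores: raw scores s are mapped to calibrated log-likelihood ratios a·s + b. The toy score distributions and plain gradient-descent fit below are illustrative, not the toolkit's implementation:

```python
import numpy as np

def calibrate(scores, labels, lr=0.1, iters=2000):
    """Fit an affine map a*s + b by minimising logistic cross-entropy."""
    a, b = 1.0, 0.0
    y = labels.astype(float)               # 1 = pathological, 0 = healthy
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - y                       # d(cross-entropy)/d(logit)
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

# toy scores: target trials around +2, non-target trials around -2
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(2, 1, 200), rng.normal(-2, 1, 200)])
labels = np.concatenate([np.ones(200), np.zeros(200)])
a, b = calibrate(scores, labels)
llr = a * scores + b                       # calibrated log-likelihood ratios
```

Fusion of the per-vowel and per-intonation recordings then amounts to the same fit applied to a weighted sum of subsystem scores.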


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Unsupervised Data-Driven Feature Vector Normalization With Acoustic Model Adaptation for Robust Speech Recognition

Luis Buera; Antonio Miguel; Oscar Saz; Alfonso Ortega; Eduardo Lleida

In this paper, an unsupervised data-driven robust speech recognition approach is proposed based on joint feature vector normalization and acoustic model adaptation. Feature vector normalization reduces the acoustic mismatch between training and testing conditions by mapping the feature vectors towards the training space. Model adaptation modifies the parameters of the acoustic models to match the test space. However, since neither is optimal, both approaches use an intermediate space between the training and testing spaces to map either the feature vectors or the acoustic models. The joint optimization of both approaches provides a common intermediate space with a better match between normalized feature vectors and adapted acoustic models. In this paper, feature vector normalization is based on a minimum mean square error (MMSE) criterion. A class-dependent multi-environment model linear normalization (CD-MEMLIN) based on two classes (silence/speech) with a cross-probability model (CD-MEMLIN-CPM) is used. CD-MEMLIN-CPM assumes that each class of the clean and noisy spaces can be modeled with a Gaussian mixture model (GMM), training a linear transformation for each pair of Gaussians in an unsupervised data-driven process. This feature vector normalization maps the recognition space feature vectors to a normalized space. The acoustic model adaptation maps the training space to the normalized space by defining a set of linear transformations over an expanded HMM-state space, compensating for those degradations that the feature vector normalization is not able to model, such as rotations. Experiments have been carried out with the Spanish SpeechDat Car and Aurora 2 databases using both the standard mel-frequency cepstral coefficient (MFCC) and advanced ETSI front-ends, with consistent improvements for both corpora and front-ends. Using the standard MFCC front-end, a 92.08% average improvement in WER for Spanish SpeechDat Car and a 69.75% average improvement for the clean-condition evaluation of Aurora 2 were obtained, improving on the results reached with the ETSI advanced front-end (83.28% and 67.41%, respectively). Using the ETSI advanced front-end with the proposed solution, a 75.47% average improvement was obtained for the clean-condition evaluation of the Aurora 2 database.


IEEE Transactions on Speech and Audio Processing | 2005

Speech reinforcement system for car cabin communications

Alfonso Ortega; Eduardo Lleida; Enrique Masgrau

A speech reinforcement system is presented to improve communication between the front and rear passengers in large motor vehicles. This type of communication can be difficult due to a number of factors, including the distance between speakers, noise, and lack of visual contact. The system described makes use of a set of microphones to pick up the speech of each passenger, amplifies these signals, and plays them back into the cabin through the car audio loudspeaker system. The two main problems are noise amplification and electro-acoustic coupling between loudspeakers and microphones. To overcome these problems, the system uses a set of acoustic echo cancellers, echo suppression filters, and noise reduction stages. In this paper, the stability of a speech reinforcement system is studied, and a solution based on echo cancellers and residual echo suppression filters is proposed. A spectral estimation method for the power spectral density of the residual echo remaining after the echo canceller is presented, along with the derivation of the optimal residual echo suppression filter. Results on the performance of the proposed system are also provided.
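A Wiener-style residual echo suppression gain can be sketched per frequency bin as below. The PSD values and gain floor are illustrative; the paper derives its own optimal filter from an estimate of the residual-echo PSD left after the echo canceller:

```python
import numpy as np

def suppression_gain(psd_speech, psd_residual, floor=0.1):
    """Wiener-style gain per bin: attenuate bins dominated by
    residual echo, pass bins dominated by local speech."""
    gain = psd_speech / (psd_speech + psd_residual)
    return np.maximum(gain, floor)   # gain floor limits musical noise

# three illustrative bins: speech-dominated, mixed, echo-dominated
psd_speech = np.array([1.0, 0.5, 0.01])
psd_residual = np.array([0.1, 0.5, 1.0])
g = suppression_gain(psd_speech, psd_residual)
```

Applying such a gain after the echo canceller reduces the loop gain of the loudspeaker-microphone path, which is what keeps the reinforcement system stable.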


Speech Communication | 2012

A prelingual tool for the education of altered voices

William Ricardo Rodríguez; Oscar Saz; Eduardo Lleida

This paper addresses the problem of computer-aided voice therapy for altered voices. The aim of the work is to develop a set of free activities, called PreLingua, for providing interactive voice therapy to individuals with voice disorders. The interactive tools are designed to train voice skills such as voice production, intensity, blowing, vocal onset, phonation time, tone, and vocalic articulation for the Spanish language. The development of these interactive tools, along with the underlying speech technologies that support them, requires speech processing algorithms that are robust to the sources of speech variability characteristic of this population of speakers. One of the main problems addressed is how to reliably estimate formant frequencies in high-pitched speech (typical of children and women) and how to normalize these estimates independently of speaker characteristics. Linear predictive coding, homomorphic analysis, and vocal tract modeling are the core of the speech processing techniques used to allow such normalization through vocal tract length. This paper also presents the results of an experimental study in which PreLingua was applied to a population with voice disorders and pathologies in special education centers in Spain and Colombia. Promising results were obtained in this preliminary study after 12 weeks of therapy, which showed improvements in the voice capabilities of a remarkable number of users and the ability of the tool to educate impaired users with voice alterations. The improvement was assessed through the educators' evaluations before and after the study and through the subjects' performance in the PreLingua activities. These results are very encouraging for continuing work in this direction, with the overall aim of providing further functionality and robustness to the system.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Handling i-vectors from different recording conditions using multi-channel simplified PLDA in speaker recognition

Jesús Villalba; Eduardo Lleida

In this work, we address the problem of handling i-vectors produced under different channel conditions. Traditionally, this problem has been handled by training the PLDA covariance matrices pooling the data of all conditions, or by averaging the covariance matrices of each condition in different ways. We present a PLDA variant, which we call multi-channel SPLDA (MCSPLDA), where the speaker space distribution is common to all i-vectors while the channel space distribution depends on the type of channel in which the segment was recorded. We test our approach on the telephone part of the NIST SRE10 extended condition, to which we added additive noises in the test segments. We compare the results of an SPLDA model trained only on clean data, an SPLDA model trained on pooled noisy and clean data, and our MCSPLDA model.
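The idea of a shared speaker covariance with channel-dependent noise covariances can be sketched with a toy two-covariance likelihood-ratio score. All dimensions, matrices, and i-vectors below are invented, and this is a simplified stand-in rather than the paper's exact MCSPLDA formulation:

```python
import numpy as np

def llr_same_vs_diff(e1, e2, B, W1, W2):
    """Log-likelihood ratio of 'same speaker' vs 'different speakers'
    for two i-vectors, with shared speaker covariance B and
    per-channel within-speaker covariances W1, W2."""
    d = len(e1)
    same = np.block([[B + W1, B], [B, B + W2]])          # shared identity
    diff = np.block([[B + W1, np.zeros((d, d))],
                     [np.zeros((d, d)), B + W2]])        # independent
    x = np.concatenate([e1, e2])
    def logpdf(S):
        _, logdet = np.linalg.slogdet(S)
        return -0.5 * (logdet + x @ np.linalg.solve(S, x))
    return logpdf(same) - logpdf(diff)

# toy 2-dim example: clean enrollment, noisy test channel
d = 2
B = 4.0 * np.eye(d)                          # speaker variability
W_clean, W_noisy = 0.5 * np.eye(d), 2.0 * np.eye(d)
target = llr_same_vs_diff(np.array([2.0, 2.0]), np.array([1.8, 2.1]),
                          B, W_clean, W_noisy)
nontarget = llr_same_vs_diff(np.array([2.0, 2.0]), np.array([-2.0, -1.9]),
                             B, W_clean, W_noisy)
```

Because each i-vector carries the covariance of its own recording condition, a noisy test segment is scored against a wider within-speaker distribution instead of a single pooled one.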


International Conference on Acoustics, Speech, and Signal Processing | 2008

E-inclusion technologies for the speech handicapped

Carlos Vaquero; Oscar Saz; Eduardo Lleida; William Ricardo Rodríguez

This paper addresses the problem that disabled people face when accessing the new systems and technologies available today. The use of speech technologies, especially helpful for motor-handicapped people, becomes unapproachable when these people also suffer speech impairments, widening the societal gap for them. As a way to include speech-impaired people in today's technological society, two lines of work have been carried out. On the one hand, computer-aided speech therapy software has been developed for the speech training of children with different disabilities. This tool, available for free distribution, makes use of different state-of-the-art speech technologies to train different levels of language. As a result of this work, the software is currently being used in several centers for special education, with very encouraging feedback about the capabilities of the system. On the other hand, research on the use of automatic speech recognition (ASR) systems for the speech impaired has been carried out. This work has focused on current speaker adaptation techniques, to determine how these techniques, fruitfully used in other tasks, can deal with this specific kind of speech. The use of maximum a posteriori (MAP) adaptation obtains an improvement of 60.61% compared to the results of a baseline speaker-independent model.
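Relevance-MAP adaptation of GMM means, the standard form of the MAP technique mentioned above, can be sketched as an interpolation between the speaker-independent means and the per-Gaussian data means. The counts, means, and relevance factor below are illustrative values, not the paper's models:

```python
import numpy as np

def map_adapt_means(mu, counts, means_obs, tau=16.0):
    """Relevance-MAP mean update:
    mu_new_k = (n_k * xbar_k + tau * mu_k) / (n_k + tau),
    so well-observed Gaussians move toward the data while
    rarely observed ones stay near the prior."""
    counts = counts[:, None]
    return (counts * means_obs + tau * mu) / (counts + tau)

mu = np.array([[0.0, 0.0], [5.0, 5.0]])        # prior (SI) means
counts = np.array([100.0, 1.0])                # soft occupation counts n_k
means_obs = np.array([[1.0, 1.0], [9.0, 9.0]]) # adaptation-data means
mu_new = map_adapt_means(mu, counts, means_obs)
```

The relevance factor tau controls how much impaired-speech adaptation data is needed before a Gaussian departs from the speaker-independent model.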

Collaboration


Dive into Eduardo Lleida's collaborations.

Top Co-Authors

Oscar Saz
University of Zaragoza

Luis Buera
University of Zaragoza

José B. Mariño
Polytechnic University of Catalonia

Climent Nadeu
Polytechnic University of Catalonia