Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Luis Fernando D'Haro is active.

Publication


Featured research published by Luis Fernando D'Haro.


Speech Communication | 2008

Speech to sign language translation system for Spanish

Rubén San-Segundo; R. Barra; Ricardo de Córdoba; Luis Fernando D'Haro; F. Fernández; Javier Ferreiros; J.M. Lucas; Javier Macias-Guarasa; Juan Manuel Montero; José Manuel Pardo

This paper describes the development of and the first experiments in a Spanish to sign language translation system in a real domain. The developed system focuses on the sentences spoken by an official when assisting people applying for, or renewing their Identity Card. The system translates official explanations into Spanish Sign Language (LSE: Lengua de Signos Española) for Deaf people. The translation system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the hand movements). Two proposals for natural language translation have been evaluated: a rule-based translation module (that computes sign confidence measures from the word confidence measures obtained in the speech recognition module) and a statistical translation module (in this case, parallel corpora were used for training the statistical model). The best configuration reported 31.6% SER (Sign Error Rate) and 0.5780 BLEU (BiLingual Evaluation Understudy). The paper also describes the eSIGN 3D avatar animation module (considering the sign confidence), and the limitations found when implementing a strategy for reducing the delay between the spoken utterance and the sign sequence animation.
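
To make the pipeline concrete, here is a minimal sketch of the three-stage architecture the abstract describes (recognizer output, confidence-propagating translation, avatar playback). All class and function names, the toy lexicon, and the confidence threshold are illustrative assumptions, not the authors' implementation:

```python
# Sketch of the ASR -> translation -> avatar pipeline; names are hypothetical.
from dataclasses import dataclass

@dataclass
class RecognizedWord:
    word: str
    confidence: float  # word confidence from the speech recognizer

def rule_based_translate(words: list[RecognizedWord]) -> list[tuple[str, float]]:
    """Map recognized words to signs, propagating word confidences to sign confidences."""
    # Hypothetical lexicon for illustration; real systems use richer transfer rules.
    lexicon = {"dni": "ID_CARD", "renovar": "RENEW", "foto": "PHOTO"}
    signs = []
    for w in words:
        sign = lexicon.get(w.word.lower())
        if sign is not None:
            signs.append((sign, w.confidence))  # sign confidence = word confidence here
    return signs

def play_signs(signs: list[tuple[str, float]], threshold: float = 0.5) -> None:
    """Stand-in for the 3D avatar module: skip signs whose confidence is too low."""
    for sign, conf in signs:
        if conf >= threshold:
            print(f"avatar plays {sign} (confidence {conf:.2f})")

words = [RecognizedWord("renovar", 0.91), RecognizedWord("DNI", 0.72)]
play_signs(rule_based_translate(words))
```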


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2006

Prosodic and Segmental Rubrics in Emotion Identification

R. Barra; Juan Manuel Montero; Javier Macias-Guarasa; Luis Fernando D'Haro; Rubén San-Segundo; Ricardo de Córdoba

It is well known that the emotional state of a speaker usually alters the way she/he speaks. Although all the components of the voice can be affected by emotion in some statistically-significant way, not all these deviations from a neutral voice are identified by human listeners as conveying emotional information. In this paper we have carried out several perceptual and objective experiments that show the relevance of prosody and segmental spectrum in the characterization and identification of four emotions in Spanish. A Bayes classifier has been used in the objective emotion identification task. Emotion models were generated as the contribution of every emotion to the build-up of a universal background emotion codebook. According to our experiments, surprise is primarily identified by humans through its prosodic rubric (in spite of some automatically-identifiable segmental characteristics), while for anger the situation is just the opposite. Sadness and happiness need a combination of prosodic and segmental rubrics to be reliably identified.
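
As an illustration of the objective identification task, a minimal Bayes-classifier sketch over prosodic features follows. The feature set, the synthetic data, and the diagonal-Gaussian emotion models are assumptions for illustration; the paper's codebook-based emotion models are more elaborate:

```python
# Naive (diagonal) Gaussian Bayes classifier over fake prosodic feature vectors.
import numpy as np

EMOTIONS = ["neutral", "surprise", "anger", "sadness"]

def fit_gaussians(X, y):
    """Per-emotion mean/variance under a diagonal Gaussian assumption."""
    return {e: (X[y == e].mean(axis=0), X[y == e].var(axis=0) + 1e-6) for e in EMOTIONS}

def classify(models, x):
    """Pick the emotion maximizing the Gaussian log-likelihood (equal priors)."""
    def loglik(mu, var):
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    return max(EMOTIONS, key=lambda e: loglik(*models[e]))

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))                       # fake prosodic features (pitch, energy, rate)
y = np.array([EMOTIONS[i % 4] for i in range(400)])
models = fit_gaussians(X, y)
print(classify(models, X[0]))
```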


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Extended phone log-likelihood ratio features and acoustic-based i-vectors for language recognition

Luis Fernando D'Haro; Ricardo de Córdoba; C. Salamea; J. D. Echeverry

This paper presents new techniques with relevant improvements added to the primary system presented by our group to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. In this work, we present the incorporation of an additional phonotactic subsystem based on the use of phone log-likelihood ratio features (PLLR) extracted from different phonotactic recognizers, which improves the accuracy of the system by 21.4% in terms of Cavg (we also present results for the official metric during the evaluation, Fact). We show how using these features at the phone-state level provides significant improvements when used together with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations on these PLLR features, with additional improvements. We also describe some modifications to the MFCC-based acoustic i-vector system which have contributed further improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
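
A hedged sketch of the two operations named above, the PLLR transform and PCA dimensionality reduction, using random stand-ins for the per-frame phone posteriors a phonetic recognizer would produce:

```python
# PLLR features from phone posteriors, then PCA projection; data is synthetic.
import numpy as np

def pllr(posteriors, eps=1e-8):
    """Phone log-likelihood ratio features: log(p / (1 - p)) per phone, per frame."""
    p = np.clip(posteriors, eps, 1 - eps)
    return np.log(p / (1 - p))

def pca(X, n_components):
    """Project frames onto the top principal components of the feature covariance."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
post = rng.dirichlet(np.ones(30), size=500)   # 500 frames x 30 phone posteriors
features = pca(pllr(post), n_components=12)
print(features.shape)                          # (500, 12)
```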


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Low-resource language recognition using a fusion of phoneme posteriorgram counts, acoustic and glottal-based i-vectors

Luis Fernando D'Haro; Ricardo de Córdoba; Miguel Ángel Caraballo; José Manuel Pardo

This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of files available for training the system, especially for the empty condition, where no training data set was provided, only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions, using both the metrics proposed for the evaluation and the Cavg metric, are presented in the paper.
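
The fusion step might look like the following sketch. The per-language scores and the fusion weights are invented here; in practice the weights are trained on a development set (e.g. with logistic regression):

```python
# Score-level fusion of three subsystems (MFCC i-vectors, phone-posteriorgram
# counts, RPLP i-vectors); all numbers below are synthetic placeholders.
import numpy as np

def fuse(scores_per_system, weights, offset=0.0):
    """Weighted sum of per-language score vectors from each subsystem."""
    return offset + sum(w * s for w, s in zip(weights, scores_per_system))

# Fake per-language scores (4 target languages) from the three subsystems.
mfcc  = np.array([1.2, -0.3,  0.1, -2.0])
phono = np.array([0.8,  0.1, -0.5, -1.1])
rplp  = np.array([1.0, -0.2,  0.3, -1.7])

fused = fuse([mfcc, phono, rplp], weights=[0.5, 0.3, 0.2])
print("decision:", int(np.argmax(fused)))  # index of the top-scoring language
```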


IEEE Aerospace and Electronic Systems Magazine | 2006

Automatic Understanding of ATC Speech

F. Fernández; Javier Ferreiros; José Manuel Pardo; Valentín Sama; R. de Córdoba; J. Macias-Guarasa; Juan Manuel Montero; R. San Segundo; Luis Fernando D'Haro; M. Santamaria; G. Gonzalez

In this paper we make a critical review of the state of the art in automatic speech processing as applied to air traffic control. We present the development of a new ATC speech understanding system, comparing its performance and advantages to previously published experiences. The system has innovative solutions such as detecting the air/ground language spoken by air traffic controllers in an international airport with two official languages, and the ability to adapt to new situations by automatically learning stochastic grammars from data, eliminating the need to write expensive and eternally incomplete grammars. A relevant new feature is the use of a speech understanding module able to extract semantically relevant information from the transcriptions delivered by the speech recognizers. Two main assessment objectives are pursued and discussed throughout the paper: the effects of human spontaneity and of the lack of linguistic coverage on understanding performance. The potential of this technology, ways of improvement, and proposals for the future are also presented.
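
As a toy illustration of learning a stochastic grammar from data, the sketch below estimates a maximum-likelihood bigram grammar from two invented ATC transcriptions; the deployed system learns far richer structures than this:

```python
# Bigram grammar estimation from transcribed utterances; corpus is invented.
from collections import Counter, defaultdict

corpus = [
    "iberia three two one climb to flight level eight zero",
    "iberia three two one contact approach",
]

counts = defaultdict(Counter)
for utt in corpus:
    words = ["<s>"] + utt.split() + ["</s>"]
    for prev, cur in zip(words, words[1:]):
        counts[prev][cur] += 1

def bigram_prob(prev, cur):
    """Maximum-likelihood bigram probability P(cur | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0

print(bigram_prob("three", "two"))  # 1.0 in this toy corpus
```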


IEEE Odyssey - The Speaker and Language Recognition Workshop | 2006

Integration of acoustic information and PPRLM scores in a multiple-Gaussian classifier for Language Identification

Ricardo de Córdoba; Rubén San-Segundo; J. Macias; Juan Manuel Montero; R. Barra; Luis Fernando D'Haro; J.C. Plaza; Javier Ferreiros

In this paper, we present several innovative techniques that can be applied in a PPRLM system for language identification (LID). We show how we obtained a 53.5% relative error reduction over our base system using several techniques. First, the application of a variable threshold in score computation, dependent on the average scores in the language model, provided a 35% error reduction. A random selection of sentences for the different sets and the use of silence models also improved the system. Then, to improve the classifier, we compared the bias removal technique (up to 19% error reduction) and a Gaussian classifier (up to 37% error reduction). Finally, we included the acoustic score in the Gaussian classifier (2% error reduction) and increased the number of Gaussians to obtain a multiple-Gaussian classifier (14% error reduction). All these improvements are remarkable, as they have been mostly additive.
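
A minimal sketch of the multiple-Gaussian classifier stage, assuming each language's vector of PPRLM (and acoustic) scores is modeled with a small Gaussian mixture and the most likely model wins; all data below is synthetic:

```python
# Per-language Gaussian mixtures over score vectors; synthetic training data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
langs = ["es", "en", "fr"]
# Fake per-utterance PPRLM score triples for each language.
train = {l: rng.normal(loc=i, scale=0.5, size=(100, 3)) for i, l in enumerate(langs)}

models = {l: GaussianMixture(n_components=2, random_state=0).fit(X)
          for l, X in train.items()}

def identify(score_vec):
    """Return the language whose mixture assigns the highest log-likelihood."""
    return max(langs, key=lambda l: models[l].score_samples(score_vec[None])[0])

print(identify(rng.normal(loc=1, size=3)))  # expected "en" under this toy setup
```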


Human-Agent Interaction | 2016

A Web-based Platform for Collection of Human-Chatbot Interactions

Lue Lin; Luis Fernando D'Haro; Rafael E. Banchs

Over recent years, the world has seen multiple uses for conversational agents. Chatbots have been integrated into e-commerce systems, such as Amazon Echo's Alexa [1]. Businesses and organizations like Facebook are also implementing bots into their applications. While a number of chatbot platforms exist, there are still difficulties in creating data-driven systems, as a large amount of data is needed for development and training. In this paper we describe an advanced platform for evaluating and annotating human-chatbot interactions, its main features and goals, as well as our future plans for it.


IEEE Latin America Transactions | 2009

TelecomI+D04: Speech into Sign Language Statistical Translation System for Deaf People

B. Gallo; Rubén San-Segundo; J.M. Lucas; R. Barra; Luis Fernando D'Haro; F. Fernandez

This paper presents a set of experiments used to develop a statistical system for translating speech into sign language for deaf people. This system is composed of an Automatic Speech Recognition (ASR) system, followed by a statistical translation module and an animated agent that represents the different signs. Two different approaches have been used to perform the translations: a phrase-based system and a finite state transducer. For the evaluation, the following figures have been considered: WER (Word Error Rate), BLEU and NIST. The paper presents translation results for reference sentences and for sentences from the automatic speech recognizer. Also, three different configurations have been evaluated for the speech recognizer. The best results were obtained with the finite state transducer, with a word error rate of 28.21% for the reference text, and 29.27% using the ASR output.
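
For reference, WER figures like those quoted above follow from the word-level edit distance normalized by reference length; a small self-contained sketch with invented sentences:

```python
# WER as Levenshtein distance over words divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(f"{wer('renovar el dni con foto', 'renovar dni con la foto'):.2%}")  # 40.00%
```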


IEEE Aerospace and Electronic Systems Magazine | 2006

Air traffic control speech recognition system cross-task & speaker adaptation

R. de Córdoba; Javier Ferreiros; Rubén San-Segundo; Javier Macias-Guarasa; Juan Manuel Montero; F. Fernández; Luis Fernando D'Haro; José Manuel Pardo

We present an overview of the most common techniques used in automatic speech recognition to adapt a general system to a different environment (known as cross-task adaptation), such as an air traffic control (ATC) system. The conditions present in ATC are very specific: highly spontaneous speech, the presence of noise, and high speech rates, so a typical speech recognizer yields unsatisfactory results. We have to decide on the best option for the modeling: to develop acoustic models specific to those conditions from scratch using the data available for the new environment, or to carry out cross-task adaptation starting from reliable HMM models (usually requiring less data in the target domain). We begin with a description of the main techniques considered for cross-task adaptation, namely maximum a posteriori (MAP), maximum likelihood linear regression (MLLR), and the two together. We have applied each in two speech recognizers for air traffic control tasks, one for spontaneous speech and the other for a command interface. We show the performance of these techniques and compare them with the development of a new system from scratch. We also show the results obtained for speaker adaptation using a variable amount of adaptation data. The main conclusion is that MLLR can outperform MAP when a large number of transforms is used, and that MLLR followed by MAP is the best option. All of these techniques are better than developing a new system from scratch, showing the effectiveness of mean and variance adaptation.
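
A hedged sketch of MAP mean adaptation as discussed above: the adapted mean interpolates between the source-model prior and the new-domain data, mu_map = (tau * mu_prior + n * xbar) / (tau + n), where the relevance factor tau (set arbitrarily here) controls the balance:

```python
# MAP adaptation of a Gaussian mean; prior model and frames are synthetic.
import numpy as np

def map_adapt_mean(mu_prior, adaptation_frames, tau=16.0):
    """With few frames the prior dominates; with many, the new-domain data does."""
    n = len(adaptation_frames)
    xbar = np.mean(adaptation_frames, axis=0)
    return (tau * mu_prior + n * xbar) / (tau + n)

mu_prior = np.zeros(3)                                            # source-model mean
frames = np.random.default_rng(0).normal(loc=1.0, size=(50, 3))   # ATC-domain data
print(map_adapt_mean(mu_prior, frames))                           # pulled toward 1.0 as n grows
```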


Odyssey 2016 | 2016

On the use of phone-gram units in recurrent neural networks for language identification

Christian Salamea; Luis Fernando D'Haro; Ricardo de Córdoba; Rubén San-Segundo

In this paper we present our results on using RNN-based LM scores trained on different phone-gram orders and using different phonetic ASR recognizers. In order to avoid data sparseness problems and to reduce the vocabulary of all possible n-gram combinations, a K-means clustering procedure was performed using phone-vector embeddings as a pre-processing step. Additional experiments to optimize the number of classes, batch size, hidden neurons, and state-unfolding are also presented. We have worked with the KALAKA-3 database for the plenty-closed condition [1]. Thanks to our clustering technique and the combination of high-level phone-grams, our phonotactic system performs ~13% better than the unigram-based RNNLM system. Also, the obtained RNNLM scores are calibrated and fused with other scores from an acoustic-based i-vector system and a traditional PPRLM system. This fusion provides additional improvements, showing that they provide complementary information to the LID system.
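
The clustering pre-processing step might be sketched as below, with random stand-ins for the phone-gram embedding vectors; the RNN LM is then trained over cluster IDs rather than the full phone-gram vocabulary:

```python
# K-means over phone-gram embeddings to shrink the RNN LM vocabulary.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
phone_grams = [f"pg{i}" for i in range(2000)]     # all observed phone-grams
embeddings = rng.normal(size=(2000, 64))          # their learned vectors (stand-ins)

kmeans = KMeans(n_clusters=128, n_init=10, random_state=0).fit(embeddings)
cluster_of = dict(zip(phone_grams, kmeans.labels_))

# The RNN LM is then trained on cluster-ID sequences instead of raw phone-grams.
print(cluster_of["pg0"], "of", kmeans.n_clusters, "classes")
```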

Collaboration


Dive into Luis Fernando D'Haro's collaborations.

Top Co-Authors

Ricardo de Córdoba (Technical University of Madrid)
Juan Manuel Montero (Technical University of Madrid)
Javier Ferreiros (Technical University of Madrid)
José Manuel Pardo (Technical University of Madrid)
Rubén San-Segundo (Technical University of Madrid)
F. Fernández (Technical University of Madrid)