Ricardo de Córdoba
Technical University of Madrid
Publication
Featured research published by Ricardo de Córdoba.
Speech Communication | 2008
Rubén San-Segundo; R. Barra; Ricardo de Córdoba; Luis Fernando D'Haro; F. Fernández; Javier Ferreiros; J.M. Lucas; Javier Macias-Guarasa; Juan Manuel Montero; José Manuel Pardo
This paper describes the development of and the first experiments in a Spanish to sign language translation system in a real domain. The developed system focuses on the sentences spoken by an official when assisting people applying for or renewing their Identity Card. The system translates official explanations into Spanish Sign Language (LSE: Lengua de Signos Española) for Deaf people. The translation system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the hand movements). Two proposals for natural language translation have been evaluated: a rule-based translation module (that computes sign confidence measures from the word confidence measures obtained in the speech recognition module) and a statistical translation module (in this case, parallel corpora were used for training the statistical model). The best configuration reported 31.6% SER (Sign Error Rate) and 0.5780 BLEU (BiLingual Evaluation Understudy). The paper also describes the eSIGN 3D avatar animation module (considering the sign confidence), and the limitations found when implementing a strategy for reducing the delay between the spoken utterance and the sign sequence animation.
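The abstract reports SER without defining it. A minimal sketch follows, assuming SER is computed like word error rate: the edit distance (substitutions, insertions, deletions) between the reference and hypothesis sign sequences, divided by the reference length. The function names and the example glosses are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two sign sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference signs
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def sign_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical glosses, for illustration only.
reference = ["YOU", "PHOTO", "BRING", "MUST"]
hypothesis = ["YOU", "PHOTO", "MUST"]
print(sign_error_rate(reference, hypothesis))
```

Under this assumed definition, one dropped sign out of four reference signs gives an SER of 0.25.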
international conference on acoustics, speech, and signal processing | 2006
R. Barra; Juan Manuel Montero; Javier Macias-Guarasa; Luis Fernando D'Haro; Rubén San-Segundo; Ricardo de Córdoba
It is well known that the emotional state of a speaker usually alters the way she/he speaks. Although all the components of the voice can be affected by emotion in some statistically-significant way, not all these deviations from a neutral voice are identified by human listeners as conveying emotional information. In this paper we have carried out several perceptual and objective experiments that show the relevance of prosody and segmental spectrum in the characterization and identification of four emotions in Spanish. A Bayes classifier has been used in the objective emotion identification task. Emotion models were generated as the contribution of every emotion to the build-up of a universal background emotion codebook. According to our experiments, surprise is primarily identified by humans through its prosodic rubric (in spite of some automatically-identifiable segmental characteristics); while for anger the situation is just the opposite. Sadness and happiness need a combination of prosodic and segmental rubrics to be reliably identified.
Pattern Analysis and Applications | 2012
Rubén San-Segundo; Juan Manuel Montero; Ricardo de Córdoba; V. Sama; F. Fernández; L. F. D’Haro; V. López-Ludeña; D. Sánchez; A. García
This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their Driver’s License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation. This evaluation was carried out in the Local Traffic Office in Toledo involving real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and a discussion on how to solve them (some of them specific for LSE).
international conference on acoustics, speech, and signal processing | 2014
Luis Fernando D'Haro; Ricardo de Córdoba; C. Salamea; J. D. Echeverry
This paper presents new techniques with relevant improvements added to the primary system presented by our group to the Albayzin 2012 LRE competition, where the use of any additional corpora for training or optimizing the models was forbidden. In this work, we present the incorporation of an additional phonotactic subsystem based on the use of phone log-likelihood ratio features (PLLR) extracted from different phonotactic recognizers, which helps improve the accuracy of the system by 21.4% in terms of Cavg (we also present results for the official metric during the evaluation, Fact). We will present how using these features at the phone state level provides significant improvements, when used together with dimensionality reduction techniques, especially PCA. We have also experimented with applying alternative SDC-like configurations on these PLLR features with additional improvements. Also, we will describe some modifications to the MFCC-based acoustic i-vector system which have also contributed to additional improvements. The final fused system outperformed the baseline by 27.4% in Cavg.
Journal of Visual Languages and Computing | 2008
Rubén San-Segundo; Juan Manuel Montero; Javier Macias-Guarasa; Ricardo de Córdoba; Javier Ferreiros; José Manuel Pardo
This article describes an architecture for translating speech into Spanish Sign Language (SSL). The architecture proposed is made up of four modules: speech recognizer, semantic analysis, gesture sequence generation and gesture playing. For the speech recognizer and the semantic analysis modules, we use software developed by IBM and CSLR (Center for Spoken Language Research at University of Colorado), respectively. Gesture sequence generation and gesture animation are the modules on which we have focused our main effort. Gesture sequence generation uses semantic concepts (obtained from the semantic analysis) associating them with several SSL gestures. This association is carried out based on a number of generation rules. For gesture animation, we have developed an animated agent (virtual representation of a human person) and a strategy for reducing the effort in gesture animation. This strategy consists of making the system automatically generate all agent positions necessary for the gesture animation. In this process, the system uses a few main agent positions (two or three per second) and some interpolation strategies, both issues previously generated by the service developer (the person who adapts the architecture proposed in this paper to a specific domain). Related to this module, we propose a distance between agent positions and a measure of gesture complexity. This measure can be used to analyze the gesture perception versus its complexity. With the architecture proposed, we are not trying to build a domain independent translator but a system able to translate speech utterances into gesture sequences in a restricted domain: railway, flights or weather information.
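The keyframe strategy described above can be sketched as follows. This is only an illustration under stated assumptions: an agent position is modelled as a flat list of joint values, intermediate positions come from plain linear interpolation, and the distance between positions is Euclidean. The abstract does not say which interpolation strategies or which distance the authors use, so `interpolate` and `position_distance` are hypothetical names for swapped-in techniques.

```python
import math
from typing import List

Position = List[float]  # assumed representation: one value per joint


def interpolate(key_a: Position, key_b: Position, steps: int) -> List[Position]:
    """Linearly interpolate between two key positions.

    Returns steps + 1 positions, including both key positions.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if len(key_a) != len(key_b):
        raise ValueError("key positions must have the same number of joints")
    return [
        [a + (b - a) * t / steps for a, b in zip(key_a, key_b)]
        for t in range(steps + 1)
    ]


def position_distance(pos_a: Position, pos_b: Position) -> float:
    """Euclidean distance between two positions (an assumed metric)."""
    if len(pos_a) != len(pos_b):
        raise ValueError("positions must have the same number of joints")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_a, pos_b)))


# Two hypothetical key positions with two joints each.
frames = interpolate([0.0, 10.0], [4.0, 20.0], steps=4)
print(frames[2])
print(position_distance([0.0, 0.0], [3.0, 4.0]))
```

With two or three key positions per second, as the abstract mentions, the number of steps between keys would follow from the target frame rate of the animation.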
Speech Communication | 2002
Rubén San-Segundo; José Colás; Ricardo de Córdoba; José Manuel Pardo
In this paper we present a hypothesis-verification approach for a Spanish recognizer of continuously spelled names over the telephone. We give a detailed description of the spelling task for Spanish where the most confusable letter sets are described. We introduce a new HMM topology with contextual silences incorporated into the letter model to deal with pauses between letters, increasing the Letter Accuracy by 6.6 points compared with a single silence model approach. For the final configuration of the hypothesis step we obtain a Letter Accuracy of 88.1% and a Name Recognition Rate of 94.2% for a 1,000-name dictionary. In this configuration, we also use noise models for reducing letter insertions, and a Letter Graph to incorporate N-gram language models and to calculate the N-best letter sequences. In the verification step, we consider the M-best candidates provided by the hypothesis step. We evaluate the whole system for different dictionaries, obtaining more than 90.0% Name Recognition Rate for a 10,000-name dictionary. Finally, we demonstrate the utility of incorporating a Spelled Name Recognizer in a Directory Assistance Service over the telephone, increasing the percentage of calls automatically serviced from 39.4% to 58.7%.
annual meeting of the special interest group on discourse and dialogue | 2001
Rubén San-Segundo; Juan Manuel Montero; Javier Ferreiros; Ricardo de Córdoba; José Manuel Pardo
In this paper, we propose an approach for designing the confirmation strategies in a Railway Information system for Spanish, based on confidence measures obtained from recognition. We also present several error recovery and user modelling techniques incorporated in this system. In the field evaluation, it is shown that more than 60% of the confirmations were implicit ones. This kind of confirmation, in combination with fast error recovery and user modelling techniques, makes the dialogue faster, obtaining a mean call duration of 204 seconds.
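One common way to base confirmation on recognition confidence is a pair of thresholds, sketched below. The abstract does not give the actual decision rule, so the three-way split, the threshold values, and the names `Confirmation` and `choose_confirmation` are all assumptions made for illustration.

```python
from enum import Enum


class Confirmation(Enum):
    IMPLICIT = "implicit"  # carry on and mention the value in the next prompt
    EXPLICIT = "explicit"  # ask the caller to confirm the value directly
    REJECT = "reject"      # discard the value and ask again


def choose_confirmation(confidence: float,
                        high: float = 0.8,
                        low: float = 0.4) -> Confirmation:
    """Pick a confirmation type from a confidence score in [0, 1].

    The thresholds are hypothetical; a real system would tune them on data.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= high:
        return Confirmation.IMPLICIT
    if confidence >= low:
        return Confirmation.EXPLICIT
    return Confirmation.REJECT


for score in (0.95, 0.55, 0.10):
    print(score, choose_confirmation(score).value)
```

Raising the `high` threshold trades dialogue speed for safety: fewer values are confirmed implicitly, so calls get longer but fewer misrecognitions slip through.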
international conference on acoustics, speech, and signal processing | 2002
Ricardo de Córdoba; Philip C. Woodland; Mark J. F. Gales
This paper investigates the cross-task recognition and adaptation performance of HMMs trained using either conventional maximum likelihood estimation or the discriminative maximum mutual information estimation (MMIE) criterion. Initial experiments used models trained on the low noise North American Business news corpus of read speech. Cross-task testing on Broadcast News data showed that the MMIE models yielded lower error rates both across-task and within-task. This result was confirmed using models trained on the Switchboard corpus which were tested on Voicemail (VM) data. This setup was also used to investigate the performance of task-adaptation when using a limited amount of VM data for both acoustic and language modelling. The setup that gave the best performance on the VM test data used Switchboard models trained using MMIE and then adapted to VM data using maximum a posteriori adaptation techniques.
Computer Speech & Language | 2012
Verónica López-Ludeña; Rubén San-Segundo; Juan Manuel Montero; Ricardo de Córdoba; Javier Ferreiros; José Manuel Pardo
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list with Spanish words (vocabulary) and associated tags used by this module is computed automatically considering those signs that show the highest probability of being the translation of every Spanish word. This automatic tag extraction has been compared to a manual strategy achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's Licenses. In order to evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translation has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module has given rise to an increase in BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and an increase in the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
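The tagging step can be sketched roughly as below, assuming a table of word-to-sign translation probabilities is already available (for example from a word alignment model). Each word is mapped to its most probable sign, and words whose best probability falls below a threshold are treated as non-relevant and mapped to a single placeholder tag. The threshold, the `<NR>` tag, the function names, and the toy probabilities are hypothetical; the abstract says several alternatives for non-relevant words were studied, and this shows only one possible choice.

```python
from typing import Dict, List

NON_RELEVANT = "<NR>"  # hypothetical placeholder tag for non-relevant words


def build_tag_table(translation_probs: Dict[str, Dict[str, float]],
                    min_prob: float = 0.5) -> Dict[str, str]:
    """Map each Spanish word to the sign with the highest probability.

    Words with no candidate sign at or above min_prob get NON_RELEVANT.
    """
    table: Dict[str, str] = {}
    for word, sign_probs in translation_probs.items():
        if sign_probs:
            best_sign, best_prob = max(sign_probs.items(), key=lambda kv: kv[1])
        else:
            best_sign, best_prob = NON_RELEVANT, 0.0
        table[word] = best_sign if best_prob >= min_prob else NON_RELEVANT
    return table


def preprocess(sentence: str, table: Dict[str, str]) -> List[str]:
    """Replace every word with its tag; unknown words become NON_RELEVANT."""
    return [table.get(word, NON_RELEVANT) for word in sentence.lower().split()]


# Toy probabilities, invented for illustration only.
probs = {
    "renovar": {"RENOVAR": 0.90, "DNI": 0.05},
    "el": {"DNI": 0.20},
    "dni": {"DNI": 0.95},
}
table = build_tag_table(probs)
print(preprocess("Renovar el DNI", table))
```

In this sketch the tagged sequence, rather than the raw word sequence, would be the input to the statistical translator, which reduces vocabulary size when training data is sparse.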
international conference on acoustics, speech, and signal processing | 2013
Luis Fernando D'Haro; Ricardo de Córdoba; Miguel Ángel Caraballo; José Manuel Pardo
This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of available files for training the system, especially for the empty condition where no training data set was provided but only a development set. In addition, the whole database was created from online videos and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions using the proposed metrics for the evaluation and the Cavg metric are presented in the paper.