Publication


Featured research published by Javier Ferreiros.


Speech Communication | 2008

Speech to sign language translation system for Spanish

Rubén San-Segundo; R. Barra; Ricardo de Córdoba; Luis Fernando D'Haro; F. Fernández; Javier Ferreiros; J.M. Lucas; Javier Macias-Guarasa; Juan Manuel Montero; José Manuel Pardo

This paper describes the development of and the first experiments in a Spanish to sign language translation system in a real domain. The developed system focuses on the sentences spoken by an official when assisting people applying for, or renewing their Identity Card. The system translates official explanations into Spanish Sign Language (LSE: Lengua de Signos Española) for Deaf people. The translation system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the hand movements). Two proposals for natural language translation have been evaluated: a rule-based translation module (that computes sign confidence measures from the word confidence measures obtained in the speech recognition module) and a statistical translation module (in this case, parallel corpora were used for training the statistical model). The best configuration reported 31.6% SER (Sign Error Rate) and 0.5780 BLEU (BiLingual Evaluation Understudy). The paper also describes the eSIGN 3D avatar animation module (considering the sign confidence), and the limitations found when implementing a strategy for reducing the delay between the spoken utterance and the sign sequence animation.
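The three-stage pipeline the abstract describes (speech recognizer, natural language translator, avatar animation) can be sketched as below. All names, mappings, and confidence values are hypothetical placeholders, not the authors' implementation; the rule-based branch simply propagates word confidences to signs, as the abstract outlines.

```python
# Minimal sketch of the speech-to-sign pipeline described above.
# Every identifier and value here is an illustrative assumption.

def recognize(audio):
    """Speech recognizer stage: decode audio into (word, confidence) pairs."""
    # A real system would run an ASR engine; we return a fixed example.
    return [("renueve", 0.9), ("su", 0.8), ("documento", 0.95)]

RULES = {
    "renueve": "RENOVAR",      # toy rule-based word-to-sign mapping
    "documento": "DOCUMENTO",
}

def translate(words):
    """Rule-based translation: map words to signs, propagating confidence."""
    signs = []
    for word, conf in words:
        if word in RULES:
            signs.append((RULES[word], conf))  # sign inherits word confidence
    return signs

def animate(signs):
    """Avatar stage: play back each sign (here, just return their names)."""
    return [sign for sign, conf in signs]

played = animate(translate(recognize(None)))
```

A usage note: in the real system the avatar stage also consults the sign confidence, e.g. to flag uncertain signs to the viewer.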


Interacting with Computers | 2010

Spoken Spanish generation from sign language

Rubén San-Segundo; José Manuel Pardo; Javier Ferreiros; V. Sama; Roberto Barra-Chicote; J.M. Lucas; D. Sánchez; A. García

This paper describes the development of a Spoken Spanish generator from sign-writing. The sign language considered was the Spanish sign language (LSE: Lengua de Signos Espanola). This system consists of an advanced visual interface (where a deaf person can specify a sequence of signs in sign-writing), a language translator (for generating the sequence of words in Spanish), and finally, a text to speech converter. The visual interface allows a sign sequence to be defined using several sign-writing alternatives. The paper details the process for designing the visual interface proposing solutions for HCI-specific challenges when working with the Deaf (i.e. important difficulties in writing Spanish or limited sign coverage for describing abstract or conceptual ideas). Three strategies were developed and combined for language translation to implement the final version of the language translator module. The summative evaluation, carried out with Deaf from Madrid and Toledo, includes objective measurements from the system and subjective information from questionnaires. The paper also describes the first Spanish-LSE parallel corpus for language processing research focused on specific domains. This corpus includes more than 4000 Spanish sentences translated into LSE. These sentences focused on two restricted domains: the renewal of the identity document and drivers license. This corpus also contains all sign descriptions in several sign-writing specifications generated with a new version of the eSign Editor. This new version includes a grapheme to phoneme system for Spanish and a SEA-HamNoSys converter.


Journal of Visual Languages and Computing | 2008

Proposing a speech to gesture translation architecture for Spanish deaf people

Rubén San-Segundo; Juan Manuel Montero; Javier Macias-Guarasa; Ricardo de Córdoba; Javier Ferreiros; José Manuel Pardo

This article describes an architecture for translating speech into Spanish Sign Language (SSL). The architecture proposed is made up of four modules: speech recognizer, semantic analysis, gesture sequence generation and gesture playing. For the speech recognizer and the semantic analysis modules, we use software developed by IBM and CSLR (Center for Spoken Language Research at University of Colorado), respectively. Gesture sequence generation and gesture animation are the modules on which we have focused our main effort. Gesture sequence generation uses semantic concepts (obtained from the semantic analysis) associating them with several SSL gestures. This association is carried out based on a number of generation rules. For gesture animation, we have developed an animated agent (virtual representation of a human person) and a strategy for reducing the effort in gesture animation. This strategy consists of making the system automatically generate all agent positions necessary for the gesture animation. In this process, the system uses a few main agent positions (two or three per second) and some interpolation strategies, both issues previously generated by the service developer (the person who adapts the architecture proposed in this paper to a specific domain). Related to this module, we propose a distance between agent positions and a measure of gesture complexity. This measure can be used to analyze the gesture perception versus its complexity. With the architecture proposed, we are not trying to build a domain independent translator but a system able to translate speech utterances into gesture sequences in a restricted domain: railway, flights or weather information.
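The animation-effort strategy described above, a few key agent positions per second plus interpolation, together with a distance between agent positions, can be sketched as follows. Linear interpolation over joint vectors and the absolute-difference distance are assumptions for illustration; the paper's actual interpolation strategies and distance may differ.

```python
# Sketch of generating intermediate agent positions between a few
# developer-supplied key positions, plus a simple position distance.
# Linear interpolation is an assumed stand-in for the paper's strategies.

def interpolate(key_a, key_b, steps):
    """Generate `steps` intermediate positions between two key positions."""
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from key_a to key_b
        frames.append([a + t * (b - a) for a, b in zip(key_a, key_b)])
    return frames

def distance(pos_a, pos_b):
    """A simple distance between positions: sum of absolute joint deltas."""
    return sum(abs(a - b) for a, b in zip(pos_a, pos_b))

# Two key positions (three joint values each) expanded into 4 in-between frames.
frames = interpolate([0.0, 0.0, 0.0], [1.0, 2.0, -1.0], steps=4)
```

Summing such distances along a gesture gives one plausible complexity measure of the kind the abstract mentions.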


Speech Communication | 1999

Improving continuous speech recognition in Spanish by phone-class semicontinuous HMMs with pausing and multiple pronunciations

Javier Ferreiros; José Manuel Pardo

This paper presents a comprehensive study of continuous speech recognition in Spanish. It shows the use and optimisation of several well-known techniques together with the application for the first time to Spanish of language specific knowledge to these systems, i.e. the careful selection of the phone inventory, the phone-classes used, and the selection of alternative pronunciation rules. We have developed a semicontinuous phone-class dependent contextual modelling. Using four phone-classes, we have obtained recognition error rate reductions roughly equivalent to the percentage increase of the number of parameters, compared to baseline semicontinuous contextual modelling. We also show that the use of pausing in the training system and multiple pronunciations in the vocabulary help to improve recognition rates significantly. The actual pausing of the training sentences and the application of assimilation effects improve the transcription into context-dependent units. Multiple pronunciation possibilities are generated using general rules that are easily applied to any Spanish vocabulary. With all these ideas we have reduced the recognition errors of the baseline system by more than 30% in a task parallel to DARPA-RM translated into Spanish with a vocabulary of 979 words. Our database contains four speakers with 600 training sentences and 100 testing sentences each. All experiments have been carried out with a perplexity of 979, and even slightly higher in the case of multiple pronunciations, to be able to study the acoustic modelling power of the systems with no grammar constraints.
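The idea of generating multiple pronunciations from general rules applicable to any Spanish vocabulary can be sketched as below. The two rules shown (elision of intervocalic /d/ in "-ado" endings and relaxation of word-final /d/) are well-known Spanish phenomena used here purely as examples; the paper's actual rule set is not reproduced.

```python
# Illustrative sketch of rule-generated alternative pronunciations.
# The rules below are example Spanish relaxation phenomena, chosen by us,
# not necessarily the ones used in the paper.

def variants(phones):
    """Return the canonical pronunciation plus rule-generated alternatives."""
    alts = [phones]
    # Rule: intervocalic 'd' in final '-ado' may be elided ("cansado" -> "cansao").
    if phones[-3:] == ["a", "d", "o"]:
        alts.append(phones[:-3] + ["a", "o"])
    # Rule: word-final 'd' may be dropped ("madrid" -> "madri").
    if phones[-1] == "d":
        alts.append(phones[:-1])
    return alts

prons = variants(list("cansado"))
```

Because the rules inspect only the phone string, they apply uniformly to any vocabulary, which is the property the abstract highlights.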


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Speaker Diarization Based on Intensity Channel Contribution

Roberto Barra-Chicote; José Manuel Pardo; Javier Ferreiros; Juan Manuel Montero

The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC), based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement on our development data, a 3.6% DER relative improvement on the RT07 evaluation data and a 7.9% DER relative improvement on last year's RT09 evaluation data.
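The ICC idea as stated, each channel's energy relative to the total energy across channels, can be sketched per frame as below. Framing, windowing, and any normalisation the paper applies are omitted; the silent-frame handling is our assumption.

```python
# Sketch of the intensity channel contribution (ICC) described above:
# the energy of each channel relative to the total over all channels.
# Feature extraction details (framing, normalisation) are not from the paper.

def icc(frame_by_channel):
    """frame_by_channel: one frame's samples, as a list per microphone channel."""
    energies = [sum(s * s for s in ch) for ch in frame_by_channel]
    total = sum(energies)
    if total == 0:
        # Assumed handling of a silent frame: uniform contribution.
        return [1.0 / len(energies)] * len(energies)
    return [e / total for e in energies]

# Three channels: the middle microphone receives no signal in this frame.
features = icc([[1.0, 1.0], [0.0, 0.0], [1.0, -1.0]])
```

By construction the ICC values sum to one, so they describe where the energy arrives rather than how loud the frame is.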


Spoken Language Technology Workshop | 2008

Evaluation of a spoken dialogue system for controlling a Hifi audio system

F. Fernandez Martinez; J. Blazquez; Javier Ferreiros; R. Barra; Javier Macias-Guarasa; J.M. Lucas-Cuesta

In this paper, a Bayesian Networks (BNs) approach to dialogue modelling is evaluated in terms of a battery of both subjective and objective metrics. A significant effort has been made to improve the contextual information handling capabilities of the system. Consequently, besides typical dialogue usability measurements such as task or dialogue completion rates, dialogue time, etc., we have included a new figure measuring the contextuality of the dialogue as the number of turns where contextual information is helpful for dialogue resolution. The evaluation is carried out through a set of predefined scenarios according to different initiative styles, focusing on the impact of the users' level of experience.


Annual Meeting of the Special Interest Group on Discourse and Dialogue | 2001

Designing confirmation mechanisms and error recovery techniques in a Railway Information system for Spanish

Rubén San-Segundo; Juan Manuel Montero; Javier Ferreiros; Ricardo de Córdoba; José Manuel Pardo

In this paper, we propose an approach for designing the confirmation strategies in a Railway Information system for Spanish, based on confidence measures obtained from recognition. We also present several error recovery and user modelling techniques incorporated in this system. The field evaluation shows that more than 60% of the confirmations were implicit. This kind of confirmation, in combination with fast error recovery and user modelling techniques, makes the dialogue faster, yielding a mean call duration of 204 seconds.
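A confirmation policy driven by recognition confidence, of the kind described above, is commonly implemented by thresholding: high-confidence slots get an implicit confirmation folded into the next prompt, mid-confidence slots an explicit question, and low-confidence slots a re-prompt. The thresholds and prompts below are illustrative assumptions, not the system's actual values.

```python
# Sketch of a confidence-driven confirmation policy like the one described.
# Thresholds and prompt wording are hypothetical.

HIGH, LOW = 0.8, 0.4  # assumed confidence thresholds

def confirmation(slot, value, conf):
    """Choose a confirmation strategy for a recognized slot value."""
    if conf >= HIGH:
        # Implicit: echo the value inside the next question.
        return ("implicit", f"Travelling to {value}. At what time?")
    if conf >= LOW:
        # Explicit: ask the user to confirm before moving on.
        return ("explicit", f"Did you say {value}?")
    # Too uncertain: ask again.
    return ("reprompt", f"Sorry, which {slot} was that?")

kind, prompt = confirmation("destination", "Sevilla", 0.91)
```

Implicit confirmations save a turn each time they succeed, which is consistent with the shorter call durations the evaluation reports.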


Computer Speech & Language | 2012

Automatic categorization for improving Spanish into Spanish Sign Language machine translation

Verónica López-Ludeña; Rubén San-Segundo; Juan Manuel Montero; Ricardo de Córdoba; Javier Ferreiros; José Manuel Pardo

This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Española: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list with Spanish words (vocabulary) and associated tags used by this module is computed automatically considering those signs that show the highest probability of being the translation of every Spanish word. This automatic tag extraction has been compared to a manual strategy, achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's Licenses. In order to evaluate the system, a parallel corpus made up of 4080 Spanish sentences and their LSE translation has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module has given rise to an increase in BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and an increase in the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
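The categorization step described above, replacing each Spanish word with the tag of its most probable sign and dropping non-relevant words, can be sketched as follows. The lexical probability table is a toy assumption; in the paper it is estimated automatically from the parallel corpus.

```python
# Sketch of the word-to-tag preprocessing described above. Each word is
# replaced by the tag of the sign most likely to translate it; words with
# no associated sign (non-relevant words) are dropped. Probabilities are toy.

P_SIGN_GIVEN_WORD = {
    "carnet":  {"DOCUMENTO": 0.7, "CONDUCIR": 0.3},
    "renovar": {"RENOVAR": 0.9, "NUEVO": 0.1},
    "el":      {},  # non-relevant: no sign assigned
}

def tag_sentence(words):
    """Replace each word with its most probable sign tag; drop the rest."""
    tagged = []
    for w in words:
        signs = P_SIGN_GIVEN_WORD.get(w, {})
        if signs:
            tagged.append(max(signs, key=signs.get))  # most probable sign
        # else: drop the non-relevant word entirely (one of several options
        # the paper studies for handling non-relevant words)
    return tagged

tags = tag_sentence(["renovar", "el", "carnet"])
```

Collapsing many surface words onto fewer tags shrinks the effective vocabulary, which is why it helps under sparse training data.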


International Conference on Spoken Language Processing | 1996

Initial evaluation of a preselection module for a flexible large vocabulary speech recognition system in telephone environment

Javier Macias-Guarasa; A. Gallardo; Javier Ferreiros; José Manuel Pardo; L. Villarrubia

We are improving a flexible, large-vocabulary, speaker-independent, isolated-word recognition system in a telephone environment, originally designed as an integrated system performing the whole recognition process in one step. We have transformed it by adopting the hypothesis-verification paradigm. In this paper, we describe the architecture and results of the hypothesis subsystem. We show the system's evolution and the modifications adopted to face such a difficult task, achieving significant improvements using automatically clustered phoneme-like units, semi-continuous HMMs and multiple models per unit. The system's behavior is tested for vocabulary-dependent and vocabulary-independent tasks and for vocabularies of up to 10,000 words.
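The hypothesis-verification paradigm described above can be sketched as a two-stage search: a cheap preselection stage ranks the whole vocabulary and keeps a short list, and only the shortlisted words are rescored with the expensive detailed models. The scoring functions below are toy stand-ins for the real acoustic models.

```python
# Sketch of the hypothesis-verification paradigm: cheap preselection over a
# large vocabulary, expensive verification over a short list. The scoring
# functions are toy stand-ins, not the paper's phoneme-like-unit models.

def coarse_score(utterance, word):
    # Toy stand-in for cheap scoring: shared-letter overlap.
    return len(set(utterance) & set(word))

def preselect(utterance, vocabulary, n_best):
    """Hypothesis stage: rank the vocabulary with the coarse score."""
    ranked = sorted(vocabulary, key=lambda w: coarse_score(utterance, w),
                    reverse=True)
    return ranked[:n_best]

def verify(utterance, shortlist):
    """Verification stage: rescore only the shortlist with a finer score."""
    # Toy stand-in for detailed rescoring: exact match beats mere overlap.
    return max(shortlist,
               key=lambda w: (w == utterance, coarse_score(utterance, w)))

vocab = ["madrid", "toledo", "sevilla", "granada"]
best = verify("madrid", preselect("madrid", vocab, n_best=2))
```

The design choice is the usual one: preselection bounds the work the detailed models must do, which is what makes 10,000-word vocabularies tractable.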


International Conference on Acoustics, Speech, and Signal Processing | 2009

A Bayesian networks approach for dialog modeling: The fusion BN

F. Fernández Martínez; Javier Ferreiros; Ricardo de Córdoba; Juan Manuel Montero; Rubén San-Segundo; José Manuel Pardo

Bayesian Networks (BNs) are suitable for mixed-initiative dialog modeling, allowing a more flexible and natural spoken interaction. This solution can be applied to identify the intention of the user considering the concepts extracted from the last utterance and the dialog context. Subsequently, in order to make a correct decision regarding how the dialog should continue, unnecessary, missing, wrong, optional and required concepts have to be detected according to the inferred goals. This information is useful to properly drive the dialog: prompting for missing concepts, clarifying wrong concepts, ignoring unnecessary concepts and retrieving those required and optional. This paper presents a novel BN approach where a single BN is obtained from N goal-specific BNs through a fusion process. The new fusion BN enables a single concept analysis which is more consistent with the whole dialog context.

Collaboration


Dive into Javier Ferreiros's collaborations.

Top Co-Authors

José Manuel Pardo
Technical University of Madrid

Juan Manuel Montero
Technical University of Madrid

Ricardo de Córdoba
Technical University of Madrid

Rubén San-Segundo
Technical University of Madrid

Javier Macías Guarasa
Technical University of Madrid

Luis Fernando D'Haro
Technical University of Madrid

Roberto Barra-Chicote
Technical University of Madrid