Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where José Colás is active.

Publication


Featured research published by José Colás.


Database and Expert Systems Applications | 2007

Ontology-Based Retrieval of Human Speech

Javier Tejedor; Roberto García; Miriam Fernández; Fernando López-Colino; Ferran Perdrix; José A. Macías; Rosa Gil; Marta Oliva; Diego Moya; José Colás; Pablo Castells

As part of the general growth and diversification of media in different modalities, the presence of information in the form of human speech in the world-wide body of digital content is becoming increasingly significant, in terms of both volume and value. We present a semantic-based search model for human speech corpora, stressing the search for meanings rather than words. Our framework embraces the complete recognition/retrieval cycle, from word spotting to semantic annotation, query processing, and search result presentation.
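The recognition/retrieval cycle described in this abstract — spotted words mapped to ontology concepts, then searched at the concept level — can be illustrated with a toy example. The ontology, corpus, and function names below are invented for illustration and are not the paper's actual components:

```python
# Toy ontology: concept -> lexicalizations the word spotter may detect.
ONTOLOGY = {
    "vehicle": {"car", "truck", "bus"},
    "city": {"madrid", "barcelona"},
}

def annotate(spotted_words):
    """Map word-spotter output to semantic annotations (ontology concepts)."""
    return {c for c, lex in ONTOLOGY.items() if any(w in lex for w in spotted_words)}

def search(query_concept, corpus):
    """Return speech documents whose annotations contain the queried concept."""
    return [doc for doc, words in corpus.items() if query_concept in annotate(words)]

corpus = {"news1": ["car", "madrid"], "news2": ["weather"]}
print(search("vehicle", corpus))   # ['news1']
```

A query for the concept "vehicle" retrieves the document where the spotter found "car", even though the query word itself never occurs — the essence of searching for meanings rather than words.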


Speech Communication | 2008

A comparison of grapheme and phoneme-based units for Spanish spoken term detection

Javier Tejedor; Dong Wang; Joe Frankel; Simon King; José Colás

The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech and used to find matches. Lattice search (referred to as spoken term detection) uses a pre-indexing of speech data in terms of word or sub-word units, which can then quickly be searched for arbitrary terms without referring to the original audio. In both cases, the search term can be modelled in terms of sub-word units, typically phonemes. For in-vocabulary words (i.e. words that appear in the pronunciation dictionary), pronunciations are readily available. However, for out-of-vocabulary (OOV) search terms, letter-to-sound conversion must be used to generate a pronunciation for the search term. This is usually a hard decision (i.e. not probabilistic and with no possibility of backtracking), and errors introduced at this step are difficult to recover from. We therefore propose the direct use of graphemes (i.e., letter-based sub-word units) for acoustic modelling. This is expected to work particularly well in languages such as Spanish, where the letter-to-sound mapping is very regular but the correspondence is not one-to-one, and there will be benefits from avoiding hard decisions at early stages of processing. In this article, we compare three approaches for Spanish keyword spotting or spoken term detection, and within each of these we compare acoustic modelling based on phone and grapheme units. Experiments were performed using the Spanish geographical-domain Albayzin corpus. Results achieved in the two approaches proposed for spoken term detection show that trigrapheme units for acoustic modelling match or exceed the performance of phone-based acoustic models. In the method proposed for keyword spotting, the results achieved with each acoustic model are very similar.
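The grapheme-based idea — modelling search terms directly from their letters instead of committing to one phonemic pronunciation — can be sketched as a decomposition into overlapping trigrapheme units. The padding symbol and windowing scheme here are illustrative assumptions, not the paper's exact unit inventory:

```python
def trigraphemes(word, pad="#"):
    """Decompose a word into overlapping trigrapheme (3-letter) units,
    padding the word boundaries -- an illustrative scheme only."""
    w = pad + word.lower() + pad
    return [w[i - 1:i + 2] for i in range(1, len(w) - 1)]

# No letter-to-sound step is involved, so no hard pronunciation
# decision is needed for OOV search terms:
print(trigraphemes("madrid"))   # ['#ma', 'mad', 'adr', 'dri', 'rid', 'id#']
```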


Speech Communication | 2002

Spanish recognizer of continuously spelled names over the telephone

Rubén San-Segundo; José Colás; Ricardo de Córdoba; José Manuel Pardo

In this paper we present a hypothesis-verification approach for a Spanish recognizer of continuously spelled names over the telephone. We give a detailed description of the spelling task for Spanish where the most confusable letter sets are described. We introduce a new HMM topology with contextual silences incorporated into the letter model to deal with pauses between letters, increasing the Letter Accuracy by 6.6 points compared with a single silence model approach. For the final configuration of the hypothesis step we obtain a Letter Accuracy of 88.1% and a Name Recognition Rate of 94.2% for a 1000-name dictionary. In this configuration, we also use noise models for reducing letter insertions, and a Letter Graph to incorporate N-gram language models and to calculate the N-best letter sequences. In the verification step, we consider the M-best candidates provided by the hypothesis step. We evaluate the whole system for different dictionaries, obtaining more than 90.0% Name Recognition Rate for a 10,000-name dictionary. Finally, we demonstrate the utility of incorporating a Spelled Name Recognizer in a Directory Assistance Service over the telephone, increasing the percentage of calls automatically serviced from 39.4% to 58.7%.
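The hypothesis-verification idea — an M-best list of letter-sequence hypotheses from the recognizer, re-scored against a name dictionary — can be caricatured with edit-distance matching. The function names and toy dictionary are invented for illustration, and the actual system verifies with acoustic scores rather than plain Levenshtein distance:

```python
def edit_distance(a, b):
    """Levenshtein distance between two letter sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def verify(m_best, dictionary):
    """Verification step (toy): pick the dictionary name closest to any
    of the M-best letter-sequence hypotheses."""
    return min(dictionary,
               key=lambda name: min(edit_distance(h, name) for h in m_best))

# Two noisy spelled-name hypotheses are resolved against a toy directory:
print(verify(["GARZIA", "GARSIA"], ["GARCIA", "GARRIDO", "MARTIN"]))  # GARCIA
```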


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Query-by-Example Spoken Term Detection ALBAYZIN 2012 evaluation: overview, systems, results, and discussion

Javier Tejedor; Doroteo Torre Toledano; Xavier Anguera; Amparo Varona; Lluís F. Hurtado; Antonio Miguel; José Colás

Query-by-Example Spoken Term Detection (QbE STD) aims at retrieving data from a speech data repository given an acoustic query containing the term of interest as input. Nowadays, it has been receiving much interest due to the high volume of information stored in audio or audiovisual format. QbE STD differs from automatic speech recognition (ASR) and keyword spotting (KWS)/spoken term detection (STD) since ASR is interested in all the terms/words that appear in the speech signal and KWS/STD relies on a textual transcription of the search term to retrieve the speech data. This paper presents the systems submitted to the ALBAYZIN 2012 QbE STD evaluation held as a part of the ALBAYZIN 2012 evaluation campaign within the context of the IberSPEECH 2012 Conference. The evaluation consists of retrieving the speech files that contain the input queries, indicating their start and end timestamps within the appropriate speech file. Evaluation is conducted on a Spanish spontaneous speech database containing a set of talks from MAVIR workshops, which amount to about 7 h of speech in total. We present the database, the evaluation metric, and the systems submitted, along with all results and some discussion. Four different research groups took part in the evaluation. Evaluation results show the difficulty of this task, and the limited performance indicates there is still a lot of room for improvement. The best result is achieved by a dynamic time warping-based search over Gaussian posteriorgrams/posterior phoneme probabilities. This paper also compares the systems with the aim of establishing the best technique for this difficult task and of identifying promising directions for this relatively novel task.
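The best-performing technique in the evaluation, dynamic time warping over posteriorgrams, can be sketched in a few lines. The frame distance (negative log inner product) and the plain DTW recursion below are common choices for QbE STD, not necessarily those of the winning system:

```python
import math

def frame_dist(p, q):
    """Frame-level distance between two posterior vectors: the negative
    log of their inner product, a common choice for posteriorgram matching."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), 1e-12))

def dtw(query, utterance):
    """Length-normalized DTW alignment cost between two posteriorgram
    sequences; lower cost means a better match."""
    inf = float("inf")
    n, m = len(query), len(utterance)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = frame_dist(query[i - 1], utterance[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)

# Two toy 2-phoneme posteriorgrams: a query matched against itself costs 0.
q = [[1.0, 0.0], [0.0, 1.0]]
print(dtw(q, q))   # 0.0
```

A real system would slide the query over the utterance (subsequence DTW) to obtain start and end timestamps; the global alignment above only scores a whole pair.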


Computer Speech & Language | 2014

Feature analysis for discriminative confidence estimation in spoken term detection

Javier Tejedor; Doroteo Torre Toledano; Dong Wang; Simon King; José Colás

Discriminative confidence based on multi-layer perceptrons (MLPs) and multiple features has shown a significant advantage compared to the widely used lattice-based confidence in spoken term detection (STD). Although the MLP-based framework can handle any features derived from a multitude of sources, choosing all possible features may lead to overly complex models and hence less generality. In this paper, we design an extensive set of features and analyze their contribution to STD individually and as a group. The main goal is to choose a small set of features that are sufficiently informative while keeping the model simple and generalizable. We employ two established models to conduct the analysis: one is linear regression, which targets the most relevant features, and the other is logistic linear regression, which targets the most discriminative features. We find that the most informative features are those derived from diverse sources (ASR decoding, duration and lexical properties) and that the two models deliver highly consistent feature ranks. STD experiments on both English and Spanish data demonstrate significant performance gains with the proposed feature sets.
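One of the two models used in the analysis — logistic regression mapping a detection's features to a confidence score — can be sketched in pure Python. The toy features (a lattice score and a duration ratio) and the training setup are illustrative assumptions, not the paper's actual feature set or MLP framework:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by stochastic gradient ascent on the
    log-likelihood; w maps a detection's features to a confidence."""
    w = [0.0] * (len(X[0]) + 1)                 # weights + bias term
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = list(xi) + [1.0]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, z)))
            for j, xj in enumerate(z):
                w[j] += lr * (yi - p) * xj
    return w

def confidence(w, x):
    """Confidence in [0, 1] that a detection is a true hit."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])))

# Toy detections: [lattice score, duration ratio] -> 1 (hit) / 0 (false alarm)
X = [[0.9, 1.0], [0.8, 0.9], [0.2, 0.3], [0.1, 0.5]]
y = [1, 1, 0, 0]
w = train_logistic(X, y)
```

The learned weight magnitudes give exactly the kind of feature ranking the paper studies: a feature with a near-zero weight contributes little to the confidence and can be dropped.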


Computer Speech & Language | 2014

A rule-based translation from written Spanish to Spanish Sign Language glosses

Jordi Porta; Fernando López-Colino; Javier Tejedor; José Colás

Highlights:
- Rule-based approach for a Spanish-to-LSE machine translation system for wide-domain application.
- The proposed approach is based on transfer at the level of syntactic functions from dependency analysis and employs three stages: analysis, transfer and generation.
- Morpho-lexical and lexical-semantic relationships are used to expand the bilingual Spanish-LSE lexicon, aiming at reducing the lexical gap between Spanish and LSE.
- The rule-based approach also integrates classifier predicate generation.
- Results: 0.30 BLEU and 42% TER. Includes experiments and comparison with data-driven approaches, experiments with surface order, and a linguistic-oriented error analysis.

One of the aims of Assistive Technologies is to help people with disabilities to communicate with others and to provide means of access to information. As an aid to Deaf people, we present in this work a production-quality rule-based machine translation system from Spanish to Spanish Sign Language (LSE) glosses, which is a necessary precursor to building a full machine translation system that eventually produces animation output. The system implements a transfer-based architecture from the syntactic functions of dependency analyses. A sketch of LSE is also presented. Several topics regarding translation to sign languages are addressed: the lexical gap, the bootstrapping of a bilingual lexicon, the generation of word order for topic-oriented languages, and the treatment of classifier predicates and classifier names. The system has been evaluated with an open-domain testbed, reporting a 0.30 BLEU (BiLingual Evaluation Understudy) and 42% TER (Translation Error Rate). These results show consistent improvements over a statistical machine translation baseline, and some improvements over the same system preserving the word order in the source sentence. Finally, the linguistic analysis of errors has identified some differences due to a certain degree of structural variation in LSE.
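The generation of word order for a topic-oriented language can be caricatured as a reordering rule over syntactic functions. The gloss order below (time, subject, object, verb) is a hypothetical simplification for illustration, not a claim about actual LSE grammar or the paper's transfer rules:

```python
def transfer(dep):
    """Reorder a toy dependency analysis {syntactic function: gloss} into a
    hypothetical topic-oriented gloss order (time, subject, object, verb)."""
    order = ["time", "subj", "obj", "verb"]
    return [dep[f] for f in order if f in dep]

# Dependency analysis of a toy Spanish sentence, already mapped to glosses:
dep = {"subj": "JUAN", "verb": "COMPRAR", "obj": "LIBRO", "time": "MANANA"}
print(" ".join(transfer(dep)))   # MANANA JUAN LIBRO COMPRAR
```

Working at the level of syntactic functions rather than surface positions is what lets a transfer rule like this apply regardless of where each constituent appeared in the Spanish sentence.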


IEEE Signal Processing Letters | 2007

Blind Feature Compensation for Time-Variant Band-Limited Speech Recognition

Nicolás Morales; Doroteo Torre Toledano; John H. L. Hansen; José Colás

Mismatch in speech bandwidth between training and real operation greatly affects automatic speech recognition. This letter extends previous work on feature compensation of band-limited speech to establish a framework for blind compensation of speech data of unknown bandwidth, valid even when the distortion (band-limiting channel) changes rapidly and continuously in time. The available bandwidth of the input speech signal is automatically detected, and the band-limited feature vectors are compensated prior to being compared against full-bandwidth acoustic models. For a fixed bandwidth limitation, phoneme recognition performance using the proposed method is similar to that achieved by models adapted to match the distortion. However, compared to model adaptation approaches, this new approach can seamlessly be extended to rapidly time-varying conditions while maintaining low computational and memory costs.
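The first step of the framework — automatically detecting the available bandwidth of the input speech — can be sketched as a threshold on an averaged magnitude spectrum. This crude detector is an illustrative stand-in, not the letter's actual method:

```python
def detect_bandwidth(avg_spectrum, sample_rate, floor_ratio=0.01):
    """Estimate the upper band edge as the highest frequency bin whose
    average magnitude exceeds floor_ratio of the peak (crude detector)."""
    peak = max(avg_spectrum)
    n = len(avg_spectrum)
    for k in range(n - 1, -1, -1):
        if avg_spectrum[k] >= floor_ratio * peak:
            return k / (n - 1) * (sample_rate / 2)   # bin index -> Hz
    return 0.0

# A 16 kHz recording band-limited to ~4 kHz (telephone-like channel):
spec = [1.0] * 128 + [0.0] * 128   # 256 bins covering 0..8000 Hz
print(detect_bandwidth(spec, 16000))   # ~3984 Hz
```

In the compensation framework, this detected band edge would then select which feature dimensions to correct before scoring against full-bandwidth models.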


Journal of Visual Languages and Computing | 2012

Spanish Sign Language synthesis system

Fernando López-Colino; José Colás

This work presents a new approach to the synthesis of Spanish Sign Language (LSE). Its main contributions are the use of a centralized relational database for storing sign descriptions, the proposal of a new input notation and a new avatar design, the skeleton structure of which improves the synthesis process. The relational database facilitates a highly detailed phonologic description of the signs that includes parameter synchronization and timing. The centralized database approach has been introduced to allow the representation of each sign to be validated by the LSE National Institution, FCNSE. The input notation, designated HLSML, presents multiple levels of abstraction compared with current input notations. This redesigned notation simplifies the description and manual definition of LSE messages. Synthetic messages obtained using our approach have been evaluated by deaf users; in this evaluation a maximum recognition rate of 98.5% was obtained for isolated signs and a recognition rate of 95% was achieved for signed sentences.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Unsupervised Class-Based Feature Compensation for Time-Variable Bandwidth-Limited Speech

Nicolás Morales; Doroteo Torre Toledano; John H. L. Hansen; Javier Garrido; José Colás

This paper deals with the problem of speech recognition on band-limited speech. In our previous work we showed how a simple polynomial correction framework could be used for compensation of band-limited speech to minimize the mismatch using full-bandwidth acoustic models. This paper extends this approach to time-varying multiple-channel environments. The compensation framework is extended to perform automatic channel classification prior to compensation, thus allowing for unsupervised multi-channel compensation without the need for an explicit channel classifier. Performance is demonstrated on a wide range of channel bandwidth conditions. This extension makes our compensation approach potentially applicable in a much wider range of scenarios with only very limited performance degradation compared to the supervised approach.
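The unsupervised extension — classifying each input to a channel class before compensating — can be sketched as nearest-centroid classification followed by a per-channel correction. The centroids, the additive correction (the paper uses a polynomial correction), and all names here are toy assumptions:

```python
def nearest_channel(feature, centroids):
    """Unsupervised classification: pick the channel class whose centroid
    is closest (squared Euclidean) to the incoming feature vector."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: d2(feature, centroids[c]))

def compensate(feature, channel, corrections):
    """Apply the selected channel's additive correction (a stand-in for
    the polynomial feature compensation of the paper)."""
    return [x + dx for x, dx in zip(feature, corrections[channel])]

# Toy 2-dimensional features and two channel classes:
centroids = {"telephone": [0.2, 0.1], "fullband": [0.9, 0.8]}
corrections = {"telephone": [0.5, 0.6], "fullband": [0.0, 0.0]}
f = [0.25, 0.15]
ch = nearest_channel(f, centroids)   # 'telephone'
print(compensate(f, ch, corrections))
```

Because the class is chosen per input, the same pipeline handles a channel that changes over time without any explicit, supervised channel labels.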


Universal Access in the Information Society | 2012

Hybrid paradigm for Spanish Sign Language synthesis

Fernando López-Colino; José Colás

This work presents a hybrid approach to sign language synthesis. The approach allows hand-tuning of the phonetic description of the signs, focusing on the temporal aspect of the sign. It therefore retains the capacity to perform morpho-phonological operations, as in notation-based approaches, while achieving the synthetic signing performance of hand-tuned animation approaches. The proposed approach simplifies the input message description using a new high-level notation and stores sign phonetic descriptions in a relational database. This relational database allows for more flexible sign phonetic descriptions; it also allows for a description of sign timing and the synchronization between sign phonemes. The new notation, named HLSML, is a gloss-based notation focused on message description. HLSML introduces several tags that allow the modification of the signs in the message, defining dialect and mood variations (both of which are defined in the relational database) and message timing, including transition durations and pauses. A new avatar design is also proposed that simplifies the development of the synthesizer and avoids interference with the independence of the sign language phonemes during animation. The obtained results showed an increase in the sign recognition rate compared to other approaches. This improvement was based on the active role that the sign language experts had in the description of signs, which was a result of the flexibility of the sign storage approach. The approach will simplify the description of synthesizable signed messages, thus facilitating the creation of multimedia signed contents.

Collaboration


Dive into José Colás's collaborations.

Top Co-Authors


Fernando López-Colino

Autonomous University of Madrid


Javier Tejedor

Autonomous University of Madrid


Javier Garrido

Autonomous University of Madrid


José Manuel Pardo

Technical University of Madrid


Doroteo Torre Toledano

Autonomous University of Madrid


Javier Ferreiros

Technical University of Madrid


Juan Manuel Montero

Technical University of Madrid


Simon King

University of Edinburgh
