Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Juan Manuel Montero is active.

Publication


Featured research published by Juan Manuel Montero.


IEEE Transactions on Education | 2006

A project-based learning approach to design electronic systems curricula

Javier Macias-Guarasa; Juan Manuel Montero; Rubén San-Segundo; Alvaro Araujo; Octavio Nieto-Taladriz

This paper presents an approach to designing electronic systems curricula that makes electronics more appealing to students. Since electronics is an important grounding for other disciplines (computer science, signal processing, and communications), this approach proposes the development of multidisciplinary projects using the project-based learning (PBL) strategy to increase the attractiveness of the curriculum. The proposed curriculum consists of eight courses: four theoretical courses and four PBL courses (including a compulsory Master's thesis). In the PBL courses, the students, working together in groups, develop multidisciplinary systems that become progressively more complex. To address this complexity, the Department of Electronic Engineering has invested many resources over the last five years in developing software tools and a common hardware platform. This curriculum has been evaluated successfully for the last four academic years: the students have increased their interest in electronics and have given the courses an average grade of more than 71% across all PBL course evaluations (data extracted from student surveys). The students have also acquired new skills and obtained very good academic results: the average grade was more than 74% across all PBL courses. An important result is that all students have developed more complex and sophisticated electronic systems, while considering the results worth the effort invested.


Speech Communication | 2008

Speech to sign language translation system for Spanish

Rubén San-Segundo; R. Barra; Ricardo de Córdoba; Luis Fernando D'Haro; F. Fernández; Javier Ferreiros; J.M. Lucas; Javier Macias-Guarasa; Juan Manuel Montero; José Manuel Pardo

This paper describes the development of, and the first experiments in, a Spanish to sign language translation system in a real domain. The developed system focuses on the sentences spoken by an official when assisting people applying for, or renewing, their Identity Card. The system translates official explanations into Spanish Sign Language (LSE: Lengua de Signos Española) for Deaf people. The translation system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the hand movements). Two proposals for natural language translation have been evaluated: a rule-based translation module (that computes sign confidence measures from the word confidence measures obtained in the speech recognition module) and a statistical translation module (in this case, parallel corpora were used for training the statistical model). The best configuration reported 31.6% SER (Sign Error Rate) and 0.5780 BLEU (BiLingual Evaluation Understudy). The paper also describes the eSIGN 3D avatar animation module (considering the sign confidence), and the limitations found when implementing a strategy for reducing the delay between the spoken utterance and the sign sequence animation.
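The Sign Error Rate (SER) reported above is computed like word error rate: the edit distance between hypothesis and reference sign sequences, divided by the reference length. A minimal sketch (the sign glosses in the example are hypothetical, for illustration only):

```python
# Sign Error Rate (SER): Levenshtein distance (substitutions + insertions
# + deletions) between hypothesis and reference sign sequences, divided by
# the reference length -- the same definition as word error rate.
def sign_error_rate(reference, hypothesis):
    m, n = len(reference), len(hypothesis)
    # dp[i][j] = edit distance between reference[:i] and hypothesis[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / m

# Hypothetical sign glosses: one deleted sign out of four -> SER = 0.25.
print(sign_error_rate(["YOU", "CARD", "RENEW", "WANT"],
                      ["YOU", "CARD", "WANT"]))  # 0.25
```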


Speech Communication | 2010

Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech

Roberto Barra-Chicote; Junichi Yamagishi; Simon King; Juan Manuel Montero; Javier Macias-Guarasa

We have applied two state-of-the-art speech synthesis techniques (unit selection and HMM-based synthesis) to the synthesis of emotional speech. A series of carefully designed perceptual tests evaluating speech quality, emotion identification rates and emotional strength was carried out for the six emotions which we recorded: happiness, sadness, anger, surprise, fear, and disgust. For the HMM-based method, we evaluated spectral and source components separately and identified which components contribute to which emotion. Our analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns. Whilst synthetic speech produced using the unit selection method has better emotional strength scores than the HMM-based method, the HMM-based method has the ability to manipulate the emotional strength. For emotions that are characterized by both spectral and prosodic components, synthetic speech using unit selection methods was more accurately identified by listeners. For emotions mainly characterized by prosodic components, HMM-based synthetic speech was more accurately identified. This finding differs from previous results regarding listener judgements of speaker similarity for neutral speech. We conclude that unit selection methods require improvements to prosodic modeling and that HMM-based methods require improvements to spectral modeling for emotional speech. Certain emotions cannot be reproduced well by either method.


International Conference on Acoustics, Speech, and Signal Processing | 2006

Prosodic and Segmental Rubrics in Emotion Identification

R. Barra; Juan Manuel Montero; Javier Macias-Guarasa; Luis Fernando D'Haro; Rubén San-Segundo; Ricardo de Córdoba

It is well known that the emotional state of a speaker usually alters the way she/he speaks. Although all the components of the voice can be affected by emotion in some statistically-significant way, not all these deviations from a neutral voice are identified by human listeners as conveying emotional information. In this paper we have carried out several perceptual and objective experiments that show the relevance of prosody and segmental spectrum in the characterization and identification of four emotions in Spanish. A Bayes classifier has been used in the objective emotion identification task. Emotion models were generated as the contribution of every emotion to the build-up of a universal background emotion codebook. According to our experiments, surprise is primarily identified by humans through its prosodic rubric (in spite of some automatically-identifiable segmental characteristics), while for anger the situation is just the opposite. Sadness and happiness need a combination of prosodic and segmental rubrics to be reliably identified.
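The codebook-based classification described above can be sketched roughly as follows: frames are quantized to the nearest codeword of a shared ("universal background") codebook, each emotion is modeled by its distribution over codewords, and the classifier picks the emotion maximizing the total log-likelihood of the observed codeword sequence. This is an illustrative simplification, not the paper's exact model; all names and data are hypothetical:

```python
import numpy as np

# Sketch: vector-quantize feature frames against a universal codebook and
# score each emotion by the log-likelihood of the resulting codeword
# sequence under that emotion's codeword distribution.
def classify(frames, codebook, emotion_codeword_probs):
    # Assign each frame to its nearest codeword (vector quantization).
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    codes = dists.argmin(axis=1)
    # Sum log-probabilities of the observed codewords under each emotion model.
    scores = {emo: float(np.log(probs[codes]).sum())
              for emo, probs in emotion_codeword_probs.items()}
    return max(scores, key=scores.get)

# Toy example: a 1-D codebook with two codewords, and two emotions whose
# models favor opposite codewords.
codebook = np.array([[0.0], [1.0]])
models = {"anger": np.array([0.9, 0.1]), "sadness": np.array([0.1, 0.9])}
print(classify(np.array([[0.1], [0.05], [0.2]]), codebook, models))  # anger
```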


Pattern Analysis and Applications | 2012

Design, development and field evaluation of a Spanish into sign language translation system

Rubén San-Segundo; Juan Manuel Montero; Ricardo de Córdoba; V. Sama; F. Fernández; L. F. D’Haro; V. López-Ludeña; D. Sánchez; A. García

This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their Driver’s License. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation. This evaluation was carried out in the Local Traffic Office in Toledo involving real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and a discussion on how to solve them (some of them specific for LSE).
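The hierarchical combination of the three translators can be sketched as a simple cascade (an illustration only; the function names, similarity measure and thresholds are hypothetical, not the paper's implementation): reuse an example-based translation when a stored example is close enough, fall back to rules when they cover the sentence, and back off to the statistical translator otherwise.

```python
import difflib

# Hypothetical cascade over the three translation strategies named above.
def translate(words, examples, rules, statistical_translate, min_similarity=0.8):
    # 1) Example-based: reuse the stored translation of a very similar sentence.
    for example_words, signs in examples:
        sim = difflib.SequenceMatcher(None, words, example_words).ratio()
        if sim >= min_similarity:
            return signs
    # 2) Rule-based: apply word->sign rules when every word is covered.
    if all(w in rules for w in words):
        return [rules[w] for w in words]
    # 3) Statistical back-off for everything else.
    return statistical_translate(words)
```

The ordering reflects the usual trade-off: the example-based and rule-based modules are precise but narrow, while the statistical module always produces some output.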


Signal Processing | 2016

Feature extraction from smartphone inertial signals for human activity segmentation

Rubén San-Segundo; Juan Manuel Montero; Roberto Barra-Chicote; Fernando Fernández; José Manuel Pardo

This paper proposes the adaptation of well-known strategies used successfully in speech processing: Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction (PLP) coefficients. Additionally, characteristics like RASTA filtering and delta coefficients are also considered and evaluated for inertial signal processing. These adaptations have been incorporated into a Human Activity Recognition and Segmentation (HARS) system based on Hidden Markov Models (HMMs) for recognizing and segmenting six different physical activities: walking, walking upstairs, walking downstairs, sitting, standing, and lying. All experiments have been done using a publicly available dataset named UCI Human Activity Recognition Using Smartphones, which includes several sessions with physical activity sequences from 30 volunteers. This dataset has been randomly divided into six subsets for performing a six-fold cross-validation procedure. For every experiment, average values from the six-fold cross-validation procedure are shown. The results presented in this paper significantly outperform baseline error rates, constituting a relevant contribution to the field. Adapted MFCC and PLP coefficients improve human activity recognition and segmentation accuracies while considerably reducing feature vector size. RASTA filtering and delta coefficients contribute significantly to reducing the segmentation error rate, yielding the best results: an Activity Segmentation Error Rate lower than 0.5%. Highlights: human activity segmentation using Hidden Markov Models; frequency-based feature extraction from inertial signals; RASTA filtering analysis and delta coefficients; important dimensionality reduction.
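Delta coefficients, mentioned above, are the standard first-order regression features used on top of MFCC/PLP vectors; the same computation applies unchanged to frame-level inertial features. A minimal sketch using the common regression formula (window half-width N=2 is a typical choice, not necessarily the paper's):

```python
import numpy as np

# Standard delta coefficients: d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2),
# applied to a (T, D) array of feature frames with edge padding.
def delta(frames, N=2):
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Repeat the first and last frames so the window stays defined at the edges.
    padded = np.pad(frames, ((N, N), (0, 0)), mode="edge")
    T = frames.shape[0]
    out = np.zeros_like(frames, dtype=float)
    for t in range(T):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom
```

On a linearly increasing feature the interior deltas are exactly the slope, which is a quick sanity check for the implementation.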


Journal of Visual Languages and Computing | 2008

Proposing a speech to gesture translation architecture for Spanish deaf people

Rubén San-Segundo; Juan Manuel Montero; Javier Macias-Guarasa; Ricardo de Córdoba; Javier Ferreiros; José Manuel Pardo

This article describes an architecture for translating speech into Spanish Sign Language (SSL). The architecture proposed is made up of four modules: speech recognizer, semantic analysis, gesture sequence generation and gesture playing. For the speech recognizer and the semantic analysis modules, we use software developed by IBM and CSLR (Center for Spoken Language Research at University of Colorado), respectively. Gesture sequence generation and gesture animation are the modules on which we have focused our main effort. Gesture sequence generation uses semantic concepts (obtained from the semantic analysis) associating them with several SSL gestures. This association is carried out based on a number of generation rules. For gesture animation, we have developed an animated agent (virtual representation of a human person) and a strategy for reducing the effort in gesture animation. This strategy consists of making the system automatically generate all agent positions necessary for the gesture animation. In this process, the system uses a few main agent positions (two or three per second) and some interpolation strategies, both issues previously generated by the service developer (the person who adapts the architecture proposed in this paper to a specific domain). Related to this module, we propose a distance between agent positions and a measure of gesture complexity. This measure can be used to analyze the gesture perception versus its complexity. With the architecture proposed, we are not trying to build a domain independent translator but a system able to translate speech utterances into gesture sequences in a restricted domain: railway, flights or weather information.
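The keyframe strategy described above (a few main agent positions per second, with intermediate positions generated automatically) can be sketched with plain linear interpolation; the function and parameter names are illustrative, and the paper's actual interpolation strategies may differ:

```python
import numpy as np

# Generate full-frame-rate agent positions from sparse keyframes by
# interpolating each coordinate independently between key positions.
def interpolate_positions(key_times, key_positions, fps=25):
    key_times = np.asarray(key_times, dtype=float)
    key_positions = np.asarray(key_positions, dtype=float)  # (K, D)
    frame_times = np.arange(key_times[0], key_times[-1] + 1e-9, 1.0 / fps)
    frames = np.stack(
        [np.interp(frame_times, key_times, key_positions[:, d])
         for d in range(key_positions.shape[1])], axis=1)
    return frame_times, frames

# Two keyframes one second apart, rendered at 4 fps for readability:
times, pos = interpolate_positions([0.0, 1.0], [[0.0, 0.0], [1.0, 1.0]], fps=4)
```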


IEEE Signal Processing Letters | 2010

Histogram Equalization-Based Features for Speech, Music, and Song Discrimination

Ascensión Gallardo-Antolín; Juan Manuel Montero

In this letter, we present a new class of segment-based features for speech, music and song discrimination. These features, called PHEQ (Polynomial-Fit Histogram Equalization), are derived from the nonlinear relationship between the short-term feature distributions computed at segment level and a reference distribution. Results show that PHEQ characteristics outperform short-term features such as Mel Frequency Cepstrum Coefficients (MFCC) and conventional segment-based ones such as MFCC mean and variance. Furthermore, the combination of short-term and PHEQ features significantly improves the performance of the whole system.
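The PHEQ idea can be sketched for a single feature dimension as fitting a polynomial that maps the segment's empirical quantiles onto the quantiles of a reference Gaussian, and using the polynomial coefficients as the segment-level feature. This is a simplified illustration of the general technique, not the letter's exact formulation:

```python
import numpy as np
from statistics import NormalDist

# Polynomial-fit histogram equalization (sketch): fit a polynomial mapping
# the sorted short-term feature values of a segment onto the quantiles of
# a reference standard Gaussian; return its coefficients as features.
def pheq_coefficients(values, order=3):
    values = np.sort(np.asarray(values, dtype=float))
    T = len(values)
    # Empirical CDF positions, kept strictly inside (0, 1).
    probs = (np.arange(1, T + 1) - 0.5) / T
    ref = np.array([NormalDist().inv_cdf(p) for p in probs])
    # Polynomial mapping: observed values -> reference Gaussian quantiles.
    return np.polyfit(values, ref, order)
```

If the segment's values already follow the reference distribution, the fitted mapping is close to the identity, which gives a quick sanity check.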


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Speaker Diarization Based on Intensity Channel Contribution

Roberto Barra-Chicote; José Manuel Pardo; Javier Ferreiros; Juan Manuel Montero

The time delay of arrival (TDOA) between multiple microphones has been used since 2006 as a source of information (localization) to complement the spectral features for speaker diarization. In this paper, we propose a new localization feature, the intensity channel contribution (ICC), based on the relative energy of the signal arriving at each channel compared to the sum of the energy of all the channels. We have demonstrated that by joining the ICC features and the TDOA features, the robustness of the localization features is improved and that the diarization error rate (DER) of the complete system (using localization and spectral features) has been reduced. By using this new localization feature, we have been able to achieve a 5.2% DER relative improvement in our development data, a 3.6% DER relative improvement in the RT07 evaluation data and a 7.9% DER relative improvement in last year's RT09 evaluation data.
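The ICC feature as described above is simply each channel's energy divided by the summed energy over all channels, computed per analysis window. A minimal sketch (framing details such as window length are illustrative, not necessarily the paper's setup):

```python
import numpy as np

# Intensity channel contribution (ICC): per analysis window, the energy of
# each microphone channel divided by the total energy over all channels.
def icc_features(channels, frame_len=512):
    channels = np.asarray(channels, dtype=float)       # (C, T) multichannel audio
    C, T = channels.shape
    n_frames = T // frame_len
    x = channels[:, :n_frames * frame_len].reshape(C, n_frames, frame_len)
    energy = (x ** 2).sum(axis=2)                      # (C, n_frames)
    return energy / energy.sum(axis=0, keepdims=True)  # each column sums to 1
```

Because the contributions are normalized, each window's feature vector sums to one, so the feature reflects only the relative intensity pattern across microphones, not the absolute level.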


annual meeting of the special interest group on discourse and dialogue | 2001

Designing confirmation mechanisms and error recover techniques in a Railway Information system for Spanish

Rubén San-Segundo; Juan Manuel Montero; Javier Ferreiros; Ricardo de Córdoba; José Manuel Pardo

In this paper, we propose an approach for designing the confirmation strategies in a Railway Information system for Spanish, based on confidence measures obtained from recognition. We also present several error recovery and user modelling techniques incorporated in this system. The field evaluation shows that more than 60% of the confirmations were implicit. These confirmations, in combination with fast error recovery and user modelling techniques, make the dialogue faster, yielding a mean call duration of 204 seconds.
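Confidence-driven confirmation of the kind described above is typically a thresholding policy; a minimal sketch (the threshold values and labels are hypothetical, not taken from the paper):

```python
# Hypothetical confidence thresholds: high-confidence slots are confirmed
# implicitly by echoing them in the next prompt, mid-confidence slots
# trigger an explicit yes/no question, and low-confidence slots are re-asked.
def confirmation_strategy(confidence, high=0.8, low=0.4):
    if confidence >= high:
        return "implicit"   # echo the value inside the next question
    if confidence >= low:
        return "explicit"   # ask a dedicated yes/no confirmation
    return "reprompt"       # discard the value and ask again
```

Implicit confirmations save a full dialogue turn, which is why a high share of them (over 60% in the field evaluation) shortens call duration.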

Collaboration


Dive into Juan Manuel Montero's collaborations.

Top Co-Authors

José Manuel Pardo | Technical University of Madrid

Javier Ferreiros | Technical University of Madrid

Ricardo de Córdoba | Technical University of Madrid

Roberto Barra-Chicote | Technical University of Madrid

Rubén San-Segundo | Technical University of Madrid

Luis Fernando D'Haro | Technical University of Madrid

Junichi Yamagishi | National Institute of Informatics

Jaime Lorenzo-Trueba | Technical University of Madrid