Ángel de la Torre
University of Granada
Publication
Featured research published by Ángel de la Torre.
Speech Communication | 2004
Javier Ramírez; José C. Segura; M. Carmen Benítez; Ángel de la Torre; Antonio J. Rubio
Currently, technological barriers prevent speech processing systems from working under extremely noisy conditions. Emerging applications of speech technology, especially in wireless communications, digital hearing aids, and speech recognition, are examples of such systems and often require a noise reduction technique operating in combination with a precise voice activity detector (VAD). This paper presents a new VAD algorithm for improving speech detection robustness in noisy environments and the performance of speech recognition systems. The algorithm measures the long-term spectral divergence (LTSD) between speech and noise and formulates the speech/non-speech decision rule by comparing the long-term spectral envelope to the average noise spectrum, thus yielding a highly discriminating decision rule and minimizing the average number of decision errors. The decision threshold is adapted to the measured noise energy, while a controlled hang-over is activated only when the observed signal-to-noise ratio is low. An analysis of the speech/non-speech LTSD distributions shows that using long-term information about speech signals is beneficial for VAD. The proposed algorithm is compared to the most commonly used VADs in the field, in terms of both speech/non-speech discrimination and recognition performance when the VAD is used in an automatic speech recognition system. Experimental results demonstrate a sustained advantage over standard VADs such as G.729 and adaptive multi-rate (AMR), which were used as a reference, and over the VADs of the advanced front-end for distributed speech recognition.
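The decision rule described in this abstract can be illustrated with a minimal sketch. It assumes the long-term spectral envelope is the per-bin maximum of the magnitude spectrum over a window of neighbouring frames, and that the divergence is the mean envelope-to-noise power ratio expressed in dB; the function names, the window order of 6, and the fixed threshold are illustrative assumptions, and the adaptive threshold and hang-over mentioned above are not modelled.

```python
import math

def ltse(frames, l, order):
    """Long-term spectral envelope (assumed form): per-bin maximum of the
    magnitude spectra over frames l-order .. l+order, clipped at the edges."""
    lo = max(0, l - order)
    hi = min(len(frames) - 1, l + order)
    n_bins = len(frames[0])
    return [max(frames[j][k] for j in range(lo, hi + 1)) for k in range(n_bins)]

def ltsd(frames, l, noise, order=6):
    """Divergence in dB between the envelope at frame l and the average
    noise magnitude spectrum `noise` (all noise bins must be non-zero)."""
    env = ltse(frames, l, order)
    ratio = sum((e * e) / (n * n) for e, n in zip(env, noise)) / len(noise)
    return 10.0 * math.log10(ratio)

def is_speech(frames, l, noise, threshold_db, order=6):
    """Fixed-threshold decision; the paper adapts the threshold to the noise energy."""
    return ltsd(frames, l, noise, order) > threshold_db
```

Because the envelope takes a maximum over neighbouring frames, a frame adjacent to a high-energy frame is also flagged, which is how long-term information enters the decision in this sketch.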
International Conference on Acoustics, Speech, and Signal Processing | 2002
Ángel de la Torre; José C. Segura; M. Carmen Benítez; Antonio M. Peinado; Antonio J. Rubio
Noise usually produces a non-linear distortion of the feature space considered for Automatic Speech Recognition. This distortion causes a mismatch between the training and recognition conditions which significantly degrades the performance of speech recognizers. In this contribution we analyze the effect of additive noise on cepstral-based representations and compare several approaches to compensate for this effect. We discuss the importance of the non-linearities introduced by the noise and propose a method (based on the histogram equalization technique) specifically oriented to the compensation of the non-linear transformation caused by the additive noise. The proposed method has been evaluated using the AURORA-2 database and task. The recognition results show significant improvements with respect to other compensation methods reported in the literature and reveal the importance of the non-linear effects of the noise and the utility of the proposed method.
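As a rough illustration of histogram equalization applied to one feature component, the sketch below maps each value through its empirical cumulative distribution onto a reference distribution. The standard normal reference, the rank-based CDF estimate, and the function name are assumptions made for this example and are not taken from the paper.

```python
from statistics import NormalDist

def histogram_equalize(values, reference=NormalDist(0.0, 1.0)):
    """Map one feature component so its empirical distribution matches
    `reference` (assumed Gaussian here). The mapping is monotonic and
    non-linear; tied values are ordered arbitrarily."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        cdf = (rank + 0.5) / n          # empirical CDF, kept strictly inside (0, 1)
        out[i] = reference.inv_cdf(cdf) # value with the same CDF under the reference
    return out
```

In a recognizer, a transformation of this kind would be applied independently to each cepstral coefficient over an utterance, so that noisy and clean features share the same distribution.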
Sensors | 2013
Carlos Medina; José C. Segura; Ángel de la Torre
This paper describes the TELIAMADE system, a new indoor positioning system that uses the time-of-flight (TOF) of an ultrasonic signal to estimate the distance between a receiver node and a transmitter node. The TELIAMADE system consists of a set of wireless nodes equipped with a radio module for communication and a module for the transmission and reception of ultrasound. Access to the ultrasonic channel is managed by a synchronization algorithm based on a time-division multiple access (TDMA) scheme. The ultrasonic signal is transmitted using a carrier frequency of 40 kHz, and the TOF is estimated by applying a quadrature detector to the signal obtained at the A/D converter output. Quadrature sampling allows low sampling frequencies of 17.78 kHz or even 12.31 kHz, which optimizes memory requirements and reduces the computational cost of signal processing. The distance is calculated from the TOF taking into account the speed of sound. Excellent accuracy in the estimation of the TOF is achieved by using parabolic interpolation to detect the maximum of the signal envelope at the matched filter output. The signal phase information is also used to enhance the TOF measurement accuracy. Experimental results show a root mean square error (RMSE) of less than 2 mm and a standard deviation of less than 0.3 mm for pseudorange measurements at distances between 2 and 6 m. The system location accuracy is also evaluated by applying multilateration. A sub-centimeter location accuracy is achieved, with an average RMSE of 9.6 mm.
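The two steps named in this abstract, locating the envelope maximum by parabolic interpolation and converting the TOF to a distance, can be sketched as follows. The function names are hypothetical, the speed of sound uses the common linear approximation in air temperature rather than whatever model the authors used, and the phase-based refinement is not shown.

```python
def parabolic_peak(envelope):
    """Fractional index of the maximum, found by fitting a parabola through
    the largest sample and its two neighbours."""
    i = max(range(len(envelope)), key=lambda k: envelope[k])
    if i == 0 or i == len(envelope) - 1:
        return float(i)                 # no neighbour on one side: no refinement
    y0, y1, y2 = envelope[i - 1], envelope[i], envelope[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return float(i)                 # flat top: keep the integer index
    return i + 0.5 * (y0 - y2) / denom

def speed_of_sound(temp_c):
    """Approximate speed of sound in air (m/s); an assumed linear model."""
    return 331.3 + 0.606 * temp_c

def tof_to_distance(peak_index, sample_rate_hz, temp_c=20.0):
    """Distance in metres for a peak located at `peak_index` samples
    after the emission instant."""
    tof_s = peak_index / sample_rate_hz
    return speed_of_sound(temp_c) * tof_s
```

The interpolation matters because at a sampling rate of 17.78 kHz one sample corresponds to roughly 2 cm of travel, far coarser than the millimetre-level errors reported.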
Speech Communication | 2003
Antonio M. Peinado; Victoria E. Sánchez; José L. Pérez-Córdoba; Ángel de la Torre
The emergence of distributed speech recognition has generated the need to mitigate the degradations that the transmission channel introduces in the speech features used for recognition. This work proposes a hidden Markov model (HMM) framework from which different mitigation techniques oriented to wireless channels can be derived. First, we study the performance of two techniques based on minimum mean square error (MMSE) estimation, a raw MMSE and a forward MMSE estimation, over additive white Gaussian noise (AWGN) channels. These techniques are also adapted to bursty channels. Then, we propose two new mitigation methods especially suitable for bursty channels. The first is based on a forward-backward MMSE estimation and the second on the well-known Viterbi algorithm. Different experiments are carried out, dealing with several issues such as the application of hard decisions on the received bits or the influence of the estimated channel SNR. The experimental results show that the HMM-based techniques can effectively mitigate channel errors, even in very poor channel conditions.
Speech Communication | 1996
Ángel de la Torre; Antonio M. Peinado; Antonio J. Rubio; Victoria E. Sánchez; Jesús E. Díaz
The use of signal transformations is a necessary step for feature extraction in pattern recognition systems. These transformations should take into account the main goal of pattern recognition: the error-rate minimization. In this paper we propose a new method to obtain feature space transformations based on the Minimum Classification Error criterion. The goal of these transformations is to obtain a new representation space where the Euclidean distance is optimal for classification. The proposed method is tested on a speech recognition system using different types of Hidden Markov Models. The comparison with standard pre-processing techniques shows that our method provides an error-rate reduction in all the performed experiments.
Ear and Hearing | 2010
Isaac Alvarez; Ángel de la Torre; Manuel Sainz; Cristina Roldán; Hansjoerg Schoesser; Philipp Spitzer
Objective: In this study, we analyze how electrically evoked compound action potential (ECAP) responses can be used to assess whether electrodes should be activated in the map and to estimate C levels in the Med-El Tempo+ Cochlear Implant Speech Processor. Design: ECAP thresholds were measured using the ECAP Recording System of the Pulsar CI100 implant. Twenty-one postlingually and 28 prelingually deafened patients participated in this study. The relationship between ECAP responses and the activation of electrodes was analyzed. Because an error in the estimation of T levels (behavioral thresholds) has less effect on hearing quality than an error in the estimation of C levels (maximum comfort levels) in the Tempo+ cochlear implant speech processor, correlation and regression analyses were performed between ECAP thresholds and C levels. Results: The observation of an evoked potential generally implied that the electrode was activated: only 3.5% of electrodes that yielded measurable evoked responses were deactivated, because of collateral stimulation or an unpleasant hearing sensation. In contrast, the absence of an evoked potential did not imply that an electrode should be deactivated, because 20% of these electrodes provided a useful auditory sensation. ECAP responses did not predict the absolute behavioral comfort levels because of the excessive error between behavioral C levels and those derived from ECAP thresholds (the mean relative error is 43.78%). However, by applying a normalization procedure, ECAP measurements allowed the C-level profile to be predicted with a mean relative error of 6%; that is, they provided useful data to determine the C level of each electrode relative to the average C level of the patient. Conclusions: ECAP is a reliable and useful objective measurement that can assist in the fitting of the Tempo+ cochlear implant speech processor. From the results presented in this work, a protocol is proposed for fitting this cochlear implant system. This protocol facilitates appropriate cochlear implant fitting, particularly for children or uncooperative patients.
Clinical and Experimental Otorhinolaryngology | 2012
Jose Luis Vargas; Manuel Sainz; Cristina Roldán; Isaac Alvarez; Ángel de la Torre
Objectives: The stimulation levels programmed in cochlear implant systems evolve after the first switch-on of the processor. This study was designed to evaluate the changes in stimulation levels over time and their relationship with post-implantation physiological changes and with the hearing experience provided by continuous use of the cochlear implant. Methods: Sixty-two patients, ranging in age from 4 to 68 years at the moment of implantation, participated in this study. All subjects were implanted with the 12-channel COMBI 40+ cochlear implant at San Cecilio University Hospital, Granada, Spain. Hearing loss etiology and progression characteristics varied across subjects. Results: The analyzed programming maps show that the stimulation levels change rapidly during the first weeks after the first switch-on of the processor. The evolution then becomes slower, and the programming parameters tend to be stable at about 6 months after the first switch-on. The evolution of the stimulation levels implies an increase in the electrical dynamic range, from 15.4 to 20.7 dB, which improves the intensity resolution. A significant increase in the sensitivity to acoustic stimuli is also observed. For some patients, we have also observed transitory changes in the electrode impedances associated with secretory otitis media, which cause important changes in the programming maps. Conclusion: We have studied the long-term evolution of the stimulation levels in cochlear implant patients. Our results show the importance of systematic measurements of the electrode impedances before the revision of the programming map. This report also highlights that the evolution of the programming maps is an important factor to be considered in order to determine an adequate fitting schedule for the cochlear implant processor.
Journal of Neuroscience Methods | 2007
Isaac Alvarez; Ángel de la Torre; Manuel Sainz; Cristina Roldán; Hansjoerg Schoesser; Philipp Spitzer
Stimulus artifact is one of the main limitations when considering electrically evoked compound action potentials for clinical applications. Alternating stimulation (the average of recordings obtained with anodic-cathodic and cathodic-anodic bipolar stimulation pulses) is an effective method to reduce stimulus artifact when evoked potentials are recorded. In this paper we extend the concept of alternating stimulation by combining anodic-cathodic and cathodic-anodic recordings with a weight that in general differs from 0.5. We also provide an automatic method to estimate the optimal weights. Comparison with conventional alternating stimulation, triphasic stimulation, and the masker-probe paradigm shows that the generalized alternating method improves the quality of electrically evoked compound action potential responses.
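The weighted combination described here is easy to write down; the weight estimator below is only a stand-in. It assumes a window of samples dominated by stimulus artifact and picks the weight that minimizes the energy of the combined signal in that window by least squares. This is an assumption for illustration, not the automatic method of the paper, and all names are hypothetical.

```python
def combine(ac, ca, w):
    """Generalized alternating combination of the anodic-cathodic (ac) and
    cathodic-anodic (ca) recordings; w = 0.5 gives conventional alternating."""
    return [w * a + (1.0 - w) * c for a, c in zip(ac, ca)]

def estimate_weight(ac, ca, window):
    """Hypothetical estimator: least-squares weight that minimizes the
    combined energy over samples window[0]:window[1], assumed to be
    dominated by stimulus artifact."""
    start, stop = window
    num = 0.0
    den = 0.0
    for a, c in zip(ac[start:stop], ca[start:stop]):
        d = a - c
        num -= c * d
        den += d * d
    return 0.5 if den == 0.0 else num / den
```

If the two polarities produce artifacts of opposite sign but unequal size, a weight of 0.5 leaves a residual, whereas a weight fitted to the artifact ratio can cancel it.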
IEEE Geoscience and Remote Sensing Letters | 2012
Isaac Alvarez; Luz García; Guillermo Cortés; Carmen Benítez; Ángel de la Torre
Feature extraction is a critical element in automatic pattern classification. In this letter, we propose different sets of parameters for classification of volcano-seismic signals, and the discriminative feature selection (DFS) method is applied for selecting the minimum number of features containing most of the discriminative information. We have applied DFS to a conventional cepstral-based parameterization (with 39 features) and to an extended set of parameters (including 84 features). Classification experiments using seismograms recorded at Colima Volcano (Mexico) show that, for the most complex classifier and using the cepstral-based parameterization, DFS provided a reduction of the error rate from 24.3% (using 39 features) to 15.5% (ten components). When DFS is applied to the extended parameterization, the error rate decreased from 27.9% (84 features) to 13.8% (14 features). These results show the utility of DFS for identifying the best components from the original feature vector and for exploring new parameterizations for the classification of volcano-seismic signals.
Speech Communication | 2002
Ángel de la Torre; Antonio M. Peinado; Antonio J. Rubio; José C. Segura; M. Carmen Benítez
The Discriminative Feature Extraction (DFE) method provides an appropriate formalism for the design of the front-end feature extraction module in pattern classification systems. In recent years, this formalism has been successfully applied to different speech recognition problems, such as classification of vowels, classification of phonemes, and isolated word recognition. The DFE formalism can be applied to weight the contribution of the components in the feature vector. This variant of DFE, which we call Discriminative Feature Weighting (DFW), improves pattern classification systems by enhancing the components that are most relevant for discriminating among the different classes. This paper is dedicated to the application of the DFW formalism to Continuous Speech Recognizers (CSR) based on Hidden Markov Models (HMMs). Two different types of HMM-based speech recognizers are considered: recognizers based on Discrete HMMs (DHMMs), for which the acoustic evaluation is based on a Euclidean distance measure, and Semi-Continuous HMMs (SCHMMs), for which the acoustic evaluation is performed using a mixture of multivariate Gaussians. We report how the components can be weighted and how the weights can be discriminatively trained and applied to the speech recognizers. We present recognition results for several continuous speech recognition tasks. The experimental results show the utility of DFW for HMM-based continuous speech recognizers.
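For the discrete-HMM case, where acoustic evaluation rests on a Euclidean distance, component weighting can be pictured as a weighted distance used during vector quantization. The sketch below assumes one non-negative weight per feature component and hypothetical function names; how the weights are discriminatively trained is not shown.

```python
def weighted_sq_distance(x, centroid, weights):
    """Squared Euclidean distance with one weight per feature component."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, centroid, weights))

def nearest_codeword(x, codebook, weights):
    """Index of the codebook centroid closest to x under the weighted distance."""
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_distance(x, codebook[i], weights))
```

Raising the weight of a discriminative component can change which codeword a frame is assigned to, and therefore which discrete symbol the HMM observes.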