Publication


Featured research published by Angel M. Gomez.


IEEE Transactions on Wireless Communications | 2006

Recognition of coded speech transmitted over wireless channels

Angel M. Gomez; Antonio M. Peinado; Victoria E. Sánchez; Antonio J. Rubio

Network-based speech recognition (NSR) and distributed speech recognition (DSR) have been proposed as solutions to translate speech recognition technologies to mobile environments. NSR is the most straightforward solution since it does not require any modification of the mobile phone; however, DSR offers higher robustness against codec compression and transmission channel degradation. This paper explores an alternative approach for remote speech recognition which combines the advantages of NSR and DSR. In this scheme, a standard speech codec is used for speech transmission but the recognition is performed from the received codec parameters. In particular, we focus on the effect of transmission channel errors, which can cause a more severe performance reduction in speech recognition than codec distortion. First, we show that an NSR solution can approach DSR through a reconstruction technique along with an adapted noise reduction technique originally proposed for acoustic noise. Then, these results are improved by working with recognition features directly extracted from the codec bitstream by means of parameter transcoding. The modifications required in current networks to access the bitstream are described. Upgrading the network with the tandem free operation (TFO) protocol is an attractive solution. This upgrade not only offers an overall improvement in end-to-end speech quality, but would also allow a recognition performance similar to, and in poor channel conditions even higher than, that obtained by DSR when parameter transcoding is applied along with the proposed mitigation techniques.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition

José A. González; Antonio M. Peinado; Ning Ma; Angel M. Gomez; Jon Barker

This paper addresses the problem of feature compensation in the log-spectral domain by using the missing-data (MD) approach to noise-robust speech recognition, in which the log-spectral features are assumed to be either almost unaffected by noise or completely masked by it. First, a general MD framework based on minimum mean square error (MMSE) estimation is introduced which exploits the correlation across frequency bands to reconstruct the missing features. This framework allows the derivation of different MD imputation approaches and, in particular, a novel technique taking advantage of truncated Gaussian distributions is presented. While the proposed technique provides excellent results at high and medium signal-to-noise ratios (SNRs), its performance diminishes at low SNRs where very few reliable features are available. The reconstruction technique is therefore extended to exploit temporal constraints using two different approaches. In the first approach, time-frequency patches of speech containing a number of consecutive frames are modeled using a Gaussian mixture model (GMM). In the second one, the sequential structure of speech is instead modeled by a hidden Markov model (HMM). The proposed techniques are evaluated on the Aurora-2 and Aurora-4 databases using both oracle and estimated masks. In both cases, the proposed techniques outperform the baseline system and other related techniques in recognition performance. Also, the introduction of temporal modeling turns out to be very effective in reconstructing spectra at low SNRs. In particular, HMMs show the highest capability of accounting for time correlations and, therefore, achieve the best results.
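As a rough illustration of how a truncated Gaussian can drive MD imputation, the sketch below uses the standard closed form for the mean of a Gaussian truncated from above, E[x | x <= y] = mu - sigma * phi(a) / Phi(a) with a = (y - mu) / sigma, where y is the observed noisy log-spectral value that bounds the masked speech feature. It is a simplified, per-band version that assumes independent bands, so it ignores the cross-frequency correlation and the temporal models described in the abstract. The function names and the caller-supplied component weights are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def std_normal_pdf(z: float) -> float:
    """Density of the standard normal distribution at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """CDF of the standard normal, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_mean(mu: float, sigma: float, upper: float) -> float:
    """Mean of N(mu, sigma^2) restricted to the interval (-inf, upper]."""
    alpha = (upper - mu) / sigma
    cdf = std_normal_cdf(alpha)
    if cdf == 0.0:
        # Numerical underflow far in the tail: the mean approaches the bound.
        return upper
    return mu - sigma * std_normal_pdf(alpha) / cdf


def impute_masked(observed: float,
                  weights: Sequence[float],
                  means: Sequence[float],
                  sigmas: Sequence[float]) -> float:
    """Weighted sum of per-component truncated means for one masked band.

    `weights` are assumed to be component posteriors computed elsewhere
    (hypothetical input; how they are obtained is not shown here).
    """
    return sum(w * truncated_mean(m, s, observed)
               for w, m, s in zip(weights, means, sigmas))
```

The estimate always lies below the observed value, which matches the MD assumption that a masked speech feature cannot exceed the noisy observation.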


IEEE Transactions on Multimedia | 2006

Combining Media-Specific FEC and Error Concealment for Robust Distributed Speech Recognition Over Loss-Prone Packet Channels

Angel M. Gomez; Antonio M. Peinado; Victoria E. Sánchez; Antonio J. Rubio

This paper presents a mixed recovery scheme for robust distributed speech recognition (DSR) implemented over a packet channel which suffers packet losses. The scheme combines media-specific forward error correction (FEC) and error concealment (EC). Media-specific FEC is applied at the client side, where FEC bits representing strongly quantized versions of the speech vectors are introduced. At the server side, the information provided by those FEC bits is used by the EC algorithm to improve the recognition performance. We investigate the adaptation of two different EC techniques so that they can be used along with FEC: minimum mean square error (MMSE) estimation, which operates at the decoding stage, and weighted Viterbi recognition (WVR), where EC is applied at the recognition stage. The experimental results show that a significant increase in recognition accuracy can be obtained with very little bandwidth increase, which may be null in practice, and a limited increase in latency, which in any case is not so critical for an application such as DSR.
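The idea behind weighted Viterbi recognition can be sketched as an ordinary log-domain Viterbi search in which each frame's observation log-likelihood is scaled by a reliability weight between 0 and 1, so that a weight of 0 makes a lost or unreliable frame contribute nothing beyond the transition model. The code below is a minimal sketch under that assumption. The function name, the argument layout and the way weights would be derived from FEC information are hypothetical and do not come from the abstract.

```python
from typing import List, Sequence


def weighted_viterbi(log_init: Sequence[float],
                     log_trans: Sequence[Sequence[float]],
                     log_obs: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[int]:
    """Best state path when frame t's observation log-likelihood is scaled by weights[t].

    log_init[j]     : log prior of state j
    log_trans[i][j] : log transition probability from state i to state j
    log_obs[t][j]   : log-likelihood of frame t under state j
    weights[t]      : reliability in [0, 1]; 0 ignores the frame, 1 trusts it fully
    """
    n_states = len(log_init)
    delta = [log_init[j] + weights[0] * log_obs[0][j] for j in range(n_states)]
    backpointers: List[List[int]] = []

    for t in range(1, len(log_obs)):
        prev = delta
        delta, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at time t.
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            delta.append(prev[best] + log_trans[best][j] + weights[t] * log_obs[t][j])
        backpointers.append(pointers)

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With all weights set to 1 this reduces to standard Viterbi decoding. Lowering the weight of a concealed frame lets the surrounding reliable frames and the transition probabilities dominate the decision.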


International Conference on Acoustics, Speech, and Signal Processing | 2005

Packet loss concealment based on VQ replicas and MMSE estimation applied to distributed speech recognition

Antonio M. Peinado; Angel M. Gomez; Victoria E. Sánchez; José L. Pérez-Córdoba; Antonio J. Rubio

This paper proposes a new packet loss concealment technique based on the inclusion in each packet of a few FEC bits, representing data replicas, combined with minimum mean square error (MMSE) estimation. This technique is developed for an Aurora-2 distributed speech recognition system working over an IP network. In addition to the data representing the transmitted speech frames, each packet includes some FEC bits representing a strongly VQ-quantized version (replicas) of previous and subsequent frames. When a loss burst occurs, the lost frames can be reconstructed from the VQ replicas. In order to mitigate the degradation introduced by the coarse VQ quantization of the replicas, a model-based MMSE estimation is applied. The experimental results show that, under a strongly degraded channel, it is possible to obtain up to 83.31% word accuracy with only 4 FEC bits per packet, or 88.47% with 8 FEC bits per packet, whereas the Aurora mitigation algorithm obtains only 76.98%.
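The replica mechanism described above can be sketched as follows: each packet carries its own frame plus coarse VQ indices of one earlier and one later frame, and the receiver substitutes the corresponding codeword for any frame whose packet was lost. This is only a minimal sketch. The packet layout, the `span` offset, the function names and the repetition fallback are assumptions made for illustration, and the model-based MMSE refinement of the coarse replicas is omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Frame = Tuple[float, ...]
Packet = Dict[str, object]


def quantize(frame: Frame, codebook: Sequence[Frame]) -> int:
    """Index of the codeword closest to `frame` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])))


def packetize(frames: Sequence[Frame], codebook: Sequence[Frame],
              span: int = 1) -> List[Packet]:
    """Packet t carries frame t plus VQ replicas of frames t - span and t + span."""
    packets: List[Packet] = []
    for t, frame in enumerate(frames):
        replicas = {u: quantize(frames[u], codebook)
                    for u in (t - span, t + span) if 0 <= u < len(frames)}
        packets.append({"frame": frame, "replicas": replicas})
    return packets


def conceal(received: Sequence[Optional[Packet]],
            codebook: Sequence[Frame]) -> List[Optional[Frame]]:
    """Rebuild the frame sequence; None entries in `received` mark lost packets."""
    # Collect every replica index that survived in a received packet.
    replica_of: Dict[int, int] = {}
    for packet in received:
        if packet is not None:
            replica_of.update(packet["replicas"])

    output: List[Optional[Frame]] = []
    last: Optional[Frame] = None
    for t, packet in enumerate(received):
        if packet is not None:
            frame = packet["frame"]
        elif t in replica_of:
            frame = codebook[replica_of[t]]   # coarse reconstruction from a replica
        else:
            frame = last                      # assumed fallback: repeat the last frame
        output.append(frame)
        if frame is not None:
            last = frame
    return output
```

With `span = 1` a burst of two consecutive losses can still be covered, since the packet before the burst holds a replica of the first lost frame and the packet after it holds a replica of the second. Longer bursts would need a larger `span` or more replicas per packet.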


Speech Communication | 2012

Improving objective intelligibility prediction by combining correlation and coherence based methods with a measure based on the negative distortion ratio

Angel M. Gomez; Belinda Marie Schwerin; Kuldip Kumar Paliwal

In this paper we propose a novel objective method for intelligibility prediction of enhanced speech which is based on the negative distortion ratio (NDR), that is, the amount of the power spectrum that has been removed in comparison to the original clean speech signal, likely due to a bad noise estimate during the speech enhancement procedure. While negative spectral distortions can be of significant importance in the subjective intelligibility assessment of processed speech, most of the objective measures in the literature do not account well for this type of distortion. The proposed method focuses on a very specific type of noise, so it is not intended to be used alone but in combination with other techniques, to jointly achieve a better intelligibility prediction. In order to find an appropriate technique to combine it with, in this paper we also review a number of recently proposed methods based on correlation and coherence measures. These methods have already shown a high correlation with human recognition scores, as they effectively detect the presence of nonlinearities, frequently found in noise-suppressed speech. However, when these techniques are jointly applied with the proposed method, significantly higher correlations (above r=0.9) are shown to be achieved.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

A Multipulse-Based Forward Error Correction Technique for Robust CELP-Coded Speech Transmission Over Erasure Channels

Angel M. Gomez; José L. Carmona; Antonio M. Peinado; Victoria E. Sánchez

The widely used code-excited linear prediction (CELP) paradigm relies on a strong interframe dependency which renders CELP-based codecs vulnerable to packet loss. The use of long-term prediction (LTP) or adaptive codebooks (ACB) is the main source of interframe dependency in these codecs, since they employ the excitation from previous frames. After a frame erasure, the previous excitation is unavailable and a desynchronization between the encoder and the decoder appears, causing an additional distortion which is propagated to the subsequent frames. In this paper, we propose a novel media-specific forward error correction (FEC) technique which achieves LTP resynchronization with no additional delay at the cost of a very small bit-rate overhead. In particular, the proposed FEC code contains a multipulse signal which replaces the excitation of the previous frame (i.e., the ACB memory) when this has been lost. This multipulse description of the previous excitation is optimized to minimize the perceptual error between the synthesized speech signal and the original one. To this end, we develop a multipulse formulation which includes the additional CELP processing and, in addition, can cope with the presence of advanced LTP filters and the usual subframe segmentation applied in modern codecs. Finally, a quantization scheme is proposed to encode the pulse parameters. Objective and subjective quality tests applied to our proposal show that the propagation error due to the LTP filter can practically be removed with a very small bandwidth increase.


International Conference on Acoustics, Speech, and Signal Processing | 2011

Robust parametrization for non-destructive evaluation of composites using ultrasonic signals

Nicolas Bochud; Angel M. Gomez; Guillermo Rus; José L. Carmona; Antonio M. Peinado

Anticipating and characterizing damage in layered carbon fiber-reinforced polymers is a challenging problem. Non-destructive evaluation using ultrasonic signals is a well-established method to obtain physically relevant parameters to characterize damage in isotropic homogeneous materials. However, ultrasonic signals obtained from composites require special care in signal interpretation due to their structural complexity. In this paper, the interpretation is enhanced by adapting classical parametrization techniques to extract relevant features from the ultrasonic signals. First, a cepstral-based feature extractor is designed and optimized by using a classification system based on cepstral distances. Then, this feature extractor is applied in an analysis-by-synthesis scheme which, by using a numerical model of the specimen, infers the values of the damage parameters.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

José A. González; Antonio M. Peinado; Angel M. Gomez; José L. Carmona

This paper presents a feature compensation framework based on minimum mean square error (MMSE) estimation and stereo training data for robust speech recognition. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case every feature space is characterized by a set of prototype vectors which can alternatively be considered as a vector quantization (VQ) codebook. The discrete nature of this feature space characterization introduces two significant advantages. First, it allows the implementation of a very efficient MMSE estimator in terms of accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. In order to deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, allowing a better performance under unknown environments. The experimental results on noisy digit recognition show a relative improvement of 87.93% in word accuracy over the baseline when clean acoustic models are used, while 4.54% is achieved with multi-style trained models.
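To give a feel for why a VQ codebook makes MMSE compensation cheap, the sketch below precomputes, for every cell of a noisy-domain codebook, the average of the clean vectors whose stereo-paired noisy vectors fall in that cell. That average is the empirical MMSE estimate of the clean feature given the cell index, so compensation at run time reduces to a nearest-codeword search and a table lookup. This is a deliberately simplified illustration. The function names are hypothetical, and the clean-domain codebook, subregion modeling, HMM-based temporal modeling and uncertainty processing mentioned in the abstract are not represented.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def nearest(codebook: Sequence[Vector], v: Vector) -> int:
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])))


def train_lookup(noisy_codebook: Sequence[Vector],
                 stereo_pairs: Sequence[Tuple[Vector, Vector]]) -> List[Vector]:
    """For each noisy cell, average the clean halves of the (clean, noisy) stereo pairs."""
    dim = len(noisy_codebook[0])
    sums = [[0.0] * dim for _ in noisy_codebook]
    counts = [0] * len(noisy_codebook)
    for clean, noisy in stereo_pairs:
        k = nearest(noisy_codebook, noisy)
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += clean[d]
    table: List[Vector] = []
    for k, codeword in enumerate(noisy_codebook):
        if counts[k] == 0:
            # Assumed fallback for empty cells: leave the feature uncompensated.
            table.append(tuple(codeword))
        else:
            table.append(tuple(s / counts[k] for s in sums[k]))
    return table


def compensate(noisy_codebook: Sequence[Vector],
               table: Sequence[Vector], y: Vector) -> Vector:
    """Clean-feature estimate for a noisy vector y: quantize, then look up."""
    return table[nearest(noisy_codebook, y)]
```

All the averaging happens offline, which is one plausible reading of the efficiency claim: the per-frame cost is a single codebook search.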


IEEE Transactions on Multimedia | 2011

One-Pulse FEC Coding for Robust CELP-Coded Speech Transmission Over Erasure Channels

Angel M. Gomez; José L. Carmona; José A. González; Victoria E. Sánchez

In this paper, we present an improved quantization scheme for the redundancy data of a forward error correction (FEC) technique proposed for the transmission of code-excited linear prediction (CELP)-coded speech over erasure channels. The use of an FEC-based error protection scheme is motivated by the well-known fact that, after a frame erasure, the previous excitation is not available and a desynchronization between the encoder and the decoder long-term prediction (LTP) filters appears, causing an additional distortion which is propagated to subsequent frames. LTP synchronization can be recovered by means of a single-pulse representation of the previous excitation. No additional delay is introduced by this technique, which only requires a small increase in transmission bandwidth. In this paper, we focus on the efficient encoding of this pulse. Thus, an optimization procedure, which takes into account the overall synthesis error, is proposed in order to provide better pulse-position and pulse-amplitude quantization codebooks. Moreover, by extending the previous procedure, an efficient joint position-amplitude quantization can be obtained. Objective quality tests applied to our proposal show that, by means of the proposed codebooks, the number of bits required to represent the resynchronization pulse is effectively reduced. In addition, a discontinuous transmission mechanism is derived from the cost functional used during joint position-amplitude quantization, further reducing the bit-rate.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Combining missing-data reconstruction and uncertainty decoding for robust speech recognition

José A. González; Antonio M. Peinado; Angel M. Gomez; Ning Ma; Jon Barker

This paper proposes a novel approach for noise-robust speech recognition which combines a missing-data (MD) derived spectral reconstruction technique and uncertainty decoding based on the weighted Viterbi algorithm (WVA). First, the noisy feature vectors are compensated by using a novel MD imputation technique based on the integration of truncated Gaussian pdfs. Although the proposed MD estimator has both the advantages of MD techniques and the use of cepstral features, it may still be affected by a number of uncertainty sources. In order to deal with these uncertainties, WVA-based uncertainty decoding is proposed. Our experiments on the Aurora-2 and Aurora-4 tasks show that the proposed MD estimator outperforms other MD imputation techniques. Also, we show that the combination of MD imputation with WVA provides better results than the combination with other uncertainty processing techniques such as the use of evidence pdfs for the estimated features.
