Network


Latest external collaboration at the country level.

Hotspot


Dive into the research topics where José L. Carmona is active.

Publication


Featured research published by José L. Carmona.


International Conference on Acoustics, Speech, and Signal Processing | 2011

A pitch based noise estimation technique for robust speech recognition with Missing Data

Juan Andres Morales-Cordovilla; Ning Ma; Victoria E. Sánchez; José L. Carmona; Antonio M. Peinado; Jon Barker

This paper presents a noise estimation technique based on pitch information for robust speech recognition. In the first stage, the noise is estimated by extrapolating it from frames where speech is believed to be absent; these frames are detected with a proposed pitch-based VAD (Voice Activity Detector). In the second stage, the noise estimate is refined in voiced frames using the harmonic tunnelling technique. At high SNRs, the tunnelling estimate is used as an upper bound on the noise rather than as an estimate in its own right. A spectrogram-based MD (Missing Data) recognition system is used to evaluate the proposed noise estimation. The proposed system is compared on Aurora-2 with similar techniques such as cepstral SS (Spectral Subtraction).
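
As a rough illustration of the missing-data side of this pipeline, the sketch below builds a spectrographic reliability mask from a noise estimate obtained on speech-absent frames. The frame sizes, the SNR threshold, and the stand-in for the pitch-based VAD are assumptions for illustration, not the authors' settings.

```python
# A minimal sketch (not the authors' code): building a missing-data mask
# from a noise estimate; the VAD and the tunnelling refinement are omitted.
import numpy as np

def stft_power(x, frame_len=256, hop=128):
    """Power spectrogram |STFT|^2 with a Hann window."""
    win = np.hanning(frame_len)
    frames = [x[i:i + frame_len] * win
              for i in range(0, len(x) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def missing_data_mask(noisy_power, noise_power, snr_thresh_db=0.0):
    """Mark a time-frequency cell as reliable if its local SNR exceeds the threshold."""
    snr_db = 10.0 * np.log10(noisy_power / np.maximum(noise_power, 1e-12))
    return snr_db > snr_thresh_db  # True = reliable, False = missing

# Toy usage: noise estimated by averaging frames that a (hypothetical)
# pitch-based VAD labelled as speech-absent.
rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)
Y = stft_power(noisy)
speech_absent = np.zeros(Y.shape[0], dtype=bool)
speech_absent[:10] = True
noise_est = Y[speech_absent].mean(axis=0, keepdims=True)
mask = missing_data_mask(Y, noise_est)
```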


IEEE Transactions on Audio, Speech, and Language Processing | 2010

A Multipulse-Based Forward Error Correction Technique for Robust CELP-Coded Speech Transmission Over Erasure Channels

Angel M. Gomez; José L. Carmona; Antonio M. Peinado; Victoria E. Sánchez

The widely used code-excited linear prediction (CELP) paradigm relies on a strong interframe dependency which renders CELP-based codecs vulnerable to packet loss. The use of long-term prediction (LTP) or adaptive codebooks (ACB) is the main source of interframe dependency in these codecs, since they employ the excitation from previous frames. After a frame erasure, the previous excitation is unavailable and a desynchronization between the encoder and the decoder appears, causing an additional distortion which propagates to subsequent frames. In this paper, we propose a novel media-specific forward error correction (FEC) technique which recovers LTP synchronization with no additional delay at the cost of a very small bit overhead. In particular, the proposed FEC code contains a multipulse signal which replaces the excitation of the previous frame (i.e., the ACB memory) when it has been lost. This multipulse description of the previous excitation is optimized to minimize the perceptual error between the synthesized speech signal and the original one. To this end, we develop a multipulse formulation which includes the additional CELP processing and, in addition, can cope with the presence of advanced LTP filters and the usual subframe segmentation applied in modern codecs. Finally, a quantization scheme is proposed to encode the pulse parameters. Objective and subjective quality tests show that the propagation error due to the LTP filter can practically be removed with a very small bandwidth increase.
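
The following sketch illustrates the general mechanism of replacing a lost adaptive-codebook memory with a sparse multipulse excitation and resynthesizing through the LPC filter. The pulse positions, amplitudes, and filter coefficients are hypothetical, and the paper's perceptually weighted optimization is not shown.

```python
# A minimal sketch (assumptions, not the paper's codec): rebuild a lost
# ACB memory from a handful of FEC-transmitted pulses, then resynthesize
# speech through the LPC synthesis filter 1/A(z).
import numpy as np
from scipy.signal import lfilter

def multipulse_excitation(frame_len, positions, amplitudes):
    """Sparse excitation: zeros everywhere except at the decoded pulse positions."""
    exc = np.zeros(frame_len)
    exc[np.asarray(positions)] = amplitudes
    return exc

def lpc_synthesis(excitation, lpc_coeffs):
    """Synthesis filter 1/A(z), with A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    return lfilter([1.0], np.concatenate(([1.0], lpc_coeffs)), excitation)

# Toy usage: 4 pulses stand in for the lost previous-frame excitation.
prev_exc = multipulse_excitation(160, positions=[12, 55, 97, 140],
                                 amplitudes=[0.8, -0.5, 0.6, -0.3])
speech = lpc_synthesis(prev_exc, lpc_coeffs=np.array([-1.2, 0.8, -0.3]))
```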


International Conference on Acoustics, Speech, and Signal Processing | 2011

Robust parametrization for non-destructive evaluation of composites using ultrasonic signals

Nicolas Bochud; Angel M. Gomez; Guillermo Rus; José L. Carmona; Antonio M. Peinado

Anticipating and characterizing damage in layered carbon fiber-reinforced polymers is a challenging problem. Non-destructive evaluation using ultrasonic signals is a well-established method for obtaining physically relevant parameters that characterize damage in isotropic homogeneous materials. However, ultrasonic signals obtained from composites require special care in their interpretation due to the structural complexity of these materials. In this paper, signal interpretation is enhanced by adapting classical parametrization techniques to extract relevant features from the ultrasonic signals. A cepstral-based feature extractor is first designed and optimized using a classification system based on cepstral distances. This feature extractor is then applied in an analysis-by-synthesis scheme which, by using a numerical model of the specimen, infers the values of the damage parameters.
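
A minimal sketch of a cepstral feature extractor and cepstral-distance classifier in the spirit described above; the coefficient count, signal lengths, and reference templates are placeholders rather than the optimized configuration of the paper.

```python
# A minimal sketch: real-cepstrum features from an ultrasonic A-scan and a
# nearest-reference classifier based on cepstral (Euclidean) distance.
import numpy as np

def real_cepstrum(signal, n_coeffs=20):
    """First n_coeffs coefficients of the real cepstrum: IFFT of log |FFT|."""
    spectrum = np.abs(np.fft.fft(signal)) + 1e-12   # avoid log(0)
    cep = np.real(np.fft.ifft(np.log(spectrum)))
    return cep[:n_coeffs]

def cepstral_distance(c1, c2):
    return np.linalg.norm(c1 - c2)

# Toy usage: classify a measured signal against per-damage-state templates.
rng = np.random.default_rng(1)
references = {"undamaged": real_cepstrum(rng.standard_normal(1024)),
              "damaged": real_cepstrum(rng.standard_normal(1024))}
measured = real_cepstrum(rng.standard_normal(1024))
label = min(references, key=lambda k: cepstral_distance(measured, references[k]))
```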


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Efficient MMSE Estimation and Uncertainty Processing for Multienvironment Robust Speech Recognition

José A. González; Antonio M. Peinado; Angel M. Gomez; José L. Carmona

This paper presents a feature compensation framework based on minimum mean square error (MMSE) estimation and stereo training data for robust speech recognition. In our proposal, we model the clean and noisy feature spaces in order to obtain clean feature estimates. However, unlike other well-known MMSE compensation methods such as SPLICE or MEMLIN, which model those spaces with Gaussian mixture models (GMMs), in our case every feature space is characterized by a set of prototype vectors which can alternatively be considered a vector quantization (VQ) codebook. The discrete nature of this feature space characterization introduces two significant advantages. First, it allows the implementation of a very efficient MMSE estimator in terms of accuracy and computational cost. Second, time correlations can be exploited by means of hidden Markov modeling (HMM). In addition, a novel subregion-based modeling is applied in order to accurately represent the transformation between the clean and noisy domains. In order to deal with unknown environments, a multiple-model approach is also explored. Since this approach has been shown to be quite sensitive to incorrect environment classification, we adapt two uncertainty processing techniques, soft-data decoding and exponential weighting, to our estimation framework. As a result, environment misclassifications are concealed, allowing better performance under unknown environments. The experimental results on noisy digit recognition show a relative improvement of 87.93% in word accuracy over the baseline when clean acoustic models are used, and of 4.54% with multi-style trained models.
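
The core MMSE idea, an estimate formed as a posterior-weighted sum of clean prototype vectors, can be sketched as below. The Gaussian kernels around the noisy prototypes and the toy codebooks are assumptions; the paper's subregion modeling, HMM time correlations, and uncertainty processing are not shown.

```python
# A minimal sketch of a VQ-based MMSE estimator (not the paper's exact model).
import numpy as np

def mmse_vq_estimate(y, noisy_codebook, clean_codebook, sigma=1.0):
    """x_hat = sum_k P(k | y) * clean_k, with P(k | y) from distances to noisy prototypes."""
    d2 = np.sum((noisy_codebook - y) ** 2, axis=1)
    log_post = -0.5 * d2 / sigma ** 2
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ clean_codebook

# Toy usage with a 4-entry codebook of 2-D features.
noisy_cb = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
clean_cb = noisy_cb - 0.2            # hypothetical paired (stereo-trained) prototypes
x_hat = mmse_vq_estimate(np.array([0.9, 1.1]), noisy_cb, clean_cb)
```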


IEEE Transactions on Multimedia | 2011

One-Pulse FEC Coding for Robust CELP-Coded Speech Transmission Over Erasure Channels

Angel M. Gomez; José L. Carmona; José A. González; Victoria E. Sánchez

In this paper, we present an improved quantization scheme for the redundancy data of a forward error correction (FEC) technique proposed for the transmission of code-excited linear prediction (CELP)-coded speech over erasure channels. The use of an FEC-based error protection scheme is motivated by the well-known fact that, after a frame erasure, the previous excitation is not available and a desynchronization between the encoder and decoder long-term prediction (LTP) filters appears, causing an additional distortion which is propagated to subsequent frames. LTP synchronization can be recovered by means of a single-pulse representation of the previous excitation. No additional delay is introduced by this technique, which only requires a small increase in transmission bandwidth. In this paper, we focus on the efficient encoding of this pulse. An optimization procedure, which takes into account the overall synthesis error, is proposed in order to provide better pulse-position and pulse-amplitude quantization codebooks. Moreover, by extending this procedure, an efficient joint position-amplitude quantization can be obtained. Objective quality tests show that, by means of the proposed codebooks, the number of bits required to represent the resynchronization pulse is effectively reduced. In addition, a discontinuous transmission mechanism is derived from the cost functional used during joint position-amplitude quantization, further reducing the bit-rate.
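
As an illustration of joint position-amplitude quantization driven by the synthesis-domain error, the sketch below performs an exhaustive search over a hypothetical position grid and amplitude codebook; the actual codebooks and cost functional of the paper differ.

```python
# A minimal sketch (assumed codebooks): pick the single pulse position and
# quantized amplitude that minimize the synthesis-domain error with respect
# to the lost excitation.
import numpy as np
from scipy.signal import lfilter

def best_single_pulse(target_exc, lpc_a, positions, amp_codebook):
    """Exhaustive joint search over pulse positions and amplitude codewords."""
    target_syn = lfilter([1.0], lpc_a, target_exc)
    best = (None, None, np.inf)
    for p in positions:
        for a in amp_codebook:
            exc = np.zeros_like(target_exc)
            exc[p] = a
            err = np.sum((target_syn - lfilter([1.0], lpc_a, exc)) ** 2)
            if err < best[2]:
                best = (p, a, err)
    return best[:2]

# Toy usage: 40-sample subframe, 8 candidate positions, 4-level amplitude codebook.
rng = np.random.default_rng(2)
target = rng.standard_normal(40)
lpc_a = np.array([1.0, -0.9])
pos, amp = best_single_pulse(target, lpc_a, positions=range(0, 40, 5),
                             amp_codebook=[-1.0, -0.5, 0.5, 1.0])
```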


International Conference on Acoustics, Speech, and Signal Processing | 2008

A scalable coding scheme based on interframe dependency limitation

José L. Carmona; José L. Pérez-Córdoba; Antonio M. Peinado; Angel M. Gomez; José A. González

While VoIP (voice over IP) is gaining importance in comparison with other types of telephony, packet loss remains the main source of degradation in VoIP systems. Traditional speech codecs, such as those based on the CELP (code-excited linear prediction) paradigm, can achieve low bit-rates at the cost of introducing interframe dependencies. As a result, the effect of a packet loss burst propagates to the frames correctly received after the burst. iLBC (internet low bit-rate codec) alleviates this problem by removing the interframe dependencies at the cost of a higher bit-rate. In this paper, we propose a combination of iLBC with an ACELP (algebraic CELP) codec in which a variable number of ACELP-coded frames is inserted between every two iLBC-coded frames. The experimental results show that the combined codec can achieve performance close to that of iLBC under different loss conditions but with a lower bit-rate. Scalability is achieved by varying the number of inserted ACELP-coded frames.
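
A toy sketch of the frame scheduling and its bit-rate trade-off follows; the scheduler and the per-codec rates shown are illustrative, not the exact configuration evaluated in the paper.

```python
# A minimal sketch (hypothetical frame scheduler, not the actual codec): every
# (n_acelp + 1)-th frame is coded with iLBC (no interframe dependency) and the
# n_acelp frames in between with ACELP, so error propagation stops at the next
# iLBC frame and the average bit-rate scales with n_acelp.
def frame_schedule(num_frames, n_acelp):
    """Return 'iLBC' or 'ACELP' for each frame index."""
    return ["iLBC" if i % (n_acelp + 1) == 0 else "ACELP" for i in range(num_frames)]

def average_bitrate(schedule, rate_ilbc=15.2, rate_acelp=12.2):
    """Average bit-rate in kbps for a given frame schedule (rates are illustrative)."""
    rates = [rate_ilbc if t == "iLBC" else rate_acelp for t in schedule]
    return sum(rates) / len(rates)

print(frame_schedule(8, n_acelp=2))   # ['iLBC', 'ACELP', 'ACELP', 'iLBC', ...]
print(average_bitrate(frame_schedule(100, n_acelp=2)))
```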


International Conference on Acoustics, Speech, and Signal Processing | 2012

Model-based cepstral analysis for ultrasonic non-destructive evaluation of composites

Borja Fuentes; José L. Carmona; Nicolas Bochud; Angel M. Gomez; Antonio M. Peinado

The use of model-based cepstral features has been shown to be an effective characterization of damaged materials tested with ultrasonic non-destructive evaluation (NDE) techniques. In this work, we focus our study on carbon-fiber reinforced polymer plates and show that the use of signal models with physical meaning can provide a cepstral representation with high discriminative power. First, we introduce a complete digital signal model based on a physical analysis of wave propagation inside the plate. The resulting model has two drawbacks: a high number of parameters to estimate and the difficulty of expressing it as a classical rational transfer function, which prevents model parameter estimation through classical least-squares signal modeling techniques. In order to overcome these problems, we propose two simplifications of the physical model, also based on a mechanical analysis of the system. We carry out a set of damage recognition experiments showing that cepstra extracted from these models are more discriminative than those from previously used methods such as the LPC cepstrum (all-pole model) or a simple FFT cepstrum.
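
For reference, the baseline LPC (all-pole) cepstrum mentioned above can be obtained from the prediction coefficients with the standard recursion sketched below; the coefficient values used in the example are hypothetical.

```python
# A minimal sketch of the standard LPC-to-cepstrum recursion for an all-pole model.
import numpy as np

def lpc_to_cepstrum(a, n_cep):
    """Cepstrum of H(z) = 1 / (1 - sum_k a[k] z^-k), via the standard recursion."""
    p = len(a)
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c

# Sanity check against the closed form for a single pole: c_n = a^n / n.
print(lpc_to_cepstrum(np.array([0.9]), 5))   # ~[0.9, 0.405, 0.243, 0.164, 0.118]
```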


IEEE Transactions on Audio, Speech, and Language Processing | 2010

MMSE-Based Packet Loss Concealment for CELP-Coded Speech Recognition

José L. Carmona; Antonio M. Peinado; José L. Pérez-Córdoba; Angel M. Gomez

In this paper, we analyze the performance of network speech recognition (NSR) over IP networks, adapting and proposing new solutions to the packet loss problem for code-excited linear prediction (CELP) codecs. NSR has a client-server architecture which places the recognizer at the server side and uses a standard speech codec for speech transmission. Its main advantage is that no changes are required to existing client devices and networks. However, the use of speech codecs degrades its performance, mainly in the presence of packet losses. First, we study the degradations introduced by CELP codecs in lossy packet networks. Then, we propose a reconstruction technique based on minimum mean square error (MMSE) estimation using hidden Markov models. This approach also allows us to obtain reliability measures associated with each estimate. We show how to use this information to improve recognition performance by means of soft-data decoding and the weighted Viterbi algorithm. The experimental results are obtained for two well-known CELP codecs, G.729 and AMR 12.2 kbps, carrying out recognition from decoded speech. Finally, we analyze an efficient and improved implementation of the proposed techniques using an NSR system which extracts speech recognition features directly from the bit-stream parameters. The experimental results show that the different proposed NSR systems achieve performance comparable to distributed speech recognition (DSR).
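
A minimal sketch of the weighted Viterbi idea, with per-frame reliability weights scaling the observation log-likelihoods, is given below. The two-state model and the weights are toy values, not the recognizer or confidence measures used in the paper.

```python
# A minimal sketch of a weighted Viterbi pass: uncertain (concealed) frames
# influence the decoded path less via a reliability weight in [0, 1].
import numpy as np

def weighted_viterbi(log_lik, log_trans, log_init, weights):
    """log_lik: (T, S) per-state frame log-likelihoods; weights: (T,) reliabilities."""
    T, S = log_lik.shape
    delta = log_init + weights[0] * log_lik[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans            # (S, S): from -> to
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + weights[t] * log_lik[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy usage: 2 states, 4 frames, the third frame concealed (low reliability).
log_lik = np.log(np.array([[0.9, 0.1], [0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]))
log_trans = np.log(np.array([[0.7, 0.3], [0.3, 0.7]]))
log_init = np.log(np.array([0.5, 0.5]))
print(weighted_viterbi(log_lik, log_trans, log_init,
                       weights=np.array([1.0, 1.0, 0.1, 1.0])))
```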


IEEE Signal Processing Letters | 2013

Speech Spectral Envelope Enhancement by HMM-Based Analysis/Resynthesis

José L. Carmona; Jon Barker; Angel M. Gomez; Ning Ma

We propose a speech enhancement-by-resynthesis framework whose strength lies in a common statistical speech model shared by the analysis and synthesis stages. First, a spectro-temporal analysis is performed and masked spectro-temporal regions are identified using a noise model. Then, HMM synthesis is used to reconstruct the spectral envelope in masked regions in a manner conditioned on the reliable regions, preventing the resynthesis from regressing to the training data mean. As a demonstration, we enhance noise-corrupted speech utterances from a small-vocabulary corpus for which good statistical models are available. Perceptual evaluation of speech quality and log-spectral distances demonstrate considerable performance improvements over baseline approaches that do not exploit strong speech knowledge. The letter is accompanied by audio examples.
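
The sketch below is a much-simplified stand-in for the idea of reconstructing masked spectral bins conditioned on the reliable ones: it imputes masked bins of a log-spectral frame under a single joint Gaussian instead of the paper's HMM synthesis stage, and all statistics shown are hypothetical.

```python
# A simplified stand-in (not the paper's method): conditional Gaussian
# imputation of masked bins given the reliable ones.
import numpy as np

def conditional_fill(frame, reliable, mean, cov):
    """Replace masked bins by E[x_m | x_r] under a joint Gaussian over the frame."""
    m, r = ~reliable, reliable
    cond_mean = mean[m] + cov[np.ix_(m, r)] @ np.linalg.solve(cov[np.ix_(r, r)],
                                                              frame[r] - mean[r])
    filled = frame.copy()
    filled[m] = cond_mean
    return filled

# Toy usage on a 4-bin log-spectral frame with the last two bins masked.
mean = np.zeros(4)
cov = 0.5 * np.eye(4) + 0.5                      # strongly correlated bins
frame = np.array([1.0, 0.8, 0.0, 0.0])           # masked bins carry no information
reliable = np.array([True, True, False, False])
print(conditional_fill(frame, reliable, mean, cov))
```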


Speech Communication | 2009

A robust scheme for distributed speech recognition over loss-prone packet channels

Angel M. Gomez; Antonio M. Peinado; Victoria E. Sánchez; José L. Carmona

In this paper, we propose a complete recovery scheme designed to improve robustness against packet losses in distributed speech recognition systems. This scheme integrates two sender-driven techniques, namely media-specific forward error correction (FEC) and frame interleaving, along with a receiver-based error concealment (EC) technique, the weighted Viterbi algorithm (WVA). Although these techniques have already been tested separately, providing significant performance gains in clean acoustic environments, in this paper they are jointly applied and their performance in adverse acoustic conditions is evaluated. In particular, a noisy speech database and the ETSI Advanced Front-end are used, and the dynamic features, which play an important role in adverse acoustic environments, and their confidences for the WVA algorithm are examined. Combining two sender-driven techniques, each introducing its own delay, would normally increase the global latency; to avoid this, we propose a double-stream scheme which limits the latency to the larger of the two delays. As a result, with very few overhead bits and a very limited delay, the integrated scheme achieves a significant improvement in the performance of a DSR system over a degraded transmission channel, in both clean and noisy acoustic conditions.
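
A toy sketch of the frame interleaving component follows: consecutive feature frames are spread across the transmission order so that a loss burst maps to scattered single-frame losses that the EC stage can conceal more easily. The interleaver shape is hypothetical, and the payload format, FEC, and WVA stages are not shown.

```python
# A minimal sketch of a block interleaver (hypothetical shape, not the ETSI format).
def block_interleave(frames, rows):
    """Write frames row-by-row into a matrix with `rows` rows, read column-by-column.

    Assumes len(frames) is a multiple of `rows`.
    """
    cols = len(frames) // rows
    matrix = [frames[r * cols:(r + 1) * cols] for r in range(rows)]
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

frames = list(range(12))
sent = block_interleave(frames, rows=3)
print(sent)   # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
```

Deinterleaving at the receiver is the same operation with the roles of rows and columns swapped, so a burst of consecutive transmitted frames maps back to frames that are several positions apart in the original order.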

Collaboration


Dive into José L. Carmona's collaborations.

Top Co-Authors

Jon Barker

University of Sheffield

Ning Ma

University of Sheffield
