Publication


Featured research published by Hugo Leonardo Rufiner.


Computer Speech & Language | 2011

Spoken emotion recognition using hierarchical classifiers

Enrique M. Albornoz; Diego H. Milone; Hugo Leonardo Rufiner

The recognition of the emotional state of speakers is a multi-disciplinary research area that has received great interest in recent years. One of the most important goals is to improve voice-based human-machine interaction. Several works in this domain use the prosodic features or the spectral characteristics of the speech signal, with neural networks, Gaussian mixtures and other standard classifiers. Usually, there is no acoustic interpretation of the types of errors in the results. In this paper, the spectral characteristics of emotional signals are used in order to group emotions based on acoustic rather than psychological considerations. Standard classifiers based on Gaussian Mixture Models, Hidden Markov Models and Multilayer Perceptrons are tested. These classifiers have been evaluated with different configurations and input features, in order to design a new hierarchical method for emotion classification. The proposed multiple-feature hierarchical method for seven emotions, based on spectral and prosodic information, improves the performance over the standard classifiers and the fixed features.
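The two-stage idea described in this abstract can be illustrated with a minimal sketch: a first stage assigns a sample to a group of acoustically similar emotions, and a second stage picks the emotion within that group. Everything below is an assumption made for illustration: the nearest-centroid rule stands in for the GMM, HMM and MLP classifiers named in the abstract, and the grouping, feature vectors and class name `HierarchicalClassifier` are hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def nearest(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


class HierarchicalClassifier:
    """Two-stage classifier: choose a group first, then an emotion inside it."""

    def __init__(self, groups):
        # groups maps a group name to its emotion labels (hypothetical grouping).
        self.groups = groups
        self.emotion_to_group = {e: g for g, es in groups.items() for e in es}

    def fit(self, samples, labels):
        by_emotion = {}
        for x, y in zip(samples, labels):
            by_emotion.setdefault(y, []).append(x)
        self.emotion_centroids = {g: {} for g in self.groups}
        by_group = {}
        for emotion, xs in by_emotion.items():
            group = self.emotion_to_group[emotion]
            self.emotion_centroids[group][emotion] = mean(xs)  # stage 2 models
            by_group.setdefault(group, []).extend(xs)
        # Stage 1 models: one centroid per group of emotions.
        self.group_centroids = {g: mean(xs) for g, xs in by_group.items()}
        return self

    def predict(self, x):
        group = nearest(self.group_centroids, x)          # stage 1
        return nearest(self.emotion_centroids[group], x)  # stage 2


# Toy 2-D feature vectors, invented for this example only.
groups = {"high_arousal": ["anger", "joy"], "low_arousal": ["sadness", "boredom"]}
samples = [[5.0, 5.0], [5.2, 4.8], [5.0, 3.0], [4.8, 3.2],
           [1.0, 1.0], [1.2, 0.8], [1.0, 3.0], [0.8, 3.2]]
labels = ["anger", "anger", "joy", "joy",
          "sadness", "sadness", "boredom", "boredom"]
clf = HierarchicalClassifier(groups).fit(samples, labels)
```

In a design like this, each stage could in principle use its own classifier and feature set, which is the kind of flexibility the abstract attributes to the hierarchical method.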


Signal Processing | 2008

Fast communication: Perceptual evaluation of blind source separation for robust speech recognition

Leandro E. Di Persia; Diego H. Milone; Hugo Leonardo Rufiner; Masuzo Yanagida

In a previous article, an evaluation of several objective quality measures as predictors of recognition rate after the application of a blind source separation algorithm was reported. In this work, the experiments were repeated using some new measures, based on the perceptual evaluation of speech quality (PESQ), which is part of the ITU-T P.862 standard for evaluation of communication systems. The raw PESQ and a nonlinearly transformed PESQ were evaluated, together with several composite measures. The results show that the PESQ-based measures outperformed all the measures reported in the previous work. Based on these results, we recommend the use of PESQ-based measures to evaluate blind source separation algorithms for automatic speech recognition.


Expert Systems With Applications | 2013

Genetic wavelet packets for speech recognition

Leandro Daniel Vignolo; Diego H. Milone; Hugo Leonardo Rufiner

Highlights:
- A set of features based on wavelet packets was optimized for speech recognition.
- A wrapper for feature selection was designed by means of a genetic algorithm.
- A non-orthogonal representation was obtained, which increased classification performance.
- The optimized features improved the classification results in noisy conditions.

The most widely used speech representation is based on the mel-frequency cepstral coefficients, which incorporate biologically inspired characteristics into artificial recognizers. However, the recognition performance with these features can still be enhanced, especially in adverse conditions. Recent advances have been made with the introduction of wavelet-based representations for different kinds of signals, which have been shown to improve the classification performance. However, the problem of finding an adequate wavelet-based representation for a particular problem is still an important challenge. In this work we propose a genetic algorithm to evolve a speech representation, based on a non-orthogonal wavelet decomposition, for phoneme classification. The results, obtained for a set of Spanish phonemes, show that the proposed genetic algorithm is able to find a representation that improves speech recognition results. Moreover, the optimized representation was evaluated in noisy conditions.
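As a rough illustration of a genetic-algorithm wrapper for feature selection, the sketch below evolves a binary mask over candidate features. It is not the authors' implementation: the function name `evolve_feature_mask`, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and the toy fitness are assumptions. In a real wrapper the fitness would be the accuracy of a classifier trained on the selected features.

```python
import random


def evolve_feature_mask(n_features, fitness, generations=30, pop_size=20,
                        mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask of length n_features (n_features >= 2) maximizing fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                    # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy fitness (hypothetical): reward "useful" features, penalize the rest.
USEFUL = {0, 3, 5}


def toy_fitness(mask):
    selected = {i for i, bit in enumerate(mask) if bit}
    return len(selected & USEFUL) - 0.5 * len(selected - USEFUL)


best_mask = evolve_feature_mask(8, toy_fitness)
```

Because of elitism, the best fitness never decreases from one generation to the next, which makes runs with a fixed seed easy to compare.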


Signal Processing | 2007

Objective quality evaluation in blind source separation for speech recognition in a real room

Leandro E. Di Persia; Masuzo Yanagida; Hugo Leonardo Rufiner; Diego H. Milone

The determination of quality of the signals obtained by blind source separation is a very important subject for development and evaluation of such algorithms. When this approach is used as a pre-processing stage for automatic speech recognition, the quality measure of separation applied for assessment should be related to the recognition rates of the system. Many measures have been used for quality evaluation, but in general these have been applied without prior research of their capabilities as quality measures in the context of blind source separation, and often they require experimentation in unrealistic conditions. Moreover, these measures just try to evaluate the amount of separation, and this value could not be directly related to recognition rates. Presented in this work is a study of several objective quality measures evaluated as predictors of recognition rate of a continuous speech recognizer. Correlation between quality measures and recognition rates is analyzed for a separation algorithm applied to signals recorded in a real room with different reverberation times and different kinds and levels of noise. A very good correlation between weighted spectral slope measure and the recognition rate has been verified from the results of this analysis. Furthermore, a good performance of total relative distortion and cepstral measures for rooms with relatively long reverberation time has been observed.
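The analysis described here compares each quality measure with the recognition rate across conditions. A minimal sketch of that comparison, assuming the Pearson coefficient as the correlation statistic (the abstract does not say which statistic was used), could look as follows. The function name `pearson` is introduced for this example only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)
```

For a distance-type measure, where lower values mean better quality, a strong predictor of recognition rate would show a correlation close to -1, so comparing absolute values is a reasonable way to rank measures.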


EURASIP Journal on Advances in Signal Processing | 2011

Evolutionary splines for cepstral filterbank optimization in phoneme classification

Leandro Daniel Vignolo; Hugo Leonardo Rufiner; Diego H. Milone; John C. Goddard

Mel-frequency cepstral coefficients have long been the most widely used type of speech representation. They were introduced to incorporate biologically inspired characteristics into artificial speech recognizers. Recently, the introduction of new alternatives to the classic mel-scaled filterbank has led to improvements in the performance of phoneme recognition in adverse conditions. In this work we propose a new bioinspired approach for the optimization of the filterbanks, in order to find a robust speech representation. Our approach—which relies on evolutionary algorithms—reduces the number of parameters to optimize by using spline functions to shape the filterbanks. The success rates of a phoneme classifier based on hidden Markov models are used as the fitness measure, evaluated over the well-known TIMIT database. The results show that the proposed method is able to find optimized filterbanks for phoneme recognition, which significantly increases the robustness in adverse conditions.


COST'09 Proceedings of the Second international conference on Development of Multimodal Interfaces: active Listening and Synchrony | 2009

Pathological voice analysis and classification based on empirical mode decomposition

Gastón Schlotthauer; María Eugenia Torres; Hugo Leonardo Rufiner

Empirical mode decomposition (EMD) is an algorithm for signal analysis recently introduced by Huang. It is a completely data-driven non-linear method for the decomposition of a signal into AM-FM components. In this paper two new EMD-based methods for the analysis and classification of pathological voices are presented. They are applied to speech signals corresponding to real and simulated sustained vowels. We first introduce a method that allows the robust extraction of the fundamental frequency of sustained vowels. Its determination is crucial for pathological voice analysis and diagnosis. This new method is based on the ensemble empirical mode decomposition (EEMD) algorithm and its performance is compared with others from the state of the art. As a second EMD-based tool, we explore spectral properties of the intrinsic mode functions and apply them to the classification of normal and pathological sustained vowels. We show that just using a basic pattern classification algorithm, the selected spectral features of only three modes are enough to discriminate between normal and pathological voices.


Applied Soft Computing | 2011

Evolutionary cepstral coefficients

Leandro Daniel Vignolo; Hugo Leonardo Rufiner; Diego H. Milone; John C. Goddard

Evolutionary algorithms provide the flexibility and robustness required to find satisfactory solutions in complex search spaces. This is why they are successfully applied for solving real engineering problems. In this work we propose an algorithm to evolve a robust speech representation, using a dynamic data selection method for reducing the computational cost of the fitness computation while improving the generalisation capabilities. The most commonly used speech representation is based on the mel-frequency cepstral coefficients, which incorporate biologically inspired characteristics into artificial recognizers. Recent advances have been made with the introduction of alternatives to the classic mel-scaled filterbank, improving the phoneme recognition performance in adverse conditions. In order to find an optimal filterbank, filter parameters such as the central and side frequencies are optimised. A hidden Markov model is used as the classifier for the evaluation of the fitness for each individual. Experiments were conducted using real and synthetic phoneme databases, considering different additive noise levels. Classification results show that the method accomplishes the task of finding an optimised filterbank for phoneme recognition, which provides robustness in adverse conditions.


Biomedical Signal Processing and Control | 2009

Dimensionality reduction for visualization of normal and pathological speech data

John C. Goddard; Gastón Schlotthauer; María Eugenia Torres; Hugo Leonardo Rufiner

For an adequate analysis of pathological speech signals, a sizeable number of parameters is required, such as those related to jitter, shimmer and noise content. Often this kind of high-dimensional signal representation is difficult to understand, even for expert voice therapists and physicians. Data visualization of a high-dimensional dataset can provide a useful first step in its exploratory data analysis, facilitating an understanding of its underlying structure. In the present paper, eight dimensionality reduction techniques, both classical and recent, are compared on speech data containing normal and pathological speech. A qualitative analysis of their dimensionality reduction capabilities is presented. The transformed data are also quantitatively evaluated, using classifiers, and it is found that it may be advantageous to perform the classification process on the transformed data, rather than on the original. These qualitative and quantitative analyses allow us to conclude that a nonlinear, supervised method, called kernel local Fisher discriminant analysis, is superior for dimensionality reduction in the present context.


Computer Speech & Language | 2012

Bioinspired sparse spectro-temporal representation of speech for robust classification

César E. Martínez; John C. Goddard; Diego H. Milone; Hugo Leonardo Rufiner

In this work, a first approach to a robust phoneme recognition task by means of a biologically inspired feature extraction method is presented. The proposed technique provides an approximation to the speech signal representation at the auditory cortical level. It is based on an optimal dictionary of atoms, estimated from auditory spectrograms, and the Matching Pursuit algorithm to approximate the cortical activations. This provides a sparse coding with intrinsic noise robustness, which can therefore be exploited when using the system in adverse environments. The recognition task consisted of the classification of a set of 5 easily confused English phonemes, in both clean and noisy conditions. Multilayer perceptrons were trained as classifiers and the performance was compared to other classic and robust parameterizations: the auditory spectrogram, a probabilistic optimum filtering on Mel frequency cepstral coefficients and the perceptual linear prediction coefficients. Results showed a significant improvement in the recognition rate of clean and noisy phonemes by the cortical representation over these other parameterizations.
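The Matching Pursuit step mentioned in this abstract can be sketched in a few lines. The version below is the basic greedy algorithm and assumes a dictionary of unit-norm atoms given as plain lists. In the paper the dictionary is estimated from auditory spectrograms, whereas the function name `matching_pursuit` and the hand-made dictionary in the usage example are illustrative only.

```python
def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse approximation of signal over unit-norm atoms.

    Returns (coeffs, residual), where coeffs[i] is the accumulated weight
    of atoms[i] and residual is what remains unexplained.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Inner product of the current residual with every atom.
        projections = [sum(r * a for r, a in zip(residual, atom))
                       for atom in atoms]
        # Select the atom most correlated with the residual.
        k = max(range(len(atoms)), key=lambda i: abs(projections[i]))
        c = projections[k]
        if c == 0:
            break  # nothing left that the dictionary can explain
        coeffs[k] += c
        # Remove the selected atom's contribution from the residual.
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Usage with a trivial orthonormal dictionary (hypothetical data).
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit([3.0, 0.0, -1.0], atoms, n_iter=2)
```

Limiting `n_iter` keeps only the strongest components, which is what yields a sparse code: weak, noise-like components of the input tend to be left in the residual.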


Advances in Adaptive Data Analysis | 2009

EMD of Gaussian white noise: effects of signal length and sifting number on the statistical properties of intrinsic mode functions

Gastón Schlotthauer; María Eugenia Torres; Hugo Leonardo Rufiner; Patrick Flandrin

This work presents a discussion on the probability density function of Intrinsic Mode Functions (IMFs) provided by the Empirical Mode Decomposition of Gaussian white noise, based on experimental simulations. The influence on the probability density functions of the data length and of the maximum allowed number of iterations is analyzed by means of kernel smoothing density estimations. The obtained results are confirmed by statistical normality tests indicating that the IMFs have non-Gaussian distributions. Our study also indicates that large data length and high number of iterations produce multimodal distributions in all modes.

Collaboration


Dive into Hugo Leonardo Rufiner's collaborations.

Top Co-Authors

Diego H. Milone

National Scientific and Technical Research Council

María Eugenia Torres

National Scientific and Technical Research Council

John C. Goddard

Universidad Autónoma Metropolitana

Julio Galli

National Scientific and Technical Research Council

Gastón Schlotthauer

National Scientific and Technical Research Council

Ruben D. Spies

National Scientific and Technical Research Council

Enrique M. Albornoz

National Scientific and Technical Research Council

Leandro Daniel Vignolo

National Scientific and Technical Research Council

L. Giovanini

National Scientific and Technical Research Council

L.E. Di Persia

National Scientific and Technical Research Council
