Publication


Featured research published by Roberto Gemello.


Speech Communication | 2007

Linear hidden transformations for adaptation of hybrid ANN/HMM models

Roberto Gemello; Franco Mana; Stefano Scanzio; Pietro Laface; Renato De Mori

This paper focuses on the adaptation of Automatic Speech Recognition systems using Hybrid models combining Artificial Neural Networks (ANN) with Hidden Markov Models (HMM). Most adaptation techniques for ANNs reported in the literature consist of adding a linear transformation network connected to the input of the ANN. This paper describes the application of linear transformations not only to the input features, but also to the outputs of the internal layers. The motivation is that the outputs of an internal layer represent discriminative features of the input pattern suitable for the classification performed at the output of the ANN. To reduce the effect of the lack of adaptation samples for some phonetic units, we propose a new solution, called Conservative Training. Supervised adaptation experiments with different corpora and for different types of adaptation are described. The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.
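
As a rough illustration of the idea in this abstract, the sketch below inserts an adaptable linear layer after the hidden layer of a small frozen network. The network sizes, the weight values, and the identity initialization of the inserted layer are assumptions made for illustration; the abstract does not specify them, and Conservative Training is not shown.

```python
import math

def matvec(W, x, b):
    """Affine map W @ x + b for a list-of-rows matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def softmax(v):
    m = max(v)
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical unadapted network: 3 inputs -> 2 hidden units -> 2 outputs.
W1, b1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]
W2, b2 = [[1.0, -1.0], [-0.5, 0.7]], [0.0, 0.0]

def forward(x, A_hid, c_hid):
    h = sigmoid(matvec(W1, x, b1))      # frozen hidden layer
    h = matvec(A_hid, h, c_hid)         # inserted linear transformation (adaptable)
    return softmax(matvec(W2, h, b2))   # frozen output layer

x = [0.2, -0.4, 0.9]
# Assumed initialization: identity matrix and zero bias, so the adapted
# network starts out identical to the unadapted one.
A_hid, c_hid = identity(2), [0.0, 0.0]
baseline = forward(x, A_hid, c_hid)
```

Under these assumptions, adaptation would update only `A_hid` and `c_hid` on the adaptation data while `W1`, `b1`, `W2`, and `b2` stay fixed.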


International Conference on Acoustics, Speech, and Signal Processing | 1997

Dialogos: a robust system for human-machine spoken dialogue on the telephone

Dario Albesano; Paolo Baggia; Morena Danieli; Roberto Gemello; Elisabetta Gerbino; Claudio Rullent

This paper presents Dialogos, a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users and it has proved robust enough to allow spontaneous interactions both for users who get good recognition performance and for those who get lower scores. The robust behavior of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, the tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules makes it possible to deal with partial or total breakdowns of the different levels of analysis. We report the field trial data of the system and the evaluation results of the overall system and of the submodules.


Machine Learning | 1991

Rigel: An Inductive Learning System

Roberto Gemello; Franco Mana

This paper is aimed at showing the benefits obtained by explicitly introducing a priori control knowledge into the inductive process. The starting point is Michalski's Induce system, which has been modified and augmented. Although the basic philosophy has been changed as little as possible, Induce has been radically modified from the algorithmic point of view, resulting in the new learning system Rigel. The main ideas taken from Induce are the sequential learning of descriptions of each concept against all the others, the Covering algorithm, the Star definition, and the VL2 representation language. The modifications consist of a new way of computing the Star, the use of a separate body of heuristic knowledge to strongly direct the search, the implementation of a larger subset of the VL2 language, a reasoned way of selecting the seed, and the use of rules to evaluate the worthiness of the inductive assertions. The effectiveness of Rigel has been tested both on artificial and on real-world case studies.


International Symposium on Neural Networks | 1997

Continuous speech recognition with neural networks and stationary-transitional acoustic units

Roberto Gemello; Dario Albesano; Franco Mana

This paper proposes the use of a kind of acoustic unit, named stationary-transitional units, within a hybrid hidden Markov model/neural network recognition framework as an alternative to standard context-independent phonemes. These units are made up of stationary parts of the context-independent phonemes plus all the admissible transitions between them and represent a partition of the sounds of the language, like phonemes, but with more acoustic detail. These units are very suitable to be modeled with neural networks and their use may enhance the performance of hybrid HMM-NN systems by increasing their acoustic resolution. This hypothesis is verified for the Italian language by testing these units on a difficult domain of spontaneous speech recognition, namely railway timetable vocal access with the Dialogos system. The results show that a relevant improvement is achieved with respect to the use of the standard context-independent phonemes.


Computer Speech & Language | 2006

Multiple resolution analysis for robust automatic speech recognition

Roberto Gemello; Franco Mana; Dario Albesano; Renato De Mori

This paper investigates the potential of exploiting the redundancy implicit in multiple resolution analysis for automatic speech recognition systems. The analysis is performed by a binary tree of elements, each of which consists of a half-band filter followed by a downsampler that discards odd samples. Filter design and feature computation from samples are discussed and recognition performance with different choices is presented. A paradigm consisting of redundant feature extraction, followed by feature normalization, followed by dimensionality reduction is proposed. Feature normalization is performed by denoising algorithms. Two of them are considered and evaluated, namely, signal-to-noise ratio-dependent spectral subtraction and soft thresholding. Dimensionality reduction is performed with principal component analysis. Experiments using telephone corpora and the Aurora3 corpus are reported. They indicate that the proposed paradigm leads to a recognition performance with clean speech, measured in word error rate, marginally superior to the one obtained with perceptual linear prediction coefficients. Moreover, the performance of the proposed analysis paradigm is significantly superior with noisy data when the same denoising algorithm is applied to all the analysis methods being compared.
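
The analysis tree and the soft-thresholding step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's design: the two-tap Haar-style filters `LOW` and `HIGH`, the choice to return only the leaf sub-bands, and all function names are assumptions, since the abstract does not give the filter coefficients or the feature computation.

```python
# Assumed half-band filter pair (Haar-style averaging and differencing).
LOW = [0.5, 0.5]
HIGH = [0.5, -0.5]

def filter_and_downsample(signal, taps):
    """FIR-filter `signal` with `taps`, then keep even-indexed samples only."""
    n = len(taps)
    padded = [0.0] * (n - 1) + list(signal)   # zero history before the first sample
    filtered = [sum(taps[k] * padded[i + n - 1 - k] for k in range(n))
                for i in range(len(signal))]
    return filtered[::2]                       # discard odd samples

def analysis_tree(signal, depth):
    """Return the leaf sub-bands of a full binary tree of the given depth."""
    if depth == 0:
        return [list(signal)]
    low = filter_and_downsample(signal, LOW)
    high = filter_and_downsample(signal, HIGH)
    return analysis_tree(low, depth - 1) + analysis_tree(high, depth - 1)

def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

bands = analysis_tree([1.0] * 8, depth=2)      # 4 sub-bands of 2 samples each
denoised = [[soft_threshold(v, 0.1) for v in band] for band in bands]
```

Each level of the tree halves the number of samples per node and doubles the number of nodes, which is why an 8-sample input at depth 2 yields four 2-sample sub-bands.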


International Journal of Speech Technology | 1997

A robust system for human-machine dialogue in telephony-based applications

Dario Albesano; Paolo Baggia; Morena Danieli; Roberto Gemello; Elisabetta Gerbino; Claudio Rullent

This paper presents a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users and it has proved robust enough to allow spontaneous interactions even for people with poor recognition performance. The robust behaviour of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, the tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules allows the system to deal with partial or total breakdowns at other levels of analysis. We report the field trial data of the system with respect to speech recognition metrics of word accuracy and sentence understanding rate, time-to-completion, time-to-acquisition of crucial parameters, and degree of success of the interactions in providing the speakers with the information they required. The evaluation data show that most of the subjects were able to interact fruitfully with the system. These results suggest that the design choices made to achieve robust behaviour are a promising way to create usable spoken language telephone systems.


IEEE Signal Processing Letters | 2006

Automatic speech recognition with a modified Ephraim-Malah rule

Roberto Gemello; Franco Mana; R. De Mori

A soft decision gain modification is introduced and applied to the Ephraim-Malah gain function based on minimum mean-square-error estimation after amplitude compression. Nonlinear evaluations of the noise overestimation factor and spectral floor are used in the same way for the proposed gain modification and for nonlinear spectral subtraction (NSS). Consistent and statistically significant automatic speech recognition improvements of the proposed approach with respect to NSS are observed for different noise conditions considered in the Aurora-2 and Aurora-3 corpora. As the nonlinearity affects the two approaches in the same way, the result of comparison is particularly interesting.
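
To make the roles of the noise overestimation factor and the spectral floor concrete, here is a minimal sketch of a per-bin power spectral subtraction gain. It illustrates the baseline technique only, under assumptions: the floor is taken relative to the noisy power, `alpha` and `beta` are passed in as plain numbers, and the nonlinear way the paper evaluates them is not reproduced. The modified Ephraim-Malah rule itself is not shown.

```python
import math

def spectral_subtraction_gain(noisy_power, noise_power, alpha, beta):
    """Gain applied to one bin's magnitude.

    alpha: noise overestimation factor (alpha > 1 subtracts more noise).
    beta:  spectral floor, the smallest allowed fraction of the noisy power.
    """
    if noisy_power <= 0.0:
        return 0.0
    residual = 1.0 - alpha * noise_power / noisy_power
    return math.sqrt(max(residual, beta))

# High-SNR bin: most of the power is kept.
g_high = spectral_subtraction_gain(4.0, 1.0, alpha=1.0, beta=0.01)
# Low-SNR bin: the subtraction would go negative, so the floor applies.
g_low = spectral_subtraction_gain(1.0, 1.0, alpha=2.0, beta=0.01)
```

The floor keeps the gain from reaching zero in noise-dominated bins, while the overestimation factor trades residual noise against speech distortion.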


International Conference on Acoustics, Speech, and Signal Processing | 2004

A modified Ephraim-Malah noise suppression rule for automatic speech recognition

Roberto Gemello; Franco Mana; R. De Mori

A soft decision gain modification is introduced and applied to the Ephraim-Malah gain function based on minimum mean square error (MMSE) estimation (Ephraim, Y. and Malah, D., IEEE Trans. Acoust. Speech Sig. Process., vol. ASSP-32, no. 6, pp. 1109-1121, 1984; vol. ASSP-33, no. 2, pp. 443-445, 1985) after amplitude compression. Non-linear evaluations of the noise overestimation factor and spectral floor are used in the same way for the proposed gain modification and for non-linear spectral subtraction (NSS). Consistent and statistically significant ASR improvements of the proposed approach with respect to NSS are observed for different noise conditions considered in the AURORA2 and AURORA3 corpora. As the non-linearity affects the two approaches in the same way, the comparison result is particularly interesting.


International Symposium on Neural Networks | 1999

Multi-source neural networks for speech recognition

Roberto Gemello; Dario Albesano; Franco Mana

In speech recognition, the most widespread technology (hidden Markov models) is constrained by the condition of stochastic independence of its input features. This limits the simultaneous use of features derived from the speech signal with different processing algorithms. In contrast, artificial neural networks (ANN) are capable of incorporating multiple heterogeneous input features, which do not need to be treated as independent, finding the optimal combination of these features for classification. The purpose of this work is to exploit this characteristic of ANNs to improve speech recognition accuracy through the combined use of input features coming from different sources (different feature extraction algorithms). We integrate two input sources: the Mel-based cepstral coefficients (MFCC) derived from FFT and the RASTA-PLP cepstral coefficients. The results show that this integration leads to an error reduction of 26% on a telephone quality test set.
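
One simple way to feed two feature sources to a single network is to concatenate the per-frame vectors into one input vector. The sketch below assumes that scheme; the abstract does not say how the sources are wired into the network, and the function name, vector sizes, and values are hypothetical.

```python
def combine_sources(mfcc_frame, rasta_plp_frame):
    """Concatenate two per-frame feature vectors into one network input."""
    return list(mfcc_frame) + list(rasta_plp_frame)

# Hypothetical feature values for one frame (real vectors would be longer).
mfcc = [1.2, -0.3, 0.8]
rasta_plp = [0.1, 0.4]
net_input = combine_sources(mfcc, rasta_plp)
```

Because the network learns its own weights for every input component, the two sources do not have to be statistically independent of each other.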


International Symposium on Neural Networks | 1992

Word recognition with recurrent network automata

Dario Albesano; Roberto Gemello; Franco Mana

The authors report a method to directly encode temporal information into a neural network by explicitly modeling that information with a left-to-right automaton, and teaching a recurrent network to identify the automaton states. The state length and position are adjusted with the usual train and re-segment iterative procedure. The global model is a hybrid of a recurrent neural network which implements the state transition models, and dynamic programming, which finds the best state sequence. The advantages achieved by using recurrent networks are outlined by applying the method to a speaker-independent digit recognition task.
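
The dynamic programming step mentioned in this abstract can be sketched as a search over a left-to-right automaton. This is a minimal sketch under assumptions: per-frame log scores stand in for the recurrent network outputs, each frame may only stay in the current state or advance by one, and the path must start in the first state and end in the last. The function name and the example scores are hypothetical.

```python
def best_state_sequence(log_scores):
    """log_scores[t][s] is the log score of state s at frame t.

    Returns (path, total_score) for the best left-to-right path that
    starts in state 0 and ends in the last state. Needs frames >= states.
    """
    T, S = len(log_scores), len(log_scores[0])
    NEG = float("-inf")
    best = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    best[0][0] = log_scores[0][0]
    for t in range(1, T):
        for s in range(S):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else NEG
            if advance > stay:
                best[t][s], back[t][s] = advance + log_scores[t][s], s - 1
            else:
                best[t][s], back[t][s] = stay + log_scores[t][s], s
    # Trace back from the final state to recover the segmentation.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[T - 1][S - 1]

scores = [[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0], [-5.0, 0.0]]
path, total = best_state_sequence(scores)
```

The recovered path gives the length and position of each state, which is the kind of segmentation a train and re-segment loop would use as targets for the next training pass.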
