Publication


Featured research published by Dario Albesano.


international conference on acoustics, speech, and signal processing | 1997

Dialogos: a robust system for human-machine spoken dialogue on the telephone

Dario Albesano; Paolo Baggia; Morena Danieli; Roberto Gemello; Elisabetta Gerbino; Claudio Rullent

This paper presents Dialogos, a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users and has proved robust enough to allow spontaneous interactions both for users who obtain good recognition performance and for those who obtain lower scores. The robust behavior of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules allows the system to deal with partial or total breakdowns at the different levels of analysis. We report the field trial data of the system and the evaluation results of the overall system and of its submodules.


international symposium on neural networks | 1997

Continuous speech recognition with neural networks and stationary-transitional acoustic units

Roberto Gemello; Dario Albesano; Franco Mana

This paper proposes the use of acoustic units named stationary-transitional units within a hybrid hidden Markov model/neural network recognition framework as an alternative to standard context-independent phonemes. These units are made up of the stationary parts of the context-independent phonemes plus all the admissible transitions between them; they represent a partition of the sounds of the language, like phonemes, but with more acoustic detail. These units are well suited to being modeled with neural networks, and their use may enhance the performance of hybrid HMM-NN systems by increasing their acoustic resolution. This hypothesis is verified for the Italian language by testing these units on a difficult spontaneous speech recognition domain, namely railway timetable vocal access with the Dialogos system. The results show that a substantial improvement is achieved with respect to standard context-independent phonemes.
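
A minimal sketch of how a phoneme sequence expands into stationary-transitional units: each phoneme contributes its stationary part, and each adjacent pair contributes a transition unit. The unit naming scheme below is an illustrative assumption, not the inventory used in the paper.

```python
# Illustrative sketch: expanding a phoneme sequence into stationary-transitional
# units. The "a" / "a-b" naming is an assumption made for this example only.

def to_stationary_transitional(phonemes):
    """Map a context-independent phoneme sequence to stationary units plus
    the transitional units bridging each adjacent pair."""
    units = []
    for i, ph in enumerate(phonemes):
        units.append(ph)                               # stationary part of the phoneme
        if i + 1 < len(phonemes):
            units.append(f"{ph}-{phonemes[i + 1]}")    # transition to the next phoneme
    return units

# Example: the Italian word "sole"
print(to_stationary_transitional(["s", "o", "l", "e"]))
# ['s', 's-o', 'o', 'o-l', 'l', 'l-e', 'e']
```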


Computer Speech & Language | 2006

Multiple resolution analysis for robust automatic speech recognition

Roberto Gemello; Franco Mana; Dario Albesano; Renato De Mori

This paper investigates the potential of exploiting the redundancy implicit in multiple resolution analysis for automatic speech recognition systems. The analysis is performed by a binary tree of elements, each of which consists of a half-band filter followed by a down-sampler that discards odd samples. Filter design and feature computation from samples are discussed, and recognition performance with different choices is presented. A paradigm consisting of redundant feature extraction, followed by feature normalization, followed by dimensionality reduction is proposed. Feature normalization is performed by denoising algorithms; two of them are considered and evaluated, namely signal-to-noise-ratio-dependent spectral subtraction and soft thresholding. Dimensionality reduction is performed with principal component analysis. Experiments using telephone corpora and the Aurora3 corpus are reported. They indicate that, on clean speech, the proposed paradigm yields recognition performance, measured in word error rate, marginally superior to that obtained with perceptual linear prediction coefficients. On noisy data, however, the proposed analysis paradigm is significantly superior when the same denoising algorithm is applied to all the analysis methods being compared.
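
As an illustration of one tree element described above, the sketch below builds a half-band low-pass/high-pass filter pair and decimates each output by discarding odd samples; recursing on the outputs yields the binary analysis tree. The filter length and design are illustrative assumptions, not the filters evaluated in the paper.

```python
# A minimal sketch of one level of the analysis tree: a half-band filter pair
# followed by a down-sampler that keeps even samples only. Filter parameters
# are illustrative assumptions.
import numpy as np
from scipy.signal import firwin

def split_band(x, numtaps=31):
    """Split a signal into low and high half-bands, each decimated by 2."""
    h_lo = firwin(numtaps, 0.5)                      # half-band low-pass prototype
    h_hi = h_lo * (-1) ** np.arange(numtaps)         # mirror it into a high-pass
    lo = np.convolve(x, h_lo, mode="same")[::2]      # keep even samples only
    hi = np.convolve(x, h_hi, mode="same")[::2]
    return lo, hi

# Recursing split_band on its outputs builds the binary tree; the energies at
# the leaves form the redundant feature vector that is then denoised and
# reduced with PCA.
x = np.random.randn(16000)       # one second of 16 kHz "speech"
lo, hi = split_band(x)
print(lo.shape, hi.shape)        # (8000,) (8000,)
```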


International Journal of Speech Technology | 1997

A robust system for human-machine dialogue in telephony-based applications

Dario Albesano; Paolo Baggia; Morena Danieli; Roberto Gemello; Elisabetta Gerbino; Claudio Rullent

This paper presents a real-time system for human-machine spoken dialogue on the telephone in task-oriented domains. The system has been tested in a large trial with inexperienced users and it has proved robust enough to allow spontaneous interactions even for people with poor recognition performance. The robust behaviour of the system has been achieved by combining the use of specific language models during the recognition phase of analysis, the tolerance toward spontaneous speech phenomena, the activity of a robust parser, and the use of pragmatic-based dialogue knowledge. This integration of the different modules allows the system to deal with partial or total breakdowns at other levels of analysis. We report the field trial data of the system with respect to speech recognition metrics of word accuracy and sentence understanding rate, time-to-completion, time-to-acquisition of crucial parameters, and degree of success of the interactions in providing the speakers with the information they required. The evaluation data show that most of the subjects were able to interact fruitfully with the system. These results suggest that the design choices made to achieve robust behaviour are a promising way to create usable spoken language telephone systems.


international symposium on neural networks | 1999

Multi-source neural networks for speech recognition

Roberto Gemello; Dario Albesano; Franco Mana

In speech recognition, the most widespread technology (hidden Markov models) is constrained by the assumption of stochastic independence of its input features, which limits the simultaneous use of features derived from the speech signal with different processing algorithms. Artificial neural networks (ANNs), by contrast, can incorporate multiple heterogeneous input features, which need not be treated as independent, and find the optimal combination of those features for classification. The purpose of this work is to exploit this characteristic of ANNs to improve speech recognition accuracy through the combined use of input features coming from different sources (different feature extraction algorithms). We integrate two input sources: Mel-based cepstral coefficients (MFCCs) derived from the FFT and RASTA-PLP cepstral coefficients. The results show that this integration leads to an error reduction of 26% on a telephone-quality test set.
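
A minimal sketch of the combination idea, assuming frame-synchronous MFCC and RASTA-PLP vectors: the two feature streams feed a single network that learns their joint use. Layer sizes, activation functions, and the number of output classes are illustrative assumptions, not the configuration of the paper's system.

```python
# Sketch: one network consuming two concatenated feature streams per frame.
# All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MultiSourceNet(nn.Module):
    def __init__(self, mfcc_dim=13, plp_dim=13, hidden=300, n_classes=40):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(mfcc_dim + plp_dim, hidden),   # joint hidden layer over both sources
            nn.Sigmoid(),
            nn.Linear(hidden, n_classes),            # acoustic-unit scores
        )

    def forward(self, mfcc, plp):
        return self.net(torch.cat([mfcc, plp], dim=-1))

# One frame from each feature stream
model = MultiSourceNet()
scores = model(torch.randn(1, 13), torch.randn(1, 13))
print(scores.shape)  # torch.Size([1, 40])
```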


international symposium on neural networks | 1992

Word recognition with recurrent network automata

Dario Albesano; Roberto Gemello; Franco Mana

The authors report a method to encode temporal information directly into a neural network by explicitly modeling that information with a left-to-right automaton and teaching a recurrent network to identify the automaton states. The state lengths and positions are adjusted with the usual iterative train-and-resegment procedure. The global model is a hybrid of a recurrent neural network, which implements the state transition models, and dynamic programming, which finds the best state sequence. The advantages of using recurrent networks are outlined by applying the method to a speaker-independent digit recognition task.
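
The sketch below illustrates the dynamic-programming half of such a hybrid: given per-frame state scores (random stand-ins for the recurrent network outputs), it finds the best path through a strictly left-to-right automaton in which a state may only persist or advance by one. This is a generic Viterbi-style search written for illustration, not the paper's exact formulation.

```python
# Viterbi-style search over a left-to-right word automaton; frame_scores stand
# in for the recurrent network's per-state outputs.
import numpy as np

def best_left_to_right_path(frame_scores):
    """frame_scores: (T, S) log-scores; a state may only stay or advance by one."""
    T, S = frame_scores.shape
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0, 0] = frame_scores[0, 0]          # the path must start in the first state
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]
            advance = dp[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= advance else s - 1
            dp[t, s] = max(stay, advance) + frame_scores[t, s]
    # Backtrack from the final state
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return dp[-1, -1], path[::-1]

score, path = best_left_to_right_path(np.random.randn(20, 5))
print(score, path)
```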


Neural Networks for Signal Processing VI. Proceedings of the 1996 IEEE Signal Processing Society Workshop | 1996

Speeding up neural network execution: an application to speech recognition

Dario Albesano; Franco Mana; Roberto Gemello

Many papers have addressed the problem of speeding up neural network execution, most of them trying to reduce network size by weight and neuron pruning, and others making use of special hardware. In this paper we propose a different method that reduces the computational effort needed to calculate the output activity of a neural network. The suggested technique can be applied to a wide class of connectionist models for processing slowly varying signals (for example, vocal, radar, sonar, and video signals). In addition, neither specialized hardware nor large amounts of additional memory are required. For each neuron of the network, the method compares its activation value at a given moment with the activation computed at the previous forward pass: if nothing has changed, the neuron performs no computation; otherwise it propagates to the connected neurons the difference between the two activations multiplied by its outgoing weights. The proposal is verified in a speech recognition framework on two main tasks with two different neural network architectures. The results show a drastic reduction of execution time on both architectures with no significant change in recognition quality.
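
A sketch of the differential update just described, for a single fully connected sigmoid layer: input activations are quantized so that "no change" occurs often on slowly varying signals, and only the inputs that changed propagate their difference through the outgoing weights. The quantization step, layer sizes, and initialization are illustrative assumptions.

```python
# Differential (delta) forward pass for one fully connected sigmoid layer.
# Sizes and the quantization step are illustrative assumptions.
import numpy as np

class DeltaLayer:
    def __init__(self, n_in, n_out, step=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.prev_in = np.zeros(n_in)     # input activations seen at the previous frame
        self.pre_act = np.zeros(n_out)    # cached weighted sums
        self.step = step                  # quantization step that defines "no change"

    def forward(self, x):
        xq = np.round(x / self.step) * self.step        # quantize incoming activations
        changed = np.flatnonzero(xq != self.prev_in)    # inputs that actually moved
        if changed.size:                                # propagate only their deltas
            delta = xq[changed] - self.prev_in[changed]
            self.pre_act += delta @ self.W[changed]
            self.prev_in[changed] = xq[changed]
        return 1.0 / (1.0 + np.exp(-self.pre_act))      # sigmoid output

layer = DeltaLayer(n_in=26, n_out=300)
frame = np.random.randn(26)
out_a = layer.forward(frame)
out_b = layer.forward(frame + 1e-4)   # near-identical next frame: very few deltas propagate
```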


international conference on acoustics, speech, and signal processing | 2001

Integration of fixed and multiple resolution analysis in a speech recognition system

Roberto Gemello; Dario Albesano; Loreta Moisa; Renato De Mori

This paper compares the performance of an operational automatic speech recognition system when Mel frequency-scaled cepstral coefficients (MFCCs), J-Rasta perceptual linear prediction coefficients (J-Rasta PLP), and energies from a multi-resolution analysis (MRA) tree of filters are used as input features to a hybrid system consisting of a neural network (NN) that provides observation probabilities for a network of hidden Markov models (HMMs). Furthermore, the paper compares the performance of the system when various combinations of these features are used, showing a 16% WER reduction with respect to J-Rasta PLP coefficients alone when they are combined with the energies computed at the output of the leaves of an MRA filter tree. Such a combination is practically feasible thanks to the NN architecture used in the system. Recognition is performed without any language model on a very large test set including many speakers uttering proper names from different locations of the Italian public telephone network.


international symposium on neural networks | 1998

Linear input network based speaker adaptation in the Dialogos system

Roberto Gemello; Franco Mana; Dario Albesano

This paper describes experiments with linear input networks (LIN) as a speaker adaptation technique for the neural recognition module of the Dialogos(R) system. The LIN technique is evaluated, and some variants aimed at reducing the number of estimated parameters are introduced. The results confirm the validity of LIN for speaker adaptation, while the introduced variants are a valid alternative when a reduced model size is important. The potential and drawbacks of supervised and unsupervised speaker adaptation are illustrated. Experiments with a speaker-dependent database collected from real interactions with the Dialogos system are described in detail, showing in both cases a substantial improvement over the speaker-independent model.
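
A minimal sketch of LIN-style adaptation, assuming a pre-trained speaker-independent network whose weights stay frozen: a small linear layer, initialized to the identity, is placed in front of the input and is the only part trained on the new speaker's data. The base model, dimensions, and optimizer settings are illustrative assumptions, not the Dialogos module.

```python
# LIN adaptation sketch: train only an identity-initialized linear input layer.
import torch
import torch.nn as nn

feat_dim, n_units = 26, 600

# Frozen speaker-independent acoustic model (illustrative stand-in)
base = nn.Sequential(nn.Linear(feat_dim, 300), nn.Sigmoid(), nn.Linear(300, n_units))
for p in base.parameters():
    p.requires_grad = False

# Trainable LIN, started as an identity mapping of the input features
lin = nn.Linear(feat_dim, feat_dim)
with torch.no_grad():
    lin.weight.copy_(torch.eye(feat_dim))
    lin.bias.zero_()

adapted = nn.Sequential(lin, base)
optimizer = torch.optim.SGD(lin.parameters(), lr=0.01)

# One adaptation step on a (features, target-unit) batch from the new speaker
x, y = torch.randn(32, feat_dim), torch.randint(0, n_units, (32,))
loss = nn.functional.cross_entropy(adapted(x), y)
loss.backward()
optimizer.step()
print(float(loss))
```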


international symposium on neural networks | 2000

Multi-source neural networks for speech recognition: a review of recent results

Roberto Gemello; Dario Albesano; Franco Mana; Loreta Moisa

Different parameterizations of the speech signal may extract complementary information useful for increasing accuracy in discriminating between confusable sound classes. Despite this, a single parameterization has been used almost universally in speech recognition, because the dominant matching technology (hidden Markov models) is bound by theoretical and practical constraints that limit the use of multiple features derived from the speech signal with different processing algorithms. Neural networks, by contrast, can incorporate multiple heterogeneous input features, which need not be treated as independent, and find the optimal combination of those features for classification. The purpose of this work is to exploit this capability of neural networks to improve speech recognition accuracy. The multiple input features coming from different parameterization algorithms are combined through a network architecture called the multi-source NN, designed to obtain the best synergy from them. We report the latest results obtained along this research line by combining the basic spectral features with two auditory-inspired features, a formant-like feature and frequency derivatives. The results show that the multi-source NN leads to significant error reductions on both isolated-word and continuous-speech test sets.
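
A sketch of the multi-source idea with one sub-network per parameterization, merged before the output layer; it differs from the plain input concatenation shown earlier in that each source first gets its own hidden representation. The three source dimensions loosely mirror the spectral, formant-like, and frequency-derivative streams mentioned above, but all architecture details are illustrative assumptions.

```python
# Multi-source NN sketch: one hidden sub-network per feature source, merged at
# the output layer. Dimensions and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultiSourceNN(nn.Module):
    def __init__(self, dims=(20, 8, 20), hidden=150, n_classes=40):
        super().__init__()
        # One hidden sub-network per input source
        self.subnets = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.Sigmoid()) for d in dims]
        )
        # Output layer combines the per-source hidden representations
        self.out = nn.Linear(hidden * len(dims), n_classes)

    def forward(self, sources):
        merged = torch.cat([net(x) for net, x in zip(self.subnets, sources)], dim=-1)
        return self.out(merged)

model = MultiSourceNN()
frames = [torch.randn(1, 20), torch.randn(1, 8), torch.randn(1, 20)]
print(model(frames).shape)  # torch.Size([1, 40])
```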
