
Publication


Featured research published by Daniele Falavigna.


Speech Communication | 1993

Automatic segmentation and labeling of speech based on Hidden Markov Models

Fabio Brugnara; Daniele Falavigna; Maurizio Omologo

An accurate database documentation at the phonetic level is very important for speech research; however, manual segmentation and labeling is a time-consuming and error-prone task. This article describes an automatic procedure for the segmentation of speech: given either the linguistic or the phonetic content of a speech utterance, the system provides phone boundaries. The technique is based on the use of an acoustic-phonetic unit Hidden Markov Model (HMM) recognizer; both the recognizer and the segmentation system have been designed exploiting the DARPA-TIMIT acoustic-phonetic continuous speech database of American English. Segmentation and labeling experiments have been conducted under different conditions to check the reliability of the resulting system. Satisfactory results have been obtained, especially when the system is trained with some manually presegmented material. The size of this material is a crucial factor, and system performance has been evaluated with respect to this parameter. It turns out that the system provides 88.3% correct boundary location, given a tolerance of 20 ms, when only 256 phonetically balanced sentences are used for its training.
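As a minimal illustration of the evaluation criterion quoted above (boundary times in seconds; the greedy matching scheme is an assumption for clarity, not the paper's exact scoring protocol):

```python
# Illustrative sketch: fraction of reference phone boundaries matched by
# an automatic boundary within a 20 ms tolerance, as in the evaluation
# described above. Matching is one-to-one and greedy in reference order.

def boundary_accuracy(auto_bounds, ref_bounds, tol=0.020):
    """Return the fraction of reference boundaries for which some unused
    automatic boundary lies within `tol` seconds."""
    hits, used = 0, set()
    for ref in ref_bounds:
        best, best_d = None, None
        for i, a in enumerate(auto_bounds):
            if i in used:
                continue
            d = abs(a - ref)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= tol:
            hits += 1
            used.add(best)
    return hits / len(ref_bounds) if ref_bounds else 0.0

# Example: two of the three reference boundaries fall within 20 ms.
print(boundary_accuracy([0.101, 0.255, 0.415], [0.100, 0.230, 0.400]))
```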


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Boosted acoustic model learning and hypotheses rescoring on the CHiME-3 task

Shahab Jalalvand; Daniele Falavigna; Marco Matassoni; Piergiorgio Svaizer; Maurizio Omologo

Speech recognition in a realistic noisy environment using multiple microphones is the focal point of the third CHiME challenge. On top of the baseline ASR system provided for this challenge, we apply state-of-the-art algorithms for boosting acoustic model learning and hypothesis rescoring to improve the final output. To this end, we first use the automatic transcription of each channel to re-train the acoustic model for that channel, and then we apply linear language model rescoring to find a better solution in the n-best list. LM rescoring is performed using an efficient set of N-gram and Recurrent Neural Network LMs (RNNLMs) trained on a carefully selected text set. In the experiments, we show that the proposed approach improves not only the individual channel transcriptions, but also the enhanced channels produced by MVDR and delay-and-sum beamforming.
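A minimal sketch of the linear n-best rescoring step described above (the score names, weights, and hypotheses here are illustrative assumptions, not values from the paper; in practice the weights would be tuned on development data):

```python
# Each n-best entry carries an acoustic log-score plus n-gram and RNNLM
# log-probabilities; the combined score is a weighted sum, optionally
# with a word-count penalty, and the best-scoring hypothesis is kept.

def rescore_nbest(nbest, w_am=1.0, w_ngram=0.6, w_rnn=0.4, w_len=-0.1):
    """nbest: list of dicts with 'hyp', 'am', 'ngram', 'rnnlm' log-scores.
    Returns the hypothesis maximizing the weighted combination."""
    def score(h):
        return (w_am * h["am"] + w_ngram * h["ngram"]
                + w_rnn * h["rnnlm"] + w_len * len(h["hyp"].split()))
    return max(nbest, key=score)["hyp"]

nbest = [
    {"hyp": "move the table", "am": -120.3, "ngram": -14.2, "rnnlm": -13.1},
    {"hyp": "move the cable", "am": -119.8, "ngram": -16.9, "rnnlm": -15.7},
]
print(rescore_nbest(nbest))  # the LM scores favor "move the table"
```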


IEEE 4th Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA '98) | 1998

Telephone speech recognition applications at IRST

Daniele Falavigna; Roberto Gretter

This paper presents work performed at IRST on automatic telephone speech recognition. The work focuses on the development and installation of systems that deliver automatic services over the telephone. In particular, both a collect call service, based on continuous digit recognition, and a voice dialing-by-name system are described. Both systems use phone-like units, first trained on wideband databases and then refined on telephone speech material collected at IRST. For each application, a test database was collected in the field, and the corresponding results are given. A comparison of techniques for increasing robustness to channel and noise effects is included. Finally, research activities aimed at developing dialogue models are summarized.


International Conference on Multimodal Interfaces | 2004

A multi-modal architecture for cellular phones

Luca Nardelli; Marco Orlandi; Daniele Falavigna

Today, both Automatic Speech Recognition (ASR) and Text To Speech (TTS) systems are reliable enough to cope with a large set of telephone applications [1, 2]. At the same time, personal smart devices are becoming more and more powerful in terms of computation and memory capabilities, while telephone and data networks, including WiFi networks, are converging. All of this is increasing the demand for systems that can combine different interaction modalities. For example, ASR is necessary in the absence of standard keyboards or in situations where the user's hands are busy; TTS is needed when devices lack sufficient display capabilities. In general, the combination of voice and graphic interfaces allows users to interact more comfortably with a service, provided that the underlying system is transparent with respect to the chosen modality, i.e. the user does not have to notify the system of it. In a previous paper [3] we presented a system architecture that can synchronize events coming from different devices (e.g. keyboard, mouse, microphone), making it possible to handle flexible multi-modal interactions using Personal Digital Assistants (PDAs) as client devices. In this paper we describe an extension of the proposed architecture that handles interactions with new-generation cellular phones. Although the various modules of the system could be freely distributed across a communication network, we describe a "server side" solution, where both ASR and TTS resources are not embedded on the client smartphone but are instead located on a remote server. This has the disadvantage that the speech signal needs to be transmitted to the server.
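A minimal sketch of the "server side" pattern described above, assuming a hypothetical remote ASR endpoint and raw-PCM framing (neither the endpoint nor the wire format is specified in the paper):

```python
# The thin client captures audio and streams fixed-size frames to a
# remote ASR server over a plain TCP socket, then reads back the
# transcript. Endpoint and framing are illustrative assumptions.
import socket

SERVER = ("asr.example.org", 9000)   # hypothetical remote ASR server
FRAME_BYTES = 640                    # 20 ms of 16 kHz, 16-bit mono audio

def stream_audio(frames):
    """Send raw audio frames to the server and return its transcript."""
    with socket.create_connection(SERVER) as sock:
        for frame in frames:
            sock.sendall(frame)
        sock.shutdown(socket.SHUT_WR)  # signal end of utterance
        return sock.recv(4096).decode("utf-8", "replace")

# Example call (requires a running server): two silent placeholder frames.
# print(stream_audio([b"\x00" * FRAME_BYTES] * 2))
```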


International Conference on Acoustics, Speech, and Signal Processing | 1997

Multilingual person to person communication at IRST

Bianca Angelini; Mauro Cettolo; Anna Corazza; Daniele Falavigna; Gianni Lazzari

This paper describes a machine-mediated person-to-person multilingual communication system. Stress is put on robustness, that is, the ability of the system to preserve communication even in the presence of the variability and errors typical of spoken language systems. The statistical approach is adopted not only at the acoustic level, but also for the linguistic processing. Therefore, while an overview of the global architecture is briefly introduced, the focus is on the acoustic recognizer and the understanding module. Experimental evaluations complete the presentation.


Meeting of the Association for Computational Linguistics | 2016

TranscRater: A Tool for Automatic Speech Recognition Quality Estimation

Shahab Jalalvand; Matteo Negri; Marco Turchi; José G. C. de Souza; Daniele Falavigna; Mohammed R. H. Qwaider

We present TranscRater, an open-source tool for automatic speech recognition (ASR) quality estimation (QE). The tool allows users to perform ASR evaluation without the reference transcripts and confidence information required by current assessment protocols. TranscRater includes: i) methods to extract a variety of quality indicators from (signal, transcription) pairs, and ii) machine learning algorithms which make it possible to build ASR QE models exploiting the extracted features. Confirming the positive results of previous evaluations, new experiments with TranscRater indicate its effectiveness in both WER prediction and transcription ranking tasks.
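The underlying idea, reference-free WER prediction as supervised regression over extracted quality indicators, can be sketched as follows (this uses scikit-learn with random stand-in data for illustration; it is not TranscRater's actual API):

```python
# Illustrative sketch of ASR quality estimation as regression: X holds
# quality-indicator feature vectors extracted from (signal,
# transcription) pairs; y holds true WER on a labelled training set.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.random((200, 10))   # stand-in feature vectors
y_train = rng.random(200)         # stand-in WER labels in [0, 1]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict WER for new, unlabelled transcriptions (no reference needed).
X_new = rng.random((5, 10))
wer_pred = model.predict(X_new)
# Ranking task: order transcriptions from best (lowest predicted WER) up.
print(wer_pred, np.argsort(wer_pred))
```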


Spoken Language Technology Workshop | 2010

Evaluation of automatic transcription systems for the judicial domain

Jonas Lööf; Daniele Falavigna; Ralf Schlüter; Diego Giuliani; Roberto Gretter; Hermann Ney

This paper describes two automatic transcription systems developed for judicial application domains in the Polish and Italian languages. The judicial domain requires coping with several factors known to be critical for automatic speech recognition, such as background noise, reverberation, spontaneous and accented speech, overlapped speech, and cross-channel effects.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Phone-to-word decoding through statistical machine translation and complementary system combination

Daniele Falavigna; Matteo Gerosa; Roberto Gretter; Diego Giuliani

In this paper, phone-to-word transduction is first investigated by coupling a speech recognizer, which generates a phone sequence or a phone confusion network for each speech segment, with the efficient confusion network decoder adopted by MOSES, a popular statistical machine translation toolkit. Then, system combination is investigated by combining the outputs of several conventional ASR systems with the output of a system embedding phone-to-word decoding through statistical machine translation. Experiments are carried out on a large vocabulary speech recognition task consisting of the transcription of speeches delivered in English during the European Parliament Plenary Sessions (EPPS). While only a marginal performance improvement is achieved in system combination experiments when the output of the phone-to-word transducer is included in the combination, partial results show a great potential for improvement.
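For readers unfamiliar with the input structure, here is a minimal sketch of a phone confusion network and its best path (the phone labels and posteriors are made up for illustration):

```python
# A confusion network is a sequence of "bins", each holding alternative
# phones with posterior probabilities; the decoding described above
# consumes such networks rather than a single 1-best phone string.

confusion_network = [
    [("sil", 0.9), ("f", 0.1)],
    [("k", 0.6), ("g", 0.4)],
    [("ae", 0.7), ("eh", 0.3)],
    [("t", 0.8), ("d", 0.2)],
]

def best_path(cn):
    """Pick the top-posterior phone in each bin; since bins are
    independent, this is also the globally best path."""
    return [max(bin_, key=lambda x: x[1])[0] for bin_ in cn]

print(best_path(confusion_network))  # ['sil', 'k', 'ae', 't']
```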


Speech Communication | 2006

Design and evaluation of acoustic and language models for large scale telephone services

Andrea Facco; Daniele Falavigna; Roberto Gretter; Marcello Viganò

This paper describes the specification, design and development phases of two widely used telephone services based on automatic speech recognition. The effort spent evaluating and tuning these services is discussed in detail. In developing the first service, mainly based on the recognition of "alphanumeric" sequences, a significant part of the work consisted in refining the acoustic models. To increase recognition accuracy we adopted algorithms and methods previously consolidated on broadcast news transcription tasks. A significant result shows that the use of task-specific context-dependent phone models reduces the word error rate by about 40% relative to using context-independent phone models. Note that the latter result was achieved on a small-vocabulary task, significantly different from those generally used in broadcast news transcription. We also investigated both unsupervised and supervised training procedures. Moreover, we studied a novel partly supervised technique that allows us to select, in a near-optimal way, the speech material to transcribe manually and use for acoustic model training. A significant result shows that the proposed procedure gives performance close to that obtained with a completely supervised training method. In the second service, mainly based on phrase spotting, considerable effort was devoted to language model refinement. In particular, several types of rejection networks were studied to detect out-of-vocabulary words for the given task; a major result demonstrates that using rejection networks based on a class trigram language model reduces the word error rate from 36.7% to 11.1% with respect to using a phone loop network. For the latter service, the benefits and related costs of regular grammars, stochastic language models and mixed language models are also reported and discussed. Finally, note that most of the experiments described in this paper were carried out on field databases collected through the developed services.
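The rejection idea can be sketched as a likelihood-ratio test (the scores and threshold below are illustrative assumptions, not values from the paper):

```python
# Hedged sketch: a segment is rejected as out-of-vocabulary when a
# background "rejection network" (e.g. a phone loop or a class-based LM)
# scores it almost as well as the task grammar does.

def reject_oov(task_score, rejection_score, threshold=2.0):
    """task_score / rejection_score: log-likelihoods of the segment
    under the task grammar and under the rejection network.
    Reject when the task grammar's advantage falls below threshold."""
    return (task_score - rejection_score) < threshold

print(reject_oov(task_score=-240.0, rejection_score=-245.0))  # False: accept
print(reject_oov(task_score=-240.0, rejection_score=-241.0))  # True: reject
```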


Text, Speech and Dialogue | 2000

Some Improvements on the IRST Mixed Initiative Dialogue Technology

Cristina Barbero; Daniele Falavigna; Roberto Gretter; Marco Orlandi; Emanuele Pianta

This paper describes the ITC-irst approach for handling spoken dialogue interactions over the telephone network. Barge-in and utterance verification capabilities are being introduced into the developed software architecture. Research activities have also been started to enable information access in a new, large application domain (tourism). The objectives of this research are language model adaptation and efficient information presentation using a mixed representation approach.

Collaboration


An overview of Daniele Falavigna's collaborations.

Top Co-Authors

Diego Giuliani (Fondazione Bruno Kessler)
Fabio Brugnara (Fondazione Bruno Kessler)
Matteo Negri (Fondazione Bruno Kessler)
Mauro Cettolo (Fondazione Bruno Kessler)