Publication


Featured research published by Roberto Gretter.


International Conference on Acoustics, Speech, and Signal Processing | 2001

On-line learning of language models with word error probability distributions

Roberto Gretter; Giuseppe Riccardi

We are interested in the problem of learning stochastic language models on-line (without speech transcriptions) for adaptive speech recognition and understanding. We propose an algorithm to adapt to variations in the language model distributions based on speech input only and without its true transcription. The on-line probability estimate is defined as a function of the prior and word error distributions. We show the effectiveness of word-lattice based error probability distributions in terms of receiver operating characteristic (ROC) curves and word accuracy. We apply the new estimates P_adapt(w) to the task of adapting on-line an initial large vocabulary trigram language model and show improvement in word accuracy with respect to the baseline speech recognizer.


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

The DIRHA-ENGLISH corpus and related tasks for distant-speech recognition in domestic environments

Mirco Ravanelli; Luca Cristoforetti; Roberto Gretter; Marco Pellin; Alessandro Sosi; Maurizio Omologo

This paper introduces the contents and the possible usage of the DIRHA-ENGLISH multi-microphone corpus, recently realized under the EC DIRHA project. The reference scenario is a domestic environment equipped with a large number of microphones and microphone arrays distributed in space. The corpus is composed of both real and simulated material, and it includes 12 US and 12 UK English native speakers. Each speaker uttered different sets of phonetically-rich sentences, newspaper articles, conversational speech, keywords, and commands. From this material, a large set of 1-minute sequences was generated, which also includes typical domestic background noise as well as inter/intra-room reverberation effects. Dev and test sets were derived, which represent valuable material for different studies on multi-microphone speech processing and distant-speech recognition. Various tasks and corresponding Kaldi recipes have already been developed. The paper reports a first set of baseline results obtained using different techniques, including Deep Neural Networks (DNN), in line with the international state of the art.


International Conference on Acoustics, Speech, and Signal Processing | 2013

Comparing two methods for crowdsourcing speech transcription

Rachele Sprugnoli; Giovanni Moretti; Matteo Fuoli; Diego Giuliani; Luisa Bentivogli; Emanuele Pianta; Roberto Gretter; Fabio Brugnara

This paper presents the results of an experimental study conducted with the aim of comparing two methods for crowdsourcing speech transcription that incorporate two different quality control mechanisms (i.e. explicit versus implicit) and that are based on two different processes (i.e. parallel versus iterative). In the Gold Standard method the same speech segment is transcribed in parallel by multiple contributors whose reliability is checked with respect to some reference transcriptions provided by experts. On the other hand, in the Dual Pathway method two independent groups of contributors work on the same set of transcriptions refining them in an iterative way until they converge, and thus eliminating the need to have reference transcriptions and to check transcription quality in a separate phase. These two methods were tested on about half an hour of broadcast news speech and for two different European languages, namely German and Italian. Both methods obtained good results in terms of Word Error Rate (WER) and compare well with the word disagreement rate of experts on the same data.
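
For readers unfamiliar with the metric, Word Error Rate is conventionally computed as the word-level edit distance between a reference and a hypothesis transcription, divided by the number of reference words. The following is a minimal sketch under that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the scoring setup used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.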


Proceedings of 2002 IEEE Workshop on Speech Synthesis, 2002. | 2002

A modified "PaIntE" model for Italian TTS

Piero Cosi; Cinzia Avesani; Fabio Tesser; Roberto Gretter; Fabio Pianesi

In this work, a slightly modified version of the original PaIntE model, based on an F0 parametrization with a specially designed approximation function, is considered. The model's parameters have been automatically optimized using a small set of Italian ToBI labeled sentences. This method drives our ongoing data-based approach to intonation modeling for Italian TTS. The quality of the model has been assessed by numerical measures and preliminary tests show quite promising results.


Proceedings 1998 IEEE 4th Workshop Interactive Voice Technology for Telecommunications Applications. IVTTA '98 (Cat. No.98TH8376) | 1998

Telephone speech recognition applications at IRST

Daniele Falavigna; Roberto Gretter

This paper presents work performed at IRST on automatic telephone speech recognition. The work focuses on development and installation of some systems that allow delivery of automatic services over the telephone. In particular, both a collect call service, for continuous digit recognition, and a voice dialing by name system are described. Both systems use phone-like units, first trained on wideband databases and then refined on telephone speech material collected at IRST. For each application, a test database was collected in the field, for which the results are given. A comparison between some techniques for increasing the robustness with respect to channel and noise effects is included. Finally, research activities aimed at developing dialogue models are summarized.


International Conference on Spoken Language Processing | 1996

SpeeData: multilingual spoken data entry

Ulla Ackermann; Bianca Angelini; Fabio Brugnara; Marcello Federico; Diego Giuliani; Roberto Gretter; Gianni Lazzari; Heinrich Niemann

The authors present a multilingual application for speech technology. The SpeeData project aims at building a demonstrator that provides a user-friendly interface for spoken data-entry in two languages: Italian and German. The application domain is the land register of an Italian region in which both languages are officially spoken. The considered data-entry task is particularly challenging as it involves many different types of data (e.g., long texts, numbers, proper names, and tables) and a variety of pronunciations, since dialects are present and users will not always speak in their native language.


Spoken Language Technology Workshop | 2010

Evaluation of automatic transcription systems for the judicial domain

Jonas Lööf; Daniele Falavigna; Ralf Schlüter; Diego Giuliani; Roberto Gretter; Hermann Ney

This paper describes two different automatic transcription systems developed for judicial application domains for the Polish and Italian languages. The judicial domain requires coping with several factors known to be critical for automatic speech recognition, such as background noise, reverberation, spontaneous and accented speech, overlapped speech, and cross-channel effects.


International Conference on Acoustics, Speech, and Signal Processing | 2009

On-line speaker adaptation on telephony speech data with adaptively trained acoustic models

Diego Giuliani; Roberto Gretter; Fabio Brugnara

This paper addresses speaker adaptive acoustic modeling, based on feature space maximum likelihood linear regression, in the context of on-line telephony applications. An adaptive acoustic modeling method that we previously proved effective in off-line applications is used to train acoustic models for text-dependent and text-independent on-line adaptation. Experiments on telephony speech data indicate that feature space maximum a posteriori linear regression (fMAPLR) greatly helps to cope with sparse adaptation data when performing instantaneous and incremental adaptation with both baseline models and speaker adaptively trained models. The use of speaker adaptively trained models in conjunction with fMAPLR leads to the best recognition results in both instantaneous and incremental adaptation. The proposed text-independent adaptation approach, exploiting speaker adaptively trained models, is also proven effective.


IEEE Automatic Speech Recognition and Understanding Workshop | 2009

Phone-to-word decoding through statistical machine translation and complementary system combination

Daniele Falavigna; Matteo Gerosa; Roberto Gretter; Diego Giuliani

In this paper, phone-to-word transduction is first investigated by coupling a speech recognizer, generating for each speech segment a phone sequence or a phone confusion network, with the efficient decoder of confusion networks adopted by MOSES, a popular statistical machine translation toolkit. Then, system combination is investigated by combining the outputs of several conventional ASR systems with the output of a system embedding phone-to-word decoding through statistical machine translation. Experiments are carried out in the context of a large vocabulary speech recognition task consisting of transcription of speeches delivered in English during the European Parliament Plenary Sessions (EPPS). While only a marginal performance improvement is achieved in system combination experiments when the output of the phone-to-word transducer is included in the combination, partial results show a great potential for improvements.


Artificial Intelligence Methodology Systems Applications | 2008

Dealing with Spoken Requests in a Multimodal Question Answering System

Roberto Gretter; Milen Kouylekov; Matteo Negri

This paper reports on experiments performed in the development of the QALL-ME system, a multilingual QA infrastructure capable of handling input requests both in written and spoken form. Our objective is to estimate the impact of dealing with automatically transcribed (i.e., noisy) requests on a specific question interpretation task, namely the extraction of relations from natural language questions. A number of experiments are presented, featuring different combinations of manually and automatically transcribed question datasets to train and evaluate the system. Results (ranging from 0.624 to 0.634 F-measure in the recognition of the relations expressed by a question) demonstrate that the impact of noisy data on question interpretation is negligible with all the combinations of training/test data. This shows that the benefits of enabling speech access capabilities, allowing for a more natural human-machine interaction, outweigh the minimal loss in terms of performance.
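
As a reminder of how an F-measure such as the one reported above is typically obtained, the sketch below combines precision and recall from raw counts. It assumes the balanced case (beta = 1) by default; the abstract does not say which weighting was used, and the function name `f_measure` and its count-based interface are hypothetical.

```python
def f_measure(true_positives: int, false_positives: int,
              false_negatives: int, beta: float = 1.0) -> float:
    """F-beta score computed from raw counts.

    Assumption: beta = 1 gives the balanced harmonic mean of precision
    and recall; the source does not state the weighting it used.
    """
    if true_positives == 0:
        # No correct predictions: precision and recall are both zero
        # (or undefined), so the score is reported as zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)
```

For example, a system that extracts 3 correct relations, 1 spurious one, and misses none has precision 0.75 and recall 1.0.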

Collaboration


Dive into Roberto Gretter's collaborations.

Top Co-Authors

Diego Giuliani

Fondazione Bruno Kessler

Fabio Brugnara

Fondazione Bruno Kessler
