Mauro Cettolo
Fondazione Bruno Kessler
Publications
Featured research published by Mauro Cettolo.
workshop on statistical machine translation | 2007
Marcello Federico; Mauro Cettolo
Statistical machine translation, like other areas of human language processing, has recently pushed toward the use of large-scale n-gram language models. This paper presents efficient algorithmic and architectural solutions which have been tested within the Moses decoder, an open-source toolkit for statistical machine translation. Experiments are reported with a high-performing baseline, trained on the Chinese-English NIST 2006 Evaluation task and running on a standard Linux 64-bit PC architecture. Comparative tests show that our representation halves the memory required by the SRI LM Toolkit, at the cost of 44% slower translation speed. However, as it can take advantage of memory mapping on disk, the proposed implementation seems to scale up much better to very large language models: decoding with a 289-million 5-gram language model runs in 2.1 GB of RAM.
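The memory-mapping idea behind this result can be illustrated with a minimal sketch (the fixed-size record layout, hashing scheme, and file name below are hypothetical, not the actual Moses/IRSTLM format): n-gram records are written sorted by key, the file is mmap-ed rather than loaded, and each lookup binary-searches the mapping so the OS pages in only the records actually touched.

```python
import hashlib
import mmap
import struct

REC = struct.Struct("<Qf")  # hypothetical layout: 64-bit n-gram key + 32-bit log-prob

def ngram_key(words):
    """Hash an n-gram tuple to a 64-bit key."""
    h = hashlib.blake2b(" ".join(words).encode(), digest_size=8)
    return int.from_bytes(h.digest(), "little")

def build_table(path, ngram_logprobs):
    """Write records sorted by key so lookups can binary-search the file."""
    recs = sorted((ngram_key(ng), lp) for ng, lp in ngram_logprobs.items())
    with open(path, "wb") as f:
        for k, lp in recs:
            f.write(REC.pack(k, lp))

class MappedLM:
    """Looks up log-probs in a memory-mapped record file: only the pages
    actually touched by the binary search are brought into RAM."""
    def __init__(self, path):
        self.f = open(path, "rb")
        self.mm = mmap.mmap(self.f.fileno(), 0, access=mmap.ACCESS_READ)
        self.n = len(self.mm) // REC.size

    def logprob(self, words, default=-99.0):
        key, lo, hi = ngram_key(words), 0, self.n
        while lo < hi:  # binary search over the sorted records
            mid = (lo + hi) // 2
            k, lp = REC.unpack_from(self.mm, mid * REC.size)
            if k == key:
                return lp
            lo, hi = (mid + 1, hi) if k < key else (lo, mid)
        return default
```

The same access pattern works whether the table holds a few entries or hundreds of millions; the resident set is bounded by the pages the decoder's queries actually visit.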
international conference on acoustics, speech, and signal processing | 2003
Erwin Leeuwis; Marcello Federico; Mauro Cettolo
Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using respectively 8 hours of TED transcripts and various types of texts: conference proceedings, lecture transcripts, and conversational speech transcripts. Then, adaptation of the language model to single speakers was investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved.
international conference on acoustics, speech, and signal processing | 2003
Mauro Cettolo; Michele Vescovi
A widely adopted algorithm for audio segmentation is based on the Bayesian information criterion (BIC), applied within a sliding variable-size analysis window. In this work, three different implementations of that algorithm are analyzed in detail: (i) one that maintains a pair of running sums, that of the input vectors and that of the squared input vectors, in order to save computations when estimating covariance matrices on partially shared data; (ii) one, recently proposed in the literature, that exploits the encoding of the input signal with cumulative statistics for the efficient estimation of covariance matrices; and (iii) an original one, which encodes the input stream with the cumulative pair of sums of the first approach. The three approaches are compared both theoretically and experimentally, and the proposed original approach is shown to be the most efficient.
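The cumulative-sums idea can be sketched as follows (a simplified illustration using the standard ΔBIC criterion, not the paper's exact implementation): prefix sums of the feature vectors and of their outer products let the covariance of any sub-window be recovered in O(d²) work regardless of its length, so ΔBIC at a candidate change point, which compares one full-window Gaussian against two sub-window Gaussians, never re-scans frames.

```python
import numpy as np

class BICSegmenter:
    """Delta-BIC change detection with cumulative statistics: covariances of
    arbitrary sub-windows come from prefix sums, not from re-scanning frames."""

    def __init__(self, X, lam=1.0):
        self.X = np.asarray(X, float)
        self.N, self.d = self.X.shape
        self.lam = lam
        # prefix sums of vectors and of outer products (one leading zero entry)
        self.S1 = np.vstack([np.zeros(self.d), np.cumsum(self.X, axis=0)])
        outer = np.einsum("ni,nj->nij", self.X, self.X)
        self.S2 = np.concatenate(
            [np.zeros((1, self.d, self.d)), np.cumsum(outer, axis=0)])

    def _logdet_cov(self, a, b):
        """log|covariance| of frames a..b-1 in O(d^2), via the prefix sums."""
        n = b - a
        mu = (self.S1[b] - self.S1[a]) / n
        cov = (self.S2[b] - self.S2[a]) / n - np.outer(mu, mu)
        _, ld = np.linalg.slogdet(cov + 1e-9 * np.eye(self.d))
        return ld

    def delta_bic(self, i):
        """Positive value supports a change point between frames i-1 and i:
        dBIC = N/2 log|S| - N1/2 log|S1| - N2/2 log|S2| - lam * penalty."""
        N, d = self.N, self.d
        penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
        return (0.5 * N * self._logdet_cov(0, N)
                - 0.5 * i * self._logdet_cov(0, i)
                - 0.5 * (N - i) * self._logdet_cov(i, N)
                - self.lam * penalty)
```

Sliding the candidate point i across the window now costs O(d²) per position instead of O(window · d²), which is exactly where the shared-data savings of approaches (i) and (iii) come from.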
Computer Speech & Language | 2005
Mauro Cettolo; Michele Vescovi; Romeo Rizzi
Abstract The Bayesian Information Criterion (BIC) is a widely adopted method for audio segmentation, and has inspired a number of dominant algorithms for this application. At present, however, the literature lacks analytical and experimental studies of these algorithms. This paper attempts to partially fill this gap. Typically, BIC is applied within a sliding variable-size analysis window in which single changes in the nature of the audio are locally searched for. Three different implementations of the algorithm are described and compared: (i) the first maintains a pair of running sums, that of the input vectors and that of the squared input vectors, in order to save computations when estimating covariance matrices on partially shared data; (ii) the second, recently proposed in the literature, is based on the encoding of the input signal with cumulative statistics for an efficient estimation of covariance matrices; (iii) the third consists of a novel approach, characterized by the encoding of the input stream with the cumulative pair of sums of the first approach. Furthermore, a dynamic programming algorithm is presented that, within the BIC model, finds a globally optimal segmentation of the input audio stream. All algorithms are analyzed in detail from the viewpoint of computational cost, experimentally evaluated on appropriate tasks, and compared.
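The globally optimal segmentation by dynamic programming can be illustrated generically (a sketch under the assumption that each candidate segment carries an additive Gaussian code-length cost plus a fixed penalty; the paper's exact BIC cost and any pruning are not reproduced): dp[j] holds the best cost of the first j frames, minimized over the position i of the last segment boundary.

```python
import numpy as np

def optimal_segmentation(X, penalty=12.0, min_len=10):
    """Dynamic programming over all segmentations: dp[j] is the best total
    cost of frames 0..j-1. Each segment pays a Gaussian log-det code length
    plus a fixed penalty, so fewer, better-fitting segments win."""
    X = np.asarray(X, float)
    N, d = X.shape

    def seg_cost(i, j):
        # negative log-likelihood of one Gaussian on frames i..j-1 (up to constants)
        n = j - i
        cov = np.cov(X[i:j].T, bias=True) + 1e-9 * np.eye(d)
        _, ld = np.linalg.slogdet(cov)
        return 0.5 * n * ld + penalty

    dp = np.full(N + 1, np.inf)
    back = np.zeros(N + 1, dtype=int)
    dp[0] = 0.0
    for j in range(min_len, N + 1):
        for i in range(0, j - min_len + 1):  # last segment is frames i..j-1
            c = dp[i] + seg_cost(i, j)
            if c < dp[j]:
                dp[j], back[j] = c, i

    # recover the change points by backtracking
    cuts, j = [], N
    while j > 0:
        j = back[j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts)
```

Unlike the greedy sliding-window search, this O(N²) recursion is guaranteed to return the segmentation with the lowest total cost under the chosen model.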
international conference on acoustics, speech, and signal processing | 1995
Giuliano Antoniol; Fabio Brugnara; Mauro Cettolo; Marcello Federico
This paper presents an efficient way of representing a bigram language model for a beam-search based, continuous speech, large vocabulary HMM recognizer. The tree-based topology considered takes advantage of a factorization of the bigram probability derived from the bigram interpolation scheme, and of a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is used to considerably reduce the space requirements of the language model. Experimental results are provided for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the former domain, 93% word accuracy is achieved with real-time response and 23 Mb process space. In the newspaper dictation domain, 88.1% word accuracy is achieved with 1.41× real-time response and 38 Mb process space. All recognition tests were performed on an HP-735 workstation.
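The factorization the tree topology relies on can be sketched as follows (a simplified illustration with generic symbols, not the paper's notation or network construction): under interpolation, P(w2|w1) = λ(w1)·f(w2|w1) + (1−λ(w1))·P(w2), so the unigram term can be shared by every history and only the observed bigrams need explicit storage.

```python
from collections import Counter, defaultdict

class InterpolatedBigram:
    """Interpolated bigram LM: P(w2|w1) = lam(w1)*f(w2|w1) + (1-lam(w1))*P(w2).
    Only observed bigrams are stored explicitly; every history falls back to a
    single shared unigram distribution, mirroring the shared part of the tree."""

    def __init__(self, corpus):
        uni, bi = Counter(), defaultdict(Counter)
        for sent in corpus:
            uni.update(sent)
            for w1, w2 in zip(sent, sent[1:]):
                bi[w1][w2] += 1
        total = sum(uni.values())
        self.p_uni = {w: c / total for w, c in uni.items()}
        self.f_bi = {w1: {w2: c / sum(succ.values()) for w2, c in succ.items()}
                     for w1, succ in bi.items()}
        # Witten-Bell-style weight: histories with many events trust f(.|w1) more
        self.lam = {w1: sum(succ.values()) / (sum(succ.values()) + len(succ))
                    for w1, succ in bi.items()}

    def prob(self, w1, w2):
        lam = self.lam.get(w1, 0.0)
        f = self.f_bi.get(w1, {}).get(w2, 0.0)
        return lam * f + (1.0 - lam) * self.p_uni.get(w2, 0.0)
```

Because the second term is identical for all histories, a recognition network can route every state through one shared unigram sub-tree weighted by 1−λ(w1), which is what makes the tree organization compact.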
Computer Speech & Language | 1995
Marcello Federico; Mauro Cettolo; Fabio Brugnara; Giuliano Antoniol
Abstract This paper considers the problems of estimating bigram language models and of efficiently representing them by a finite-state network that can be employed by a hidden Markov model based, beam-search, continuous speech recognizer. A review of the best-known bigram estimation techniques is given, together with a description of the original Stacked model. Language model comparisons in terms of perplexity are given for three text corpora with different data-sparseness conditions, while speech recognition accuracy tests are presented for a 10 000-word real-time, speaker-independent dictation task. The Stacked estimation method compares favourably with the others, achieving about 93% word accuracy. If better language model estimates can improve recognition accuracy, representations better suited to the search algorithm can improve its speed as well. Two static representations of language models are introduced: linear and tree-based. Results show that the latter organization is better exploited by the beam-search algorithm, as it provides a five-times-faster response with the same word accuracy. Finally, an off-line reduction algorithm is presented that cuts the space requirements of the tree-based topology to about 40%. The solutions proposed here have been successfully employed in a real-time, speaker-independent, 10 000-word dictation system for radiological reporting.
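Perplexity, the comparison metric used throughout these experiments, has a standard definition that can be computed generically (this is the textbook formula, not code from the paper): PP = exp(−(1/N) Σᵢ log P(wᵢ | hᵢ)), i.e. the exponentiated average negative log-probability per word; lower is better.

```python
import math

def perplexity(logprob_fn, sentences):
    """Perplexity of a language model over a test set:
    PP = exp(-1/N * sum of ln P(w | history)); lower means the model
    is less 'surprised' by the test text."""
    total_lp, n_words = 0.0, 0
    for sent in sentences:
        history = []
        for w in sent:
            total_lp += logprob_fn(tuple(history), w)  # natural-log probability
            history.append(w)
            n_words += 1
    return math.exp(-total_lp / n_words)
```

A uniform model over a V-word vocabulary has perplexity exactly V, which is why figures like 27 (radiological reporting) versus 120 (newspaper text) directly reflect how constrained each domain is.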
empirical methods in natural language processing | 2016
Luisa Bentivogli; Arianna Bisazza; Mauro Cettolo; Marcello Federico
Within the field of Statistical Machine Translation (SMT), the neural approach (NMT) has recently emerged as the first technology able to challenge the long-standing dominance of phrase-based approaches (PBMT). In particular, at the IWSLT 2015 evaluation campaign, NMT outperformed well established state-of-the-art PBMT systems on English-German, a language pair known to be particularly hard because of morphology and syntactic differences. To understand in what respects NMT provides better translation quality than PBMT, we perform a detailed analysis of neural versus phrase-based SMT outputs, leveraging high quality post-edits performed by professional translators on the IWSLT data. For the first time, our analysis provides useful insights on what linguistic phenomena are best modeled by neural models -- such as the reordering of verbs -- while pointing out other aspects that remain to be improved.
empirical methods in natural language processing | 2015
Christian Hardmeier; Preslav Nakov; Sara Stymne; Jörg Tiedemann; Yannick Versley; Mauro Cettolo
We describe the design, the evaluation setup, and the results of the DiscoMT 2015 shared task, which included two subtasks, relevant to both the machine translation (MT) and the discourse communities: (i) pronoun-focused translation, a practical MT task, and (ii) cross-lingual pronoun prediction, a classification task that requires no specific MT expertise and is interesting as a machine learning task in its own right. We focused on the English‐French language pair, for which MT output is generally of high quality, but has visible issues with pronoun translation due to differences in the pronoun systems of the two languages. Six groups participated in the pronoun-focused translation task and eight groups in the cross-lingual pronoun prediction task.
international conference on acoustics, speech, and signal processing | 2001
Nicola Bertoldi; E. Brugnara; Mauro Cettolo; Marcello Federico; Diego Giuliani
This paper reports on experiments in porting the ITC-irst Italian broadcast news recognition system to two spontaneous dialogue domains. The trade-off between performance and the required amount of task-specific data was investigated. Porting was carried out by applying supervised adaptation methods to acoustic and language models. By using two hours of manually transcribed speech, word error rates of 26.0% and 28.4% were achieved by the adapted systems. Two reference systems, developed on a larger training corpus, achieved word error rates of 22.6% and 21.2%, respectively.
Machine Translation | 2014
Nicola Bertoldi; Patrick Simianer; Mauro Cettolo; Katharina Wäschle; Marcello Federico; Stefan Riezler
Recent research has shown that accuracy and speed of human translators can benefit from post-editing output of machine translation systems, with larger benefits for higher quality output. We present an efficient online learning framework for adapting all modules of a phrase-based statistical machine translation system to post-edited translations. We use a constrained search technique to extract new phrase-translations from post-edits without the need of re-alignments, and to extract phrase pair features for discriminative training without the need for surrogate references. In addition, a cache-based language model is built on