
Publication


Featured research published by Fabio Brugnara.


Speech Communication | 1993

Automatic segmentation and labeling of speech based on Hidden Markov Models

Fabio Brugnara; Daniele Falavigna; Maurizio Omologo

Accurate documentation of a speech database at the phonetic level is very important for speech research; however, manual segmentation and labeling is a time-consuming and error-prone task. This article describes an automatic procedure for the segmentation of speech: given either the linguistic or the phonetic content of a speech utterance, the system provides phone boundaries. The technique is based on an acoustic-phonetic Hidden Markov Model (HMM) recognizer; both the recognizer and the segmentation system were designed using the DARPA-TIMIT acoustic-phonetic continuous speech database of American English. Segmentation and labeling experiments were conducted under different conditions to check the reliability of the resulting system. Satisfactory results were obtained, especially when the system is trained with some manually presegmented material. The size of this material is a crucial factor, and system performance has been evaluated with respect to this parameter. The system provides 88.3% correct boundary location, given a tolerance of 20 ms, when only 256 phonetically balanced sentences are used for its training.
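As a rough illustration of the boundary-location step described above, the sketch below (hypothetical code, not the authors' implementation) runs a Viterbi forced alignment over a known phone sequence: each phone is reduced to a single HMM state, the per-frame acoustic log-likelihoods are random stand-ins for the scores a trained acoustic-phonetic HMM recognizer would produce, and the backtrace yields the frame at which each phone ends.

```python
# Minimal forced-alignment sketch: one state per phone, left-to-right topology.
import numpy as np

def force_align(log_lik):
    """log_lik[t, p]: log-likelihood of frame t under the p-th phone of the
    known sequence. Returns, for each phone, the index of its last frame."""
    T, P = log_lik.shape
    NEG = -1e10
    delta = np.full((T, P), NEG)                # best path score ending in (t, p)
    entered = np.zeros((T, P), dtype=bool)      # True: best path entered phone p at frame t
    delta[0, 0] = log_lik[0, 0]                 # the utterance must start in the first phone
    for t in range(1, T):
        for p in range(P):
            stay = delta[t - 1, p]
            enter = delta[t - 1, p - 1] if p > 0 else NEG
            if enter > stay:
                delta[t, p] = enter + log_lik[t, p]
                entered[t, p] = True
            else:
                delta[t, p] = stay + log_lik[t, p]
    # Backtrace from the last frame of the last phone to recover boundaries.
    ends, p = [T - 1], P - 1
    for t in range(T - 1, 0, -1):
        if entered[t, p]:        # phone p starts at frame t, so phone p-1 ends at t-1
            ends.append(t - 1)
            p -= 1
    return list(reversed(ends))

# Toy usage: 200 frames, an utterance of 5 phones, random scores as stand-ins.
rng = np.random.default_rng(0)
print(force_align(rng.normal(size=(200, 5))))
```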


Speech Communication | 2007

Acoustic variability and automatic recognition of children's speech

Matteo Gerosa; Diego Giuliani; Fabio Brugnara

This paper presents several acoustic analyses carried out on read speech collected from Italian children aged 7 to 13 years and North American children aged 5 to 17 years. These analyses aimed at achieving a better understanding of spectral and temporal changes in speech produced by children of various ages, in view of the development of automatic speech recognition applications. The results of these analyses confirm and complement the results reported in the literature, showing that characteristics of children's speech change with age and that spectral and temporal variability decrease as age increases. In fact, younger children show substantially higher intra- and inter-speaker variability than older children and adults. We investigated the use of several methods for speaker adaptive acoustic modeling to cope with inter-speaker spectral variability and to improve recognition performance for children. These methods proved to be effective in recognition of read speech with a vocabulary of about 11k words.


international conference on acoustics, speech, and signal processing | 2005

Adaptive training using simple target models [speech recognition applications]

Georg Stemmer; Fabio Brugnara; Diego Giuliani

Adaptive training aims at reducing the influence of speaker, channel and environment variability on the acoustic models. We describe an acoustic normalization approach to adaptive training. Phonetically irrelevant acoustic variability is reduced at the beginning of the training procedure w.r.t. a set of target models. The set of target models can be a set of HMMs or a Gaussian mixture model (GMM). CMLLR is applied to normalize the acoustic features. The normalized data contains less unwanted variability and is used to generate and train the recognition models. Employing a GMM as a target model leads to a text-independent procedure that can be embedded into the acoustic front-end. On a broadcast news transcription task we obtain relative reductions in WER of 7.8% in the first recognition pass over a conventionally trained system and of 3.4% in the second recognition pass over a SAT-trained system.
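The following sketch (hypothetical, with a diagonal affine transform standing in for the full maximum-likelihood CMLLR estimation) illustrates the idea of normalizing each speaker's features toward a simple target model before training the recognition models on the normalized data.

```python
# Feature-space normalization toward a single-Gaussian "target model":
# a per-speaker affine map y = A x + b that matches the target's global statistics.
import numpy as np

def estimate_diag_transform(speaker_feats, target_mean, target_std):
    """Return (A, b) so that A @ x + b matches the target's mean and variance."""
    mu = speaker_feats.mean(axis=0)
    sigma = speaker_feats.std(axis=0) + 1e-8
    scale = target_std / sigma
    A = np.diag(scale)
    b = target_mean - scale * mu
    return A, b

def normalize(speaker_feats, A, b):
    return speaker_feats @ A.T + b

# Usage: pool training data to define the target, normalize each speaker's data,
# then (re)train the recognition models on the normalized features.
rng = np.random.default_rng(1)
all_feats = rng.normal(0.0, 1.0, size=(10_000, 13))       # pooled training features
target_mean, target_std = all_feats.mean(0), all_feats.std(0)
spk = rng.normal(0.5, 1.7, size=(800, 13))                 # one speaker's features
A, b = estimate_diag_transform(spk, target_mean, target_std)
spk_norm = normalize(spk, A, b)
```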


international conference on acoustics, speech, and signal processing | 1995

Language model representations for beam-search decoding

Giuliano Antoniol; Fabio Brugnara; Mauro Cettolo; Marcello Federico

This paper presents an efficient way of representing a bigram language model for a beam-search based, continuous speech, large vocabulary HMM recognizer. The tree-based topology considered takes advantage of a factorization of the bigram probability derived from the bigram interpolation scheme, and of a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is used to considerably reduce the space requirements of the language model. Experimental results are provided for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the former domain 93% word accuracy is achieved with real-time response and 23 Mb process space. In the newspaper dictation domain, 88.1% word accuracy is achieved with 1.41 real-time response and 38 Mb process space. All recognition tests were performed on an HP-735 workstation.
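As a toy illustration of the interpolation scheme underlying the factorization (assuming simple linear interpolation with a fixed weight, which is not necessarily the paper's estimation method), the sketch below computes bigram probabilities of the form P(w | v) = lam * f(w | v) + (1 - lam) * P(w); the shared unigram term is what a tree of successor words can factor out of its branches.

```python
# Toy interpolated bigram model built from raw counts.
from collections import Counter, defaultdict

class InterpolatedBigram:
    def __init__(self, sentences, lam=0.8):
        self.lam = lam
        self.uni = Counter()
        self.bi = defaultdict(Counter)
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.uni.update(toks)
            for v, w in zip(toks, toks[1:]):
                self.bi[v][w] += 1
        self.total = sum(self.uni.values())

    def prob(self, w, v):
        p_uni = self.uni[w] / self.total                  # unigram term, shared by all successors of v
        n_v = sum(self.bi[v].values())
        f_bi = self.bi[v][w] / n_v if n_v else 0.0        # relative bigram frequency
        return self.lam * f_bi + (1.0 - self.lam) * p_uni

lm = InterpolatedBigram(["the cat sat", "the cat ran", "a dog ran"])
print(lm.prob("cat", "the"), lm.prob("dog", "the"))
```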


Computer Speech & Language | 1995

Language modelling for efficient beam-search

Marcello Federico; Mauro Cettolo; Fabio Brugnara; Giuliano Antoniol

This paper considers the problems of estimating bigram language models and of efficiently representing them by a finite state network, which can be employed by a hidden Markov model based, beam-search, continuous speech recognizer. A review of the best known bigram estimation techniques is given, together with a description of the original Stacked model. Language model comparisons in terms of perplexity are given for three text corpora with different data sparseness conditions, while speech recognition accuracy tests are presented for a 10,000-word real-time, speaker-independent dictation task. The Stacked estimation method compares favourably with the others, achieving about 93% word accuracy. If better language model estimates can improve recognition accuracy, representations better suited to the search algorithm can improve its speed as well. Two static representations of language models are introduced: linear and tree-based. Results show that the latter organization is better exploited by the beam-search algorithm, as it provides a five times faster response with the same word accuracy. Finally, an off-line reduction algorithm is presented that cuts the space requirements of the tree-based topology to about 40%. The proposed solutions have been successfully employed in a real-time, speaker-independent, 10,000-word dictation system for radiological reporting.
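For reference, perplexity, the comparison measure used above, is the exponential of the average negative log-probability that the model assigns to a test text; a minimal helper (illustrative only, not from the paper) is sketched below.

```python
import math

def perplexity(word_probs):
    """word_probs: probability the LM assigned to each word of the test text."""
    n = len(word_probs)
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / n)

print(perplexity([0.1, 0.05, 0.2, 0.01]))   # higher perplexity = worse prediction
```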


Speech Communication | 2009

Towards age-independent acoustic modeling

Matteo Gerosa; Diego Giuliani; Fabio Brugnara

In automatic speech recognition applications, due to significant differences in voice characteristics, adults and children are usually treated as two population groups for which different acoustic models are trained. In this paper, age-independent acoustic modeling is investigated in the context of large vocabulary speech recognition. Exploiting a small amount (9 h) of children's speech and a more significant amount (57 h) of adult speech, age-independent acoustic models are trained using several methods for speaker adaptive acoustic modeling. Recognition results achieved using these models are compared with those achieved using age-dependent acoustic models for children and adults, respectively. Recognition experiments are performed on four Italian speech corpora, two consisting of children's speech and two of adult speech, using 64k-word and 11k-word trigram language models. Methods for speaker adaptive acoustic modeling prove to be effective for training age-independent acoustic models, ensuring recognition results at least as good as those achieved with age-dependent acoustic models for adults and children.


international conference on acoustics, speech, and signal processing | 2006

Integration of Heteroscedastic Linear Discriminant Analysis (HLDA) Into Adaptive Training

Georg Stemmer; Fabio Brugnara

The paper investigates the integration of heteroscedastic linear discriminant analysis (HLDA) into adaptively trained speech recognizers. Two different approaches are compared: the first is a variant of CMLLR-SAT; the second is based on our previously introduced method, constrained maximum-likelihood speaker normalization (CMLSN). For the latter, both the HLDA projection and the speaker-specific transformations for normalization are estimated w.r.t. a set of simple target models. It is investigated whether additional robustness can be achieved by estimating HLDA on normalized data. Experimental results are provided for a broadcast news task and a collection of parliamentary speeches. We show that the proposed methods lead to relative reductions in word error rate (WER) of 8% over an adapted baseline system that already includes an HLDA transform. The best performance for both tasks is achieved with the algorithm that is based on CMLSN. Compared to the combination of HLDA and CMLLR-SAT, this method leads to a considerable reduction in computational effort and to a significantly lower WER.
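Once estimated, an HLDA transform is applied as a square projection of the stacked feature vector from which only the leading, class-discriminating dimensions are retained; the sketch below (a generic illustration, with a random matrix standing in for the ML-estimated transform) shows this application step, not the estimation itself.

```python
import numpy as np

def apply_hlda(feats, A, keep):
    """feats: (T, n) feature vectors; A: (n, n) estimated HLDA matrix;
    keep: number of useful dimensions to retain after projection."""
    return (feats @ A.T)[:, :keep]

rng = np.random.default_rng(2)
A = rng.normal(size=(39, 39))            # stand-in for an ML-estimated HLDA matrix
feats = rng.normal(size=(500, 39))       # e.g. cepstra plus derivatives
reduced = apply_hlda(feats, A, keep=32)  # nuisance dimensions discarded
print(reduced.shape)                     # (500, 32)
```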


international conference on acoustics, speech, and signal processing | 1992

A family of parallel hidden Markov models

Fabio Brugnara; R. De Mori; Diego Giuliani; Maurizio Omologo

Stochastic signal models represent a powerful tool for automatic speech recognition. A particular type of stochastic modeling, based on first-order hidden Markov models (HMMs), has become increasingly popular because it has a solid theoretical basis and offers practical advantages. The authors extend the standard HMM theory to parallel hidden Markov models (PHMMs). The parallel model consists of two statistically related HMMs. This configuration has mixture densities of HMM observations whose weights can be made variable depending on the probability of other HMMs being in certain states. This allows one to dynamically adapt observation statistics to acoustic contexts. Some preliminary experiments have been carried out to compare the PHMMs with standard HMMs, and the results are presented.
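One possible reading of this weight-coupling mechanism (an assumption made for illustration, not the authors' exact formulation) is sketched below: the Gaussian mixture weights of one HMM's observation density are re-weighted by the state posterior of the companion HMM.

```python
import numpy as np
from scipy.stats import multivariate_normal

def observation_likelihood(o, means, covs, base_weights, weight_per_state, state_post):
    """Mixture likelihood whose weights are modulated by the companion HMM:
    c_k is proportional to base_weights[k] * sum_s state_post[s] * weight_per_state[s, k]."""
    mod = base_weights * (state_post @ weight_per_state)
    mod = mod / mod.sum()                                  # renormalize the mixture weights
    return sum(c * multivariate_normal.pdf(o, m, S)
               for c, m, S in zip(mod, means, covs))

# Toy usage: 2-dimensional features, 3 mixture components, 2 companion states.
d, K, S = 2, 3, 2
rng = np.random.default_rng(3)
means = [rng.normal(size=d) for _ in range(K)]
covs = [np.eye(d) for _ in range(K)]
base = np.ones(K) / K
per_state = rng.uniform(0.1, 1.0, size=(S, K))             # how each companion state favours each mixture
post = np.array([0.7, 0.3])                                # P(companion HMM in each state)
print(observation_likelihood(rng.normal(size=d), means, covs, base, per_state, post))
```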


international conference on acoustics, speech, and signal processing | 2000

A baseline for the transcription of Italian broadcast news

Fabio Brugnara; Mauro Cettolo; Marcello Federico; Diego Giuliani

The paper presents the first achievements in the development of a broadcast news transcription system to be applied for the processing of huge audio archives. In particular, the Italian broadcast news corpus under collection is introduced, and the first implemented baseline system is outlined. The baseline system consists of an audio segmentation module and a speech recognizer featuring a recursive Viterbi beam search, a 64k word lexicon, a tree-based trigram LM representation, and MLLR adaptation. The word error rate of the baseline was 20.9% on planned studio speech and 28.8% on the whole test set.


international conference on acoustics, speech, and signal processing | 2013

Comparing two methods for crowdsourcing speech transcription

Rachele Sprugnoli; Giovanni Moretti; Matteo Fuoli; Diego Giuliani; Luisa Bentivogli; Emanuele Pianta; Roberto Gretter; Fabio Brugnara

This paper presents the results of an experimental study conducted with the aim of comparing two methods for crowdsourcing speech transcription that incorporate two different quality control mechanisms (i.e. explicit versus implicit) and that are based on two different processes (i.e. parallel versus iterative). In the Gold Standard method the same speech segment is transcribed in parallel by multiple contributors whose reliability is checked with respect to some reference transcriptions provided by experts. On the other hand, in the Dual Pathway method two independent groups of contributors work on the same set of transcriptions refining them in an iterative way until they converge, and thus eliminating the need to have reference transcriptions and to check transcription quality in a separate phase. These two methods were tested on about half an hour of broadcast news speech and for two different European languages, namely German and Italian. Both methods obtained good results in terms of Word Error Rate (WER) and compare well with the word disagreement rate of experts on the same data.

Collaboration


Dive into Fabio Brugnara's collaborations.

Top Co-Authors

Diego Giuliani, Fondazione Bruno Kessler

Mauro Cettolo, Fondazione Bruno Kessler

Giuliano Antoniol, École Polytechnique de Montréal

Matteo Gerosa, Fondazione Bruno Kessler