Publication


Featured research published by Marco Matassoni.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Microphone array based speech recognition with different talker-array positions

Maurizio Omologo; Marco Matassoni; Piergiorgio Svaizer; Diego Giuliani

The use of a microphone array for hands-free continuous speech recognition in a noisy and reverberant environment is investigated. An array of eight omnidirectional microphones was placed at different angles and distances from the talker. A time delay compensation module was used to provide a beamformed signal as input to a hidden Markov model (HMM) based recognizer. A phone HMM adaptation, based on a small amount of phonetically rich sentences, further improved the recognition rate obtained by applying only beamforming. These results were confirmed both by experiments conducted in a noisy and reverberant environment and by simulations. In the latter case, different conditions were recreated by using the image method to reproduce synthetic versions of the array microphone signals.
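As a rough illustration of what time delay compensation followed by beamforming means, the sketch below aligns each channel by a known integer delay and averages the result (plain delay-and-sum). It is not the implementation used in the paper: the function name `delay_and_sum`, the integer-sample delays, and the toy signals are all assumptions made for this example, and a real system would have to estimate the delays from the signals.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   arrival delay of the talker's signal at each microphone,
              in samples (assumed known here; hypothetical values below).
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for signal, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay  # read ahead to undo the arrival delay
            if 0 <= src < n:
                out[t] += signal[src]
    return [v / len(channels) for v in out]


# Toy example: the same pulse reaches three microphones 0, 2 and 1 samples late.
pulse = [0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
mics = [
    pulse,
    [0.0, 0.0] + pulse[:-2],
    [0.0] + pulse[:-1],
]
aligned = delay_and_sum(mics, [0, 2, 1])
```

With correct delays the aligned copies of the talker's signal add coherently, while uncorrelated noise on the individual channels would tend to average out.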


Speech Communication | 1998

Environmental conditions and acoustic transduction in hands-free speech recognition

Maurizio Omologo; Piergiorgio Svaizer; Marco Matassoni

Hands-free interaction is key to increasing the flexibility of present applications and to developing new speech recognition applications in which the user cannot be encumbered by hand-held or head-mounted microphones. When the microphone is far from the speaker, the transduced signal is affected by degradations of different kinds that are often unpredictable. Special microphones and multi-microphone acquisition systems represent a way of reducing some environmental noise effects. Robust processing and adaptation techniques can further be used to compensate for different kinds of variability that may be present in the recognizer input. The purpose of this paper is to revisit some of the assumptions about the different sources of this variability and to discuss both special transducer systems and compensation/adaptation techniques that can be adopted. In particular, the paper refers to the use of multi-microphone systems to overcome some undesired effects caused by room acoustics (e.g. reverberation) and by coherent/incoherent noise (e.g. competing talkers, computer fans). The paper concludes with a description of some experiments that were conducted on both real and simulated speech data.


International Conference on Acoustics, Speech, and Signal Processing | 1999

Training of HMM with filtered speech material for hands-free recognition

Diego Giuliani; Marco Matassoni; Maurizio Omologo; Piergiorgio Svaizer

This paper addresses the problem of hands-free speech recognition in a noisy office environment. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to an HMM-based recognizer. Training of HMMs is performed using either a clean speech database or a filtered version of the same database. Filtering consists of a convolution with the acoustic impulse response between the speaker and microphone, to reproduce the reverberation effect. Background noise is added to provide the desired SNR. The paper shows that the new models trained on these data perform better than the baseline ones. Furthermore, the paper investigates maximum likelihood linear regression (MLLR) adaptation of the new models. It is shown that a further performance improvement is obtained, reaching a 98.7% WRR in a connected digit recognition task when the talker is at 1.5 m distance from the array.
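The filtering step described in this abstract (convolution with an impulse response, then noise added at a chosen SNR) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the helper names `convolve`, `power` and `add_noise_at_snr`, the toy impulse response, the sine-wave "speech" and the Gaussian noise are all hypothetical and do not come from the paper.

```python
import math
import random


def convolve(signal, impulse_response):
    """Full linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def power(x):
    """Mean squared amplitude of a sample list."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(speech, noise, snr_db):
    """Scale noise so speech power / noise power equals snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


rng = random.Random(0)
clean = [math.sin(2 * math.pi * 0.05 * t) for t in range(200)]  # stand-in for clean speech
rir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.25]  # hypothetical toy impulse response
reverberant = convolve(clean, rir)
noise = [rng.gauss(0.0, 1.0) for _ in range(len(reverberant))]
noisy = add_noise_at_snr(reverberant, noise, snr_db=10.0)
```

Here the SNR is defined against the reverberant signal rather than the clean one; which reference the paper uses is not stated in the abstract.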


International Conference on Acoustics, Speech, and Signal Processing | 1995

Hands free continuous speech recognition in noisy environment using a four microphone array

Diego Giuliani; Marco Matassoni; Maurizio Omologo; Piergiorgio Svaizer

This paper describes advances in the use of HMM-based technology for speaker-independent continuous speech recognition in a noisy environment under hands-free interaction. For this purpose an array of four omnidirectional microphones is employed as the acquisition system. The processing of phase information in the cross-power spectrum provides the capability both of locating the talker position and of reconstructing an enhanced speech spectrum. Two enhancement techniques are described, which provide recognition improvement in the case of clean input speech as well as under different adverse conditions. The results refer to the use of a new multichannel corpus, collected in a real environment by a microphone array as well as a close-talk microphone.


Computer Speech & Language | 2013

Blind source extraction for robust speech recognition in multisource noisy environments

Francesco Nesta; Marco Matassoni

This paper proposes and describes a complete system for Blind Source Extraction (BSE). The goal is to extract a target signal source in order to recognize spoken commands uttered in reverberant and noisy environments, and acquired by a microphone array. The architecture of the BSE system is based on multiple stages: (a) TDOA estimation, (b) mixing system identification for the target source, (c) on-line semi-blind source separation and (d) source extraction. All the stages are effectively combined, allowing the estimation of the target signal with limited distortion. While a generalization of the BSE framework is described, here the proposed system is evaluated on the data provided for the CHiME Pascal 2011 competition, i.e. binaural recordings made in a real-world domestic environment. The CHiME mixtures are processed with the BSE and the recovered target signal is fed to a recognizer, which uses noise robust features based on Gammatone Frequency Cepstral Coefficients. Moreover, acoustic model adaptation is applied to further reduce the mismatch between training and testing data and improve the overall performance. A detailed comparison between different models and algorithmic settings is reported, showing that the approach is promising and the resulting system gives a significant reduction of the error rate.


International Conference on Acoustics, Speech, and Signal Processing | 1998

Experiments of HMM adaptation for hands-free connected digit recognition

Diego Giuliani; Marco Matassoni; Maurizio Omologo; Piergiorgio Svaizer

A scenario concerning hands-free connected digit recognition in a noisy office environment is investigated. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to a hidden Markov model (HMM) based recognizer. Two different techniques of phone HMM adaptation have been considered, to reduce the mismatch between training and test conditions. Adaptation material and test material were collected in two different sessions. Results show that a digit accuracy close to 98% can be achieved when the talker is at 1.5 m distance from the array. This result has to be compared with 99.5% accuracy obtained by using a close-talk microphone.


International Conference on Acoustics, Speech, and Signal Processing | 2002

On the joint use of noise reduction and MLLR adaptation for in-car hands-free speech recognition

Marco Matassoni; Maurizio Omologo; Alfiero Santarelli; Piergiorgio Svaizer

This paper refers to an activity under way at the speech recognition technology level for the development of a hands-free dialogue interaction system in the car environment. The work presented here concerns the use of two noise reduction techniques, as well as of MLLR adaptation, for recognition error reduction in low and medium complexity tasks, namely connected digits and spelling with or without bigram/trigram statistical constraints. Experiments are based on the SpeechDat Car database, a corpus collected under real noisy conditions. Results show the additive improvements in performance obtained by adopting noise reduction techniques and MLLR adaptation.


IEEE Automatic Speech Recognition and Understanding Workshop | 2015

Boosted acoustic model learning and hypotheses rescoring on the CHiME-3 task

Shahab Jalalvand; Daniele Falavigna; Marco Matassoni; Piergiorgio Svaizer; Maurizio Omologo

Speech recognition in a realistic noisy environment using multiple microphones is the focal point of the third CHiME challenge. On top of the baseline ASR system provided for this challenge, we apply state-of-the-art algorithms for boosting acoustic model learning and hypothesis rescoring to improve the final output. To this end, we first use the automatic transcription of each channel to re-train the acoustic model for that channel and then we apply linear language model rescoring to find a better solution in the n-best list. LM rescoring is performed using an efficient set of N-gram and Recurrent Neural Network LMs (RNNLMs) trained on a carefully selected text set. In the experiments, we show that the proposed approach improves not only the individual channel transcription, but also the enhanced channels produced by MVDR and delay-and-sum beamforming.
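One common way to read "linear rescoring of an n-best list" is as a weighted sum of log-domain scores per hypothesis, followed by re-ranking. The sketch below shows that general idea only; the function `rescore_nbest`, the score names, the weights and the example hypotheses are all invented for illustration and are not taken from the paper or the CHiME-3 baseline.

```python
def rescore_nbest(nbest, weights):
    """Return hypotheses sorted by a weighted sum of their log-domain scores."""
    def combined(hyp):
        return sum(weights[name] * hyp["scores"][name] for name in weights)
    return sorted(nbest, key=combined, reverse=True)


# Hypothetical 3-best list with made-up acoustic, n-gram and RNNLM log scores.
nbest = [
    {"text": "turn on the light",
     "scores": {"acoustic": -120.0, "ngram": -14.0, "rnnlm": -12.0}},
    {"text": "turn on the lied",
     "scores": {"acoustic": -118.0, "ngram": -22.0, "rnnlm": -25.0}},
    {"text": "turn of the light",
     "scores": {"acoustic": -119.0, "ngram": -19.0, "rnnlm": -18.0}},
]
weights = {"acoustic": 1.0, "ngram": 0.5, "rnnlm": 0.5}  # assumed interpolation weights
best = rescore_nbest(nbest, weights)[0]
```

In this toy list the acoustically best hypothesis is not the one selected once the two language model scores are taken into account, which is the effect rescoring is meant to achieve.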


European Signal Processing Conference | 2015

Multi-room speech activity detection using a distributed microphone network in domestic environments

Panagiotis Giannoulis; Alessio Brutti; Marco Matassoni; Alberto Abad; Athanasios Katsamanis; Miguel Matos; Gerasimos Potamianos; Petros Maragos

Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard speech processing algorithms. In this application scenario, a crucial task is the detection and localization of speech events generated by users within the various rooms. A specific challenge of multi-room environments is the inter-room interference that negatively affects speech activity detectors. In this paper, we present and compare different solutions for the multi-room speech activity detection task. The combination of a model-based room-independent speech activity detection module with a room-dependent inside/outside classification stage, based on specific features, provides satisfactory performance. The proposed methods are evaluated on a multi-room, multi-channel corpus, where spoken commands and other typical acoustic events occur in different rooms.


International Conference on Acoustics, Speech, and Signal Processing | 2014

On the use of Early-To-Late Reverberation ratio for ASR in reverberant environments

Alessio Brutti; Marco Matassoni

This work presents an analysis of distant-talking speech recognition in a variety of reverberant conditions, correlating ASR performance to the acoustic characteristics of a given propagation channel. In particular we show how, for a digit recognition task, the ASR accuracy is directly related to the Early-to-Late Reverberation ratio of the room impulse response (RIR), capturing in a single parameter the reverberation properties of a given channel independently of the setup. Consequently, this measure can be successfully used for acoustic model training, either by selecting the most suitable model for a given spatial configuration or by defining the subset of RIRs to be used for the creation of multi-condition models. Experimental results on simulated data as well as on data generated with real impulse responses support our claims.
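An early-to-late energy ratio of a room impulse response can be computed by splitting the response at a fixed time after the direct path and comparing the energy on each side. The sketch below assumes a 50 ms split and takes the largest absolute sample as the direct path; both choices, and the function name `early_to_late_ratio_db`, are assumptions for illustration, since the abstract does not state the exact definition used in the paper.

```python
import math


def early_to_late_ratio_db(rir, sample_rate, split_ms=50.0):
    """Energy up to split_ms after the direct path over the energy after it, in dB.

    The direct path is taken as the sample with the largest absolute value;
    everything before the split index counts as early energy.
    """
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    split = peak + int(round(split_ms * sample_rate / 1000.0))
    early = sum(v * v for v in rir[:split])
    late = sum(v * v for v in rir[split:])
    if late == 0.0:
        return math.inf  # no late energy at all
    return 10.0 * math.log10(early / late)


# Toy impulse response at 1 kHz: direct path at 0 ms, one late reflection at 80 ms.
toy_rir = [0.0] * 100
toy_rir[0] = 1.0
toy_rir[80] = 0.1
elr = early_to_late_ratio_db(toy_rir, sample_rate=1000)
```

For this toy response the early energy is 1.0 and the late energy is 0.01, so the ratio is about 20 dB; a more reverberant channel would give a lower value.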

Collaboration


Dive into Marco Matassoni's collaborations.

Top Co-Authors

Diego Giuliani

Fondazione Bruno Kessler

Hari Krishna Maganti

Center for Information Technology

Alessio Brutti

Fondazione Bruno Kessler
