
Publication


Featured research published by Maurizio Omologo.


IEEE Transactions on Speech and Audio Processing | 1997

Use of the crosspower-spectrum phase in acoustic event location

Maurizio Omologo; Piergiorgio Svaizer

The article reports on the use of crosspower-spectrum phase (CSP) analysis as an accurate time delay estimation (TDE) technique. It is used in a microphone array system for the location of acoustic events in noisy and reverberant environments. A corresponding coherence measure (CM) and its graphical representation are introduced to show the TDE accuracy. Experiments with a real two-microphone-pair array show an average location error of less than 10 cm in a 6 m × 6 m area.
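CSP analysis is essentially what is now commonly called GCC-PHAT: the crosspower spectrum is normalized to unit magnitude so that only phase information, which encodes the inter-microphone delay, survives. A minimal sketch of the idea, using synthetic signals and an integer-sample delay (the signal names and the 5-sample delay are illustrative, not from the paper):

```python
import numpy as np

def csp_delay(x1, x2):
    """Estimate the delay (in samples) of x2 relative to x1 using
    crosspower-spectrum phase (CSP / GCC-PHAT) analysis."""
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    # Phase transform: divide out the magnitude so only phase remains,
    # which sharpens the correlation peak under noise and reverberation.
    csp = cross / (np.abs(cross) + 1e-12)
    cc = np.fft.irfft(csp, n=n)
    # Rearrange so that lag 0 sits in the middle of the array.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(cc)) - max_shift

# Synthetic check: x2 is x1 delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
x1 = s
x2 = np.concatenate((np.zeros(5), s[:-5]))
print(csp_delay(x1, x2))  # -> 5
```

With the delay known for each microphone pair, the source position follows by intersecting the corresponding wavefront direction estimates, which is what the paper's array system does.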


International Conference on Acoustics, Speech, and Signal Processing | 1994

Acoustic event localization using a crosspower-spectrum phase based technique

Maurizio Omologo; Piergiorgio Svaizer

Linear microphone arrays can be employed for acoustic event localization in a noisy environment using time delay estimation. Three techniques that allow delay estimation are investigated, namely normalized cross-correlation, LMS adaptive filtering, and crosspower-spectrum phase. They are combined with a two-dimensional representation, the coherence measure, in order to emphasize information that can be exploited for estimating the position of both stationary and moving acoustic sources. To compare the techniques, different acoustic sources were considered, generating events at different positions in space. Expressing performance in terms of the accuracy of the wavefront direction angle, experiments showed that the crosspower-spectrum-phase-based technique outperforms the other two. This technique also provided very promising preliminary results in terms of source position estimation.


Speech Communication | 1993

Automatic segmentation and labeling of speech based on Hidden Markov Models

Fabio Brugnara; Daniele Falavigna; Maurizio Omologo

An accurate database documentation at the phonetic level is very important for speech research; however, manual segmentation and labeling is a time-consuming and error-prone task. This article describes an automatic procedure for the segmentation of speech: given either the linguistic or the phonetic content of a speech utterance, the system provides phone boundaries. The technique is based on the use of an acoustic-phonetic unit Hidden Markov Model (HMM) recognizer: both the recognizer and the segmentation system have been designed exploiting the DARPA-TIMIT acoustic-phonetic continuous speech database of American English. Segmentation and labeling experiments have been conducted in different conditions to check the reliability of the resulting system. Satisfactory results have been obtained, especially when the system is trained with some manually presegmented material. The size of this material is a crucial factor; system performance has been evaluated with respect to this parameter. It turns out that the system provides 88.3% correct boundary location, given a tolerance of 20 ms, when only 256 phonetically balanced sentences are used for its training.
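The 88.3% figure is a boundary-tolerance score: an automatically placed boundary counts as correct if it falls within 20 ms of a manually placed one. A minimal sketch of that metric (the boundary times below are made up for illustration):

```python
def boundary_accuracy(ref, hyp, tol=0.020):
    """Fraction of reference boundaries (in seconds) for which some
    hypothesized boundary lies within +/- tol seconds."""
    hits = sum(any(abs(r - h) <= tol for h in hyp) for r in ref)
    return hits / len(ref)

ref = [0.10, 0.25, 0.48, 0.90]   # manually placed phone boundaries
hyp = [0.11, 0.26, 0.55, 0.89]   # automatically placed boundaries
print(boundary_accuracy(ref, hyp))  # -> 0.75 (3 of 4 within 20 ms)
```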


International Conference on Acoustics, Speech, and Signal Processing | 1996

Acoustic source location in noisy and reverberant environment using CSP analysis

Maurizio Omologo; Piergiorgio Svaizer

A linear four-microphone array can be employed for acoustic event location in a real environment using accurate time delay estimation. This paper refers to the use of a specific technique, based on crosspower-spectrum phase (CSP) analysis, that yielded accurate location performance. The behavior of this technique is investigated under different noise and reverberation conditions. Real experiments as well as simulations were conducted to analyze a wide variety of situations. Results show system robustness under quite critical environmental conditions.


International Conference on Acoustics, Speech, and Signal Processing | 1997

Microphone array based speech recognition with different talker-array positions

Maurizio Omologo; Marco Matassoni; Piergiorgio Svaizer; Diego Giuliani

The use of a microphone array for hands-free continuous speech recognition in noisy and reverberant environments is investigated. An array of eight omnidirectional microphones was placed at different angles and distances from the talker. A time delay compensation module was used to provide a beamformed signal as input to a hidden Markov model (HMM) based recognizer. A phone HMM adaptation, based on a small set of phonetically rich sentences, further improved the recognition rate obtained by applying beamforming alone. These results were confirmed both by experiments conducted in a noisy and reverberant environment and by simulations. In the latter case, different conditions were recreated by using the image method to produce synthetic versions of the array microphone signals.
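The time delay compensation step amounts to delay-and-sum beamforming: each microphone signal is advanced by its estimated lag relative to a reference and the aligned channels are averaged, so the talker's signal adds coherently while uncorrelated noise averages down. A rough sketch with integer-sample delays (the eight-channel count matches the paper, but the tone, delays, and noise level are synthetic stand-ins):

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Advance each channel by its lag (in samples) relative to the
    reference, then average the aligned channels."""
    out = np.zeros(len(channels[0]))
    for x, d in zip(channels, delays):
        aligned = np.roll(x, -d)
        if d > 0:
            aligned[-d:] = 0.0  # discard samples that wrapped around
        out += aligned
    return out / len(channels)

# Eight noisy copies of a tone, each lagging by a different delay.
rng = np.random.default_rng(1)
s = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000)
delays = [0, 2, 4, 6, 8, 10, 12, 14]
chans = [np.roll(s, d) + 0.5 * rng.standard_normal(s.size) for d in delays]
y = delay_and_sum(chans, delays)
# The beamformed output is much closer to the clean tone than any
# single noisy channel is.
```

In the paper the delays themselves come from CSP-based time delay estimation rather than being known in advance.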


CLEaR | 2006

CLEAR evaluation of acoustic event detection and classification systems

Andrey Temko; Robert G. Malkin; Christian Zieger; Dusan Macho; Climent Nadeu; Maurizio Omologo

In this paper, we present the results of the Acoustic Event Detection (AED) and Classification (AEC) evaluations carried out in February 2006 by the three participating partners of the CHIL project. The primary evaluation task was AED on the testing portions of the isolated-sound databases and seminar recordings produced in CHIL. Additionally, a secondary AEC evaluation task was designed using only the isolated-sound databases. The set of meeting-room acoustic event classes and the metrics were agreed upon by the three partners, and ELDA was in charge of the scoring task. The various systems for the AED and AEC tasks and their results are presented.


Speech Communication | 1998

Environmental conditions and acoustic transduction in hands-free speech recognition

Maurizio Omologo; Piergiorgio Svaizer; Marco Matassoni

Hands-free interaction represents a key point for increasing the flexibility of present applications and for the development of new speech recognition applications, where the user cannot be encumbered by either hand-held or head-mounted microphones. When the microphone is far from the speaker, the transduced signal is affected by degradation of a different nature that is often unpredictable. Special microphones and multi-microphone acquisition systems represent a way of reducing some environmental noise effects. Robust processing and adaptation techniques can further be used in order to compensate for different kinds of variability that may be present in the recognizer input. The purpose of this paper is to revisit some of the assumptions about the different sources of this variability and to discuss both special transducer systems and compensation/adaptation techniques that can be adopted. In particular, the paper refers to the use of multi-microphone systems to overcome some undesired effects caused by room acoustics (e.g., reverberation) and by coherent/incoherent noise (e.g., competing talkers, computer fans). The paper concludes with the description of some experiments that were conducted on both real and simulated speech data.


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Convolutive BSS of Short Mixtures by ICA Recursively Regularized Across Frequencies

Francesco Nesta; Piergiorgio Svaizer; Maurizio Omologo

This paper proposes a new method of frequency-domain blind source separation (FD-BSS), able to separate acoustic sources in challenging conditions. In frequency-domain BSS, the time-domain signals are transformed into time-frequency series and the separation is generally performed by applying independent component analysis (ICA) at each frequency envelope. When short signals are observed and long demixing filters are required, the number of time observations for each frequency is limited and the variance of the ICA estimator increases due to the intrinsic statistical bias. Furthermore, common methods used to solve the permutation problem fail, especially with sources recorded under highly reverberant conditions. We propose a recursively regularized implementation of ICA (RR-ICA) that overcomes the mentioned problems by exploiting two types of deterministic knowledge: 1) continuity of the demixing matrix across frequencies; 2) continuity of the time activity of the sources. The recursive regularization propagates the statistics of the sources across frequencies, reducing the effect of statistical bias and the occurrence of permutations. Experimental results on real data show that the algorithm can successfully perform a fast separation of short signals (e.g., 0.5-1 s), estimating long demixing filters to deal with highly reverberant environments.


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Generalized State Coherence Transform for Multidimensional TDOA Estimation of Multiple Sources

Francesco Nesta; Maurizio Omologo

According to the physical meaning of frequency-domain blind source separation (FD-BSS), each mixing matrix estimated by independent component analysis (ICA) contains information on the physical acoustic propagation related to each source, and can therefore be used for localization purposes. In this paper, we analyze the Generalized State Coherence Transform (GSCT), which is a non-linear transform of the space represented by the whole set of demixing matrices. The transform enables an accurate estimation of the propagation time delay of multiple sources in multiple dimensions. Furthermore, it is shown that with appropriate nonlinearities and a statistical model for the reverberation, GSCT can be considered an approximate kernel density estimator of the acoustic propagation time delay. Experimental results confirm the good properties of the transform and its effectiveness in addressing multiple-source TDOA detection (e.g., 2-D TDOA estimation of several sources with only three microphones).


International Conference on Acoustics, Speech, and Signal Processing | 1999

Training of HMM with filtered speech material for hands-free recognition

Diego Giuliani; Marco Matassoni; Maurizio Omologo; Piergiorgio Svaizer

This paper addresses the problem of hands-free speech recognition in a noisy office environment. An array of six omnidirectional microphones and a corresponding time delay compensation module are used to provide a beamformed signal as input to an HMM-based recognizer. Training of HMMs is performed either on a clean speech database or on a filtered version of the same database. Filtering consists of convolution with the acoustic impulse response between the speaker and the microphone, to reproduce the reverberation effect. Background noise is then added to obtain the desired SNR. The paper shows that the new models trained on these data perform better than the baseline ones. Furthermore, the paper investigates maximum likelihood linear regression (MLLR) adaptation of the new models. It is shown that a further performance improvement is obtained, reaching a 98.7% word recognition rate in a connected-digit recognition task when the talker is at a distance of 1.5 m from the array.
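The data-contamination recipe described here, convolving clean speech with a room impulse response and then adding noise scaled to a target SNR, can be sketched as follows. The RIR and signals below are synthetic stand-ins, not the paper's data:

```python
import numpy as np

def contaminate(clean, rir, noise, snr_db):
    """Convolve clean speech with a room impulse response, then add
    noise scaled so the reverberant-signal-to-noise ratio is snr_db."""
    reverberant = np.convolve(clean, rir)[:len(clean)]
    sig_pow = np.mean(reverberant ** 2)
    noise = noise[:len(reverberant)]
    scale = np.sqrt(sig_pow / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return reverberant + scale * noise

rng = np.random.default_rng(2)
clean = rng.standard_normal(8000)   # stand-in for a clean utterance
# Toy exponentially decaying impulse response as a reverberation model.
rir = np.exp(-np.arange(400) / 80.0) * rng.standard_normal(400)
noise = rng.standard_normal(8000)   # stand-in for office background noise
y = contaminate(clean, rir, noise, snr_db=10.0)
```

Training HMMs on such contaminated data matches the models to the acoustic conditions the beamformed test signal will exhibit, which is the paper's key point.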

Collaboration

Top co-authors of Maurizio Omologo:

Alessio Brutti (Fondazione Bruno Kessler)
Marco Matassoni (Center for Information Technology)
Diego Giuliani (Fondazione Bruno Kessler)
Fabio Brugnara (Fondazione Bruno Kessler)