Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Simone Cifani is active.

Publication


Featured research published by Simone Cifani.


Journal of Electrical and Computer Engineering | 2010

Comparative evaluation of single-channel MMSE-based noise reduction schemes for speech recognition

Simone Cifani; Rudy Rotili; Stefano Squartini; Francesco Piazza

One of the major challenges in Automatic Speech Recognition (ASR) is developing solutions that work reliably in adverse acoustic conditions, such as in the presence of additive noise and/or in reverberant rooms. Recently, some attention has been paid to integrating the noise suppressor deeply into the feature extraction pipeline. In this paper, different single-channel MMSE-based noise reduction schemes are implemented in both the frequency and cepstral domains, and the corresponding recognition performance is evaluated on the AURORA2 and AURORA4 databases, providing a useful reference for the scientific community.
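As a rough illustration of the kind of estimator being compared, the sketch below implements a generic Wiener-type spectral gain with decision-directed a priori SNR tracking. It is not any of the paper's specific MMSE estimators; all names and parameters are illustrative.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, prev_clean_psd, alpha=0.98, xi_min=1e-3):
    """Wiener-type spectral gain with decision-directed a priori SNR tracking.

    All inputs are per-bin power spectra of one frame. Illustrative only:
    a generic MMSE-style gain rule, not the paper's estimators.
    """
    noise_psd = np.maximum(noise_psd, 1e-12)
    gamma = noisy_psd / noise_psd                               # a posteriori SNR
    xi = alpha * prev_clean_psd / noise_psd \
         + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)         # a priori SNR
    xi = np.maximum(xi, xi_min)
    return xi / (1.0 + xi)                                      # Wiener gain
```

The gain is applied per bin to the noisy spectrum; the enhanced frame (or its cepstral transform) would then be fed to the ASR front-end.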


Conference on Human System Interactions | 2009

Keyword spotting based system for conversation fostering in tabletop scenarios: Preliminary evaluation

Simone Cifani; Cesare Rocchi; Stefano Squartini; Francesco Piazza

In this paper, a speech-interfaced system for fostering group conversations is proposed. The system captures conversation keywords and shows visual stimuli on a tabletop display. A stimulus can be feedback on the current conversation or a cue to discuss new topics. This work briefly describes the overall system architecture and then details the design choices and implementation of the Speech Processing Unit (SPU). The performance of the keyword spotter, part of the SPU, is measured in experiments carried out in a realistic scenario. Results demonstrate that the proposed SPU is suitable for tracking conversation topics and thereby fostering group conversations by visualizing appropriate stimuli.
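A minimal sketch of the keyword-to-stimulus step described above, with an entirely hypothetical lexicon and confidence threshold; the actual SPU design and vocabulary are not reproduced here.

```python
# Hypothetical keyword-to-topic lexicon; the real SPU uses its own vocabulary.
KEYWORD_TOPICS = {"holiday": "travel", "flight": "travel", "match": "sports"}

def spot_keywords(recognized_words, min_confidence=0.6):
    """Return (keyword, topic) pairs for confident hits in the ASR output.

    recognized_words: iterable of (word, confidence) pairs.
    """
    hits = []
    for word, confidence in recognized_words:
        topic = KEYWORD_TOPICS.get(word.lower())
        if topic is not None and confidence >= min_confidence:
            hits.append((word, topic))   # a hit triggers a stimulus on the display
    return hits
```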


2008 Hands-Free Speech Communication and Microphone Arrays | 2008

A Multichannel Noise Reduction Front-End Based on Psychoacoustics for Robust Speech Recognition in Highly Noisy Environments

Simone Cifani; Cesare Rocchi; Stefano Squartini; Francesco Piazza

Microphone array systems, thanks to their spatial filtering capability, usually outperform traditional single-channel approaches to noise reduction. Moreover, the use of psychoacoustically motivated speech enhancement schemes typically yields a good balance between noise reduction and speech distortion. This motivated some of the authors to merge the two advantages into a single solution, achieving high enhanced-speech quality over a wide range of operating conditions. In this paper, the objective is to assess the effectiveness of that approach when applied as a noise reduction front-end to an automatic speech recognition system operating in adverse acoustic environments. Computer simulations show that a significant improvement in recognition rate is obtained when such a front-end is used, also with respect to the performance achievable with another multichannel noise reduction architecture not based on psychoacoustic concepts.
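As a generic illustration of the spatial-filtering component mentioned above, the sketch below shows a plain delay-and-sum beamformer in the STFT domain. The paper's front-end additionally applies a psychoacoustically motivated post-filter, which is not reproduced here, and all parameter names are illustrative.

```python
import numpy as np

def delay_and_sum(frame_stft, mic_delays, freqs):
    """Delay-and-sum beamformer for one STFT frame.

    frame_stft: (num_mics, num_bins) complex spectra
    mic_delays: per-microphone propagation delays (seconds) toward the speaker
    freqs:      centre frequency of each bin (Hz)
    """
    steering = np.exp(2j * np.pi * np.outer(mic_delays, freqs))  # delay compensation
    return np.mean(steering * frame_stft, axis=0)                 # aligned average
```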


International Conference on Intelligent Computing | 2010

Joint Multichannel Blind Speech Separation and Dereverberation: A Real-Time Algorithmic Implementation

Rudy Rotili; Claudio De Simone; Alessandro Perelli; Simone Cifani; Stefano Squartini

Blind source separation (BSS) and dereverberation have been deeply investigated due to their importance in many applications, such as image and audio processing. A two-stage approach leading to a sequential source separation and speech dereverberation algorithm based on blind channel identification (BCI) has recently appeared in the literature and is taken here as the reference. In this contribution, a real-time implementation of that approach is presented. The optimum inverse filtering algorithm based on Bezout's theorem and used in the dereverberation stage has been replaced with an iterative technique, which is computationally more efficient and allows the inversion of long impulse responses in real-time applications. The entire framework operates in the frequency domain, and the NU-Tech software platform has been used for the real-time simulations.
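For context, the multichannel inverse-filtering (MINT-type) condition that the Bezout-based stage solves can be written as follows; this is the standard formulation, stated here only to clarify what the iterative technique approximates:

$$\sum_{m=1}^{M} H_m(z)\,G_m(z) = z^{-d},$$

where $H_m(z)$ are the identified room transfer functions, $G_m(z)$ the inverse filters, and $z^{-d}$ a pure delay; an exact solution exists when the $H_m(z)$ share no common zeros.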


Asia Pacific Conference on Circuits and Systems | 2008

A robust iterative inverse filtering approach for speech dereverberation in presence of disturbances

Rudy Rotili; Simone Cifani; Stefano Squartini; Francesco Piazza

In the present work, the inverse filtering problem for speech dereverberation in stationary conditions is addressed. In particular, we consider the presence of multiple observables, which has a beneficial impact on the invertibility of the room transfer functions (RTFs). In real acoustic environments, the assumed knowledge of the RTFs is usually corrupted by disturbances in the form of additive noise or RTF fluctuations, inevitably reducing inverse filtering performance. Several approaches, mainly based on regularization theory, have appeared in the literature to address this problem. Among them, a recent study has shown how the dereverberation capability depends on certain design parameters, notably related to the filter energy. In this paper, that work is taken as the reference and its optimum inverse filtering approach is replaced with an iterative technique, which is typically much more computationally efficient. Results obtained through several computer simulations show that the algorithm is more robust than the reference counterpart with respect to variations of the regularization parameter.
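One common way to write the regularized inverse-filtering problem alluded to above (a generic formulation, not necessarily the exact one of the reference work) is

$$\mathbf{g}_\delta = \arg\min_{\mathbf{g}} \,\|\mathbf{H}\mathbf{g} - \mathbf{d}\|^2 + \delta\,\|\mathbf{g}\|^2 \;\Longrightarrow\; \mathbf{g}_\delta = \left(\mathbf{H}^{\mathsf T}\mathbf{H} + \delta\mathbf{I}\right)^{-1}\mathbf{H}^{\mathsf T}\mathbf{d},$$

where $\mathbf{H}$ stacks the RTF convolution matrices, $\mathbf{d}$ is the target (delayed) impulse, and $\delta$ is the regularization parameter controlling the filter energy. The iterative technique approximates this solution at lower computational cost, and the robustness claim concerns its sensitivity to $\delta$.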


International Symposium on Circuits and Systems | 2010

Robust speech recognition using feature-domain multi-channel Bayesian estimators

Rudy Rotili; Simone Cifani; Lorenzo Marinelli; Stefano Squartini; Francesco Piazza

This paper proposes innovative multi-channel Bayesian estimators in the feature domain for robust speech recognition. Both the minimum mean-squared error (MMSE) and maximum a posteriori (MAP) criteria are explored: the related algorithms extend the multi-channel frequency-domain counterparts and generalize the single-channel feature-domain MMSE solution that recently appeared in the literature. Computer simulations conducted on a modified AURORA2 database show the efficacy of the frequency-domain multi-channel estimators when used as a pre-processing stage of a speech recognition engine, and that the proposed multi-channel MAP approach outperforms single-channel estimators by at least 3% on average.
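The two estimation criteria compared in the paper are, in their generic form (notation mine, not the papers' derivations):

$$\hat{\mathbf{x}}_{\mathrm{MMSE}} = E\!\left[\mathbf{x} \mid \mathbf{y}_1,\dots,\mathbf{y}_M\right], \qquad \hat{\mathbf{x}}_{\mathrm{MAP}} = \arg\max_{\mathbf{x}} \, p\!\left(\mathbf{x} \mid \mathbf{y}_1,\dots,\mathbf{y}_M\right),$$

where $\mathbf{x}$ is the clean feature vector and $\mathbf{y}_1,\dots,\mathbf{y}_M$ are the noisy observations from the $M$ channels.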


Cross-Modal Analysis of Speech, Gestures, Gaze and Facial Expressions | 2009

An Investigation into Audiovisual Speech Correlation in Reverberant Noisy Environments

Simone Cifani; Andrew Abel; Amir Hussain; Stefano Squartini; Francesco Piazza

As evidence of a link between the various human communication production domains has become more prominent in the last decade, the field of multimodal speech processing has undergone significant expansion. Many specialised processing methods have been developed to analyze and exploit the complex relationship between multimodal data streams. This work uses information extracted from an audiovisual corpus to investigate and assess the correlation between audio and visual features in speech. A number of different feature extraction techniques are assessed, with the intention of identifying the visual technique that maximizes the audiovisual correlation. Additionally, this paper aims to demonstrate that a noisy and reverberant audio environment reduces the degree of audiovisual correlation, and that the application of a beamformer remedies this. Experiments carried out in a synthetic scenario confirm the positive impact of beamforming, not only for improving the audio-visual correlation but also within a complete audio-visual speech enhancement scheme. This work thus highlights an important aspect for the development of future bimodal speech enhancement systems.
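A minimal sketch of the kind of correlation measurement described above, assuming time-aligned feature matrices and standard Pearson correlation; the paper's specific visual feature extractors and beamforming front-end are not reproduced here.

```python
import numpy as np

def audiovisual_correlation(audio_feats, visual_feats):
    """Cross-correlation matrix between time-aligned feature streams.

    audio_feats:  (frames, audio_dims) array
    visual_feats: (frames, visual_dims) array
    Returns an (audio_dims, visual_dims) matrix of Pearson correlations.
    """
    a = (audio_feats - audio_feats.mean(0)) / (audio_feats.std(0) + 1e-12)
    v = (visual_feats - visual_feats.mean(0)) / (visual_feats.std(0) + 1e-12)
    return a.T @ v / audio_feats.shape[0]
```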


Archive | 2011

Multi-channel Feature Enhancement for Robust Speech Recognition

Rudy Rotili; Simone Cifani; Francesco Piazza; Stefano Squartini

In the last decades, a great deal of research has been devoted to extending our capacity for verbal communication with computers through automatic speech recognition (ASR). Although optimum performance can be reached when the speech signal is captured close to the speaker’s mouth, there are still obstacles to overcome in building reliable distant speech recognition (DSR) systems. The two major sources of degradation in DSR are additive noise and reverberation. This implies that speech enhancement techniques are typically required to achieve the best possible signal quality. Different methodologies for environmental robustness in speech recognition have been proposed in the literature over the past two decades (Gong (1995); Hussain, Chetouani, Squartini, Bastari & Piazza (2007)). Two main classes can be identified (Li et al. (2009)). The first class encompasses the so-called model-based techniques, which operate on the acoustic model, adapting or adjusting its parameters so that the system better fits the distorted environment. The most popular such techniques are multi-style training (Lippmann et al. (2003)), parallel model combination (PMC) (Gales & Young (2002)) and vector Taylor series (VTS) model adaptation (Moreno (1996)). Although model-based techniques obtain excellent results, they require heavy modifications to the decoding stage and, in most cases, a greater computational burden. Conversely, the second class directly enhances the speech signal before it is presented to the recognizer, and shows some significant advantages with respect to the first class:


COST 2102'07 Proceedings of the 2007 COST action 2102 international conference on Verbal and nonverbal communication behaviours | 2007

A novel psychoacoustically motivated multichannel speech enhancement system

Amir Hussain; Simone Cifani; Stefano Squartini; Francesco Piazza; Tariq S. Durrani

The ubiquitous noise reduction / speech enhancement problem has gained increasing interest in recent years, due both to progress in microphone-array systems and to the successful introduction of perceptual models. In the last decade, several methods incorporating psychoacoustic criteria in single-channel speech enhancement systems have been proposed; however, very few works exploit these features in the multichannel case. In this paper we present a novel psychoacoustically motivated multichannel speech enhancement system that exploits spatial information and psychoacoustic concepts. The proposed framework offers enhanced flexibility, allowing for a multitude of perceptually based post-filtering solutions. Moreover, the system has been devised on a frame-by-frame basis to facilitate real-time implementation. Objective performance measures and informal subjective listening tests on speech signals corrupted with real car and F-16 cockpit noise demonstrate enhanced performance of the proposed system in terms of musical residual noise reduction compared to conventional multichannel techniques.


2009 Proceedings of 6th International Symposium on Image and Signal Processing and Analysis | 2009

A PEM-AFROW based algorithm for acoustic feedback control in automotive speech reinforcement systems

Simone Cifani; L. C. Montesi; Rudy Rotili; Stefano Squartini; Francesco Piazza

Developing effective speech reinforcement systems to improve intra-cabin communication among car passengers seated in different rows, which is typically degraded by the distance between speakers (for instance in SUVs and minivans) and by noise within the cabin, has long been a challenging issue for the scientific community. One of the main problems to solve in this scenario is reducing the electroacoustic coupling between loudspeakers and microphones so that the system does not become unstable (howling), i.e., acoustic feedback cancellation (AFC). One of the best-performing techniques for AFC is the PEM-AFROW approach, which recently appeared in the literature. In this work, we propose an innovative feedback suppression scheme based on the PEM-AFROW concept, which achieves a valuable balance between feedback reduction, maximum stable gain and overall sound quality. Results obtained through computer simulations in a dual-channel communication scenario confirm the effectiveness of the idea.
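To make the feedback-cancellation idea concrete, the sketch below shows a plain NLMS adaptive filter estimating the loudspeaker-to-microphone feedback path. PEM-AFROW additionally prewhitens both signals with a re-estimated AR model of the near-end source before adaptation, a step omitted here; all parameters are illustrative.

```python
import numpy as np

def nlms_feedback_canceller(mic, loud, filt_len=256, mu=0.2, eps=1e-6):
    """Estimate the feedback path from loudspeaker to microphone with NLMS.

    mic, loud: microphone and loudspeaker sample sequences (1-D arrays).
    Returns the feedback-compensated signal and the final filter estimate.
    """
    mic = np.asarray(mic, dtype=float)
    loud = np.asarray(loud, dtype=float)
    w = np.zeros(filt_len)
    err = mic.copy()
    for n in range(filt_len, len(mic)):
        x = loud[n - filt_len:n][::-1]           # most recent loudspeaker samples
        y_hat = w @ x                            # predicted feedback component
        err[n] = mic[n] - y_hat                  # compensated microphone sample
        w += mu * err[n] * x / (x @ x + eps)     # NLMS update
    return err, w
```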

Collaboration


Dive into Simone Cifani's collaborations.

Top Co-Authors

Stefano Squartini (Marche Polytechnic University)
Francesco Piazza (Marche Polytechnic University)
Rudy Rotili (Marche Polytechnic University)
Cesare Rocchi (Marche Polytechnic University)
Alessio Pignotti (Marche Polytechnic University)
Daniele Marcozzi (Marche Polytechnic University)
Andrew Abel (University of Stirling)
L. C. Montesi (Marche Polytechnic University)