Francesco Nesta
Fondazione Bruno Kessler
Publications
Featured research published by Francesco Nesta.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Francesco Nesta; Piergiorgio Svaizer; Maurizio Omologo
This paper proposes a new method of frequency-domain blind source separation (FD-BSS), able to separate acoustic sources in challenging conditions. In frequency-domain BSS, the time-domain signals are transformed into time-frequency series and the separation is generally performed by applying independent component analysis (ICA) at each frequency envelope. When short signals are observed and long demixing filters are required, the number of time observations for each frequency is limited and the variance of the ICA estimator increases due to the intrinsic statistical bias. Furthermore, common methods used to solve the permutation problem fail, especially with sources recorded under highly reverberant conditions. We propose a recursively regularized implementation of the ICA (RR-ICA) that overcomes the mentioned problems by exploiting two types of deterministic knowledge: 1) continuity of the demixing matrix across frequencies; 2) continuity of the time-activity of the sources. The recursive regularization propagates the statistics of the sources across frequencies, reducing the effect of statistical bias and the occurrence of permutations. Experimental results on real data show that the algorithm can successfully perform a fast separation of short signals (e.g., 0.5-1 s), by estimating long demixing filters to deal with highly reverberant environments (e.g., ms).
International Conference on Latent Variable Analysis and Signal Separation | 2012
Shoko Araki; Francesco Nesta; Emmanuel Vincent; Zbyněk Koldovský; Guido Nolte; Andreas Ziehe; Alexis Benichoux
This paper summarizes the audio part of the 2011 community-based Signal Separation Evaluation Campaign (SiSEC2011). Four speech and music datasets were contributed, including datasets recorded in noisy or dynamic environments and a subset of the SiSEC2010 datasets. The participants addressed one or more tasks out of four source separation tasks, and the results for each task were evaluated using different objective performance criteria. We provide an overview of the audio datasets, tasks and criteria. We also report the results achieved with the submitted systems, and discuss organization strategies for future campaigns.
IEEE Transactions on Audio, Speech, and Language Processing | 2012
Francesco Nesta; Maurizio Omologo
According to the physical meaning of the frequency-domain blind source separation (FD-BSS), each mixing matrix estimated by independent component analysis (ICA) contains information on the physical acoustic propagation related to each source and then can be used for localization purposes. In this paper, we analyze the Generalized State Coherence Transform (GSCT) which is a non-linear transform of the space represented by the whole demixing matrices. The transform enables an accurate estimation of the propagation time-delay of multiple sources in multiple dimensions. Furthermore, it is shown that with appropriate nonlinearities and a statistical model for the reverberation, GSCT can be considered an approximated kernel density estimator of the acoustic propagation time-delay. Experimental results confirm the good properties of the transform and its effectiveness in addressing multiple source TDOA detection (e.g., 2-D TDOA estimation of several sources with only three microphones).
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Francesco Nesta; Ted S. Wada; Biing-Hwang Juang
Semi-blind source separation (SBSS) is a special case of the well-known blind source separation (BSS) in which some partial knowledge of the source signals is available to the system. In particular, a batch adaptation in the frequency domain based on independent component analysis (ICA) can be effectively used to jointly perform source separation and multichannel acoustic echo cancellation (MCAEC) through SBSS without double-talk detection. Many issues related to the implementation of an SBSS system are discussed in this paper. After a deep analysis of the structure of the SBSS adaptation, we propose a constrained batch-online implementation that stabilizes the convergence behavior even in the worst-case scenario of a single far-end talker along with the non-uniqueness condition on the far-end mixing system. Specifically, a matrix constraint is proposed to reduce the effect of the non-uniqueness problem caused by highly correlated far-end reference signals during MCAEC. Experimental results show that high echo cancellation can be achieved while the misalignment remains relatively low, without any preprocessing procedure to decorrelate the far-end signals, even for the single far-end talker case.
International Conference on Latent Variable Analysis and Signal Separation | 2012
Francesco Nesta; Maurizio Omologo
This paper presents a novel method for underdetermined acoustic source separation of convolutive mixtures. Multiple complex-valued Independent Component Analysis adaptations jointly estimate the mixing matrix and the temporal activities of multiple sources in each frequency. A structure based on a recursive temporal weighting of the gradient forces each ICA adaptation to estimate mixing parameters related to sources having a disjoint temporal activity. The permutation problem is reduced by imposing a multiresolution spatio-temporal correlation of the narrow-band components. Finally, aligned mixing parameters are used to recover the sources through L0-norm minimization and a post-processing step based on single-channel Wiener filtering. Promising results obtained over a public dataset show that the proposed method is an effective solution to the underdetermined source separation problem.
Computer Speech & Language | 2013
Francesco Nesta; Marco Matassoni
This paper proposes and describes a complete system for Blind Source Extraction (BSE). The goal is to extract a target signal source in order to recognize spoken commands uttered in reverberant and noisy environments, and acquired by a microphone array. The architecture of the BSE system is based on multiple stages: (a) TDOA estimation, (b) mixing system identification for the target source, (c) on-line semi-blind source separation and (d) source extraction. All the stages are effectively combined, allowing the estimation of the target signal with limited distortion. While a generalization of the BSE framework is described, here the proposed system is evaluated on the data provided for the CHiME Pascal 2011 competition, i.e. binaural recordings made in a real-world domestic environment. The CHiME mixtures are processed with the BSE and the recovered target signal is fed to a recognizer, which uses noise robust features based on Gammatone Frequency Cepstral Coefficients. Moreover, acoustic model adaptation is applied to further reduce the mismatch between training and testing data and improve the overall performance. A detailed comparison between different models and algorithmic settings is reported, showing that the approach is promising and the resulting system gives a significant reduction of the error rate.
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Zbynek Koldovsky; Jiri Malek; Petr Tichavsky; Francesco Nesta
An extracted noise signal provides important information for subsequent enhancement of a target signal. When the target's position is fixed, the noise extractor could be a target-cancellation filter derived in a noise-free situation. In this paper we consider a situation in which such cancellation filters are prepared in advance for a set of several possible positions of the target. The set of filters is interpreted as prior information available for the noise extraction when the target's exact position is unknown. Our novel method looks for a linear combination of the prepared filters via Independent Component Analysis. The method yields a filter that has better cancellation performance than the individual filters or filters based on a minimum variance principle. The method is tested in a highly noisy and reverberant real-world environment with a moving target source and interferers. Post-processing with a Wiener filter, using the noise signal extracted by the method, is able to improve the signal-to-noise ratio of the target by up to 8 dB.
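The Wiener post-processing mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that power spectra of the mixture and of the extracted noise are already available per time-frequency bin; the function name `wiener_gain`, the `floor` parameter, and the power-subtraction estimate of the target are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def wiener_gain(mix_psd, noise_psd, floor=0.1):
    """Per-bin Wiener gain from mixture and noise power spectra.

    Hypothetical helper: the target power is approximated by power
    subtraction, and `floor` limits the maximum attenuation.
    """
    mix_psd = np.asarray(mix_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    # Target power estimate: mixture power minus noise power, clipped at 0
    target_psd = np.maximum(mix_psd - noise_psd, 0.0)
    # Wiener gain S / (S + N), with the mixture power standing in for S + N
    gain = target_psd / np.maximum(mix_psd, 1e-12)
    # A gain floor reduces musical-noise artifacts from hard zeroing
    return np.maximum(gain, floor)
```

The gain would be multiplied with the complex mixture spectrum before the inverse transform. The quality of the result depends mostly on how accurate the noise estimate is, which is the part the paper addresses.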
International Conference on Acoustics, Speech, and Signal Processing | 2010
Francesco Nesta; Maurizio Omologo
During the last decade, distributed microphone arrays have been proposed in order to increase accuracy and spatial coverage of speaker localization systems operating in large and reverberant rooms. In principle, the framework provided by a distributed microphone network can also be applied effectively when using Blind Source Separation (BSS). Separation is commonly performed by processing the signals sampled at closely spaced microphones in a single adaptation step, for example by means of Independent Component Analysis (ICA). When the microphone spacing or the distance between source and microphones increases, the separation performance degrades due to spatial aliasing effects and to reduced spatial coherence at the microphones. In this paper we propose a new method, here referred to as Cooperative Wiener ICA (CW-ICA), which is able to apply BSS to signals acquired by a network of distributed microphone arrays. Different ICA adaptations are applied to the signals recorded by each array and are interconnected in order to constrain each adaptation to converge to a solution related to the same physical interpretation. A preliminary analysis on a network of two arrays shows that the proposed method can be applied successfully to source separation and localization tasks.
International Conference on Acoustics, Speech, and Signal Processing | 2012
Francesco Nesta; Maurizio Omologo
The Steered Response Power with PHAT transform (SRP-PHAT), or Global Coherence Field (GCF), has become a standard method for acoustic source localization, thanks to its simplicity, low computational cost, and robustness against mid-to-high reverberation. However, since it was originally formulated for the single-source localization case, it does not apply satisfactorily to the multiple-source case. In this paper, we analyze the structure of the spatial function and reshape it according to a generic multidimensional metric. We show that traditional functions are based on the L1 norm, which is prone to generate ambiguous locations with high likelihood (i.e. ghosts). A more generic multidimensional kernel based on higher norms and on a partitioned representation of the cross-power spectrum is introduced, which better exploits the source sparseness in the discrete time-frequency domain. Evaluation results over simulated data show that the new spatial functions considerably improve the detection of multiple competing sources in both spatial and multidimensional TDOA domains.
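As background for the abstract above, the pairwise building block of SRP-PHAT is the PHAT-weighted generalized cross-correlation (GCC-PHAT). The following is a minimal sketch of that standard single-source estimator, not of the multidimensional kernel proposed in the paper; the function name `gcc_phat` and its parameters are illustrative assumptions.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 (in seconds) with GCC-PHAT.

    A positive result means that x2 lags x1.
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    # PHAT weighting: discard the magnitude and keep only the phase
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

An SRP-PHAT map would sum such PHAT-weighted correlations over all microphone pairs, evaluated at the delays implied by each candidate source position.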
International Workshop on Machine Learning for Signal Processing | 2008
Francesco Nesta; Maurizio Omologo; Piergiorgio Svaizer
A novel method to solve the permutation problem for Blind Source Separation (BSS) is presented. According to the acoustic propagation model, in the frequency domain each separation matrix can be represented with a set of states associated with each source. We formulate a novel transform of the states which is independent of the aliasing and of the permutations, since states belonging to all the sources are exploited at the same time. The estimated TDOAs are used to model the propagation of the acoustic wave and to cluster all the frequency components associated with the same source. Experimental results show that the novel approach can be applied to localize and separate sources in challenging situations: two sources have been separated by estimating long demixing filters (0.25-0.5 s) using widely spaced microphones (0.25 m) in a reverberant environment (T60 = 700 ms) and using very short signals (0.5-1 s).
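The idea in the abstract above, scoring candidate delays against the states of all sources at once so that the result does not depend on their ordering, can be illustrated with a simplified delay scan. This is a sketch under a free-field, two-microphone assumption and is not the transform defined in the paper; the function name `delay_scan`, the linear distance-to-score mapping, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def delay_scan(ratios, freqs, taus):
    """Score candidate delays against per-frequency phase ratios.

    ratios: (n_freqs, n_sources) complex ratios h2/h1 of the mixing
            coefficients, in arbitrary source order at each frequency
    freqs:  (n_freqs,) frequencies in Hz
    taus:   (n_taus,) candidate delays in seconds
    """
    # Keep only the phase of each ratio
    states = ratios / np.maximum(np.abs(ratios), 1e-12)
    # Ideal free-field phase ratio for every frequency and candidate delay
    model = np.exp(-2j * np.pi * np.outer(freqs, taus))
    # Distance of every state from the model; values lie in [0, 2]
    dist = np.abs(states[:, :, None] - model[:, None, :])
    # Summing over sources makes the score independent of their order,
    # and summing over frequencies resolves phase-wrapping ambiguities
    return (1.0 - dist / 2.0).sum(axis=(0, 1))

# Synthetic example: two sources whose order is shuffled at each frequency
rng = np.random.default_rng(0)
freqs = np.linspace(100.0, 8000.0, 256)
true_taus = np.array([-3e-4, 4e-4])
ratios = np.exp(-2j * np.pi * np.outer(freqs, true_taus))
swap = rng.random(len(freqs)) < 0.5
ratios[swap] = ratios[swap][:, ::-1]  # simulate the permutation problem
taus = np.linspace(-1e-3, 1e-3, 201)
score = delay_scan(ratios, freqs, taus)
```

In this synthetic case the score peaks near both true delays even though the columns are permuted at random. Once the delays are known, each frequency's components could be assigned to the closest delay model, which is how a TDOA estimate helps to resolve permutations.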