Roland Badeau
Université Paris-Saclay
Publications
Featured research published by Roland Badeau.
International Conference on Acoustics, Speech, and Signal Processing | 2016
Paul Magron; Roland Badeau; Bertrand David
Nonnegative Matrix Factorization (NMF) is a powerful tool for decomposing mixtures of audio signals in the Time-Frequency (TF) domain. In the source separation framework, recovering the phase of each extracted component is necessary for synthesizing time-domain signals. The Complex NMF (CNMF) model aims to jointly estimate the spectrogram and the phase of the sources, but requires constraining the phase in order to produce satisfactory-sounding results. We propose to incorporate phase constraints based on signal models within the CNMF framework: a phase unwrapping constraint that enforces a form of temporal coherence, and a constraint based on the repetition of audio events, which models the phases of the sources within onset frames. We also provide an algorithm for estimating the model parameters. The experimental results highlight the benefit of including such constraints in the CNMF framework for separating overlapping components in complex audio mixtures.
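The phase unwrapping constraint mentioned above can be illustrated numerically. The following is a minimal sketch (our own toy construction, not the paper's implementation): for a stationary sinusoidal partial at normalized frequency ν, the STFT phase in its bin advances by 2πνH between consecutive frames, where H is the hop size, so the phase of every frame can be predicted from the first one.

```python
import numpy as np

# Toy check of the phase-unwrapping relation phi[m] = phi[0] + 2*pi*nu*H*m
# for a bin-aligned complex sinusoid analysed with a rectangular window.
# All parameter choices below are ours.
n_frames, N, H = 20, 64, 12      # frames, FFT size, hop size
nu, phi0 = 4 / N, 0.3            # bin-aligned frequency (bin 4), initial phase
n = np.arange(N)

measured = []
for m in range(n_frames):
    t0 = m * H
    frame = np.exp(1j * (2 * np.pi * nu * (t0 + n) + phi0))
    measured.append(np.angle(np.fft.fft(frame)[4]))
measured = np.array(measured)

# Phase-unwrapping prediction: accumulate 2*pi*nu*H from the first frame
predicted = measured[0] + 2 * np.pi * nu * H * np.arange(n_frames)

# Agreement modulo 2*pi, up to numerical precision
err = np.angle(np.exp(1j * (measured - predicted)))
print(np.max(np.abs(err)))
```

In the CNMF setting this relation is used as a soft constraint rather than an exact identity, since real partials are neither perfectly stationary nor bin-aligned.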
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Derry Fitzgerald; Antoine Liutkus; Roland Badeau
We propose a method to unmix multichannel audio signals into their different constitutive spatial objects. To achieve this, we characterize an audio object through both a spatial and a spectro-temporal modeling. The particularity of the spatial model we pick is that it neither assumes an object has only one underlying source point, nor does it attempt to model the complex room acoustics. Instead, it focuses on a listener perspective, and takes each object as the superposition of many contributions with different incoming directions and interchannel delays. Our spectro-temporal probabilistic model is based on the recently proposed α-harmonisable processes, which are adequate for signals with large dynamics, such as audio. The main originality of this paper is then to provide a new way to estimate and exploit the interchannel dependences of an object for the purpose of demixing. In the Gaussian case α = 2, previous research focused on covariance structures. This approach is no longer valid for α < 2, where covariances are not defined. Instead, we show how simple linear combinations of the mixture channels can be used to learn the model parameters, and the method we propose consists of pooling the estimates based on many projections to correctly account for the original multichannel audio. Intuitively, each such downmix of the mixture provides a new perspective where some objects are canceled or enhanced. Finally, we also explain how to recover the different spatial audio objects when all parameters have been computed. Performance of the method is illustrated on the separation of stereophonic music signals.
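The intuition that a downmix can cancel or enhance objects admits a very small demonstration. The sketch below (our own simplified setup: instantaneously panned point sources, no delays) shows that projecting a stereo mixture onto the direction orthogonal to one source's panning vector removes that source entirely from the resulting single channel.

```python
import numpy as np

# Two sources panned at angles a1 and a2 into a 2-channel mixture.
# Projecting onto the direction orthogonal to [cos(a1), sin(a1)]
# cancels source 1, leaving a scaled copy of source 2.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(1000), rng.standard_normal(1000)
a1, a2 = 0.2, 1.1                       # panning angles (radians), our choice
x = np.stack([np.cos(a1) * s1 + np.cos(a2) * s2,
              np.sin(a1) * s1 + np.sin(a2) * s2])   # 2 x T mixture

# Downmix orthogonal to source 1's panning vector
proj = -np.sin(a1) * x[0] + np.cos(a1) * x[1]

# The projection equals sin(a2 - a1) * s2: source 1 is gone
residual = proj - np.sin(a2 - a1) * s2
print(np.max(np.abs(residual)))
```

The paper's method pools parameter estimates over many such projections instead of relying on a single cancelling direction, since the panning directions are unknown in practice.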
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Simon Leglaive; Roland Badeau; Gaël Richard
Incorporating prior knowledge about the sources and/or the mixture is a way to improve under-determined audio source separation performance. A great number of informed source separation techniques concentrate on taking priors on the sources into account, but fewer works have focused on constraining the mixing model. In this paper, we address the problem of under-determined multichannel audio source separation in reverberant conditions. We target a semi-informed scenario where some room parameters are known. Two probabilistic priors on the frequency response of the mixing filters are proposed. Early reverberation is characterized by an autoregressive model, while, in accordance with results from statistical room acoustics, late reverberation is represented by an autoregressive moving average model. Both reverberation models are defined in the frequency domain. They aim to translate the temporal characteristics of the mixing filters into frequency-domain correlations. Our approach leads to a maximum a posteriori estimation of the mixing filters, which is achieved via the expectation-maximization algorithm. We experimentally show the superiority of this approach over a maximum likelihood estimation of the mixing filters.
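The frequency-domain correlations that these priors exploit can be observed directly. Below is a minimal sketch (our own construction, not the paper's exact model): a room impulse response with an exponentially decaying envelope induces strong correlations between neighbouring frequency bins of its transfer function, which is exactly the kind of structure an autoregressive model across frequency can capture.

```python
import numpy as np

# Synthetic "room" impulse response: white noise with exponential decay.
# Its frequency response then shows strong correlation between adjacent bins.
rng = np.random.default_rng(1)
fs, tau = 8000, 0.05                 # sample rate (Hz), decay constant (s); our choices
t = np.arange(fs) / fs               # 1 second of impulse response
h = rng.standard_normal(fs) * np.exp(-t / tau)

H = np.fft.rfft(h)
# Normalized correlation between adjacent frequency bins
r = np.vdot(H[:-1], H[1:]) / np.sqrt(np.vdot(H[:-1], H[:-1]).real
                                     * np.vdot(H[1:], H[1:]).real)
print(abs(r))   # close to 1: neighbouring bins are strongly correlated
```

Longer decay times (larger tau) concentrate the correlation over wider frequency ranges, which is why knowing room parameters such as the reverberation time helps calibrate the prior.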
International Conference on Acoustics, Speech, and Signal Processing | 2017
Paul Magron; Roland Badeau; Bertrand David
Phase reconstruction of complex components in the time-frequency domain is a challenging but necessary task for audio source separation. While traditional approaches do not exploit phase constraints that originate from signal modeling, some prior information about the phase can be obtained from sinusoidal modeling. In this paper, we introduce a probabilistic mixture model which allows us to incorporate such phase priors within a source separation framework. While the magnitudes are estimated beforehand, the phases are modeled by von Mises random variables whose location parameters are the phase priors. We then approximate this intractable model by an anisotropic Gaussian model, in which the phase dependencies are preserved. This enables us to derive an MMSE estimator of the sources which optimally combines Wiener filtering and prior phase estimates. Experimental results highlight the potential of incorporating phase priors into mixture models for separating overlapping components in complex audio mixtures.
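The role of the von Mises concentration parameter can be illustrated with a short sketch (our own parameter choices): kappa interpolates between a uniform phase (kappa near 0, the implicit assumption behind plain Wiener filtering) and a deterministic phase locked to the prior (large kappa).

```python
import numpy as np

# Sample phases from a von Mises prior centred on a location parameter mu
# (e.g. obtained from sinusoidal modeling). The mean resultant length
# |E[exp(i*phi)]| measures how informative the prior is: 0 for a uniform
# phase, 1 for a deterministic one.
rng = np.random.default_rng(2)
mu = 0.8                                 # hypothetical phase prior
results = {}
for kappa in (0.01, 1.0, 50.0):
    phi = rng.vonmises(mu, kappa, size=10000)
    results[kappa] = np.abs(np.mean(np.exp(1j * phi)))
print(results)   # resultant length grows with kappa
```

In the paper's model this concentration effectively controls how much the MMSE source estimate trusts the sinusoidal phase prior versus the mixture observation.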
International Conference on Acoustics, Speech, and Signal Processing | 2017
Simon Leglaive; Umut Simsekli; Antoine Liutkus; Roland Badeau; Gaël Richard
In this paper, we focus on modeling multichannel audio signals in the short-time Fourier transform domain for the purpose of source separation. We propose a probabilistic model based on a class of heavy-tailed distributions, in which the observed mixtures and the latent sources are jointly modeled by using a certain class of multivariate alpha-stable distributions. As opposed to conventional Gaussian models, where the observations are constrained to lie within just a few standard deviations of the mean, the proposed heavy-tailed model allows us to account for spurious data or significant uncertainties in the model. We develop a Monte Carlo Expectation-Maximization algorithm for inferring the sources from the proposed model. We show that our approach leads to significant performance improvements in audio source separation under corrupted mixtures and in spatial audio object coding.
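The heavy-tail behaviour that distinguishes alpha-stable models from Gaussian ones is easy to observe with a standard sampler. The sketch below (our own illustration, independent of the paper's inference algorithm) uses the Chambers-Mallows-Stuck method for symmetric alpha-stable variables: for alpha = 2 it reduces to a Gaussian (up to scale), while for alpha < 2 extreme values become far more frequent.

```python
import numpy as np

def sample_sas(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable (beta = 0)."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos(u - alpha * u) / w) ** ((1 - alpha) / alpha))

rng = np.random.default_rng(3)
tails = {}
for alpha in (2.0, 1.5, 1.0):
    x = sample_sas(alpha, 100_000, rng)
    tails[alpha] = np.mean(np.abs(x) > 10)   # fraction of extreme samples
print(tails)   # tail mass grows as alpha decreases
```

For alpha = 1 this reduces to the Cauchy distribution, whose tails put several percent of the mass beyond ten scale units; a Gaussian puts essentially none there, which is why Gaussian models treat such samples as outliers rather than as part of the signal.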
International Conference on Acoustics, Speech, and Signal Processing | 2016
Derry Fitzgerald; Antoine Liutkus; Roland Badeau
We propose a projection-based method for the unmixing of multichannel audio signals into their different constituent spatial objects. Here, spatial objects are modelled using a unified framework which handles both point sources and diffuse sources. We then propose a novel methodology to estimate and take advantage of the spatial dependencies of an object. Where previous research has processed the original multichannel mixtures directly and has principally focused on the use of inter-channel covariance structures, here we instead process projections of the multichannel signal on many different spatial directions. These linear combinations consist of observations where some spatial objects are cancelled or enhanced. We then propose an algorithm which takes these projections as the observations, discarding dependencies between them. Since each one contains global information regarding all channels of the original multichannel mixture, this provides an effective means of learning the parameters of the original audio, while avoiding the need for joint processing of all the channels. We further show how to recover the separated spatial objects and demonstrate the use of the technique on stereophonic music signals.
International Conference on Acoustics, Speech, and Signal Processing | 2017
Simon Leglaive; Roland Badeau; Gaël Richard
A great number of methods for multichannel audio source separation are based on probabilistic approaches in which the sources are modeled as latent random variables in a Time-Frequency (TF) domain. For reverberant mixtures, it is common to approximate the time-domain convolutive mixing process as being instantaneous in the short-term Fourier transform domain, under the assumption of short mixing filters. The TF latent sources are then inferred from the TF mixture observations. In this paper we propose to infer the TF latent sources from the time-domain observations. This approach allows us to model the convolutive mixing process exactly. The inference procedure relies on a variational expectation-maximization algorithm. Under strong reverberation conditions, our approach leads to a signal-to-distortion ratio improvement of 5.5 dB compared with the usual TF approximation of the convolutive mixing process.
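The limits of the instantaneous TF approximation can be checked numerically. Below is a minimal sketch (our own toy setup, not the paper's experiment): the narrowband approximation STFT(h * s) ≈ H(f) · STFT(s) holds well only when the mixing filter h is short relative to the analysis window, and degrades for long reverberant filters, which is what motivates exact time-domain modeling of the convolution.

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.standard_normal(16384)
N, hop = 512, 256                      # window and hop sizes, our choices
win = np.hanning(N)

def stft(sig):
    n_frames = (len(sig) - N) // hop + 1
    return np.stack([np.fft.rfft(win * sig[m * hop:m * hop + N])
                     for m in range(n_frames)], axis=1)

def narrowband_error(filter_len):
    # Decaying random filter standing in for a room response
    h = rng.standard_normal(filter_len) * np.exp(-np.arange(filter_len) * 4.0 / filter_len)
    x = np.convolve(s, h)[:len(s)]
    S, X = stft(s), stft(x)
    H = np.fft.rfft(h, N)              # crops h to one window: part of the approximation
    return np.linalg.norm(X - H[:, None] * S) / np.linalg.norm(X)

err_short, err_long = narrowband_error(16), narrowband_error(2048)
print(err_short, err_long)             # the error grows with the filter length
```

A 16-tap filter fits comfortably inside a 512-sample window, so the multiplicative approximation is accurate; a 2048-tap filter cannot even be represented within one window, and the approximation breaks down.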
International Workshop on Acoustic Signal Enhancement | 2016
Arthur Belhomme; Yves Grenier; Roland Badeau; Eric Humbert
Most dereverberation methods aim to reconstruct the anechoic magnitude spectrogram, given a reverberant signal. Regardless of the method, the dereverberated signal is systematically synthesized with the reverberant phase. This corrupted phase reintroduces reverberation and distortion into the signal. This is why we intend to also reconstruct the anechoic phase, given a reverberant signal. As a first step before tackling speech signals, we propose in this paper a method for estimating the anechoic phase of reverberant chirp signals. Our method provides an accurate estimate of the instantaneous phase and improves objective measures of dereverberation.
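For a linear chirp, the anechoic instantaneous phase has a simple parametric form, which is what makes this signal class a natural first step. The sketch below (our own simplified, noise-only setting, with no reverberation) recovers the chirp parameters from the unwrapped phase of a noisy observation; all parameter values are ours.

```python
import numpy as np

# Linear chirp: phi(t) = 2*pi*(f0*t + 0.5*c*t^2), so the unwrapped phase is
# a quadratic in t whose coefficients encode the start frequency f0 and the
# chirp rate c.
rng = np.random.default_rng(5)
fs, f0, c = 8000, 200.0, 500.0          # sample rate (Hz), start freq, chirp rate
t = np.arange(fs) / fs                  # 1 second
phase = 2 * np.pi * (f0 * t + 0.5 * c * t ** 2)
z = np.exp(1j * phase) + 0.01 * (rng.standard_normal(fs)
                                 + 1j * rng.standard_normal(fs))

phi = np.unwrap(np.angle(z))            # noisy instantaneous phase
coef = np.polyfit(t, phi, 2)            # highest power first
c_hat = coef[0] / np.pi                 # quadratic coefficient is pi*c
f0_hat = coef[1] / (2 * np.pi)          # linear coefficient is 2*pi*f0
print(f0_hat, c_hat)
```

In the reverberant case addressed by the paper, the observed phase is corrupted by the room response rather than by additive noise alone, which is what makes the estimation problem substantially harder.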
International Conference on Acoustics, Speech, and Signal Processing | 2016
Umut Simsekli; Roland Badeau; Gaël Richard; Ali Taylan Cemgil
Model selection, a central topic in Bayesian machine learning, requires the estimation of the marginal likelihood of the data under the models to be compared. During the last decade, conventional model selection methods have fallen out of favor due to their high computational requirements. In this study, we propose a computationally efficient model selection method that integrates ideas from the Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) literature and statistical physics. As opposed to conventional methods, the proposed method has very low computational needs and can be implemented almost without modifying existing SG-MCMC code. We provide an upper bound for the bias of the proposed method. Our experiments show that our method is 40 times faster than the baseline method at finding the optimal model order in a matrix factorization problem.
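The SG-MCMC machinery this method builds on can be shown in miniature. Below is a minimal Stochastic Gradient Langevin Dynamics (SGLD) sketch (our own toy model, not the paper's matrix factorization setting): data y_i ~ N(theta, 1) with a N(0, 1) prior, so the posterior is available in closed form for comparison; step size and minibatch size are our choices.

```python
import numpy as np

rng = np.random.default_rng(6)
N, batch = 1000, 50
y = rng.normal(2.0, 1.0, N)

theta, eps = 0.0, 1e-4
samples = []
for it in range(20000):
    idx = rng.integers(0, N, batch)
    # Stochastic gradient of the log posterior: prior term + rescaled
    # minibatch likelihood term
    grad = -theta + (N / batch) * np.sum(y[idx] - theta)
    # SGLD update: half-step along the gradient plus injected Gaussian noise
    theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
    if it > 5000:                      # discard burn-in
        samples.append(theta)

post_mean = np.sum(y) / (N + 1)        # exact posterior mean for this model
print(np.mean(samples), post_mean)     # the two should be close
```

Because each update touches only a minibatch, the cost per iteration is independent of the dataset size; the paper's contribution is to reuse exactly this kind of sampler to estimate marginal likelihoods for model comparison.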
International Conference on Latent Variable Analysis and Signal Separation | 2017
Mathieu Fontaine; Charles Vanwynsberghe; Antoine Liutkus; Roland Badeau
We propose a probabilistic model for acoustic source localization with a known but arbitrary microphone array geometry. The approach has several features. First, it relies on a simple near-field acoustic model for wave propagation. Second, it does not require knowledge of the number of active sources; on the contrary, it produces a heat map representing the energy at a large set of candidate locations, thus imaging the acoustic field. Third, it relies on a heavy-tailed α-stable probabilistic model, whose most important feature is to yield an estimation strategy in which the multichannel signals need to be processed only once, in a simple online procedure called sketching. This sketching produces a fixed-size representation of the data that is then analyzed for localization. The resulting algorithm has a small computational complexity, and in this paper we demonstrate that it compares favorably with the state of the art for localization in realistic simulations of reverberant environments.
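The sketching idea can be demonstrated in one dimension. The following is our own miniature illustration (not the paper's localization pipeline): the empirical characteristic function mean(exp(i·t·x)) is accumulated in a single pass over the data, and for a symmetric α-stable signal it satisfies E[exp(i·t·X)] = exp(-(σ|t|)^α), so two sketch values at different t recover the characteristic exponent α offline.

```python
import numpy as np

rng = np.random.default_rng(7)
# Standard Cauchy samples (alpha = 1), via tan of a uniform angle
x = np.tan(rng.uniform(-np.pi / 2, np.pi / 2, 200_000))

# One-pass sketch: empirical characteristic function at two test points
t1, t2 = 0.5, 1.0
sketch = {t: np.mean(np.exp(1j * t * x)) for t in (t1, t2)}

# Offline analysis: log(-log|psi(t)|) is linear in log(t) with slope alpha
alpha_hat = (np.log(-np.log(np.abs(sketch[t2])))
             - np.log(-np.log(np.abs(sketch[t1])))) / np.log(t2 / t1)
print(alpha_hat)   # close to 1 for Cauchy data
```

The key property is that the sketch is a fixed-size summary computed in a single streaming pass, after which the raw multichannel data can be discarded before the (here trivial, in the paper much richer) localization analysis.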