Simon Leglaive
Université Paris-Saclay
Publications
Featured research published by Simon Leglaive.
International Conference on Acoustics, Speech, and Signal Processing | 2015
Simon Leglaive; Romain Hennequin; Roland Badeau
In this paper, we propose a new method for singing voice detection based on a Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Network (RNN). This classifier can take past and future temporal context into account when deciding on the presence or absence of singing voice, thus exploiting the inherently sequential nature of short-term features extracted from a piece of music. The BLSTM-RNN contains several hidden layers, so it can extract, from low-level features, a simple representation fitted to our task. The results we obtain significantly outperform state-of-the-art methods on a common database.
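As an illustration of the kind of classifier described in this abstract, here is a minimal PyTorch sketch of a frame-wise bidirectional LSTM detector. It is not the authors' implementation; the feature dimension, hidden size and number of layers are arbitrary assumptions.

    import torch
    import torch.nn as nn

    class BLSTMVoiceDetector(nn.Module):
        def __init__(self, n_features=40, hidden_size=64, num_layers=3):
            super().__init__()
            # Stacked bidirectional LSTM: each frame's decision can use both
            # past and future temporal context.
            self.blstm = nn.LSTM(n_features, hidden_size, num_layers=num_layers,
                                 batch_first=True, bidirectional=True)
            # Frame-wise output: probability of singing voice presence.
            self.out = nn.Linear(2 * hidden_size, 1)

        def forward(self, x):           # x: (batch, n_frames, n_features)
            h, _ = self.blstm(x)        # (batch, n_frames, 2 * hidden_size)
            return torch.sigmoid(self.out(h)).squeeze(-1)   # (batch, n_frames)

    # Example: 8 excerpts of 200 frames with 40 short-term features each.
    model = BLSTMVoiceDetector()
    probs = model(torch.randn(8, 200, 40))   # frame-wise voice probabilities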
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Simon Leglaive; Roland Badeau; Gaël Richard
Incorporating prior knowledge about the sources and/or the mixture is a way to improve under-determined audio source separation performance. A great number of informed source separation techniques concentrate on taking priors on the sources into account, but fewer works have focused on constraining the mixing model. In this paper, we address the problem of under-determined multichannel audio source separation in reverberant conditions. We target a semi-informed scenario where some room parameters are known. Two probabilistic priors on the frequency response of the mixing filters are proposed. Early reverberation is characterized by an autoregressive model, while, in accordance with statistical room acoustics results, late reverberation is represented by an autoregressive moving average model. Both reverberation models are defined in the frequency domain. They aim to translate the temporal characteristics of the mixing filters into frequency-domain correlations. Our approach leads to a maximum a posteriori estimation of the mixing filters, which is achieved using the expectation-maximization algorithm. We experimentally show the superiority of this approach compared with a maximum likelihood estimation of the mixing filters.
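The benefit of a prior on the mixing filters can be conveyed with a toy example that is far simpler than the paper's model: at a single frequency bin, a zero-mean Gaussian prior on the filter coefficient turns the maximum likelihood estimate into a shrunk maximum a posteriori estimate. The numpy sketch below is purely illustrative; all sizes and variances are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_frames, sigma_h, sigma_e = 5, 1.0, 1.0   # assumed sizes and standard deviations

    # "True" filter coefficient at one frequency bin, drawn from the Gaussian prior.
    h_true = (rng.standard_normal() + 1j * rng.standard_normal()) * sigma_h / np.sqrt(2)
    s = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)   # source STFT frames
    e = (rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)) * sigma_e / np.sqrt(2)
    x = h_true * s + e                                                       # mixture STFT frames

    # Maximum likelihood vs. maximum a posteriori (zero-mean Gaussian prior on h):
    # the prior adds a regularizing term to the denominator, shrinking the estimate.
    h_ml = np.vdot(s, x) / np.vdot(s, s)
    h_map = np.vdot(s, x) / (np.vdot(s, s) + sigma_e**2 / sigma_h**2)

    print(abs(h_ml - h_true), abs(h_map - h_true))   # the MAP error is typically smaller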
Workshop on Applications of Signal Processing to Audio and Acoustics | 2015
Simon Leglaive; Roland Badeau; Gaël Richard
In this paper we show that considering early contributions of mixing filters through a probabilistic prior can help blind source separation in reverberant recording conditions. By modeling the mixing filters as the direct path plus R-1 reflections, we represent the propagation from a source to a mixture channel as an autoregressive process of order R in the frequency domain. This model is used as a prior to derive a Maximum A Posteriori (MAP) estimate of the mixing filters using the Expectation-Maximization (EM) algorithm. Experimental results on reverberant synthetic mixtures and live recordings show that MAP estimation with this prior provides better separation results than Maximum Likelihood (ML) estimation.
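The link between an R-tap filter and an order-R autoregressive structure across frequency can be checked numerically: the DFT of a filter made of a direct path plus R-1 reflections is a sum of R complex exponentials in the frequency index, and therefore satisfies an exact order-R linear recursion across bins. The numpy snippet below is an illustration of this property, not the paper's code; the delays and gains are made up.

    import numpy as np

    N = 512                                  # DFT length (assumption)
    delays = np.array([0, 37, 95])           # direct path + two reflections, in samples
    gains = np.array([1.0, 0.6, 0.3])
    R = len(delays)

    h = np.zeros(N)
    h[delays] = gains
    H = np.fft.fft(h)                        # frequency response, one value per DFT bin

    # Each delay d contributes exp(-2j*pi*d*k/N) as a function of the bin index k,
    # so H[k] obeys the order-R recursion whose characteristic roots are these exponentials.
    roots = np.exp(-2j * np.pi * delays / N)
    coeffs = np.poly(roots)                  # [1, c_1, ..., c_R]

    k = np.arange(R, N)
    residual = sum(coeffs[m] * H[k - m] for m in range(R + 1))
    print(np.max(np.abs(residual)))          # numerically zero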
International Conference on Acoustics, Speech, and Signal Processing | 2017
Simon Leglaive; Umut Simsekli; Antoine Liutkus; Roland Badeau; Gaël Richard
In this paper, we focus on modeling multichannel audio signals in the short-time Fourier transform domain for the purpose of source separation. We propose a probabilistic model based on a class of heavy-tailed distributions, in which the observed mixtures and the latent sources are jointly modeled using a certain class of multivariate alpha-stable distributions. As opposed to conventional Gaussian models, where the observations are constrained to lie within a few standard deviations of the mean, the proposed heavy-tailed model allows us to account for spurious data or large uncertainties in the model. We develop a Monte Carlo Expectation-Maximization algorithm for inferring the sources from the proposed model. We show that our approach leads to significant performance improvements in audio source separation under corrupted mixtures and in spatial audio object coding.
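The practical difference between Gaussian and alpha-stable modeling can be seen by simply comparing their tails. The short scipy sketch below is unrelated to the paper's inference algorithm and uses an arbitrary characteristic exponent; it shows that an alpha-stable model assigns non-negligible probability to samples a Gaussian model would treat as essentially impossible.

    import numpy as np
    from scipy.stats import levy_stable, norm

    alpha, n = 1.5, 50_000                    # characteristic exponent and sample size (assumptions)
    x_stable = levy_stable.rvs(alpha, 0.0, size=n, random_state=0)   # symmetric alpha-stable
    x_gauss = norm.rvs(size=n, random_state=0)

    # Fraction of samples farther than 5 scale units from the origin.
    print(np.mean(np.abs(x_stable) > 5))      # a few percent
    print(np.mean(np.abs(x_gauss) > 5))       # essentially zero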
International Conference on Acoustics, Speech, and Signal Processing | 2017
Simon Leglaive; Roland Badeau; Gaël Richard
A great number of methods for multichannel audio source separation are based on probabilistic approaches in which the sources are modeled as latent random variables in a Time-Frequency (TF) domain. For reverberant mixtures, it is common to approximate the time-domain convolutive mixing process as being instantaneous in the short-term Fourier transform domain, under an assumption of short mixing filters. The TF latent sources are then inferred from the TF mixture observations. In this paper we propose to infer the TF latent sources from the time-domain observations. This approach allows us to model the convolutive mixing process exactly. The inference procedure relies on a variational expectation-maximization algorithm. Under strong reverberation, our approach leads to a signal-to-distortion ratio improvement of 5.5 dB compared with the usual TF approximation of the convolutive mixing process.
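The shortcoming of the narrowband approximation that motivates this work can be reproduced with a few lines of numpy/scipy: when the mixing filter is much longer than the STFT window, multiplying the source STFT bin-wise by the filter frequency response is a poor substitute for the true time-domain convolution. The sketch below uses a synthetic source and filter with arbitrary STFT parameters; it is only an illustration of the approximation, not of the paper's variational inference.

    import numpy as np
    from scipy import signal

    rng = np.random.default_rng(0)
    fs, nperseg = 16000, 1024                         # assumed sampling rate and STFT window
    s = rng.standard_normal(4 * fs)                   # synthetic 4-second source signal

    # Synthetic reverberant filter, much longer than the STFT window.
    t = np.arange(int(0.4 * fs))
    h = rng.standard_normal(t.size) * np.exp(-t / (0.08 * fs))
    h[0] = 1.0

    x_exact = signal.fftconvolve(s, h)[: s.size]      # exact time-domain convolutive mixing

    # Narrowband approximation: bin-wise product of the source STFT with the filter
    # frequency response on the same grid (the filter is implicitly truncated to the window).
    _, _, S = signal.stft(s, fs=fs, nperseg=nperseg)
    H = np.fft.rfft(h, n=nperseg)
    _, x_approx = signal.istft(S * H[:, None], fs=fs, nperseg=nperseg)
    x_approx = x_approx[: s.size]

    err = np.sum((x_exact - x_approx) ** 2) / np.sum(x_exact ** 2)
    print(10 * np.log10(err), "dB relative error")    # far from negligible for long filters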
European Signal Processing Conference | 2016
Simon Leglaive; Roland Badeau; Gaël Richard
In this paper, the late part of a room response is modeled in the frequency domain as a complex Gaussian random process. The autocovariance function (ACVF) and power spectral density (PSD) are defined theoretically from the exponential decay of the late reverberation power. Furthermore, we show that the ACVF and PSD are accurately parametrized by an autoregressive moving average (ARMA) model. This leads to a new generative model of late reverberation in the frequency domain. The ARMA parameters are easily estimated from the theoretical ACVF. The statistical characterization is consistent with empirical results on simulated and real data. This model could be used to incorporate priors in audio source separation and dereverberation.
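Under the classical model of a late tail as white noise shaped by an exponential envelope, the central relation used here, namely that the autocovariance of the frequency response across frequency is the Fourier transform of the squared temporal decay envelope, can be checked with a short Monte Carlo experiment. The numpy sketch below is an illustration with an arbitrary decay constant, not the paper's code.

    import numpy as np

    rng = np.random.default_rng(0)
    N, tau, n_draws = 1024, 200.0, 2000
    t = np.arange(N)
    env = np.exp(-t / tau)                      # assumed exponential amplitude decay

    # Draw many late-reverberation tails: white Gaussian noise shaped by the decay envelope.
    h = rng.standard_normal((n_draws, N)) * env
    H = np.fft.fft(h, axis=1)                   # frequency responses, one per draw

    # Empirical autocovariance of H across frequency, at a few frequency lags.
    lags = np.arange(8)
    acvf_emp = np.array([np.mean(H[:, lag:] * np.conj(H[:, : N - lag]))
                         for lag in lags])

    # Theory: the ACVF across frequency is the DFT of the squared temporal envelope.
    acvf_theo = np.fft.fft(env ** 2)[lags]

    print(np.round(acvf_emp, 1))
    print(np.round(acvf_theo, 1))               # the two agree up to Monte Carlo error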
European Signal Processing Conference | 2017
Simon Leglaive; Roland Badeau; Gaël Richard
This paper addresses the problem of multichannel audio source separation in under-determined convolutive mixtures. We target a semi-blind scenario in which the mixing filters are assumed known. The convolutive mixing process is modeled exactly using the time-domain impulse responses of the mixing filters. We propose a Student's t time-frequency source model based on non-negative matrix factorization (NMF). Since the Student's t distribution is heavy-tailed compared with the Gaussian, it provides some flexibility in modeling the sources. We also study a simpler Student's t sparse source model within the same general source separation framework. The inference procedure relies on a variational expectation-maximization algorithm. Experiments show the advantage of using an NMF model compared with the sparse source model. While the Student's t NMF source model leads to slightly better results than our previous Gaussian one, we demonstrate the superiority of our method over two other approaches from the literature.
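A convenient way to see why a Student's t source model remains manageable for EM-type inference is its Gaussian scale mixture representation: a Gaussian whose precision is Gamma-distributed is marginally Student's t. The numpy sketch below is illustrative only, with an arbitrary number of degrees of freedom; it draws samples through this hierarchy and compares tail probabilities with direct Student's t and Gaussian draws.

    import numpy as np

    rng = np.random.default_rng(0)
    nu, n = 4.0, 200_000                           # assumed degrees of freedom and sample size

    # Student's t as a Gaussian scale mixture: draw a Gamma-distributed precision,
    # then a Gaussian whose variance is its inverse.
    lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # E[lam] = 1
    x = rng.normal(size=n) / np.sqrt(lam)

    # Compare with direct Student's t draws: the tails match, unlike the Gaussian's.
    y = rng.standard_t(df=nu, size=n)
    g = rng.normal(size=n)
    print(np.mean(np.abs(x) > 4), np.mean(np.abs(y) > 4), np.mean(np.abs(g) > 4))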
International Conference on Acoustics, Speech, and Signal Processing | 2018
Umut Simsekli; Halil Erdogan; Simon Leglaive; Antoine Liutkus; Roland Badeau; Gaël Richard
Workshop on Applications of Signal Processing to Audio and Acoustics | 2017
Simon Leglaive; Roland Badeau; Gaël Richard
Colloque GRETSI | 2015
Simon Leglaive; Roland Badeau; Gaël Richard