Ngoc Q. K. Duong
Technicolor
Publication
Featured research published by Ngoc Q. K. Duong.
EURASIP Journal on Advances in Signal Processing | 2013
Ngoc Q. K. Duong; Emmanuel Vincent; Rémi Gribonval
We consider the Gaussian framework for reverberant audio source separation, where the sources are modeled in the time-frequency domain by their short-term power spectra and their spatial covariance matrices. We propose two alternative probabilistic priors over the spatial covariance matrices which are consistent with the theory of statistical room acoustics and we derive expectation-maximization algorithms for maximum a posteriori (MAP) estimation. We argue that these algorithms provide a statistically principled solution to the permutation problem and to the risk of overfitting resulting from conventional maximum likelihood (ML) estimation. We show experimentally that in a semi-informed scenario where the source positions and certain room characteristics are known, the MAP algorithms outperform their ML counterparts. This opens the way to rigorous statistical treatment of this family of models in other scenarios in the future.
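Once the short-term power spectra and spatial covariance matrices of this local Gaussian model are estimated, the separation step itself is a multichannel Wiener filter in each time-frequency bin. A minimal sketch of that filtering step follows; the function name, array shapes, and use of a pseudo-inverse are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def wiener_filter_estimate(X, v, R):
    """Multichannel Wiener filtering under the local Gaussian model.

    X : (F, N, I) mixture STFT over F frequencies, N frames, I channels
    v : (J, F, N) short-term power spectra of the J sources
    R : (J, F, I, I) spatial covariance matrices
    Returns per-source STFT estimates, shape (J, F, N, I).
    """
    F, N, I = X.shape
    J = v.shape[0]
    S = np.zeros((J, F, N, I), dtype=complex)
    for f in range(F):
        for n in range(N):
            # mixture covariance = sum_j v_j(f, n) * R_j(f)
            Rx = sum(v[j, f, n] * R[j, f] for j in range(J))
            Rx_inv = np.linalg.pinv(Rx)
            for j in range(J):
                W = v[j, f, n] * R[j, f] @ Rx_inv  # Wiener gain of source j
                S[j, f, n] = W @ X[f, n]
    return S
```

A useful sanity check on this filter is that the source estimates sum back to the mixture bin by bin, since the Wiener gains sum to the identity whenever the mixture covariance is invertible.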
International Workshop on Machine Learning for Signal Processing | 2013
Luc Le Magoarou; Alexey Ozerov; Ngoc Q. K. Duong
We consider a single-channel source separation problem that consists of separating speech from a nonstationary background such as music. We introduce a novel approach called text-informed separation, where the source separation process is guided by the corresponding textual information. First, given the text, we propose to produce a speech example via either a speech synthesizer or a human speaker. We then use this example to guide source separation and, for that purpose, introduce a new variant of the nonnegative matrix partial co-factorization (NMPCF) model based on a so-called excitation-filter-channel speech model. The proposed NMPCF model allows sharing of linguistic information between the example speech and the speech in the mixture. We then derive the corresponding multiplicative update (MU) rules for parameter estimation. Experimental results over different types of mixtures and speech examples show the effectiveness of the proposed approach.
International Conference on Consumer Electronics - Berlin | 2012
Ngoc Q. K. Duong; Christopher Howson; Yvon Legallais
For the implementation of emerging second-screen TV applications, a technique is needed to ensure fast and accurate synchronization of media components streamed over different networks to different rendering devices. One approach of great value is to exploit the unmodified audio stream of the original media and compare it to a reference version. We consider two major approaches for this purpose, namely fingerprinting techniques and generalized cross-correlation, where the former greatly reduces computational cost and the latter offers sample-accurate synchronization. We propose an approach combining these two techniques in which coarse, frame-accurate synchronization positions are first found by fingerprint matching, and a candidate accurate synchronization position is then verified by generalized cross-correlation with phase transform (GCC-PHAT). Experimental results in a real-world setting confirm the accuracy and speed of the proposed approach.
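GCC-PHAT itself is compact to implement: the cross-power spectrum of the two signals is whitened by its magnitude (the phase transform), so that the inverse transform concentrates into a sharp peak at the true lag. A minimal sketch, with our own function name and zero-padding choices:

```python
import numpy as np

def gcc_phat(sig, ref, fs=1.0):
    """Estimate the delay of `sig` relative to `ref` (in 1/fs units)
    via generalized cross-correlation with phase transform."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_lag = len(ref)
    # reorder so negative lags precede positive ones
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lag = np.argmax(np.abs(cc)) - max_lag
    return lag / fs
```

In the combined scheme described above, a routine like this would only be run around the coarse positions returned by fingerprint matching, keeping the overall cost low while refining the alignment to sample accuracy.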
Signal Processing Systems | 2015
Luc Le Magoarou; Alexey Ozerov; Ngoc Q. K. Duong
So-called informed audio source separation, where the separation process is guided by auxiliary information, has recently attracted considerable research interest, since classical blind or non-informed approaches often do not achieve satisfactory performance in many practical applications. In this paper we present a novel text-informed framework in which a target speech source can be separated from the background in the mixture using the corresponding textual information. First, given the text, we propose to produce a speech example via either a speech synthesizer or a human speaker. We then use this example to guide source separation and, for that purpose, introduce a new variant of the non-negative matrix partial co-factorization (NMPCF) model based on a so-called excitation-filter-channel speech model. Such a model allows sharing of linguistic information between the speech example and the speech in the mixture. The corresponding multiplicative update (MU) rules are then derived for parameter estimation, and several extensions of the model are proposed and investigated. We perform extensive experiments to assess the effectiveness of the proposed approach in terms of source separation and alignment performance.
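To make the co-factorization idea concrete, here is a toy sketch of partial co-factorization: a speech dictionary Ws is shared between the example spectrogram E and the mixture spectrogram V, while a second dictionary Wb models the background present only in V. For brevity it uses a plain Euclidean cost and omits the excitation-filter-channel structure of the actual model; all names and dimensions are our own:

```python
import numpy as np

def nmpcf(V, E, K_s=8, K_b=8, n_iter=200, seed=0):
    """Toy partial co-factorization: V ~ Ws@Hs + Wb@Hb, E ~ Ws@He,
    with the speech dictionary Ws shared between both factorizations."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    _, M = E.shape
    Ws = rng.random((F, K_s)) + 1e-3
    Wb = rng.random((F, K_b)) + 1e-3
    Hs = rng.random((K_s, N)) + 1e-3
    Hb = rng.random((K_b, N)) + 1e-3
    He = rng.random((K_s, M)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        Vhat = Ws @ Hs + Wb @ Hb
        # activation updates (standard Euclidean multiplicative rules)
        Hs *= (Ws.T @ V) / (Ws.T @ Vhat + eps)
        Hb *= (Wb.T @ V) / (Wb.T @ Vhat + eps)
        He *= (Ws.T @ E) / (Ws.T @ (Ws @ He) + eps)
        # dictionary updates: Ws sees both V and E, Wb only V
        Vhat = Ws @ Hs + Wb @ Hb
        Ws *= (V @ Hs.T + E @ He.T) / (Vhat @ Hs.T + (Ws @ He) @ He.T + eps)
        Wb *= (V @ Hb.T) / (Vhat @ Hb.T + eps)
    return Ws @ Hs, Ws, Wb, Hs, Hb
```

The key line is the Ws update, whose numerator and denominator accumulate statistics from both the mixture and the example; this is what lets the example transfer its (linguistic) structure to the mixture's speech part.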
International Workshop on Machine Learning for Signal Processing | 2014
Dalia El Badawy; Ngoc Q. K. Duong; Alexey Ozerov
This paper addresses the challenging task of single channel audio source separation. We introduce a novel concept of on-the-fly audio source separation which greatly simplifies the users interaction with the system compared to the state-of-the-art user-guided approaches. In the proposed framework, the user is only asked to listen to an audio mixture and type some keywords (e.g. “dog barking”, “wind”, etc.) describing the sound sources to be separated. These keywords are then used as text queries to search for audio examples from the internet to guide the separation process. In particular, we propose several approaches to efficiently exploit these retrieved examples, including an approach based on a generic spectral model with group sparsity-inducing constraints. Finally, we demonstrate the effectiveness of the proposed framework with mixtures containing various types of sounds.
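The group-sparsity idea can be sketched as follows: activations for a fixed universal dictionary (built, say, from the retrieved examples) are fitted under a KL cost with a log/l1 group penalty, so that example groups irrelevant to the mixture are switched off. This is a simplified illustration under our own notation, not the paper's exact formulation:

```python
import numpy as np

def group_sparse_nmf(V, W, groups, lam=0.1, n_iter=100, seed=0):
    """Fit activations H for a fixed dictionary W (KL cost) with the
    penalty lam * sum_g log(eps + ||H_g||_1) over component groups.
    `groups` maps each group name to the component indices it owns."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    K = W.shape[1]
    H = rng.random((K, N)) + 1e-3
    eps = 1e-9
    for _ in range(n_iter):
        Vhat = W @ H + eps
        num = W.T @ (V / Vhat)
        den = W.T @ np.ones_like(V)
        pen = np.zeros((K, 1))
        for idx in groups.values():
            # gradient of the log-penalty is shared by all rows of a group
            pen[idx] = lam / (eps + H[idx].sum())
        H *= num / (den + pen)
    return H
```

Because the penalty gradient grows as a group's total activation shrinks, weakly used groups are driven toward zero, which is the mechanism that selects the relevant retrieved examples.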
International Conference on Acoustics, Speech, and Signal Processing | 2014
Ngoc Q. K. Duong; Alexey Ozerov; Louis Chevallier; Joel Sirot
Though audio source separation offers a wide range of applications in audio enhancement and post-production, its performance has yet to reach a satisfactory level, especially for single-channel mixtures with limited training data. In this paper we present a novel interactive source separation framework that allows end users to provide feedback at each separation step so as to gradually improve the result. For this purpose, a prototype graphical user interface (GUI) is developed that helps users annotate time-frequency regions of the displayed spectrogram where a source can be labeled as active, inactive, or well separated. This user feedback, which is partially new with respect to state-of-the-art annotations, is then taken into account by a proposed uncertainty-based learning algorithm to constrain the source estimates in the next separation step. The considered framework is based on non-negative matrix factorization and is shown to be effective even without any isolated training data.
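As an illustration of how such annotations might constrain the estimates, the sketch below applies them as hard time-frequency constraints: bins annotated inactive are zeroed for the annotated source, and well-separated bins are attributed to it entirely. The paper's actual algorithm is uncertainty-based rather than this hard masking, and all names here are our own:

```python
import numpy as np

def apply_annotations(S_est, X, labels, j):
    """Constrain the estimate of source j with user T-F annotations.

    S_est  : (J, F, N) current per-source STFT estimates
    X      : (F, N) mixture STFT
    labels : (F, N) int annotations for source j:
             0 = no annotation, 1 = inactive, 2 = well separated.
    """
    S = S_est.copy()
    S[j][labels == 1] = 0.0            # source j absent in these bins
    S[j][labels == 2] = X[labels == 2]  # whole bin attributed to source j
    for k in range(S.shape[0]):
        if k != j:
            S[k][labels == 2] = 0.0    # so the other sources release them
    return S
```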
International Conference on Acoustics, Speech, and Signal Processing | 2013
Ngoc Q. K. Duong; Franck Thudor
This paper addresses movie synchronization, i.e. synchronizing multiple versions of the same movie, with the objective of automatically transferring metadata available on a reference version to other versions. We first exploit the audio tracks associated with two different versions and adapt an existing audio fingerprinting technique to find all temporal matching positions between them. We then propose additional steps to refine the matches and eliminate outliers. The proposed approach efficiently handles temporal scene edits such as scene addition, removal, and even the challenging case of scene re-ordering. Experimental results on synthetic editorial data show the effectiveness of the proposed approach with respect to a state-of-the-art dynamic time warping (DTW) based solution.
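A common way to find all matching positions between two fingerprint streams, robust to insertions and re-ordering, is offset voting: every matching hash votes for a time offset, and each aligned segment then shows up as its own dominant offset, unlike DTW which assumes a single monotone alignment. The sketch below illustrates the idea on hypothetical (time, hash) pairs; it is not the specific fingerprinting technique adapted in the paper:

```python
from collections import defaultdict

def match_offsets(ref_prints, query_prints, min_votes=3):
    """Vote-based matching of two fingerprint sequences.

    Each input is a list of (time, hash) pairs. Matching hashes vote
    for the offset (ref_time - query_time); offsets with enough votes
    are returned, strongest first."""
    index = defaultdict(list)
    for t, h in ref_prints:
        index[h].append(t)
    votes = defaultdict(int)
    for t, h in query_prints:
        for t_ref in index[h]:
            votes[t_ref - t] += 1
    return sorted((o for o, v in votes.items() if v >= min_votes),
                  key=lambda o: -votes[o])
```

A re-ordered version of the movie would simply yield several strong offsets, one per contiguous segment, which is why this style of matching handles scene re-ordering that defeats monotone alignment methods.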
International Conference on Acoustics, Speech, and Signal Processing | 2017
Sanjeel Parekh; Slim Essid; Alexey Ozerov; Ngoc Q. K. Duong; Patrick Pérez; Gaël Richard
In this paper we tackle the problem of single-channel audio source separation driven by descriptors of the sounding objects' motion. As opposed to previous approaches, motion is included as a soft-coupling constraint within the nonnegative matrix factorization framework. The proposed method is applied to a multimodal dataset of string quartet performance recordings, where bow motion information is used to separate the string instruments. We show that the approach offers better source separation results than an audio-only baseline and state-of-the-art multimodal approaches on these very challenging music mixtures.
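A soft-coupling constraint of this kind can be illustrated with a penalized NMF in which the activations of some components are pulled toward motion-derived activations instead of being hard-wired to them. The sketch uses a Euclidean cost and a quadratic coupling penalty, which is our own simplification of the idea, not the paper's formulation:

```python
import numpy as np

def motion_coupled_nmf(V, M, K_other=4, n_iter=200, lam=1.0, seed=0):
    """Euclidean NMF of mixture spectrogram V where the first K_m
    component activations are softly coupled to motion activations M
    (shape (K_m, N)) via the penalty lam * ||H[:K_m] - M||_F^2; the
    remaining K_other components are unconstrained."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    K_m = M.shape[0]
    K = K_m + K_other
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, N)) + 1e-3
    eps = 1e-12
    T = np.zeros((K, N))
    T[:K_m] = M                  # coupling targets (zero rows: no coupling)
    mask = np.zeros((K, 1))
    mask[:K_m] = 1.0             # marks the coupled rows
    for _ in range(n_iter):
        WH = W @ H
        # penalty gradient lam*(H - T) split into +/- parts of the MU rule
        H *= (W.T @ V + lam * T) / (W.T @ WH + lam * mask * H + eps)
        WH = W @ H
        W *= (V @ H.T) / (WH @ H.T + eps)
    return W, H
```

With a large coupling weight the constrained activations track the motion signal closely; with a small one they merely lean toward it, which is the "soft" part of the constraint.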
International Conference on Consumer Electronics - Berlin | 2014
Ngoc Q. K. Duong; Alexey Ozerov; Louis Chevallier
We consider an emerging user-guided audio source separation approach based on temporal annotation of the source activity along the mixture. In this baseline algorithm, nonnegative matrix factorization (NMF) is typically used as the spectral model for the audio sources. In this paper we propose two weighting strategies incorporated into the NMF formulation so as to better exploit the annotation, and derive the corresponding multiplicative update (MU) rules for parameter estimation. The proposed approach was objectively evaluated within the fourth community-based Signal Separation Evaluation Campaign (SiSEC 2013) and shown to outperform the baseline algorithm while obtaining results comparable to other state-of-the-art methods.
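One generic way to push annotation information into NMF is to attach per-entry confidence weights to the reconstruction cost, e.g. emphasizing frames whose activity annotation is considered reliable. The sketch below shows standard weighted Euclidean multiplicative updates under our own notation; it does not reproduce the paper's two specific weighting strategies:

```python
import numpy as np

def weighted_nmf(V, B, K=5, n_iter=200, seed=0):
    """Weighted Euclidean NMF: minimizes ||sqrt(B) * (V - W @ H)||_F^2,
    where B holds nonnegative per-entry confidence weights."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + 1e-3
    H = rng.random((K, N)) + 1e-3
    eps = 1e-12
    for _ in range(n_iter):
        # the weights B simply enter every data term of the MU rules
        H *= (W.T @ (B * V)) / (W.T @ (B * (W @ H)) + eps)
        W *= ((B * V) @ H.T) / ((B * (W @ H)) @ H.T + eps)
    return W, H
```

Setting B to all ones recovers plain Euclidean NMF, while zeroing a column of B makes the corresponding frame irrelevant to the fit, which is the lever an annotation-driven weighting scheme pulls.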
International Conference on Acoustics, Speech, and Signal Processing | 2015
Dalia El Badawy; Alexey Ozerov; Ngoc Q. K. Duong
We consider dictionary-based signal decompositions with group sparsity, a variant of structured sparsity. We point out that a group sparsity-inducing constraint alone may not be sufficient when we know that certain larger groups, or so-called supergroups, cannot vanish completely. To address this problem we introduce the notion of relative group sparsity, which prevents the supergroups from vanishing. We formulate practical criteria and algorithms for relative group sparsity as applied to non-negative matrix factorization and investigate its potential benefit within the on-the-fly audio source separation framework we recently introduced. Experimental evaluation shows that relative group sparsity improves performance over plain group sparsity in both supervised and semi-supervised on-the-fly audio source separation settings.
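The contrast between the two penalties can be shown directly: uniformly shrinking all activations lowers a plain log-group-sparsity penalty, so the optimizer is tempted to drive whole supergroups toward zero, whereas measuring each group's norm relative to its supergroup's norm removes that incentive. A minimal numerical illustration in our own notation:

```python
import numpy as np

def group_penalty(H, groups, eps=1e-9):
    """Plain group sparsity: sum_g log(eps + ||H_g||_1)."""
    return sum(np.log(eps + H[idx].sum()) for idx in groups)

def relative_group_penalty(H, groups, supergroup, eps=1e-9):
    """Relative group sparsity: each group norm is taken relative to
    its supergroup's norm, so uniformly shrinking the supergroup
    brings no gain and the supergroup cannot vanish."""
    total = eps + H[supergroup].sum()
    return sum(np.log(eps + H[idx].sum() / total) for idx in groups)
```

Scaling H by any positive constant leaves the relative penalty (essentially) unchanged while strictly decreasing the plain one, which is exactly the degeneracy the relative formulation removes.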