Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Alexey Ozerov is active.

Publication


Featured researches published by Alexey Ozerov.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Multichannel Nonnegative Matrix Factorization in Convolutive Mixtures for Audio Source Separation

Alexey Ozerov; Cédric Févotte

We consider inference in a general data-driven object-based model of multichannel audio data, assumed generated as a possibly underdetermined convolutive mixture of source signals. We work in the short-time Fourier transform (STFT) domain, where convolution is routinely approximated as linear instantaneous mixing in each frequency band. Each source STFT is given a model inspired from nonnegative matrix factorization (NMF) with the Itakura-Saito divergence, which underlies a statistical model of superimposed Gaussian components. We address estimation of the mixing and source parameters using two methods. The first one consists of maximizing the exact joint likelihood of the multichannel data using an expectation-maximization (EM) algorithm. The second method consists of maximizing the sum of individual likelihoods of all channels using a multiplicative update algorithm inspired from NMF methodology. Our decomposition algorithms are applied to stereo audio source separation in various settings, covering blind and supervised separation, music and speech sources, synthetic instantaneous and convolutive mixtures, as well as professionally produced music recordings. Our EM method produces competitive results with respect to state-of-the-art as illustrated on two tasks from the international Signal Separation Evaluation Campaign (SiSEC 2008).


IEEE Transactions on Audio, Speech, and Language Processing | 2012

A General Flexible Framework for the Handling of Prior Information in Audio Source Separation

Alexey Ozerov; Emmanuel Vincent; Frédéric Bimbot

Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper, we introduce a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints. While this framework generalizes several existing audio source separation methods, it also allows to imagine and implement new efficient methods that were not yet reported in the literature. We first introduce the framework by describing the model structure and constraints, explaining its generality, and summarizing its algorithmic implementation using a generalized expectation-maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation problems. We have released a software tool named Flexible Audio Source Separation Toolbox (FASST) implementing a baseline version of the framework in Matlab.


IEEE Transactions on Audio, Speech, and Language Processing | 2007

Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs

Alexey Ozerov; Pierrick Philippe; Frédéric Bimbot; Rémi Gribonval

Probabilistic approaches can offer satisfactory solutions to source separation with a single channel, provided that the models of the sources match accurately the statistical properties of the mixed signals. However, it is not always possible to train such models. To overcome this problem, we propose to resort to an adaptation scheme for adjusting the source models with respect to the actual properties of the signals observed in the mix. In this paper, we introduce a general formalism for source model adaptation which is expressed in the framework of Bayesian models. Particular cases of the proposed approach are then investigated experimentally on the problem of separating voice from music in popular songs. The obtained results show that an adaptation scheme can improve consistently and significantly the separation performance in comparison with nonadapted models.


Signal Processing | 2012

Multi-source TDOA estimation in reverberant audio using angular spectra and clustering

Charles Blandin; Alexey Ozerov; Emmanuel Vincent

We consider the problem of estimating the time differences of arrival (TDOAs) of multiple sources from a two-channel reverberant audio signal. While several clustering-based or angular spectrum-based methods have been proposed in the literature, only relatively small-scale experimental evaluations restricted to either category of methods have been carried out so far. We design and conduct the first large-scale experimental evaluation of these methods and investigate a two-step procedure combining angular spectra and clustering. In addition, we introduce and evaluate five new TDOA estimation methods inspired from signal-to-noise-ratio (SNR) weighting and probabilistic multi-source modeling techniques that have been successful for anechoic TDOA estimation and audio source separation. For 5cm microphone spacing, the best TDOA estimation performance is achieved by one of the proposed SNR-based angular spectrum methods. For larger spacing, a variant of the generalized cross-correlation with phase transform (GCC-PHAT) method performs best.


workshop on applications of signal processing to audio and acoustics | 2009

Factorial Scaled Hidden Markov Model for polyphonic audio representation and source separation

Alexey Ozerov; Cédric Févotte; Maurice Charbit

We present a new probabilistic model for polyphonic audio termed factorial scaled hidden Markov model (FS-HMM), which generalizes several existing models, notably the Gaussian scaled mixture model and the Itakura-Saito nonnegative matrix factorization (NMF) model. We describe two expectation-maximization (EM) algorithms for maximum likelihood estimation, which differ by the choice of complete data set. The second EM algorithm, based on a reduced complete data set and multiplicative updates inspired from NMF methodology, exhibits much faster convergence. We consider the FS-HMM in different configurations for the difficult problem of speech/music separation from a single channel and report satisfying results.


international conference on acoustics, speech, and signal processing | 2011

Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation

Alexey Ozerov; Cédric Févotte; Raphaël Blouet; Jean-Louis Durrieu

Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is provided segmental information indicating the time activations of the particular instruments to separate. This information may typically be retrieved from manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and in which the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications.


information sciences, signal processing and their applications | 2010

Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation

Simon Arberet; Alexey Ozerov; Ngoc Q. K. Duong; Emmanuel Vincent; Rémi Gribonval; Frédéric Bimbot; Pierre Vandergheynst

We address the problem of blind audio source separation in the under-determined and convolutive case. The contribution of each source to the mixture channels in the time-frequency domain is modeled by a zero-mean Gaussian random vector with a full rank covariance matrix composed of two terms: a variance which represents the spectral properties of the source and which is modeled by a nonnegative matrix factorization (NMF) model and another full rank covariance matrix which encodes the spatial properties of the source contribution in the mixture. We address the estimation of these parameters by maximizing the likelihood of the mixture using an expectation-maximization (EM) algorithm. Theoretical propositions are corroborated by experimental studies on stereo reverberant music mixtures.


computer music modeling and retrieval | 2010

Notes on nonnegative tensor factorization of the spectrogram for audio source separation: statistical insights and towards self-clustering of the spatial cues

Cédric Févotte; Alexey Ozerov

Nonnegative tensor factorization (NTF) of multichannel spectrograms under PARAFAC structure has recently been proposed by Fitzgerald et al as a mean of performing blind source separation (BSS) of multichannel audio data. In this paper we investigate the statistical source models implied by this approach. We show that it implicitly assumes a nonpoint-source model contrasting with usual BSS assumptions and we clarify the links between the measure of fit chosen for the NTF and the implied statistical distribution of the sources. While the original approach of Fitzgeral et al requires a posterior clustering of the spatial cues to group the NTF components into sources, we discuss means of performing the clustering within the factorization. In the results section we test the impact of the simplifying nonpoint-source assumption on underdetermined linear instantaneous mixtures of musical sources and discuss the limits of the approach for such mixtures.


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Coding-Based Informed Source Separation: Nonnegative Tensor Factorization Approach

Alexey Ozerov; Antoine Liutkus; Roland Badeau; Gaël Richard

Informed source separation (ISS) aims at reliably recovering sources from a mixture. To this purpose, it relies on the assumption that the original sources are available during an encoding stage. Given both sources and mixture, a side-information may be computed and transmitted along with the mixture, whereas the original sources are not available any longer. During a decoding stage, both mixture and side-information are processed to recover the sources. ISS is motivated by a number of specific applications including active listening and remixing of music, karaoke, audio gaming, etc. Most ISS techniques proposed so far rely on a source separation strategy and cannot achieve better results than oracle estimators. In this study, we introduce Coding-based ISS (CISS) and draw the connection between ISS and source coding. CISS amounts to encode the sources using not only a model as in source coding but also the observation of the mixture. This strategy has several advantages over conventional ISS methods. First, it can reach any quality, provided sufficient bandwidth is available as in source coding. Second, it makes use of the mixture in order to reduce the bitrate required to transmit the sources, as in classical ISS. Furthermore, we introduce Nonnegative Tensor Factorization as a very efficient model for CISS and report rate-distortion results that strongly outperform the state of the art.


workshop on applications of signal processing to audio and acoustics | 2011

Informed source separation: Source coding meets source separation

Alexey Ozerov; Antoine Liutkus; Roland Badeau; Gaël Richard

We consider the informed source separation (ISS) problem where, given the sources and the mixtures, any kind of side-information can be computed during a so-called encoding stage. This side-information is then used to assist source separation, given the mixtures only, at the so-called decoding stage. State of the art ISS approaches do not really consider ISS as a coding problem and rely on some purely source separation-inspired strategies, leading to performances that can at best reach those of oracle estimators. On the other hand, classical source coding strategies are not optimal either, since they do not benefit from the mixture availability. We introduce a general probabilistic framework called coding-based ISS (CISS) that consists in quantizing the sources using some posterior source distribution from those usually used in probabilistic model-based source separation. CISS benefits from both source coding, thanks to the source quantization, and source separation, thanks to the use of the posterior distribution that depends on the mixture. Our experiments show that CISS based on a particular model considerably outperforms for all rates both the conventional ISS approach and the source coding approach based on the same model.

Collaboration


Dive into the Alexey Ozerov's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Cédric Févotte

Centre national de la recherche scientifique

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Gaël Richard

Université Paris-Saclay

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Simon Arberet

École Polytechnique Fédérale de Lausanne

View shared research outputs
Top Co-Authors

Avatar

Cagdas Bilen

French Institute for Research in Computer Science and Automation

View shared research outputs
Top Co-Authors

Avatar

Roland Badeau

Institut Mines-Télécom

View shared research outputs
Researchain Logo
Decentralizing Knowledge