Publication


Featured research published by Victor Bisot.


International Conference on Acoustics, Speech, and Signal Processing | 2016

Acoustic scene classification with matrix factorization for unsupervised feature learning

Victor Bisot; Romain Serizel; Slim Essid; Gaël Richard

In this paper, we study the use of unsupervised feature learning for acoustic scene classification (ASC). The acoustic environment recordings are represented by time-frequency images from which we learn features in an unsupervised manner. After a set of preprocessing and pooling steps, the images are decomposed using matrix factorization methods. By decomposing the data on a learned dictionary, we use the projection coefficients as features for classification. An experimental evaluation is performed on a large ASC dataset to study popular matrix factorization methods such as Principal Component Analysis (PCA) and Non-negative Matrix Factorization (NMF), as well as some of their extensions, including sparse, kernel-based and convolutive variants. The results show that the compared variants lead to significant improvements over state-of-the-art results in ASC.
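As a rough illustration of using projection coefficients on a learned dictionary as features, the following minimal sketch learns a nonnegative dictionary with plain Euclidean NMF (standard multiplicative updates) and then projects unseen data onto it. It is not the authors' implementation: the function names `learn_dictionary` and `project`, the rank, the iteration counts and the random toy data are assumptions made for illustration only, and the preprocessing and pooling steps mentioned in the abstract are omitted.

```python
import numpy as np

EPS = 1e-9  # small constant to avoid division by zero


def learn_dictionary(V, rank, n_iter=200, seed=0):
    """Factorize a nonnegative matrix V (features x examples) as W @ H.

    Uses the standard multiplicative updates for the Euclidean cost.
    Returns the dictionary W (features x rank) and the activations H.
    """
    rng = np.random.default_rng(seed)
    n_features, n_examples = V.shape
    W = rng.random((n_features, rank)) + EPS
    H = rng.random((rank, n_examples)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def project(V, W, n_iter=200, seed=0):
    """Compute nonnegative projection coefficients of V on a fixed dictionary W."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return H


# Toy data standing in for pooled time-frequency representations (hypothetical).
rng = np.random.default_rng(42)
train = rng.random((40, 30))  # 40 frequency features, 30 training examples
test = rng.random((40, 5))    # 5 unseen examples

W, H_train = learn_dictionary(train, rank=8)
features = project(test, W)   # one 8-dimensional feature vector per column
```

Each column of `features` could then be passed to any classifier. The dictionary is kept fixed at projection time so that training and test examples are described in the same basis.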


European Signal Processing Conference | 2015

HOG and subband power distribution image features for acoustic scene classification

Victor Bisot; Slim Essid; Gaël Richard

Acoustic scene classification is a difficult problem, mostly due to the high density of events concurrently occurring in audio scenes. In order to capture the occurrences of these events, we propose to use the Subband Power Distribution (SPD) as a feature. We extract it by computing the histogram of amplitude values in each frequency band of a spectrogram image. The SPD allows us to model the density of events in each frequency band. Our method is evaluated on a large acoustic scene dataset using support vector machines. We outperform the previous methods when using the SPD in conjunction with the histogram of gradients. To reach further improvement, we also consider the use of an approximation of the earth mover's distance kernel to compare histograms in a more suitable way. Using the so-called Sinkhorn kernel improves the results on most of the feature configurations. The best performance reaches a 92.8% F1 score.
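The per-band histogram described above can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' code: the function name `subband_power_distribution`, the number of bins, the assumed amplitude range of [0, 1] and the toy spectrogram are all assumptions.

```python
import numpy as np


def subband_power_distribution(spectrogram, n_bins=10, value_range=(0.0, 1.0)):
    """Histogram of amplitude values in each frequency band.

    spectrogram: array of shape (n_bands, n_frames), assumed to be scaled
    to value_range (values outside it are clipped).
    Returns an array of shape (n_bands, n_bins) whose rows sum to 1.
    """
    spec = np.clip(np.asarray(spectrogram, dtype=float), *value_range)
    n_bands, n_frames = spec.shape
    spd = np.empty((n_bands, n_bins))
    for band in range(n_bands):
        counts, _ = np.histogram(spec[band], bins=n_bins, range=value_range)
        spd[band] = counts / n_frames  # normalize counts to a distribution
    return spd


# Toy spectrogram with 2 frequency bands and 4 time frames (hypothetical values).
toy_spectrogram = np.array([[0.0, 0.1, 0.9, 1.0],
                            [0.5, 0.5, 0.5, 0.5]])
spd = subband_power_distribution(toy_spectrogram, n_bins=2)
```

In this toy case the first band spreads its frames evenly over the low and high amplitude bins, while the second band places all of its frames in the upper bin.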


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Feature Learning With Matrix Factorization Applied to Acoustic Scene Classification

Victor Bisot; Romain Serizel; Slim Essid; Gaël Richard

In this paper, we study the usefulness of various matrix factorization methods for learning features to be used for the specific acoustic scene classification (ASC) problem. A common way of addressing ASC has been to engineer features capable of capturing the specificities of acoustic environments. Instead, we show that better representations of the scenes can be automatically learned from time–frequency representations using matrix factorization techniques. We mainly focus on extensions of principal component analysis and nonnegative matrix factorization, including sparse, kernel-based and convolutive variants and a novel supervised dictionary learning variant. An experimental evaluation is performed on two of the largest ASC datasets available in order to compare and discuss the usefulness of these methods for the task. We show that the unsupervised learning methods provide better representations of acoustic scenes than the best conventional hand-crafted features on both datasets. Furthermore, the introduction of a novel nonnegative supervised matrix factorization model and deep neural networks trained on spectrograms allows us to reach further improvements.


International Conference on Image Processing | 2016

Machine listening techniques as a complement to video image analysis in forensics

Romain Serizel; Victor Bisot; Slim Essid; Gaël Richard

Video is now one of the major sources of information for forensics. However, video documents can originate from various recording devices (CCTV, mobile devices, etc.) with inconsistent quality and can sometimes be recorded in challenging light or motion conditions. Therefore, the amount of information that can be extracted by relying solely on the video image can vary to a great extent. Most videos, however, also include an audio recording. Machine listening can then become a valuable complement to video image analysis in challenging scenarios. In this paper, the authors present a brief overview of some machine listening techniques and their application to the analysis of video documents for forensics. The applicability of these techniques to forensics problems is then discussed in the light of machine listening system performance.


Archive | 2018

Acoustic Features for Environmental Sound Analysis

Romain Serizel; Victor Bisot; Slim Essid; Gaël Richard

Most of the time, it is nearly impossible to differentiate between particular types of sound events from a waveform alone. Therefore, frequency-domain and time-frequency-domain representations have been used for years, providing representations of sound signals that are more in line with human perception. However, these representations are usually too generic and often fail to describe specific content that is present in a sound recording. A lot of work has been devoted to designing features that allow such specific information to be extracted, leading to a wide variety of hand-crafted features. In recent years, owing to the increasing availability of medium-scale and large-scale sound datasets, an alternative approach to feature extraction has become popular: so-called feature learning. Finally, processing the amount of data that is at hand nowadays can quickly become overwhelming. It is therefore of paramount importance to be able to reduce the size of the dataset in the feature space. This chapter describes the general processing chain for converting a sound signal into a feature vector that can be efficiently exploited by a classifier, as well as the relation to features used for speech and music processing.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Supervised group nonnegative matrix factorisation with similarity constraints and applications to speaker identification

Romain Serizel; Victor Bisot; Slim Essid; Gaël Richard

This paper presents supervised feature learning approaches for speaker identification that rely on nonnegative matrix factorisation. Recent studies have shown that group nonnegative matrix factorisation and task-driven supervised dictionary learning can help perform effective feature learning for audio classification problems. This paper proposes to integrate a recent method that relies on group nonnegative matrix factorisation into a task-driven supervised framework for speaker identification. The goal is to capture both the speaker variability and the session variability while exploiting the discriminative learning aspect of the task-driven approach. Results on a subset of the ESTER corpus prove that the proposed approach can be competitive with I-vectors.


International Conference on Acoustics, Speech, and Signal Processing | 2017

Overlapping sound event detection with supervised Nonnegative Matrix Factorization

Victor Bisot; Slim Essid; Gaël Richard

In this paper, we propose a supervised Nonnegative Matrix Factorization (NMF) model for overlapping sound event detection in real-life audio. We start by highlighting the usefulness of non-Euclidean NMF to learn representations for detecting and classifying acoustic events in a multi-label setting. Then, we propose to learn a classifier and the NMF decomposition in a joint optimization problem. This is done with a general β-divergence version of the nonnegative task-driven dictionary learning model. An experimental evaluation is performed on the development set of the DCASE 2016 task 3 challenge. The proposed supervised NMF-based system improves performance over the baseline and the submitted systems.
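For readers unfamiliar with the β-divergence family mentioned above, the following minimal sketch evaluates its standard definition, which includes the Euclidean cost (β = 2), the generalized Kullback-Leibler divergence (β = 1) and the Itakura-Saito divergence (β = 0) as special cases. It only illustrates the cost function, not the supervised model of the paper; the function name `beta_divergence` and the example vectors are assumptions, and the inputs are assumed to be strictly positive.

```python
import numpy as np


def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences between strictly positive x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(x / y - np.log(x / y) - 1))
    # General case, valid for any beta other than 0 and 1.
    return float(np.sum(
        (x**beta + (beta - 1) * y**beta - beta * x * y**(beta - 1))
        / (beta * (beta - 1))
    ))


# Hypothetical example vectors.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.5, 1.0, 3.0])
d_euclidean = beta_divergence(x, y, beta=2)  # half the squared Euclidean distance
d_kl = beta_divergence(x, y, beta=1)
d_is = beta_divergence(x, y, beta=0)
```

Choosing β other than 2 is what makes an NMF cost non-Euclidean in the sense used in the abstract.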


DCASE 2017 - Workshop on Detection and Classification of Acoustic Scenes and Events | 2017

Nonnegative Feature Learning Methods for Acoustic Scene Classification

Victor Bisot; Romain Serizel; Slim Essid; Gaël Richard


International Symposium/Conference on Music Information Retrieval | 2014

Improving music structure segmentation using lag-priors

Geoffroy Peeters; Victor Bisot


International Workshop on Machine Learning for Signal Processing | 2017

Leveraging deep neural networks with nonnegative representations for improved environmental sound classification

Victor Bisot; Romain Serizel; Slim Essid; Gaël Richard

Collaboration


Dive into Victor Bisot's collaborations.

Top Co-Authors

Gaël Richard

Université Paris-Saclay

Slim Essid

Université Paris-Saclay

Romain Serizel

Katholieke Universiteit Leuven
