Publication


Featured research published by Gaël Richard.


International Conference on Multimedia and Expo | 2005

Events Detection for an Audio-Based Surveillance System

Chloé Clavel; Thibaut Ehrette; Gaël Richard

The present research deals with audio event detection in noisy environments for a multimedia surveillance application. In surveillance or homeland security, most systems aiming to automatically detect abnormal situations are based only on visual cues while, in some situations, it may be easier to detect a given event using the audio information. This is in particular the case for the class of sounds considered in this paper: sounds produced by gunshots. The automatic shot detection system presented is based on a novelty detection approach which offers a solution for detecting abnormal audio events in continuous audio recordings of public places. We specifically focus on the robustness of the detection against variable and adverse conditions and on the reduction of the false rejection rate, which is particularly important in surveillance applications. In particular, we take advantage of the potential similarity between the acoustic signatures of different types of weapons by building a hierarchical classification system.
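
To make the novelty-detection idea concrete, here is a minimal sketch (not the authors' system): a Gaussian mixture model is fitted on features of the normal acoustic background, and frames whose log-likelihood under that model falls below a threshold are flagged as abnormal events. The feature dimensionality, number of components, and threshold are illustrative placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_background_model(normal_features, n_components=8, seed=0):
    """Fit a GMM on feature vectors (frames x dims) of normal background audio."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=seed)
    return gmm.fit(normal_features)

def detect_abnormal_frames(gmm, features, threshold):
    """Flag frames whose log-likelihood under the background model is low."""
    log_lik = gmm.score_samples(features)   # one log-likelihood per frame
    return log_lik < threshold              # True = candidate abnormal event

# Usage sketch with random stand-ins for real frame features (e.g. MFCCs):
rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 13))    # "normal" training frames
stream = rng.normal(size=(500, 13))         # frames from the monitored stream
model = fit_background_model(background)
alarms = detect_abnormal_frames(model, stream, threshold=-25.0)
print(f"{alarms.sum()} frames flagged as abnormal")
```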


IEEE Journal of Selected Topics in Signal Processing | 2011

Signal Processing for Music Analysis

Meinard Müller; Daniel P. W. Ellis; Anssi Klapuri; Gaël Richard

Music signal processing may appear to be the junior relation of the large and mature field of speech signal processing, not least because many techniques and representations originally developed for speech have been applied to music, often with good results. However, music signals possess specific acoustic and structural characteristics that distinguish them from spoken language or other nonmusical signals. This paper provides an overview of some signal analysis techniques that specifically address musical dimensions such as melody, harmony, rhythm, and timbre. We will examine how particular characteristics of music signals impact and determine these techniques, and we highlight a number of novel music analysis and retrieval tasks that such processing makes possible. Our goal is to demonstrate that, to be successful, music audio signal processing techniques must be informed by a deep and thorough insight into the nature of music itself.
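
As one illustration of such music-specific analysis (our own example, not one taken from the paper), the harmonic dimension is often exposed by folding an STFT magnitude spectrogram onto the twelve pitch classes, yielding a chromagram:

```python
import numpy as np
from scipy.signal import stft

def chromagram(x, sr, n_fft=4096, fmin=55.0):
    """Fold an STFT magnitude spectrogram onto 12 pitch classes (a simple chromagram)."""
    freqs, _, X = stft(x, fs=sr, nperseg=n_fft)
    mag = np.abs(X)                                   # (n_bins, n_frames)
    chroma = np.zeros((12, mag.shape[1]))
    for k, f in enumerate(freqs):
        if f < fmin:
            continue                                  # skip DC and very low bins
        pitch_class = int(round(12 * np.log2(f / 440.0))) % 12   # A = class 0 here
        chroma[pitch_class] += mag[k]
    return chroma / (chroma.max() + 1e-12)            # normalize for display

# Usage sketch on a synthetic A4 (440 Hz) tone:
sr = 22050
t = np.arange(sr) / sr
c = chromagram(np.sin(2 * np.pi * 440.0 * t), sr)
print(c.argmax(axis=0)[:5])                           # dominant pitch class per frame
```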


IEEE Transactions on Signal Processing | 2005

Fast approximated power iteration subspace tracking

Roland Badeau; Bertrand David; Gaël Richard

This paper introduces a fast implementation of the power iteration method for subspace tracking, based on an approximation that is less restrictive than the well-known projection approximation. This algorithm, referred to as the fast approximated power iteration (API) method, guarantees the orthonormality of the subspace weighting matrix at each iteration. Moreover, it outperforms many subspace trackers related to the power iteration method, such as PAST, NIC, NP3, and OPAST, while having the same computational complexity. The API method is designed for both exponential windows and sliding windows. Our numerical simulations show that sliding windows offer a faster tracking response to abrupt signal variations.
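
For orientation, the sketch below shows the plain power-iteration recursion that such trackers build on: an exponentially weighted correlation matrix is updated with each new snapshot, the current subspace estimate is multiplied by it, and the result is re-orthonormalized by QR. This is not the authors' fast API algorithm; it only illustrates the underlying recursion, at a higher computational cost.

```python
import numpy as np

def power_iteration_tracker(snapshots, r, beta=0.99, seed=0):
    """Track the r-dimensional dominant subspace of a data stream (naive power iteration)."""
    n = snapshots.shape[1]
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.normal(size=(n, r)))   # random orthonormal start
    C = np.zeros((n, n))
    for x in snapshots:                            # x is one length-n snapshot
        C = beta * C + np.outer(x, x)              # exponential-window correlation update
        W, _ = np.linalg.qr(C @ W)                 # one power iteration + re-orthonormalization
    return W

# Usage sketch: signals living mostly in a 2-D subspace of R^8, plus noise.
rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.normal(size=(8, 2)))[0]
data = rng.normal(size=(500, 2)) @ basis.T + 0.05 * rng.normal(size=(500, 8))
W = power_iteration_tracker(data, r=2)
print(np.linalg.norm(W @ W.T - basis @ basis.T))   # small if the subspace is recovered
```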


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

Jean-Louis Durrieu; Gaël Richard; Bertrand David; Cédric Févotte

Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent, it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that aim, we propose a new signal model where the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the estimation of the different parameters is done within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets.
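
Written out schematically (the symbols below are chosen here for illustration and are not the paper's notation), such a model decomposes the observed power spectrogram into a lead-voice part, itself the product of an F0-driven source part and a smooth filter part, plus an accompaniment part:

```latex
% Illustrative source/filter-plus-accompaniment decomposition (our notation):
\hat{S}_X(f,t) \;=\;
  \underbrace{S_{F_0}(f,t)\, S_{\Phi}(f,t)}_{\text{lead voice: source}\times\text{filter}}
  \;+\; \underbrace{S_M(f,t)}_{\text{accompaniment}}
```

In the paper, the parameters of the decomposition are estimated by maximum likelihood, and the melody's fundamental-frequency sequence is then read off the estimated source part.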


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Instrument recognition in polyphonic music based on automatic taxonomies

Slim Essid; Gaël Richard; Bertrand David

We propose a new approach to instrument recognition in the context of real music orchestrations ranging from solos to quartets. The strength of our approach is that it does not require prior musical source separation. Thanks to a hierarchical clustering algorithm exploiting robust probabilistic distances, we obtain a taxonomy of musical ensembles which is used to efficiently classify possible combinations of instruments played simultaneously. Moreover, a wide set of acoustic features is studied, including some new proposals. In particular, signal-to-mask ratios are found to be useful features for audio classification. This study focuses on a single music genre (i.e., jazz) but combines a variety of instruments, among which are percussion and the singing voice. Using a varied database of sound excerpts from commercial recordings, we show that the segmentation of music with respect to the instruments played can be achieved with an average accuracy of 53%.
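
To make the taxonomy idea concrete, here is a minimal sketch that builds a hierarchy over instrument ensembles by agglomerative clustering of per-class mean feature vectors. The class names and features are invented for illustration, and plain Euclidean distance stands in for the robust probabilistic distances used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical per-ensemble mean feature vectors (ensembles x feature dims).
class_names = ["piano solo", "sax solo", "piano+bass", "piano+bass+drums"]
rng = np.random.default_rng(0)
class_means = rng.normal(size=(len(class_names), 20))

# Agglomerative clustering yields a binary tree over the ensembles, which can
# serve as an automatic taxonomy for hierarchical classification.
tree = linkage(class_means, method="average")
print(dendrogram(tree, labels=class_names, no_plot=True)["ivl"])   # leaf order
```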


IEEE Journal of Selected Topics in Signal Processing | 2011

A Musically Motivated Mid-Level Representation for Pitch Estimation and Musical Audio Source Separation

Jean-Louis Durrieu; Bertrand David; Gaël Richard

When designing an audio processing system, the target tasks often influence the choice of a data representation or transformation. Low-level time-frequency representations such as the short-time Fourier transform (STFT) are popular, because they offer meaningful insight into sound properties for a low computational cost. Conversely, when higher-level semantics, such as pitch, timbre or phoneme, are sought after, representations usually tend to enhance their discriminative characteristics, at the expense of their invertibility. They become so-called mid-level representations. In this paper, a source/filter signal model which provides such a mid-level representation is proposed. This representation makes the pitch content of the signal as well as some timbre information available, hence keeping as much information from the raw data as possible. This model is successfully used within a main melody extraction system and a lead instrument/accompaniment separation system. Both frameworks obtained top results at several international evaluation campaigns.
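
As a toy example of a pitch-oriented mid-level representation (not the paper's source/filter model), one can compute a harmonic-summation salience map from an STFT: for each candidate fundamental frequency, the spectral energy at its first few harmonics is accumulated. All parameter values below are arbitrary.

```python
import numpy as np
from scipy.signal import stft

def salience_map(x, sr, f0_grid, n_harmonics=5, n_fft=4096):
    """Harmonic-summation pitch salience: energy summed over harmonics of each F0 candidate."""
    freqs, _, X = stft(x, fs=sr, nperseg=n_fft)
    power = np.abs(X) ** 2                        # (n_bins, n_frames)
    sal = np.zeros((len(f0_grid), power.shape[1]))
    bin_width = freqs[1] - freqs[0]
    for i, f0 in enumerate(f0_grid):
        for h in range(1, n_harmonics + 1):
            k = int(round(h * f0 / bin_width))    # nearest STFT bin of harmonic h
            if k < power.shape[0]:
                sal[i] += power[k] / h            # weight higher harmonics down
    return sal

# Usage sketch: a 220 Hz tone (plus its octave) should peak near the 220 Hz candidate.
sr = 22050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220.0 * t) + 0.5 * np.sin(2 * np.pi * 440.0 * t)
f0_grid = np.arange(100.0, 801.0, 10.0)
S = salience_map(x, sr, f0_grid)
print(f0_grid[S.mean(axis=1).argmax()])           # dominant F0 candidate
```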


Speech Communication | 2008

Fear-type emotion recognition for future audio-based surveillance systems

Chloé Clavel; Ioana Vasilescu; Laurence Devillers; Gaël Richard; Thibaut Ehrette

This paper addresses the issue of automatic emotion recognition in speech. We focus on a type of emotional manifestation which has rarely been studied in speech processing: fear-type emotions occurring during abnormal situations (here, unplanned events where human life is threatened). This study is dedicated to a new application in emotion recognition: public safety. The starting point of this work is the definition and the collection of data illustrating extreme emotional manifestations in threatening situations. For this purpose we develop the SAFE corpus (situation analysis in a fictional and emotional corpus) based on fiction movies. It consists of 7 h of recordings organized into 400 audiovisual sequences. The corpus contains recordings of both normal and abnormal situations and provides a large scope of contexts and therefore a large scope of emotional manifestations. In this way, not only does it address the lack of corpora illustrating strong emotions, but it also forms an interesting support for studying a wide variety of emotional manifestations. We define a task-dependent annotation strategy which has the particularity of describing simultaneously the emotion and the evolution of the situation in context. The emotion recognition system is based on these data and must handle a large scope of unknown speakers and situations in noisy sound environments. It consists of a fear vs. neutral classification. The novelty of our approach lies in dissociated acoustic models of the voiced and unvoiced content of speech, which are then merged at the decision step of the classification system. The results are quite promising given the complexity and the diversity of the data: the error rate is about 30%.
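
The decision-level fusion of the voiced and unvoiced streams can be sketched as follows (a minimal illustration, not the paper's system): one classifier per stream scores an utterance, and the two scores are merged by a weighted sum before thresholding. The GMM classifiers, feature layout, and fusion weight are assumptions made here for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_models(features_by_class, n_components=4, seed=0):
    """One GMM per emotion class ('fear', 'neutral') for one stream (voiced or unvoiced)."""
    return {label: GaussianMixture(n_components=n_components, random_state=seed).fit(feats)
            for label, feats in features_by_class.items()}

def llr(models, frames):
    """Average fear-vs-neutral log-likelihood ratio over the frames of one utterance."""
    return np.mean(models["fear"].score_samples(frames)
                   - models["neutral"].score_samples(frames))

def fused_decision(voiced_models, unvoiced_models, voiced_frames, unvoiced_frames, w=0.5):
    """Merge the voiced and unvoiced streams at the decision step."""
    score = w * llr(voiced_models, voiced_frames) + (1 - w) * llr(unvoiced_models, unvoiced_frames)
    return "fear" if score > 0.0 else "neutral"
```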


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Temporal Integration for Audio Classification With Application to Musical Instrument Classification

Cyril Joder; Slim Essid; Gaël Richard

Nowadays, it appears essential to design automatic indexing tools which provide meaningful and efficient means to describe musical audio content. There is in fact a growing interest in music information retrieval (MIR) applications, amongst which the most popular are related to music similarity retrieval, artist identification, and musical genre or instrument recognition. Current MIR-related classification systems usually do not take into account the mid-term temporal properties of the signal (over several frames) and rely on the assumption that the observations of the features in different frames are statistically independent. The aim of this paper is to demonstrate the usefulness of the information carried by the evolution of these characteristics over time. To that purpose, we propose a number of methods for early and late temporal integration and provide an in-depth experimental study of their interest for the task of musical instrument recognition on solo musical phrases. In particular, the impact of the time horizon over which the temporal integration is performed is assessed both for fixed and variable frame-length analysis. Also, a number of proposed alignment kernels are used for late temporal integration. For all experiments, the results are compared to a state-of-the-art musical instrument recognition system.
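
A simple form of early temporal integration, shown here purely as an illustration, replaces frame-level features with statistics (mean and standard deviation) computed over a mid-term texture window before classification; the window length corresponds to the time horizon whose impact the paper studies.

```python
import numpy as np

def early_integration(frame_features, win=40, hop=20):
    """Summarize frame-level features over mid-term windows by their mean and std.

    frame_features: array of shape (n_frames, n_dims).
    Returns an array of shape (n_windows, 2 * n_dims): [mean | std] per window.
    """
    out = []
    for start in range(0, frame_features.shape[0] - win + 1, hop):
        block = frame_features[start:start + win]
        out.append(np.concatenate([block.mean(axis=0), block.std(axis=0)]))
    return np.array(out)

# Usage sketch: 1000 frames of 13-dimensional features -> mid-term descriptors.
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 13))
print(early_integration(frames).shape)   # (49, 26) with win=40, hop=20
```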


IEEE Transactions on Audio, Speech, and Language Processing | 2006

Musical instrument recognition by pairwise classification strategies

Slim Essid; Gaël Richard; Bertrand David

Musical instrument recognition is an important aspect of music information retrieval. In this paper, statistical pattern recognition techniques are utilized to tackle the problem in the context of solo musical phrases. Ten instrument classes from different instrument families are considered. A large sound database is collected from excerpts of musical phrases acquired from commercial recordings, reflecting different instrument instances, performers, and recording conditions. More than 150 signal processing features are studied, including new descriptors. Two feature selection techniques, inertia ratio maximization with feature space projection and genetic algorithms, are considered in a class-pairwise manner whereby the most relevant features are selected for each instrument pair. For the classification task, experimental results are provided using Gaussian mixture models (GMMs) and support vector machines (SVMs). It is shown that higher recognition rates can be reached with pairwise-optimized subsets of features in association with SVM classification using a radial basis function kernel.
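
The class-pairwise strategy can be sketched as follows: for every pair of instruments, select the features that best separate that pair, train a dedicated RBF-kernel SVM on it, and let all pairwise classifiers vote at test time. The ANOVA-based feature selection below is a simple stand-in for the paper's inertia-ratio-maximization and genetic-algorithm methods, and all parameter values are illustrative.

```python
import numpy as np
from itertools import combinations
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC

def train_pairwise(X, y, k_features=10):
    """One (feature selector, RBF-SVM) pair per class pair, trained on that pair only."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        selector = SelectKBest(f_classif, k=k_features).fit(X[mask], y[mask])
        clf = SVC(kernel="rbf").fit(selector.transform(X[mask]), y[mask])
        models[(a, b)] = (selector, clf)
    return models

def predict_pairwise(models, x):
    """Majority vote over all pairwise classifiers for a single feature vector x."""
    votes = {}
    for (a, b), (selector, clf) in models.items():
        winner = clf.predict(selector.transform(x.reshape(1, -1)))[0]
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Usage: models = train_pairwise(X_train, y_train); label = predict_pairwise(models, x_test)
```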


IEEE Signal Processing Magazine | 2014

Melody Extraction from Polyphonic Music Signals: Approaches, applications, and challenges

Justin Salamon; Emilia Gómez; Daniel P. W. Ellis; Gaël Richard

Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. Over the past decade, melody extraction has emerged as an active research topic, comprising a large variety of proposed algorithms spanning a wide range of techniques. This article provides an overview of these techniques, the applications for which melody extraction is useful, and the challenges that remain. We start with a discussion of "melody" from both musical and signal processing perspectives and provide a case study that interprets the output of a melody extraction algorithm for specific excerpts. We then provide a comprehensive comparative analysis of melody extraction algorithms based on the results of an international evaluation campaign. We discuss issues of algorithm design, evaluation, and applications that build upon melody extraction. Finally, we discuss some of the remaining challenges in melody extraction research in terms of algorithmic performance, development, and evaluation methodology.

Collaboration


Dive into Gaël Richard's collaborations.

Top Co-Authors

Roland Badeau, Institut Mines-Télécom

Slim Essid, Université Paris-Saclay

Simon Leglaive, Université Paris-Saclay

Romain Serizel, Katholieke Universiteit Leuven