Mathieu Lagrange
IRCAM
Publication
Featured research published by Mathieu Lagrange.
IEEE Transactions on Multimedia | 2015
Dan Stowell; Dimitrios Giannoulis; Emmanouil Benetos; Mathieu Lagrange; Mark D. Plumbley
For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening.
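As a companion illustration of the kind of baseline the challenge distributed, below is a minimal bag-of-frames scene classifier (MFCC features pooled per class, one GMM per class). This is a sketch in the spirit of such baselines, not the official challenge code; the file lists, labels, and hyper-parameters are assumptions.

```python
# Sketch of a bag-of-frames scene classifier: MFCCs + one GMM per scene
# class. Illustrative only; not the official DCASE baseline code.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=22050, n_mfcc=20):
    """Load a recording and return its per-frame MFCC vectors."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_scene_models(files_by_class, n_components=8):
    """Fit one diagonal-covariance GMM on the pooled frames of each class."""
    models = {}
    for label, paths in files_by_class.items():
        X = np.vstack([mfcc_frames(p) for p in paths])
        models[label] = GaussianMixture(
            n_components=n_components, covariance_type="diag").fit(X)
    return models

def classify(path, models):
    """Pick the class whose GMM gives the highest mean frame log-likelihood."""
    X = mfcc_frames(path)
    return max(models, key=lambda lab: models[lab].score(X))
```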
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics | 2013
Dimitrios Giannoulis; Emmanouil Benetos; Dan Stowell; Mathias Rossignol; Mathieu Lagrange; Mark D. Plumbley
This paper describes a newly launched public evaluation challenge on acoustic scene classification and detection of sound events within a scene. Systems dealing with such tasks are far from exhibiting human-like performance and robustness. Undermining factors are numerous: the extreme variability of the sources of interest, possible interference between them, complex background noise, and room effects such as reverberation. The proposed challenge is an attempt to help the research community move forward in defining and studying these tasks. Apart from the challenge description, this paper provides an overview of systems submitted to the challenge as well as a detailed evaluation of the results achieved by those systems.
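Event detection tasks of this kind are commonly scored with frame-based metrics. The sketch below computes a frame-based F-measure under assumed conventions (a 10 ms frame grid and events given as (label, onset, offset) tuples); it is illustrative, not the challenge's official evaluation code.

```python
# Hedged sketch of a frame-based F-measure for sound event detection.
# The 10 ms grid and the (label, onset, offset) event format are assumptions.
import numpy as np

def events_to_frames(events, n_frames, labels, hop=0.01):
    """Rasterize (label, onset, offset) events onto a binary frame grid."""
    grid = np.zeros((len(labels), n_frames), dtype=bool)
    for label, onset, offset in events:
        i = labels.index(label)
        grid[i, int(onset / hop):int(np.ceil(offset / hop))] = True
    return grid

def frame_f_measure(ref_events, est_events, n_frames, labels):
    """F-measure over all (class, frame) cells of reference vs. estimate."""
    ref = events_to_frames(ref_events, n_frames, labels)
    est = events_to_frames(est_events, n_frames, labels)
    tp = np.logical_and(ref, est).sum()
    precision = tp / max(est.sum(), 1)
    recall = tp / max(ref.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)
```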
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Mathieu Lagrange; Luis Gustavo Martins; Jennifer Murdoch; George Tzanetakis
The predominant melodic source, frequently the singing voice, is an important component of musical signals. In this paper, we describe a method for extracting the predominant source and corresponding melody from "real-world" polyphonic music. The proposed method is inspired by ideas from computational auditory scene analysis. We formulate predominant melodic source tracking and formation as a graph partitioning problem and solve it using the normalized cut, a global criterion for segmenting graphs that has been used in computer vision. Sinusoidal modeling is used as the underlying representation. A novel harmonicity cue, which we term harmonically wrapped peak similarity, is introduced. Experimental results supporting the use of this cue are presented. In addition, we show results for automatic melody extraction using the proposed approach.
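To make the graph-partitioning step concrete, here is a minimal sketch: sinusoidal peaks become vertices, pairwise similarities become edge weights, and a spectral-clustering relaxation of the normalized cut groups peaks into putative sources. The Gaussian frequency/amplitude affinity below is only a placeholder for the paper's harmonically wrapped peak similarity cue, whose exact definition is not reproduced here.

```python
# Sketch: group sinusoidal peaks into sources via a normalized-cut
# relaxation. The affinity is a generic placeholder, not the paper's
# harmonically wrapped peak similarity.
import numpy as np
from sklearn.cluster import SpectralClustering

def peak_similarity(peaks, sigma_f=0.1, sigma_a=1.0):
    """Affinity between peaks given as (log-frequency, log-amplitude) rows."""
    f, a = peaks[:, 0], peaks[:, 1]
    df = (f[:, None] - f[None, :]) ** 2 / sigma_f ** 2
    da = (a[:, None] - a[None, :]) ** 2 / sigma_a ** 2
    return np.exp(-(df + da))

def group_peaks(peaks, n_sources=2):
    """Partition peaks into putative sources (spectral clustering ~ Ncut)."""
    W = peak_similarity(peaks)
    return SpectralClustering(n_clusters=n_sources,
                              affinity="precomputed").fit_predict(W)
```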
IEEE Transactions on Audio, Speech, and Language Processing | 2007
Mathieu Lagrange; Sylvain Marchand; Jean-Bernard Rault
This paper addresses the problem of tracking partials, i.e., determining the evolution over time of the parameters of a given number of sinusoids with respect to the analyzed audio stream. We first show that the minimal frequency difference heuristic generally used to identify continuities between local maxima of successive short-time spectra can be successfully generalized using the linear prediction formalism to handle modulated sounds such as musical tones with vibrato. The spectral properties of the evolutions in time of the parameters of the partials are next studied to ensure that the parameters of the partials effectively satisfy the slow time-varying constraint of the sinusoidal model. These two improvements are combined in a new algorithm designed for the sinusoidal modeling of polyphonic sounds. The comparative tests show that onsets/offsets of sinusoids as well as closely spaced sinusoids are better identified and stochastic components are better avoided.
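The core of the linear-prediction generalization can be sketched in a few lines: rather than matching a partial to the peak nearest its last observed frequency, fit a short autoregressive model to its recent frequency trajectory and match the peak nearest the predicted next value, so that vibrato-like modulations are followed. The model order, history handling, and deviation threshold below are illustrative assumptions.

```python
# Sketch of LP-based partial tracking: predict the next frequency of a
# partial from its trajectory, then match the nearest spectral peak.
import numpy as np

def lp_predict(track, order=2):
    """Predict the next sample of a 1-D trajectory with an AR(order) fit."""
    x = np.asarray(track, dtype=float)
    if len(x) <= order:
        return x[-1]                      # not enough history: constant model
    # Least-squares system x[t] ~ a1*x[t-1] + ... + ap*x[t-p].
    rows = [x[t - order:t][::-1] for t in range(order, len(x))]
    A, b = np.array(rows), x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(a @ x[-order:][::-1])

def extend_track(track, peak_freqs, max_dev=50.0):
    """Continue a partial with the peak nearest its predicted frequency."""
    peak_freqs = np.asarray(peak_freqs, dtype=float)
    pred = lp_predict(track)
    j = int(np.argmin(np.abs(peak_freqs - pred)))
    return j if abs(peak_freqs[j] - pred) < max_dev else None  # None: track ends
```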
IEEE International Conference on Acoustics, Speech, and Signal Processing | 2010
Rémi Foucard; Jean-Louis Durrieu; Mathieu Lagrange; Gaël Richard
Expressing the similarity between musical streams is a challenging task, as it involves understanding many factors that are most often blended into one information channel: the audio stream. Consequently, separating the musical audio stream into its main melody and its accompaniment may prove useful for grounding the similarity computation in a more robust and expressive representation. In this paper, we show that considering the mixture, an estimate of its main melody, and its accompaniment as separate modalities allows us to propose new ways of defining the similarity between musical streams. In the context of cover version detection, we show that the highest performance is achieved by jointly considering the mixture and the estimated accompaniment. As demonstrated by experiments carried out on two different evaluation databases, this scheme allows the scoring system to focus on the chord progression by considering the accompaniment, while remaining robust to potential separation errors by also considering the mixture.
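A minimal sketch of the late-fusion idea follows: one similarity score per modality, combined with a weight, so the accompaniment emphasizes the chord progression while the mixture hedges against separation errors. The toy chroma scorer (best circular-shift correlation of mean chroma vectors) stands in for the paper's sequence-level scoring system and is purely illustrative.

```python
# Sketch of late fusion of mixture- and accompaniment-based similarities
# for cover detection. The chroma scorer is a toy stand-in.
import numpy as np

def chroma_similarity(chroma_a, chroma_b):
    """Toy scorer: best circular-shift correlation of mean chroma vectors
    (a real system would align full sequences, e.g. with DTW)."""
    a, b = chroma_a.mean(axis=1), chroma_b.mean(axis=1)
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    return max(float(a @ np.roll(b, k)) for k in range(12))

def cover_score(query, candidate, weight=0.5):
    """Fuse the two modalities; `query`/`candidate` hold 12xT chroma arrays."""
    s_mix = chroma_similarity(query["mix"], candidate["mix"])
    s_acc = chroma_similarity(query["acc"], candidate["acc"])
    return weight * s_mix + (1 - weight) * s_acc
```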
Computer Speech & Language | 2013
Alexey Ozerov; Mathieu Lagrange; Emmanuel Vincent
We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its use at the training stage remains limited to static model adaptation. We introduce a new expectation-maximization (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm yields a 3-4% absolute improvement in speaker recognition accuracy when training from matched, unmatched, or multi-condition noisy data. The algorithm is also applicable, with minor modifications, to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data, and to data other than audio.
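Written out from the abstract's description (and not from the authors' code), one EM iteration of uncertainty training for a diagonal-covariance GMM might look as follows, under the generative view y_t = x_t + e_t with known per-frame noise variances.

```python
# One EM iteration of uncertainty training for a diagonal-covariance GMM.
# Y is (T, D) noisy data; U is (T, D) per-frame uncertainty variances.
# A sketch reconstructed from the abstract, not the authors' implementation.
import numpy as np

def uncertainty_em_step(Y, U, weights, means, variances):
    T, D = Y.shape
    K = len(weights)
    # E-step: responsibilities under the noisy model N(mu_k, var_k + U_t).
    log_r = np.empty((T, K))
    for k in range(K):
        v = variances[k] + U                      # (T, D) total variance
        log_r[:, k] = (np.log(weights[k])
                       - 0.5 * np.sum(np.log(2 * np.pi * v)
                                      + (Y - means[k]) ** 2 / v, axis=1))
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r); r /= r.sum(axis=1, keepdims=True)
    # Posterior moments of the clean frames given component k (Wiener-like).
    new_w = np.empty(K); new_mu = np.empty((K, D)); new_var = np.empty((K, D))
    for k in range(K):
        gain = variances[k] / (variances[k] + U)  # (T, D)
        m = means[k] + gain * (Y - means[k])      # posterior mean of x_t
        v = (1.0 - gain) * variances[k]           # posterior variance of x_t
        g = r[:, k:k + 1]
        new_w[k] = r[:, k].mean()
        new_mu[k] = (g * m).sum(axis=0) / g.sum()
        new_var[k] = (g * (v + (m - new_mu[k]) ** 2)).sum(axis=0) / g.sum()
    return new_w, new_mu, new_var
```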
IEEE International Conference on Acoustics, Speech, and Signal Processing | 2007
Mathieu Lagrange; George Tzanetakis
The goal of computational auditory scene analysis (CASA) is to create computer systems that can take as input a mixture of sounds and form packages of acoustic evidence such that each package most likely has arisen from a single sound source. We formulate sound source tracking and formation as a graph partitioning problem and solve it using the normalized cut which is a global criterion for segmenting graphs that has been used in computer vision. It measures both the total dissimilarity between the different groups as well as the total similarity within groups. We describe how this formulation can be used with sinusoidal modeling, a common technique for sound analysis, manipulation and synthesis. Several examples showing the potential of this approach are provided.
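For reference, the normalized cut criterion the abstract refers to is the standard one from computer vision (Shi and Malik); in LaTeX notation:

```latex
% Normalized cut of a weighted graph G = (V, E) into parts A and B:
% cut(A,B) is the total edge weight crossing the partition and
% assoc(A,V) the total weight from A to all vertices.
\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)}
                    + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}
```

Minimizing this criterion penalizes cuts that isolate small, weakly connected groups, which is what makes it attractive for grouping acoustic evidence into sources.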
IEEE International Conference on Acoustics, Speech, and Signal Processing | 2013
Carlo Baugé; Mathieu Lagrange; Joakim Andén; Stéphane Mallat
Environmental sounds are an interesting subject of study for machine audition because of their wide variety of acoustical characteristics and their central presence in our everyday life. They are perceived effortlessly by the human auditory system, whereas state-of-the-art computational systems are far from reaching the same efficiency. In this paper we propose a novel representation of such sounds based on the scattering transform, which is stable to time-warping deformations and invariant to time-shifts, properties useful for classification tasks. This representation is compared to several state-of-the-art approaches on the task of quantifying similarity between environmental sounds.
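A self-contained numpy sketch of the structure of a time scattering transform is given below: a cascade of wavelet-modulus operations followed by time averaging, which yields time-shift invariance while remaining stable to small deformations. Gaussian band-pass filters stand in for a proper wavelet filterbank; this illustrates the architecture, not the authors' implementation (in practice one would use a dedicated library).

```python
# Structural sketch of a two-order time scattering transform with a
# Gaussian band-pass bank standing in for a wavelet filterbank.
import numpy as np

def bandpass_bank(n, centers, bw=0.05):
    """Gaussian band-pass filters on the positive frequency axis (analytic)."""
    freqs = np.fft.fftfreq(n)
    return [np.exp(-((freqs - c) ** 2) / (2 * bw ** 2)) * (freqs > 0)
            for c in centers]

def wavelet_modulus(x, bank):
    """|x * psi| for every filter in the bank (envelope per sub-band)."""
    X = np.fft.fft(x)
    return [np.abs(np.fft.ifft(X * 2 * h)) for h in bank]

def scattering(x, centers=(0.05, 0.1, 0.2, 0.4)):
    """First- and second-order scattering coefficients (time-averaged)."""
    bank = bandpass_bank(len(x), centers)
    order1 = wavelet_modulus(x, bank)
    S1 = [u.mean() for u in order1]                 # first order
    S2 = [v.mean() for u in order1                  # second order: re-filter
          for v in wavelet_modulus(u, bank)]        # the sub-band envelopes
    return np.array(S1 + S2)
```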
IEEE Transactions on Audio, Speech, and Language Processing | 2016
Grégoire Lafay; Mathieu Lagrange; Mathias Rossignol; Emmanouil Benetos; Axel Roebel
This paper introduces a model for simulating environmental acoustic scenes that abstracts temporal structures from audio recordings. This model allows us to explicitly control key morphological aspects of the acoustic scene and to isolate their impact on the performance of the system under evaluation. Thus, more information can be gained on the behavior of an evaluated system, providing guidance for further improvements. To demonstrate its potential, this model is employed to evaluate the performance of nine state-of-the-art sound event detection systems submitted to the IEEE DCASE 2013 Challenge. Results indicate that the proposed scheme is able to successfully build datasets useful for evaluating important aspects of the performance of sound event detection systems, such as their robustness to new recording conditions and to varying levels of background audio.
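The simulation idea can be sketched as follows: sample event onsets from an abstracted temporal model and mix event exemplars over a background at a controlled event-to-background ratio, which is the kind of morphological control the abstract describes. The Poisson onset model and the exemplar format are assumptions for illustration, not the paper's simulator.

```python
# Hedged sketch of scene simulation: Poisson-distributed event onsets,
# event exemplars mixed over a background at a controlled level.
import numpy as np

rng = np.random.default_rng(0)

def simulate_scene(background, exemplars, rate_hz, ebr_db, sr=44100):
    """Return background plus random events scaled ebr_db above it in power."""
    scene = background.copy()
    t = rng.exponential(1.0 / rate_hz)
    while t * sr < len(scene):
        ev = exemplars[rng.integers(len(exemplars))]
        start = int(t * sr)
        ev = ev[:len(scene) - start]
        # Scale the event so its power sits ebr_db above the background's.
        gain = np.sqrt(np.mean(background ** 2) / (np.mean(ev ** 2) + 1e-12))
        scene[start:start + len(ev)] += ev * gain * 10 ** (ebr_db / 20)
        t += rng.exponential(1.0 / rate_hz)
    return scene
```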
IEEE Transactions on Audio, Speech, and Language Processing | 2010
Mathieu Lagrange; Gary P. Scavone; Philippe Depalle
This paper introduces an analysis/synthesis scheme for the reproduction of sounds generated by sustained contact between rigid bodies. The scheme is rooted in a Source/Filter decomposition of the sound, where the filter is described as a set of poles and the source as a set of impulses representing the energy transfer between the interacting objects. Compared to single impacts, sustained contact interactions like rolling and sliding make estimating the parameters of the Source/Filter model challenging for two reasons. First, the objects are almost continuously interacting. Second, the source is generally unknown and must therefore be modeled in a generic way. To tackle these issues, the proposed analysis/synthesis scheme combines advanced analysis techniques for estimating the filter parameters with a flexible model of the source, allowing a wide range of sounds to be modeled. Examples are presented for objects of various shapes and sizes, rolling or sliding over plates of different materials. To demonstrate the versatility of the approach, the system is also applied to the modeling of sounds produced by percussive musical instruments.
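A minimal sketch of the Source/Filter idea: the resonant object is modeled as a set of complex-conjugate pole pairs (an all-pole filter) and the sustained contact as a train of random impulses driving it. The mode frequencies, decay rates, and impulse statistics below are illustrative assumptions, not values estimated by the paper's analysis stage.

```python
# Sketch of Source/Filter synthesis of a rolling-like sound: random
# impulses (source) filtered by resonant pole pairs (filter).
import numpy as np
from scipy.signal import lfilter

def allpole_from_modes(freqs_hz, decays, sr=44100):
    """Denominator polynomial with a conjugate pole pair per resonant mode."""
    poles = [np.exp(-d / sr + 2j * np.pi * f / sr)
             for f, d in zip(freqs_hz, decays)]
    poles += [np.conj(p) for p in poles]
    return np.real(np.poly(poles))               # a-coefficients, a[0] == 1

def rolling_sound(duration=2.0, sr=44100, impulse_rate=200.0):
    """Filter a sparse random impulse train through the modal resonator."""
    n = int(duration * sr)
    source = np.zeros(n)
    hits = np.random.rand(n) < impulse_rate / sr
    source[hits] = np.random.randn(hits.sum())   # random contact strengths
    a = allpole_from_modes([450.0, 1180.0, 2400.0], [30.0, 60.0, 120.0], sr)
    return lfilter([1.0], a, source)
```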