Olivier Gillet
Télécom ParisTech
Publications
Featured research published by Olivier Gillet.
IEEE Transactions on Audio, Speech, and Language Processing | 2008
Olivier Gillet; Gaël Richard
The purpose of this article is to present new advances in music transcription and source separation with a focus on drum signals. A complete drum transcription system is described, which combines information from the original music signal and a drum track enhanced version obtained by source separation. In addition to efficient fusion strategies to take into account these two complementary sources of information, the transcription system integrates a large set of features, optimally selected by feature selection. Concurrently, the problem of drum track extraction from polyphonic music is tackled both by proposing a novel approach based on harmonic/noise decomposition and time/frequency masking and by improving an existing Wiener filtering-based separation method. The separation and transcription techniques presented are thoroughly evaluated on a large public database of music signals. A transcription accuracy between 64.5% and 80.3% is obtained, depending on the drum instrument, for well-balanced mixes, and the efficiency of our drum separation algorithms is illustrated in a comprehensive benchmark.
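The harmonic/noise decomposition with time/frequency masking described above can be illustrated, in spirit, by a generic median-filtering harmonic/percussive separation with soft masks. This is a standard textbook technique, not the authors' exact algorithm, and all parameter values below are arbitrary:

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    # Hann-windowed short-time Fourier transform; rows = frequency bins.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.fft.rfft(np.array(frames), axis=1).T

def running_median(a, k, axis):
    # Sliding median of odd length k along the given axis.
    pad = [(0, 0)] * a.ndim
    pad[axis] = (k // 2, k // 2)
    ap = np.pad(a, pad, mode='edge')
    shifted = [np.take(ap, np.arange(i, i + a.shape[axis]), axis=axis)
               for i in range(k)]
    return np.median(np.stack(shifted), axis=0)

def drum_masks(S, k=17):
    # Harmonic energy is smooth along time, percussive (drum) energy is
    # smooth along frequency; median filtering in each direction separates
    # the two, and soft time/frequency masks follow from the ratio.
    mag = np.abs(S)
    harm = running_median(mag, k, axis=1)
    perc = running_median(mag, k, axis=0)
    total = harm + perc + 1e-10
    return harm / total, perc / total
```

Multiplying the complex spectrogram by the percussive mask and inverting gives an enhanced drum track of the kind the transcription system fuses with the original signal.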
international conference on acoustics, speech, and signal processing | 2004
Olivier Gillet; Gaël Richard
Recent efforts in audio indexing and retrieval in music databases mostly focus on melody. While this is appropriate for polyphonic music signals, specific approaches are needed for systems dealing with percussive audio signals such as those produced by drums, tabla or djembe. Most studies of drum signal transcription focus on sounds taken in isolation. In this paper, we propose several methods for drum loop transcription where the drum signals dataset reflects the variability encountered in modern audio recordings (real and natural drum kits, audio effects, simultaneous instruments, etc.). The approaches described are based on hidden Markov models (HMM) and support vector machines (SVM). Promising results are obtained with an 83.9% correct recognition rate for a simplified taxonomy.
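A minimal sketch of the classification stage: band-energy features extracted per drum stroke, with a nearest-centroid classifier standing in for the paper's SVMs. The features, the two-class labels, and the synthetic "kick"/"snare" signals are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def band_energies(frame, n_bands=4):
    # Log energies in equal-width frequency bands: a crude stand-in
    # for the spectral features used in drum transcription work.
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log(np.array([b.sum() for b in bands]) + 1e-10)

class NearestCentroid:
    # Lightweight stand-in for an SVM: assign each stroke to the
    # class whose mean feature vector is closest.
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None], axis=2)
        return self.classes_[d.argmin(axis=1)]
```

A low sine ("kick") concentrates its energy in the lowest band, while broadband noise ("snare") spreads it evenly, so even this trivial classifier separates the two.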
IEEE Transactions on Circuits and Systems for Video Technology | 2007
Olivier Gillet; Slim Essid; Gaël Richard
The study of the associations between audio and video content has numerous important applications in the fields of information retrieval and multimedia content authoring. In this work, we focus on music videos, which exhibit a broad range of structural and semantic relationships between the music and the video content. To identify such relationships, a two-level automatic structuring of the music and the video is achieved separately. Note onsets are detected from the music signal, along with section changes. The latter is achieved by a novel algorithm which makes use of feature selection and statistical novelty detection approaches based on kernel methods. The video stream is independently segmented to detect changes in motion activity, as well as shot boundaries. Based on this two-level segmentation of both streams, four audio-visual correlation measures are computed. The usefulness of these correlation measures is illustrated by a query-by-video experiment on a database of 100 music videos, which also exhibits interesting genre dependencies.
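Event-level audio segmentation of this kind is commonly done with a spectral-flux onset detector. The following is a generic sketch, not the paper's detector, and the adaptive thresholding rule is an arbitrary choice:

```python
import numpy as np

def spectral_flux_onsets(x, fs, n_fft=1024, hop=512):
    # Magnitude spectrogram, frame by frame.
    win = np.hanning(n_fft)
    mags = np.array([np.abs(np.fft.rfft(x[i:i + n_fft] * win))
                     for i in range(0, len(x) - n_fft, hop)])
    # Spectral flux: half-wave-rectified frame-to-frame magnitude increase.
    flux = np.maximum(np.diff(mags, axis=0), 0.0).sum(axis=1)
    flux = np.concatenate([[0.0], flux])
    # Keep local maxima above an adaptive threshold.
    thr = flux.mean() + 2.0 * flux.std()
    peaks = [i for i in range(1, len(flux) - 1)
             if flux[i] > thr and flux[i] >= flux[i - 1] and flux[i] >= flux[i + 1]]
    return np.array(peaks) * hop / fs  # onset times in seconds
```

Percussive events produce a sudden broadband energy increase, so they stand out sharply in the flux curve.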
intelligent information systems | 2005
Olivier Gillet; Gaël Richard
Recent efforts in audio indexing and music information retrieval mostly focus on melody. While this is appropriate for polyphonic music signals, specific approaches are needed for systems dealing with percussive audio signals such as those produced by drums, tabla or djembé. In this article, we present a complete system for managing a database of drum patterns (or drum loops). Queries in this database are formulated with spoken onomatopoeias: short meaningless words imitating the different sounds of the drum kit. The transcription task necessary to index the database is performed using hidden Markov models (HMM) and support vector machines (SVM) and achieves an 86.4% correct recognition rate. The syllables of spoken queries are recognized, and a statistical model allows the comparison and alignment of the query with the rhythmic sequences stored in the database, in order to return a set of the most relevant drum loops.
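The query/loop matching can be pictured as sequence alignment over drum-event labels. Below, a plain edit distance stands in for the paper's statistical alignment model, and the onomatopoeia-to-instrument mapping is invented for illustration:

```python
def edit_distance(a, b):
    # Dynamic-programming edit distance between two label sequences.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n]

# Hypothetical onomatopoeia-to-drum mapping (illustrative only):
# bd = bass drum, sd = snare drum, hh = hi-hat.
SYLLABLES = {'boom': 'bd', 'poom': 'bd', 'tchak': 'sd', 'pa': 'sd', 'tss': 'hh'}

def rank_loops(query, loops):
    # Rank stored loops by how closely their label sequence matches the query.
    q = [SYLLABLES[s] for s in query.split()]
    return sorted(loops, key=lambda seq: edit_distance(q, seq))
```

A loop whose transcription exactly matches the spoken query has distance zero and is returned first.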
international conference on acoustics, speech, and signal processing | 2005
Olivier Gillet; Gaël Richard
The transcription of a musical performance from the audio signal is often problematic, either because it requires the separation of complex sources, or simply because some important high-level music information cannot be directly extracted from the audio signal. We propose a novel multimodal approach for the transcription of drum sequences using audiovisual features. The transcription is performed by support vector machine (SVM) classifiers, and three different information fusion strategies are evaluated. A correct recognition rate of 85.8% can be achieved for a detailed taxonomy and a fully automated transcription.
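Of the fusion strategies evaluated, decision-level (late) fusion is the simplest to sketch: per-class scores from the audio and video classifiers are merged by a weighted sum. The sketch below is generic, and the weights and scores are made up for illustration:

```python
import numpy as np

def late_fusion(audio_scores, video_scores, w=0.5):
    # Late fusion: weighted sum of per-class scores from the two
    # modalities, then pick the class with the highest fused score.
    fused = w * np.asarray(audio_scores) + (1 - w) * np.asarray(video_scores)
    return int(np.argmax(fused))
```

Shifting the weight toward one modality lets that classifier's opinion dominate, which is how such systems handle an unreliable stream.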
workshop on applications of signal processing to audio and acoustics | 2005
Olivier Gillet; Gaël Richard
This paper presents a novel algorithm to extract the drum track of a polyphonic music signal, based on a harmonic/noise decomposition. This algorithm is causal and does not require prior knowledge or learning. The input signal is split into several frequency bands in which the signal is separated into a deterministic and a stochastic part. The stochastic part can be efficiently used to detect drum events and to resynthesize a drum track. Possible applications include drum transcription, remixing, and independent processing of the rhythmic and melodic components of music signals. Results obtained from real recordings of popular music are presented, as well as a perceptual evaluation of the quality of remixed signals.
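The deterministic/stochastic split can be sketched in a simplified single-band form: treat prominent spectral peaks as the deterministic (sinusoidal) part and keep the residual as the stochastic part. The peak test below is an arbitrary heuristic, not the paper's band-wise method:

```python
import numpy as np

def stochastic_part(frame):
    # Crude deterministic/stochastic split of one frame: zero out the
    # strongest spectral peaks (the deterministic, sinusoidal part) and
    # resynthesize what remains as the stochastic residual.
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag = np.abs(spec)
    floor = np.median(mag)  # rough noise-floor estimate
    for k in range(1, len(mag) - 1):
        # A peak is a local maximum well above the floor.
        if mag[k] > mag[k - 1] and mag[k] > mag[k + 1] and mag[k] > 8 * floor:
            spec[k - 1:k + 2] = 0.0  # remove the peak and its neighbours
    return np.fft.irfft(spec, n=len(frame))
```

A sinusoidal frame loses almost all its energy under this split, while a noise frame passes through nearly untouched, which is exactly the behaviour that makes the residual useful for drum detection.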
international conference on acoustics, speech, and signal processing | 2006
Olivier Gillet; Gaël Richard
Music videos are good examples of multimedia documents in which the structures of the audio and video streams are highly correlated. This paper presents a system that matches these structures and extracts audio-visual correlation measures. The audio and video streams are independently segmented at two levels: shots (sections for audio) and events. Audio segmentation is performed at the event level by detecting onsets, and at the section level by a novelty detection algorithm identifying instrumentation changes. Video segmentation is performed at the event level by detecting changes in the motion intensity descriptor, and at the shot level by using a classical histogram-based shot detection algorithm. Audio-visual correlation measures are computed on the extracted structures. Possible applications include audio/video stream resynchronization, video retrieval from audio content, or classification of music videos by genre.
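A correlation measure over the extracted event structures can be as simple as the fraction of audio events that land near a video event. A toy version, where the matching tolerance is an arbitrary assumption rather than a value from the paper:

```python
def event_overlap(audio_times, video_times, tol=0.1):
    # Fraction of audio events that have a video event within
    # `tol` seconds: a simple audio-visual correlation measure.
    if not audio_times:
        return 0.0
    hits = sum(1 for a in audio_times
               if any(abs(a - v) <= tol for v in video_times))
    return hits / len(audio_times)
```

A high score suggests the video was cut to the music, which is the kind of signal a query-by-video or genre-classification system can exploit.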
international symposium/conference on music information retrieval | 2006
Olivier Gillet; Gaël Richard
international symposium/conference on music information retrieval | 2003
Olivier Gillet; Gaël Richard
international symposium/conference on music information retrieval | 2005
Olivier Gillet; Gaël Richard