
Publication


Featured research published by Cyril Joder.


IEEE Transactions on Audio, Speech, and Language Processing | 2009

Temporal Integration for Audio Classification With Application to Musical Instrument Classification

Cyril Joder; Slim Essid; Gaël Richard

Nowadays, it appears essential to design automatic indexing tools which provide meaningful and efficient means to describe musical audio content. There is in fact a growing interest in music information retrieval (MIR) applications, among which the most popular are related to music similarity retrieval, artist identification, and musical genre or instrument recognition. Current MIR-related classification systems usually do not take into account the mid-term temporal properties of the signal (over several frames) and rely on the assumption that the feature observations in different frames are statistically independent. The aim of this paper is to demonstrate the usefulness of the information carried by the evolution of these characteristics over time. To that purpose, we propose a number of methods for early and late temporal integration and provide an in-depth experimental study of their usefulness for the task of musical instrument recognition on solo musical phrases. In particular, the impact of the time horizon over which the temporal integration is performed is assessed for both fixed- and variable-length frame analysis. A number of proposed alignment kernels are also used for late temporal integration. For all experiments, the results are compared to a state-of-the-art musical instrument recognition system.
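As a concrete illustration of early temporal integration, the sketch below (hypothetical, not the paper's exact feature set) summarizes frame-level features over mid-term texture windows by their mean and standard deviation, so that a classifier sees the temporal evolution rather than statistically independent frames:

```python
import numpy as np

def early_integration(frame_features, window=20, hop=10):
    """Summarize short-term frame features over mid-term texture windows.

    frame_features: (n_frames, n_dims) array, e.g. per-frame MFCCs.
    Returns (n_windows, 2 * n_dims): the per-window mean and standard
    deviation, capturing the mid-term temporal evolution that
    frame-independent classifiers ignore.
    """
    n_frames, n_dims = frame_features.shape
    out = []
    for start in range(0, n_frames - window + 1, hop):
        segment = frame_features[start:start + window]
        out.append(np.concatenate([segment.mean(axis=0), segment.std(axis=0)]))
    return np.array(out)

# 100 frames of 13-dimensional features -> 9 windows of 26 dimensions.
integrated = early_integration(np.random.randn(100, 13))
print(integrated.shape)  # (9, 26)
```

Late integration would instead classify each frame separately and fuse the per-frame decisions over the same window.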


IEEE Transactions on Audio, Speech, and Language Processing | 2011

A Conditional Random Field Framework for Robust and Scalable Audio-to-Score Matching

Cyril Joder; Slim Essid; Gaël Richard

In this paper, we introduce the use of conditional random fields (CRFs) for the audio-to-score alignment task. This framework encompasses the statistical models which are used in the literature and allows for more flexible dependency structures. In particular, it allows observation functions to be computed from several analysis frames. Three different CRF models are proposed for our task, for different choices of tradeoff between accuracy and complexity. Three types of features are used, characterizing the local harmony, note attacks and tempo. We also propose a novel hierarchical approach, which takes advantage of the score structure for an approximate decoding of the statistical model. This strategy reduces the complexity, yielding a better overall efficiency than the classic beam search method used in HMM-based models. Experiments run on a large database of classical piano and popular music exhibit very accurate alignments. Indeed, with the best performing system, more than 95% of the note onsets are detected with a precision finer than 100 ms. We additionally show how the proposed framework can be modified in order to be robust to possible structural differences between the score and the musical performance.
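The left-to-right decoding underlying such alignment models can be sketched as a simple dynamic program, where each audio frame either stays on the current score event or advances to the next one. This is a simplified stand-in for the paper's CRF decoding and hierarchical search, not its exact algorithm:

```python
import numpy as np

def viterbi_align(log_obs):
    """Monotonic DP alignment of audio frames to score events.

    log_obs: (n_frames, n_events) log-score of frame t matching event j.
    Each frame either stays on the current event or advances to the next,
    a simplified left-to-right structure akin to HMM/CRF decoding.
    Returns one event index per frame.
    """
    T, J = log_obs.shape
    dp = np.full((T, J), -np.inf)
    back = np.zeros((T, J), dtype=int)
    dp[0, 0] = log_obs[0, 0]
    for t in range(1, T):
        for j in range(J):
            stay = dp[t - 1, j]
            move = dp[t - 1, j - 1] if j > 0 else -np.inf
            if stay >= move:
                dp[t, j], back[t, j] = stay, j
            else:
                dp[t, j], back[t, j] = move, j - 1
            dp[t, j] += log_obs[t, j]
    # Backtrack from the final event to recover the frame-to-event path.
    path = [J - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]
```

A beam search, as mentioned above for HMM-based systems, would prune the columns of `dp` at each step instead of evaluating them exhaustively.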


IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2010

A comparative study of tonal acoustic features for a symbolic level music-to-score alignment

Cyril Joder; Slim Essid; Gaël Richard

In this paper, we review the acoustic features used for music-to-score alignment and study their influence on performance in a challenging alignment task, where the audio data is polyphonic and may contain percussion. Furthermore, as we aim at using "real world" scores, we follow an approach which does not exploit the rhythm information (considered unreliable) and test its robustness to score errors. We use a unified framework to handle different state-of-the-art features, and propose a simple way to exploit either a model of the feature values or an audio synthesis of a musical score in an audio-to-score alignment system. We confirm that chroma vectors drawn from representations using a logarithmic frequency scale are the most efficient features, leading to good precision even with a simple alignment strategy. Robustness tests also show that the relative performance of the features does not depend on possible musical score degradations.
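A chroma vector of the kind found most efficient here can be sketched by folding a log-frequency magnitude spectrum into 12 pitch classes. The bin layout below (bin 0 on a C, 12 bins per octave) is an illustrative assumption:

```python
import numpy as np

def chroma_from_logfreq(mag):
    """Fold a log-frequency magnitude spectrum into a 12-bin chroma vector.

    mag: magnitudes on a logarithmic frequency axis with 12 bins per
    octave, bin 0 assumed to fall on a C. Octave-equivalent bins are
    summed and the result is normalized to sum to one.
    """
    chroma = np.zeros(12)
    for b, m in enumerate(mag):
        chroma[b % 12] += m
    total = chroma.sum()
    return chroma / total if total > 0 else chroma

# A note with energy at a pitch and two octaves above maps to one class.
spectrum = np.zeros(36)
spectrum[[0, 12, 24]] = 1.0
print(chroma_from_logfreq(spectrum))  # all energy in pitch class 0
```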


IEEE Transactions on Audio, Speech, and Language Processing | 2013

Learning Optimal Features for Polyphonic Audio-to-Score Alignment

Cyril Joder; Slim Essid; Gaël Richard

This paper addresses the design of feature functions for matching a musical recording to the symbolic representation of the piece (the score). These feature functions are defined as dissimilarity measures between the audio observations and template vectors corresponding to the score. By expressing the template construction as a linear mapping from the symbolic to the audio representation, one can learn the feature functions by optimizing the linear transformation. We explore two different learning strategies: the first uses a best-fit criterion (minimum divergence), while the second exploits a discriminative framework based on a Conditional Random Field model (maximum likelihood criterion). We evaluate the influence of the feature functions in an audio-to-score alignment task, on a large database of popular and classical polyphonic music. The results show that with several types of models, using different temporal constraints, the learned mappings have the potential to outperform the classic heuristic mappings. Several representations of the audio observations, along with several distance functions, are compared in this alignment task. Our experiments single out the symmetric Kullback-Leibler divergence as the most effective. Moreover, both the spectrogram and a CQT-based representation turn out to provide very accurate alignments, detecting more than 97% of the onsets with a precision of 100 ms with our most complex system.
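The symmetric Kullback-Leibler divergence that these experiments single out, together with a template built by a linear symbolic-to-audio mapping, can be sketched as follows. The mapping `M` below is random for illustration only; in the paper it is learned:

```python
import numpy as np

def sym_kl(p, q, eps=1e-10):
    """Symmetric Kullback-Leibler divergence between two nonnegative
    spectra, each normalized to sum to one."""
    p = p / p.sum() + eps
    q = q / q.sum() + eps
    # KL(p||q) + KL(q||p) collapses to a single sum.
    return float(np.sum((p - q) * np.log(p / q)))

# Template construction as a linear mapping from symbolic to audio domain:
# a score frame is a binary pitch-activity vector s, its template is M @ s.
rng = np.random.default_rng(0)
n_freq, n_pitch = 64, 12
M = np.abs(rng.standard_normal((n_freq, n_pitch)))  # illustrative mapping
s = np.zeros(n_pitch)
s[[0, 4, 7]] = 1.0                                  # e.g. a C major triad
template = M @ s
print(sym_kl(template, template))  # 0.0 for identical spectra
```

The feature function of a (frame, score event) pair is then the divergence between the observed spectrum and the event's template.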


IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2013

A comparative study on sparsity penalties for NMF-based speech separation: Beyond LP-norms

Cyril Joder; Felix Weninger; David Virette; Björn W. Schuller

In this work, we study the usefulness of several types of sparsity penalties for the task of speech separation using supervised and semi-supervised Nonnegative Matrix Factorization (NMF). We compare different criteria from the literature to two novel penalty functions based on Wiener entropy, in a large-scale evaluation on spontaneous speech overlaid by realistic domestic noise, as well as music and stationary environmental noise corpora. The results show that enforcing the sparsity constraint in the separation phase does not improve perceptual quality. In the learning phase, however, it yields a better estimation of the basis spectra, especially in the case of supervised NMF, where the proposed criteria delivered the best results.
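One classic way to add such a sparsity penalty, shown here with a plain L1 term for illustration rather than the paper's Wiener-entropy criteria, is to include the penalty weight in the multiplicative update of the activations:

```python
import numpy as np

def sparse_nmf(V, rank, n_iter=300, lam=0.0, seed=0):
    """Euclidean NMF, V ~ W @ H, with an optional L1 sparsity weight lam
    on the activations H.

    Standard multiplicative updates; the penalty simply adds lam to the
    denominator of H's update, shrinking small activations toward zero.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = np.abs(rng.standard_normal((F, rank))) + 0.1
    H = np.abs(rng.standard_normal((rank, T))) + 0.1
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Factorize a nonnegative matrix with a mild sparsity penalty.
demo = np.abs(np.random.default_rng(0).standard_normal((20, 30)))
W, H = sparse_nmf(demo, rank=4, lam=0.1)
```

In practice the columns of `W` are often normalized between iterations, since otherwise the model can evade the penalty by rescaling `W` up and `H` down.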


IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2013

Integrating noise estimation and factorization-based speech separation: A novel hybrid approach

Cyril Joder; Felix Weninger; David Virette; Björn W. Schuller

We present a novel method to integrate noise estimates from unsupervised speech enhancement algorithms into a semi-supervised non-negative matrix factorization (NMF) framework. A multiplicative update algorithm is derived to estimate a non-negative noise dictionary given a time-varying background noise estimate with a stationarity constraint. A large-scale, speaker-independent evaluation is carried out on spontaneous speech overlaid with the official CHiME 2011 Challenge corpus of realistic domestic noise, as well as music and stationary environmental noise corpora. In the results, the proposed method delivers a higher signal-to-distortion ratio and better objective perceptual measures than standard semi-supervised NMF or spectral subtraction based on the same noise estimation algorithm, and further gains can be expected from speaker adaptation.
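The semi-supervised part of such a framework can be sketched as follows: a pre-trained speech dictionary is held fixed while the noise bases and all activations are estimated from the mixture. This is a simplified stand-in that omits the paper's noise-estimate integration and stationarity constraint:

```python
import numpy as np

def semi_supervised_nmf(V, W_speech, n_noise=4, n_iter=100, seed=0):
    """Semi-supervised NMF sketch with KL-divergence multiplicative updates.

    The speech dictionary W_speech stays fixed (pre-trained); only the
    noise bases and the activations H are updated from the mixture V.
    Returns the speech estimate and the full reconstruction.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    ks = W_speech.shape[1]
    W = np.hstack([W_speech, np.abs(rng.standard_normal((F, n_noise))) + 0.1])
    H = np.abs(rng.standard_normal((ks + n_noise, T))) + 0.1
    eps = 1e-9
    ones = np.ones_like(V)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        # Update only the noise part of the dictionary.
        W[:, ks:] *= ((V / (W @ H + eps)) @ H[ks:].T) / (ones @ H[ks:].T + eps)
    return W[:, :ks] @ H[:ks], W @ H
```

Separation then keeps the speech reconstruction, typically via a Wiener-style mask built from the two component reconstructions.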


ACM Multimedia | 2010

A conditional random field viewpoint of symbolic audio-to-score matching

Cyril Joder; Slim Essid; Gaël Richard

We present a new approach to symbolic audio-to-score alignment, based on Conditional Random Fields (CRFs). Unlike Hidden Markov Models, these graphical models allow state conditional probabilities to be computed on the basis of several audio frames. The CRF models that we propose exploit this property to take into account the rhythmic information of the musical score. Assuming that the tempo is locally constant, they confront the neighborhood of each frame with several tempo hypotheses. Experiments on a pop-music database show that this use of contextual information leads to a significant improvement in alignment accuracy. In particular, the proportion of detected onsets inside a 100-ms tolerance window increases by more than 10% when a 1-s neighborhood is considered.


IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2013

Off-line refinement of audio-to-score alignment by observation template adaptation

Cyril Joder; Björn W. Schuller

Audio-to-score alignment aims at matching a symbolic representation (the score) to a musical recording. A key problem in this application is the great variability of audio observations which can be explained by a single symbolic element. Whereas most previous works deal with this problem by training or heuristic design of a generic observation model, we propose the adaptation of this model to each musical piece. We exploit a template-based formulation of the observation model and we investigate two strategies for the adaptation of the templates using a Hidden Markov Model for the alignment. Experiments run on a large dataset of popular and classical piano music show that such an approach can lead to a significant improvement of the alignment accuracy compared to the use of a single generic model, even if the latter is trained on real data.


IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) | 2011

Optimizing the mapping from a symbolic to an audio representation for music-to-score alignment

Cyril Joder; Slim Essid; Gaël Richard

A key processing step in music-to-score alignment systems is the estimation of the instantaneous match between an audio observation and the score. We here propose a general formulation of this matching measure, using a linear transformation from the symbolic domain to any time-frequency representation of the audio. We investigate the learning of this mapping for several common audio representations, based on a best-fit criterion. We evaluate the effectiveness of our mapping approach with two different alignment systems, on a large database of popular and classical polyphonic music. The results show that the learning procedure significantly improves the precision of the alignments, compared to common heuristic templates used in the literature.
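The best-fit learning of such a linear mapping can be sketched as ordinary least squares on symbolic/audio training pairs. All data below are synthetic and the dimensions illustrative:

```python
import numpy as np

# Hypothetical training data: binary pitch-activity vectors (the symbolic
# side) and the audio spectra they produce through an unknown nonnegative
# mapping M_true.
rng = np.random.default_rng(1)
n_pitch, n_freq, n_examples = 12, 40, 500
S = (rng.random((n_pitch, n_examples)) < 0.3).astype(float)        # score frames
M_true = np.abs(rng.standard_normal((n_freq, n_pitch)))            # ground truth
X = M_true @ S + 0.01 * rng.standard_normal((n_freq, n_examples))  # noisy spectra

# Best-fit criterion: minimize ||X - M S||_F over M, solved by least squares.
M_hat = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
print(np.abs(M_hat - M_true).max())  # small estimation error
```

With the mapping in hand, a score frame's template is simply `M_hat @ s`, which can then be compared to audio observations during alignment.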


IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2015

A Conditional Random Field system for beat tracking

Thomas Fillon; Cyril Joder; Simon Durand; Slim Essid

In the present work, we introduce a new probabilistic model for the task of estimating beat positions in a musical audio recording, instantiating the Conditional Random Field (CRF) framework. Our approach takes its strength from a sophisticated temporal modeling of the audio observations, accounting for local tempo variations, which are readily represented in the proposed CRF model using well-chosen potentials. The system is experimentally evaluated on 3 datasets of 1394 music excerpts covering various Western music styles, against 4 reference systems and 6 standard evaluation metrics. The results show that the proposed system tracks perceptually coherent pulses and is very effective in estimating beat positions, while further work is needed to find the correct salient tempo.

Collaboration


Dive into Cyril Joder's collaborations.

Top Co-Authors

Slim Essid, Université Paris-Saclay
Gaël Richard, Université Paris-Saclay
Simon Durand, Université Paris-Saclay
Felix Weninger, Technische Universität München