Publication


Featured research published by Axel Roebel.


IEEE Transactions on Audio, Speech, and Language Processing | 2010

Multiple Fundamental Frequency Estimation and Polyphony Inference of Polyphonic Music Signals

Chunghsin Yeh; Axel Roebel; Xavier Rodet

This paper presents a frame-based system for estimating multiple fundamental frequencies (F0s) of polyphonic music signals based on the short-time Fourier transform (STFT) representation. To estimate the number of sources along with their F0s, the noise level is estimated beforehand and all possible combinations among pre-selected F0 candidates are then jointly evaluated. Given a set of F0 hypotheses, their hypothetical partial sequences are derived, taking into account where partial overlap may occur. A score function is used to select the plausible sets of F0 hypotheses. To infer the best combination, hypothetical sources are progressively combined and iteratively verified. A hypothetical source is considered valid if it either explains more energy than the noise or significantly improves the envelope smoothness once the overlapping partials are treated. The proposed system was submitted to the Music Information Retrieval Evaluation eXchange (MIREX) 2007 and 2008 contests, where accuracy was evaluated with respect to the number of sources inferred and the precision of the estimated F0s. The encouraging results demonstrate its competitive performance among state-of-the-art methods.
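The candidate-scoring idea can be sketched in miniature: score each F0 hypothesis by the spectral energy its hypothetical partial sequence explains. The function below is an illustrative, hypothetical simplification (single hypotheses only, no noise-level estimation, joint combination evaluation, or overlap treatment as in the actual system):

```python
import numpy as np

def harmonic_energy_score(spectrum, sr, f0, n_partials=10, tol_hz=20.0):
    """Toy score for a single F0 hypothesis: sum the largest magnitude
    found near each hypothetical partial frequency k*f0."""
    n_fft = (len(spectrum) - 1) * 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    score = 0.0
    for k in range(1, n_partials + 1):
        mask = np.abs(freqs - k * f0) <= tol_hz
        if mask.any():
            score += spectrum[mask].max()
    return score

# Synthetic frame containing two harmonic sources (220 Hz and 330 Hz)
sr, n = 16000, 4096
t = np.arange(n) / sr
x = sum(np.sin(2 * np.pi * k * 220 * t) for k in range(1, 6))
x += sum(np.sin(2 * np.pi * k * 330 * t) for k in range(1, 6))
spec = np.abs(np.fft.rfft(x * np.hanning(n)))

candidates = [220.0, 330.0, 500.0]
scores = {f0: harmonic_energy_score(spec, sr, f0) for f0 in candidates}
```

On this synthetic two-source frame, the true F0s score higher than an unrelated candidate; the full system additionally resolves overlapping partials and source-count ambiguities.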


IEEE Transactions on Audio, Speech, and Language Processing | 2011

Phase Minimization for Glottal Model Estimation

Gilles Degottex; Axel Roebel; Xavier Rodet

In glottal source analysis, the phase minimization criterion has already been proposed to detect excitation instants. As shown in this paper, this criterion can also be used to estimate the shape parameter of a glottal model (e.g., the Liljencrants-Fant model), and not only its time position. Additionally, we show that the shape parameter can be estimated independently of the glottal model's position. The reliability of the proposed methods is evaluated on synthetic signals and compared to that of the IAIF and minimum/maximum-phase decomposition methods. The results are evaluated with respect to the influence of fundamental frequency and noise. The estimation of a glottal model is useful for separating the glottal source from the vocal-tract filter and can therefore be applied in voice transformation and synthesis, as well as in clinical contexts and studies of voice production.


Speech Communication | 2013

Mixed source model and its adapted vocal tract filter estimate for voice transformation and synthesis

Gilles Degottex; Pierre Lanchantin; Axel Roebel; Xavier Rodet

In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Through its use of the LF model, the approach presented here is close to vocoders with exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) by spectral division, as in GSS, we show that a glottal source model can be combined with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate that takes into account the amplitude spectra of both the deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated through listening tests in the context of resynthesis, HMM-based speech synthesis, breathiness modification, and pitch transposition.
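The spectral-division step can be illustrated with a minimal sketch, assuming a hypothetical vtf_by_spectral_division helper: divide the speech amplitude spectrum by the amplitude spectrum of a mixed source (deterministic glottal component plus a noise component). This is a toy illustration of the idea, not the paper's exact formulation:

```python
import numpy as np

def vtf_by_spectral_division(speech_amp, glottal_amp, noise_floor):
    """Toy VTF amplitude estimate in the spirit of GSS: divide the speech
    amplitude spectrum by the amplitude of a mixed source made of a
    deterministic glottal component and a flat noise component."""
    mixed_source_amp = np.sqrt(glottal_amp ** 2 + noise_floor ** 2)
    return speech_amp / mixed_source_amp

# Synthetic check: build "speech" from a known filter and mixed source,
# then recover the filter by division.
freqs = np.linspace(0.0, 8000.0, 256)
vtf_true = 1.0 / (1.0 + (freqs / 2000.0) ** 2)   # smooth "vocal tract"
glottal = 1.0 / (1.0 + freqs / 500.0)            # decaying "glottal" amplitude
noise = 0.05                                     # flat random component
speech = vtf_true * np.sqrt(glottal ** 2 + noise ** 2)

vtf_est = vtf_by_spectral_division(speech, glottal, noise_floor=noise)
```

Because the synthetic "speech" was generated with the same mixed source, the division recovers the filter exactly; with real signals the estimate depends on the quality of the glottal model fit.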


international conference on acoustics, speech, and signal processing | 2013

Syll-O-Matic: An adaptive time-frequency representation for the automatic segmentation of speech into syllables

Nicolas Obin; François Lamare; Axel Roebel

This paper introduces novel paradigms for the segmentation of speech into syllables. The main idea of the proposed method is to use a time-frequency representation of the speech signal and to fuse intensity and voicing measures across frequency regions so as to automatically select the information pertinent for segmentation. The time-frequency representation exploits speech characteristics that depend on the frequency region: intensity profiles provide information in the various frequency regions, and voicing profiles determine which frequency regions are pertinent for the segmentation. The proposed method outperforms conventional methods for the detection of syllable landmarks and boundaries on the TIMIT database of American English, and provides a promising paradigm for the segmentation of speech into syllables.
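The fusion idea can be sketched as follows, with a hypothetical syllable_nuclei helper: sum per-band intensity profiles weighted by a per-band voicing measure, then pick local maxima of the fused profile as syllable-nucleus candidates. This is a simplified illustration of the paradigm, not the Syll-O-Matic algorithm itself:

```python
import numpy as np

def syllable_nuclei(band_energies, voicing_weights, min_gap=5):
    """Fuse per-band intensity profiles (bands on axis 0, frames on
    axis 1) using per-band voicing weights, then pick local maxima of
    the fused profile as syllable-nucleus candidates."""
    fused = (band_energies * voicing_weights[:, None]).sum(axis=0)
    peaks = []
    for t in range(1, len(fused) - 1):
        if fused[t] > fused[t - 1] and fused[t] >= fused[t + 1]:
            if not peaks or t - peaks[-1] >= min_gap:
                peaks.append(t)
    return fused, peaks

# Two synthetic bands with intensity bumps at frames 15 and 40
frames = np.arange(60)
bump = lambda c: np.exp(-0.5 * ((frames - c) / 4.0) ** 2)
bands = np.vstack([bump(15) + 0.8 * bump(40), 0.5 * bump(15)])
fused, nuclei = syllable_nuclei(bands, np.array([1.0, 0.6]))
```

The two bumps are recovered as nucleus candidates; boundary placement between nuclei is a separate step in the actual method.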


international conference on acoustics, speech, and signal processing | 2011

Pitch transposition and breathiness modification using a glottal source model and its adapted vocal-tract filter

Gilles Degottex; Axel Roebel; Xavier Rodet

The transformation of the voiced segments of a speech recording has many applications, such as expressivity synthesis or voice conversion. This paper addresses pitch transposition and breathiness modification by means of an analytic description of the deterministic component of the voice source: a glottal model. Whereas this model is dedicated to voice production, most current methods can be applied to any pseudo-periodic signal. Using the described method, the synthesized voice is thus expected to better preserve naturalness compared to a more generic method. Preference tests show that this method is preferred over two state-of-the-art methods for large pitch transpositions (e.g., one octave). Additionally, it is shown that the breathiness of two male utterances can be controlled.


international conference on acoustics, speech, and signal processing | 2013

Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge

Yuki Mitsufuji; Axel Roebel

This paper concerns a new method of source separation that uses a spatial cue, given by a user or derived from accompanying images, to extract a target sound. The algorithm is based on non-negative tensor factorization (NTF), which decomposes multichannel spectrograms into three matrices. The components of one of the three matrices represent spatial information and are associated with the spatial cue, thus indicating which bins of the spectrogram should be given preference. When a spatial cue is available, this method has a great advantage over conventional PARAFAC-NTF in terms of both computational costs and separation quality, as measured by evaluation metrics such as SDR, SIR and SAR.
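A minimal PARAFAC-NTF can be sketched with multiplicative updates under a Euclidean cost; the sketch below omits the paper's key ingredient, the spatial-cue prior, and simply factorizes a channel x frequency x time tensor into spatial gains Q, spectra W, and activations H. All names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def parafac_ntf(V, n_comp=2, n_iter=200, eps=1e-9):
    """Minimal PARAFAC-NTF with multiplicative updates (Euclidean cost):
    V[i,f,t] ~ sum_k Q[i,k] W[f,k] H[t,k], where Q holds the per-channel
    (spatial) gains that the paper associates with the spatial cue."""
    I, F, T = V.shape
    Q = rng.random((I, n_comp)) + 0.1
    W = rng.random((F, n_comp)) + 0.1
    H = rng.random((T, n_comp)) + 0.1
    for _ in range(n_iter):
        Vh = np.einsum('ik,fk,tk->ift', Q, W, H)
        Q *= np.einsum('ift,fk,tk->ik', V, W, H) / (np.einsum('ift,fk,tk->ik', Vh, W, H) + eps)
        Vh = np.einsum('ik,fk,tk->ift', Q, W, H)
        W *= np.einsum('ift,ik,tk->fk', V, Q, H) / (np.einsum('ift,ik,tk->fk', Vh, Q, H) + eps)
        Vh = np.einsum('ik,fk,tk->ift', Q, W, H)
        H *= np.einsum('ift,ik,fk->tk', V, Q, W) / (np.einsum('ift,ik,fk->tk', Vh, Q, W) + eps)
    return Q, W, H

# Two synthetic sources panned differently across 2 channels
Q0 = np.array([[0.9, 0.1], [0.1, 0.9]])
W0 = rng.random((8, 2))
H0 = rng.random((12, 2))
V = np.einsum('ik,fk,tk->ift', Q0, W0, H0)

Q, W, H = parafac_ntf(V)
err = np.linalg.norm(V - np.einsum('ik,fk,tk->ift', Q, W, H))
```

In the cued variant, components whose columns of Q match the user-given spatial direction would be kept as the target source.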


international conference on acoustics, speech, and signal processing | 2015

On automatic drum transcription using non-negative matrix deconvolution and Itakura-Saito divergence

Axel Roebel; Jordi Pons; Marco Liuni; Mathieu Lagrange

This paper presents an investigation into the detection and classification of drum sounds in polyphonic music and drum loops using non-negative matrix deconvolution (NMD) and the Itakura-Saito divergence. The Itakura-Saito divergence has recently been proposed as especially appropriate for decomposing audio spectra because it is scale invariant, but it has not yet been widely adopted. The article presents new contributions to audio event detection methods using the Itakura-Saito divergence that improve efficiency and numerical stability and simplify the generation of target pattern sets. A new approach for handling background sounds is proposed and, moreover, a new detection criterion based on estimating the perceptual presence of the target class sources is introduced. Experimental results obtained for drum detection in polyphonic music and drum soli demonstrate the beneficial effects of the proposed extensions.
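The Itakura-Saito divergence and a plain (non-convolutive) IS-NMF can be sketched as follows; the paper's method uses matrix deconvolution with drum template patterns, which this illustration does not reproduce:

```python
import numpy as np

def is_div(V, Vh, eps=1e-12):
    """Itakura-Saito divergence d_IS(V|Vh) = sum(V/Vh - log(V/Vh) - 1).
    Scale invariant: d_IS(c*V, c*Vh) == d_IS(V, Vh) for any c > 0."""
    R = (V + eps) / (Vh + eps)
    return np.sum(R - np.log(R) - 1.0)

def nmf_is(V, n_comp=2, n_iter=200, seed=0, eps=1e-12):
    """IS-NMF with the standard multiplicative updates: V ~ W @ H."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_comp)) + 0.1
    H = rng.random((n_comp, T)) + 0.1
    for _ in range(n_iter):
        Vh = W @ H + eps
        H *= (W.T @ (V / Vh ** 2)) / (W.T @ (1.0 / Vh))
        Vh = W @ H + eps
        W *= ((V / Vh ** 2) @ H.T) / ((1.0 / Vh) @ H.T)
    return W, H

# Fit an exactly low-rank non-negative "spectrogram"
rng2 = np.random.default_rng(3)
W0 = rng2.random((6, 2)) + 0.1
H0 = rng2.random((2, 12)) + 0.1
V = W0 @ H0

W, H = nmf_is(V, n_comp=2)
d_fit = is_div(V, W @ H)
d_flat = is_div(V, np.full_like(V, V.mean()))
```

The scale invariance means loud and quiet spectral regions contribute comparably to the cost, which is the property that motivates its use for audio spectra.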


IEEE Transactions on Audio, Speech, and Language Processing | 2016

A Morphological Model for Simulating Acoustic Scenes and Its Application to Sound Event Detection

Grégoire Lafay; Mathieu Lagrange; Mathias Rossignol; Emmanouil Benetos; Axel Roebel

This paper introduces a model for simulating environmental acoustic scenes that abstracts temporal structures from audio recordings. This model allows us to explicitly control key morphological aspects of the acoustic scene and to isolate their impact on the performance of the system under evaluation. Thus, more information can be gained on the behavior of an evaluated system, providing guidance for further improvements. To demonstrate its potential, this model is employed to evaluate the performance of nine state-of-the-art sound event detection systems submitted to the IEEE DCASE 2013 Challenge. Results indicate that the proposed scheme is able to successfully build datasets useful for evaluating important aspects of the performance of sound event detection systems, such as their robustness to new recording conditions and to varying levels of background audio.
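The simulation principle can be sketched with a hypothetical simulate_scene helper: overlay event clips on a background track at controlled onsets, keeping the ground-truth annotations that make systematic evaluation possible. The real model controls much richer morphology (event density, class balance, event-to-background level):

```python
import numpy as np

def simulate_scene(background, events, onsets, event_gain=1.0):
    """Overlay event clips onto a background track at the given onset
    samples; return the mixture and ground-truth (onset, length)
    annotations for evaluating detection systems."""
    scene = background.copy()
    annotations = []
    for clip, onset in zip(events, onsets):
        end = min(onset + len(clip), len(scene))
        scene[onset:end] += event_gain * clip[: end - onset]
        annotations.append((onset, end - onset))
    return scene, annotations

sr = 8000
rng = np.random.default_rng(2)
background = 0.01 * rng.standard_normal(sr)              # 1 s noise bed
beep = np.sin(2 * np.pi * 1000 * np.arange(800) / sr)    # 0.1 s tone event
scene, ann = simulate_scene(background, [beep, beep], [1000, 4000])
```

Because the annotations are generated rather than hand-labeled, any morphological parameter can be varied while the ground truth stays exact.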


international conference on acoustics, speech, and signal processing | 2011

Function of Phase-Distortion for glottal model estimation

Gilles Degottex; Axel Roebel; Xavier Rodet

In voice analysis, estimating the parameters of a glottal model (an analytic description of the deterministic component of the glottal source) is a challenging problem, whether for assessing voice quality in clinical use or for modeling voice production for speech transformation and synthesis with a priori constraints. In this paper, we first describe the Function of Phase-Distortion (FPD), which characterizes the shape of the periodic pulses of the glottal source independently of its other features. Then, using the FPD, we describe two methods to estimate a shape parameter of the Liljencrants-Fant glottal model. In a comparison with state-of-the-art methods using electroglottographic signals, we show that one of these methods outperforms the others.


international conference on acoustics, speech, and signal processing | 2010

Joint estimate of shape and time-synchronization of a glottal source model by phase flatness

Gilles Degottex; Axel Roebel; Xavier Rodet

A new method is proposed to jointly estimate the shape parameter of a glottal model and its time position in a voiced segment. We show that the idea of phase flatness (or phase minimization) used in the most robust glottal closure instant (GCI) detection methods can be generalized to estimate the shape of the glottal model. In this paper we evaluate the proposed method using synthetic signals, assessing its reliability with respect to fundamental frequency and noise. The estimation of the glottal source is useful for voice analysis (e.g., separation of the glottal source and vocal-tract filter), voice transformation, and synthesis.
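The phase-flatness criterion can be illustrated in a reduced setting (time position only, no shape parameter): when the measured harmonic phases are compensated with the correct time shift, all partials align and the criterion reaches its minimum. The names and the criterion variant below are illustrative assumptions:

```python
import numpy as np

sr, n, f0 = 16000, 2048, 125.0          # period = 128 samples, exact FFT bins
t = np.arange(n)
t0 = 40                                  # true pulse alignment (samples)
x = sum(np.cos(2 * np.pi * k * f0 * (t - t0) / sr) for k in range(1, 9))

spec = np.fft.rfft(x)
bins = (np.arange(1, 9) * f0 * n / sr).astype(int)   # harmonic bin indices
phi = np.angle(spec[bins])                           # measured harmonic phases

def flatness(shift):
    """1 - |mean unit phasor| of the shift-compensated harmonic phases:
    0 when all partials align (phase-flat), larger otherwise."""
    r = phi + 2 * np.pi * bins * shift / n
    return 1.0 - np.abs(np.exp(1j * r).mean())

shifts = np.arange(128)                  # search over one period
best = int(shifts[np.argmin([flatness(s) for s in shifts])])
```

The grid search recovers the true alignment; the paper's contribution is that the same criterion, evaluated against a glottal model spectrum, also constrains the model's shape parameter.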

Collaboration


Top co-authors of Axel Roebel:

Xavier Rodet (Centre national de la recherche scientifique)
Alvin W.Y. Su (National Cheng Kung University)