Gilles Degottex
University of Crete
Publications
Featured research published by Gilles Degottex.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014
Gilles Degottex; John Kane; Thomas Drugman; Tuomo Raitio; Stefan Scherer
Speech processing algorithms are often developed demonstrating improvements over the state of the art, but sometimes at the cost of high complexity. This makes algorithm reimplementations based on the literature difficult, and thus reliable comparisons between published results and current work are hard to achieve. This paper presents a new collaborative and freely available repository for speech processing algorithms called COVAREP, which aims to provide fast and easy access to new speech processing algorithms and thus to facilitate research in the field. We envisage that COVAREP will allow more reproducible research by strengthening complex implementations through shared contributions and openly available code which can be discussed, commented on, and corrected by the community. Presently COVAREP contains contributions from five distinct laboratories, and we encourage contributions from across the speech processing research field. In this paper, we provide an overview of the current offerings of COVAREP and also include a demonstration of the algorithms through an emotion classification experiment.
IEEE Transactions on Audio, Speech, and Language Processing | 2011
Gilles Degottex; Axel Roebel; Xavier Rodet
In glottal source analysis, the phase minimization criterion has already been proposed to detect excitation instants. As shown in this paper, this criterion can also be used to estimate the shape parameter of a glottal model (e.g., the Liljencrants-Fant model) and not only its time position. Additionally, we show that the shape parameter can be estimated independently of the glottal model's position. The reliability of the proposed methods is evaluated with synthetic signals and compared to that of the IAIF and minimum/maximum-phase decomposition methods. The results of the methods are evaluated according to the influence of the fundamental frequency and noise. The estimation of a glottal model is useful for the separation of the glottal source and the vocal-tract filter, and can therefore be applied to voice transformation and synthesis, as well as in a clinical context or for the study of voice production.
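To make the criterion concrete, here is a minimal Python sketch of the phase-minimization idea, not the authors' implementation: the measured harmonic phases are compensated by a candidate model's phases, and the shape leaving the flattest residual (after discounting the best linear-phase term, i.e., independently of the time position) is kept. The glottal_model_phase helper is a hypothetical stand-in for the phase response of an LF model.

```python
import numpy as np

def glottal_model_phase(k, shape):
    # Hypothetical stand-in for the phase response of a glottal model
    # (e.g. LF) at harmonic k; a real implementation would evaluate the
    # LF model spectrum here.
    return -np.arctan(shape * k)

def shape_by_phase_minimization(harmonic_phases, candidate_shapes):
    # Keep the shape whose model phases, once removed from the measured
    # harmonic phases, leave the flattest residual after discounting the
    # best linear-phase term (i.e. independently of the time position).
    k = np.arange(1, len(harmonic_phases) + 1)
    best_shape, best_cost = None, np.inf
    for s in candidate_shapes:
        residual = np.unwrap(harmonic_phases - glottal_model_phase(k, s))
        slope, intercept = np.polyfit(k, residual, 1)
        cost = np.var(residual - slope * k - intercept)
        if cost < best_cost:
            best_shape, best_cost = s, cost
    return best_shape

# Toy check: phases generated by the model itself, plus a linear term
# standing for an arbitrary time position, are recovered.
k = np.arange(1, 21)
phases = glottal_model_phase(k, 0.7) + 0.02 * k
print(shape_by_phase_minimization(phases, np.linspace(0.1, 2.0, 96)))
```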
IEEE Transactions on Audio, Speech, and Language Processing | 2013
Gilles Degottex; Yannis Stylianou
Voice models often use frequency limits to split the speech spectrum into two or more voiced/unvoiced frequency bands. However, from a voice production point of view, the amplitude spectrum of the voiced source decreases smoothly, without any abrupt frequency limit. Accordingly, multiband models struggle to estimate these limits and, as a consequence, artifacts can degrade the perceived quality. Using a linear frequency basis adapted to the non-stationarities of the speech signal, the Fan Chirp Transformation (FChT) has demonstrated harmonicity at frequencies higher than usually observed with the DFT, which motivates full-band modeling. The previously proposed adaptive Quasi-Harmonic Model (aQHM) offers even more flexibility than the FChT by using a non-linear frequency basis. In the current paper, exploiting the properties of aQHM, we describe a full-band adaptive Harmonic Model (aHM), along with detailed descriptions of its corresponding algorithms, for the estimation of harmonics up to the Nyquist frequency. Formal listening tests show that speech reconstructed using aHM is nearly indistinguishable from the original. Experiments with synthetic signals also show that the proposed aHM globally outperforms previous sinusoidal and harmonic models in terms of precision in estimating the sinusoidal parameters. As a perspective, such precision is interesting for building higher-level models upon the sinusoidal parameters, such as spectral envelopes for speech synthesis.
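As a rough illustration of what a full-band harmonic model computes, here is a minimal synthesis sketch (assuming parameter tracks already interpolated to the sample rate); the adaptive frequency basis and the estimation algorithms that make aHM accurate are not reproduced here.

```python
import numpy as np

def harmonic_synthesis(f0_track, amps, phases0, fs):
    # Sum of harmonically related sinusoids; harmonics above Nyquist are
    # muted sample by sample, so the model can run full-band.
    n_harm, n = amps.shape
    phi0 = 2 * np.pi * np.cumsum(f0_track) / fs   # fundamental's phase
    y = np.zeros(n)
    for k in range(1, n_harm + 1):
        audible = (k * f0_track) < fs / 2          # keep below Nyquist
        y += audible * amps[k - 1] * np.cos(k * phi0 + phases0[k - 1])
    return y

# Toy usage: a 180->220 Hz glide with 1/k harmonic amplitudes.
fs, n = 16000, 16000
f0 = np.linspace(180.0, 220.0, n)
n_harm = int((fs / 2) / f0.max())
amps = np.tile((1.0 / np.arange(1, n_harm + 1))[:, None], (1, n))
y = harmonic_synthesis(f0, amps, np.zeros(n_harm), fs)
```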
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010
Pierre Lanchantin; Gilles Degottex; Xavier Rodet
This paper introduces an HMM-based speech synthesis system which uses a new method for the Separation of Vocal-tract and Liljencrants-Fant model plus Noise (SVLN). The glottal source is separated into two components: a deterministic Liljencrants-Fant glottal waveform model and modulated Gaussian noise. This glottal source is first estimated and then used in the vocal-tract estimation procedure. The parameters of the source and the vocal tract are then included in contextual HMM models of phonemes. SVLN is promising for voice transformation in the synthesis of expressive speech, since it allows independent control of vocal-tract and glottal-source properties. The synthesis results are finally discussed and subjectively evaluated.
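A minimal sketch of such a two-component source follows, with a Rosenberg-style raised cosine standing in for the LF pulse (an assumption made for brevity; SVLN uses the actual LF model, whose closed form is more involved):

```python
import numpy as np

def glottal_pulse(n_period, open_quotient=0.6):
    # Stand-in glottal flow pulse (Rosenberg-style raised cosine over the
    # open phase, abrupt closure); SVLN itself uses the LF model.
    pulse = np.zeros(n_period)
    n_open = int(open_quotient * n_period)
    t = np.arange(n_open)
    pulse[:n_open] = 0.5 * (1.0 - np.cos(np.pi * t / n_open))
    return pulse

def mixed_glottal_source(f0, fs, duration, noise_level=0.1):
    # Deterministic pulse train plus Gaussian noise amplitude-modulated
    # by the pulse shape, in the spirit of the SVLN source model.
    n_period = int(round(fs / f0))
    n = int(fs * duration)
    det = np.tile(glottal_pulse(n_period), n // n_period + 1)[:n]
    noise = np.random.randn(n) * noise_level * (0.2 + det)
    return det + noise

src = mixed_glottal_source(f0=120.0, fs=16000, duration=0.5)
```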
Speech Communication | 2013
Gilles Degottex; Pierre Lanchantin; Axel Roebel; Xavier Rodet
In current methods for voice transformation and speech synthesis, the vocal tract filter is usually assumed to be excited by a flat amplitude spectrum. In this article, we present a method using a mixed source model defined as a mixture of the Liljencrants-Fant (LF) model and Gaussian noise. Through its use of the LF model, the approach presented in this work is close to vocoders with an exogenous input, such as ARX-based methods or the Glottal Spectral Separation (GSS) method. Such approaches are dedicated to voice processing and promise improved naturalness compared to generic signal models. To estimate the Vocal Tract Filter (VTF) using spectral division, as in GSS, we show that a glottal source model can be used with any envelope estimation method, in contrast to the ARX approach, where a least-squares AR solution is used. We therefore derive a VTF estimate which takes into account the amplitude spectra of both the deterministic and random components of the glottal source. The proposed mixed source model is controlled by a small set of intuitive and independent parameters. The relevance of this voice production model is evaluated, through listening tests, in the context of resynthesis, HMM-based speech synthesis, breathiness modification, and pitch transposition.
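A minimal sketch of the spectral-division step, under the simplifying assumption that the source model's amplitude spectrum is already sampled on the same DFT bins as the analyzed frame:

```python
import numpy as np

def vtf_by_spectral_division(frame, source_spectrum, eps=1e-12):
    # Estimate the Vocal Tract Filter magnitude by dividing the observed
    # amplitude spectrum by the amplitude spectrum of the glottal source
    # model, in the spirit of GSS-style separation.
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    return np.abs(spec) / (np.abs(source_spectrum) + eps)

# Toy usage: a crude -6 dB/octave magnitude stands in for the source.
fs, n = 16000, 512
frame = np.random.randn(n)
freqs = np.fft.rfftfreq(n, 1.0 / fs)
source = 1.0 / (1.0 + freqs / 100.0)
vtf_mag = vtf_by_spectral_division(frame, source)
```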
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011
Gilles Degottex; Axel Roebel; Xavier Rodet
The transformation of the voiced segments of a speech recording has many applications, such as expressivity synthesis or voice conversion. This paper addresses pitch transposition and the modification of breathiness by means of an analytic description of the deterministic component of the voice source, a glottal model. Whereas this model is dedicated to voice production, most current methods can be applied to any pseudo-periodic signal. Using the described method, the synthesized voice is thus expected to better preserve naturalness compared to a more generic method. Preference tests show that this method is preferred for large pitch transpositions (e.g., one octave) over two state-of-the-art methods. Additionally, it is shown that the breathiness of two male utterances can be controlled.
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011
Gilles Degottex; Axel Roebel; Xavier Rodet
In voice analysis, estimating the parameters of a glottal model, an analytic description of the deterministic component of the glottal source, is a challenging problem, whether for assessing voice quality in clinical use or for modeling voice production with a priori constraints for speech transformation and synthesis. In this paper, we first describe the Function of Phase-Distortion (FPD), which characterizes the shape of the periodic pulses of the glottal source independently of its other features. Then, using the FPD, we describe two methods to estimate a shape parameter of the Liljencrants-Fant glottal model. By comparison with state-of-the-art methods using Electro-Glotto-Graphic signals, we show that one of these methods outperforms the compared methods.
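A minimal sketch of the underlying intuition, assuming a simplified definition of phase distortion (not necessarily the exact FPD of the paper): differencing consecutive harmonic phases cancels the linear-phase term carried by the pulse position, leaving a quantity related only to the pulse shape.

```python
import numpy as np

def phase_distortion(harmonic_phases):
    # Simplified phase-distortion measure: the difference between
    # consecutive harmonic phases, minus the fundamental's phase. A pure
    # time shift adds k*d to harmonic k, which cancels out here.
    p = np.asarray(harmonic_phases)
    pd = p[1:] - p[:-1] - p[0]
    return np.angle(np.exp(1j * pd))   # wrap to (-pi, pi]

# Check the position invariance: a time shift leaves the result intact.
k = np.arange(1, 11)
base = -0.3 * k**0.8
print(np.allclose(phase_distortion(base), phase_distortion(base + 0.5 * k)))
```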
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014
George P. Kafentzis; Gilles Degottex; Olivier Rosec; Yannis Stylianou
In this paper, a simple method for pitch-scale modification of speech is presented, based on a recently proposed model for the AM-FM decomposition of speech signals, referred to as the adaptive Harmonic Model (aHM). The aHM models speech as a sum of harmonically related sinusoids that can adapt to the local characteristics of the signal. It has been shown that this model provides high-quality reconstruction of speech, and it can thus also provide high-quality pitch-scale modifications. For the latter, the amplitude envelope is estimated using the Discrete All-Pole (DAP) method, and the phase envelope estimation is performed by utilizing the concept of relative phase. Formal listening tests on a database of several languages show that the synthetic pitch-scaled waveforms are natural and free of some common artifacts encountered in other state-of-the-art models, such as HNM and STRAIGHT.
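A minimal sketch of the amplitude side of such a pitch-scale modification, with a piecewise-linear magnitude envelope standing in for the DAP envelope and phases left out:

```python
import numpy as np

def pitch_scale_harmonics(f0, env_freqs, env_amps, alpha):
    # Move harmonics from k*f0 to k*alpha*f0 and re-sample the amplitude
    # envelope at the new frequencies, preserving timbre. The paper uses
    # a DAP envelope and the relative-phase concept for the phases.
    new_f0 = alpha * f0
    n_harm = int(env_freqs[-1] // new_f0)
    new_freqs = new_f0 * np.arange(1, n_harm + 1)
    new_amps = np.interp(new_freqs, env_freqs, env_amps)
    return new_f0, new_freqs, new_amps

# Toy usage: a decaying envelope up to 8 kHz, transposed up a fifth.
env_f = np.linspace(0.0, 8000.0, 257)
env_a = 1.0 / (1.0 + env_f / 500.0)
new_f0, freqs, amps = pitch_scale_harmonics(140.0, env_f, env_a, alpha=1.5)
```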
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2010
Gilles Degottex; Axel Roebel; Xavier Rodet
A new method is proposed to jointly estimate the shape parameter of a glottal model and its time position in a voiced segment. We show that the idea of phase flatness (or phase minimization), used in the most robust Glottal Closure Instant detection methods, can be generalized to estimate the shape of the glottal model. In this paper we evaluate the proposed method using synthetic signals; its reliability with respect to fundamental frequency and noise is evaluated. The estimation of the glottal source is useful for voice analysis (e.g., the separation of the glottal source and the vocal-tract filter), voice transformation, and synthesis.
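A minimal sketch of the generalization, with model_phase as a hypothetical stand-in for the glottal model's phase response: the usual scan over candidate time positions is simply extended to a joint scan over positions and shape values.

```python
import numpy as np

def model_phase(k, shape):
    # Hypothetical shape-dependent phase curve standing in for the phase
    # response of a glottal model such as Liljencrants-Fant.
    return -np.arctan(shape * k)

def joint_position_shape(harmonic_phases, positions, shapes):
    # Generalized phase flatness: keep the (position, shape) pair for
    # which the model-compensated phases are flattest, measured here by
    # the circular variance of the residual phases.
    k = np.arange(1, len(harmonic_phases) + 1)
    best, best_cost = (None, None), np.inf
    for t in positions:                 # positions in fractions of a period
        for s in shapes:
            residual = harmonic_phases - model_phase(k, s) - 2 * np.pi * k * t
            cost = 1.0 - np.abs(np.mean(np.exp(1j * residual)))
            if cost < best_cost:
                best, best_cost = (t, s), cost
    return best

# Toy check: phases built with shape 0.9 at position 0.25 are recovered.
k = np.arange(1, 16)
phases = model_phase(k, 0.9) + 2 * np.pi * k * 0.25
print(joint_position_shape(phases, np.linspace(0, 1, 41), np.linspace(0.1, 2.0, 96)))
```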
International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013
George P. Kafentzis; Gilles Degottex; Olivier Rosec; Yannis Stylianou
In this paper, a simple method for time-scale modification of speech is presented, based on a recently proposed model for the AM-FM decomposition of speech signals, referred to as the adaptive Harmonic Model (aHM). A full-band speech analysis/synthesis system based on the aHM representation is built, without the need to separate a deterministic and/or a stochastic component from the speech signal. The aHM models speech as a sum of harmonically related sinusoids that can adapt to the local characteristics of the signal and provide accurate instantaneous amplitude, frequency, and phase trajectories. Because of its high-quality representation and reconstruction of speech, aHM can provide high-quality time-scale modifications. Informal listening shows that the synthetic time-scaled waveforms are natural and free of some common artifacts encountered in other state-of-the-art models, such as “metallic quality”, chorusing, or musical noise.
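A minimal sketch of the parameter-domain view of time-scaling, assuming per-sample aHM-style trajectories are available: the tracks are resampled along a warped time axis, and the harmonic model is then resynthesized from them.

```python
import numpy as np

def time_scale_tracks(f0_track, amp_tracks, rate):
    # Stretch or compress parameter trajectories by resampling them along
    # a warped time axis (rate < 1 slows down, rate > 1 speeds up).
    # Frequencies are untouched, so pitch and timbre are preserved;
    # resynthesis then runs the harmonic model on the warped tracks.
    n = len(f0_track)
    t_out = np.linspace(0.0, n - 1.0, int(round(n / rate)))
    t_in = np.arange(n)
    f0_new = np.interp(t_out, t_in, f0_track)
    amps_new = np.vstack([np.interp(t_out, t_in, a) for a in amp_tracks])
    return f0_new, amps_new

# Toy usage: slow a 1 s trajectory down to 1.5 s (rate = 2/3).
fs, n = 16000, 16000
f0 = np.linspace(180.0, 220.0, n)
amps = np.tile((1.0 / np.arange(1, 31))[:, None], (1, n))
f0_slow, amps_slow = time_scale_tracks(f0, amps, rate=2 / 3)
```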