Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Vincent Colotte is active.

Publication


Featured research published by Vincent Colotte.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Automatic enhancement of speech intelligibility

Vincent Colotte; Yves Laprie

This paper presents a speech signal transformation which slows down speech signals selectively and enhances some important acoustic cues. This transformation can be used not only for hearing aids but also for second language acquisition by facilitating oral comprehension. Selective slowing down relies on the use of the TD-PSOLA synthesis method. A pitch-marking algorithm was designed so that this method can be applied automatically. The strategy used to control slowing down exploits a spectral variation function which locates rapid spectral changes. The enhancement simply consists of amplifying stop bursts and unvoiced fricatives. These acoustic cues are detected automatically through the examination of energy criteria. This approach was evaluated in the context of second language acquisition, more precisely by evaluating improvements in oral comprehension. The transformations triggered properly, i.e., the modified signal regions were those expected to be modified, and experiments show that oral comprehension is improved.
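
As a rough illustration of the processing described above, the sketch below computes a spectral variation function from consecutive short-time magnitude spectra and maps frames with rapid spectral change to a larger time-scale factor. It is a minimal sketch only: the frame length, hop size, threshold and slowing factor are illustrative assumptions, and the TD-PSOLA modification and burst/fricative amplification steps are not shown.

```python
# Minimal sketch (not the paper's implementation): a spectral variation
# function used to locate rapid spectral changes, and a mapping from that
# function to per-frame slowing factors. All numeric settings are
# illustrative assumptions.
import numpy as np

def spectral_variation(signal, sr, frame_len=0.025, hop=0.010):
    """Per-frame spectral change: 1 - cosine similarity of consecutive magnitude spectra."""
    n, h = int(frame_len * sr), int(hop * sr)
    window = np.hanning(n)
    frames = [signal[i:i + n] * window for i in range(0, len(signal) - n, h)]
    mags = [np.abs(np.fft.rfft(f)) + 1e-12 for f in frames]
    sv = [0.0]
    for prev, cur in zip(mags[:-1], mags[1:]):
        cos = np.dot(prev, cur) / (np.linalg.norm(prev) * np.linalg.norm(cur))
        sv.append(1.0 - cos)          # high value = rapid spectral change
    return np.asarray(sv)

def slowing_factors(sv, threshold=0.15, slow=1.5):
    """Frames with rapid spectral change get a larger time-scale factor."""
    return np.where(sv > threshold, slow, 1.0)
```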


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Acoustic-visual synthesis technique using bimodal unit-selection

Slim Ouni; Vincent Colotte; Utpala Musti; Asterios Toutios; Brigitte Wrobel-Dautcourt; Marie-Odile Berger; Caroline Lavecchia

This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker’s outer face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. In the visual domain, we mainly focus on the dynamics of the face rather than on rendering. The proposed technique overcomes the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. The different synthesis steps are similar to typical concatenative speech synthesis but are generalized to the acoustic-visual domain. The bimodal synthesis was evaluated using perceptual and subjective evaluations. The overall outcome of the evaluation indicates that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both acoustic and visual channels.
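
The central mechanism, keeping the two channels together and penalizing discontinuities in both when concatenating units, can be sketched as follows. The feature representations, weights and the simple dynamic-programming search are assumptions made for illustration, not the implementation described in the paper.

```python
# Illustrative sketch of bimodal unit selection: each candidate diphone unit
# carries acoustic and visual boundary features, and the join cost combines
# the discontinuity measured in both channels. Names and weights are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class BimodalUnit:
    diphone: str                # e.g. "a-b"
    acoustic_edge: np.ndarray   # acoustic features at the unit boundary
    visual_edge: np.ndarray     # 3D facial parameters at the unit boundary
    target_cost: float          # mismatch with the requested context

def join_cost(left, right, w_acoustic=1.0, w_visual=1.0):
    """Concatenation cost over the two channels of the bimodal signal."""
    da = np.linalg.norm(left.acoustic_edge - right.acoustic_edge)
    dv = np.linalg.norm(left.visual_edge - right.visual_edge)
    return w_acoustic * da + w_visual * dv

def select_units(candidates_per_slot):
    """Dynamic-programming search over candidates, one slot per target diphone."""
    best = [(u.target_cost, [u]) for u in candidates_per_slot[0]]
    for slot in candidates_per_slot[1:]:
        new_best = []
        for u in slot:
            cost, path = min(
                ((c + join_cost(p[-1], u) + u.target_cost, p) for c, p in best),
                key=lambda x: x[0])
            new_best.append((cost, path + [u]))
        best = new_best
    return min(best, key=lambda x: x[0])[1]
```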


International Conference on Statistical Language and Speech Processing | 2018

DNN-Based Speech Synthesis for Arabic: Modelling and Evaluation

Amal Houidhek; Vincent Colotte; Zied Mnasri; Denis Jouvet

This paper investigates the use of deep neural networks (DNN) for Arabic speech synthesis. In parametric speech synthesis, whether HMM-based or DNN-based, each speech segment is described with a set of contextual features. These contextual features correspond to linguistic, phonetic and prosodic information that may affect the pronunciation of the segments. Gemination and vowel quantity (short vowel vs. long vowel) are two particular and important phenomena in the Arabic language. Hence, it is worth investigating whether those phenomena must be handled using specific speech units, or whether their specification in the contextual features is enough. Consequently, four modelling approaches are evaluated by considering geminated consonants (respectively long vowels) either as fully-fledged phoneme units or as the same phoneme as their simple (respectively short) counterparts. Although no significant difference has been observed in previous studies relying on HMM-based modelling, this paper examines these modelling variants in the framework of DNN-based speech synthesis. Listening tests are conducted to evaluate the four modelling approaches, and to assess the performance of DNN-based Arabic speech synthesis with respect to the previous HMM-based approach.
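
The two modelling choices compared in the paper can be pictured schematically as below; the phone labels, feature names and the example sequence are hypothetical, not the paper's actual phone set or contextual feature set.

```python
# Toy illustration of the two modelling options for gemination and vowel
# quantity. The phone labels and feature names are hypothetical.

phones = ["m", "u", "d", "d", "a", "a"]   # schematic: geminated /d/, long /a/

# Option 1: geminated consonants and long vowels are fully-fledged units.
units_as_distinct_phonemes = [
    {"phone": "m"}, {"phone": "u"}, {"phone": "dd"}, {"phone": "aa"},
]

# Option 2: the same units as their simple/short counterparts, with the
# distinction carried by contextual features fed to the DNN.
units_with_contextual_features = [
    {"phone": "m", "geminated": 0, "long_vowel": 0},
    {"phone": "u", "geminated": 0, "long_vowel": 0},
    {"phone": "d", "geminated": 1, "long_vowel": 0},
    {"phone": "a", "geminated": 0, "long_vowel": 1},
]
```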


Conference of the International Speech Communication Association | 2016

Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech

Slim Ouni; Vincent Colotte; Sara Dahmani; Soumaya Azzi

Within the framework of developing expressive audiovisual speech synthesis, an acoustic and visual analysis of expressive acted speech is proposed in this paper. Our purpose is to identify the main characteristics of audiovisual expressions that need to be integrated during synthesis to provide believable emotions to the virtual 3D talking head. We conducted a case study of a semi-professional actor who uttered a set of sentences for 6 different emotions in addition to neutral speech, recording audio and motion-capture data concurrently. The acoustic and the visual data have been analyzed. The main finding is that, although some expressions are not well identified, others were well characterized in both the acoustic and the visual spaces.


Proceedings of the 3rd Symposium on Facial Analysis and Animation | 2012

ViSAC: acoustic-visual speech synthesis: the system and its evaluation

Utpala Musti; Caroline Lavecchia; Vincent Colotte; Slim Ouni; Brigitte Wrobel-Dautcourt; Marie-Odile Berger

In the vast majority of recent works, data-driven audiovisual speech synthesis, i.e., the generation of face animation together with the corresponding acoustic speech, is still treated as the synchronization of two independent sources: synthesized acoustic speech (or natural speech aligned with text) and the face animation. However, achieving perfect synchronization between these two streams is not straightforward and presents several challenges related to audio-visual intelligibility. In our work, we generate the acoustic and visual components of the synthesis simultaneously. The bimodal signal is considered as one signal with two channels: acoustic and visual. This bimodality is kept during the whole synthesis process. The setup is similar to a typical concatenative (acoustic-only) speech synthesis setup, with the difference that here, the units to be concatenated consist of visual information alongside acoustic information. The concatenation unit adopted in our work is the diphone. The advantage of choosing diphones is that the major part of coarticulation phenomena is captured locally in the middle of the unit and the concatenation is made at the boundaries, which are acoustically and visually steadier. This choice is in accordance with current practice in concatenative speech synthesis.
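
The mapping from a phone sequence to the diphone units being concatenated can be sketched as below; the phone labels and the "sil" padding symbol are illustrative, not taken from the paper.

```python
# Minimal sketch: each diphone unit spans from the middle of one phone to
# the middle of the next, so joins fall in the steadier phone centres rather
# than at phone transitions. Labels are illustrative (X-SAMPA-like).

def to_diphones(phones):
    """Return the diphone targets for a phone sequence padded with silence."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{a}-{b}" for a, b in zip(padded[:-1], padded[1:])]

# to_diphones(["b", "o~", "Z", "u", "R"])   # "bonjour"
# -> ['sil-b', 'b-o~', 'o~-Z', 'Z-u', 'u-R', 'R-sil']
```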


European Signal Processing Conference | 1998

Automatic pitch marking for speech transformations via TD-PSOLA

Yves Laprie; Vincent Colotte


Language Resources and Evaluation | 2014

Designing a Bilingual Speech Corpus for French and German Language Learners: a Two-Step Process

Camille Fauth; Anne Bonneau; Frank Zimmerer; Jürgen Trouvain; Bistra Andreeva; Vincent Colotte; Dominique Fohr; Denis Jouvet; Jeanin Jügler; Yves Laprie; Odile Mella; Bernd Möbius


European Signal Processing Conference | 2002

Higher precision pitch marking for TD-PSOLA

Vincent Colotte; Yves Laprie


Archive | 2013

Designing a bilingual speech corpus for French and German language learners

Jürgen Trouvain; Yves Laprie; Bernd Möbius; Bistra Andreeva; Anne Bonneau; Vincent Colotte; Camille Fauth; Dominique Fohr; Denis Jouvet; Odile Mella; Jeanin Jügler; Frank Zimmerer


Archive | 2011

Automatic Feedback for L2 Prosody Learning

Anne Bonneau; Vincent Colotte

Collaboration


An overview of Vincent Colotte's collaborations.

Top Co-Authors

Anne Bonneau (Centre national de la recherche scientifique)

Slim Ouni (University of Lorraine)

Asterios Toutios (University of Southern California)

Camille Fauth (Centre national de la recherche scientifique)

Odile Mella (University of Lorraine)