
Publications


Featured research published by Benjamin Picart.


Spoken Language Technology Workshop | 2012

Reactive and continuous control of HMM-based speech synthesis

Maria Astrinaki; Nicolas D'Alessandro; Benjamin Picart; Thomas Drugman; Thierry Dutoit

In this paper, we present a modified version of HTS, called performative HTS or pHTS. The objective of pHTS is to enhance the control ability and reactivity of HTS. pHTS reduces the phonetic context used for training the models and generates the speech parameters within a 2-label window. Speech waveforms are generated on-the-fly and the models can be reactively modified, impacting the synthesized speech with a delay of only one phoneme. It is shown that HTS and pHTS have comparable output quality. We use this new system to achieve reactive model interpolation and conduct a new test where the articulation degree is modified within the sentence.
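The reactive model interpolation mentioned in the abstract can be illustrated with a minimal sketch. This assumes the usual HTS-style setup where each model stream carries Gaussian mean parameters; the function name and toy vectors below are illustrative, not the pHTS API:

```python
# Hypothetical sketch of reactive model interpolation: the synthesizer's
# stream of Gaussian mean parameters is blended between two pre-trained
# models (e.g. neutral and hyperarticulated) with a weight that can be
# updated while synthesis is running.

def interpolate_means(means_a, means_b, alpha):
    """Linear interpolation of two mean-parameter vectors.

    alpha = 0.0 reproduces model A, alpha = 1.0 reproduces model B;
    intermediate values blend the two speaking styles.
    """
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(means_a, means_b)]

# The weight can be changed between phonemes, so the modification is heard
# with a delay of roughly one phoneme, as in the pHTS description above.
neutral = [0.2, 1.5, -0.3]
hyper = [0.6, 2.1, 0.1]
print(interpolate_means(neutral, hyper, 0.5))  # halfway between the styles
```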


International Conference on Acoustics, Speech, and Signal Processing | 2010

Analysis of phone posterior feature space exploiting class-specific sparsity and MLP-based similarity measure

Afsaneh Asaei; Benjamin Picart

Class posterior distributions have recently been used quite successfully in Automatic Speech Recognition (ASR), either for frame- or phone-level classification, or as acoustic features which can be further exploited (usually after some “ad hoc” transformations) in different classifiers (e.g., in Gaussian Mixture based HMMs). In the present paper, we present preliminary results suggesting that it may be possible to perform speech recognition without explicit subword unit (phone) classification or likelihood estimation, simply by answering the question whether two acoustic (posterior) vectors belong to the same subword unit class or not. We first exhibit specific properties of the posterior acoustic space, before showing how those properties can be exploited to reach very high performance in deciding (based on an appropriate, trained distance metric and hypothesis testing approaches) whether two posterior vectors belong to the same class or not. Correct decision rates as high as 90% are reported on the TIMIT database, before reporting kNN phone classification rates.
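The same/different-class decision described above can be sketched with a simple, untrained divergence measure. The paper uses a trained distance metric and hypothesis testing; the thresholded symmetric Kullback-Leibler divergence below only shows the shape of such a decision, with illustrative posteriors:

```python
import math

# Illustrative sketch (not the paper's trained metric): decide whether two
# phone-posterior vectors belong to the same subword class by thresholding
# a symmetric Kullback-Leibler divergence between them.

def sym_kl(p, q, eps=1e-10):
    """Symmetric KL divergence between two posterior distributions."""
    def kl(a, b):
        return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b))
    return kl(p, q) + kl(q, p)

def same_class(p, q, threshold=1.0):
    """Declare 'same subword unit' when the posteriors are close enough."""
    return sym_kl(p, q) < threshold

# Posteriors over 3 hypothetical phone classes:
p1 = [0.9, 0.05, 0.05]
p2 = [0.8, 0.15, 0.05]   # same dominant class -> small divergence
p3 = [0.1, 0.85, 0.05]   # different dominant class -> large divergence
print(same_class(p1, p2), same_class(p1, p3))  # True False
```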


Computer Speech & Language | 2014

Analysis and HMM-based synthesis of hypo and hyperarticulated speech

Benjamin Picart; Thomas Drugman; Thierry Dutoit

Hypo and hyperarticulation refer to the production of speech with respectively a reduction and an increase of the articulatory efforts compared to the neutral style. Produced consciously or not, these variations of articulatory effort depend upon the surrounding environment, the communication context and the motivation of the speaker with regard to the listener. The goal of this work is to integrate hypo and hyperarticulation into speech synthesizers, making them more realistic by automatically adapting their way of speaking to the contextual situation, as humans do. Building on our preliminary work, this paper provides a thorough and detailed study of the analysis and synthesis of hypo and hyperarticulated speech. It is divided into three parts. In the first, we focus on both the acoustic and phonetic modifications due to articulatory effort changes. The second part aims at developing an HMM-based speech synthesizer allowing a continuous control of the degree of articulation. This requires first tackling the issue of speaking style adaptation, to derive hypo and hyperarticulated speech from the neutral synthesizer. Interpolation and extrapolation of the resulting models then enable the voice to be finely tuned so that it is generated with the desired articulatory effort. Finally, the third and last part focuses on a perceptual study of speech with a variable articulation degree, analyzing how intelligibility and various other voice dimensions are affected.


International Conference on Acoustics, Speech, and Signal Processing | 2015

Analysis and automatic recognition of Human BeatBox sounds: A comparative study

Benjamin Picart; Sandrine Brognaux; Stéphane Dupont

“Human BeatBox” (HBB) is a newly expanding contemporary singing style in which the vocalist imitates percussive drum-beat sounds as well as pitched musical instrument sounds. Drum sounds typically use a notation based on plosives and fricatives, and instrument sounds cover vocalisations that go beyond spoken language vowels. HBB hence constitutes an interesting use case for expanding techniques initially developed for speech processing, with the goal of automatically annotating performances as well as developing new sound effects dedicated to HBB performers. In this paper, we investigate three complementary aspects of HBB analysis: pitch tracking, onset detection, and automatic recognition of sounds and instruments. As a first step, a new high-quality HBB audio database has been recorded, carefully segmented and manually annotated to obtain a ground truth reference. Various pitch tracking and onset detection methods are then compared and assessed against this reference. Finally, Hidden Markov Models are evaluated, together with an exploration of their parameter space, for the automatic recognition of different types of sounds. This study exhibits very encouraging experimental results.
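One of the simplest schemes in the onset-detection family compared in such studies is a frame-wise energy rise above a threshold. The toy detector below is only a sketch of that idea, not one of the methods evaluated in the paper; frame length, threshold ratio and the synthetic signal are illustrative:

```python
# Toy onset detector: flag frames whose short-term energy jumps sharply
# relative to the previous frame, e.g. at the attack of a beatboxed kick.

def frame_energies(samples, frame_len=160):
    """Short-term energy per non-overlapping frame."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_onsets(samples, frame_len=160, ratio=4.0):
    """Report frame indices where energy grows by more than `ratio`."""
    e = frame_energies(samples, frame_len)
    return [i for i in range(1, len(e)) if e[i] > ratio * (e[i - 1] + 1e-9)]

# A silent stretch followed by a burst (two silent frames, then the attack):
signal = [0.0] * 320 + [0.5, -0.5] * 160
print(detect_onsets(signal))  # [2]
```

Real detectors typically work on a spectral-flux or band-wise novelty curve rather than raw frame energy, which makes them robust to sustained pitched sounds.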


Neurocomputing | 2014

HMM-based speech synthesis with various degrees of articulation: A perceptual study

Benjamin Picart; Thomas Drugman; Thierry Dutoit

HMM-based speech synthesis is very convenient for creating a synthesizer whose speaker characteristics and speaking styles can be easily modified. This can be obtained by adapting a source speaker's model to a target speaker's model, using intra-speaker voice adaptation techniques. In this paper, we focus on high-quality HMM-based speech synthesis integrating various degrees of articulation, and more specifically on the internal mechanisms leading to the perception of the degree of articulation by listeners. To that end, the process of adapting a neutral speech synthesizer to generate hypo and hyperarticulated speech is broken down into four factors: cepstrum, prosody and phonetic transcription adaptation, as well as the complete adaptation. The impact of these factors on the perceived degree of articulation is studied. Moreover, this study is complemented with an Absolute Category Rating (ACR) evaluation, allowing the subjective assessment of hypo/hyperarticulated speech along various dimensions: comprehension, non-monotony, fluidity and pronunciation. This paper quantifies the importance of prosody and cepstrum adaptation, as well as of a Natural Language Processor able to generate realistic hypo and hyperarticulated phonetic transcriptions.


Non-Linear Speech Processing | 2011

Perceptual effects of the degree of articulation in HMM-based speech synthesis

Benjamin Picart; Thomas Drugman; Thierry Dutoit

This paper focuses on understanding the effects leading to high-quality HMM-based speech synthesis with various degrees of articulation. The adaptation of a neutral speech synthesizer to generate hypo and hyperarticulated speech is first performed. The impact of cepstral adaptation, prosody, phonetic transcription and the adaptation technique on the perceived degree of articulation is then studied through a subjective evaluation. It is shown that high-quality hypo and hyperarticulated speech synthesis requires an efficient adaptation technique such as CMLLR. Moreover, in addition to prosody adaptation, the importance of cepstrum adaptation, as well as of a Natural Language Processor able to generate realistic hypo and hyperarticulated phonetic transcriptions, is assessed.


Proceedings of the 3rd International Symposium on Movement and Computing | 2016

The i-Treasures Intangible Cultural Heritage dataset

Nikos Grammalidis; Kosmas Dimitropoulos; Filareti Tsalakanidou; Alexandros Kitsikidis; Pierre Roussel; Bruce Denby; Patrick Chawah; Lise Crevier Buchman; Stéphane Dupont; Sohaib Laraba; Benjamin Picart; Mickaël Tits; Joëlle Tilmanne; Stelios Hadjidimitriou; Vasileios Charisis; Christina Volioti; Athanasia Stergiaki; Athanasios Manitsaris; Odysseas Bouzos; Sotiris Manitsaris

In this paper, we introduce the i-Treasures Intangible Cultural Heritage (ICH) dataset, a freely available collection of multimodal data captured from different forms of rare ICH. More specifically, the dataset contains video, audio, depth and motion capture data, as well as other modalities such as EEG or ultrasound data. It also includes (manual) annotations of the data, while in some cases additional features and metadata are provided, extracted using algorithms and modules developed within the i-Treasures project. We describe the creation process (sensors, capture setups and modules used), the dataset content and the associated annotations. An attractive feature of this ICH database is that it is the first of its kind, providing annotated multimodal data for a wide range of rare ICH types. Finally, some conclusions are drawn and the future development of the dataset is discussed.


IEEE Journal of Selected Topics in Signal Processing | 2014

Automatic Variation of the Degree of Articulation in New HMM-Based Voices

Benjamin Picart; Thomas Drugman; Thierry Dutoit

This paper focuses on the automatic modification of the degree of articulation (hypo and hyperarticulation) of an existing standard neutral voice in the framework of HMM-based speech synthesis. Hypo and hyperarticulation refer to the production of speech with respectively a reduction and an increase of the articulatory efforts compared to the neutral style. Starting from a source speaker for which neutral, hypo and hyperarticulated speech data are available, statistical transformations are computed during the adaptation of the neutral speech synthesizer. These transformations are then applied to a new target speaker for which no hypo or hyperarticulated recordings are available. Four statistical methods are investigated, differing in the speaking style adaptation technique (model-space Linear Scaling (LS) vs. CMLLR) and in the speaking style transposition approach (phonetic vs. acoustic correspondence) they use. The efficiency of these techniques is assessed for the transposition of prosody and of filter coefficients separately. Besides, we investigate which representation of the spectral envelope is best suited for this purpose: MGC, LSP, PARCOR and LAR coefficients. Subjective evaluations are performed in order to determine which statistical transformation method achieves the highest performance in terms of segmental quality, reproduction of the articulation degree and speaker identity preservation. The most successful method is finally used for automatically modifying the degree of articulation of existing standard neutral voices.
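The transposition idea above can be sketched in miniature: a style transform estimated on the source speaker (here reduced to a single affine map on Gaussian means, in the spirit of CMLLR-style transforms) is reused unchanged on a new target speaker who has no hypo/hyperarticulated data. The matrices and vectors below are illustrative, not estimated from data:

```python
# Hypothetical sketch: apply a style transform, learned on a source
# speaker, to a target speaker's neutral model parameters.

def apply_affine(mean, A, b):
    """Apply mean' = A @ mean + b, written out with plain lists."""
    return [sum(A[i][j] * mean[j] for j in range(len(mean))) + b[i]
            for i in range(len(A))]

# Transform learned from the source speaker's neutral -> hyperarticulated
# adaptation (illustrative values):
A = [[1.1, 0.0], [0.0, 0.9]]
b = [0.05, -0.02]

# Reused on the target speaker, who has only a neutral model:
target_neutral_mean = [1.0, 2.0]
print(apply_affine(target_neutral_mean, A, b))  # transposed "hyper" mean
```

In the actual CMLLR formulation a single transform acts on the features (or equivalently on means and covariances jointly); the sketch keeps only the affine-on-means intuition.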


Spoken Language Technology Workshop | 2012

Statistical methods for varying the degree of articulation in new HMM-based voices

Benjamin Picart; Thomas Drugman; Thierry Dutoit

This paper focuses on the automatic modification of the degree of articulation (hypo/hyperarticulation) of an existing standard neutral voice in the framework of HMM-based speech synthesis. Starting from a source speaker for which neutral, hypo and hyperarticulated speech data are available, two sets of transformations are computed during the adaptation of the neutral speech synthesizer. These transformations are then applied to a new target speaker for which no hypo/hyperarticulated recordings are available. Four statistical methods are investigated, differing in the speaking style adaptation technique (MLLR vs. CMLLR) and in the speaking style transposition approach (phonetic vs. acoustic correspondence) they use. This study focuses on the prosody model, although such techniques can be applied to any stream of parameters exhibiting suitable interpolation properties. Two subjective evaluations are performed in order to determine which statistical transformation method achieves the best segmental quality and reproduction of the articulation degree.


Mixed Reality and Gamification for Cultural Heritage | 2017

Intangible Cultural Heritage and New Technologies: Challenges and Opportunities for Cultural Preservation and Development

Marilena Alivizatou-Barakou; Alexandros Kitsikidis; Filareti Tsalakanidou; Kosmas Dimitropoulos; Chantas Giannis; Spiros Nikolopoulos; Samer Al Kork; Bruce Denby; Lise Crevier Buchman; Martine Adda-Decker; Claire Pillot-Loiseau; Joëlle Tilmanne; Stéphane Dupont; Benjamin Picart; Francesca Pozzi; Michela Ott; Yilmaz Erdal; Vasileios Charisis; Stelios Hadjidimitriou; Marius Cotescu; Christina Volioti; Athanasios Manitsaris; Sotiris Manitsaris; Nikos Grammalidis

Intangible cultural heritage (ICH) is a relatively recent term coined to represent living cultural expressions and practices, which are recognised by communities as distinct aspects of identity. The safeguarding of ICH has become a topic of international concern primarily through the work of United Nations Educational, Scientific and Cultural Organization (UNESCO). However, little research has been done on the role of new technologies in the preservation and transmission of intangible heritage. This chapter examines resources, projects and technologies providing access to ICH and identifies gaps and constraints. It draws on research conducted within the scope of the collaborative research project, i-Treasures. In doing so, it covers the state of the art in technologies that could be employed for access, capture and analysis of ICH in order to highlight how specific new technologies can contribute to the transmission and safeguarding of ICH.

Collaboration


Dive into Benjamin Picart's collaborations.

Top Co-Authors

Sandrine Brognaux (Université catholique de Louvain)

Filareti Tsalakanidou (Aristotle University of Thessaloniki)

Francesca Pozzi (National Research Council)

Michela Ott (National Research Council)