Publications


Featured research published by Slim Ouni.


Journal of the Acoustical Society of America | 2005

Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion

Slim Ouni; Yves Laprie

Acoustic-to-articulatory inversion is a difficult problem mainly because of the nonlinearity between the articulatory and acoustic spaces and the nonuniqueness of this relationship. To resolve this problem, we have developed an inversion method that provides a complete description of the possible solutions without excessive constraints and retrieves realistic temporal dynamics of the vocal tract shapes. We present an adaptive sampling algorithm to ensure that the acoustical resolution is almost independent of the region in the articulatory space under consideration. This leads to a codebook that is organized in the form of a hierarchy of hypercubes, and ensures that, within each hypercube, the articulatory-to-acoustic mapping can be approximated by means of a linear transform. The inversion procedure retrieves articulatory vectors corresponding to acoustic entries from the hypercube codebook. A nonlinear smoothing algorithm together with a regularization technique is then used to recover the best articulatory trajectory. The inversion ensures that inverse articulatory parameters generate original formant trajectories with high precision and a realistic sequence of the vocal tract shapes.
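
Below is a toy sketch of the local-linear lookup idea described in this abstract, assuming each codebook cell stores a reference articulatory point, its acoustic image, and a local Jacobian. The hierarchical codebook construction, adaptive sampling, and the smoothing/regularization steps are omitted; all names and structures are illustrative, not the paper's implementation.

```python
# Hypothetical sketch of a hypercube-codebook lookup: inside each cell the
# articulatory-to-acoustic map is approximated as f ≈ f0 + J @ (a - a0).
import numpy as np

class HypercubeCell:
    def __init__(self, a0, f0, J, half_width):
        self.a0 = np.asarray(a0)        # reference articulatory point
        self.f0 = np.asarray(f0)        # its acoustic image (e.g., formants)
        self.J = np.asarray(J)          # local linear map d(acoustic)/d(articulatory)
        self.half_width = half_width    # half edge length of the hypercube

    def invert(self, f_target):
        """Return an articulatory candidate for f_target, or None if it
        falls outside this cell's articulatory domain."""
        delta = np.linalg.pinv(self.J) @ (np.asarray(f_target) - self.f0)
        if np.all(np.abs(delta) <= self.half_width):
            return self.a0 + delta
        return None

def invert_frame(codebook, f_target):
    """Collect all articulatory candidates for one acoustic frame; the
    actual method then smooths across frames to pick a realistic trajectory."""
    return [a for cell in codebook
            if (a := cell.invert(f_target)) is not None]
```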


EURASIP Journal on Audio, Speech, and Music Processing | 2007

Visual contribution to speech perception: measuring the intelligibility of animated talking heads

Slim Ouni; Michael M. Cohen; Hope Ishak; Dominic W. Massaro

Animated agents are becoming increasingly frequent in research and applications in speech science. An important challenge is to evaluate the effectiveness of the agent in terms of the intelligibility of its visible speech. In three experiments, we extend and test the Sumby and Pollack (1954) metric to allow the comparison of an agent relative to a standard or reference, and also propose a new metric based on the fuzzy logical model of perception (FLMP) to describe the benefit provided by a synthetic animated face relative to the benefit provided by a natural face. A valid metric would allow direct comparisons across different experiments and would give measures of the benefit of a synthetic animated face relative to a natural face (or indeed any two conditions) and how this benefit varies as a function of the type of synthetic face, the test items (e.g., syllables versus sentences), different individuals, and applications.
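
For context, a minimal sketch of the classic Sumby and Pollack style relative-gain computation that the paper's extended metric builds on: the visual benefit is the fraction of the remaining headroom recovered by adding the face. The numbers below are made up, and the FLMP-based measure is not reproduced here.

```python
def relative_visual_gain(auditory_only: float, audiovisual: float) -> float:
    """Sumby & Pollack (1954) style relative gain: the fraction of the
    available headroom (1 - A) recovered by adding the face.
    Inputs are proportions correct in [0, 1]."""
    if auditory_only >= 1.0:
        return 0.0                      # no headroom left to improve
    return (audiovisual - auditory_only) / (1.0 - auditory_only)

# Hypothetical scores for a synthetic and a natural face in the same noise.
gain_synthetic = relative_visual_gain(0.35, 0.62)
gain_natural = relative_visual_gain(0.35, 0.71)
print(f"synthetic face recovers {gain_synthetic:.0%} of the headroom")
print(f"benefit relative to the natural face: {gain_synthetic / gain_natural:.2f}")
```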


Speech Communication | 2005

Training Baldi to be multilingual: A case study for an Arabic Badr

Slim Ouni; Michael M. Cohen; Dominic W. Massaro

In this paper, we describe research to extend the capability of an existing talking head, Baldi, to be multilingual. We use a parsimonious client/server architecture to impose autonomy in the functioning of an auditory speech module and a visual speech synthesis module. This scheme enables the implementation and the joint application of text-to-speech synthesis and facial animation in many languages simultaneously. Additional languages can be added to the system by defining a unique phoneme set and unique phoneme definitions for the visible speech of each language. The accuracy of these definitions is tested in perceptual experiments in which human observers identify auditory speech in noise presented alone or paired with the synthetic versus a comparable natural face. We illustrate the development of an Arabic talking head, Badr, and demonstrate how the empirical evaluation enabled the improvement of the visible speech synthesis from one version to another.
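
As an illustration of the "add a language by defining its phoneme set and visible-speech definitions" idea, here is a minimal registration sketch under a client/server split. All identifiers, parameter names, and endpoints are hypothetical; this is not the actual Baldi API.

```python
# Illustrative per-language phoneme/viseme registration for a talking head
# whose auditory (TTS) and visual modules run as separate services.
from dataclasses import dataclass, field

@dataclass
class Language:
    name: str
    phonemes: set[str]                         # language-specific phoneme set
    visemes: dict[str, dict[str, float]] = field(default_factory=dict)
    tts_server: str = "tcp://localhost:5001"   # auditory module endpoint (made up)
    face_server: str = "tcp://localhost:5002"  # visual module endpoint (made up)

registry: dict[str, Language] = {}

def register_language(lang: Language) -> None:
    """Adding a language only requires its phoneme set and a visible-speech
    (viseme) definition for each phoneme."""
    missing = lang.phonemes - lang.visemes.keys()
    if missing:
        raise ValueError(f"no viseme definition for phonemes: {missing}")
    registry[lang.name] = lang

register_language(Language(
    name="arabic",
    phonemes={"b", "a", "d", "r"},             # toy subset
    visemes={
        "b": {"jaw": 0.1, "lip_rounding": 0.0, "lip_closure": 1.0},
        "a": {"jaw": 0.8, "lip_rounding": 0.1, "lip_closure": 0.0},
        "d": {"jaw": 0.3, "lip_rounding": 0.0, "lip_closure": 0.0},
        "r": {"jaw": 0.3, "lip_rounding": 0.2, "lip_closure": 0.0},
    },
))
```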


Hawaii International Conference on System Sciences | 2005

A Multilingual Embodied Conversational Agent

Dominic W. Massaro; Slim Ouni; Michael M. Cohen; Rashid Clark

In our paper last year, we described our language-training program, which uses Baldi as a tutor to guide students through a variety of exercises designed to teach vocabulary and grammar, to improve speech articulation, and to develop linguistic and phonological awareness. In this paper, we describe the technology behind Baldi in more detail and how it has been enhanced along a number of dimensions. First, Baldi is now multilingual and is capable of acquiring new languages fairly easily. Second, Baldi has grown a body to allow communication through gesture.


Journal of the Acoustical Society of America | 2008

Incorporation of phonetic constraints in acoustic-to-articulatory inversion

Blaise Potard; Yves Laprie; Slim Ouni

This study investigates the use of constraints on articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker-independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge of French vowels and express authorized domains for one or several articulatory parameters. They were tested in an existing inversion framework that uses Maeda's articulatory model and a hypercubic articulatory-acoustic table. Phonetic constraints give rise to a phonetic score reflecting the phonetic consistency of the vocal tract shapes recovered by inversion. Inversion was applied to vowels articulated by a speaker for whom corresponding x-ray images are also available. Constraints were evaluated by measuring the distance between vocal tract shapes recovered through inversion and real vocal tract shapes obtained from x-ray images, by investigating the spread of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study also shows how acoustic-to-articulatory inversion can be used to explore acoustical and compensatory articulatory properties of an articulatory model.
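
A stand-in sketch of how a phonetic score over authorized articulatory-parameter domains might be computed and used to rank inverse solutions. The parameter names, intervals, and scoring function are illustrative only, not the actual constraints or score defined in the paper for French vowels.

```python
# Score a recovered vocal-tract shape against phonetic constraints expressed
# as authorized intervals on articulatory parameters (all values made up).
import numpy as np

CONSTRAINTS = {
    "jaw": (-1.0, 0.5),
    "tongue_body": (0.2, 1.5),
    "lip_protrusion": (-0.5, 0.5),
}

def phonetic_score(shape: dict[str, float]) -> float:
    """Return 1.0 when every constrained parameter lies in its authorized
    interval; decrease smoothly with the size of any violation."""
    penalty = 0.0
    for name, (lo, hi) in CONSTRAINTS.items():
        x = shape[name]
        if x < lo:
            penalty += lo - x
        elif x > hi:
            penalty += x - hi
    return float(np.exp(-penalty))

# Inverse solutions can then be ranked or filtered by this score.
candidates = [
    {"jaw": 0.1, "tongue_body": 0.9, "lip_protrusion": 0.0},
    {"jaw": 1.2, "tongue_body": 0.9, "lip_protrusion": 0.0},
]
best = max(candidates, key=phonetic_score)
```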


Computer Assisted Language Learning | 2014

Tongue control and its implication in pronunciation training

Slim Ouni

Pronunciation training based on speech production techniques that illustrate tongue movements is gaining popularity. However, there is not sufficient evidence that learners can imitate tongue animations. In this paper, we argue that although speech-related tongue movements are not easy to control, training with visual feedback improves this control. We investigated participants' awareness of their tongue body gestures. In a first experiment, participants were asked to perform tongue movements composed of two sets of gestures. Performance was evaluated by observing ultrasound imaging of the tongue recorded during the experiment; no feedback was provided. In a second experiment, a short training session was added in which participants could observe real-time ultrasound imaging of their own tongue movements, with the goal of increasing their awareness of their tongue gestures. A pretest and posttest were carried out without any feedback. The results suggest that, without a priori knowledge, it is not easy to finely control tongue body gestures. The second experiment showed that performance improved after the short training session, suggesting that even brief visual feedback improves tongue gesture awareness.


Journal of the Acoustical Society of America | 2011

Estimating the control parameters of an articulatory model from electromagnetic articulograph data

Asterios Toutios; Slim Ouni; Yves Laprie

Finding the control parameters of an articulatory model that result in given acoustics is an important problem in speech research. However, one should also be able to derive the same parameters from measured articulatory data. In this paper, a method to estimate the control parameters of Maeda's model from electromagnetic articulography (EMA) data, which allows the derivation of full sagittal vocal tract slices from sparse flesh-point information, is presented. First, the articulatory grid system involved in the model's definition is adapted to the speaker involved in the experiment, and EMA data are registered to it automatically. Then, articulatory variables that correspond to measurements defined by Maeda on the grid are extracted. An initial solution for the articulatory control parameters is found by a least-squares method, under constraints ensuring vocal tract shape naturalness. Dynamic smoothness of the parameter trajectories is then imposed by a variational regularization method. Generated vocal tract slices for vowels are compared with slices appearing in magnetic resonance images of the same speaker or found in the literature. Formants synthesized on the basis of these generated slices are adequately close to those tracked in real speech recorded concurrently with the EMA data.
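
A minimal sketch of the two-step idea (per-frame constrained least squares, then temporal smoothing), assuming a simple linear forward model and generic box constraints. The matrix, bounds, and quadratic smoother below are stand-ins for the speaker-adapted Maeda grid machinery and the variational regularization described in the paper.

```python
# Per-frame constrained least squares from EMA coordinates to articulatory
# control parameters, followed by a simple quadratic trajectory smoother.
import numpy as np
from scipy.optimize import lsq_linear

def fit_frame(W, ema_frame, bounds=(-3.0, 3.0)):
    """Solve W @ p ≈ ema_frame with box constraints keeping the parameters
    in a plausible range (here +/- 3, an arbitrary choice)."""
    return lsq_linear(W, ema_frame, bounds=bounds).x

def smooth_trajectory(P, lam=1.0):
    """Penalized smoothing standing in for the variational regularization:
    minimize ||Q - P||^2 + lam * ||diff(Q)||^2, solved in closed form."""
    T = P.shape[0]
    D = np.diff(np.eye(T), axis=0)          # first-difference operator
    A = np.eye(T) + lam * D.T @ D
    return np.linalg.solve(A, P)

# Toy usage: 6 Maeda-like parameters, 8 EMA coordinates, 100 frames.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 6))                 # stand-in linear forward model
ema = rng.normal(size=(100, 8))             # stand-in EMA measurements
P = np.stack([fit_frame(W, frame) for frame in ema])
P_smooth = smooth_trajectory(P, lam=5.0)
```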


EURASIP Journal on Audio, Speech, and Music Processing | 2013

Acoustic-visual synthesis technique using bimodal unit-selection

Slim Ouni; Vincent Colotte; Utpala Musti; Asterios Toutios; Brigitte Wrobel-Dautcourt; Marie-Odile Berger; Caroline Lavecchia

This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker’s outer face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. In the visual domain, we mainly focus on the dynamics of the face rather than on rendering. The proposed technique overcomes the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. The different synthesis steps are similar to typical concatenative speech synthesis but are generalized to the acoustic-visual domain. The bimodal synthesis was evaluated using perceptual and subjective evaluations. The overall outcome of the evaluation indicates that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both acoustic and visual channels.
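
A rough sketch of what "bimodal" unit selection implies at the cost-function level: candidate diphone units carry both acoustic and visual features, and target and join costs penalize mismatches in both channels so audio and animation stay coherent. Feature choices and weights below are illustrative assumptions, not the paper's actual costs.

```python
# Bimodal unit-selection costs over units that hold acoustic and visual data.
import numpy as np

class BimodalUnit:
    def __init__(self, diphone, acoustic_feats, visual_feats):
        self.diphone = diphone
        self.acoustic = np.asarray(acoustic_feats)   # e.g., spectral features at unit edges
        self.visual = np.asarray(visual_feats)       # e.g., 3D facial marker positions

def target_cost(unit, spec, w_ac=1.0, w_vis=1.0):
    """Distance between a candidate unit and the target specification,
    summed over the acoustic and visual channels."""
    return (w_ac * np.linalg.norm(unit.acoustic - spec["acoustic"]) +
            w_vis * np.linalg.norm(unit.visual - spec["visual"]))

def join_cost(left, right, w_ac=1.0, w_vis=1.0):
    """Discontinuity at the concatenation point, again in both channels,
    which discourages audio-visual incoherence at unit boundaries."""
    return (w_ac * np.linalg.norm(left.acoustic - right.acoustic) +
            w_vis * np.linalg.norm(left.visual - right.visual))
```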


Journal of the Acoustical Society of America | 2013

An episodic memory-based solution for the acoustic-to-articulatory inversion problem

Sébastien Demange; Slim Ouni

This paper presents an acoustic-to-articulatory inversion method based on an episodic memory. An episodic memory is an interesting model for two reasons. First, it does not rely on any assumptions about the mapping function but rather it relies on real synchronized acoustic and articulatory data streams. Second, the memory inherently represents the real articulatory dynamics as observed. It is argued that the computational models of episodic memory, as they are usually designed, cannot provide a satisfying solution for the acoustic-to-articulatory inversion problem due to the insufficient quantity of training data. Therefore, an episodic memory is proposed, called generative episodic memory (G-Mem), which is able to produce articulatory trajectories that do not belong to the set of episodes the memory is based on. The generative episodic memory is evaluated using two electromagnetic articulography corpora: one for English and one for French. Comparisons with a codebook-based method and with a classical episodic memory (which is termed concatenative episodic memory) are presented in order to evaluate the proposed generative episodic memory in terms of both its modeling of articulatory dynamics and its generalization capabilities. The results show the effectiveness of the method: an overall root-mean-square error of 1.65 mm and a correlation of 0.71 are obtained for the G-Mem method, comparable to those of recently proposed methods.
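
To make the episodic idea concrete, here is a very rough stand-in: store synchronized acoustic and articulatory frames as episodes, retrieve the acoustically nearest frames for each input frame, and blend their articulatory counterparts. This is plain distance-weighted nearest-neighbour regression, not the actual G-Mem algorithm, which additionally generates trajectories outside the stored episodes.

```python
# Episodic lookup stand-in: kNN regression from acoustic to articulatory frames.
import numpy as np

class EpisodicMemory:
    def __init__(self, acoustic_frames, articulatory_frames):
        self.A = np.asarray(acoustic_frames)       # (N, d_acoustic)
        self.X = np.asarray(articulatory_frames)   # (N, d_articulatory)

    def invert(self, acoustic_input, k=10):
        """Map each acoustic frame to an articulatory frame by blending the
        articulatory data of its k acoustically nearest episodes."""
        out = []
        for f in np.asarray(acoustic_input):
            d = np.linalg.norm(self.A - f, axis=1)
            idx = np.argsort(d)[:k]
            w = 1.0 / (d[idx] + 1e-8)              # inverse-distance weights
            out.append((w[:, None] * self.X[idx]).sum(0) / w.sum())
        return np.stack(out)
```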


International Conference on Acoustics, Speech, and Signal Processing | 2011

Acoustic-to-articulatory inversion using an episodic memory

Sébastien Demange; Slim Ouni

This paper presents a new acoustic-to-articulatory inversion method based on an episodic memory, which is an interesting model for two reasons. First, it does not rely on any assumptions about the mapping function but rather relies on real synchronized acoustic and articulatory data streams. Second, the memory structurally embeds the naturalness of the articulatory dynamics. In addition, we introduce the concept of a generative episodic memory, which enables the production of unseen articulatory trajectories according to the acoustic signals to be inverted. The proposed memory is evaluated on the MOCHA corpus. The results show its effectiveness and are very encouraging, since they are comparable to those of recently proposed methods.

Collaboration


Slim Ouni's top co-authors and their affiliations.

Top Co-Authors

Asterios Toutios, University of Southern California

Michael M. Cohen, University of Wisconsin-Madison

Mohamed Embarki, University of Franche-Comté

Anne Bonneau, Centre national de la recherche scientifique

Alexandra Jesse, University of Massachusetts Amherst