Publication


Featured research published by Hani Camille Yehia.


Speech Communication | 1998

Quantitative association of vocal-tract and facial behavior

Hani Camille Yehia; Philip E. Rubin; Eric Vatikiotis-Bateson

This paper examines the degrees of correlation among vocal-tract and facial movement data and the speech acoustics. Multilinear techniques are applied to support the claims that facial motion during speech is largely a by-product of producing the speech acoustics and further that the spectral envelope of the speech acoustics can be better estimated by the 3D motion of the face than by the midsagittal motion of the anterior vocal-tract (lips, tongue and jaw). Experimental data include measurements of the motion of markers placed on the face and in the vocal-tract, as well as the speech acoustics, for two subjects. The numerical results obtained show that, for both subjects, 91% of the total variance observed in the facial motion data could be determined from vocal-tract motion by means of simple linear estimators. For the inverse path, i.e. recovery of vocal-tract motion from facial motion, the results indicate that about 80% of the variance observed in the vocal-tract can be estimated from the face. Regarding the speech acoustics, it is observed that, in spite of the nonlinear relation between vocal-tract geometry and acoustics, linear estimators are sufficient to determine between 72 and 85% (depending on subject and utterance) of the variance observed in the RMS amplitude and LSP parametric representation of the spectral envelope. A dimensionality analysis is also carried out, and shows that between four and eight components are sufficient to represent the mappings examined. Finally, it is shown that even the tongue, which is an articulator not necessarily coupled with the face, can be recovered reasonably well from facial motion since it frequently displays the same kind of temporal pattern as the jaw during speech.
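As an illustration of the kind of simple linear estimator the paper relies on, the sketch below fits a least-squares affine map from vocal-tract marker data to facial marker data and reports the fraction of variance it explains. All array shapes, variable names, and the synthetic data are assumptions for illustration, not values from the paper.

```python
# Hypothetical sketch of a simple linear estimator of the kind the paper
# describes: facial marker positions predicted from vocal-tract marker
# positions by ordinary least squares. Shapes and data are illustrative.
import numpy as np

def fit_linear_estimator(X, Y):
    """Least-squares affine map from X (T x n) to Y (T x m)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return W

def variance_explained(Y, Y_hat):
    """Fraction of total variance in Y accounted for by Y_hat."""
    ss_res = np.sum((Y - Y_hat) ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data: T frames of vocal-tract motion (12 dims) -> face motion (18 dims)
rng = np.random.default_rng(0)
vt = rng.standard_normal((500, 12))
face = vt @ rng.standard_normal((12, 18)) + 0.3 * rng.standard_normal((500, 18))

W = fit_linear_estimator(vt, face)
face_hat = np.hstack([vt, np.ones((500, 1))]) @ W
print(f"variance explained: {variance_explained(face, face_hat):.2f}")
```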


Journal of Phonetics | 2002

Linking facial animation, head motion and speech acoustics

Hani Camille Yehia; Takaaki Kuratate; Eric Vatikiotis-Bateson

Facial motion during speech is a direct consequence of vocal-tract motion, which also shapes the acoustics of speech. This fact suggests that speech acoustics can be used to estimate face motion, and vice versa. Another kinematic-acoustic relation that occurs during production of speech is between head motion and fundamental frequency (F0). This paper focuses on the development of a system that takes speech acoustics as input and gives as output the coefficients necessary to animate natural face and head motion. The results obtained are based on simultaneous measurements of face deformation, head motion and speech acoustics collected for two subjects during production of naturalistic sentences and spontaneous speech. The procedure for estimating face motion from speech acoustics first trains nonlinear estimators whose inputs are line spectrum pair coefficients and whose outputs are marker positions on the face. These estimators are then applied to test data. The estimated marker trajectories are objectively compared with their measured counterparts, yielding correlation coefficients between 0.8 and 0.9. Linear estimators are used to relate F0 and head motion. As F0 to head motion is a one-to-many problem, constraints must be added for the estimation of head motion. This is done by computation of the co-dependence among head motion components. Finally, measured and estimated face and head motion data are used to animate a naturalistic talking head.
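A minimal sketch of the acoustics-to-face step, assuming a small feedforward network (scikit-learn's MLPRegressor) stands in for the paper's nonlinear estimator: inputs are per-frame LSP coefficients, outputs are face marker coordinates, and the evaluation is the per-trajectory correlation between measured and estimated markers. The data and network size are invented for illustration.

```python
# Sketch: nonlinear estimator from LSP coefficients to face markers.
# Synthetic data; the paper's actual estimator architecture may differ.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
lsp = rng.standard_normal((2000, 16))                    # 16 LSP coefficients per frame
markers = np.tanh(lsp @ rng.standard_normal((16, 30)))   # 30 marker coordinates

train, test = slice(0, 1500), slice(1500, 2000)
net = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000, random_state=0)
net.fit(lsp[train], markers[train])
pred = net.predict(lsp[test])

# Per-trajectory correlation between measured and estimated markers,
# the evaluation reported in the paper (0.8-0.9 there).
r = [np.corrcoef(markers[test][:, i], pred[:, i])[0, 1] for i in range(30)]
print(f"mean correlation: {np.mean(r):.2f}")
```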


international conference on spoken language processing | 1996

Characterizing audiovisual information during speech

Eric Vatikiotis-Bateson; Kevin G. Munhall; Y. Kasahara; Frederique Garcia; Hani Camille Yehia

Several analyses relating facial motion with perioral muscle behavior and speech acoustics are described. The results suggest that linguistically relevant visual information is distributed over large regions of the face and can be modeled from the same control source as the acoustics.


Computer Communications | 2015

A concise review of the quality of experience assessment for video streaming

Orlewilson Bentes Maia; Hani Camille Yehia; Luciano de Errico

The widespread use of mobile and high definition video devices is changing Internet traffic, with a significant increase in multimedia content, especially video on demand (VoD) and Internet protocol television (IPTV). However, the success of these services is strongly related to the video quality perceived by the user, also known as quality of experience (QoE). This paper reviews current methodologies used to evaluate the quality of experience in a video streaming service. A typical video assessment diagram is described, and analyses of the subjective, objective, and hybrid approaches are presented. Finally, considering the moving target scenario of mobile and high definition devices, the text outlines challenges and future research directions that should be considered in the measurement and assessment of the quality of experience for video streaming services.
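As a concrete instance of the "objective" class of methods such a review covers, the sketch below computes full-reference PSNR between a reference frame and a distorted frame. PSNR is a standard metric, not a method proposed by this paper; the frame size and noise level are arbitrary.

```python
# Full-reference PSNR, one of the standard objective video quality metrics.
import numpy as np

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit frames."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(2)
frame = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
noisy = np.clip(frame + rng.normal(0, 5, frame.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(frame, noisy):.1f} dB")
```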


Journal of the Acoustical Society of America | 2012

Quantifying time-varying coordination of multimodal speech signals using correlation map analysis

Adriano Vilela Barbosa; Rose-Marie Déchaine; Eric Vatikiotis-Bateson; Hani Camille Yehia

This paper demonstrates an algorithm for computing the instantaneous correlation coefficient between two signals. The algorithm is the computational engine for analyzing the time-varying coordination between signals, which is called correlation map analysis (CMA). Correlation is computed around any pair of points in the two input signals. Thus, coordination can be assessed across a continuous range of temporal offsets and can be detected even when it changes over time. The correlation algorithm has two major features: (i) it is structurally similar to a tunable filter, requiring only one parameter to set its cutoff frequency (and sensitivity); (ii) it can be applied either uni-directionally (computing correlation based only on previous samples) or bi-directionally (computing correlation based on both previous and future samples). Computing instantaneous correlation for a range of time offsets between two signals produces a 2D correlation map, in which correlation is characterized as a function of time and temporal offset. Graphic visualization of the correlation map provides rapid assessment of how correspondence patterns progress through time. The utility of the algorithm and of CMA is exemplified using the spatial and temporal coordination of various audible and visible components associated with linguistic performance.
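A minimal sketch of an instantaneous correlation of this flavor, assuming a first-order exponentially weighted filter (one parameter, applied uni-directionally here) as the running-moment estimator; the exact filter form in the paper may differ. Shifting one signal over a range of offsets then yields the 2D correlation map.

```python
# Sketch of CMA-style instantaneous correlation: exponentially weighted
# running moments give a correlation value at every sample; eta plays the
# role of the single cutoff/sensitivity parameter.
import numpy as np

def instantaneous_corr(x, y, eta=0.05):
    """Causal (uni-directional) running correlation between x and y."""
    mx = my = mxx = myy = mxy = 0.0
    r = np.zeros(len(x))
    for t, (a, b) in enumerate(zip(x, y)):
        mx += eta * (a - mx)
        my += eta * (b - my)
        mxx += eta * (a * a - mxx)
        myy += eta * (b * b - myy)
        mxy += eta * (a * b - mxy)
        cov = mxy - mx * my
        var = (mxx - mx * mx) * (myy - my * my)
        r[t] = cov / np.sqrt(var) if var > 1e-12 else 0.0
    return r

def correlation_map(x, y, max_offset=50, eta=0.05):
    """2D map of running correlation as a function of time and offset.
    np.roll wraps at the edges; a real implementation would pad instead."""
    return np.array([instantaneous_corr(x, np.roll(y, k), eta)
                     for k in range(-max_offset, max_offset + 1)])

t = np.linspace(0, 10, 2000)
x = np.sin(2 * np.pi * t)
y = np.sin(2 * np.pi * t + 0.5)   # phase-shifted copy of x
cmap = correlation_map(x, y)      # rows: offsets, cols: time
print(cmap.shape)
```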


BMC Neuroscience | 2013

Brain activity underlying auditory perceptual learning during short period training: simultaneous fMRI and EEG recording

Ana Cláudia Silva de Souza; Hani Camille Yehia; Masa-aki Sato

Background: There is an accumulating body of evidence indicating that neuronal functional specificity to basic sensory stimulation is mutable and subject to experience. Although fMRI experiments have investigated changes in brain activity after, relative to before, perceptual learning, brain activity during perceptual learning has not been explored. This work investigated brain activity related to auditory frequency discrimination learning, using a variational Bayesian approach for source localization during simultaneous EEG and fMRI recording. We investigated whether practice effects are determined solely by activity in stimulus-driven mechanisms or whether high-level attentional mechanisms, which are linked to the perceptual task, control the learning process.

Results: The fMRI analyses revealed significant attention- and learning-related activity in the left and right superior temporal gyrus (STG) as well as the left inferior frontal gyrus (IFG). Current sources of the simultaneously recorded EEG data were localized using a variational Bayesian method. Analysis of the current localized to the left IFG and the right STG revealed gamma-band activity correlated with behavioral performance.

Conclusions: Rapid improvement in task performance is accompanied by plastic changes in the sensory cortex as well as superior areas gated by selective attention. Together, the fMRI and EEG results suggest that gamma-band activity in the right STG and left IFG plays an important role during perceptual learning.
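A sketch of the final analysis step only: extracting gamma-band power from a source time course and correlating it with behavioral performance. The variational Bayesian source localization itself is not reproduced here; the 30-80 Hz band, trial structure, and all data are assumptions for illustration.

```python
# Sketch: gamma-band power (bandpass + Hilbert envelope) per trial,
# correlated with per-trial behavioral performance. Synthetic data.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500.0                                      # sampling rate, Hz (assumed)
b, a = butter(4, [30 / (fs / 2), 80 / (fs / 2)], btype="band")

rng = np.random.default_rng(3)
n_trials = 40
source = rng.standard_normal((n_trials, 1000))  # one source time course per trial
performance = rng.uniform(0.5, 1.0, n_trials)   # per-trial accuracy

gamma = filtfilt(b, a, source, axis=1)          # gamma-band component
power = np.mean(np.abs(hilbert(gamma, axis=1)) ** 2, axis=1)

r = np.corrcoef(power, performance)[0, 1]
print(f"gamma power vs. performance: r = {r:.2f}")
```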


Journal of Neuroscience Methods | 2005

Avoiding spectral leakage in objective detection of auditory steady-state evoked responses in the inferior colliculus of rat using coherence

Leonardo Bonato Felix; José Elvano Moraes; Antonio Mauricio Ferreira Leite Miranda de Sá; Hani Camille Yehia; Márcio Flávio Dutra Moraes

Local field potentials (LFPs) are bioelectric signals recorded from the brain that reflect neural activity with high temporal resolution. Separating background activity from activity evoked by specific somatosensory input is a matter of great clinical relevance in neurology. The coherence function is a spectral coefficient that can be used as a detector of periodic responses in noisy environments. Auditory steady-state responses to amplitude-modulated tones generate periodic responses in neural networks that may be accessed by means of coherence between the stimulation signal and the LFP recorded from the auditory pathway. This signal processing methodology was applied in this work to evaluate, in vivo in anaesthetized Wistar rats, the activation of neural networks by single-carrier sound stimulation frequencies, as well as the effect of different modulating tones on the evoked responses. Our results show that an inappropriate choice of modulating frequencies for the sound stimuli can compromise coherence analysis, e.g., lead to misleading conclusions due to mathematical artefacts of signal processing. Two modulating-frequency correction protocols were used: nearest integer and nearest prime number. The nearest-prime correction was successful in avoiding spectral leakage in the coherence analysis of the steady-state auditory response, as predicted by Monte Carlo simulations.
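The sketch below combines the two ingredients described above: a nearest-prime correction of the modulating frequency, and magnitude-squared coherence (here estimated with scipy.signal.coherence) between a reference modulating signal and a noisy response. Signal parameters are invented; the paper's exact estimator and detection threshold are not reproduced.

```python
# Sketch: nearest-prime modulating-frequency correction plus coherence
# detection of a steady-state response buried in noise. Synthetic data.
import numpy as np
from scipy.signal import coherence

def nearest_prime(n):
    """Closest prime to the integer n (checking below before above)."""
    def is_prime(k):
        return k > 1 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    for delta in range(n):
        if is_prime(n - delta):
            return n - delta
        if is_prime(n + delta):
            return n + delta

fs = 1000.0
fm = nearest_prime(40)                   # 41 Hz: corrected modulating frequency
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(4)
ref = np.sin(2 * np.pi * fm * t)         # modulating signal as reference
lfp = 0.2 * np.sin(2 * np.pi * fm * t + 0.8) + rng.standard_normal(len(t))

# nperseg = fs gives 1 Hz bins, so the prime frequency falls on a bin center
f, Cxy = coherence(ref, lfp, fs=fs, nperseg=1000)
print(f"coherence at {fm} Hz: {Cxy[np.argmin(np.abs(f - fm))]:.2f}")
```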


international conference on acoustics, speech, and signal processing | 2001

Measuring the relation between speech acoustics and 2D facial motion

Adriano Vilela Barbosa; Hani Camille Yehia

Presents a quantitative analysis of the relation between speech acoustics and the 2D video signal of the facial motion that occurs simultaneously. 2D facial motion is acquired using an ordinary video camera: after digitizing a video sequence, a search algorithm is used to track markers painted on the speaker's face. Facial motion is represented by the 2D marker trajectories, whereas line spectrum pair (LSP) coefficients are used to parameterize the speech acoustics. The LSP coefficients and the marker trajectories are then used to train time-invariant and time-varying linear models, as well as nonlinear (neural network) models. These models are used to evaluate to what extent 2D facial motion is determined by speech acoustics. The correlation coefficients between measured and estimated trajectories are as high as 0.95. This estimation of facial motion from speech acoustics indicates a way to integrate audio and visual signals for efficient audio-visual speech coding.
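The paper does not specify its search algorithm, but a plausible sketch of the marker-tracking step is normalized cross-correlation template matching in a window around each marker's previous position, as below; everything here is an illustration rather than the authors' method.

```python
# Sketch: track a painted marker by normalized cross-correlation of a
# template against patches in a search window around its last position.
import numpy as np

def track_marker(frame, template, prev_xy, search=15):
    """Best-matching (x, y) of template in a window around prev_xy."""
    th, tw = template.shape
    tz = (template - template.mean()) / (template.std() + 1e-9)
    best_score, best_xy = -np.inf, prev_xy
    x0, y0 = prev_xy
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0:
                continue
            patch = frame[y:y + th, x:x + tw]
            if patch.shape != template.shape:
                continue
            pz = (patch - patch.mean()) / (patch.std() + 1e-9)
            score = np.mean(tz * pz)     # normalized cross-correlation
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy

# Toy usage: a 5x5 gradient "marker" placed at (x, y) = (60, 40)
frame = np.zeros((100, 100))
frame[40:45, 60:65] = np.arange(25).reshape(5, 5)
template = frame[40:45, 60:65].copy()
print(track_marker(frame, template, prev_xy=(58, 38)))  # -> (60, 40)
```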


international conference on acoustics, speech, and signal processing | 1997

A parametric three-dimensional model of the vocal-tract based on MRI data

Hani Camille Yehia; Mark Tiede

Twenty-four three-dimensional (3D) vocal-tract (VT) shapes extracted from MRI data are used to derive a parametric model of the vocal tract. The method is as follows: first, each 3D VT shape is sampled using a semi-cylindrical grid whose position is determined by reference points based on the VT anatomy. The VT projections onto each plane of the grid are then represented by their two main components obtained via principal component analysis (PCA). PCA is used once again to parametrize the sequences of coefficients that represent the sections along the tract. It was verified that the first four components can explain about 90% of the total variance of the observed shapes. Following this procedure, 3D VT shapes are approximated by linear combinations of four 3D basis functions. Finally, it is shown that the four parameters of the model can be estimated from VT midsagittal profiles.
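A sketch of the core dimensionality argument, assuming PCA via SVD on a matrix of flattened shape samples: the cumulative explained variance shows how many components reach the ~90% figure reported. The semi-cylindrical grid sampling itself is not reproduced, and the data here is synthetic.

```python
# Sketch: PCA over a shape matrix, counting components for ~90% variance.
import numpy as np

rng = np.random.default_rng(5)
# 24 shapes, each flattened to a vector of grid samples (length invented)
latent = rng.standard_normal((24, 4)) * np.array([5.0, 3.0, 2.0, 1.5])
shapes = latent @ rng.standard_normal((4, 200)) + 0.1 * rng.standard_normal((24, 200))

X = shapes - shapes.mean(axis=0)                # center the shape matrix
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = np.cumsum(s ** 2) / np.sum(s ** 2)  # cumulative explained variance
k = int(np.searchsorted(explained, 0.90)) + 1
print(f"{k} components explain {explained[k - 1]:.0%} of the variance")
```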


Journal of the Acoustical Society of America | 1996

A shape‐based approach to vocal tract area function estimation

Mark Tiede; Hani Camille Yehia

In default of a method for obtaining dynamic/time-varying 3-D vocal tract data, it remains the case that the midsagittal profile provides the best characterization of tract articulators. For this reason articulatory approaches to speech synthesis typically model the vocal tract as a series of concatenated tubes whose cross-sectional areas are related by some heuristic (the area function) to the midsagittal cross dimensions. But while areas alone are adequate for lower formants, accurate modeling of higher frequencies requires access to details of tract morphology. Previous work based on MRI volume parametrization from a single subject [Tiede et al., Proc. ETRW-SPM, 41-44 (1996)] has demonstrated the feasibility of recovering correlated cross-sectional shapes from the midsagittal profile alone. In the current study this approach is extended to analysis of four English and four Japanese subjects (five MRI-scanned vowels per subject). The results show that characteristic tract shapes for vowels can be recovered…

Collaboration


Dive into Hani Camille Yehia's collaborations.

Top Co-Authors

Eric Vatikiotis-Bateson (University of British Columbia)
Mauricio Loureiro (Universidade Federal de Minas Gerais)
Adriano Vilela Barbosa (Universidade Federal de Minas Gerais)
Ana Cláudia Silva de Souza (Universidade Federal de São João del-Rei)
Euler C. F. Teixeira (Universidade Federal de Minas Gerais)
Gustavo Fernandes Rodrigues (Universidade Federal de Minas Gerais)
Hugo Bastos de Paula (Pontifícia Universidade Católica de Minas Gerais)