Network


Latest external collaborations at the country level.

Hotspot


Research topics in which Yvonne Leung is active.

Publication


Featured research published by Yvonne Leung.


International Journal of Human-Computer Studies / International Journal of Man-Machine Studies | 2013

Evaluating a synthetic talking head using a dual task: Modality effects on speech understanding and cognitive load

Catherine J. Stevens; Guillaume Gibert; Yvonne Leung; Zhengzhi Zhang

The dual task is a data-rich paradigm for evaluating speech modes of a synthetic talking head. Three experiments manipulated auditory-visual (AV) and auditory-only (A-only) speech produced by text-to-speech synthesis from a talking head (Experiment 1, single task; Experiment 2, dual task), and natural speech produced by a human male similar in appearance to the talking head (Experiment 3, dual task). In a dual task, participants perform two tasks concurrently, with a secondary reaction time (RT) task sensitive to the cognitive processing demands of the primary task. In the primary task, participants either shadowed words or named the superordinate categories to which words belonged under AV (dynamic face with lips moving) or A-only (static face) speech modes. First, it was hypothesized that category naming is more difficult than shadowing. The hypothesis was supported in each experiment with significantly longer latencies on the primary task and slower RT on the secondary task. Second, an AV advantage was hypothesized and was supported by significantly shorter latencies for the AV modality on the primary task of Experiment 3, with partial support in Experiment 1. Third, it was hypothesized that while the AV modality helps, it also creates greater cognitive load. Significantly longer RT for AV presentation in the secondary tasks supported this hypothesis. The results indicate that task difficulty influences speech perception. Performance on a secondary task can reveal cognitive demand that is not evident in a single task or self-report ratings. A dual task will be an effective evaluation tool in operational environments where multiple tasks are conducted (e.g., responding to spoken directions and monitoring displays) and an implicit, sensitive measure of cognitive load is imperative.


Speech Communication | 2013

Control of speech-related facial movements of an avatar from video

Guillaume Gibert; Yvonne Leung; Catherine J. Stevens

Several puppetry techniques have recently been proposed to transfer emotional facial expressions to an avatar from a user's video. Whereas generation of facial expressions may not be sensitive to small tracking errors, generation of speech-related facial movements would be severely impaired. Since incongruent facial movements can drastically influence speech perception, we proposed a more effective method to transfer speech-related facial movements from a user to an avatar. After a facial tracking phase, speech articulatory parameters (controlling the jaw and the lips) were determined from the set of landmark positions. Two additional processes calculated the articulatory parameters controlling the eyelids and the tongue from the 2D Discrete Cosine Transform coefficients of the eye and inner-mouth images. A speech-in-noise perception experiment was conducted with 25 participants to evaluate the system. An increase in intelligibility was shown for the avatar and human auditory-visual conditions compared to the avatar and human auditory-only conditions, respectively. Depending on the vocalic context, the results of the avatar auditory-visual presentation differed: all the consonants were better perceived in the /a/ vocalic context than in /i/ and /u/ because of the lack of depth information retrievable from video. This method could be used to accurately animate avatars for hearing-impaired people using information technologies and telecommunication.
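As an illustration of the DCT-based feature extraction mentioned in this abstract, the Python sketch below computes the low-frequency 2D Discrete Cosine Transform coefficients of a cropped image region (e.g. an eye or inner-mouth patch). It is a minimal sketch, not the authors' implementation; the patch size, coefficient count, and the commented-out linear mapping to an articulatory parameter (weights `W`, bias `b`) are placeholder assumptions.

```python
import numpy as np
from scipy.fft import dctn

def dct_features(region: np.ndarray, k: int = 6) -> np.ndarray:
    """Return the k x k lowest-frequency 2D DCT coefficients of a
    grayscale image patch, flattened into a compact feature vector."""
    coeffs = dctn(region.astype(float), norm="ortho")
    return coeffs[:k, :k].ravel()

# Hypothetical usage: map the inner-mouth patch features to a tongue
# articulatory parameter with a pre-trained linear model. `W` and `b`
# are placeholders, not values from the paper.
# mouth_patch = frame[y0:y1, x0:x1]          # cropped from the tracked face
# tongue_param = dct_features(mouth_patch) @ W + b
```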


Frontiers in Psychology | 2017

White Matter Correlates of Musical Anhedonia: Implications for Evolution of Music

Psyche Loui; Sean Patterson; Matthew E. Sachs; Yvonne Leung; Tima Zeng; Emily Przysinda

Recent theoretical advances in the evolution of music posit that affective communication is an evolutionary function of music through which the mind and brain are transformed. A rigorous test of this view should entail examining the neuroanatomical mechanisms for affective communication of music, specifically by comparing individual differences in the general population with a special population who lacks specific affective responses to music. Here we compare white matter connectivity in BW, a case with severe musical anhedonia, with a large sample of control subjects who exhibit normal variability in reward sensitivity to music. We show for the first time that structural connectivity within the reward system can predict individual differences in musical reward in a large population, but specific patterns in connectivity between auditory and reward systems are special in an extreme case of specific musical anhedonia. Results support and extend the Mixed Origins of Music theory by identifying multiple neural pathways through which music might operate as an affective signaling system.


PLOS ONE | 2016

What constitutes a phrase in sound-based music? A mixed-methods investigation of perception and acoustics

Kirk N. Olsen; Roger T. Dean; Yvonne Leung

Phrasing facilitates the organization of auditory information and is central to speech and music. Not surprisingly, aspects of changing intensity, rhythm, and pitch are key determinants of musical phrases and their boundaries in instrumental note-based music. Different kinds of speech (such as tone- vs. stress-languages) share these features in different proportions and form an instructive comparison. However, little is known about whether or how musical phrasing is perceived in sound-based music, where the basic musical unit from which a piece is created is commonly non-instrumental continuous sounds, rather than instrumental discontinuous notes. This issue forms the target of the present paper. Twenty participants (17 untrained in music) were presented with six stimuli derived from sound-based music, note-based music, and environmental sound. Their task was to indicate each occurrence of a perceived phrase and qualitatively describe key characteristics of the stimulus associated with each phrase response. It was hypothesized that sound-based music does elicit phrase perception, and that this is primarily associated with temporal changes in intensity and timbre, rather than rhythm and pitch. Results supported this hypothesis. Qualitative analysis of participant descriptions showed that for sound-based music, the majority of perceived phrases were associated with intensity or timbral change. For the note-based piano piece, rhythm was the main theme associated with perceived musical phrasing. We modeled the occurrence in time of perceived musical phrases with recurrent event ‘hazard’ analyses using time-series data representing acoustic predictors associated with intensity, spectral flatness, and rhythmic density. Acoustic intensity and timbre (represented here by spectral flatness) were strong predictors of perceived musical phrasing in sound-based music, and rhythm was only predictive for the piano piece. A further analysis including five additional spectral measures linked to timbre strengthened the models. Overall, results show that even when little of the pitch and rhythm information important for phrasing in note-based music is available, phrasing is still perceived, primarily in response to changes of intensity and timbre. Implications for electroacoustic music composition and music recommender systems are discussed.
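The recurrent-event hazard modelling itself is not reproduced here, but two of the acoustic predictors named in this abstract (intensity and spectral flatness) can be extracted frame by frame from an audio file. The librosa-based sketch below is an assumption about tooling, not the authors' pipeline, and the hop length is an arbitrary example value.

```python
import numpy as np
import librosa

def acoustic_predictors(path: str, hop: int = 2048):
    """Frame-wise intensity (RMS) and spectral flatness time series,
    two of the acoustic predictors named in the abstract."""
    y, sr = librosa.load(path, sr=None, mono=True)
    rms = librosa.feature.rms(y=y, hop_length=hop)[0]
    flatness = librosa.feature.spectral_flatness(y=y, hop_length=hop)[0]
    times = librosa.frames_to_time(np.arange(len(rms)), sr=sr, hop_length=hop)
    return times, rms, flatness
```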


Computational Cognitive Science | 2016

Mimicry and expressiveness of an ECA in human-agent interaction: familiarity breeds content!

Catherine J. Stevens; Bronwyn Pinchbeck; Trent W. Lewis; Martin H. Luerssen; Darius Pfitzner; David M. W. Powers; Arman Abrahamyan; Yvonne Leung; Guillaume Gibert

Background: Two experiments investigated the effect of features of human behaviour on the quality of interaction with an Embodied Conversational Agent (ECA). Methods: In Experiment 1, visual prominence cues (head nod, eyebrow raise) of the ECA were manipulated to explore the hypothesis that likeability of an ECA increases as a function of interpersonal mimicry. In the context of an error detection task, the ECA either mimicked or did not mimic a head nod or brow raise that humans produced to give emphasis to a word when correcting the ECA's vocabulary. In Experiment 2, the effect of the presence versus absence of facial expressions on comprehension accuracy of two computer-driven ECA monologues was investigated. Results: In Experiment 1, evidence for a positive relationship between ECA mimicry and lifelikeness was obtained. However, a mimicking agent did not elicit more human gestures. In Experiment 2, expressiveness was associated with greater comprehension and higher ratings of humour and engagement. Conclusion: Influences from mimicry can be explained by visual and motor simulation, and by bidirectional links between similarity and liking. Cue redundancy and minimizing cognitive load are potential explanations for expressiveness aiding comprehension.


Computational Cognitive Science | 2015

Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis

Guillaume Gibert; Kirk N. Olsen; Yvonne Leung; Catherine J. Stevens

Background: Virtual humans have become part of our everyday life (movies, the internet, and computer games). Even though they are becoming more and more realistic, their speech capabilities are, most of the time, limited and not coherent and/or not synchronous with the corresponding acoustic signal. Methods: We describe a method to convert a virtual human avatar (animated through key frames and interpolation) into a more naturalistic talking head. Speech articulation cannot be accurately replicated using interpolation between key frames, and talking heads with good speech capabilities are derived from real speech production data. Motion capture data are commonly used to provide accurate facial motion for visible speech articulators (jaw and lips) synchronous with acoustics. To access tongue trajectories (a partially occluded speech articulator), electromagnetic articulography (EMA) is often used. We recorded a large database of phonetically balanced English sentences with synchronous EMA, motion capture data, and acoustics. An articulatory model was computed on this database to recover missing data and to provide 'normalized' animation (i.e., articulatory) parameters. In addition, semi-automatic segmentation was performed on the acoustic stream. A dictionary of multimodal Australian English diphones was created, composed of the variation of the articulatory parameters between all the successive stable allophones. Results: The avatar's facial key frames were converted into articulatory parameters steering its speech articulators (jaw, lips and tongue). The speech production database was used to drive the Embodied Conversational Agent (ECA) and to enhance its speech capabilities. A Text-To-Auditory Visual Speech synthesizer was created based on the MaryTTS software and on the diphone dictionary derived from the speech production database. Conclusions: We describe a method to transform an ECA with a generic tongue model and key-frame animation into a talking head that displays naturalistic tongue, jaw and lip motions. Thanks to a multimodal speech production database, a Text-To-Auditory Visual Speech synthesizer drives the ECA's facial movements, enhancing its speech capabilities.
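The diphone dictionary described above can be pictured as a lookup from diphone labels to short articulatory parameter trajectories that are concatenated at synthesis time. The sketch below is a minimal, hypothetical rendering of that idea; the data structure and key naming are assumptions, and it omits the smoothing and acoustic synchronisation the published system performs.

```python
import numpy as np

# Hypothetical diphone dictionary: each key is a diphone label (e.g. "h-@"),
# each value an array of shape (frames, n_params) holding the variation of
# articulatory parameters (jaw, lips, tongue) across the transition. The real
# dictionary in the paper was built from synchronous EMA and motion capture data.
diphone_dict: dict = {}

def synthesise_trajectory(diphones: list) -> np.ndarray:
    """Concatenate stored diphone segments into one articulatory parameter
    trajectory (a bare-bones sketch of concatenative synthesis)."""
    segments = [diphone_dict[d] for d in diphones]
    return np.concatenate(segments, axis=0)
```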


intelligent virtual agents | 2011

A flexible dual task paradigm for evaluating an embodied conversational agent: modality effects and reaction time as an index of cognitive load

Catherine J. Stevens; Guillaume Gibert; Yvonne Leung; Zhengzhi Zhang

A new experimental method based on the dual task paradigm is used to evaluate the speech intelligibility of an embodied conversational agent (ECA). The experiment consists of the manipulation of auditory-visual (AV) versus auditory-only (A) presentation of speech. In the dual task, participants perform two tasks concurrently; the secondary task is sensitive to the cognitive processing demands of the primary task. In the primary task, participants either shadowed words or named the superordinate categories to which words belonged, as the word items were spoken by the ECA under A or AV conditions. Reaction time (RT) on the secondary task (swatting a fly on the ECA's face) was affected by the difficulty of the concurrent task. The secondary RT was also affected by the modality of presentation of the primary task. Using a relatively primitive ECA, RT on the secondary task was significantly slower when shadowing occurred in AV versus A conditions. The benefits of this evaluation system, which returns quantitative behavioural data and self-report ratings, are discussed.


Psychomusicology: Music, Mind and Brain | 2018

The difficulty of learning microtonal tunings rapidly: The influence of pitch intervals and structural familiarity.

Yvonne Leung; Roger T. Dean

The current study investigates the learning of microtonal tuning systems, which have a different pitch interval structure from the Western tonal system (12-tone equal temperament). To examine the influence of structural similarity, we included systems that differed from the 12-tone equal temperament system to varying degrees in terms of temperament, pitch ratio, and pitch differences. After a brief exposure phase in which participants became acquainted with the previously unfamiliar systems, we assessed aspects of their learning. We measured pitch memory performance and knowledge of pitch membership using a pitch deviant detection task and a goodness-of-fit perception task. In the pitch deviant detection task, participants were required to detect pitch shifts in a second playing of a given melody, whereas in the goodness-of-fit task, they judged whether the last tone (probe) fit or did not fit with the context of the just-presented melody. A total of 30 musically untrained individuals were tested in each experiment, and results showed that learning was limited; hence, the task was difficult over such a short period. Pitch deviant detection was better in the well-formed microtonal system with two step sizes than in the other systems tested. Goodness-of-fit perception was similar between 12-tone equal temperament and the other microtonal systems, and participants who were fundamentally better at pitch discrimination and contour perception were better at rejecting incongruent probes (non-members of the system) in the goodness-of-fit task. This study has implications for music perception of pitch-ratio- and pitch-difference-based tuning systems.
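For readers unfamiliar with equal-temperament tunings, the pitches of a scale that divides the octave into N equal steps follow the standard relation f_n = f0 * 2^(n/N); 12 divisions give the familiar Western system, while other values give microtonal systems with a different interval structure. The short Python sketch below illustrates this relation only; the reference frequency (middle C) and the 10-division example are illustrative and are not the tunings used in the study.

```python
def equal_temperament_scale(divisions: int, f0: float = 261.63, octaves: int = 1):
    """Frequencies (Hz) of an equal-temperament scale dividing the octave
    into `divisions` steps: f_n = f0 * 2**(n / divisions)."""
    return [f0 * 2 ** (n / divisions) for n in range(divisions * octaves + 1)]

# Example: a 10-tone equal-temperament octave starting on middle C.
ten_tet = equal_temperament_scale(10)
```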


PLOS ONE | 2018

Learning unfamiliar pitch intervals: A novel paradigm for demonstrating the learning of statistical associations between musical pitches

Yvonne Leung; Roger T. Dean

While mastering a musical instrument takes years, becoming familiar with a new music system requires less time and skill. In this study, we examine whether musically untrained Western listeners can incidentally learn an unfamiliar, microtonal musical scale simply from engaging in a timbre discrimination task. The experiment comprised an Exposure phase and a Test phase. During Exposure, 21 non-musicians were instructed to detect a timbre shift (TS) within short microtonal melodies, and we hypothesised that they would incidentally learn about the pitch interval structure of the microtonal scale from attending to the melodies during the task. In a follow-up Test phase, the tone before the TS was either a member (congruent) or a non-member (incongruent) of the scale. Based on our statistical manipulation of the stimuli, incongruent tones would be a better predictor of an incoming TS than congruent tones. We therefore expected a faster response time to the shift after participants had heard an incongruent tone. Specifically, a faster response time after an incongruent tone would imply participants' ability to differentiate tones of the microtonal scale from those of the diatonic scale, and would reflect their learning of the microtonal pitch intervals. Results are consistent with our predictions. In investigating the learning of a microtonal scale, our study offers directions for future research on the perception of computer music and new musical genres.


Journal of New Music Research | 2018

Learning a well-formed microtonal scale: Pitch intervals and event frequencies

Yvonne Leung; Roger T. Dean

This study investigates the learning of the interval structure and pitch occurrence frequency of a microtonal scale by two groups of musicians (one experienced in Western tonal music only, the other in several microtonal systems) and by non-musicians. While musically untrained participants could rapidly learn the pitch occurrence frequency of this scale, learning the microtonal pitch intervals was slow for the musicians. Interestingly, amongst the musicians, those experienced in microtonal systems were the slowest to respond to deviant pitch intervals and timbre changes in microtonal melodies. These results extend our recent observation of non-musicians' ability to learn aspects of microtonal pitch intervals, suggesting that, paradoxically, musicians do not adjust their learned expectations to microtonal systems as quickly as non-musicians.

Collaboration


Dive into Yvonne Leung's collaborations.

Top Co-Authors

Roger T. Dean, University of Western Sydney
Kirk N. Olsen, University of Western Sydney
Zhengzhi Zhang, University of Western Sydney
Christian Kroos, University of Western Sydney
Damith C. Herath, University of Western Sydney