Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Jonas Beskow is active.

Publication


Featured research published by Jonas Beskow.


Embodied conversational agents | 2001

Developing and evaluating conversational agents

Dominic W. Massaro; Michael M. Cohen; Jonas Beskow; Ronald A. Cole

Publisher Summary: This chapter focuses on the development and evaluation of conversational agents. Computer users can benefit from interaction with conversational agents and from access to the many sources of information that such agents can provide. A completely animated synthetic talking head with which one can control and study the informative aspects and psychological processes in face-to-face dialogues has been developed. The goal of this chapter is to advance the development of the talking head, its design, and its accompanying technology and to create a human–computer interface centered on a virtual, conversational agent. Such agents interact with human users in the most natural manner possible, including the ability to listen and understand as well as speak fluently. Agents will facilitate and enrich interaction between humans and machines. Moreover, communication among humans can also be enhanced when mediated by virtual agents. The conversational agent can also be the interface for a series of public, interactive art installations. This chapter expands the use of the agent in educational and therapeutic environments, as in the learning of non-native languages and in learning to read.


COST'11 Proceedings of the 2011 International Conference on Cognitive Behavioural Systems | 2011

Furhat: a back-projected human-like robot head for multiparty human-machine interaction

Samer Al Moubayed; Jonas Beskow; Gabriel Skantze; Björn Granström

In this chapter, we first present a summary of findings from two previous studies on the limitations of using flat displays with embodied conversational agents (ECAs) in the contexts of face-to-face human-agent interaction. We then motivate the need for a three-dimensional display of faces to guarantee accurate delivery of gaze and directional movements and present Furhat, a novel, simple, highly effective, and human-like back-projected robot head that uses computer animation to deliver facial movements and is equipped with a pan-tilt neck. After presenting a detailed summary of why and how Furhat was built, we discuss the advantages of using optically projected animated agents for interaction. We discuss using such agents in terms of situatedness, environment, context awareness, and social, human-like face-to-face interaction with robots, where subtle nonverbal and social facial signals can be communicated. At the end of the chapter, we present a recent application of Furhat as a multimodal multiparty interaction system that was presented at the London Science Museum as part of a robot festival. We conclude by discussing future developments, applications, and opportunities of this technology.


KSII Transactions on Internet and Information Systems | 2012

Taming Mona Lisa: Communicating gaze faithfully in 2D and 3D facial projections

Samer Al Moubayed; Jens Edlund; Jonas Beskow

The perception of gaze plays a crucial role in human-human interaction. Gaze has been shown to matter for a number of aspects of communication and dialogue, especially for managing the flow of the dialogue and participant attention, for deictic referencing, and for the communication of attitude. When developing embodied conversational agents (ECAs) and talking heads, modeling and delivering accurate gaze targets is crucial. Traditionally, systems communicating through talking heads have been displayed to the human conversant using 2D displays, such as flat monitors. This approach introduces severe limitations for an accurate communication of gaze since 2D displays are associated with several powerful effects and illusions, most importantly the Mona Lisa gaze effect, where the gaze of the projected head appears to follow the observer regardless of viewing angle. We describe the Mona Lisa gaze effect and its consequences in the interaction loop, and propose a new approach for displaying talking heads using a 3D projection surface (a physical model of a human head) as an alternative to the traditional flat surface projection. We investigate and compare the accuracy of the perception of gaze direction and the Mona Lisa gaze effect in 2D and 3D projection surfaces in a five-subject gaze perception experiment. The experiment confirms that a 3D projection surface completely eliminates the Mona Lisa gaze effect and delivers very accurate gaze direction that is independent of the observer's viewing angle. Based on the data collected in this experiment, we rephrase the formulation of the Mona Lisa gaze effect. The data, when reinterpreted, confirms the predictions of the new model for both 2D and 3D projection surfaces. Finally, we discuss the requirements on different spatially interactive systems in terms of gaze direction, and propose new applications and experiments for interaction in human-ECA and human-robot settings made possible by this technology.
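The intuition behind the effect can be sketched in a few lines. The following toy model is not taken from the paper; it is a hypothetical illustration assuming only horizontal gaze angles, showing why a flat display yields a viewpoint-independent perceived gaze while a physical 3D head does not.

```python
# Toy illustration (not the paper's model): why a flat display produces the
# Mona Lisa gaze effect while a 3D projection surface does not. Angles are
# horizontal gaze directions in degrees; both functions are hypothetical.

def perceived_gaze_flat(observer_azimuth_deg, rendered_gaze_deg=0.0):
    # On a flat display the gaze is painted into the image, so the gaze
    # direction relative to each observer's line of sight is the same for
    # every observer: a head rendered looking "into the camera" appears to
    # look at everyone, regardless of where they stand.
    return rendered_gaze_deg

def perceived_gaze_3d(observer_azimuth_deg, true_gaze_deg=0.0):
    # On a physical 3D head the gaze is a fixed ray in world space, so the
    # angle between the gaze and the observer's line of sight changes with
    # the observer's position; only one position perceives direct eye contact.
    return true_gaze_deg - observer_azimuth_deg

for azimuth in (-60, -30, 0, 30, 60):
    print(f"observer at {azimuth:>4} deg: "
          f"flat -> {perceived_gaze_flat(azimuth):>6.1f} deg, "
          f"3D -> {perceived_gaze_3d(azimuth):>6.1f} deg")
```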


International Conference on Computers for Handicapped Persons | 2004

SYNFACE – A Talking Head Telephone for the Hearing-Impaired

Jonas Beskow; Inger Karlsson; Jo Kewley; Giampiero Salvi

SYNFACE is a telephone aid for hearing-impaired people that shows the lip movements of the speaker at the other end of the line, synchronised with the speech. The SYNFACE system consists of a speech recogniser that recognises the incoming speech and a synthetic talking head. The output from the recogniser is used to control the articulatory movements of the synthetic head. SYNFACE prototype systems exist for three languages (Dutch, English, and Swedish), and the first user trials have just started.
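The abstract describes a two-stage architecture: an incremental speech recogniser whose output drives the articulation of a synthetic face. A minimal sketch of such a pipeline is given below; all names (PhonemeEvent, recognise_phonemes, phoneme_to_articulation, face.set_parameters) are hypothetical placeholders, not the actual SYNFACE implementation or API.

```python
# Minimal sketch, assuming the pipeline described in the abstract: an
# incremental phoneme recogniser drives the articulation of a synthetic face.
# All names and classes here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class PhonemeEvent:
    phoneme: str        # recognised phoneme label
    start_ms: float     # onset time within the audio stream
    duration_ms: float

def recognise_phonemes(audio_frame):
    """Stub for the incremental speech recogniser (returns phoneme events)."""
    return [PhonemeEvent(phoneme="a", start_ms=0.0, duration_ms=80.0)]

def phoneme_to_articulation(event):
    """Stub mapping from a phoneme to articulatory parameters of the face."""
    return {"jaw_open": 0.4, "lip_round": 0.1, "lip_protrude": 0.0}

def drive_talking_head(audio_stream, face):
    # The synthetic face is kept synchronised with the (slightly delayed)
    # incoming telephone audio.
    for frame in audio_stream:
        for event in recognise_phonemes(frame):
            face.set_parameters(phoneme_to_articulation(event),
                                at_time_ms=event.start_ms)
```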


International Journal of Speech Technology | 2004

Trainable Articulatory Control Models for Visual Speech Synthesis

Jonas Beskow

This paper deals with the problem of modelling the dynamics of articulation for a parameterised talking head based on phonetic input. Four different models are implemented and trained to reproduce the articulatory patterns of a real speaker, based on a corpus of optical measurements. Two of the models (“Cohen-Massaro” and “Öhman”) are based on coarticulation models from speech production theory and two are based on artificial neural networks, one of which is specially intended for streaming real-time applications. The different models are evaluated through comparison between predicted and measured trajectories, which shows that the Cohen-Massaro model produces trajectories that best match the measurements. A perceptual intelligibility experiment is also carried out, where the four data-driven models are compared against a rule-based model as well as an audio-alone condition. Results show that all models give significantly increased speech intelligibility over the audio-alone case, with the rule-based model yielding the highest intelligibility score.
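For readers unfamiliar with the Cohen-Massaro approach, the sketch below implements a simplified version of its core idea: each segment contributes a target value for an articulatory parameter, blended with its neighbours through exponentially decaying dominance functions. The parameter values and time constants are illustrative, not the trained values reported in the paper.

```python
import numpy as np

# Simplified Cohen-Massaro-style coarticulation: segment i has a target T_i
# and a dominance function D_i(t) that decays exponentially away from the
# segment centre; the predicted trajectory is the dominance-weighted average
# of the targets. Parameter values below are illustrative only.

def dominance(t, centre_ms, alpha=1.0, theta=0.01):
    return alpha * np.exp(-theta * np.abs(t - centre_ms))

def trajectory(t, centres_ms, targets):
    num = np.zeros_like(t, dtype=float)
    den = np.zeros_like(t, dtype=float)
    for centre, target in zip(centres_ms, targets):
        d = dominance(t, centre)
        num += d * target
        den += d
    return num / den

t = np.linspace(0.0, 600.0, 601)       # time axis in milliseconds
centres_ms = [100.0, 250.0, 400.0]     # segment midpoints
targets = [0.2, 0.9, 0.4]              # e.g. lip-opening targets per segment
print(trajectory(t, centres_ms, targets)[::100])
```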


ACM SIGCAPH Computers and the Physically Handicapped | 1998

Intelligent animated agents for interactive language training

Ron Cole; Tim Carmell; Pam Connors; Mike Macon; Johan Wouters; Jacques de Villiers; Alice Tarachow; Dominic W. Massaro; Michael M. Cohen; Jonas Beskow; Jie Yang; Uwe Meier; Alex Waibel; Pat Stone; Alice Davis; Chris Soland; George Fortier

This report describes a three-year project, now eight months old, to develop interactive learning tools for language training with profoundly deaf children. The tools combine four key technologies: speech recognition, developed at the Oregon Graduate Institute; speech synthesis, developed at the University of Edinburgh and modified at OGI; facial animation, developed at the University of California, Santa Cruz; and face tracking and speech reading, developed at Carnegie Mellon University. These technologies are being combined to create an intelligent conversational agent: a three-dimensional face that produces and understands auditory and visual speech. The agent has been incorporated into the CSLU Toolkit, a software environment for developing and researching spoken language systems. We describe our experiences in bringing interactive learning tools to classrooms at the Tucker-Maxon Oral School in Portland, Oregon, and the technological advances that are required for this project to succeed.


AVSP | 2012

Animated speech: Research progress and applications

Dominic W. Massaro; Michael M. Cohen; R. Clark; M. Tabain; Jonas Beskow

Background: This chapter is dedicated to Christian Benoit, who almost single-handedly established visible speech as an important domain of research and application. During and after his residence in ...


EURASIP Journal on Audio, Speech, and Music Processing | 2009

SynFace: speech-driven facial animation for virtual speech-reading support

Giampiero Salvi; Jonas Beskow; Samer Al Moubayed; Björn Granström

This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with a focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).
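As a rough illustration of the last point, the snippet below sketches one way acoustic prominence cues could be scored from frame-level energy and F0 so that nonverbal gestures can be scheduled on prominent frames. It is a hypothetical example, not the detection method actually used in SynFace.

```python
import numpy as np

# Hypothetical sketch (not SynFace's actual method): score frame-level
# prominence from simple loudness and pitch features, so that nonverbal
# gestures (head nods, eyebrow raises) can be aligned with prominent frames.

def prominence_score(energy_db, f0_hz):
    def zscore(x):
        return (x - np.nanmean(x)) / (np.nanstd(x) + 1e-9)
    # Frames well above average in both loudness and pitch are treated as
    # candidates for prominence.
    return 0.5 * zscore(energy_db) + 0.5 * zscore(f0_hz)

def gesture_frame_indices(score, threshold=1.5):
    # Frame indices where a gesture could be scheduled alongside the speech.
    return np.flatnonzero(score > threshold)

# Example with synthetic feature tracks (100 frames).
rng = np.random.default_rng(0)
energy = rng.normal(60.0, 3.0, 100)   # dB
f0 = rng.normal(120.0, 15.0, 100)     # Hz
print(gesture_frame_indices(prominence_score(energy, f0)))
```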


International Journal of Humanoid Robotics | 2013

The Furhat Back-Projected Humanoid Head: Lip Reading, Gaze and Multi-Party Interaction

Samer Al Moubayed; Gabriel Skantze; Jonas Beskow

In this paper, we present Furhat, a back-projected human-like robot head using state-of-the-art facial animation. Three experiments are presented where we investigate how the head might facilitate human-robot face-to-face interaction. First, we investigate how the animated lips increase the intelligibility of the spoken output, and compare this to an animated agent presented on a flat screen, as well as to a human face. Second, we investigate the accuracy of the perception of Furhat's gaze in a setting typical for situated interaction, where Furhat and a human are sitting around a table. The accuracy of the perception of Furhat's gaze is measured depending on eye design, head movement and viewing angle. Third, we investigate the turn-taking accuracy of Furhat in a multi-party interactive setting, as compared to an animated agent on a flat screen. We conclude with some observations from a public setting at a museum, where Furhat interacted with thousands of visitors in a multi-party interaction.


Journal on Multimodal User Interfaces | 2009

Auditory visual prominence: From intelligibility to behavior

Samer Al Moubayed; Jonas Beskow; Björn Granström

Auditory prominence refers to an acoustic segment being made salient in its context. Prominence is one of the prosodic functions that has been shown to be strongly correlated with facial movements. In this work, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted: speech quality is acoustically degraded and the fundamental frequency is removed from the signal; the speech is then presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raise gestures, which are synchronized with the auditory prominence. The experiment shows that presenting prominence as facial gestures significantly increases speech intelligibility compared to when these gestures are randomly added to speech. We also present a follow-up study examining the perception of the behavior of the talking heads when gestures are added over pitch accents. Using eye-gaze tracking and questionnaires with 10 moderately hearing-impaired subjects, the gaze data show that when gestures are coupled with pitch accents, users look at the face in a similar fashion to when they look at a natural face, as opposed to when the face carries no gestures. The questionnaire results also show that these gestures significantly increase the naturalness and the understanding of the talking head.

Collaboration


Dive into Jonas Beskow's collaborations.

Top Co-Authors

Björn Granström, Royal Institute of Technology
Samer Al Moubayed, Royal Institute of Technology
Joakim Gustafson, Royal Institute of Technology
Jens Edlund, Royal Institute of Technology
David House, Royal Institute of Technology
Gabriel Skantze, Royal Institute of Technology
Giampiero Salvi, Royal Institute of Technology
Simon Alexanderson, Royal Institute of Technology
Kalin Stefanov, Royal Institute of Technology