
Publication


Featured research published by Catharine Oertel.


Speech Communication | 2014

Turn-taking, feedback and joint attention in situated human–robot interaction

Gabriel Skantze; Anna Hjalmarsson; Catharine Oertel

In this paper, we present a study where a robot instructs a human on how to draw a route on a map. The human and robot are seated face-to-face with the map placed on the table between them. The user's and the robot's gaze can thus serve several simultaneous functions: as cues to joint attention, turn-taking, level of understanding and task progression. We have compared this face-to-face setting with a setting where the robot employs a random gaze behaviour, as well as a voice-only setting where the robot is hidden behind a paper board. In addition to this, we have also manipulated turn-taking cues such as completeness and filled pauses in the robot's speech. By analysing the participants' subjective ratings, task completion, verbal responses, gaze behaviour, and drawing activity, we show that the users indeed benefit from the robot's gaze when talking about landmarks, and that the robot's verbal and gaze behaviour has a strong effect on the users' turn-taking behaviour. We also present an analysis of the users' gaze and lexical and prosodic realisation of feedback after the robot instructions, and show that these cues reveal whether the user has yet executed the previous instruction, as well as the user's level of uncertainty.


international conference on multimodal interfaces | 2013

A gaze-based method for relating group involvement to individual engagement in multimodal multiparty dialogue

Catharine Oertel; Giampiero Salvi

This paper is concerned with modelling individual engagement and group involvement, as well as their relationship, in an eight-party, multimodal corpus. We propose a number of features (presence, entropy, symmetry and maxgaze) that summarise different aspects of eye-gaze patterns and allow us to describe individual as well as group behaviour in time. We use these features to define similarities between the subjects, and we compare this information with the engagement rankings the subjects expressed at the end of each interaction about themselves and the other participants. We analyse how these features relate to four classes of group involvement and we build a classifier that is able to distinguish between those classes with 71% accuracy.
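
As an illustration of the kind of gaze summaries the abstract names, the following is a minimal Python sketch that computes a "presence" and an "entropy" value from one window of per-frame gaze targets. The function name, the label coding ("away" for non-person gaze), and the exact definitions are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import Counter

def gaze_window_features(gaze_targets):
    """Summarise one time window of per-frame gaze targets for a subject.

    gaze_targets: discrete labels per video frame, e.g. ["P2", "P2", "P5", "away", ...]
    Returns illustrative 'presence' and 'entropy' values in the spirit of
    the features named in the abstract (not the paper's exact definitions).
    """
    n = len(gaze_targets)
    if n == 0:
        return {"presence": 0.0, "entropy": 0.0}

    counts = Counter(gaze_targets)

    # Presence: fraction of frames in which the subject looks at any person.
    at_people = sum(c for target, c in counts.items() if target != "away")
    presence = at_people / n

    # Entropy of the gaze-target distribution: near 0 when gaze is fixed on
    # one target, higher when it is spread evenly across the group.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    return {"presence": presence, "entropy": entropy}

# A 10-frame window in which the subject mostly watches participant P2.
print(gaze_window_features(["P2"] * 7 + ["P5"] * 2 + ["away"]))
```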


international conference on multimodal interfaces | 2015

Deciphering the Silent Participant: On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions

Catharine Oertel; Kenneth Alberto Funes Mora; Joakim Gustafson; Jean-Marc Odobez

Estimating a silent participant's degree of engagement and their role within a group discussion can be challenging, as there are no speech-related cues available at the given time. Having this information available, however, can provide important insights into the dynamics of the group as a whole. In this paper, we study the classification of listeners into several categories (attentive listener, side participant and bystander). We devised a thin-sliced perception test in which subjects were asked to assess listener roles and engagement levels in 15-second video clips taken from a corpus of group interviews. Results show that humans are usually able to assess silent participant roles. Using these annotations together with a set of multimodal low-level features, such as past speaking activity, backchannels (both visual and verbal), and gaze patterns, we identified the features that are able to distinguish between the different listener categories. Moreover, the results show that many of the audio-visual effects observed on listeners in dyadic interactions also hold for multi-party interactions. A preliminary classifier achieves an accuracy of 64%.
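
The classification step can be pictured with a small, hedged sketch: a generic classifier trained on low-level listener features of the kind listed above (past speaking activity, visual and verbal backchannels, gaze). The feature columns, the random-forest choice, and the placeholder data are illustrative assumptions, not the pipeline used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Random placeholder data standing in for the annotated 15-second clips;
# one row per clip: [past speaking time (s), visual backchannels,
# verbal backchannels, share of gaze directed at the current speaker].
X = rng.random((120, 4)) * [15.0, 5.0, 5.0, 1.0]
y = rng.choice(["attentive_listener", "side_participant", "bystander"], 120)

# A generic classifier over the listener features; real annotated features
# would replace the placeholder data above.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```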


Speech prosody | 2012

Context Cues For Classification Of Competitive And Collaborative Overlaps

Catharine Oertel; Marcin Włodarczak; Alexey Tarasov; Nick Campbell; Petra Wagner

Being able to respond appropriately to users' overlaps should be seen as one of the core competencies of incremental dialogue systems. At the same time, identifying whether an interlocutor wants to support or grab the turn is a task which comes naturally to humans, but has not yet been implemented in such systems. Motivated by this, we first investigate whether prosodic characteristics of speech in the vicinity of overlaps are significantly different from prosodic characteristics in the vicinity of non-overlapping speech. We then test the suitability of different context sizes, both preceding and following but excluding features of the overlap, for the automatic classification of collaborative and competitive overlaps. We also test whether the fusion of preceding and succeeding contexts improves the classification. Preliminary results indicate that the optimal context for classification of overlap lies at 0.2 seconds preceding the overlap and up to 0.3 seconds following it. We demonstrate that we are able to classify collaborative and competitive overlap with a median accuracy of 63%.
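
A minimal sketch of how such context windows could be cut from a prosodic track, assuming an F0 contour sampled at fixed frames; the window sizes follow the 0.2 s / 0.3 s values reported above, while the feature set and function names are illustrative assumptions rather than the paper's feature extraction.

```python
import numpy as np

def context_prosody(times, f0, overlap_start, overlap_end, pre=0.2, post=0.3):
    """Pull simple prosodic statistics from the context around an overlap.

    times, f0: aligned arrays of timestamps (s) and F0 values (Hz) for the
    speaker holding the turn; unvoiced frames are assumed to be NaN.
    """
    times = np.asarray(times)
    f0 = np.asarray(f0, dtype=float)

    # Context strictly before the overlap and strictly after it,
    # excluding the overlapped stretch itself, as in the study design.
    before = (times >= overlap_start - pre) & (times < overlap_start)
    after = (times > overlap_end) & (times <= overlap_end + post)

    def stats(mask):
        voiced = f0[mask]
        voiced = voiced[~np.isnan(voiced)]
        if voiced.size == 0:
            return {"mean_f0": np.nan, "f0_range": np.nan}
        return {"mean_f0": voiced.mean(), "f0_range": voiced.max() - voiced.min()}

    return {"pre": stats(before), "post": stats(after)}

# Example: 10 ms frames; an overlap begins at 1.0 s and ends at 1.4 s.
t = np.arange(0.0, 2.0, 0.01)
f0_track = 180 + 20 * np.sin(2 * np.pi * t)
print(context_prosody(t, f0_track, overlap_start=1.0, overlap_end=1.4))
```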


international conference on multimodal interfaces | 2016

Towards building an attentive artificial listener: on the perception of attentiveness in audio-visual feedback tokens

Catharine Oertel; José Lopes; Yu Yu; Kenneth Alberto Funes Mora; Joakim Gustafson; Alan W. Black; Jean-Marc Odobez

Current dialogue systems typically lack a variation of audio-visual feedback tokens. Either they do not encompass feedback tokens at all, or they only support a limited set of stereotypical functions. However, this does not mirror the subtleties of spontaneous conversations. If we want to be able to build an artificial listener, as a first step towards building an empathetic artificial agent, we also need to be able to synthesize more subtle audio-visual feedback tokens. In this study, we devised an array of monomodal and multimodal binary comparison perception tests and experiments to understand how different realisations of verbal and visual feedback tokens influence third-party perception of the degree of attentiveness. This allowed us to investigate i) which features (amplitude, frequency, duration, ...) of the visual feedback influence attentiveness perception; ii) whether visual or verbal backchannels are perceived to be more attentive; iii) whether the fusion of unimodal tokens with low perceived attentiveness increases the degree of perceived attentiveness compared to unimodal tokens with high perceived attentiveness taken alone; and iv) the automatic ranking of audio-visual feedback tokens in terms of the conveyed degree of attentiveness.


9th IFIP WG 5.5 International Summer Workshop on Multimodal Interfaces, eNTERFACE 2013, Lisbon, Portugal, July 15 – August 9, 2013 | 2014

Tutoring Robots: Multiparty multimodal social dialogue with an embodied tutor

Samer Al Moubayed; Jonas Beskow; Bajibabu Bollepalli; Ahmed Hussen-Abdelaziz; Martin Johansson; Maria Koutsombogera; José Lopes; Jekaterina Novikova; Catharine Oertel; Gabriel Skantze; Kalin Stefanov; Gül Varol

This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets t ...


intelligent virtual agents | 2017

Crowd-Powered Design of Virtual Attentive Listeners.

Patrik Jonell; Catharine Oertel; Dimosthenis Kontogiorgos; Jonas Beskow; Joakim Gustafson

This demo presents a web-based system that generates attentive listening behaviours in a virtual agent, acquired from audio-visual recordings of crowdworkers' attitudinal feedback behaviour.


conference of the international speech communication association | 2016

Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback Utterances.

Catharine Oertel; Joakim Gustafson; Alan W. Black



Proceedings of the Workshop on Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction | 2016

On data driven parametric backchannel synthesis for expressing attentiveness in conversational agents

Catharine Oertel; Joakim Gustafson; Alan W. Black

In this study, we are using a multi-party recording as a template for building a parametric speech synthesiser which is able to express different levels of attentiveness in backchannel tokens. This allowed us to investigate i) whether it is possible to express the same perceived level of attentiveness in synthesised as in natural backchannels; ii) whether it is possible to increase and decrease the perceived level of attentiveness of backchannels beyond the range observed in the original corpus.


human-robot interaction | 2014

Human-robot collaborative tutoring using multiparty multimodal spoken dialogue

Samer Al Moubayed; Jonas Beskow; Bajibabu Bollepalli; Joakim Gustafson; Ahmed Hussen-Abdelaziz; Martin Johansson; Maria Koutsombogera; José Lopes; Jekaterina Novikova; Catharine Oertel; Gabriel Skantze; Kalin Stefanov; Gül Varol

In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonverbal tutoring strategies in multiparty spoken interactions with robots which are capable of spoken dialogue. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. Along with the participants sits a tutor (robot) that helps the participants perform the task, and organizes and balances their interaction. Different multimodal signals, captured and auto-synchronized by different audio-visual capture technologies such as a microphone array, Kinects, and video cameras, were coupled with manual annotations. These are used to build a situated model of the interaction based on the participants' personalities, their state of attention, their conversational engagement and verbal dominance, and how these correlate with the verbal and visual feedback, turn-management, and conversation-regulatory actions generated by the tutor. Driven by the analysis of the corpus, we also show the detailed design methodology for an affective and multimodally rich dialogue system that allows the robot to incrementally measure the attention state and dominance of each participant, enabling the robot head Furhat to maintain a well-coordinated, balanced, and engaging conversation that attempts to maximize agreement and each participant's contribution to solving the task. This project sets the first steps towards exploring the potential of using multimodal dialogue systems to build interactive robots that can serve in educational, team-building, and collaborative task-solving applications.

Collaboration


Dive into Catharine Oertel's collaboration.

Top Co-Authors

Joakim Gustafson (Royal Institute of Technology)
Gabriel Skantze (Royal Institute of Technology)
Jonas Beskow (Royal Institute of Technology)
Patrik Jonell (Royal Institute of Technology)
Anna Hjalmarsson (Royal Institute of Technology)
Bajibabu Bollepalli (Royal Institute of Technology)
Jens Edlund (Royal Institute of Technology)