Publication


Featured research published by Jens Edlund.


Journal of Phonetics | 2010

Pauses, gaps and overlaps in conversations

Mattias Heldner; Jens Edlund

This paper explores durational aspects of pauses, gaps and overlaps in three different conversational corpora with a view to challenging claims about precision timing in turn-taking. Distributions of p ...
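
In this literature, a pause is a silence between stretches of talk by the same speaker, a gap is a silence at a change of speaker, and an overlap is simultaneous talk. Below is a minimal sketch of how such durations might be extracted from two speakers' talk intervals; the interval format and function name are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: classify between-spurt silences and overlaps in a
# two-party dialogue. The (start, end) interval format and the function
# name are illustrative assumptions, not the paper's implementation.

def timing_events(a_talk, b_talk):
    """Yield (kind, duration) for pauses, gaps and overlaps, where a
    pause is within-speaker silence, a gap is between-speaker silence,
    and an overlap is simultaneous talk."""
    spurts = sorted([(s, e, "A") for s, e in a_talk] +
                    [(s, e, "B") for s, e in b_talk])
    for (s1, e1, sp1), (s2, e2, sp2) in zip(spurts, spurts[1:]):
        if s2 >= e1:   # silence between consecutive talk spurts
            yield ("pause" if sp1 == sp2 else "gap"), s2 - e1
        else:          # next spurt begins before the previous one ends
            yield "overlap", min(e1, e2) - s2

a = [(0.0, 1.0), (1.5, 3.0)]   # speaker A talk spurts (seconds)
b = [(2.75, 4.0)]              # speaker B talk spurts (seconds)
print(list(timing_events(a, b)))
# [('pause', 0.5), ('overlap', 0.25)]
```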


Speech Communication | 2008

Towards human-like spoken dialogue systems

Jens Edlund; Joakim Gustafson; Mattias Heldner; Anna Hjalmarsson

This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human-human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.


Phonetica | 2005

Exploring prosody in interaction control

Jens Edlund; Mattias Heldner

This paper investigates prosodic aspects of turn-taking in conversation with a view to improving the efficiency of identifying relevant places at which a machine can legitimately begin to talk to a human interlocutor. It examines the relationship between interaction control, the communicative function of which is to regulate the flow of information between interlocutors, and its phonetic manifestation. Specifically, the listener's perception of such interaction control phenomena is modelled. Algorithms for automatic online extraction of prosodic phenomena liable to be relevant for interaction control, such as silent pauses and intonation patterns, are presented and evaluated in experiments using Swedish map task data. We show that the automatically extracted prosodic features can be used to avoid many of the places where current dialogue systems run the risk of interrupting their users, as well as to identify suitable places to take the turn.
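
As a toy illustration of the kind of online extraction involved, the sketch below tracks the running length of a silent pause frame by frame from signal energy. The frame size and energy threshold are arbitrary assumptions; the paper's actual algorithms are considerably more elaborate.

```python
import numpy as np

# Toy sketch of online silent-pause tracking. FRAME_MS and the energy
# floor are arbitrary assumptions, not values from the paper.
FRAME_MS = 10            # duration of one analysis frame
ENERGY_FLOOR = 1e-4      # mean-square energy below this counts as silence

def silence_run_ms(frames):
    """For each audio frame (a numpy array of samples), report how many
    milliseconds the current silent pause has lasted so far."""
    run, out = 0, []
    for frame in frames:
        silent = float(np.mean(frame ** 2)) < ENERGY_FLOOR
        run = run + FRAME_MS if silent else 0
        out.append(run)  # a system could take the turn once run is long enough
    return out
```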


KSII Transactions on Internet and Information Systems | 2012

Taming Mona Lisa: Communicating gaze faithfully in 2D and 3D facial projections

Samer Al Moubayed; Jens Edlund; Jonas Beskow

The perception of gaze plays a crucial role in human-human interaction. Gaze has been shown to matter for a number of aspects of communication and dialogue, especially for managing the flow of the dialogue and participant attention, for deictic referencing, and for the communication of attitude. When developing embodied conversational agents (ECAs) and talking heads, modeling and delivering accurate gaze targets is crucial. Traditionally, systems communicating through talking heads have been displayed to the human conversant using 2D displays, such as flat monitors. This approach introduces severe limitations for an accurate communication of gaze since 2D displays are associated with several powerful effects and illusions, most importantly the Mona Lisa gaze effect, where the gaze of the projected head appears to follow the observer regardless of viewing angle. We describe the Mona Lisa gaze effect and its consequences in the interaction loop, and propose a new approach for displaying talking heads using a 3D projection surface (a physical model of a human head) as an alternative to the traditional flat surface projection. We investigate and compare the accuracy of the perception of gaze direction and the Mona Lisa gaze effect in 2D and 3D projection surfaces in a five-subject gaze perception experiment. The experiment confirms that a 3D projection surface completely eliminates the Mona Lisa gaze effect and delivers very accurate gaze direction that is independent of the observer's viewing angle. Based on the data collected in this experiment, we rephrase the formulation of the Mona Lisa gaze effect. The data, when reinterpreted, confirms the predictions of the new model for both 2D and 3D projection surfaces. Finally, we discuss the requirements on different spatially interactive systems in terms of gaze direction, and propose new applications and experiments for interaction in human-ECA and human-robot settings made possible by this technology.


Conference of the International Speech Communication Association | 2009

Pause and gap length in face-to-face interaction

Jens Edlund; Mattias Heldner; Julia Hirschberg

It has long been noted that conversational partners tend to exhibit increasingly similar pitch, intensity, and timing behavior over the course of a conversation. However, the metrics developed to measure this similarity to date have generally failed to capture the dynamic temporal aspects of this process. In this paper, we propose new approaches to measuring interlocutor similarity in spoken dialogue. We define similarity in terms of convergence and synchrony and propose approaches to capture these, illustrating our techniques on gap and pause production in Swedish spontaneous dialogues.
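
A minimal sketch of one way to operationalize the two notions, reading convergence as a decreasing inter-speaker difference over time and synchrony as correlated variation between the speakers. These readings are assumptions consistent with the abstract, not the paper's exact metrics.

```python
import numpy as np

def convergence(a, b):
    """Correlation between the absolute inter-speaker difference and
    time: strongly negative values mean the speakers drift together."""
    diff = np.abs(np.asarray(a) - np.asarray(b))
    return np.corrcoef(diff, np.arange(len(diff)))[0, 1]

def synchrony(a, b):
    """Correlation between the two speakers' feature series: positive
    values mean their values rise and fall together."""
    return np.corrcoef(a, b)[0, 1]

# Toy per-turn gap durations (seconds) for two interlocutors.
a = [0.80, 0.70, 0.55, 0.50, 0.45]
b = [0.30, 0.35, 0.40, 0.42, 0.44]
print(convergence(a, b), synchrony(a, b))  # converging, but not in sync
```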


Conference of the International Speech Communication Association | 2010

Pitch similarity in the vicinity of backchannels

Mattias Heldner; Jens Edlund; Julia Hirschberg

Dynamic modeling of spoken dialogue seeks to capture how interlocutors change their speech over the course of a conversation. Much work has focused on how speakers adapt or entrain to different asp ...


International Conference on Acoustics, Speech, and Signal Processing | 2008

An instantaneous vector representation of delta pitch for speaker-change prediction in conversational dialogue systems

Kornel Laskowski; Jens Edlund; Mattias Heldner

As spoken dialogue systems become deployed in increasingly complex domains, they face rising demands on the naturalness of interaction. We focus on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations. We derive an instantaneous vector representation of pitch variation and show that it is amenable to standard acoustic modeling techniques. Using a small amount of automatically labeled data, we train models which significantly outperform current state-of-the-art pause-only systems, and replicate to within 1% absolute the performance of our previously published hand-crafted baseline. The new system additionally offers scope for run-time control over the precision or recall of locations at which to speak.
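
As a hedged reading of what a per-frame vector of delta pitch might look like: at each frame, pitch change measured over several lags, yielding a fixed-length vector amenable to standard acoustic modeling. The lag set, the log scale, and the handling of unvoiced frames below are all assumptions, not the published formulation.

```python
import numpy as np

# Sketch: one delta-pitch vector per frame. The lag set, log scale, and
# NaN handling for unvoiced frames are assumptions, not the paper's design.
LAGS = (1, 2, 4, 8)   # frame offsets over which pitch change is measured

def delta_pitch_vectors(f0):
    """f0: per-frame pitch track in Hz, 0 where unvoiced. Returns an
    (n_frames, len(LAGS)) array, NaN where the delta is undefined."""
    f0 = np.asarray(f0, dtype=float)
    logf0 = np.log2(np.where(f0 > 0, f0, np.nan))  # octaves; NaN = unvoiced
    vecs = np.full((len(f0), len(LAGS)), np.nan)
    for j, lag in enumerate(LAGS):
        vecs[lag:, j] = logf0[lag:] - logf0[:-lag]  # rise/fall over lag frames
    return vecs
```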


International Conference on Acoustics, Speech, and Signal Processing | 2011

A single-port non-parametric model of turn-taking in multi-party conversation

Kornel Laskowski; Jens Edlund; Mattias Heldner

The taking of turns to speak is an intrinsic property of conversation. It is expected that models of taking turns, providing a prior distribution over conversational form, can reduce the perplexity of what is attended to and processed by spoken dialogue systems. We propose a single-port model of multi-party turn-taking which allows conversants to behave independently but to condition their behavior on the past of the entire group. The model performs at least as well as an existing multi-port model on perplexity over subsequent speech activity. We quantify the effect of longer histories and more distant future horizons, and argue that the framework has the potential to inform the design and behavior of spoken dialogue systems.
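
A toy sketch of the general idea: each participant's next speech/silence state is predicted from a short history of the entire group's speech activity, estimated non-parametrically by counting observed histories. The history length, the add-one smoothing, and the gloss of "single-port" as one set of statistics shared across participants are all assumptions.

```python
from collections import Counter, defaultdict

# Toy sketch: predict each participant's next speech/silence state from
# the recent speech activity of the whole group, by counting how often
# each (history -> next state) pair was observed. HISTORY and the
# add-one smoothing are assumptions, not the published model.
HISTORY = 3   # frames of group context

def train(frames):
    """frames: list of tuples of 0/1 speech activity, one slot per speaker."""
    counts = defaultdict(Counter)
    n = len(frames[0])
    for t in range(HISTORY, len(frames)):
        for k in range(n):
            # Context: speaker k's own recent activity plus everyone
            # else's, so the statistics are shared across participants.
            own = tuple(f[k] for f in frames[t - HISTORY:t])
            others = tuple(sorted(tuple(f[j] for f in frames[t - HISTORY:t])
                                  for j in range(n) if j != k))
            counts[(own, others)][frames[t][k]] += 1
    return counts

def p_speak(counts, own, others):
    """P(participant speaks next | own and group history), add-one smoothed."""
    c = counts[(own, others)]
    return (c[1] + 1) / (c[0] + c[1] + 2)
```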


Fonetik 2010, Lund, 2-4 June 2010 | 2010

Very short utterances in conversation

Jens Edlund; Mattias Heldner; Samer Al Moubayed; Agustín Gravano; Julia Hirschberg

Faced with the difficulties of finding an operationalized definition of backchannels, we have previously proposed an intermediate, auxiliary unit, the very short utterance (VSU), which is defined operationally and is automatically extractable from recorded or ongoing dialogues. Here, we extend that work in the following ways: (1) we test the extent to which the VSU/non-VSU distinction corresponds to backchannels/non-backchannels in a different data set that is manually annotated for backchannels, the Columbia Games Corpus; (2) we examine the extent to which VSUs capture other short utterances with a vocabulary similar to backchannels; (3) we propose a VSU-based method for better managing turn-taking and barge-ins in spoken dialogue systems based on the detection of backchannels; and (4) we attempt to detect backchannels with better precision by training a backchannel classifier using durations and inter-speaker relative loudness differences as features. The results show that VSUs indeed capture a large proportion of backchannels, large enough that VSUs can be used to improve spoken dialogue system turn-taking, and that building a reliable backchannel classifier working in real time is feasible.
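
As a rough sketch of such a classifier: two features per candidate utterance, its duration and its loudness relative to the interlocutor, fed to an off-the-shelf model. The toy data and the choice of scikit-learn's LogisticRegression are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch: backchannel vs. non-backchannel from two features per utterance:
# duration (s) and loudness relative to the interlocutor (dB). Toy data.
X = np.array([
    [0.25, -6.0], [0.30, -4.5], [0.40, -5.0],   # short, quiet -> backchannel
    [1.80,  1.0], [2.40,  0.5], [1.20,  2.0],   # longer, louder -> not
])
y = np.array([1, 1, 1, 0, 0, 0])

clf = LogisticRegression().fit(X, y)
# P(backchannel) for a new short, quiet utterance:
print(clf.predict_proba([[0.35, -5.5]])[0, 1])
```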


Language and Speech | 2009

MushyPeek: A framework for online investigation of audiovisual dialogue phenomena

Jens Edlund; Jonas Beskow

Evaluation of methods and techniques for conversational and multimodal spoken dialogue systems is complex, as is gathering data for the modeling and tuning of such techniques. This article describes MushyPeek, an experiment framework that allows us to manipulate the audiovisual behavior of interlocutors in a setting similar to face-to-face human-human dialogue. The setup connects two subjects to each other over a Voice over Internet Protocol (VoIP) telephone connection and simultaneously provides each of them with an avatar representing the other. We present a first experiment which inaugurates, exemplifies, and validates the framework. The experiment corroborates earlier findings on the use of gaze and head pose gestures in turn-taking.

Collaboration


Jens Edlund's top co-authors and their affiliations:

Joakim Gustafson, Royal Institute of Technology
Jonas Beskow, Royal Institute of Technology
David House, Royal Institute of Technology
Kornel Laskowski, Carnegie Mellon University
Gabriel Skantze, Royal Institute of Technology
Anna Hjalmarsson, Royal Institute of Technology
Sofia Strömbergsson, Royal Institute of Technology
Samer Al Moubayed, Royal Institute of Technology
Björn Granström, Royal Institute of Technology