Publication


Featured research published by Björn Granström.


Archive | 2002

Multimodality in Language and Speech Systems

Björn Granström; David House; Inger Karlsson

Contents: Preface. Contributors. Introduction.
- Bodily Communication Dimensions of Expression and Content (J. Allwood)
- Dynamic Imagery in Speech and Gesture (D. McNeill, et al.)
- Multimodal Speech Perception: a Paradigm for Speech Science (D.W. Massaro)
- Multimodal Interaction and People with Disabilities (A.D.N. Edwards)
- Multimodality in Language and Speech Systems: From Theory to Design Support Tool (N.O. Bernsen)
- Developing Intelligent Multimedia Applications (T. Brondsted, et al.)
- Natural Turn-Taking Needs no Manual: Computational Theory and Model, from Perception to Action (K.R. Thorisson)
- Speech and Gestures for Talking Faces in Conversational Dialogue Systems (B. Granstrom, et al.)


COST'11 Proceedings of the 2011 International Conference on Cognitive Behavioural Systems | 2011

Furhat: a back-projected human-like robot head for multiparty human-machine interaction

Samer Al Moubayed; Jonas Beskow; Gabriel Skantze; Björn Granström

In this chapter, we first present a summary of findings from two previous studies on the limitations of using flat displays with embodied conversational agents (ECAs) in the context of face-to-face human-agent interaction. We then motivate the need for a three-dimensional display of faces to guarantee accurate delivery of gaze and directional movements, and present Furhat, a novel, simple, highly effective, and human-like back-projected robot head that utilizes computer animation to deliver facial movements and is equipped with a pan-tilt neck. After presenting a detailed summary of why and how Furhat was built, we discuss the advantages of using optically projected animated agents for interaction. We discuss using such agents in terms of situatedness, environment, context awareness, and social, human-like face-to-face interaction with robots where subtle nonverbal and social facial signals can be communicated. At the end of the chapter, we present a recent application of Furhat as a multimodal multiparty interaction system that was presented at the London Science Museum as part of a robot festival. We conclude the paper by discussing future developments, applications and opportunities of this technology.


Speech Communication | 2001

Developments and paradigms in intonation research

Antonis Botinis; Björn Granström; Bernd Möbius

The present tutorial paper is addressed to a wide audience with different discipline backgrounds as well as variable expertise on intonation. The paper is structured into five sections. In Section 1, “Introduction”, basic concepts of intonation and prosody are summarised and cornerstones of intonation research are highlighted. In Section 2, “Functions and forms of intonation”, a wide range of functions from morpholexical and phrase levels to discourse and dialogue levels are discussed, and forms of intonation with examples from different languages are presented. In Section 3, “Modelling and labelling of intonation”, established models of intonation as well as labelling systems are presented. In Section 4, “Applications of intonation”, the most widespread applications of intonation, especially technological ones, are presented and methodological issues are discussed. In Section 5, “Research perspective”, research avenues and ultimate goals as well as the significance and benefits of intonation research in the upcoming years are outlined.


Speech Communication | 2005

Audiovisual representation of prosody in expressive speech communication

Björn Granström; David House

Prosody in a single speaking style—often read speech—has been studied extensively in acoustic speech. During the past few years we have expanded our interest in two directions: (1) prosody in expressive speech communication and (2) prosody as an audiovisual expression. Understanding the interactions between visual expressions (primarily in the face) and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is for obvious reasons tightly connected to the acoustics (e.g. lip and jaw movements), but there are other articulatory movements that do not show up on the outside of the face. Furthermore, many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. In this presentation we will give some examples of recent work, primarily at KTH, addressing these questions. We will report on methods for the acquisition and modeling of visual and acoustic data, and some evaluation experiments in which audiovisual prosody is tested. The context of much of our work in this area is to create an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher.


Speech Communication | 1991

Experiments with voice modelling in speech synthesis

Rolf Carlson; Björn Granström; Inger Karlsson

Some experiments with voice modelling using recent developments of the KTH speech synthesis system will be presented. A new synthesizer, GLOVE, an extended version of OVE III, has been implemented in the system. It contains an improved glottal source built on the LF voice source model, some extra control parameters for the voiced and noise sources, and an extra pole/zero pair in the nasal branch. Furthermore, the present research versions of the KTH text-to-speech system include possibilities for interactive manipulation at the parameter level with on-screen reference to natural speech. The synthesis system constitutes a flexible environment for voice modelling experiments. The new synthesis tools and models were used for synthesis-by-analysis experiments. A sentence uttered by a female speaker was analysed and a stylized copy was made using both the old and the new synthesis system. With the new system the synthetic copy sounded very similar to the natural utterance.
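The LF voice source model referenced in the abstract describes one period of the glottal flow derivative as an exponentially growing sinusoid followed by an exponential return phase. The sketch below illustrates that shape only; the parameter values are invented for illustration, and a full LF implementation would additionally solve for the return-phase constant so the pulse integrates to zero over the period (area balance), a step omitted here.

```python
import math

def lf_pulse(t, E0=1.0, alpha=600.0, Fg=125.0, Te=0.0060, Ta=0.0004):
    """Simplified LF-style glottal-flow-derivative pulse at time t (seconds).

    Open phase (0 <= t < Te): growing sinusoid E0 * exp(alpha*t) * sin(2*pi*Fg*t).
    Return phase (t >= Te): exponential decay back toward zero with time
    constant Ta. Parameter values here are illustrative, not from the paper.
    """
    if t < Te:
        return E0 * math.exp(alpha * t) * math.sin(2.0 * math.pi * Fg * t)
    # Amplitude reached at the end of the open phase, then decayed exponentially.
    Ee = E0 * math.exp(alpha * Te) * math.sin(2.0 * math.pi * Fg * Te)
    return Ee * math.exp(-(t - Te) / Ta)

# Sample one 8 ms period at 16 kHz.
period = [lf_pulse(n / 16000.0) for n in range(128)]
```

In a synthesizer such as GLOVE this source waveform would be fed through the vocal-tract filter branches; here it is only sampled to show the pulse shape.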


International Conference on Acoustics, Speech, and Signal Processing | 1982

A multi-language text-to-speech module

Rolf Carlson; Björn Granström; Sheri Hunnicutt

Recent advances in microprocessor, memory and signal processor technology have made it feasible to put complete speech processing equipment into a portable form. At our laboratory a higher-level programming language has been developed that is especially suitable for rule description of linguistic processes. Text-to-speech systems for several languages have been written in this framework. A cross compiler has now also been designed for the 16-bit microprocessor MC 68000, which makes porting programs from our research computer trivial. The language-independent parts of the program for the microprocessor are written in efficient assembler code. Special hardware development has resulted in a portable, battery-operated unit that is capable of transforming text to speech at a speaking rate of 250 wpm (words per minute). This opens up the possibility of speech options on computer terminals, portable large-vocabulary talking language translators, etc. The module has been tried in several applications for people with communication handicaps.


Phonetica | 1986

A Search for Durational Rules in a Real-Speech Data Base

Rolf Carlson; Björn Granström

The durational properties of consonants have been studied for Swedish and English. The use of quantity in Swedish demands an expansion of the rule structure proposed by Klatt. The Swedish study has be
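The rule structure proposed by Klatt, which the abstract says must be expanded for Swedish quantity, models a segment's duration as an incompressible minimum plus a compressible remainder scaled by percentage rules. A minimal sketch of that scheme, with illustrative numbers that are not taken from the paper:

```python
def klatt_duration(inherent_ms, min_ms, rule_percentages):
    """Klatt-style duration rule:

        DUR = (INHDUR - MINDUR) * PRCNT/100 + MINDUR

    Only the part of the inherent duration above the minimum is compressed;
    each applicable rule contributes a multiplicative percentage.
    """
    prcnt = 1.0
    for p in rule_percentages:  # e.g. shortening in unstressed position
        prcnt *= p / 100.0
    return (inherent_ms - min_ms) * prcnt + min_ms

# Illustrative: a vowel with 140 ms inherent and 60 ms minimum duration,
# shortened to 70% in an unstressed syllable and 85% in a non-final word:
# (140 - 60) * 0.70 * 0.85 + 60 = 107.6 ms
dur = klatt_duration(140, 60, [70, 85])
```

Handling Swedish quantity would require extra rules beyond this multiplicative core, which is the expansion the paper investigates.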


International Conference on Acoustics, Speech, and Signal Processing | 1976

A text-to-speech system based entirely on rules

Rolf Carlson; Björn Granström

When reading a text, a native speaker pronounces most words correctly even if they are unknown to him. During this process he makes use of his knowledge of the language, the semantic content and the syntax. However, if we take away all information except the spelling and some pronunciation rules on the word level, the task becomes more difficult. This is basically the case in our text-to-speech synthesis system, which contains neither semantic and syntactic analysis nor a word or morpheme dictionary. At the conference the function of our present synthesis system will be discussed. The result shows that such a system might well be based on rules rather than on an extensive dictionary. Furthermore, a useful tool in speech synthesis work (i.e. a programming language) is described.
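Rule-based systems of this kind typically rewrite spelling into phonemes with ordered, context-sensitive rules of the form left-context [ target ] right-context -> phonemes. The toy applier below illustrates the mechanism only; the rules and phoneme symbols are invented for illustration and are not the paper's actual rule set.

```python
# Each rule: (left context, grapheme target, right context, phoneme string).
# The first matching rule wins, so rules are ordered from most to least
# specific. "#" marks a word boundary.
RULES = [
    ("", "ch", "", "tS"),
    ("", "c", "e", "s"),   # toy rule: soft c before e
    ("", "c", "", "k"),
    ("", "a", "", "a"),
    ("", "e", "", "e"),
    ("", "t", "", "t"),
]

def to_phonemes(word):
    """Apply the first matching rewrite rule at each position in the word."""
    word = "#" + word + "#"
    out, i = [], 1
    while i < len(word) - 1:
        for left, target, right, phones in RULES:
            if (word.startswith(target, i)
                    and word[:i].endswith(left)
                    and word.startswith(right, i + len(target))):
                out.append(phones)
                i += len(target)
                break
        else:
            i += 1  # no rule matched: skip the letter silently
    return " ".join(out)
```

Ordering carries the linguistic knowledge here: because the "c before e" rule precedes the default "c" rule, the same letter receives different phonemes in different contexts without any dictionary lookup.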


Journal of the Acoustical Society of America | 1979

MITalk‐79: The 1979 MIT text‐to‐speech system

Jonathan Allen; Sharon Hunnicutt; Rolf Carlson; Björn Granström

To mark the completion of a ten‐year effort to develop a high performance text‐to‐speech algorithm, we have established a benchmark system called “MITalk.” Components of the computer‐simulated benchmark include: (1) conversion of abbreviations and special text symbols, (2) a lexicon consisting of about 11 000 morphs with pronunciation and parts of speech, (3) morpheme analysis, (4) letter‐to‐sound rules, (5) syntactic analysis, (6) rules for stress assignment, boundary placement and phonological recoding, (7) fundamental frequency and segmental duration prediction, (8) phonetic‐to‐parametric conversion, and (9) digital formant synthesis. The MITalk‐79 system is being extensively documented and its performance is being evaluated. The presentation will summarize aspects of system organization and performance. (A more complete description will be given in a one‐week course to be offered June 25–29, 1979.) The oral presentation will include a five‐minute demonstration of synthetic speech generated from Engli...
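The nine components listed above form a sequential pipeline from raw text to a waveform, each stage consuming the previous stage's output. The sketch below shows that organization only; the stage functions and their toy rules are placeholders, not MITalk's actual modules.

```python
from functools import reduce

def normalize_text(text):
    """(1) Expand abbreviations and special symbols (toy example)."""
    return text.replace("Dr.", "Doctor")

def morph_lookup(text):
    """(2)-(4) Stand-in for lexicon lookup, morph analysis, letter-to-sound."""
    return [{"word": w} for w in text.split()]

def syntax_and_prosody(words):
    """(5)-(7) Stand-in for parsing, stress assignment, F0/duration prediction."""
    for i, w in enumerate(words):
        w["stress"] = (i == 0)               # toy rule: stress the first word
        w["dur_ms"] = 300 if w["stress"] else 200
    return words

def synthesize(words):
    """(8)-(9) Stand-in for parametric conversion and formant synthesis;
    here it just returns the total predicted duration in ms."""
    return sum(w["dur_ms"] for w in words)

PIPELINE = [normalize_text, morph_lookup, syntax_and_prosody, synthesize]

def run(text):
    """Thread the input through every stage in order."""
    return reduce(lambda data, stage: stage(data), PIPELINE, text)
```

The point of the pipeline shape is modularity: each numbered component can be documented and evaluated in isolation, which is what the abstract says was done for MITalk‐79.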


Speech Communication | 1993

Prosodic modelling in Swedish speech synthesis

Gösta Bruce; Björn Granström

Our present work concerns Swedish prosody in a speech synthesis framework. Two main problem areas are examined: prominence and phrasing. In a model for Swedish prosody, prominence levels (stress, accent, focus) are represented as layered and multidimensional for different domains (syllable, foot, word). Phrasing involves both coherence in the form of specific combinations of existing accentual gestures and separate boundary gestures. The main features of the intonation model are given in outline. Experiments on prominence include modelling of durations in a combined speech data base and rule synthesis framework, where the stressed-unstressed alternation appears to be the most important duration factor. Other experimentation concerns typical differences in the timing characteristics of the tonal gesture for focal accent between compound words and simplex accent II words. Experiments on phrasing include both production data from a varied speech material as well as synthesis and perception. Our experiments demonstrate that both coherence and boundary cues are effective as phrasing signals and that a combination of F0 and duration is typically used to signal phrasing. Our future plans include working with prosodic modelling of Swedish in a dialogue context and in a concept-to-speech framework.
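The layered prominence representation described above can be pictured as each level (stress < accent < focus) adding its own adjustment on top of the lower levels. A toy sketch under that assumption; the factor values and gesture labels are invented for illustration, not the model's actual parameters.

```python
# Prominence levels ordered from lowest to highest; a syllable at a given
# level inherits the adjustments of all lower levels (layered representation).
LEVELS = ["unstressed", "stress", "accent", "focus"]
DUR_FACTOR = {"stress": 1.3, "accent": 1.1, "focus": 1.2}   # illustrative
F0_GESTURE = {"accent": "word-accent fall", "focus": "focal rise"}

def syllable_duration(base_ms, level):
    """Cumulative lengthening: a focal syllable also carries the stress and
    accent factors of the levels below it."""
    factor = 1.0
    for lv in LEVELS[1:LEVELS.index(level) + 1]:
        factor *= DUR_FACTOR[lv]
    return base_ms * factor

def f0_gestures(level):
    """All tonal gestures contributed by the level and the levels below it."""
    return [F0_GESTURE[lv] for lv in LEVELS[:LEVELS.index(level) + 1]
            if lv in F0_GESTURE]
```

Under these toy numbers a focal syllable with a 100 ms base gets 100 × 1.3 × 1.1 × 1.2 ≈ 171.6 ms, reflecting the abstract's finding that the stressed-unstressed alternation is the dominant duration factor, with accent and focus adding smaller adjustments.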

Collaboration

Top co-authors of Björn Granström, all at the Royal Institute of Technology:

- Rolf Carlson
- Jonas Beskow
- David House
- Joakim Gustafson
- Mats Blomberg
- Kjell Elenius
- Samer Al Moubayed
- Sheri Hunnicutt
- Kjell Gustafson