Inger Karlsson
Royal Institute of Technology
Publications
Featured research published by Inger Karlsson.
Archive | 2002
Björn Granström; David House; Inger Karlsson
Contents:
Preface. Contributors. Introduction.
Bodily Communication Dimensions of Expression and Content (J. Allwood)
Dynamic Imagery in Speech and Gesture (D. McNeill et al.)
Multimodal Speech Perception: a Paradigm for Speech Science (D.W. Massaro)
Multimodal Interaction and People with Disabilities (A.D.N. Edwards)
Multimodality in Language and Speech Systems - From Theory to Design Support Tool (N.O. Bernsen)
Developing Intelligent Multimedia Applications (T. Brondsted et al.)
Natural Turn-Taking Needs no Manual: Computational Theory and Model, from Perception to Action (K.R. Thorisson)
Speech and Gestures for Talking Faces in Conversational Dialogue Systems (B. Granstrom et al.)
Speech Communication | 1991
Rolf Carlson; Björn Granström; Inger Karlsson
Some experiments with voice modelling using recent developments of the KTH speech synthesis system are presented. A new synthesizer, GLOVE, an extended version of OVE III, has been implemented in the system. It contains an improved glottal source built on the LF voice source model, some extra control parameters for the voiced and noise sources, and an extra pole/zero pair in the nasal branch. Furthermore, the present research versions of the KTH text-to-speech system allow interactive manipulation at the parameter level with on-screen reference to natural speech. The synthesis system thus constitutes a flexible environment for voice modelling experiments. The new synthesis tools and models were used for synthesis-by-analysis experiments: a sentence uttered by a female speaker was analysed and a stylized copy was made using both the old and the new synthesis system. With the new system, the synthetic copy sounded very similar to the natural utterance.
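The abstract names the LF voice source model but gives no formulas. As a rough illustration, below is a minimal numpy sketch of one cycle of the LF glottal flow derivative, with its two standard segments: an exponentially growing sinusoid up to the main excitation instant te, and an exponential return phase after it. The parameter values and the heuristic choice of the growth constant alpha are assumptions for demonstration only; a full implementation solves the LF area-balance equations for alpha and epsilon instead.

```python
import numpy as np

def lf_pulse(fs=16000, f0=200.0, te=0.6, tp=0.45, ta=0.03, Ee=1.0):
    """One cycle of the LF glottal flow derivative (simplified sketch).

    te, tp, ta are fractions of the period T0. alpha is set heuristically
    and epsilon by the usual first-order approximation, not by solving
    the LF area-balance constraints as a real implementation would.
    """
    T0 = 1.0 / f0
    n = int(round(fs * T0))
    t = np.arange(n) / fs
    wg = np.pi / (tp * T0)           # sinusoid frequency of the open phase
    eps = 1.0 / (ta * T0)            # return-phase constant (approximation)
    alpha = 3.0 / (te * T0)          # heuristic growth constant (assumption)
    # Scale so the waveform reaches -Ee at the excitation instant te:
    E0 = -Ee / (np.exp(alpha * te * T0) * np.sin(wg * te * T0))
    u = np.zeros(n)
    open_ph = t <= te * T0
    u[open_ph] = E0 * np.exp(alpha * t[open_ph]) * np.sin(wg * t[open_ph])
    ret = ~open_ph
    u[ret] = (-Ee / (eps * ta * T0)) * (
        np.exp(-eps * (t[ret] - te * T0)) - np.exp(-eps * (T0 - te * T0)))
    return u
```

Cumulatively summing this signal approximates the glottal flow itself; in a synthesizer of the kind described, the derivative waveform is what excites the vocal-tract filter.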
international conference on computers for handicapped persons | 2004
Jonas Beskow; Inger Karlsson; Jo Kewley; Giampiero Salvi
SYNFACE is a telephone aid for hearing-impaired people that shows the lip movements of the speaker at the other end of the line, synchronised with the speech. The SYNFACE system consists of a speech recogniser that recognises the incoming speech and a synthetic talking head. The output from the recogniser is used to control the articulatory movements of the synthetic head. SYNFACE prototype systems exist for three languages, Dutch, English and Swedish, and the first user trials have just started.
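The abstract outlines the architecture (recogniser output drives the articulation of a synthetic head) without implementation detail. Purely as a hypothetical sketch of such a pipeline, the fragment below maps time-stamped phoneme labels, as a recogniser might emit them, onto per-frame viseme targets for a face renderer. The mapping table, class names and frame rate are invented for illustration and are not SYNFACE's actual interface.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme table; the real SYNFACE mapping and
# control parameters are not given in the abstract.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "o": "rounded", "i": "spread",
    "sil": "rest",
}

@dataclass
class PhoneSegment:
    phoneme: str
    start: float  # seconds
    end: float    # seconds

def viseme_track(segments, fps=25):
    """Sample a recognised phone sequence into per-frame viseme targets."""
    if not segments:
        return []
    frames = []
    n_frames = int(segments[-1].end * fps) + 1
    seg_iter = iter(segments)
    seg = next(seg_iter)
    for k in range(n_frames):
        t = k / fps
        while t >= seg.end:           # advance to the segment covering t
            try:
                seg = next(seg_iter)
            except StopIteration:
                break                  # past the last segment: hold it
        frames.append(PHONEME_TO_VISEME.get(seg.phoneme, "rest"))
    return frames

# Example: recogniser output for a short utterance
segments = [PhoneSegment("sil", 0.0, 0.1), PhoneSegment("m", 0.1, 0.2),
            PhoneSegment("a", 0.2, 0.45), PhoneSegment("sil", 0.45, 0.6)]
print(viseme_track(segments))
```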
Journal of Voice | 1992
Stellan Hertegård; Jan Gauffin; Inger Karlsson
The relationships between inverse-filtered transglottal airflow waveforms, acoustic parameters, and glottal vibratory patterns were examined for five normal speakers, two males and three females, by means of inverse filtering of the flow, electroglottography, and videostroboscopy. It was found that for phonations in the normal frequency range for speech with complete glottal closure, the offset from zero flow of the waveform during the closed phase can be up to 20–30 ml/s. We suggest that the cause of the offset is vertical movement of the vocal folds. A pronounced mucosal wave and/or the intraglottal air volume may often produce a small hump at the beginning of the closed phase of the flow glottogram. For females, the frequently observed posterior chink between the arytenoid cartilages during phonation resulted in a waveform offset of up to 50–60 ml/s.
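The study obtained glottal flow by inverse filtering airflow recorded with a flow mask, with the filter tuned interactively. A common fully automatic variant, sketched below with numpy only, estimates the vocal-tract resonances from the signal by linear prediction and applies the inverse FIR filter A(z). This is a generic substitute for illustration, not the procedure used in the paper, and the LPC-order rule of thumb is a conventional default rather than anything the paper specifies.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC coefficients via Levinson-Durbin."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                     # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(signal, fs, order=None):
    """Rough source-derivative estimate: the signal filtered through A(z)."""
    if order is None:
        order = int(fs / 1000) + 2         # common rule of thumb
    a = lpc(signal * np.hanning(len(signal)), order)
    return np.convolve(signal, a)[:len(signal)]
```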
Speech Communication | 1992
Inger Karlsson
The voice source is an important factor in the production of different voice qualities. These voice qualities are used in speech to convey, among other things, different suprasegmental aspects, e.g., emphasis and phrase boundaries, as well as different speaking styles such as an authoritative or a submissive voice. Voice source variations are also an important means of conveying extralinguistic information of various kinds in ordinary speech. In the present study, voice source variations in normal speech by female speakers have been investigated using inverse filtering. The results of the inverse filtering are given as voice source parameters suitable for controlling speech synthesis. Accordingly, the resulting descriptions have been used to produce voice variations in our new synthesis system.
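The abstract does not list which source parameters were extracted. As an illustration of the kind of per-cycle measures commonly derived from inverse-filtered flow to control a source model, the sketch below computes an open quotient and the (normalized) amplitude quotient from one period of glottal flow. These particular measures and the crude 5% opening threshold are assumptions for demonstration, not necessarily the paper's parameter set.

```python
import numpy as np

def source_parameters(flow, fs, f0):
    """Toy per-cycle voice source measures from one period of glottal flow.

    OQ, AQ and NAQ are standard in the literature but serve here only
    as examples; the paper's own parameters are not given in the abstract.
    """
    T0 = 1.0 / f0
    dflow = np.gradient(flow) * fs                 # flow derivative (units/s)
    ac_amp = flow.max() - flow.min()               # peak-to-peak flow amplitude
    d_peak = -dflow.min()                          # magnitude of negative peak
    open_phase = flow > flow.min() + 0.05 * ac_amp # crude 5% opening threshold
    oq = open_phase.mean()                         # open quotient
    aq = ac_amp / d_peak                           # amplitude quotient (s)
    naq = aq / T0                                  # normalized amplitude quotient
    return {"OQ": oq, "AQ": aq, "NAQ": naq}
```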
Speech Communication | 2000
Inger Karlsson; Tanja Bänziger; Jana Dankovicová; Tom Johnstone; Johan Lindberg; Håkan Melin; Francis Nolan; Klaus R. Scherer
Some experiments have been carried out to study and compensate for within-speaker variation in speaker verification. To induce speaker variation, a speaking-behaviour elicitation software package has been developed, and a 50-speaker database with voluntary and involuntary speech variation has been recorded using it. The database has been used for acoustic analysis as well as for automatic speaker verification (ASV) tests. The voluntary speech variations are used to form an enrolment set for the ASV system, called structured training, which is compared to neutral training where only normal speech is used; both sets contain the same number of utterances. The ASV system is found to perform better on a mixed-speaking-style test when trained on the structured set, without any loss of performance on tests with normal speech.
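The abstract does not specify the verification back-end. Purely as a sketch of how a structured-versus-neutral enrolment contrast could be run, the fragment below builds a generic GMM-UBM verifier with scikit-learn; the random vectors stand in for real spectral features, and all names, sizes and the threshold are invented for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(enroll_feats, background_feats, n_components=8):
    """Fit a speaker GMM and a universal background model (UBM).

    A generic GMM-UBM verifier, not the ASV system used in the paper.
    """
    spk = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(enroll_feats)
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(background_feats)
    return spk, ubm

def verify(spk, ubm, test_feats, threshold=0.0):
    """Accept if the average log-likelihood ratio exceeds the threshold."""
    llr = spk.score(test_feats) - ubm.score(test_feats)
    return llr > threshold, llr

# "Structured training" simply enrols on features pooled across the
# elicited speaking styles instead of normal speech only:
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 12))    # stand-ins for MFCC frames
varied = rng.normal(0.3, 1.2, size=(200, 12))
structured_enrolment = np.vstack([normal, varied])
spk, ubm = train_models(structured_enrolment, rng.normal(size=(400, 12)))
print(verify(spk, ubm, rng.normal(0.2, 1.1, size=(50, 12))))
```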
Archive | 2002
Björn Granström; David House; Jonas Beskow; Inger Karlsson
Innovative spoken dialogue systems are beginning to be characterized by designs where interactivity is no longer seen as limited to a series of choices in a question-and-answer menu approach. New systems strive toward establishing a smooth flow of information modelled on conversational dialogue. In this context, there is currently considerable interest in developing 3D-animated agents to exploit the inherently multimodal nature of speech communication. As the 3D animation becomes more sophisticated in terms of visual realism, the demand for naturalness in speech and gesture coordination increases. Not only are appropriate, speech-synchronized articulator movements necessary; conversational signals such as cues for turn-taking and feedback are also essential. Such conversational signals can be conveyed by both the auditory and the visual modality. Verbal (auditory) signals can complement syntax and interact with the prosodic (accentual and phrasal) structure of the utterances. For example, a phrase-final intonation pattern can function both as a cue for prosodic grouping and as a verbal turn-giving signal. Gestural (visual) signals such as eyebrow movements and nodding for accentuation can function as parallel signals to intonation (i.e. as linguistic signals) as well as being used as conversational signals (e.g. raised eyebrows to signify an interested, listening agent, or nodding to provide encouragement). While much work has been done on describing spoken and gestural conversational signals in human-to-human interaction (see e.g. Allwood, this volume; McNeill, this volume), work aimed at investigating the coordination of these two types of signals in computer-human interaction, and the implementation of this knowledge in animated conversational agents, is still relatively scarce.
Computer Speech & Language | 2011
Petri Laukka; Daniel Neiberg; Mimmi Forsell; Inger Karlsson; Kjell Elenius
international conference on acoustics, speech, and signal processing | 1989
Rolf Carlson; Gunnar Fant; Christer Gobl; Björn Granström; Inger Karlsson; Qiguang Lin
Speech Communication | 1991
Inger Karlsson