Publication


Featured research published by Ingmar Steiner.


Computer Graphics Forum | 2015

VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track

Pablo Garrido; Levi Valgaerts; Hamid Sarmadi; Ingmar Steiner; Kiran Varanasi; Patrick Pérez; Christian Theobalt

In many countries, foreign movies and TV productions are dubbed, i.e., the original voice of an actor is replaced with a translation that is spoken by a dubbing actor in the country's own language. Dubbing is a complex process that requires specific translations and accurately timed recitations such that the new audio at least coarsely adheres to the mouth motion in the video. However, since the sequence of phonemes and visemes in the original and the dubbing language are different, the video-to-audio match is never perfect, which is a major source of visual discomfort. In this paper, we propose a system to alter the mouth motion of an actor in a video, so that it matches the new audio track. Our paper builds on high-quality monocular capture of 3D facial performance, lighting and albedo of the dubbing and target actors, and uses audio analysis in combination with a space-time retrieval method to synthesize a new photo-realistically rendered and highly detailed 3D shape model of the mouth region to replace the target performance. We demonstrate plausible visual quality of our results compared to footage that has been professionally dubbed in the traditional way, both qualitatively and through a user study.


Journal of the Acoustical Society of America | 2012

The magnetic resonance imaging subset of the mngu0 articulatory corpus

Ingmar Steiner; Korin Richmond; Ian Marshall; Calum Gray

This paper announces the availability of the magnetic resonance imaging (MRI) subset of the mngu0 corpus, a collection of articulatory speech data from one speaker containing different modalities. This subset comprises volumetric MRI scans of the speaker's vocal tract during sustained production of vowels and consonants, as well as dynamic mid-sagittal scans of repetitive consonant-vowel (CV) syllable production. For reference, high-quality acoustic recordings of the speech material are also available. The raw data are made freely available for research purposes.


SSW | 2007

Control concepts for articulatory speech synthesis

Peter Birkholz; Ingmar Steiner; Stefan Breuer

We present two concepts for the generation of gestural scores to control an articulatory speech synthesizer. Gestural scores are the common input to the synthesizer and constitute an organized pattern of articulatory gestures. The first concept generates the gestures for an utterance using the phonetic transcriptions, phone durations, and intonation commands predicted by the Bonn Open Synthesis System (BOSS) from an arbitrary input text. This concept extends the synthesizer to a text-to-speech synthesis system. The idea of the second concept is to use timing information extracted from Electromagnetic Articulography signals to generate the articulatory gestures. Therefore, it is a concept for the re-synthesis of natural utterances. Finally, application prospects for the presented synthesizer are discussed.
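
Neither concept is specified as code in the paper; purely as an illustration of the first one, the sketch below (hypothetical gesture labels and function names, not the BOSS or synthesizer API) shows how predicted phones and durations might be turned into a timed gestural score:

```python
from dataclasses import dataclass

# Hypothetical phone-to-gesture inventory; a real gestural score uses a much
# richer set of lip, tongue, velum and glottal gestures with effort values.
PHONE_TO_GESTURES = {
    "m": ["bilabial-closure", "velum-lowering", "glottal-adduction"],
    "a": ["open-vowel-target", "glottal-adduction"],
    "p": ["bilabial-closure", "glottal-abduction"],
}

@dataclass
class Gesture:
    label: str    # which articulatory target the gesture controls
    start: float  # onset time in seconds
    end: float    # offset time in seconds

def gestural_score(phones, durations):
    """Build a naive gestural score from phones and predicted durations (s)."""
    score, t = [], 0.0
    for phone, dur in zip(phones, durations):
        for label in PHONE_TO_GESTURES.get(phone, []):
            score.append(Gesture(label, t, t + dur))
        t += dur
    return score

# Durations as a TTS front end might predict them for the nonsense word "mapa".
for g in gestural_score(["m", "a", "p", "a"], [0.08, 0.12, 0.09, 0.15]):
    print(f"{g.label:18s} {g.start:.2f}-{g.end:.2f} s")
```

A real gestural score would additionally encode gesture overlap and intonation commands rather than simple per-phone intervals.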


Journal on Multimodal User Interfaces | 2014

Facial expression-based affective speech translation

Éva Székely; Ingmar Steiner; Zeeshan Ahmed; Julie Carson-Berndsen

One of the challenges of speech-to-speech translation is to accurately preserve the paralinguistic information in the speaker's message. Information about the affect and emotional intent of a speaker is often carried in more than one modality. For this reason, the possibility of multimodal interaction with the system and the conversation partner may greatly increase the likelihood of a successful and gratifying communication process. In this work we explore the use of automatic facial expression analysis as an input annotation modality to transfer paralinguistic information at a symbolic level from input to output in speech-to-speech translation. To evaluate the feasibility of this approach, a prototype system, FEAST (facial expression-based affective speech translation), has been developed. FEAST classifies the emotional state of the user and uses it to render the translated output in an appropriate voice style, using expressive speech synthesis.
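
The abstract describes the symbolic transfer only at a high level; a minimal sketch of that idea, with assumed emotion labels and style names rather than FEAST's actual categories, might look like this:

```python
# Hypothetical mapping from a recognized facial emotion to an expressive
# synthesis style; labels and names are illustrative, not FEAST's own.
EMOTION_TO_STYLE = {
    "happy": "cheerful",
    "angry": "tense",
    "sad": "subdued",
    "neutral": "neutral",
}

def classify_emotion(facial_scores: dict) -> str:
    """Pick the emotion with the highest facial-expression score."""
    return max(facial_scores, key=facial_scores.get)

def affective_translation(translated_text: str, facial_scores: dict) -> dict:
    """Attach a voice style to the translated text at a symbolic level."""
    emotion = classify_emotion(facial_scores)
    style = EMOTION_TO_STYLE.get(emotion, "neutral")
    # A full system would pass this on to an expressive TTS voice.
    return {"text": translated_text, "style": style}

print(affective_translation("Wie geht es dir?",
                            {"happy": 0.7, "neutral": 0.2, "sad": 0.1}))
```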


Archive | 2007

Producing phrasal prominence in German

Bistra Andreeva; William J. Barry; Ingmar Steiner

This study examines the relative change in a number of acoustic parameters usually associated with the production of prominences. The production of six German sentences under different question-answer conditions provides de-accented and accented versions of the same words in broad and narrow focus. Normalised energy, F0, duration and spectral measures were found to form a stable hierarchy in their exponency of the three degrees of accentuation.


Computer Speech & Language | 2018

A Multilinear Tongue Model Derived from Speech Related MRI Data of the Human Vocal Tract

Alexander Hewer; Stefanie Wuhrer; Ingmar Steiner; Korin Richmond

We present a multilinear statistical model of the human tongue that captures anatomical and tongue pose related shape variations separately. The model is derived from 3D magnetic resonance imaging data of 11 speakers sustaining speech related vocal tract configurations. To extract model parameters, we use a minimally supervised method based on an image segmentation approach and a template fitting technique. Furthermore, we use image denoising to deal with possibly corrupt data, palate surface information reconstruction to handle palatal tongue contacts, and a bootstrap strategy to refine the obtained shapes. Our evaluation shows that, by limiting the degrees of freedom for the anatomical and speech related variations to 5 and 4, respectively, we obtain a model that can reliably register unknown data while avoiding overfitting effects. Furthermore, we show that it can be used to generate plausible tongue animation by tracking sparse motion capture data.
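
As a rough illustration of what a multilinear shape model means in practice (the array shapes follow the 5 anatomical and 4 speech-related modes mentioned above; the vertex count, random data, and function names are assumptions, not the authors' released model):

```python
import numpy as np

# Tucker-style multilinear shape model: a core tensor contracted with a
# 5-dimensional anatomy (speaker) weight vector and a 4-dimensional pose
# (speech-related) weight vector yields one tongue mesh. The vertex count
# and random data are placeholders; a mean shape is omitted for brevity.
n_vertices = 2000
core = np.random.randn(3 * n_vertices, 5, 4)

def reconstruct_tongue(core, w_anatomy, w_pose):
    """Contract the core tensor with speaker and pose weights -> vertex array."""
    flat = np.einsum("vad,a,d->v", core, w_anatomy, w_pose)
    return flat.reshape(-1, 3)

# Registering unknown data amounts to estimating the two weight vectors,
# e.g. by least squares against a template mesh fitted to the MRI scan.
mesh = reconstruct_tongue(core, np.random.randn(5), np.random.randn(4))
print(mesh.shape)  # (2000, 3)
```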


IEEE Transactions on Audio, Speech, and Language Processing | 2017

Synthesis of Tongue Motion and Acoustics From Text Using a Multimodal Articulatory Database

Ingmar Steiner; Sébastien Le Maguer; Alexander Hewer

We present an end-to-end text-to-speech (TTS) synthesis system that generates audio and synchronized tongue motion directly from text. This is achieved by adapting a three-dimensional model of the tongue surface to an articulatory dataset and training a statistical parametric speech synthesis system directly on the tongue model parameters. We evaluate the model at every step by comparing the spatial coordinates of predicted articulatory movements against the reference data. The results indicate a global mean Euclidean distance of less than 2.8 mm, and our approach can be adapted to add an articulatory modality to conventional TTS applications without the need for extra data.
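
The evaluation metric mentioned above is straightforward to state in code; the sketch below (array names and shapes are assumptions) computes a global mean Euclidean distance between predicted and reference articulator positions:

```python
import numpy as np

def mean_euclidean_distance(predicted: np.ndarray, reference: np.ndarray) -> float:
    """Mean Euclidean distance over all frames and points (same units as input)."""
    assert predicted.shape == reference.shape
    return float(np.linalg.norm(predicted - reference, axis=-1).mean())

# Dummy trajectories: 500 frames x 10 tongue-surface points x 3 coordinates (mm).
reference = np.random.rand(500, 10, 3) * 10.0
predicted = reference + np.random.randn(500, 10, 3)
print(f"global mean distance: {mean_euclidean_distance(predicted, reference):.2f} mm")
```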


Archive | 2016

Tongue Mesh Extraction from 3D MRI Data of the Human Vocal Tract

Alexander Hewer; Stefanie Wuhrer; Ingmar Steiner; Korin Richmond

In speech science, analyzing the shape of the tongue during human speech production is of great importance. In this field, magnetic resonance imaging (MRI) is currently regarded as the preferred modality for acquiring dense 3D information about the human vocal tract. However, the desired shape information is not directly available from the acquired MRI data. In this chapter, we present a minimally supervised framework for extracting the tongue shape from a 3D MRI scan. It combines an image segmentation approach with a template fitting technique and produces a polygon mesh representation of the identified tongue shape. In our evaluation, we focus on two aspects: First, we investigate whether the approach can be regarded as independent of changes in tongue shape caused by different speakers and phones. Moreover, we check whether an average user who is not necessarily an anatomical expert may obtain acceptable results. In both cases, our framework shows promising results.
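
The chapter's actual framework is considerably more elaborate; as a highly reduced sketch of the two stages it combines, the example below pairs a simple intensity-threshold segmentation with rigid ICP template fitting (all data, thresholds, and names are illustrative assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_points(volume: np.ndarray, threshold: float) -> np.ndarray:
    """Return voxel coordinates whose intensity exceeds a simple threshold."""
    return np.argwhere(volume > threshold).astype(float)

def fit_template_rigid(template: np.ndarray, target: np.ndarray, iters: int = 20):
    """Rigidly align template points to target points with basic ICP.
    (Reflection handling and deformable refinement are omitted.)"""
    tree = cKDTree(target)
    src = template.copy()
    for _ in range(iters):
        matched = target[tree.query(src)[1]]         # nearest-neighbour matches
        mu_s, mu_t = src.mean(0), matched.mean(0)
        u, _, vt = np.linalg.svd((src - mu_s).T @ (matched - mu_t))
        rotation = vt.T @ u.T                        # Kabsch solution
        src = (src - mu_s) @ rotation.T + mu_t
    return src

volume = np.random.rand(32, 32, 32)        # stand-in for a 3D MRI scan
surface = segment_points(volume, 0.95)
template = np.random.rand(200, 3) * 32.0   # stand-in for a tongue template mesh
print(fit_template_rigid(template, surface).shape)  # (200, 3)
```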


Intelligent User Interfaces | 2013

A system for facial expression-based affective speech translation

Zeeshan Ahmed; Ingmar Steiner; Éva Székely; Julie Carson-Berndsen

In the emerging field of speech-to-speech translation, emphasis is currently placed on the linguistic content, while the significance of paralinguistic information conveyed by facial expression or tone of voice is typically neglected. We present a prototype system for multimodal speech-to-speech translation that is able to automatically recognize and translate spoken utterances from one language into another, with the output rendered by a speech synthesis system. The novelty of our system lies in the technique of generating the synthetic speech output in one of several expressive styles that is automatically determined using a camera to analyze the user's facial expression during speech.


arXiv: Human-Computer Interaction | 2012

Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis

Ingmar Steiner; Korin Richmond; Slim Ouni

The importance of modeling speech articulation for high-quality audiovisual (AV) speech synthesis is widely acknowledged. Nevertheless, while state-of-the-art, data-driven approaches to facial animation can make use of sophisticated motion capture techniques, the animation of the intraoral articulators (viz. the tongue, jaw, and velum) typically makes use of simple rules or viseme morphing, in stark contrast to the otherwise high quality of facial modeling. Using appropriate speech production data could significantly improve the quality of articulatory animation for AV synthesis.

Collaboration


Top co-authors of Ingmar Steiner include Slim Ouni (University of Lorraine) and Zeeshan Ahmed (University College Dublin).