Publication


Featured research published by Gregor Hofer.


IEEE Journal of Selected Topics in Signal Processing | 2014

Joint Audiovisual Hidden Semi-Markov Model-Based Speech Synthesis

Dietmar Schabus; Michael Pucher; Gregor Hofer

This paper investigates joint speaker-dependent audiovisual Hidden Semi-Markov Models (HSMMs), in which the visual models produce a sequence of 3D motion-tracking data used to animate a talking head and the acoustic models are used for speech synthesis. Separate acoustic, visual, and joint audiovisual models were trained for four Austrian German speakers, and we show that the joint models outperform the other approaches in terms of the synchronization quality of the synthesized visual speech. In addition, a detailed analysis of the acoustic and visual alignment is provided for the different models. Importantly, joint audiovisual modeling does not decrease the quality of the synthetic acoustic speech compared to acoustic-only modeling, so there is a clear advantage in the common duration model of the joint audiovisual approach, which is used to synchronize the acoustic and visual parameter sequences. Finally, the approach yields a model that integrates the visual and acoustic speech dynamics.
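The synchronization argument above rests on a simple idea: one shared duration model decides how many frames each state lasts, and both output streams are expanded with those same durations. The toy sketch below (not the authors' code; all state means and durations are invented illustrative values) shows why the two streams are synchronized by construction.

```python
# Toy sketch of a shared duration model in a joint audiovisual HSMM.
# Not the paper's implementation; values are made up for illustration.

def expand_states(durations, means):
    """Expand per-state output means into a frame-level trajectory,
    spending `durations[i]` frames in state i."""
    trajectory = []
    for d, m in zip(durations, means):
        trajectory.extend([m] * d)
    return trajectory

# One shared duration sequence (frames spent in each HSMM state) ...
durations = [3, 5, 2]

# ... drives both output streams, so acoustic and visual frames line up.
acoustic_means = [0.1, 0.7, 0.3]   # e.g. spectral parameters (illustrative)
visual_means   = [1.0, 4.0, 2.0]   # e.g. 3D marker positions (illustrative)

acoustic = expand_states(durations, acoustic_means)
visual   = expand_states(durations, visual_means)

# Both streams have exactly one frame per shared time step.
assert len(acoustic) == len(visual) == sum(durations)
```

With separate acoustic and visual duration models the two trajectories could drift apart; tying them to one duration sequence is what removes the need for any post-hoc alignment.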


Intelligent User Interfaces | 2011

Using linguistic and vocal expressiveness in social role recognition

Theresa Wilson; Gregor Hofer

In this paper, we investigate two types of expressiveness, linguistic and vocal, and whether they are useful for recognising the social roles of participants in meetings. Our experiments show that combining expressiveness features with speech activity does improve social role recognition over speech activity features alone.


International Conference on Computer Graphics and Interactive Techniques | 2010

Carnival: a modular framework for automated facial animation

Michael A. Berger; Gregor Hofer; Hiroshi Shimodaira

Facial animation is difficult to do convincingly. The movements of the face are complex and subtle, and we are innately attuned to faces. It is particularly difficult and labor-intensive to accurately synchronize faces with speech. A technology-based solution to this problem is automated facial animation. There are various ways to automate facial animation, each of which drives a face from some input sequence. In performance-driven animation, the input sequence may be either facial motion capture or video of a face. In automatic lip-syncing, the input is audio (and possibly a text transcript), resulting in facial animation synchronized with that audio. In audio-visual text-to-speech synthesis (AVTTS), only text is input, and synchronous auditory and visual speech are synthesized.


International Conference on Computer Graphics and Interactive Techniques | 2011

Simultaneous speech and animation synthesis

Dietmar Schabus; Michael Pucher; Gregor Hofer

Talking computer-animated characters are a common sight in video games and movies. Although animating the mouth by hand gives the best results, it is not always feasible because of cost and time constraints. Furthermore, the amount of speech in current games is ever increasing, with some games containing more than 200,000 lines of dialogue. This work proposes a system that produces speech and the corresponding lip animation simultaneously, using a statistical machine-learning framework based on Hidden Markov Models (HMMs). The key point is that, with the developed system, never-before-seen-or-heard animated dialogue can be produced at the push of a button.


International Conference on Computer Graphics and Interactive Techniques | 2010

Lip synchronization by acoustic inversion

Gregor Hofer; Korin Richmond; Michael A. Berger

Talking computer-animated characters are a common sight in video games and movies. Although animating the mouth by hand gives the best results, it is not always feasible because of cost or time constraints, so producing lip animation automatically is highly desirable. The problem can be phrased as a mapping from speech to lip animation, in other words as an acoustic inversion. In our work we propose a solution that takes a sequence of input speech frames and maps it directly to an output sequence of animation frames. The key point is that no phonemes or visemes are needed, which removes one step from the usual lip-synchronization process.
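The direct frame-to-frame mapping described above can be illustrated with a deliberately minimal sketch (not the paper's system; the data and the scalar features are invented): fit a regression from an acoustic feature per frame straight to an animation parameter per frame, with no phoneme or viseme stage in between.

```python
# Illustrative sketch of direct acoustic-to-animation frame mapping.
# Not the paper's method; toy 1-D data, ordinary least squares.

def fit_linear(xs, ys):
    """Least-squares fit y ~ a*x + b for scalar per-frame features."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Toy training pairs: acoustic feature per frame -> lip-opening parameter.
acoustic_frames = [0.0, 1.0, 2.0, 3.0]
lip_frames      = [0.5, 1.5, 2.5, 3.5]   # invented values for illustration

a, b = fit_linear(acoustic_frames, lip_frames)

# Synthesis: each new acoustic frame maps directly to an animation frame,
# one output frame per input frame, with no intermediate symbol sequence.
new_audio = [0.5, 2.5]
animation = [a * x + b for x in new_audio]
```

A real system would use multidimensional acoustic features and a far richer model, but the shape of the pipeline is the same: input frames go in, animation frames come out, and the phoneme/viseme step disappears.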


Conference of the International Speech Communication Association | 2007

Automatic Head Motion Prediction from Speech Data

Gregor Hofer; Hiroshi Shimodaira


Conference of the International Speech Communication Association | 2005

Informed Blending of Databases for Emotional Speech Synthesis

Gregor Hofer; Korin Richmond; Robert A. J. Clark


Conference of the International Speech Communication Association | 2008

Speech-driven lip motion generation with a trajectory HMM

Gregor Hofer; Junichi Yamagishi; Hiroshi Shimodaira


International Conference on Computer Graphics and Interactive Techniques | 2007

Speech driven head motion synthesis based on a trajectory model

Gregor Hofer; Hiroshi Shimodaira; Junichi Yamagishi

Collaboration


Dive into Gregor Hofer's collaborations.

Top Co-Authors

Michael Pucher
Austrian Academy of Sciences

Junichi Yamagishi
National Institute of Informatics