Gregor Hofer
University of Edinburgh
Publications
Featured research published by Gregor Hofer.
IEEE Journal of Selected Topics in Signal Processing | 2014
Dietmar Schabus; Michael Pucher; Gregor Hofer
This paper investigates joint speaker-dependent audiovisual Hidden Semi-Markov Models (HSMMs), in which the visual models produce a sequence of 3D motion-tracking data used to animate a talking head and the acoustic models are used for speech synthesis. Separate acoustic, visual, and joint audiovisual models were trained for four Austrian German speakers, and we show that the joint models outperform the other approaches in terms of the synchronization quality of the synthesized visual speech. In addition, a detailed analysis of the acoustic and visual alignment is provided for the different models. Importantly, joint audiovisual modeling does not degrade the acoustic synthetic speech quality compared to acoustic-only modeling, so the common duration model used to synchronize the acoustic and visual parameter sequences is a clear advantage of the joint approach. Finally, the joint model integrates the visual and acoustic speech dynamics.
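A minimal sketch of the synchronization idea, not the paper's implementation: one state-duration draw drives both the acoustic and the visual emission, so the two parameter streams stay frame-synchronous by construction. All state counts, dimensions, and distribution parameters below are illustrative assumptions.

```python
# Toy illustration of a shared duration model in a joint audiovisual HSMM.
import numpy as np

rng = np.random.default_rng(0)

n_states = 5          # HSMM states for one phone-sized unit (assumed)
acoustic_dim = 40     # e.g. mel-cepstral coefficients (assumed)
visual_dim = 12       # e.g. 3D motion-tracking parameters (assumed)

# Per-state Gaussian emission means for each stream (toy parameters).
acoustic_means = rng.normal(size=(n_states, acoustic_dim))
visual_means = rng.normal(size=(n_states, visual_dim))

# One duration distribution per state, shared by BOTH streams.
duration_means = np.array([4, 6, 8, 6, 4])

acoustic_frames, visual_frames = [], []
for s in range(n_states):
    # Sample a single duration and apply it to both streams, so the
    # acoustic and visual trajectories remain frame-synchronous.
    d = max(1, rng.poisson(duration_means[s]))
    acoustic_frames.append(np.tile(acoustic_means[s], (d, 1)))
    visual_frames.append(np.tile(visual_means[s], (d, 1)))

acoustic = np.vstack(acoustic_frames)
visual = np.vstack(visual_frames)
assert len(acoustic) == len(visual)  # synchronized by construction
```

Training separate acoustic and visual models would instead give each stream its own durations, which is exactly where the alignment between audio and animation can drift.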
Intelligent User Interfaces | 2011
Theresa Wilson; Gregor Hofer
In this paper, we investigate two types of expressiveness, linguistic and vocal, and whether they are useful for recognising the social roles of participants in meetings. Our experiments show that combining expressiveness features with speech activity features improves social role recognition over speech activity features alone.
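A hedged illustration of the general recipe, using synthetic data and scikit-learn's LogisticRegression rather than the paper's corpus or classifier: expressiveness features are concatenated with speech activity features, and the combined feature set is compared against a speech-activity-only baseline. All feature names and dimensions are assumptions for illustration.

```python
# Sketch: combining expressiveness and speech activity features for
# social role classification (synthetic stand-in data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_speakers = 200

speech_activity = rng.normal(size=(n_speakers, 4))  # e.g. talk time, turn counts (assumed)
expressiveness = rng.normal(size=(n_speakers, 6))   # e.g. vocal/linguistic expressiveness cues (assumed)
roles = rng.integers(0, 2, size=n_speakers)         # toy binary role label

combined = np.hstack([speech_activity, expressiveness])

baseline = cross_val_score(LogisticRegression(), speech_activity, roles, cv=5)
augmented = cross_val_score(LogisticRegression(), combined, roles, cv=5)
print(baseline.mean(), augmented.mean())  # on real data the combined set should win
```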
International Conference on Computer Graphics and Interactive Techniques | 2010
Michael A. Berger; Gregor Hofer; Hiroshi Shimodaira
Facial animation is difficult to do convincingly. The movements of the face are complex and subtle, and we are innately attuned to faces. It is particularly difficult and labor-intensive to accurately synchronize faces with speech. A technology-based solution to this problem is automated facial animation. There are various ways to automate facial animation, each of which drives a face from some input sequence. In performance-driven animation, the input sequence may be either facial motion capture or video of a face. In automatic lip-syncing, the input is audio (and possibly a text transcript), resulting in facial animation synchronized with that audio. In audio-visual text-to-speech synthesis (AVTTS), only text is input, and synchronous auditory and visual speech are synthesized.
International Conference on Computer Graphics and Interactive Techniques | 2011
Dietmar Schabus; Michael Pucher; Gregor Hofer
Talking computer-animated characters are a common sight in video games and movies. Although animating the mouth by hand gives the best results, it is not always feasible because of cost and time constraints. Furthermore, the amount of speech in current games is ever increasing, with some games containing more than 200,000 lines of dialogue. This work proposes a system that produces speech and the corresponding lip animation simultaneously, using a statistical machine learning framework based on Hidden Markov Models (HMMs). The key point is that the developed system can produce never-before-seen-or-heard animated dialogue at the push of a button.
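A simplified pipeline sketch of the idea, not the authors' HMM system: a single table of per-phone parameters generates both the speech and the lip-animation trajectories from the same input, so a new line of dialogue yields synchronized audio and animation parameters in one pass. The phone set, dimensions, and durations below are toy assumptions.

```python
# Toy text-to-audiovisual pipeline: one model, two synchronized outputs.
import numpy as np

rng = np.random.default_rng(1)

PHONES = ["sil", "h", "e", "l", "o"]                     # toy phone set (assumed)
speech_model = {p: rng.normal(size=25) for p in PHONES}  # per-phone acoustic means
lip_model = {p: rng.normal(size=6) for p in PHONES}      # per-phone lip-shape means
durations = {p: 5 for p in PHONES}                       # frames per phone (toy)

def synthesize(phone_sequence):
    """Generate frame-synchronous acoustic and lip-parameter trajectories."""
    speech, lips = [], []
    for p in phone_sequence:
        d = durations[p]
        speech.append(np.tile(speech_model[p], (d, 1)))
        lips.append(np.tile(lip_model[p], (d, 1)))
    return np.vstack(speech), np.vstack(lips)

speech_track, lip_track = synthesize(["sil", "h", "e", "l", "o", "sil"])
print(speech_track.shape, lip_track.shape)  # equal frame counts
```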
International Conference on Computer Graphics and Interactive Techniques | 2010
Gregor Hofer; Korin Richmond; Michael A. Berger
Talking computer-animated characters are a common sight in video games and movies. Although animating the mouth by hand gives the best results, it is not always feasible because of cost or time constraints, so producing lip animation automatically is highly desirable. The problem can be phrased as a mapping from speech to lip animation, in other words as an acoustic inversion. In our work we propose a solution that takes a sequence of input speech frames and maps it directly to an output sequence of animation frames. The key point is that there is no need for phonemes or visemes, which cuts one step out of the usual lip-synchronization process.
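A minimal sketch of frame-level acoustic inversion under stated assumptions: a linear least-squares mapping with random stand-in data is used in place of the paper's actual model. Each animation frame is predicted from a window of acoustic feature frames, with no phoneme or viseme labels anywhere in the pipeline.

```python
# Sketch: direct regression from speech frames to animation frames.
import numpy as np

rng = np.random.default_rng(0)
T, acoustic_dim, anim_dim, context = 500, 13, 6, 5  # toy sizes (assumed)

acoustic = rng.normal(size=(T, acoustic_dim))   # e.g. MFCC frames (assumed)
animation = rng.normal(size=(T, anim_dim))      # paired animation frames (assumed)

def stack_context(X, c):
    """Concatenate +/- c neighbouring frames around each frame."""
    pad = np.pad(X, ((c, c), (0, 0)), mode="edge")
    return np.hstack([pad[i:i + len(X)] for i in range(2 * c + 1)])

X = stack_context(acoustic, context)            # (T, (2c+1) * acoustic_dim)

# Least-squares mapping from acoustic context windows to animation frames.
W, *_ = np.linalg.lstsq(X, animation, rcond=None)
predicted = X @ W                               # one animation frame per speech frame
print(predicted.shape)
```

The context window matters here: a single acoustic frame is often ambiguous about mouth shape, so stacking neighbouring frames gives the regressor the short-term dynamics that phoneme labels would otherwise supply.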
Conference of the International Speech Communication Association | 2007
Gregor Hofer; Hiroshi Shimodaira
Conference of the International Speech Communication Association | 2005
Gregor Hofer; Korin Richmond; Robert A. J. Clark
International Symposium on Computer Architecture | 2008
Gregor Hofer; Junichi Yamagishi; Hiroshi Shimodaira
Conference of the International Speech Communication Association | 2008
Gregor Hofer; Junichi Yamagishi; Hiroshi Shimodaira
International Conference on Computer Graphics and Interactive Techniques | 2007
Gregor Hofer; Hiroshi Shimodaira; Junichi Yamagishi