Publication


Featured research published by Dietmar Schabus.


Speech Communication | 2010

Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis

Michael Pucher; Dietmar Schabus; Junichi Yamagishi; Friedrich Neubarth; Volker Strom

An HMM-based speech synthesis framework is applied to both standard Austrian German and a Viennese dialectal variety, and several training strategies for multi-dialect modeling, such as dialect clustering and dialect-adaptive training, are investigated. To bridge the gap between processing on the level of HMMs and on the linguistic level, we add phonological transformations to the HMM interpolation and apply them to dialect interpolation. The crucial step is to employ several formalized phonological rules between Austrian German and Viennese dialect as constraints for the HMM interpolation. We verify the effectiveness of this strategy in a number of perceptual evaluations. Since the HMM space used is acoustic rather than articulatory, the evaluation results vary somewhat between the phonological rules. Overall, however, we obtained good evaluation results showing that listeners can perceive both continuous and categorical changes of dialect varieties when phonological transformations are employed as switching rules in the HMM interpolation.
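
As a rough illustration of the interpolation mechanism described above (a sketch of the general idea, not the paper's implementation), the following Python snippet linearly blends the Gaussian output distributions of two corresponding HMM states and applies phonological rules as categorical switches; the dictionary-based state representation is an assumption for illustration.

```python
import numpy as np

def interpolate_state(state_a, state_b, alpha, switch_rule=False):
    """Blend two corresponding HMM states from varieties A and B.

    state_a, state_b: dicts with diagonal-Gaussian output parameters
    ('mean', 'var'), a simplified stand-in for full HTS state PDFs.
    alpha: interpolation weight; 0.0 = pure A, 1.0 = pure B.
    switch_rule: states governed by a phonological switching rule are
    swapped categorically rather than blended gradually.
    """
    if switch_rule:
        return state_b if alpha >= 0.5 else state_a
    return {
        "mean": (1.0 - alpha) * state_a["mean"] + alpha * state_b["mean"],
        "var": (1.0 - alpha) * state_a["var"] + alpha * state_b["var"],
    }

a = {"mean": np.array([1.0, 2.0]), "var": np.array([0.5, 0.5])}
b = {"mean": np.array([3.0, 0.0]), "var": np.array([1.0, 1.0])}
halfway = interpolate_state(a, b, alpha=0.5)  # intermediate variety
```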


IEEE Journal of Selected Topics in Signal Processing | 2014

Joint Audiovisual Hidden Semi-Markov Model-Based Speech Synthesis

Dietmar Schabus; Michael Pucher; Gregor Hofer

This paper investigates joint speaker-dependent audiovisual Hidden Semi-Markov Models (HSMMs), in which the visual models produce a sequence of 3D motion-tracking data used to animate a talking head and the acoustic models are used for speech synthesis. We trained different acoustic, visual, and joint audiovisual models for four Austrian German speakers and show that the joint models outperform the other approaches in terms of the synchronization quality of the synthesized visual speech. In addition, a detailed analysis of the acoustic and visual alignment is provided for the different models. Importantly, joint audiovisual modeling does not decrease the acoustic synthetic speech quality compared to acoustic-only modeling, so the common duration model used to synchronize the acoustic and visual parameter sequences is a clear advantage of the joint approach. Finally, joint modeling yields a single model that integrates the visual and acoustic speech dynamics.
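
A toy Python sketch of why the common duration model keeps the two streams synchronized (mean-only generation without parameter smoothing; the class and its fields are hypothetical): one duration draw per joint state governs both observation streams, so they are frame-synchronous by construction.

```python
import numpy as np

rng = np.random.default_rng(0)

class JointAVState:
    """One joint audiovisual HSMM state (toy version).

    The acoustic and the visual stream each have their own output
    mean, but a single explicit duration distribution governs both,
    which is what keeps the two streams synchronized.
    """

    def __init__(self, dur_mean, dur_var, ac_mean, vis_mean):
        self.dur_mean, self.dur_var = dur_mean, dur_var
        self.ac_mean = np.asarray(ac_mean)    # acoustic parameters
        self.vis_mean = np.asarray(vis_mean)  # 3D marker parameters

    def generate(self):
        # One duration draw per state is shared by both streams.
        d = max(1, int(round(rng.normal(self.dur_mean, self.dur_var ** 0.5))))
        return np.tile(self.ac_mean, (d, 1)), np.tile(self.vis_mean, (d, 1))

state = JointAVState(dur_mean=5, dur_var=1.0,
                     ac_mean=[0.1, 0.2], vis_mean=[1.0, 2.0, 3.0])
acoustic, visual = state.generate()
assert len(acoustic) == len(visual)  # synchronized by construction
```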


International Conference on Intelligent Transportation Systems | 2010

Multimodal highway monitoring for robust incident detection

Michael Pucher; Dietmar Schabus; Peter Schallauer; Yuriy Lypetskyy; Franz Graf; Harald Rainer; Michael Stadtschnitzer; Sabine Sternig; Josef Alois Birchbauer; Wolfgang Schneider; Bernhard Schalko

We present detection and tracking methods for highway monitoring based on video and audio sensors, and on the combination of these two modalities. We evaluate the performance of the different systems on realistic data sets recorded on Austrian highways. We show that video-based incident detection achieves very good performance for wrong-way drivers, stationary vehicles, and traffic jams. Algorithms for simultaneous vehicle and driving-direction detection using microphone arrays were also evaluated and performed well on these tasks. Robust tracking under difficult weather conditions is achieved through multimodal sensor fusion of the video and audio sensors.
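
The fusion step can be pictured as a simple late combination of per-event confidence scores. The snippet below is a generic, hypothetical illustration with made-up weights, not the fusion rule used in the paper.

```python
def fuse_scores(video_score, audio_score, video_weight=0.5):
    """Late fusion of per-event confidence scores from the video and
    audio subsystems. The weighting scheme here is made up for
    illustration; in bad visibility the video weight would be
    lowered so that the microphone-array evidence dominates."""
    return video_weight * video_score + (1 - video_weight) * audio_score

# Example: fog degrades the camera, so trust the audio sensors more.
fused = fuse_scores(video_score=0.3, audio_score=0.9, video_weight=0.2)
```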


International Conference on Computer Graphics and Interactive Techniques | 2011

Simultaneous speech and animation synthesis

Dietmar Schabus; Michael Pucher; Gregor Hofer

Talking computer-animated characters are a common sight in video games and movies. Although hand-crafted mouth animation gives the best results, it is not always feasible because of cost and time constraints. Furthermore, the amount of speech in current games is ever increasing, with some games having more than 200,000 lines of dialogue. This work proposes a system that can produce speech and the corresponding lip animation simultaneously, using a statistical machine-learning framework based on Hidden Markov Models (HMMs). The key point is that the developed system can produce never-before-seen or -heard animated dialogue at the push of a button.


Vehicular Technology Conference | 2011

Distributed Field Estimation Algorithms in Vehicular Sensor Networks

Dietmar Schabus; Thomas Zemen; Michael Pucher

This paper deals with cooperative reconstruction of environmental variables (e.g., temperature) along a road by a vehicular sensor network using wireless communication. Vehicles take repeated measurements and approximate the environment using a set of basis functions. We investigate the applicability and performance of popular averaging techniques (gossiping and consensus propagation) on the basis coefficients, and propose a simpler approach to avoid divergence problems. We have developed a graphical simulation environment to study the behavior of different algorithms in this scenario, and we show simulation results that support our simplified approach.
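
For intuition about the averaging techniques mentioned, here is a minimal Python sketch of randomized pairwise gossip on the basis coefficients (hypothetical data; this illustrates plain gossiping, not the paper's proposed simplified approach).

```python
import numpy as np

rng = np.random.default_rng(1)

# Each vehicle approximates the field (e.g., temperature along the
# road) by K coefficients of shared basis functions; values made up.
n_vehicles, K = 6, 4
coeffs = rng.normal(size=(n_vehicles, K))

def pairwise_gossip(coeffs, rounds=500):
    """Randomized pairwise gossip: two vehicles in radio range
    replace their coefficient vectors by the pair average. All
    vectors converge to the network-wide mean, which is preserved
    exactly by every exchange."""
    c = coeffs.copy()
    for _ in range(rounds):
        i, j = rng.choice(len(c), size=2, replace=False)
        c[i] = c[j] = 0.5 * (c[i] + c[j])
    return c

result = pairwise_gossip(coeffs)
assert np.allclose(result, coeffs.mean(axis=0))  # consensus reached
```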


Speech Communication | 2015

Unsupervised and phonologically controlled interpolation of Austrian German language varieties for speech synthesis

Markus Toman; Michael Pucher; Sylvia Moosmüller; Dietmar Schabus

This paper presents an unsupervised method that allows for gradual interpolation between language varieties in statistical parametric speech synthesis using Hidden Semi-Markov Models (HSMMs). We apply dynamic time warping using Kullback–Leibler divergence on two sequences of HSMM states to find adequate interpolation partners. The method operates on state sequences with explicit durations and also on expanded state sequences where each state corresponds to one feature frame. In a subjective evaluation of the intelligibility and dialect rating of synthesized test sentences, we show that our method can generate intermediate varieties for three Austrian dialects (Viennese, Innervillgraten, Bad Goisern). We also provide an extensive phonetic analysis of the interpolated samples. The analysis covers input-switch rules, which reflect historically different phonological developments of the dialects versus the standard language, and phonological processes, which are phonetically motivated, gradual, and common to all varieties. We present an extended method that linearly interpolates phonological processes but uses a step function for input-switch rules. Our evaluation shows that integrating this kind of phonological knowledge improves the dialect-authenticity judgments of the synthesized speech, as given by dialect speakers. Since gradual transitions between varieties are an existing phenomenon, our methods can be used to adapt speech output systems accordingly.
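
The alignment step can be sketched as standard dynamic time warping with a symmetric Kullback–Leibler divergence as the local cost. The closed form below assumes diagonal-covariance Gaussian state outputs, and the (mean, variance) tuple representation is an illustrative assumption, not the paper's exact implementation.

```python
import numpy as np

def sym_kl(m1, v1, m2, v2):
    """Symmetric Kullback-Leibler divergence between two diagonal-
    covariance Gaussians (closed form)."""
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1)
    return kl12 + kl21

def dtw_cost(seq_a, seq_b, dist):
    """Standard dynamic time warping; the backtrace through the
    accumulated cost matrix yields the interpolation partners."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]

# seq_a, seq_b: lists of (mean, var) array pairs, one per HSMM state.
# cost = dtw_cost(seq_a, seq_b, lambda a, b: sym_kl(a[0], a[1], b[0], b[1]))
```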


International Conference on Computers Helping People with Special Needs | 2010

Design and development of spoken dialog systems incorporating speech synthesis of Viennese varieties

Michael Pucher; Friedrich Neubarth; Dietmar Schabus

This paper describes our work on the design and development of a spoken dialog system that uses synthesized speech of several different Viennese varieties. In a previous study, we investigated the usefulness of synthesizing such varieties. The developed spoken dialog system was specifically designed for the different personas that can be realized with multiple varieties. This brings more realistic and fun-to-use spoken dialog systems to the end user and can serve as a speech-based user interface for blind users and users with visual impairment. The benefit for this group of users is the increased acceptability and comprehensibility that comes about when the synthesized speech reflects the user's linguistic and/or social identity.


Facial Analysis and Animation | 2015

Visio-articulatory to acoustic conversion of speech

Michael Pucher; Dietmar Schabus

In this paper we evaluate the performance of combined visual and articulatory features for conversion to acoustic speech. Such a conversion has possible applications in silent speech interfaces, which are based on the processing of non-acoustic speech signals. With an intelligibility test we show that using joint visual and articulatory features can improve the reconstruction of acoustic speech compared to using only articulatory or only visual data. An improvement can be achieved both when using the original voicing information and when using none.
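
To make "joint visual and articulatory features" concrete at the feature level, the sketch below concatenates the two per-frame streams before fitting a deliberately simplistic linear frame-wise mapping to acoustic features; real conversion systems use statistical models such as GMMs or HMMs, and all dimensions and data here are made up.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-frame streams: articulatory (e.g., EMA coil
# positions), visual (3D facial markers), and target acoustic
# features; all dimensions and data are invented for illustration.
T, d_art, d_vis, d_ac = 500, 12, 9, 24
art = rng.normal(size=(T, d_art))
vis = rng.normal(size=(T, d_vis))
acoustic = rng.normal(size=(T, d_ac))

def fit_linear_map(X, Y):
    """Least-squares frame-wise mapping from input features X to
    acoustic features Y (bias column appended)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

# "Joint" features simply concatenate both streams per frame, giving
# the mapping more complementary information than either alone.
W_joint = fit_linear_map(np.hstack([art, vis]), acoustic)
```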


Conference of the International Speech Communication Association | 2010

Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners

Michael Pucher; Dietmar Schabus; Junichi Yamagishi


Language Resources and Evaluation | 2012

Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis

Dietmar Schabus; Michael Pucher; Gregor Hofer

Collaboration


Dietmar Schabus's top co-authors:

Michael Pucher | Austrian Academy of Sciences
Markus Toman | Vienna University of Technology
Gregor Hofer | University of Edinburgh
Junichi Yamagishi | National Institute of Informatics
Friedrich Neubarth | Austrian Research Institute for Artificial Intelligence
Sylvia Moosmüller | Austrian Academy of Sciences
Volker Strom | University of Edinburgh
Bettina Zillinger | University of Applied Sciences Wiener Neustadt