Markus Toman
Vienna University of Technology
Publication
Featured research published by Markus Toman.
Computer Graphics and Imaging | 2013
Markus Toman; Michael Pucher
While natural-sounding, neutral-style speech can be synthesized with today’s technology, fast adaptation of speech synthesis to different contexts and situations still poses a challenge. In variety modeling (dialects, sociolects), we have to cope with two problems: no standardized orthographic form is available, and existing speech resources for these varieties are rare. We present recent approaches in the field of cross-lingual speaker transformation for HMM-based speech synthesis and propose a method for transforming an arbitrary speaker’s voice from one variety to another. We apply Kullback-Leibler divergence for data mapping of HMM states, transfer probability density functions to the decision tree of the other variety, and perform speaker adaptation. A method to integrate structural information in the mapping is also presented and analyzed. Subjective listening tests show that the proposed method produces speech of significantly higher quality than standard speaker adaptation techniques.
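The state-mapping step described above can be illustrated with a minimal sketch. It assumes each HMM state is summarized by a single Gaussian with diagonal covariance, stored as a hypothetical `(mean, variance)` pair of lists, and it maps each source state to the target state with the smallest symmetric Kullback-Leibler divergence. The function names and the data layout are illustrative assumptions, not the authors' implementation; the decision-tree transfer and speaker adaptation steps are not covered.

```python
import math

def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(p, q):
    """Symmetrized divergence KL(p || q) + KL(q || p); p and q are (mean, var)."""
    return kl_diag_gauss(*p, *q) + kl_diag_gauss(*q, *p)

def map_states(source_states, target_states):
    """For each source state, return the index of the closest target state."""
    return [
        min(range(len(target_states)),
            key=lambda j: symmetric_kl(state, target_states[j]))
        for state in source_states
    ]

# Toy example with one-dimensional states (hypothetical values).
source = [([0.0], [1.0]), ([5.0], [1.0])]
target = [([4.8], [1.0]), ([0.1], [1.0])]
mapping = map_states(source, target)
```

Here `mapping[i]` is the index of the target-variety state that would supply the probability density function for source state `i`.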
Computer Speech & Language | 2017
Michael Pucher; Bettina Zillinger; Markus Toman; Dietmar Schabus; Cassia Valentini-Botinhao; Junichi Yamagishi; Erich Schmid; Thomas Woltron
In this paper, we evaluate how speaker familiarity influences the engagement times and performance of blind children and young adults when playing audio games made with different synthetic voices. We also show how speaker familiarity influences speaker and synthetic speech recognition. For the first experiment we develop synthetic voices of school children, their teachers and of speakers that are unfamiliar to them and use each of these voices to create variants of two audio games: a memory game and a labyrinth game. Results show that pupils have significantly longer engagement times and better performance when playing games that use synthetic voices built with their own voices. These findings can be used to improve the design of audio games and lecture books for blind and visually impaired children and young adults. In the second experiment we show that blind children and young adults are better at recognizing synthetic voices than their visually impaired companions. We also show that the average familiarity with a speaker and the similarity between a speaker’s synthetic and natural voice are correlated with the speaker’s synthetic voice recognition rate.
Text, Speech, and Dialogue | 2015
Markus Toman; Michael Pucher
This paper describes a software framework for HMM-based speech synthesis that we have developed and released to the public. The framework is compatible with the well-known HTS toolkit by incorporating hts_engine and Flite. It enables HTS voices to be used as Microsoft Windows system voices and to be integrated into Android and iOS apps. Non-English languages are supported through the capability to load Festival-format pronunciation dictionaries and letter-to-sound rules. The release also includes an Austrian German voice model of a male, professional speaker recorded in studio quality, as well as a pronunciation dictionary, letter-to-sound rules and basic text preprocessing procedures for Austrian German. The framework is available under an MIT-style license.
Speech Communication | 2015
Markus Toman; Michael Pucher; Sylvia Moosmüller; Dietmar Schabus
This paper presents an unsupervised method that allows for gradual interpolation between language varieties in statistical parametric speech synthesis using Hidden Semi-Markov Models (HSMMs). We apply dynamic time warping using Kullback–Leibler divergence on two sequences of HSMM states to find adequate interpolation partners. The method operates on state sequences with explicit durations and also on expanded state sequences where each state corresponds to one feature frame. In a subjective evaluation of intelligibility and dialect rating on synthesized test sentences, we show that our method can generate intermediate varieties for three Austrian dialects (Viennese, Innervillgraten, Bad Goisern). We also provide an extensive phonetic analysis of the interpolated samples. The analysis includes input-switch rules, which cover historically different phonological developments of the dialects versus the standard language, and phonological processes, which are phonetically motivated, gradual, and common to all varieties. We present an extended method which linearly interpolates phonological processes but uses a step function for input-switch rules. Our evaluation shows that the integration of this kind of phonological knowledge improves the dialect authenticity judgments of the synthesized speech made by dialect speakers. Since gradual transitions between varieties are an existing phenomenon, we can use our methods to adapt speech output systems accordingly.
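As a rough illustration of the alignment and interpolation ideas in this abstract, the following sketch aligns two state sequences with dynamic time warping, using a symmetric Kullback–Leibler divergence as the local cost, and then interpolates aligned partners either linearly or with a step function. It rests on several assumptions not taken from the paper: each state is a one-dimensional Gaussian stored as a hypothetical `(mean, variance)` tuple, means and variances are interpolated directly, and the step function switches at `alpha = 0.5`. It is not the authors' implementation.

```python
import math

def sym_kl(p, q):
    """Symmetric KL divergence between 1-D Gaussians given as (mean, variance)."""
    (mp, vp), (mq, vq) = p, q
    kl_pq = 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    kl_qp = 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return kl_pq + kl_qp

def dtw_align(seq_a, seq_b, cost=sym_kl):
    """Return the minimum-cost DTW path as a list of (index_a, index_b) pairs."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def interpolate(p, q, alpha, step=False):
    """Blend two states; with step=True, alpha snaps to 0 or 1 (assumed threshold 0.5)."""
    if step:
        alpha = 0.0 if alpha < 0.5 else 1.0
    return tuple((1.0 - alpha) * x + alpha * y for x, y in zip(p, q))

# Toy state sequences for two varieties (hypothetical values).
variety_a = [(0.0, 1.0), (5.0, 1.0)]
variety_b = [(0.0, 1.0), (0.1, 1.0), (5.0, 1.0)]
partners = dtw_align(variety_a, variety_b)
blended = [interpolate(variety_a[i], variety_b[j], 0.25) for i, j in partners]
```

Passing `step=True` models a categorical switch between varieties, loosely in the spirit of the input-switch rules, while the default linear blend corresponds to gradual phonological processes.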
Text, Speech, and Dialogue | 2015
Michael Pucher; Valon Xhafa; Agni Dika; Markus Toman
In this paper, we show how adaptive modeling within the statistical parametric speech synthesis framework can be applied to Albanian dialects. We develop speaker-dependent voices for the Tosk and Gheg dialects and adapt models for the Gheg dialect from the Tosk models. We show that the adapted Gheg models outperform the speaker-dependent Gheg model on an intelligibility and dialect classification task. Furthermore, we show that the speaker-dependent Tosk model outperforms a formant-based synthesizer on an intelligibility, dialect classification and pair-wise comparison task. This formant-based synthesizer is the only publicly available synthesizer for Albanian at the moment. We also show that our Gheg and Tosk synthesizers are as intelligible as natural speech. The method of modeling one dialect through adaptation of a closely related dialect can be applied to language varieties in general, where the background variety and adapted variety can be chosen based on pragmatic considerations like speaker or data resource availability.
Conference of the International Speech Communication Association | 2014
Cassia Valentini-Botinhao; Markus Toman; Michael Pucher; Dietmar Schabus; Junichi Yamagishi
SSW | 2013
Markus Toman; Michael Pucher; Dietmar Schabus
SSW | 2013
Markus Toman; Michael Pucher; Dietmar Schabus
Conference of the International Speech Communication Association | 2015
Michael Pucher; Markus Toman; Dietmar Schabus; Cassia Valentini-Botinhao; Junichi Yamagishi; Bettina Zillinger; Erich Schmid
Conference of the International Speech Communication Association | 2018
Markus Toman; Geoffrey S. Meltzner; Rupal Patel