Martin Grůber
University of West Bohemia
Publication
Featured research published by Martin Grůber.
text, speech and dialogue | 2013
Daniel Tihelka; Martin Grůber; Zdeněk Hanzlíček
The paper points to problematic and usually neglected aspects of using listening tests for TTS evaluation. It shows that simple random selection of the phrases to be listened to may not cover the cases that are relevant to the evaluated TTS system. It also shows that a reliable phrase set cannot be chosen without deeper knowledge of the distribution of differences in synthetic speech, which are obtained by comparing the output generated by the evaluated TTS system with that of a baseline system. Given such knowledge, a method for evaluating the reliability of listening tests, by estimating the possible invalidity of conclusions derived from the listening results, is proposed and demonstrated on real examples.
text, speech and dialogue | 2018
Daniel Tihelka; Zdeněk Hanzlíček; Markéta Jůzová; Jakub Vít; Jindřich Matoušek; Martin Grůber
This paper provides a survey of the current state of ARTIC, the modern Czech concatenative corpus-based text-to-speech system. Through more than a decade of research and development in the field of speech technologies and applications, the system was enriched with new languages (and, as a consequence, language-dependent NLP methods), and its speech generation capabilities were significantly improved as new progressive speech generation modules (SPS, DNN, HSS) were (and still are being) designed and incorporated into it. ARTIC also has to deal with various requirements on the data from which speech is generated, ranging in size, quality and domain of the output speech, while there has always been the requirement to achieve the highest quality in terms of both naturalness and intelligibility. Thus, the paper summarizes some of the most significant achievements and demanding tasks which had to be tackled by the system, illustrating the universality and flexibility of this Czech TTS system.
text, speech and dialogue | 2012
Martin Grůber; Zdeněk Hanzlíček
This paper deals with expressive speech synthesis in a limited domain restricted to conversations between humans and a computer on a given topic. Two different methods (unit selection and HMM-based speech synthesis) were employed to produce expressive synthetic speech, both using the same description of expressivity by so-called communicative functions. Such a discrete division is related to our limited domain and is not intended to be a general solution for expressivity description. The resulting synthetic speech was presented to listeners in a web-based listening test to evaluate whether the expressivity is perceived as expected. A comparison of the two methods is also presented.
language and technology conference | 2009
Martin Grůber; Milan Legát; Pavel Ircing; Jan Romportl; Josef Psutka
This paper presents part of the data collection efforts undertaken within the COMPANIONS project, whose aim is to develop a set of dialogue systems that will be able to act as artificial “companions” for human users. One of these systems, being developed in the Czech language, is designed to be a partner for elderly people, able to talk with them about photographs that mostly capture their family memories. The paper describes in detail the collection of natural dialogues using the Wizard of Oz scenario, as well as the re-use of the collected data to create the expressive speech corpus planned for the development of a limited-domain Czech expressive TTS system.
text, speech and dialogue | 2014
Zdeněk Hanzlíček; Martin Grůber
Most modern speech synthesis systems utilize large speech corpora to learn new voices. These speech corpora usually contain several hours of speech spoken by talented speakers who are able to record such an amount of speech data in sufficient quality. Appropriate phonetic and prosodic annotation of the recorded utterances is necessary for high-quality synthesized speech. For many languages, the pitch shape within the last prosodic word of a phrase is characteristic of particular sentence types and of the phrase structure of compound/complex sentences. However, in real data this formal convention can be breached and a pitch shape different from the expected one can be present. This can be a source of prosodic inconsistency in synthesized speech. This article presents some experiments on the automatic detection of prosodic mismatch in recorded utterances. A simple GMM-based classifier was proposed for this task. Experiments were performed on 5 large speech corpora. The classification results were successfully verified by listening tests.
text, speech and dialogue | 2011
Jindřich Matoušek; Zdeněk Hanzlíček; Michal Campr; Z. Krňoul; Pavel Campr; Martin Grůber
A web-based system for automatic reading of technical documents focused on vision-impaired primary-school students is presented in the paper. An overview of the system, both its backend (used by teachers to create and manage the documents) and frontend (used by students for viewing and reading the documents), is given. Text-to-speech synthesis utilised for the automatic reading and, especially, the automatic processing of mathematical and physical formulas are described as well.
international conference on speech and computer | 2016
Daniel Tihelka; Martin Grůber; Markéta Jůzová
We present a sequence of experiments with one-class classification, aimed at examining the ability of such a classifier to detect spectral smoothness of units, as an alternative to heuristics-based measures used within unit selection speech synthesizers. A set of spectral feature distances was computed between neighbouring frames in natural speech recordings, i.e. those representing natural joins, from which the per-vowel classifier was trained. In total, three types of classifiers were examined for distances computed from several different signal parametrizations. For the evaluation, the trained classifiers were tested against smooth or discontinuous joins as they were perceived by human listeners in the ad-hoc listening test designed for this purpose.
international conference on speech and computer | 2013
Martin Grůber; Jindřich Matoušek
In our recent work, a method for enumerating differences between various expressive categories (communicative functions) was proposed. To improve the overall impact of this approach on both the quality of synthetic expressive speech and the perception of expressivity by listeners, a few modifications are suggested in this paper. The main ones consist in a different way of processing expressive data and calculating the penalty matrix. A complex evaluation using listening tests and some auxiliary measures was performed.
text, speech and dialogue | 2010
Martin Grůber; Jindřich Matoušek
Fourth International Workshop on Human-Computer Conversation | 2008
Milan Legát; Martin Grůber; Pavel Ircing