Publication


Featured research published by Markéta Jůzová.


text, speech and dialogue | 2018

Current State of Text-to-Speech System ARTIC: A Decade of Research on the Field of Speech Technologies

Daniel Tihelka; Zdeněk Hanzlíček; Markéta Jůzová; Jakub Vít; Jindřich Matoušek; Martin Grůber

This paper provides a survey of the current state of ARTIC, the modern Czech concatenative corpus-based text-to-speech system. Over more than a decade of research and development in the field of speech technologies and applications, the system has been enriched with new languages (and, as a consequence, language-dependent NLP methods), and its speech generation capabilities have improved significantly as new progressive speech generation modules (SPS, DNN, HSS) were, and still are being, designed and incorporated into it. ARTIC also has to deal with various requirements on the data from which speech is generated, ranging in size, quality and domain of the output speech, while the requirement to achieve the highest quality in terms of both naturalness and intelligibility has always remained. The paper therefore summarizes some of the most significant achievements and demanding tasks the system has had to tackle, illustrating the universality and flexibility of this Czech TTS system.


text, speech and dialogue | 2017

Last Syllable Unit Penalization in Unit Selection TTS

Markéta Jůzová; Daniel Tihelka; Radek Skarnitzl

While unit selection speech synthesis tries to avoid speech modifications, it strongly depends on placing units in the correct position. Usually, the position is tightly coupled with the distance from the beginning or end of some prosodic or rhythmic unit, such as a phrase or a word. The present paper shows, however, that position requirements need not be followed when phonetic knowledge of the perception of prosodic patterns (mostly durational in our case) is taken into account. In particular, we focus on the effects of using word-final units in word-internal positions in synthesized speech, which listeners often perceive negatively due to disruptions in local timing.


text, speech and dialogue | 2018

On the Extension of the Formal Prosody Model for TTS

Markéta Jůzová; Daniel Tihelka; Jan Volín

The formal prosody grammar used for TTS focuses mainly on the description of the final prosodic words in phrases/sentences, which characterize a special prosodic phenomenon representing a certain communication function within the language system. This paper introduces an extension of the prosody model which also takes into account the importance and distinctiveness of the first prosodic words in prosodic phrases. This phenomenon cannot change the semantic interpretation of the phrase, but the beginnings of prosodic phrases differ from subsequent words and, based on the phonetic background, should be dealt with separately to achieve higher naturalness.


text, speech and dialogue | 2014

Minimum Text Corpus Selection for Limited Domain Speech Synthesis

Markéta Jůzová; Daniel Tihelka

This paper concerns a limited-domain TTS system based on the concatenative method, and presents an algorithm capable of extracting a minimal domain-oriented text corpus from the real data of the given domain while still reaching the maximum coverage of the domain. The proposed approach ensures that the smallest amount of text is extracted, containing the most common phrases and (possibly) all the words from the domain. At the same time, it ensures that appropriate phrase overlapping is kept, so that smooth concatenation points can be found in the overlapped regions and high-quality synthesized speech can be reached. In addition, several recommendations allowing a speaker to record the corpus more fluently and comfortably are presented and discussed. The corpus building is tested and evaluated on several domains differing in size and nature, and the authors present the results of the algorithm and demonstrate the advantages of using the domain-oriented corpus for speech synthesis.
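
The coverage idea in this abstract can be illustrated with a greedy set-cover heuristic. The sketch below is only an assumption about how such a selection might look, not the authors' algorithm: it covers whitespace-separated words only and ignores the phrase frequencies and phrase overlapping that the paper also takes into account. The function name `select_min_corpus` is hypothetical.

```python
from typing import Iterable


def select_min_corpus(sentences: Iterable[str]) -> list[str]:
    """Greedy set-cover approximation over words (illustrative only).

    Repeatedly picks the sentence that adds the most not-yet-covered
    words until every word seen in the input is covered.
    """
    # Pair each sentence with its set of lowercased words.
    candidates = [(s, set(s.lower().split())) for s in sentences]
    # All words of the domain data that still need to be covered.
    uncovered = set().union(*(words for _, words in candidates))
    selected = []
    while uncovered:
        # On ties, max() keeps the first sentence with the largest gain.
        best_sentence, best_words = max(
            candidates, key=lambda c: len(c[1] & uncovered)
        )
        selected.append(best_sentence)
        uncovered -= best_words
    return selected


domain = [
    "the train to Prague departs",
    "the train to Pilsen departs",
    "the train departs",
    "to Prague",
]
corpus = select_min_corpus(domain)
```

Here two of the four sentences already cover every word of the toy domain. Greedy selection does not guarantee the true minimum, but it is a common approximation for set-cover problems.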


text, speech and dialogue | 2017

Prosodic Phrase Boundary Classification Based on Czech Speech Corpora

Markéta Jůzová

The correct usage of phrase boundaries is important for ensuring natural-sounding and easily intelligible speech. Therefore, it is not surprising that boundary detection is also a part of text-to-speech systems. In the presented paper, large speech corpora are used for a classification-based approach in order to improve the phrasing of synthesized sentences. The paper compares the results of different classifiers with deterministic approaches based on punctuation and conjunctions, and shows that the classifiers are able to outperform these simple algorithms.
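
For orientation, a deterministic baseline of the kind mentioned in this abstract might look like the sketch below. It is a minimal illustration under stated assumptions, not the rules used in the paper: the conjunction list, the punctuation set, and the function name `predict_boundaries` are all hypothetical.

```python
# Hypothetical, illustrative list of Czech conjunctions; not from the paper.
CONJUNCTIONS = {"a", "ale", "nebo", "protože", "že"}
# Sentence-internal punctuation assumed to signal a phrase break.
PUNCTUATION = ",;:"


def predict_boundaries(sentence: str) -> list[int]:
    """Return token indices i where a phrase break is predicted after token i."""
    tokens = sentence.split()
    breaks = []
    # The final token is skipped: the sentence end is a boundary anyway.
    for i, token in enumerate(tokens[:-1]):
        next_word = tokens[i + 1].strip(".,;:!?").lower()
        # Break after punctuation, or before a conjunction.
        if token[-1] in PUNCTUATION or next_word in CONJUNCTIONS:
            breaks.append(i)
    return breaks


breaks = predict_boundaries(
    "Přišel domů, ale nikdo tam nebyl a světla byla zhasnutá."
)
```

In this example a break is predicted after "domů," (punctuation) and after "nebyl" (before the conjunction "a"). Rules like these are the simple algorithms that the classifiers in the paper are compared against.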


international conference on speech and computer | 2017

CRF-Based Phrase Boundary Detection Trained on Large-Scale TTS Speech Corpora

Markéta Jůzová

The paper compares different approaches to phrase boundary detection, based on data gained from speech corpora recorded for the purpose of a text-to-speech (TTS) system. It is shown that a conditional random fields model can outperform basic deterministic and classification-based algorithms in both speaker-dependent and speaker-independent phrasing. Results on manually annotated sentences with phrase breaks are presented as well.


international conference on speech and computer | 2016

Experiments with One-Class Classifier as a Predictor of Spectral Discontinuities in Unit Concatenation

Daniel Tihelka; Martin Grůber; Markéta Jůzová

We present a sequence of experiments with one-class classification, aimed at examining the ability of such a classifier to detect spectral smoothness of units, as an alternative to heuristics-based measures used within unit selection speech synthesizers. A set of spectral feature distances was computed between neighbouring frames in natural speech recordings, i.e. those representing natural joins, from which the per-vowel classifier was trained. In total, three types of classifiers were examined for distances computed from several different signal parametrizations. For the evaluation, the trained classifiers were tested against smooth or discontinuous joins as they were perceived by human listeners in the ad-hoc listening test designed for this purpose.


text, speech and dialogue | 2014

Tuning Limited Domain Speech Synthesis Using General Text-to-Speech System

Markéta Jůzová; Daniel Tihelka

The subject of the present paper is the building of a limited-domain speech synthesis system, where longer units, like words and phrases, can naturally be concatenated together. However, instead of building a single-purpose domain-oriented engine working with longer units, we show that a general-purpose TTS system can be used as a good emulation tool to ensure that a real domain-oriented engine will work correctly. Since the current general speech synthesis system, which embeds the unit selection method, concatenates short speech units (diphones), the selection algorithm has been modified to emulate the concatenation of words or even whole phrases, while still concatenating diphones internally. The behaviour of the system is tested on two limited domains and its output is compared to the output of the general (unmodified) version of the same TTS system. The results clearly encourage building the “real” domain-oriented engine.


text, speech and dialogue | 2018

F0 Post-Stress Rise Trends Consideration in Unit Selection TTS

Markéta Jůzová; Jan Volín

In spoken Czech, the stress and post-stress syllables in human speech are usually characterized by an increase in fundamental frequency F0 (except for phrase-final stress groups). In unit selection text-to-speech systems, however, no F0 contour is generated to be followed, so the F0 behaviour is usually controlled only very loosely. The paper presents an experiment in making a unit selection TTS system follow the trends of fundamental frequency rise in synthesized speech, in order to achieve higher naturalness and overall quality of the speech synthesis itself.


international conference on speech and computer | 2018

On the Comparison of Different Phrase Boundary Detection Approaches Trained on Czech TTS Speech Corpora

Markéta Jůzová

Phrasing is a very important issue in the process of speech synthesis since it ensures higher naturalness and intelligibility of synthesized sentences. There are many different approaches to phrase boundary detection, including simple classification-based, HMM-based and CRF-based approaches; different types of neural networks are used for this task as well. The paper compares representative methods for the phrasing of Czech sentences using large-scale TTS speech corpora as training data, considering only speaker-dependent phrasing.

Collaboration


Dive into Markéta Jůzová's collaboration.

Top Co-Authors

Daniel Tihelka

University of West Bohemia

Jan Volín

Charles University in Prague

Martin Grůber

University of West Bohemia

Jakub Vít

University of West Bohemia

Radek Skarnitzl

Charles University in Prague
