Publication


Featured research published by Milan Legát.


Speech Communication | 2011

On the detection of pitch marks using a robust multi-phase algorithm

Milan Legát; Jindřich Matoušek; Daniel Tihelka

A large number of methods for identifying glottal closure instants (GCIs) in voiced speech have been proposed in recent years. In this paper, we propose to take advantage of both glottal and speech signals in order to increase the accuracy of GCI detection. All aspects of this particular issue, from determining speech polarity to handling the delay between the glottal signal and the corresponding speech signal, are addressed. A robust multi-phase algorithm (MPA), which combines different methods applied to both signals in a unique way, is presented. Within the process, special attention is paid to determining speech waveform polarity, as it was found to considerably influence the performance of the detection algorithms. Another feature of the proposed method is that every detected GCI is given a confidence score, which makes it possible to locate potentially inaccurate GCI subsequences. The performance of the proposed algorithm was tested and compared with other freely available GCI detection algorithms. The MPA was found to be more robust in terms of detection accuracy over various sets of sentences, languages and phone classes. Finally, some pitfalls of GCI detection are discussed.


text, speech and dialogue | 2009

Design of the Test Stimuli for the Evaluation of Concatenation Cost Functions

Milan Legát; Jindřich Matoušek

A large number of methods for measuring audible discontinuities, which occur at concatenation points in synthesized speech, have been proposed in recent years. However, none of them has proved to be better than the others across all languages and recording conditions, and the presented results have sometimes even contradicted each other. What is more, none of the tested concatenation cost functions seems to reliably reflect the human perception of such discontinuities. Thus, the design of concatenation cost functions is still an open issue, and a lot of work remains to be done. In this paper, we deal with the problem of preparing the test stimuli for evaluating the performance of these functions, which is, in our opinion, one of the key aspects in this field.


text, speech and dialogue | 2007

Pitch marks at peaks or valleys

Milan Legát; Daniel Tihelka; Jindřich Matoušek

This paper deals with the problem of speech waveform polarity. As the polarity of a speech waveform can influence the performance of pitch marking algorithms (see Sec. 4), a simple method for determining speech signal polarity is presented in the paper. We call this problem peak/valley decision making, i.e. deciding whether pitch marks should be placed at peaks (local maxima) or at valleys (local minima) of a speech waveform. In addition, the proposed method can be used to check the polarity consistency of a speech corpus, which is important for the concatenation of speech units in speech synthesis.
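
The abstract above does not spell out the method itself, so the following is only a rough illustration of what a peak/valley decision could look like. It assumes a naive heuristic that is not taken from the paper: for each voiced frame, choose the side (positive or negative) whose extreme excursion has the larger magnitude. The function names `peak_or_valley` and `polarity_is_consistent` are hypothetical.

```python
from typing import Sequence


def peak_or_valley(samples: Sequence[float]) -> str:
    """Return "peaks" or "valleys" for one voiced frame.

    Hypothetical heuristic (an assumption, not the paper's method):
    pick the polarity whose extreme excursion has the larger magnitude.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    largest_peak = max(samples)     # most positive sample
    deepest_valley = min(samples)   # most negative sample
    return "peaks" if largest_peak >= -deepest_valley else "valleys"


def polarity_is_consistent(frames: Sequence[Sequence[float]]) -> bool:
    """Check that every frame yields the same peak/valley decision."""
    decisions = {peak_or_valley(frame) for frame in frames}
    return len(decisions) <= 1
```

Applied to all recordings in a corpus, a check along the lines of `polarity_is_consistent` would flag material whose polarity differs from the rest, which is the corpus-consistency use case the abstract mentions.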


text, speech and dialogue | 2011

Identifying concatenation discontinuities by hierarchical divisive clustering of pitch contours

Milan Legát; Jindřich Matoušek

In this paper, we present the results of a clustering experiment whose aim was to show whether the proximity of pitch contours is a sufficient condition for perceptually smooth transitions at concatenation points in concatenative speech synthesis. The experiment was motivated by a previous finding that support vector machine (SVM) classifiers can separate perceptually continuous and discontinuous joins with high accuracy, using pitch contours extracted from the vicinity of concatenation points as predictors. The experiment has shown that clustering observations in the form of pitch contours, represented in different scales and compared using the Euclidean distance as a metric, is not a reliable way of identifying discontinuities at concatenation points.


language and technology conference | 2009

Czech senior COMPANION: wizard of Oz data collection and expressive speech corpus recording and annotation

Martin Grůber; Milan Legát; Pavel Ircing; Jan Romportl; Josef Psutka

This paper presents part of the data collection efforts undertaken within the COMPANIONS project, whose aim is to develop a set of dialogue systems that will be able to act as artificial “companions” for human users. One of these systems, being developed in the Czech language, is designed to be a partner for elderly people and will be able to talk with them about photographs that mostly capture their family memories. The paper describes in detail the collection of natural dialogues using the Wizard of Oz scenario, as well as the re-use of the collected data to create the expressive speech corpus planned for the development of a limited-domain Czech expressive TTS system.


text, speech and dialogue | 2013

Configuring TTS Evaluation Method Based on Unit Cost Outlier Detection

Milan Legát; Daniel Tihelka; Jindřich Matoušek

This paper presents a new analytic method that can be used for analyzing the perceptual relevance of unit selection costs and/or their sub-components, as well as for automated tuning of the unit selection weights. In particular, configuration options of the method are discussed in detail. Simple guidance on how to leverage the proposed method for the evaluation of a newly designed unit selection cost is also given in the paper. The advantage of the proposed method is that different unit selection system configurations and tunings can be evaluated automatically, without the need to conduct listening tests for each of them.


text, speech and dialogue | 2012

The Role of Nasal Contexts on Quality of Vowel Concatenations

Milan Legát; Radek Skarnitzl

This paper deals with the traditional problem of audible discontinuities occurring at concatenation points at diphone boundaries in concatenative speech synthesis. We present the results of an analysis of the effects of nasal context mismatches on the quality of concatenations in five short Czech vowels. The study was conducted with two voices (one male and one female), and the results suggest that the female voice vowels /a/, /e/ and /o/ are prone to concatenation discontinuities due to nasalized contexts.


conference of the international speech communication association | 2007

A Robust Multi-Phase Pitch-Mark Detection Algorithm

Milan Legát; Jindřich Matoušek; Daniel Tihelka


SSW | 2013

Is Unit Selection Aware of Audible Artifacts

Jindřich Matoušek; Daniel Tihelka; Milan Legát


Fourth International Workshop on Human-Computer Conversation | 2008

Wizard of Oz Data Collection for the Czech Senior Companion Dialogue System

Milan Legát; Martin Grůber; Pavel Ircing

Collaboration


Dive into Milan Legát's collaborations.

Top Co-Authors

Martin Grůber

University of West Bohemia

Daniel Tihelka

University of West Bohemia

Pavel Ircing

University of West Bohemia

Jan Romportl

University of West Bohemia

Jindřich Matoušek

University of West Bohemia

Josef Psutka

University of West Bohemia

Jan Hajic

Charles University in Prague

Marie Mikulová

Charles University in Prague