Milan Rusko
Slovak Academy of Sciences
Publication
Featured research published by Milan Rusko.
text, speech and dialogue | 2011
Sakhia Darjaa; Miloš Cerňak; Štefan Beňuš; Milan Rusko; Róbert Sabo; Marián Trnka
This paper presents rule-based triphone mapping for acoustic model training in automatic speech recognition. We test whether incorporating expanded knowledge at the level of parameter tying in acoustic modeling improves the performance of automatic speech recognition in Slovak. We propose a novel technique of knowledge-based triphone tying, which allows the synthesis of unseen triphones. The proposed technique is compared with decision tree-based state tying, and it is shown that for bigger acoustic models, with 3000 states or more, a triphone-mapped HMM system achieves better performance than a tree-based state tying system on a large vocabulary continuous speech transcription task. Experiments performed using 350 hours of a Slovak audio database of mixed read and spontaneous speech are presented. The relative decrease in word error rate was 4.23% for models with 7500 states and 4.13% for models with 11500 states.
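The relative decrease quoted above is conventionally measured against the baseline error rate. A minimal sketch, assuming the usual definition of relative WER reduction; the function name and the numbers in the usage line are hypothetical and are not taken from the paper:

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent, measured against the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical illustration only (not the paper's figures):
# an absolute WER drop from 20.0 % to 19.0 %.
print(round(relative_wer_reduction(20.0, 19.0), 2))
```

A relative figure of this kind says nothing about the absolute error rates, which the abstract does not report.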
text speech and dialogue | 2004
Milan Rusko; Marián Trnka; Sachia Darzágín; Milos Cernak
After years of hesitation, the conservative Slovak telecommunication market seems to be becoming conscious of the need for voice-driven services. In the last year, all three telecommunication operators have adopted our text-to-speech system Kempelen in their interactive voice response systems. Diphone concatenative synthesis has probably reached the limit of its abilities, so the next step is to look for a synthesis method that gives more intelligible and more natural synthesized speech with better prosody modelling. We have therefore decided to build a single-speaker speech database in Slovak for experiments and application building in unit-selection speech synthesis. To build such a database, we tried to exploit as much of the existing speech resources in Slovak as possible, to utilize the knowledge from previous projects, and to use the existing routines developed at our department. The paper describes the structure, recording and annotation of this database, as well as the first experiments with a unit-selection speech synthesizer.
language and technology conference | 2011
Milan Rusko; Jozef Juhár; Marián Trnka; Ján Staš; Sakhia Darjaa; Daniel Hládek; Róbert Sabo; Matus Pleva; Marian Ritomský; Martin Lojka
This paper describes the design, development and evaluation of the Slovak dictation system for the judicial domain. The speech is recorded using a close-talk microphone and the dictation system is used for on-line or off-line automatic transcription. The system provides an automatic dictation tool in Slovak for the employees of the Ministry of Justice of the Slovak Republic and all the courts in Slovakia. The system is designed for on-line dictation and off-line transcription of legal texts recorded in the acoustic conditions of a typical office. Details of the technical solution are given and the evaluation of different versions of the system is presented.
text speech and dialogue | 2007
Milan Rusko; Róbert Sabo; Martin Dzúr
Research and development in speech synthesis and recognition call for a phonological intonation annotation scheme for the particular language. Inspired by the successful ToBI (Tones and Break Indices) for American English [1] and GToBI [2] for German, this paper introduces a new intonation annotation scheme for Slovak, Sk-ToBI. Although Slovak prosodic rules differ from those of English or German, we decided to follow the main principles of ToBI and to define a special Slovak version of the Tones and Break Indices annotation scheme. The speech material belonging to different styles, which was used for the preliminary study of accents in Slovak, is briefly described, and the conventions of Sk-ToBI annotation are presented.
international conference on speech and computer | 2016
Róbert Sabo; Milan Rusko; Andrej Ridzik; Jakub Rajčáni
This paper reports on initial experiments with the creation of a suitable database for training and testing systems for stress detection in speech and first experimental results. Based on the psychological understanding of the concepts of stress and emotion, we operationalized stress as a level of arousal, which can be detected in speech. We describe here a speech database with three levels of “acted stress” and three levels of soothing. For the very first experiment performed on the database we detect different levels of stress using Gaussian mixture models. The accuracy of detecting three levels of stress was 89 % for speakers included in the training database and 73 % for speakers whose recordings were not used during the adaptation of the GMM models.
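The detection step described above can be pictured as scoring a feature vector against one Gaussian mixture model per stress level and choosing the best-scoring level. A minimal sketch under stated assumptions: diagonal covariances, already-trained models, and one-dimensional toy parameters that are entirely hypothetical. It does not reproduce the authors' features, training, or adaptation procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, means, variances), diagonal covariance.
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))


def classify(x: Sequence[float], models: Dict[str, List[Component]]) -> str:
    """Return the label whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda label: gmm_log_likelihood(x, models[label]))


# Hypothetical toy models over a single made-up feature.
toy_models = {
    "low": [(1.0, [0.0], [1.0])],
    "high": [(0.5, [4.0], [1.0]), (0.5, [6.0], [1.0])],
}
print(classify([0.2], toy_models), classify([5.0], toy_models))
```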
Multimodal Signals: Cognitive and Algorithmic Issues | 2009
Štefan Beňuš; Milan Rusko
Hot-spot words are indicators of high emotional involvement of speakers in the conversation and contain cues to the emotional state of the speaker. Understanding and modeling these cues may improve the effectiveness and naturalness of automated cross-modal dialogue systems. In this paper we investigate the relationship between prosody and emotions in a subgroup of hot-spot words: non-verbal vocal gestures with a problematic textual representation. We extracted these gestures from a recording of a puppet play and argue that this corpus is well suited for investigating emotional speech. We identify the expressive load of non-verbal vocal gestures in Slovak and report on multiple ambiguities in their emotional and discourse functions. The relationship between prosody and emotions in non-verbal hot-spot words is very complex, and a ToBI-based framework of discrete representation of prosody is useful but not sufficient for modeling this relationship.
text speech and dialogue | 2013
Milan Rusko; Marián Trnka; Sakhia Darjaa; Marian Ritomský
Warnings generated by a specially designed speech synthesizer can be used to inform, warn, instruct and navigate people in dangerous and critical situations. The paper presents the design of the speech synthesizer capable of generating warning messages with different urgency levels in Slovak and also in Romani - the under-resourced and digitally endangered language of the Slovak Roma. An original three-step method is proposed for creating expressive speech databases. Expressive synthesizers trained on these databases and capable of generating Romani and Slovak synthetic warning speech and messages in three levels of urgency are presented.
COST'10 Proceedings of the 2010 international conference on Analysis of Verbal and Nonverbal Communication and Enactment | 2010
Milan Rusko; Štefan Beňuš
The paper presents a web-based multimodal and multilingual dictionary of gestures. Its current version contains several hundred gestures, each represented by a still image, a description of the gesture and its meaning, and optional sound and video recordings. The current version includes language- and culture-dependent content for American English, Slovak, Italian, and Mongolian. Entries for Japanese, Chinese, and Hungarian are being implemented. The primary motivation for creating the database is to build a research tool that will facilitate identifying problems in research on nonverbal speech displays and their intercultural and intermodal aspects, and help in testing proposed solutions to these problems.
international conference on speech and computer | 2015
Andrej Ridzik; Milan Rusko
In some speaker verification applications the amount of data available for enrolment and verification can be limited. One aim of this paper is to study the impact of the volume of enrolment and verification data on the performance of the system. The second aim is to improve speaker verification using PLDA. PLDA is generally used to model the speaker and channel variability in the i-vector space using data from several recording sessions. In our experiment, only data from a single session per speaker was available. Therefore, we divided the development recordings into shorter segments, and these segments were treated as if they had been recorded in different sessions. This approach models neither the inter-session speaker variability nor the channel variability. However, we assumed that statistical modelling of the intra-session speaker variability could improve the verification results. Different granularities of segmentation were studied at various amounts of enrolment and verification data.
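The segmentation idea above, cutting each single-session recording into shorter pieces and labelling them as separate sessions, can be sketched as follows. This is an illustration only: the function names, the fixed non-overlapping segment length, and the choice to drop a short trailing remainder are assumptions, not details given in the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def split_into_pseudo_sessions(
    frames: Sequence[float], segment_len: int
) -> List[List[float]]:
    """Cut one recording into consecutive, non-overlapping segments.

    A trailing remainder shorter than segment_len is dropped (an assumption).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    n_full = len(frames) // segment_len
    return [
        list(frames[i * segment_len:(i + 1) * segment_len])
        for i in range(n_full)
    ]


def label_pseudo_sessions(
    recordings: Dict[str, Sequence[float]], segment_len: int
) -> List[Tuple[str, int, List[float]]]:
    """Return (speaker, pseudo_session_index, segment) triples."""
    labelled = []
    for speaker, frames in recordings.items():
        segments = split_into_pseudo_sessions(frames, segment_len)
        for idx, segment in enumerate(segments):
            labelled.append((speaker, idx, segment))
    return labelled


# Hypothetical toy data: one "recording" per speaker as a plain list.
toy = {"spk_a": list(range(10)), "spk_b": list(range(7))}
for speaker, idx, segment in label_pseudo_sessions(toy, segment_len=3):
    print(speaker, idx, segment)
```

Varying `segment_len` corresponds to the different segmentation granularities mentioned in the abstract; each labelled segment would then be turned into its own i-vector before PLDA training.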
language and technology conference | 2013
Milan Rusko; Jozef Juhár; Marián Trnka; Ján Staš; Sakhia Darjaa; Daniel Hládek; Róbert Sabo; Matus Pleva; Marian Ritomský; Stanislav Ondáš
This paper describes the evaluation of, and recent advances in, the application of a speech dictation system for the judicial domain. The dictation system incorporates Slovak speech recognition and uses a plugin for a widely used office suite. It was introduced recently after preliminary user evaluation in the Slovak courts. Compared to the previous version, the system was improved significantly using new acoustic databases for evaluation and acoustic modeling. The speaker adaptation procedure and gender-dependent models significantly improve overall accuracy, bringing the WER below 5 % on the domain-specific test set. The language resources were extended and the language modeling techniques were improved, as described in the paper. An end-user questionnaire about the user interface was evaluated and new functionalities were introduced. According to the available feedback, it can be concluded that the dictation system is able to speed up court proceedings significantly for each user willing to cooperate with new technologies.