Marián Trnka
Slovak Academy of Sciences
Publication
Featured research published by Marián Trnka.
text speech and dialogue | 2011
Sakhia Darjaa; Miloš Cerňak; Štefan Beňuš; Milan Rusko; Róbert Sabo; Marián Trnka
This paper presents rule-based triphone mapping for acoustic model training in automatic speech recognition. We test whether incorporating expanded knowledge at the level of parameter tying in acoustic modeling improves the performance of automatic speech recognition in Slovak. We propose a novel technique of knowledge-based triphone tying, which allows the synthesis of unseen triphones. The proposed technique is compared with decision-tree-based state tying, and it is shown that for larger acoustic models, with 3000 states and more, a triphone-mapped HMM system achieves better performance than a tree-based state-tying system on a large-vocabulary continuous speech transcription task. Experiments performed using 350 hours of a Slovak audio database of mixed read and spontaneous speech are presented. The relative decrease in word error rate was 4.23% for models with 7500 states and 4.13% at 11500 states.
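The back-off idea behind triphone mapping can be illustrated with a minimal sketch. This is an illustrative assumption, not the paper's actual rule set: an unseen triphone is mapped to a seen one by progressively relaxing its left and right contexts into broader phonetic classes, falling back to the context-free monophone.

```python
# Minimal sketch of rule-based triphone back-off mapping (hypothetical
# rules, not the paper's implementation): an unseen triphone l-c+r is
# mapped to a seen triphone by relaxing its left/right contexts.

def map_triphone(left, center, right, seen, broad_class):
    """Return a seen triphone for (left, center, right), backing off
    through broader phonetic classes, then to the monophone."""
    candidates = [
        (left, center, right),                           # exact match
        (broad_class[left], center, right),              # relax left context
        (left, center, broad_class[right]),              # relax right context
        (broad_class[left], center, broad_class[right]), # relax both
    ]
    for cand in candidates:
        if cand in seen:
            return cand
    return (None, center, None)                          # monophone fallback

# Toy inventory: broad classes lump phones into e.g. vowels and stops.
broad = {"a": "V", "e": "V", "t": "STOP", "k": "STOP"}
seen = {("V", "t", "a"), ("k", "a", "t")}

print(map_triphone("e", "t", "a", seen, broad))  # ('V', 't', 'a')
print(map_triphone("t", "e", "k", seen, broad))  # (None, 'e', None)
```

Because the mapping is deterministic, any triphone requested by the synthesizer or recognizer resolves to a model that was actually trained, which is the property the abstract exploits for unseen triphones.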
text speech and dialogue | 2004
Milan Rusko; Marián Trnka; Sachia Darzágín; Milos Cernak
After years of hesitation, the conservative Slovak telecommunication market seems to be becoming conscious of the need for voice-driven services. In the last year, all three telecommunication operators have adopted our text-to-speech system Kempelen in their interactive voice response systems. Diphone concatenative synthesis has probably reached the limit of its abilities, so the next step is to look for a synthesis method that gives more intelligible and more natural synthesized speech with better prosody modelling. We have therefore decided to build a single-speaker speech database in Slovak for experiments and application building in unit-selection speech synthesis. To build such a database, we tried to exploit as much of the existing Slovak speech resources as possible, to utilize knowledge from previous projects, and to use existing routines developed at our department. The paper describes the structure, recording and annotation of this database, as well as first experiments with a unit-selection speech synthesizer.
language and technology conference | 2011
Milan Rusko; Jozef Juhár; Marián Trnka; Ján Staš; Sakhia Darjaa; Daniel Hládek; Róbert Sabo; Matus Pleva; Marian Ritomský; Martin Lojka
This paper describes the design, development and evaluation of a Slovak dictation system for the judicial domain. Speech is recorded using a close-talk microphone, and the dictation system is used for on-line or off-line automatic transcription. The system provides an automatic dictation tool in Slovak for the employees of the Ministry of Justice of the Slovak Republic and all the courts in Slovakia. It is designed for on-line dictation and off-line transcription of legal texts recorded in the acoustic conditions of a typical office. Details of the technical solution are given, and an evaluation of different versions of the system is presented.
conference of the international speech communication association | 2016
Rivka Levitan; Stefan Benus; Ramiro H. Gálvez; Agustín Gravano; Florencia Savoretti; Marián Trnka; Andreas Weise; Julia Hirschberg
Entrainment, also known as accommodation or alignment, is the phenomenon by which conversational partners become more similar to each other in behavior. While there has been much work on some behaviors, there has been little on entrainment in speech, and even less on how spoken dialogue systems (SDS) that entrain to their users' speech can be created. We present an architecture and algorithm for implementing acoustic-prosodic entrainment in SDS and show that speech produced under this algorithm conforms to the feature targets, satisfying the properties of entrainment behavior observed in human-human conversations. We present results of an extrinsic evaluation of this method, comparing whether subjects are more likely to ask advice from a conversational avatar that entrains versus one that does not, in English, Spanish and Slovak SDS.
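The core mechanism of acoustic-prosodic entrainment can be sketched roughly as moving the system's prosodic feature targets part of the way toward the user's measured values after each turn. The feature names, blending rule, and bounds below are illustrative assumptions, not the paper's published algorithm:

```python
# Illustrative sketch (not the paper's algorithm): after each user turn,
# blend the TTS prosodic targets toward the user's measured features,
# clamped to a range the synthesizer can render safely.

def entrain(targets, user_features, alpha=0.5, bounds=None):
    """Blend current feature targets toward user values.
    alpha=0 keeps the system voice fixed; alpha=1 copies the user."""
    bounds = bounds or {}
    new_targets = {}
    for feat, cur in targets.items():
        blended = (1 - alpha) * cur + alpha * user_features.get(feat, cur)
        lo, hi = bounds.get(feat, (float("-inf"), float("inf")))
        new_targets[feat] = min(max(blended, lo), hi)  # clamp to safe range
    return new_targets

targets = {"pitch_hz": 200.0, "rate_sps": 5.0, "intensity_db": 65.0}
user = {"pitch_hz": 180.0, "rate_sps": 6.0}
print(entrain(targets, user, alpha=0.5,
              bounds={"pitch_hz": (120.0, 260.0)}))
# {'pitch_hz': 190.0, 'rate_sps': 5.5, 'intensity_db': 65.0}
```

Checking that the synthesized speech actually hits such targets is what the abstract means by "conforms to the feature targets".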
text speech and dialogue | 2013
Milan Rusko; Marián Trnka; Sakhia Darjaa; Marian Ritomský
Warnings generated by a specially designed speech synthesizer can be used to inform, warn, instruct and navigate people in dangerous and critical situations. The paper presents the design of a speech synthesizer capable of generating warning messages with different urgency levels in Slovak and also in Romani, the under-resourced and digitally endangered language of the Slovak Roma. An original three-step method is proposed for creating expressive speech databases. Expressive synthesizers trained on these databases, capable of generating Slovak and Romani synthetic warning speech and messages at three levels of urgency, are presented.
language and technology conference | 2013
Milan Rusko; Jozef Juhár; Marián Trnka; Ján Staš; Sakhia Darjaa; Daniel Hládek; Róbert Sabo; Matus Pleva; Marian Ritomský; Stanislav Ondáš
This paper describes the evaluation of, and recent advances in, a speech dictation system for the judicial domain. The dictation system incorporates Slovak speech recognition and uses a plugin for a widely used office suite. It was introduced recently in the Slovak courts, after preliminary user evaluation. Compared to the previous version, the system was improved significantly using new acoustic databases for evaluation and acoustic modeling. The speaker adaptation procedure and gender-dependent models significantly improve the overall accuracy, reducing the word error rate below 5 % on a domain-specific test set. The language resources were extended and the language modeling techniques improved, as described in the paper. An end-user questionnaire about the user interface was evaluated and new functionalities were introduced. According to the available feedback, it can be concluded that the dictation system can significantly speed up court proceedings for every user willing to cooperate with new technologies.
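Figures like "below 5 % WER" use the standard word error rate metric: the word-level edit distance between reference and hypothesis divided by the reference length. A textbook implementation (generic background, not code from the dictation system):

```python
# Standard word error rate (WER): minimum number of substitutions,
# insertions and deletions turning the hypothesis into the reference,
# divided by the number of reference words.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                       # delete all reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                       # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the court is now in session",
          "the court now in session"))    # 1 deletion / 6 words ≈ 0.167
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as an error rate rather than an accuracy.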
text speech and dialogue | 1999
Jan Cernocký; Petr Pollák; Milan Rusko; Václav Hanzl; Marián Trnka
Databases of five East European languages (Czech, Slovak, Russian, Polish and Hungarian) are being created within the SpeechDat-E project. This paper describes the overall design of the SpeechDat-E databases and concentrates on the Czech (1000 speakers) and Slovak (1000 speakers) ones. The item structure and recording specifications are presented, with a more detailed description of the language-specific items. Attention is also paid to the geographic and dialect distribution of speakers. The paper also presents the recruitment strategy.
Archive | 2019
Milan Rusko; Marián Trnka; Sakhia Darjaa; Jakub Rajčáni; Michael Finke; Tim H. Stelkens-Kobsch
This document describes the concept of an air traffic management security system and current validation activities. The system uses speech analysis techniques to verify speaker authorization and to measure the stress level in the air-ground voice communication between pilots and air traffic controllers on the one hand, and monitors the current air traffic situation on the other. The purpose of this system is to close an existing security gap by means of this multi-modal approach. First validation results are discussed at the end of the article.
Journal of the Acoustical Society of America | 2018
Igor Guoth; Sakhia Darjaa; Marián Trnka; Milan Rusko; Marian Ritomský; Roman Jarina
The most commonly adopted approaches in speech emotion recognition (SER) utilize magnitude-spectrum and nonlinear Teager energy operator (TEO) based features, while information about the phase spectrum is often omitted. Phase information has frequently been overlooked by speech processing researchers because of signal processing difficulties. We present a study of two phase-based features that represent the phase structure of speech, relative phase shift (RPS) based features and modified group delay features (MODGDF), in the task of emotional arousal recognition. The evaluation is performed on the CRISIS acted speech database, which allows us to recognize five levels of emotional arousal from speech. To exploit these features, we employ a deep neural network. The efficiency of the approaches based on the features mentioned above is compared to a baseline platform using Mel-frequency cepstral coefficients (MFCCs) and all-pole group delay features (APGD). The combination of anot...
173rd Meeting of Acoustical Society of America and 8th Forum Acusticum | 2017
Igor Guoth; Milan Rusko; Marian Ritomský; Marián Trnka; Darjaa Sakhia
The mel cepstral coefficients representing the magnitude spectrum, and the Teager energy operators, are often used as features in emotion and stress recognition, while the phase spectrum information is generally ignored. In this work we propose the use of an all-pole group delay (APGD) function to represent the phase information for tense arousal level recognition from speech, along with mel-frequency cepstral coefficients (MFCC) and Critical Band Based Teager Energy Operator Autocorrelation Envelope features (TEO-CB-Auto-Env). The combination of APGD, MFCC and TEO-CB-Auto-Env features has shown the best recognition results, confirming the hypothesis that the phase and magnitude spectra contain complementary information and that their combination with nonlinear features can improve the reliability of a tense arousal level recognition system. The evaluation was performed at the level of whole recordings, and we observed a statistically very significant absolute improvement in recognition rate of 4.54 % and 4.6 %...
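The group delay function underlying APGD is the negative derivative of the phase spectrum with respect to frequency. A textbook way to compute it for a signal frame uses the identity based on the DFT of n·x[n]; this is generic DSP background, not the APGD implementation from the paper (which applies the idea to an all-pole model of the spectrum):

```python
import numpy as np

# Textbook group delay tau(w) = -d(arg X(w))/dw, computed with the
# standard identity using Y = FFT(n * x[n]); generic DSP background,
# not the paper's APGD feature extractor.

def group_delay(frame):
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    eps = 1e-12                        # avoid division by zero
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + eps)

# Sanity check: a pure delayed impulse has constant group delay
# equal to its delay in samples.
frame = np.zeros(64)
frame[3] = 1.0                         # impulse delayed by 3 samples
tau = group_delay(frame)
print(np.allclose(tau, 3.0))           # True
```

The all-pole variant replaces the raw spectrum with one derived from an LPC model, which smooths the phase and makes the resulting features more robust as classifier inputs.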