Ján Staš
Technical University of Košice
Publications
Featured research published by Ján Staš.
EURASIP Journal on Audio, Speech, and Music Processing | 2014
Ján Staš; Jozef Juhár; Daniel Hládek
The robustness of n-gram language models depends on the quality of the text data on which they have been trained. Text corpora collected from various resources, such as web pages or electronic documents, cover many possible topics. To build efficient and robust domain-specific language models, it is necessary to separate domain-oriented segments from the large amount of text data; the remaining out-of-domain data can then be used only for updating the existing in-domain n-gram probability estimates. In this paper, we describe the classification of heterogeneous text data into two classes, in-domain and out-of-domain, mainly for language modeling in task-oriented speech recognition for the judicial domain. The proposed text classification algorithm detects the theme of short text segments from their most frequent key phrases. In the next step, each text segment is represented in the vector space model as a feature vector with term weighting. To classify these segments as in-domain or out-of-domain, document similarity with automatic thresholding is used. Experimental results on modeling the Slovak language and adapting it to the judicial domain show a significant improvement in model perplexity and an increase in the performance of the Slovak transcription and dictation system.
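The vector-space step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the TF-IDF weighting, the single in-domain reference document, the mean-similarity threshold standing in for "automatic thresholding", and all names (`tf_idf_vectors`, `cosine`, `classify_segments`) are assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps every weight positive (an illustrative choice).
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def classify_segments(segments, in_domain_reference, threshold=None):
    """Label each segment by its similarity to an in-domain reference document."""
    vectors = tf_idf_vectors([in_domain_reference] + segments)
    reference, rest = vectors[0], vectors[1:]
    sims = [cosine(v, reference) for v in rest]
    if threshold is None:
        # Assumption: the mean similarity stands in for automatic thresholding.
        threshold = sum(sims) / len(sims)
    return [(s, "in-domain" if s >= threshold else "out-of-domain") for s in sims]

# Toy English data; the paper works with Slovak judicial text.
reference = "court judge ruling appeal law".split()
segments = [
    "the court judge issued a ruling".split(),
    "football match ended with a goal".split(),
]
results = classify_segments(segments, reference)
labels = [label for _, label in results]
```

On this toy input the first segment shares three terms with the reference and the second shares none, so only the first falls above the threshold.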
Archive | 2012
Jozef Juhár; Ján Staš; Daniel Hládek
Speech technologies have the potential to simplify human-machine interaction as well as communication between people. The use of speech technology applications is growing continuously. Every speech recognition system, which lies at the heart of every speech application, is strongly language dependent, in addition to being algorithmically complex. Therefore, one of the challenging tasks in the development of the Slovak large vocabulary continuous speech recognition (LVCSR) system is the creation of an efficient language model (LM).
International Symposium ELMAR | 2014
Ján Staš; Daniel Hládek; Jozef Juhár
In this paper, we describe recent advances in the statistical modeling of the Slovak language for the transcription of dictated, semi-spontaneous and spontaneous conversational speech, such as judicial readings, broadcast news from TV and radio shows, parliament proceedings, educational talks and lectures, or interactive conversations. In recent months, we have improved the efficiency and robustness of the Slovak language models trained on electronic and web-based language resources, including better text processing and document classification, class-based and filled-pause modeling, augmentation of n-grams and fast language model adaptation. Experiments performed on judicial readings, broadcast news recordings and parliament proceedings show a significant decrease in the word error rate for multiple configurations of acoustic and language models of the Slovak transcription system in the presented scenarios.
Language and Technology Conference | 2011
Milan Rusko; Jozef Juhár; Marián Trnka; Ján Staš; Sakhia Darjaa; Daniel Hládek; Róbert Sabo; Matus Pleva; Marian Ritomský; Martin Lojka
This paper describes the design, development and evaluation of the Slovak dictation system for the judicial domain. The speech is recorded using a close-talk microphone, and the dictation system is used for on-line or off-line automatic transcription. The system provides an automatic dictation tool in Slovak for the employees of the Ministry of Justice of the Slovak Republic and all the courts in Slovakia. It is designed for on-line dictation and off-line transcription of legal texts recorded in the acoustic conditions of a typical office. Details of the technical solution are given, and an evaluation of different versions of the system is presented.
Proceedings of the Third COST 2102 international training school conference on Toward autonomous, adaptive, and context-aware multimodal interfaces: theoretical and practical issues | 2010
Ján Staš; Daniel Hládek; Matus Pleva; Jozef Juhár
An automatic speech recognition system is one of the parts of a multimodal dialogue system. For this purpose, it is necessary to create a correct vocabulary and to generate a suitable language model. The main aim of this article is to describe the process of building large-vocabulary statistical models of the Slovak language trained on text data gathered mainly from Internet sources. Several smoothing techniques have been used for different vocabulary sizes in order to obtain an optimal model of the Slovak language. We have also employed a pruning technique based on relative entropy to reduce the size of the language model and to find the maximum pruning threshold with minimum degradation in recognition accuracy. Tests were performed with a decoder based on the HTK Toolkit.
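To make the role of smoothing concrete, the following sketch trains a bigram model with add-k smoothing and measures its perplexity. The abstract does not name the smoothing techniques that were compared, so add-k is an assumption chosen only for brevity; the function names and the toy English sentences are likewise hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect history and bigram counts from tokenized sentences."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        histories.update(tokens[:-1])          # every token that acts as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return histories, bigrams, vocab

def bigram_prob(prev, word, histories, bigrams, vocab_size, k=1.0):
    """Add-k smoothed P(word | prev); unseen bigrams get a non-zero probability."""
    return (bigrams[(prev, word)] + k) / (histories[prev] + k * vocab_size)

def perplexity(sentences, histories, bigrams, vocab_size, k=1.0):
    """Perplexity of the smoothed bigram model on tokenized sentences."""
    log_sum, n = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, word in zip(tokens[:-1], tokens[1:]):
            log_sum += math.log(bigram_prob(prev, word, histories, bigrams, vocab_size, k))
            n += 1
    return math.exp(-log_sum / n)

train = [["court", "decided"], ["court", "refused"]]
histories, bigrams, vocab = train_bigram(train)
seen_ppl = perplexity([["court", "decided"]], histories, bigrams, len(vocab))
unseen_ppl = perplexity([["decided", "court"]], histories, bigrams, len(vocab))
```

A word order that never occurred in training still receives a finite perplexity, only a higher one than a seen sentence, which is the property that any smoothing method is meant to provide.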
International Conference on Speech and Computer | 2015
Ján Staš; Daniel Hládek; Jozef Juhár
Language model and acoustic model adaptation play an important role in enhancing the performance and robustness of automatic speech recognition, especially in the development of domain-specific, gender-dependent, or user-adapted systems. This paper focuses on language model speaker adaptation for the transcription of parliament proceedings in Slovak for individual speakers. Based on current research studies, we have developed a framework combining multiple speech recognition outputs with acoustic and language model adaptation at different stages. The preliminary results show a significant relative decrease in model perplexity, from 45 % to 74 %, and in speech recognition word error rate, from 29 % to 43 %, for male and female speakers respectively.
International Conference Radioelektronika | 2016
Tomáš Koctúr; Peter Viszlay; Ján Staš; Martin Lojka; Jozef Juhár
An acoustic model is a necessary component of an automatic speech recognition system. Acoustic models are trained on a large number of speech recordings with transcriptions. Usually, hundreds of transcribed recordings are required. Creating manual transcriptions is a very time- and resource-consuming process. Acoustic models may instead be obtained automatically with unsupervised acoustic model training, which uses online speech resources. The obtained speech data are recognized with a low-resourced automatic speech recognition system. Unsupervised techniques are able to filter out erroneous hypotheses from the result and use the rest for acoustic model training. Unsupervised methods for generating speech corpora for acoustic model training are presented in this paper.
Multimedia Tools and Applications | 2017
Daniel Hládek; Ján Staš; Stanislav Ondáš; Jozef Juhár; László Kovács
Large databases of scanned documents (medical records, legal texts, historical documents) require natural language processing for retrieval and structured information extraction. Errors caused by the optical character recognition (OCR) system increase the ambiguity of the recognized text and decrease the performance of natural language processing. The paper proposes an OCR post-correction system with a parametrized string distance metric. The correction system learns specific error patterns from incorrect words and common sequences of correct words. A smoothing technique is proposed to assign non-zero probability to edit operations not present in the training corpus. Spelling correction accuracy is measured on a database of OCR legal documents in English. The language model and the learned string metric with smoothing improve the Viterbi-based search for the best sequence of corrections and increase the performance of the spelling correction system.
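The idea of a learned string distance with smoothing can be sketched as follows. This is an illustration under strong assumptions, not the paper's method: only character substitutions are learned, training pairs are assumed to be equal-length and character-aligned, add-k smoothing stands in for the proposed smoothing technique, and the names `learn_substitution_costs` and `weighted_distance`, the alphabet, and the fixed insertion/deletion cost are all hypothetical.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz01"  # assumed symbol set for the toy example

def learn_substitution_costs(pairs, k=0.5):
    """Return cost(truth_char, ocr_char) = -log P(ocr_char | truth_char).

    Counts come from equal-length (ocr, truth) pairs; add-k smoothing keeps
    the cost of substitutions never seen in training finite.
    """
    counts, totals = Counter(), Counter()
    for ocr, truth in pairs:
        for o, t in zip(ocr, truth):
            counts[(t, o)] += 1
            totals[t] += 1

    def cost(truth_char, ocr_char):
        p = (counts[(truth_char, ocr_char)] + k) / (totals[truth_char] + k * len(ALPHABET))
        return -math.log(p)

    return cost

def weighted_distance(ocr, candidate, cost, indel=5.0):
    """Edit distance between an OCR token and a candidate correction."""
    rows, cols = len(candidate) + 1, len(ocr) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel
    for j in range(1, cols):
        d[0][j] = j * indel
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel,                                   # skip a candidate character
                d[i][j - 1] + indel,                                   # skip an OCR character
                d[i - 1][j - 1] + cost(candidate[i - 1], ocr[j - 1]),  # learned substitution or match
            )
    return d[-1][-1]

training_pairs = [("c0urt", "court"), ("f0rm", "form"), ("court", "court")]
cost = learn_substitution_costs(training_pairs)
candidates = ["court", "cxurt", "count"]
best = min(candidates, key=lambda c: weighted_distance("c0urt", c, cost))
```

Because the confusion of "o" with "0" was observed in training, it is cheaper than an unseen substitution, so the candidate "court" ranks first for the OCR token "c0urt". A full system would combine such costs with a language model in a Viterbi search, as the abstract describes.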
International Conference Radioelektronika | 2016
Ján Staš; Daniel Zlacky; Daniel Hládek
The paper deals with a framework for retrieving semantically similar documents in order to adapt a Slovak language model to the speaking style of a specific speaker. This research extends our previous study on language model speaker adaptation for the transcription of Slovak parliament proceedings with available speaker-specific text data. We used a large corpus to retrieve a semantically similar subset of text documents for each speaker and to adjust the parameters of an existing well-trained language model to that speaker's speaking style. The same large corpus was used to build the original topic-specific model of the Slovak language deployed in our automatic subtitling system. In the proposed framework, latent semantic indexing was implemented to retrieve the subset of semantically similar documents. The output hypotheses from the first step of speech recognition were used to identify patterns between terms and concepts contained in an unstructured collection of text documents. Preliminary results show a slight improvement in speech recognition accuracy for individual speakers in fully automatic subtitling of parliament speech, broadcast news TV shows and TEDx talks.
International Symposium ELMAR | 2017
Ján Staš; Daniel Hládek; Jozef Juhár
The paper presents a semantic indexing and document retrieval approach for personalized language modeling to improve speech recognition accuracy for individual speakers in a lecture speech transcription task. Latent semantic indexing and paragraph vector modeling are implemented to retrieve a subset of documents from an existing background corpus relevant to the topic and speaking style of a speaker. We select a subset of text documents semantically similar to the output hypotheses from recognized speech segments in the first decoding stage. After that, a small user topic-specific language model is created from the relevant documents, interpolated with the background model, adapted to the current topic and applied during the second decoding stage. Experiments performed on ten speakers from the database of Slovak TEDx talks show a relative improvement in word error rate of up to 3.03% on average.
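The interpolation step mentioned in this abstract can be sketched with unigram models. The abstract does not state the model order or the interpolation weight, so the unigram simplification, the weight `lam=0.3`, and the names `unigram_model` and `interpolate` are assumptions for illustration only.

```python
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def interpolate(p_topic, p_background, lam=0.3):
    """Linear interpolation: lam * P_topic(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_topic) | set(p_background)
    return {
        word: lam * p_topic.get(word, 0.0) + (1.0 - lam) * p_background.get(word, 0.0)
        for word in vocab
    }

# Toy English data; the paper uses Slovak TEDx talks and a background corpus.
background = unigram_model("the court said the law is the law".split())
topic = unigram_model("neural network model learns the model".split())
adapted = interpolate(topic, background, lam=0.3)
```

Words that occur only in the retrieved topic documents, such as "model" here, gain probability mass in the adapted model, while the result remains a proper distribution because both inputs sum to one.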