Aleš Pražák
University of West Bohemia
Publications
Featured research published by Aleš Pražák.
Text, Speech and Dialogue | 2006
Aleš Pražák; Josef Psutka; Jan Hoidekr; Jakub Kanis; Luděk Müller
This paper describes an LVCSR system for automatic online subtitling (closed captioning) of TV broadcasts of Czech Parliament meetings. The recognition system is based on hidden Markov models, lexical trees and a bigram language model. The acoustic model is trained on 40 hours of parliament speech, and the language model on more than 10M tokens of parliament speech transcriptions. The first part of the article focuses on text normalization and the preparation of a class-based language model. The second part describes the recognition network and its decoding with respect to real-time operation demands, using a vocabulary of up to 100k words. The third part outlines the application framework, which allows subtitles to be generated and displayed for any audio/video source. Finally, experimental results on parliament speeches are reported and discussed; recognition accuracy varies from 80 to 95% depending on the topic under discussion.
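The abstract mentions a bigram language model trained on parliament transcriptions. The sketch below is a minimal illustration only, assuming add-k smoothing; the paper's class-based modelling and actual smoothing method are not specified here. The function name `train_bigram` and the toy sentences are hypothetical.

```python
from collections import Counter


def train_bigram(sentences, k=1.0):
    """Estimate an add-k smoothed bigram model from tokenized sentences.

    Returns (prob, vocab), where prob(word, prev) approximates P(word | prev).
    (Hypothetical helper; the smoothing scheme is an assumption.)
    """
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}  # predictable words; "<s>" only ever appears as a context
    for sent in sentences:
        tokens = ["<s>"] + list(sent) + ["</s>"]
        vocab.update(sent)
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def prob(word, prev):
        # Add k to every bigram count so unseen word pairs keep non-zero probability.
        return (bigram_counts[(prev, word)] + k) / (context_counts[prev] + k * len(vocab))

    return prob, vocab


prob, vocab = train_bigram([["schůze", "začíná"], ["schůze", "končí"]])
print(prob("začíná", "schůze"))  # (1 + 1) / (2 + 4 * 1) = 1/3
```

In this sketch, each conditional distribution sums to one over `vocab`. A real 100k-word system would use a stronger smoothing method and a compact storage format.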
EURASIP Journal on Audio, Speech, and Music Processing | 2011
Josef Psutka; Jan Švec; Jan Vaněk; Aleš Pražák; Luboš Šmídl; Pavel Ircing
The main objective of the work presented in this paper was to develop a complete system that would fulfil the original vision of the MALACH project. The goal was to employ automatic speech recognition and information retrieval techniques to provide improved access to a large video archive of recorded testimonies of Holocaust survivors. So far, the system has been developed only for the Czech part of the archive. It takes advantage of a state-of-the-art speech recognition system tailored to the challenging properties of the recordings (elderly speakers, spontaneous speech and emotionally loaded content), closely coupled with the actual search engine. The design of the algorithm, which adopts the spoken term detection approach, focuses on retrieval speed. The resulting system can search through the 1,000 h of video that make up the Czech portion of the archive and find occurrences of query words in a matter of seconds. A phonetic search is implemented alongside the search based on lexicon words. It makes it possible to find even words outside the ASR system lexicon, such as names, geographic locations or Jewish slang.
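To make the two search modes concrete, the sketch below combines an inverted index over recognized words with a phonetic search over phoneme strings. It is a simplified illustration, not the system described in the paper. The input format `(recording_id, start_time_s, word, phonemes)`, the helper names and the toy data are all hypothetical. Real spoken term detection usually searches lattices with confidence scores rather than 1-best output.

```python
from collections import defaultdict


def build_word_index(hypotheses):
    """Inverted index: word -> list of (recording_id, start_time_s)."""
    index = defaultdict(list)
    for rec, start, word, _ in hypotheses:
        index[word.lower()].append((rec, start))
    return index


def phonetic_search(hypotheses, query_phones):
    """Find a phoneme sequence anywhere in each recording's phoneme stream.

    Matches may span word boundaries, so an out-of-vocabulary name can be found
    even if the recognizer split it into several in-vocabulary words.
    Assumes hypotheses are time-ordered within each recording.
    Returns (recording_id, start time of the word where the match begins).
    """
    streams = defaultdict(list)  # recording_id -> [(phone, word_start_time), ...]
    for rec, start, _, phones in hypotheses:
        streams[rec].extend((p, start) for p in phones)
    n = len(query_phones)
    hits = []
    for rec, stream in streams.items():
        phones = [p for p, _ in stream]
        for i in range(len(phones) - n + 1):
            if phones[i:i + n] == query_phones:
                hits.append((rec, stream[i][1]))
    return hits


hyps = [
    ("rec1", 0.0, "dobrý", ["d", "o", "b", "r", "i:"]),
    ("rec1", 0.4, "den", ["d", "e", "n"]),
    ("rec2", 3.1, "den", ["d", "e", "n"]),
]
index = build_word_index(hyps)
print(index["den"])
print(phonetic_search(hyps, ["r", "i:", "d", "e"]))  # match crosses a word boundary
```

The word index answers in-vocabulary queries with a single dictionary lookup, which reflects the emphasis on retrieval speed. The linear phonetic scan shown here would need its own index (for example over phoneme n-grams) to scale to 1,000 hours.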
Text, Speech and Dialogue | 2011
Lucie Skorkovská; Pavel Ircing; Aleš Pražák; Jan Lehečka
The paper presents a topic identification module embedded in a complex system for acquiring and storing large volumes of text data from the Web. The module processes each acquired data item and assigns it keywords from a defined topic hierarchy, which was developed for this purpose and is also described in the paper. The quality of the topic identification is evaluated in two ways: directly, using classic precision and recall measures, and indirectly, by measuring the ASR performance of topic-specific language models built from the automatically filtered data.
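The direct evaluation uses precision and recall of the assigned keywords. The following sketch computes micro-averaged values, assuming that each document has a set of assigned topics and a set of reference topics. The paper may average differently, and the topic labels in the example are hypothetical.

```python
def precision_recall(assigned, reference):
    """Micro-averaged precision and recall of topic keyword assignment.

    assigned, reference: lists of sets of topic labels, one set per document.
    """
    true_pos = sum(len(a & r) for a, r in zip(assigned, reference))
    n_assigned = sum(len(a) for a in assigned)
    n_reference = sum(len(r) for r in reference)
    precision = true_pos / n_assigned if n_assigned else 0.0
    recall = true_pos / n_reference if n_reference else 0.0
    return precision, recall


assigned = [{"sport", "hockey"}, {"politics"}]
reference = [{"sport"}, {"politics", "economy"}]
p, r = precision_recall(assigned, reference)
print(p, r)  # 2 correct out of 3 assigned, and 2 found out of 3 reference topics
```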
Language Resources and Evaluation | 2014
Jan Švec; Jan Lehečka; Pavel Ircing; Lucie Skorkovská; Aleš Pražák; Jan Vavruška; Petr Stanislav; Jan Hoidekr
The paper describes a general framework for mining large amounts of text data from a defined set of Web pages. The acquired data are meant to constitute a corpus for training robust and reliable language models. The framework must therefore also incorporate algorithms for appropriate text processing and duplicate detection to ensure the quality and consistency of the data. As we expect the resulting corpus to be very large, we have also implemented topic detection algorithms that allow us to automatically select subcorpora for domain-specific language models. The description of the framework architecture and the implemented algorithms is complemented by a detailed evaluation section. This section analyses the basic properties of the gathered Czech corpus, which contains more than one billion text tokens collected using the described framework. It also presents the results of the topic detection methods and describes the design and outcomes of automatic speech recognition experiments with domain-specific language models estimated from the collected data.
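The abstract does not say which duplicate detection algorithm the framework uses. One common technique, assumed here purely for illustration, is to compare documents by the Jaccard similarity of their word n-gram "shingles" and drop any document that is too similar to one already kept. The threshold of 0.8 and the trigram size are arbitrary choices.

```python
def shingles(text, n=3):
    """Set of lowercase word n-grams ("shingles") of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs, threshold=0.8, n=3):
    """Keep each document unless it is a near-duplicate of one already kept."""
    kept, kept_shingles = [], []
    for doc in docs:
        s = shingles(doc, n)
        if all(jaccard(s, k) < threshold for k in kept_shingles):
            kept.append(doc)
            kept_shingles.append(s)
    return kept


docs = [
    "the cat sat on the mat today",
    "The cat sat on the mat today",
    "the cat sat on the mat today again",
    "a completely different sentence about dogs",
]
kept = deduplicate(docs)
print(kept)
```

The pairwise comparison here is quadratic. For a billion-token corpus, shingle sets would typically be approximated with MinHash signatures and locality-sensitive hashing.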
Text, Speech and Dialogue | 2012
Daniel Soutner; Zdeněk Loose; Luděk Müller; Aleš Pražák
In this paper we investigate whether a combination of statistical, neural network and cache language models can outperform a basic statistical model. These models have been developed, tested and used for Czech spontaneous speech data. Such data differ greatly from common written Czech and are characterized by the small amount of available data and the high inflection of words. As a baseline we used a trigram model. After training it, we tested several cache models interpolated with the baseline and evaluated them in terms of perplexity. Finally, the model with the lowest perplexity was evaluated on speech recordings of phone calls.
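A cache model boosts words that occurred recently, and it is interpolated with the baseline and compared by perplexity. The sketch below shows this, assuming a unigram cache, a fixed interpolation weight `lam` and a fixed cache size. These parameters and the toy uniform baseline are hypothetical, and the paper's exact cache formulation may differ.

```python
import math


def cache_interpolated_perplexity(tokens, base_prob, lam=0.1, cache_size=500):
    """Perplexity under P(w|h) = lam * P_cache(w) + (1 - lam) * P_base(w|h).

    base_prob(word, history) is any baseline model (e.g. a trigram LM) and must
    return a probability > 0. The cache is a unigram distribution over the last
    `cache_size` tokens; while the cache is still empty, the baseline is used alone.
    """
    log_prob_sum = 0.0
    history = []
    for word in tokens:
        recent = history[-cache_size:]
        p_base = base_prob(word, history)
        if recent:
            p_cache = recent.count(word) / len(recent)
            p = lam * p_cache + (1 - lam) * p_base
        else:
            p = p_base
        log_prob_sum += math.log(p)
        history.append(word)
    return math.exp(-log_prob_sum / len(tokens))


def uniform(word, history):
    return 1 / 4  # toy baseline over a hypothetical 4-word vocabulary


tokens = ["ano", "ano", "ano", "ano"]
print(cache_interpolated_perplexity(tokens, uniform, lam=0.0))  # baseline only
print(cache_interpolated_perplexity(tokens, uniform, lam=0.5))  # lower: "ano" repeats
```

Perplexity is the exponential of the average negative log-probability, so lower is better. On repetitive spontaneous speech, the cache lowers it by redistributing probability mass toward recently used words.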
Text, Speech and Dialogue | 2012
Aleš Pražák; Zdeněk Loose; Jan Trmal; Josef Psutka
In this paper we introduce our complete solution for captioning live TV programs, which is used by Czech Television, the public service broadcaster in the Czech Republic. Live captioning using speech recognition and re-speaking is on the increase and widely used, for example by the BBC. However, many specific issues have to be solved each time a new captioning system is put into operation. Our concept of re-speaking assumes a complex integration of the re-speaker's skills, which go beyond verbatim repetition, with fully automatic processing. This paper describes the recognition system design with advanced re-speaker interaction, the distributed captioning system architecture, and the often neglected topic of re-speaker training. An evaluation of our skilled re-speakers is also presented.
Text, Speech and Dialogue | 2014
Josef Psutka; Aleš Pražák; Vlasta Radová
In this paper, we describe our effort and some interesting insights obtained while captioning more than 70 hours of live TV broadcasts from the Olympic Games in Sochi. The closed captions were prepared for CT Sport, the sports channel of the public service broadcaster in the Czech Republic. We briefly discuss our distributed architecture for captioning live TV programs using the re-speaking approach. We also cover several modifications of the existing live captioning application (especially the LVCSR system) and the way real TV commentary is re-spoken for individual sports. We show that a re-speaker, after intensive training, can achieve caption accuracy (more than 98%) and readability that clearly outperform captions created by automatic recognition of the TV soundtrack.
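The abstract reports caption accuracy above 98% but does not state the scoring protocol. A common definition, assumed here, is word accuracy = 1 - WER = (N - S - D - I) / N. Here N is the number of reference words, and S, D and I are the substitutions, deletions and insertions in a minimum edit-distance alignment. A minimal sketch with hypothetical helper names:

```python
def word_errors(reference, hypothesis):
    """Minimum substitutions + deletions + insertions (Levenshtein over words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion of ref[i - 1]
                         cur[j - 1] + 1,      # insertion of hyp[j - 1]
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """1 - WER; can become negative when there are many insertions."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / n


print(word_accuracy("a b c d", "a x c d"))  # one substitution out of 4 words
```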
Text, Speech and Dialogue | 2018
Jan Lehečka; Aleš Pražák
In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to enhance automatic speech recognition of multi-topic speech that has to be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on topic changes detected in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news. There we achieved an 18% relative reduction in perplexity and a 3.52% relative reduction in WER over the non-adapted system.
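The abstract does not specify how the adapted language model is built from the detected topics. One common scheme, assumed here and not necessarily the authors' method, is linear interpolation of topic-specific models weighted by the topic posterior θ inferred from the partial hypothesis. Inference of θ (e.g. by LDA) is taken as given. The sketch also shows the arithmetic behind a "relative reduction". The topic names, toy probabilities and the baseline perplexity of 200 are hypothetical.

```python
def adapted_prob(word, history, topic_weights, topic_models):
    """Topic mixture: P(w|h) = sum_k theta_k * P_k(w|h).

    topic_weights: {topic: theta_k}, assumed to come from an LDA posterior
    inferred elsewhere from the partial recognition hypothesis (sums to 1).
    topic_models: {topic: callable(word, history) -> probability}.
    """
    return sum(theta * topic_models[t](word, history)
               for t, theta in topic_weights.items())


def relative_reduction(baseline, adapted):
    """Relative reduction of a metric such as perplexity or WER."""
    return (baseline - adapted) / baseline


def sport_lm(word, history):
    return 0.5 if word == "goal" else 0.1      # toy topic-specific model


def politics_lm(word, history):
    return 0.05 if word == "goal" else 0.2     # toy topic-specific model


models = {"sport": sport_lm, "politics": politics_lm}
weights = {"sport": 0.8, "politics": 0.2}      # e.g. after a detected topic change
print(adapted_prob("goal", [], weights, models))  # 0.8 * 0.5 + 0.2 * 0.05
print(relative_reduction(200.0, 164.0))           # an 18% relative reduction
```

Note that a relative reduction in WER differs from an absolute one. For example, 3.52% relative of a hypothetical 10% WER is about 0.35 percentage points.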
International Conference on Speech and Computer | 2018
Zbyněk Zajíc; Lucie Zajícová; Josef Psutka; Petr Salajka; Jaromír Novotný; Aleš Pražák; Luděk Müller
In this paper, we describe the initial stages of the project “Access to a Linguistically Structured Database of Enquiries from the Language Consulting Center”. The project aims to provide improved access to the large archives of mainly telephone conversations collected continuously by the Institute of the Czech Language. The main goal is to open up the unique Czech data acquired from queries to the Language Consulting Center. It also aims to build a semi-automatic system that will facilitate searching and categorizing these queries. For this purpose, an automatic speech recognizer (ASR) and language processing methods are being designed. Unlike common speech, the vocabulary used in such queries contains many unusual words (e.g. linguistic terms). To train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. In this paper, we describe the proposed telephone system for recording new data and the baseline speech recognition on these data. We also describe the first topic detection experiments on these data, which aimed to discover what the data contain and how to preprocess them.
International Conference on Speech and Computer | 2018
Luboš Šmídl; Jan Švec; Aleš Pražák; Jan Trmal
In this paper, we describe a semi-supervised training method used to generalize an Air Traffic Control (ATC) speech recognizer. The paper introduces the problems and challenges of ATC English recognition and describes the available datasets and ongoing research projects. The baseline recognition model is then used to recognize unlabelled data from a publicly available source: the LiveATC community portal, which records and archives ATC communication near airports. The recognized unlabelled data are filtered by a data selection procedure based on confidence scores, and the acoustic model is retrained to obtain a more general model. Results on accented Czech and French data are reported.
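The data selection step keeps only automatically transcribed utterances that the recognizer is confident about, before the acoustic model is retrained. The sketch below assumes the criterion is the mean word confidence against a fixed threshold. The paper's actual selection rule and threshold are not given in the abstract, and the utterance format and IDs are hypothetical.

```python
def select_by_confidence(utterances, threshold=0.9):
    """Keep automatically transcribed utterances with mean word confidence >= threshold.

    utterances: list of (utt_id, [(word, confidence), ...]) from the baseline recognizer.
    Returns (utt_id, transcript) pairs to be used as pseudo-labelled training data.
    """
    selected = []
    for utt_id, words in utterances:
        if not words:
            continue  # nothing recognized, so nothing to train on
        mean_conf = sum(c for _, c in words) / len(words)
        if mean_conf >= threshold:
            selected.append((utt_id, " ".join(w for w, _ in words)))
    return selected


utts = [
    ("u1", [("climb", 0.95), ("flight", 0.97)]),
    ("u2", [("runway", 0.60), ("two", 0.90)]),
    ("u3", []),
]
print(select_by_confidence(utts))
```

Raising the threshold yields cleaner but smaller pseudo-labelled sets. The balance between label noise and data quantity is the main design choice in this kind of semi-supervised training.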