Karel Blavka
Technical University of Liberec
Publications
Featured research published by Karel Blavka.
Multimedia Signal Processing | 2012
Jan Nouza; Karel Blavka; Jindrich Zdansky; Petr Cerva; Jan Silovsky; Marek Bohac; Josef Chaloupka; Michaela Kucharova; Ladislav Seps
This paper describes a complex system developed for processing, indexing and accessing data collected in large audio and audio-visual archives that form an important part of the Czech cultural heritage. The system is currently being applied to the Czech Radio archive, namely to its oral-history segment with more than 200,000 individual recordings covering almost ninety years of broadcasting in the Czech Republic and former Czechoslovakia. The ultimate goals are a) to transcribe a significant portion of the archive with the support of speech, speaker and language recognition technology, b) to index the transcriptions, and c) to make the audio and text files fully searchable. So far, the system has processed and indexed over 75,000 spoken documents. Most of them come from the last two decades, but the recent demo collection also includes a series of presidential speeches dating back to 1934. Full coverage of the archive should be available by the end of 2014.
International Workshop on Multimedia for Cultural Heritage | 2011
Jan Nouza; Karel Blavka; Marek Bohac; Petr Cerva; Jindrich Zdansky; Jan Silovsky; Jan Prazak
The Czech Radio archive of spoken documents is considered one of the gems of the Czech cultural heritage. It contains the largest collection (more than 100,000 hours) of spoken documents recorded during the last 90 years. We are developing a complex platform that should automatically transcribe a significant portion of the archive, index it and eventually prepare it for full-text search. The four-year project, supported by the Czech Ministry of Culture, is challenging in that it must cope with huge volumes of data, with historical as well as contemporary language, with the rather low signal quality of old recordings, and with documents spoken not only in Czech but also in Slovak. The technology used includes speech, speaker and language recognition modules, speaker and channel adaptation components, tools for data indexation and retrieval, and a web interface that allows public access to the archive. A demo version of the platform is now available for testing and searching in some 10,000 hours of already processed data.
Journal of Multimedia | 2012
Jan Nouza; Karel Blavka; Petr Cerva; Jindrich Zdansky; Jan Silovsky; Marek Bohac; Jan Prazak
In this paper we describe a complex software platform that is being developed for the automatic transcription and indexation of the Czech Radio archive of spoken documents. The archive contains more than 100,000 hours of audio recordings covering almost ninety years of public broadcasting in the Czech Republic and former Czechoslovakia. The platform is based on modern speech processing technology and includes modules for speech, speaker and language recognition, and tools for multimodal information retrieval. The aim of the project, supported by the Czech Ministry of Culture, is to make the archive accessible and searchable both for researchers and for the general public. After the project's first year, the key modules have already been implemented and tested on a 27,400-hour subset of the archive. A web-based full-text search engine demonstrates the project's current state.
Text, Speech and Dialogue | 2013
Marek Bohac; Karel Blavka
In this paper we propose a method for text-to-speech alignment intended for imperfect (text) transcriptions. We designed an ASR-based (automatic speech recognition) tool complemented with a special post-processing layer that finds anchor points in the transcription and then aligns the data between these anchor points. Because the system does not depend on the commonly employed keyword spotter, it is not as vulnerable to noisy recordings as some other approaches. We also present other features of the system (e.g. preservation of the document structure and processing of numbers) that allow us to use it in many other specific tasks. The performance is evaluated over a challenging set of recordings containing spontaneous speech with many hesitations, repetitions, etc., as well as over noisy recordings.
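The anchor-point idea can be illustrated with a small sketch (a hypothetical simplification, not the paper's actual implementation): word n-grams that occur exactly once in both the ASR hypothesis and the imperfect transcript can serve as anchors, and the material between consecutive anchors can then be aligned locally.

```python
def find_anchors(asr_words, ref_words, n=3):
    """Find word n-grams that occur exactly once in both sequences
    and are monotonically ordered; these act as anchor points."""
    def unique_ngram_positions(words):
        pos = {}
        for i in range(len(words) - n + 1):
            pos.setdefault(tuple(words[i:i + n]), []).append(i)
        return {g: p[0] for g, p in pos.items() if len(p) == 1}

    a_pos = unique_ngram_positions(asr_words)
    r_pos = unique_ngram_positions(ref_words)
    shared = sorted(set(a_pos) & set(r_pos), key=lambda g: a_pos[g])

    # keep only anchors that advance in BOTH sequences, so the
    # resulting segment boundaries are consistent
    anchors, last_a, last_r = [], -1, -1
    for g in shared:
        if a_pos[g] > last_a and r_pos[g] > last_r:
            anchors.append((a_pos[g], r_pos[g]))
            last_a, last_r = a_pos[g], r_pos[g]
    return anchors
```

Each pair of consecutive anchors bounds a short segment in which the noisy hypothesis and the imperfect transcript can be matched independently, which is what makes the approach robust to local errors.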
Text, Speech and Dialogue | 2012
Marek Bohac; Jan Nouza; Karel Blavka
When an automatic speech recognition (ASR) system is being developed for an application in which a large amount of audio documents is to be transcribed, we need feedback that tells us what the main types of errors are, why and where they occur, and what can be done to eliminate them. While the algorithm commonly used for counting the number of word errors is simple, it does not say much about the nature and source of the errors. In this paper, we introduce a scheme that offers a more detailed insight into the analysis of ASR errors. We apply it to the performance evaluation of a Czech ASR system whose main goal is to transcribe oral archives containing hundreds of thousands of spoken documents. The analysis is performed by comparing 763 hours of manually and automatically transcribed data. We list the main types of errors and present methods that try to eliminate at least the most relevant ones. We show that the proposed error-locating method can also be useful when porting an existing ASR system to another language, where it can help in the efficient identification of errors in the lexicon.
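The standard word-error counting the abstract refers to can be sketched as a Levenshtein alignment whose backtrace classifies each error as a substitution, insertion or deletion (a minimal illustration; the paper's analysis scheme is considerably richer than these three categories):

```python
def align_and_count(ref, hyp):
    """Levenshtein alignment of reference vs. hypothesis word lists;
    returns (substitutions, insertions, deletions, WER)."""
    R, H = len(ref), len(hyp)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i
    for j in range(H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match / substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion

    # backtrace the optimal path to classify each edit operation
    subs = ins = dels = 0
    i, j = R, H
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           d[i][j] == d[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1):
            if ref[i - 1] != hyp[j - 1]:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    wer = (subs + ins + dels) / max(R, 1)
    return subs, ins, dels, wer
```

The paper's point is that these raw counts hide *why* errors occur; a richer scheme additionally inspects the aligned word pairs themselves (e.g. to spot systematic lexicon gaps).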
International Conference on Telecommunications | 2012
Marek Bohac; Karel Blavka; Michaela Kucharova; Svatava Škodová
This paper deals with the post-processing phase of the automatic transcription of spoken documents stored in the large Czech Radio audio archive (containing hundreds of thousands of recordings). The ultimate goal of the project is to transcribe them and to allow public access to their content. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings. The post-processing is adapted to the needs of the web presentation of the archive. Up to now it has been used to process about 60,000 audio documents. We present the overall structure of the system as well as its core modules: the speech recognition engine, the speaker diarization module and the final text processing. Special attention is paid to punctuation. The punctuation accuracy is evaluated and compared with human usage. In the final part of the paper we propose further improvements and ideas for future research.
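To give a feel for the punctuation problem, here is a deliberately naive pause-based heuristic (purely illustrative and not the paper's actual method, which works on the recognizer output and diarization information): insert a sentence break wherever the gap between consecutive recognized words exceeds a threshold.

```python
def add_sentence_breaks(words, pause_threshold=0.6):
    """words: list of (token, start_sec, end_sec) from the recognizer.
    Append a full stop after a word when the pause before the next
    word exceeds the threshold; always close the final sentence."""
    out = []
    for k, (tok, start, end) in enumerate(words):
        out.append(tok)
        if k + 1 < len(words) and words[k + 1][1] - end > pause_threshold:
            out[-1] += "."
    if out:
        out[-1] += "."
    return " ".join(out)
```

Pause length alone is a weak cue (speakers pause mid-sentence and rush across sentence boundaries), which is why evaluating punctuation against human usage, as the paper does, matters.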
International Conference on Telecommunications | 2015
Jan Nouza; Karel Blavka; Marek Bohac; Petr Cerva; Jiří Málek
In this paper, we present a system developed in our lab to provide subtitles for audio-visual shows and documents produced by the Czech internet company Stream.cz. The main goal is to make these programs understandable also for deaf and hearing-impaired persons. We describe the whole process, which starts with extracting the audio channel from the document, then identifies speech and converts it to text (using either automatically generated or human-edited transcripts), and eventually produces subtitles synchronized with the audio and video tracks. We present the employed methods (including the adaptation of the system to the target data), compare results on various types of documents and provide some relevant statistics collected during the first year of practical deployment.
10th International Workshop on Electronics, Control, Measurement and Signals | 2011
Marek Bohac; Karel Blavka
The paper deals with the automatic processing of spoken documents from the Czech Radio archive, which contains hundreds of thousands of audio recordings. The ultimate goal of the project is to transcribe them and to allow public access to their content. In this paper, we focus on processing those documents that have already been transcribed (by humans or otherwise) and are to be synchronized (time-aligned) with the text. We aim at developing a method that is time-efficient and at the same time robust to incorrect or incomplete transcriptions. The method is based on the combination of two speech recognition techniques. The first, a word-spotting method, searches for selected words in the transcription and proposes points where the document can be split into shorter, homogeneous segments covered by the text transcription. For these, we utilize a modified forced-alignment procedure to get time stamps for each word in the transcription. The method runs at a 0.5 real-time factor and yields 95.5% word-alignment precision. So far, it has been used for transcribing and indexing more than 552 hours of archive recordings.
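The splitting step described above can be sketched as follows (a hypothetical simplification: the spotter interface, returning a transcript word index and a time for each hit, is assumed for illustration; the actual forced alignment of each segment requires acoustic models and is not shown):

```python
def split_by_keywords(transcript_words, hits, audio_len):
    """Split a long recording and its transcript into parallel segments
    at spotted keywords. `hits` is a list of (word_index, time_sec)
    pairs, sorted by time; each returned segment (t_start, t_end, words)
    can then be forced-aligned independently."""
    segments = []
    prev_idx, prev_t = 0, 0.0
    for idx, t in hits:
        segments.append((prev_t, t, transcript_words[prev_idx:idx]))
        prev_idx, prev_t = idx, t
    # the tail from the last keyword to the end of the audio
    segments.append((prev_t, audio_len, transcript_words[prev_idx:]))
    return segments
```

Splitting first is what makes the whole pipeline fast and robust: a transcription error is confined to one short segment instead of derailing the alignment of an hours-long recording.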
Multimedia Signal Processing | 2012
Petr Cerva; Jan Silovsky; Jindrich Zdansky; Ondrej Smola; Karel Blavka; Karel Palecek; Jan Nouza; Jiri Malek
This paper presents a complex system developed to improve the quality of distance learning by allowing people to browse the content of various (academic) lectures. The system consists of several main modules. The first, an automatic speech recognition (ASR) module, is designed to cope with the inflective Czech language and provides time-aligned transcriptions of input audio-visual recordings of lectures. These transcriptions are generated off-line in two recognition passes using speaker adaptation methods and language models mixed from various text sources, including transcriptions of broadcast programs, spontaneous telephone talks, web discussions, theses, etc. Lecture recordings and their transcriptions are then indexed and stored in the database. The next module, a client-server web lecture browser, allows users to browse or play the indexed content and search in it.
International Conference on Telecommunications | 2013
Marek Bohac; Jiri Malek; Karel Blavka
In this paper we propose an algorithm for grapheme-to-phoneme (G2P) alignment. Such an alignment is needed mainly for the data-driven training of G2P conversion tools. Our approach utilizes a given phonetic alphabet and a set of given orthographic-phonetic word pairs as a source of prior knowledge. The development data are taken from a manually created pronunciation lexicon for a large-vocabulary speech recognition system for Czech. The alignment method is based on an extended minimum edit distance algorithm. Moreover, we propose an approach that avoids the creation of reference alignments: we evaluate the improvements through a specially designed G2P converter, i.e. we compare the phonetic transcription directly to a set of test orthographic-phonetic word pairs. The results of our approach are comparable to, or even slightly better than, the state of the art.
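The core of such an extended edit distance can be sketched as a dynamic program over grapheme and phoneme positions (a minimal illustration; the paper's method is more elaborate, and the `matches` predicate here stands in for the prior knowledge the paper extracts from the lexicon):

```python
def g2p_align_cost(graphemes, phonemes, matches):
    """Edit-distance-style cost for aligning a grapheme sequence to a
    phoneme sequence. `matches(g, p)` says whether grapheme g may
    produce phoneme p; epsilon moves cover silent letters and
    phonemes without a grapheme counterpart."""
    G, P = len(graphemes), len(phonemes)
    INF = float("inf")
    d = [[INF] * (P + 1) for _ in range(G + 1)]
    d[0][0] = 0.0
    for i in range(G + 1):
        for j in range(P + 1):
            if d[i][j] == INF:
                continue
            if i < G and j < P:
                # licensed pairing costs 0, unlicensed costs 1
                cost = 0.0 if matches(graphemes[i], phonemes[j]) else 1.0
                d[i + 1][j + 1] = min(d[i + 1][j + 1], d[i][j] + cost)
            if i < G:  # grapheme maps to no phoneme (silent letter)
                d[i + 1][j] = min(d[i + 1][j], d[i][j] + 1.0)
            if j < P:  # phoneme with no grapheme counterpart
                d[i][j + 1] = min(d[i][j + 1], d[i][j] + 1.0)
    return d[G][P]
```

Backtracking the same table yields the grapheme-phoneme pairs used as training targets for a data-driven G2P converter; the quality of the `matches` knowledge directly determines the alignment quality.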