Josef Chaloupka
Technical University of Liberec
Publications
Featured research published by Josef Chaloupka.
multimedia signal processing | 2012
Jan Nouza; Karel Blavka; Jindrich Zdansky; Petr Cerva; Jan Silovsky; Marek Bohac; Josef Chaloupka; Michaela Kucharova; Ladislav Seps
This paper describes a complex system developed for processing, indexing and accessing data collected in large audio and audio-visual archives that form an important part of Czech cultural heritage. The system is currently being applied to the Czech Radio archive, namely to its oral-history segment with more than 200,000 individual recordings covering almost ninety years of broadcasting in the Czech Republic and former Czechoslovakia. The ultimate goals are a) to transcribe a significant portion of the archive with the support of speech, speaker and language recognition technology, b) to index the transcriptions, and c) to make the audio and text files fully searchable. So far, the system has processed and indexed over 75,000 spoken documents. Most of them come from the last two decades, but the recent demo collection also includes a series of presidential speeches since 1934. Full coverage of the archive should be available by the end of 2014.
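The indexing and search part of such a workflow can be illustrated by a small sketch: a toy inverted index built over automatic transcriptions, so that a query word returns the documents and word positions where it was spoken. The function names and sample data below are invented for illustration and are not taken from the actual archive system.

```python
# Minimal sketch of full-text indexing over ASR transcriptions.
# Function names and sample data are illustrative, not from the actual system.
from collections import defaultdict

def build_index(transcripts):
    """Map each word to the (document id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in transcripts.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((doc_id, pos))
    return index

def search(index, query):
    """Return the ids of documents containing the query word."""
    return sorted({doc_id for doc_id, _ in index.get(query.lower(), [])})

transcripts = {
    "rec_1934_001": "projev prezidenta republiky",
    "rec_1969_115": "zpravy o projev vlady",
}
index = build_index(transcripts)
print(search(index, "projev"))   # -> ['rec_1934_001', 'rec_1969_115']
```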
international conference on telecommunications | 2013
Karel Palecek; Josef Chaloupka
It is a well-known fact that the visual part of speech can improve the resulting recognition rate, mainly in noisy conditions. The main goal of this work is to find a set of visual features that could be used in our audio-visual speech recognition systems. Discrete Cosine Transform (DCT) and Active Appearance Model (AAM) based visual features are extracted from visual speech signals, enhanced by a simplified variant of Hierarchical Linear Discriminant Analysis (HiLDA) and normalized across speakers. The visual features are then combined with standard MFCC audio features by the middle fusion method. The results of audio-visual speech recognition are compared with those of experiments in which the log-spectra minimum mean square error and multiband spectral subtraction methods are used to reduce additive noise in the audio signal.
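As an illustration of the middle (feature-level) fusion mentioned above, the following sketch simply concatenates per-frame audio (MFCC) and visual (DCT-based) feature vectors after interpolating the visual stream to the audio frame rate. The frame rates, dimensionalities and interpolation choice are assumptions for illustration, not the paper's actual implementation.

```python
# Sketch of feature-level ("middle") fusion of audio and visual speech features.
# Dimensions, frame rates and the interpolation choice are illustrative assumptions.
import numpy as np

def middle_fusion(mfcc, visual, audio_fps=100.0, video_fps=25.0):
    """Concatenate per-frame audio and visual features.

    mfcc   : (T_a, D_a) audio features at ~100 frames/s
    visual : (T_v, D_v) visual features (e.g. DCT of the mouth region) at ~25 frames/s
    """
    t_a = np.arange(mfcc.shape[0]) / audio_fps
    t_v = np.arange(visual.shape[0]) / video_fps
    # Linearly interpolate each visual dimension onto the audio time axis.
    visual_up = np.stack(
        [np.interp(t_a, t_v, visual[:, d]) for d in range(visual.shape[1])], axis=1
    )
    return np.hstack([mfcc, visual_up])

mfcc = np.random.randn(400, 13)     # 4 s of audio features
visual = np.random.randn(100, 20)   # 4 s of visual features
fused = middle_fusion(mfcc, visual)
print(fused.shape)                  # (400, 33)
```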
multimedia signal processing | 2008
Jindrich Zdansky; Josef Chaloupka; Jan Nouza
In this paper we present a complex platform for the automatic processing of Czech TV news programmes. Its audio processing module provides a text transcription in the form of metadata containing information about spoken content, speaker identities, pronunciation used, word positions and intonation. The video processing module provides pictures representing individual video scenes and information about detected and possibly recognized human faces. The audio and video data are merged into single XML files that are indexed and stored in a searchable database. A simple Web-based search engine can be used to retrieve information from the database, which currently contains more than 1800 hours of transcribed programmes from the Czech CT24 station.
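A minimal sketch of how such merged per-programme metadata might look as a single XML record, built with Python's standard library only; the element and attribute names are hypothetical, not the schema actually used by the platform.

```python
# Sketch: merging audio metadata (speaker-labelled transcript segments) and video
# metadata (scene keyframes, detected faces) into one XML record.
# Element and attribute names are hypothetical.
import xml.etree.ElementTree as ET

programme = ET.Element("programme", station="CT24", date="2008-05-12")

audio = ET.SubElement(programme, "audio")
seg = ET.SubElement(audio, "segment", start="0.00", end="4.20", speaker="anchor_1")
seg.text = "dobry vecer vazeni divaci"

video = ET.SubElement(programme, "video")
ET.SubElement(video, "scene", start="0.00", end="12.50",
              keyframe="scene001.jpg", face="anchor_1")

print(ET.tostring(programme, encoding="unicode"))
```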
text speech and dialogue | 2013
Josef Chaloupka; Jan Nouza; Petr Cerva; Jiří Málek
This paper deals with the task of adapting an existing Czech large-vocabulary continuous speech recognition (LVCSR) system to the language used in earlier historical epochs (before 1990). The goal is to fit its lexicon and language model (LM) so that the system can be employed for the automatic transcription of old spoken documents in the Czech Radio archive. The main problem is the lack of texts (in electronic form) from the 1945-1990 period. The only available and sufficiently large source is digitized copies of Rude Pravo, the newspaper of the former Communist party of Czechoslovakia, the actual ruling body in the state. The newspaper has been scanned and converted into text via OCR software. However, the amount of OCR errors is very high, so we have to apply several text pre-processing techniques to get a corpus suitable for 'downdating' the lexicon and language model (i.e. adapting them to the past). The proposed techniques helped us a) to reduce the number of out-of-vocabulary strings from 8.5 to 6.4 million, b) to identify 6.7 thousand history-conditioned word candidates to be added to the lexicon, and c) to build a more appropriate LM. The adapted LVCSR system was evaluated on broadcast news from 1969-1989, where its word error rate decreased from 17.05% to 14.33%.
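A rough sketch of the kind of pre-processing step that reduces OCR noise: token strings containing characters impossible in Czech words are discarded, while well-formed but unknown words that occur frequently are collected as candidates for the lexicon. The regular expression, threshold and toy lexicon are illustrative assumptions only, not the techniques actually used in the paper.

```python
# Sketch: filtering OCR noise and collecting lexicon candidates from old newspaper text.
# The character class, frequency threshold and lexicon are illustrative assumptions.
import re
from collections import Counter

CZECH_WORD = re.compile(r"^[a-záčďéěíňóřšťúůýž]+$", re.IGNORECASE)

def collect_candidates(tokens, lexicon, min_count=50):
    """Return well-formed out-of-vocabulary words frequent enough to add to the lexicon."""
    oov = Counter(t.lower() for t in tokens
                  if CZECH_WORD.match(t) and t.lower() not in lexicon)
    return [w for w, c in oov.most_common() if c >= min_count]

tokens = ["soudruh", "s0udruh", "JZD", "soudruh", "xj7q"]
print(collect_candidates(tokens, lexicon={"a", "v"}, min_count=2))  # ['soudruh']
```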
Multimodal Signals: Cognitive and Algorithmic Issues | 2009
Josef Chaloupka; Jan Nouza; Jindrich Zdansky; Petr Cerva; Jan Silovsky; Martin Kroul
This contribution describes a system called VoiCenter that allows motor-handicapped people to control a PC and standard electric and electronic devices (lights, heating, climate control, electric blinds, TV, radio, DVD, HI-FI, etc.) in their homes. The PC and the devices can be controlled with simple voice commands. A wireless connection between the PC and the devices is used, which allows fast installation of this complex system in homes. We use our own speech recognition and control computer system, which is also described in this paper.
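A toy sketch of the command-to-action mapping such a voice-control system needs once the recognizer has produced a command string; the command set, device names and the send() stub are invented for illustration and do not correspond to the VoiCenter implementation.

```python
# Toy sketch: dispatching recognized voice commands to home devices.
# Commands, device names and the send() stub are invented for illustration.
def send(device, action):
    print(f"wireless -> {device}: {action}")   # stand-in for the wireless link

COMMANDS = {
    "turn on the light":  lambda: send("light_living_room", "on"),
    "turn off the light": lambda: send("light_living_room", "off"),
    "tv volume up":       lambda: send("tv", "volume+1"),
}

def dispatch(recognized_text):
    action = COMMANDS.get(recognized_text.lower().strip())
    if action:
        action()
    else:
        print("unknown command:", recognized_text)

dispatch("Turn on the light")
```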
text speech and dialogue | 2002
Jan Nouza; Petr Kolár; Josef Chaloupka
In this paper we present our initial attempt to link speech processing technology, namely continuous speech recognition, text-to-speech synthesis and an artificial talking head, with text processing techniques in order to design a Czech demonstration system that allows for informal voice chatting with virtual characters. The legendary novel character Švejk is the first personality who can be interviewed in the recently implemented version.
international conference on speech and computer | 2017
Josef Chaloupka
In this paper, a system for digits-to-words conversion for almost all Slavic languages is proposed. The system was developed to improve the text corpora that we use for building a lexicon and for training language models and acoustic models in the task of Large Vocabulary Continuous Speech Recognition (LVCSR). Strings of digits, some other special characters (%, €, ...) and abbreviations of physical units (km, m, cm, kg, l, °C, etc.) occur very often in our text corpora, in about 5% of cases. Strings of digits or special characters are usually omitted when a lexicon is being built or a language model is being trained. The task of digits-to-words conversion in non-inflected languages (e.g. English) is solved by a relatively simple conversion or lookup table. The problem is more complex in inflected Slavic languages: a string of digits can be converted into several different word combinations, depending on the context, and the resulting words are inflected by gender or case. The main goal of this research was to find the rules (patterns) for converting strings of digits into words for Slavic languages. The second goal was to unify these patterns across Slavic languages and to integrate them into a universal system for digits-to-words conversion.
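A tiny sketch of why the conversion is context dependent in an inflected language such as Czech: the same digit maps to different word forms depending on the grammatical gender of the counted noun. The table covers only a few values and is purely illustrative, not the paper's rule set.

```python
# Sketch: Czech digits-to-words conversion depends on the gender of the counted noun.
# Only a handful of values are covered; the table is illustrative, not the full rule set.
NUMERALS = {
    1: {"m": "jeden", "f": "jedna", "n": "jedno"},
    2: {"m": "dva",   "f": "dvě",   "n": "dvě"},
    3: {"m": "tři",   "f": "tři",   "n": "tři"},
}

def digits_to_words(number, gender):
    """Convert a small integer to its Czech word form for the given gender."""
    return NUMERALS[number][gender]

print(digits_to_words(2, "m"), "roky")    # dva roky   (masculine noun)
print(digits_to_words(2, "f"), "koruny")  # dvě koruny (feminine noun)
```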
2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM) | 2015
Thien Chuong Nguyen; Josef Chaloupka; Jan Nouza
text speech and dialogue | 2013
Thien Chuong Nguyen; Josef Chaloupka
Vietnamese is a syllable-based tonal language in which the tone used in syllable pronunciation carries important information about the meaning. In this paper, we investigate several approaches to incorporating the tone into an acoustic model. We propose 3 basic strategies: a) a phoneme-based, b) a vowel-based, and c) a rhyme-based one. Each can be modified so that we obtain 15 different schemes, which are described and compared in experiments performed within the framework of large-vocabulary continuous speech recognition of Vietnamese. We show that the phoneme-based context-dependent model performs best, particularly when information about the tone is linked to the syllable end. On the test set, made up of 85 minutes of mostly broadcast speech, we achieve a 74% syllable accuracy rate. The accuracy is further improved to 78% when the pronunciation lexicon and the language model also take into account the 40,000 most frequent syllable pairs.
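The three basic strategies can be illustrated by how one tonal syllable is decomposed into acoustic units. The decomposition below is a simplified, hand-written example for a single syllable; the unit names and tone mark are not the paper's actual phoneme inventory.

```python
# Sketch: three ways of attaching tone information to the units of a Vietnamese syllable.
# The decomposition and unit names are simplified illustrations.
def decompose(onset, rhyme_phones, tone):
    phonemes = [onset] + rhyme_phones
    return {
        # a) phoneme-based: every phoneme of the syllable carries the tone mark
        "phoneme-based": [p + tone for p in phonemes],
        # b) vowel-based: only the vowel(s) carry the tone mark
        "vowel-based": [onset] + [p + tone if p in "aeiouy" else p for p in rhyme_phones],
        # c) rhyme-based: the whole rhyme forms one tone-marked unit
        "rhyme-based": [onset, "".join(rhyme_phones) + tone],
    }

# Example: onset "m", rhyme /a n g/, tone mark "_5"
for name, units in decompose("m", ["a", "n", "g"], "_5").items():
    print(name, units)
```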
international conference on image analysis and processing | 2013
Josef Chaloupka; Jan Nouza; Michaela Kucharova
This paper describes our study of two basic problems of large-vocabulary continuous speech recognition (LVCSR) of Vietnamese, which can serve as a standard reference for Vietnamese researchers and other researchers interested in the Vietnamese language. First, a standard phoneme set is proposed together with its corresponding grapheme-to-phoneme map. This phoneme set is the core for solving other problems related to LVCSR of Vietnamese. Then the creation of a standard pronouncing dictionary based on the grapheme-to-phoneme map and an analysis of the Vietnamese syllable are described. Finally, we present results on LVCSR using different types of pronouncing dictionary, which show some interesting aspects of the Vietnamese language, such as the structure of the Vietnamese syllable and the effect of tone in relation to the syllable.
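A minimal sketch of the kind of grapheme-to-phoneme lookup on which such a pronouncing dictionary can be built: the longest matching grapheme is consumed first and mapped to a phoneme symbol. The mapping table contains only a few entries and uses invented phoneme symbols, so it is an assumption for illustration rather than the proposed phoneme set.

```python
# Sketch: greedy longest-match grapheme-to-phoneme conversion for Vietnamese syllables.
# The mapping table is a tiny illustrative subset with invented phoneme symbols.
G2P = {"ng": "N", "nh": "J", "th": "T", "a": "a", "e": "e", "i": "i", "m": "m", "t": "t"}

def to_phonemes(syllable):
    phones, i = [], 0
    while i < len(syllable):
        # try the longest grapheme first (here at most 2 characters)
        for length in (2, 1):
            chunk = syllable[i:i + length]
            if chunk in G2P:
                phones.append(G2P[chunk])
                i += length
                break
        else:
            raise ValueError(f"unknown grapheme at {syllable[i:]}")
    return phones

print(to_phonemes("thang"))  # ['T', 'a', 'N']
print(to_phonemes("minh"))   # ['m', 'i', 'J']
```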