Juraj Kacur

Slovak University of Technology in Bratislava

Publication

Featured research published by Juraj Kacur.

Archive | 2008

Practical Issues of Building Robust HMM Models Using HTK and SPHINX Systems

Juraj Kacur; Gregor Rozinaj

For a couple of decades, great effort has been spent on building and employing ASR systems in areas such as information retrieval and dialog systems, but only as the technology has evolved further have other applications, such as dictation systems or even automatic transcription of natural speech (Nouza et al., 2005), begun to emerge. These advanced systems should be capable of operating in real time, must be speaker independent, must reach high accuracy, and must support dictionaries containing several hundred thousand words. These strict requirements can currently be met by HMM models of tied context dependent (CD) phonemes with multiple Gaussian mixtures, a technique known since the 1960s (Baum & Eagon, 1967). Although this statistical concept is mathematically tractable, it unfortunately does not completely reflect the underlying physical process. Therefore, soon after its creation, there were many attempts to alleviate this. Nowadays the classical concept of HMM has evolved into areas such as hybrid solutions with neural networks and training strategies other than ML or MAP that minimize recognition errors by means of corrective training, by maximizing mutual information (Huang et al., 1990), or by constructing large margin HMMs (Jiang & Li, 2007). Furthermore, a few methods have been designed and tested that aim to relax the first order Markovian restriction, e.g. by explicitly modelling the time duration (Levinson, 1986), splitting states into more complex structures (Bonafonte et al., 1996), or using double (Casar & Fonollosa, 2007) or multilayer HMM structures. Another vital issue is robust and accurate feature extraction. Again, this matter is not fully solved, and various popular features and techniques exist, such as MFCC and CLPC coefficients, PLP features, TIFFING (Nadeu & Macho, 2001), and the RASTA filter (Hermansky & Morgan, 1994).
Despite the huge variety of advanced solutions, many of them are either not general enough or are rather impractical for real-life deployment. Thus most currently employed systems are based on continuous context independent (CI) or tied CD HMM models of phonemes with multiple Gaussian mixtures trained by ML or MAP criteria. As there is no analytical solution to this task, the training process must be iterative (Huang et al., 1990). Unfortunately, there is no guarantee of reaching the global maximum, so much effort is devoted to the training phase, which involves many stages. Consequently, there are complex systems that allow convenient and flexible training of HMM models, the most famous being HTK and SPHINX.

IEEE Computer | 2014

Smart AppStore: Expanding the Frontiers of Smartphone Ecosystems

Félix Gómez Mármol; Gregor Rozinaj; Sebastian Schumann; Ondrej Labaj; Juraj Kacur

Smart AppStore offers five important features for today's smartphone users: biometric authentication, multilevel authorization, gesture recognition and navigation, user-tailored reputation scores, and identity management. The Web extra at http://youtu.be/15KPOUp5H_A is a video demonstration of Smart AppStore.

ELMAR 2007 | 2007

The training of Slovak speech recognition system based on Sphinx 4 for GSM networks

Juraj Vojtko; Juraj Kacur; Gregor Rozinaj

In this paper we present the training process of HMM models designed for ASR systems employed in GSM networks. First, a brief overview of the current problems and applications of ASR systems is given, followed by a description of the MOBILDAT-SK speech database and the capabilities of SPHINX 4 and SphinxTrain. Then the training of HMM models is presented, using the SphinxTrain system adjusted for the structure of the MOBILDAT database and the Slovak language. The article concludes by presenting the results achieved with the SPHINX 4 tools in three types of tests: application words, isolated digits, and looped digits. The WER for looped digits and CD phoneme models is 1.8%, which is roughly comparable to the performance of other systems.
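The word error rate (WER) quoted in the abstract above is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper: tokenizes on whitespace and uses a standard
    Levenshtein (edit distance) recursion over words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

For example, a four-word digit string recognized with one word missing gives `word_error_rate("one two three four", "one two four")`, i.e. one deletion over four reference words.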

Telecommunication Systems | 2013

Building accurate and robust HMM models for practical ASR systems

Juraj Kacur; Gregor Rozinaj

In this article, the relevant training aspects for building robust and accurate HMM models for large vocabulary recognition systems are discussed and adjusted, namely speech features, training steps, and the tying options for context dependent (CD) phonemes. The well known MASPER training scheme is taken as the basis for building the HMM models. First, the incorporation of voicing information and its effect on classical extraction methods such as MFCC and PLP is shown together with the derivative features, where the relative error reductions are up to 50%. Next, the suggested enhancement of the standard training procedure by introducing garbled speech models is presented and tested on real data; as is shown, it brings more than a 5% drop in the error rate. Finally, the options for tying states of CD phonemes using decision trees and phoneme classification are adjusted, tested, and explained.

EURASIP Conference Focused on Video Image Processing and Multimedia Communications | 2003

Speech detection in the noisy environment using wavelet transform

Juraj Kacur; Juraj Frank; Gregor Rozinaj

In this article we present speech detection systems based on the Daubechies, Coiflet and Symlet wavelet transforms, respectively. For each, the most suitable levels of signal decomposition for detecting corrupted speech were selected. Using those levels, the distinction between noise and corrupted signal can be amplified by as much as 100 times. Tests were performed on a set of Slovak words artificially corrupted to several SNR levels by white WSS noise.

International Journal of Signal and Imaging Systems Engineering | 2014

Topological invariants as speech features for automatic speech recognition

Juraj Kacur; Vladimir Chudy

The article presents topological invariants as speech features for speech recognition systems based on hidden Markov models. A short introduction is provided to the mathematical concept of topological invariants and space symmetries for the speech recognition problem. This involves a basic overview of the relevant auditory characteristic and its modelling in order to identify possible symmetries and invariants. Once the concept is derived, several of its modifications vital for HMM systems, such as reduction of dimensions, within-class feature decorrelation and a signal plane rotation, are presented and evaluated on a real system. The final system is evaluated and compared to other features using both context-dependent and context-independent models. Tests were performed on a professional speech database, where the achieved accuracies reached up to 97.7%, 98.7% and 98.9% for the string of digits, application words and isolated digits tests, respectively.

International Symposium on Telecommunications | 2012

ZCPA features for speech recognition

Juraj Kacur; Mario Varga; Gregor Rozinaj

In this article we present the implementation, modification and optimization of the zero-crossing peak amplitude (ZCPA) speech feature extraction method in a Slovak speech recognition system. ZCPA features closely mimic the human auditory system in the time domain, and thus they should be more robust against common noises. In addition to the basic configuration, several modifications have been suggested, implemented and evaluated. Furthermore, optimal settings were found on a real system using a professional database and the MASPER training procedure, and were compared to classical features represented by MFCC and PLP in different scenarios and noise conditions.
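The core idea of ZCPA features, as generally described in the literature, is to estimate frequency from the interval between successive upward zero crossings and to weight a frequency histogram by the logarithm of the peak amplitude inside that interval. The sketch below illustrates this for a single band only. It omits the bank of band-pass filters normally applied first, and the function name `zcpa_histogram`, the `bin_edges` parameter and the `log(1 + peak)` weighting are assumptions for illustration rather than the configuration evaluated in the paper.

```python
import math

def zcpa_histogram(signal, sample_rate, bin_edges):
    """Single-band ZCPA sketch (hypothetical helper, no filter bank).

    signal      -- list of float samples of one (ideally band-passed) channel
    sample_rate -- sampling frequency in Hz
    bin_edges   -- increasing frequency edges in Hz; bin b covers
                   [bin_edges[b], bin_edges[b + 1])
    """
    hist = [0.0] * (len(bin_edges) - 1)

    # Sample indices at which the signal crosses zero going upward.
    crossings = [i for i in range(1, len(signal))
                 if signal[i - 1] < 0.0 <= signal[i]]

    for start, end in zip(crossings, crossings[1:]):
        # The inverse of the interval length estimates the dominant frequency.
        freq = sample_rate / (end - start)
        # Peak amplitude between the two crossings (the positive half-wave).
        peak = max(signal[start:end])
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                # Assumed compressive weighting of the peak amplitude.
                hist[b] += math.log(1.0 + peak)
                break
    return hist
```

In a full system, one such histogram would be computed per filter-bank channel and frame, and the channel histograms summed to form the feature vector.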

International Conference on Systems, Signals and Image Processing | 2009

Adding Voicing Features into Speech Recognition Based on HMM in Slovak

Juraj Kacur; Gregor Rozinaj

This article discusses the impact of substituting some of the basic speech features with voiced/unvoiced information and possibly with the estimated pitch value. The average magnitude difference function (AMDF) was taken as a good measure of the signal's voicing, specifically the ratio of its average value to its local minima found within the accepted range of the pitch. Furthermore, the pitch itself was used as an auxiliary feature to the base MFCC and PLP features. Experiments were performed on the professional database SPEECHDAT-SK for mobile applications working in harsh conditions, using various HMM models of context dependent and independent phonemes. All models were trained following the MASPER training scheme. In all cases the voicing feature improved the results by more than 9% compared to the base systems. However, the role of the pitch itself in a speaker independent ASR system evaluated over different tasks was not always so beneficial.
Keywords: speech recognition; speech features; HMM; AMDF; MFCC; PLP; MASPER
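The voicing measure described in the abstract above can be sketched as follows: compute the AMDF over the lags that correspond to plausible pitch values, then take the ratio of its average to its minimum. This is a rough illustration only. The function names, the 60 to 400 Hz pitch range, averaging over the searched lags only, and the use of the global minimum in that range are all assumptions, not details confirmed by the paper.

```python
def amdf(frame, lag):
    """Average magnitude difference of a frame with itself shifted by `lag`."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def voicing_ratio(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return (voicing measure, pitch estimate in Hz) for one frame.

    Hypothetical helper: f_min and f_max bound the assumed pitch range.
    A strongly periodic (voiced) frame has a deep AMDF minimum at its
    pitch period, so the ratio average/minimum becomes large.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    if lag_min < 1 or lag_max >= len(frame):
        raise ValueError("frame too short for the requested pitch range")

    values = [amdf(frame, lag) for lag in range(lag_min, lag_max + 1)]
    average = sum(values) / len(values)
    minimum = min(values)
    best_lag = lag_min + values.index(minimum)

    eps = 1e-12  # guards against division by zero for perfectly periodic input
    return average / (minimum + eps), sample_rate / best_lag
```

For noise-like (unvoiced) frames the AMDF is nearly flat, so the ratio stays close to 1, whereas voiced frames give a much larger value.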

International Conference on Control, Decision and Information Technologies | 2017

Enhanced hierarchical mask creation for image coding using saliency maps

Radoslav Vargic; Juraj Kacur; Filip Csoka

In this paper we analyse basic hierarchical mask creation methods for image coding using saliency maps. For saliency map (SM) based image coding we use a specific extension of the SPIHT algorithm called SM SPIHT, which extends region of interest encoding to encoding with an individual weight of importance for each pixel in the image, given in the form of a saliency map. This approach has proved to be effective. In this article, we compare basic hierarchical mask creation methods and provide one new method that outperforms all previous methods.

International Symposium ELMAR | 2016

Modifications of KNN classifier for speaker identification system

Juraj Kacur

In this article, modifications and adjustments of the weighted K-nearest neighbor (KNN) classification method are discussed. The main focus is on KNN performance in the text-independent speaker identification task, operating in real time and minimizing the enrolment time for a new user. The main concern was the design of weighting schemes for feature distance and the application of different data dependent supervised and unsupervised feature transformations, applied either locally or globally. All tests were performed on a speaker database containing 2 environments, each with training and testing parts. The results were compared to a standard Gaussian Mixture Model (GMM) method. It was shown that the best results do not change significantly with the weighting scheme used, provided it is properly tuned for a given environment. However, the Gaussian window is better in terms of fine tuning and average accuracy. Further, KNN outperformed GMM when no universal background model (UBM) was used, and comparable results were achieved when MAP adaptation from the UBM was applied. For 21 speakers the best averaged accuracy was over 94%, measured on 3 s intervals.
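As an illustration of distance-weighted KNN with a Gaussian window, the sketch below lets each of the k nearest training vectors vote with a weight that decays with its distance from the query. It is a generic example under stated assumptions: the names `gaussian_weight` and `weighted_knn`, the Euclidean distance and the `sigma` width parameter are hypothetical, and the feature transformations discussed in the paper are not reproduced.

```python
import math
from collections import defaultdict

def gaussian_weight(distance, sigma):
    """Gaussian window: the weight decays smoothly with distance."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))

def weighted_knn(train, query, k=3, sigma=1.0):
    """Classify `query` by a distance-weighted vote of its k nearest neighbors.

    train -- list of (feature_vector, label) pairs (hypothetical layout)
    query -- feature vector with the same dimension as the training vectors
    """
    # Euclidean distance from the query to every training vector.
    scored = [(math.dist(vec, query), label) for vec, label in train]
    nearest = sorted(scored, key=lambda item: item[0])[:k]

    # Accumulate Gaussian-weighted votes per label.
    votes = defaultdict(float)
    for distance, label in nearest:
        votes[label] += gaussian_weight(distance, sigma)
    return max(votes, key=votes.get)
```

With a narrow window, a single very close neighbor can outweigh several distant ones, which an unweighted majority vote would not allow; the window width therefore has to be tuned to the data.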
