Martin Lojka
Technical University of Košice
Publications
Featured research published by Martin Lojka.
international conference on multimedia communications | 2013
Eva Kiktova; Martin Lojka; Matus Pleva; Jozef Juhár; Anton Cizmar
With the increasing use of audio sensors in surveillance and monitoring applications, the detection of acoustic events in real conditions has emerged as a very important research problem. This paper focuses on the comparison of different feature extraction algorithms used for the parametric representation of foreground and background sounds in a noisy environment. Our aim was to automatically detect gunshots and sounds of breaking glass under different SNR conditions. The well-known Mel-frequency cepstral coefficients (MFCC) and other effective spectral features such as logarithmic Mel-filter bank coefficients (FBANK) and Mel-filter bank coefficients (MELSPEC) were extracted from the input sound. A hidden Markov model (HMM) based learning technique performs the classification of the mentioned sound categories.
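The three parameterizations named above are stages of one pipeline: MELSPEC is the Mel-filterbank output, FBANK its logarithm, and MFCC the DCT of FBANK. A minimal NumPy sketch of that relationship (illustrative only; the `mel_filterbank` helper and the single-frame processing are simplifications, not the paper's implementation) could look like:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale (simplified)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    return fb

def features(frame, fb, n_ceps=13):
    """Return MELSPEC, FBANK and MFCC features for one windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
    melspec = fb @ spectrum                      # MELSPEC: filterbank outputs
    fbank = np.log(melspec + 1e-10)              # FBANK: log filterbank energies
    # MFCC: DCT-II decorrelates the log filterbank energies
    n = fbank.size
    basis = np.cos(np.pi / n * (np.arange(n) + 0.5)[None, :]
                   * np.arange(n_ceps)[:, None])
    mfcc = basis @ fbank
    return melspec, fbank, mfcc
```

The HMM classifier is then trained separately on whichever of the three feature streams is selected.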
international conference on multimedia communications | 2012
Eva Vozarikova; Martin Lojka; Jozef Juhár; Anton Cizmar
This paper focuses on the detection of abnormal situations via sound information. As the main feature extraction algorithm, basic spectral low-level descriptors defined in the MPEG-7 standard were used. Various settings for spectral descriptors such as Audio Spectrum Envelope, Audio Spectrum Flatness, Audio Spectrum Centroid and Audio Spectrum Spread were tried, and many experiments were carried out to find the limits of their use for acoustic event detection in an urban environment. To improve the recognition rate, we also applied the Minimum Redundancy Maximum Relevance feature selection algorithm. The proposed framework for recognizing potentially dangerous acoustic events such as breaking glass and gunshots, based on the extraction of basic spectral descriptors and well-known Hidden Markov Model based classification, is presented here.
international conference on telecommunications | 2015
Jozef Vavrek; Peter Viszlay; Eva Kiktova; Martin Lojka; Jozef Juhár; Anton Cizmar
In this paper we introduce a novel approach to Query-by-Example (QbE) retrieval that utilizes the fundamental principles of posteriorgram-based Spoken Term Detection (STD). The proposed approach is a modification of the widely used segmental variant of the dynamic programming algorithm. Our solution represents a sequential variant of the DTW algorithm, employing a one-step-forward moving strategy. Each DTW search is carried out sequentially, block by block, where each block represents a square input distance matrix with size equal to the length of the retrieved query. We also examine how to speed up the sequential DTW algorithm without considerable loss in retrieval performance by implementing a linear time-aligned accumulated distance. An increase in detection accuracy is ensured by a weighted cumulative distance score parameter. We therefore call this approach the Weighted Fast Sequential DTW (WFS-DTW) algorithm. A novel PCA-based silence discriminator is used along with this algorithm. Evaluation of the proposed algorithm is carried out on the ParDat1 corpus using the Term Weighted Value (TWV) metric.
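The block-by-block search described above can be sketched with an ordinary DTW accumulated-distance recursion, sliding the search block one frame forward at a time. This is a minimal sketch only, assuming a Euclidean frame distance and a plain three-direction recursion rather than the paper's weighted, linearly time-aligned variant:

```python
import numpy as np

def dtw_block(query, block):
    """Accumulated DTW distance between a query (Q x d) and one search
    block (B x d) of the utterance, normalized by path length bound."""
    Q, B = len(query), len(block)
    # pairwise Euclidean distances between query frames and block frames
    dist = np.linalg.norm(query[:, None, :] - block[None, :, :], axis=2)
    acc = np.full((Q, B), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(Q):
        for j in range(B):
            if i == 0 and j == 0:
                continue
            prev = min(acc[i - 1, j] if i else np.inf,
                       acc[i, j - 1] if j else np.inf,
                       acc[i - 1, j - 1] if i and j else np.inf)
            acc[i, j] = dist[i, j] + prev
    return acc[-1, -1] / (Q + B)

def sequential_search(query, utterance):
    """One-step-forward strategy: slide a query-sized block over the
    utterance frame by frame and keep the best-scoring start position."""
    Q = len(query)
    scores = [dtw_block(query, utterance[s:s + Q])
              for s in range(len(utterance) - Q + 1)]
    return int(np.argmin(scores)), min(scores)
```

In the actual WFS-DTW algorithm the per-cell cost would additionally carry the weighted cumulative distance score, and posteriorgram distances would replace the Euclidean metric used here.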
language and technology conference | 2011
Milan Rusko; Jozef Juhár; Marián Trnka; Ján Staš; Sakhia Darjaa; Daniel Hládek; Róbert Sabo; Matus Pleva; Marian Ritomský; Martin Lojka
This paper describes the design, development and evaluation of a Slovak dictation system for the judicial domain. Speech is recorded using a close-talk microphone, and the dictation system is used for on-line or off-line automatic transcription. The system provides an automatic dictation tool in Slovak for the employees of the Ministry of Justice of the Slovak Republic and all the courts in Slovakia. It is designed for on-line dictation and off-line transcription of legal texts recorded in the acoustic conditions of a typical office. Details of the technical solution are given and an evaluation of different versions of the system is presented.
international conference radioelektronika | 2016
Tomáš Koctúr; Peter Viszlay; Ján Staš; Martin Lojka; Jozef Juhár
An acoustic model is a necessary component of an automatic speech recognition system. Acoustic models are trained on large numbers of speech recordings with transcriptions; usually, hundreds of transcribed recordings are required, and creating manual transcriptions is a very time- and resource-consuming process. Acoustic models may instead be obtained automatically with unsupervised acoustic model training, which uses online speech resources. The obtained speech data are recognized with a low-resource automatic speech recognition system, and unsupervised techniques filter out the erroneous hypotheses from the result, using the rest for acoustic model training. Unsupervised methods for generating speech corpora for acoustic model training are presented in this paper.
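The filtering step in such unsupervised pipelines typically keeps only segments whose recognizer confidence exceeds a threshold. A minimal sketch of that selection, with hypothetical field names and a hypothetical threshold (the paper's actual filtering criteria are not specified here), could be:

```python
def select_for_training(hypotheses, threshold=0.9):
    """Keep only automatically transcribed segments whose recognizer
    confidence is high enough to serve as acoustic model training data.
    `hypotheses` is a list of (audio_id, transcript, confidence) tuples."""
    kept, dropped = [], []
    for audio_id, transcript, confidence in hypotheses:
        target = kept if confidence >= threshold else dropped
        target.append((audio_id, transcript))
    return kept, dropped
```

The surviving (audio, transcript) pairs are then fed to the standard supervised training recipe as if they were manually transcribed.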
3rd International Workshop on Biometrics and Forensics (IWBF 2015) | 2015
Eva Kiktova; Martin Lojka; Matus Pleva; Jozef Juhár; Anton Cizmar
This paper describes an extension of an intelligent acoustic event detection system that is able to recognize sounds of dangerous events, such as breaking glass or gunshots, in an urban environment from commonly used noise monitoring stations. We propose to extend the system so that it not only detects gunshots but also identifies the suspect's gun/pistol type. Such an extension could help the investigation process and suspect identification. The proposed extension provides a new functionality of gun type recognition (classification) based on captured audio recordings, a topic that is only marginally discussed in other research papers. Different kinds of features were extracted for this challenging task, and the feature vectors were reduced using mutual information based feature selection algorithms. The proposed system uses a two-phase selection process, HMM (Hidden Markov Model) classification and a Viterbi-based decoding algorithm. The presented approach reached promising results in the experiments (higher than 80% ACC and TPR).
international conference on multimedia communications | 2014
Martin Lojka; Matus Pleva; Eva Kiktova; Jozef Juhár; Anton Čižmár
This paper introduces an acoustic event detection system capable of processing a continuous input audio stream in order to detect potentially dangerous acoustic events. The system represents a lightweight, easily extendable, long-term running and complete solution to acoustic event detection. It is based on its own approach to the detection and classification of acoustic events, using a modified Viterbi decoding process in combination with Weighted Finite-State Transducers (WFSTs) to support extensibility, and acoustic modeling based on Hidden Markov Models (HMMs). The system is programmed entirely in C++ and was designed to be self-sufficient, without requiring any additional dependencies. A signal preprocessing part for feature extraction of Mel-Frequency Cepstral Coefficients (MFCC), Frequency Bank Coefficients (FBANK) and Mel-Spectral Coefficients (MELSPEC) is also included. To increase robustness, the system contains Cepstral Mean Normalization (CMN) and our proposed removal of basic coefficients from the feature vector.
Multimedia Tools and Applications | 2016
Martin Lojka; Matus Pleva; Eva Kiktova; Jozef Juhár; Anton Čižmár
An efficient acoustic event detection system, EAR-TUKE, is presented in this paper. The system is capable of processing a continuous input audio stream in order to detect potentially dangerous acoustic events, specifically gunshots or breaking glass. The system is programmed entirely in C++ (core math functions in C) and was designed to be self-sufficient, without requiring additional dependencies. In the design and development process, the main focus was put on easy support of new acoustic event detection, a low memory profile, low computational requirements to operate on devices with limited resources, and long-term operation with continuous input stream monitoring without any maintenance. To satisfy these requirements, EAR-TUKE is based on a custom approach to the detection and classification of acoustic events. The system uses acoustic models of events based on Hidden Markov Models (HMMs) and a modified Viterbi decoding process with an additional module to allow continuous monitoring. These features, in combination with Weighted Finite-State Transducers (WFSTs) for the search network representation, fulfill the easy extensibility requirement. Extraction algorithms for Mel-Frequency Cepstral Coefficients (MFCC), Frequency Bank Coefficients (FBANK) and Mel-Spectral Coefficients (MELSPEC) are also included in the preprocessing part. The system contains Cepstral Mean Normalization (CMN) and our proposed removal of basic coefficients from feature vectors to increase robustness. This paper also presents the development process and results evaluating the final design of the system.
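Cepstral Mean Normalization, mentioned as one of the robustness measures, is itself a simple per-utterance operation: subtract the mean of each cepstral coefficient over time so that stationary channel effects cancel out. A minimal sketch (standard CMN, not EAR-TUKE's C++ implementation):

```python
import numpy as np

def cmn(features):
    """Cepstral Mean Normalization: subtract the per-coefficient mean
    computed over all frames of one utterance.
    `features` has shape (frames, coefficients)."""
    return features - features.mean(axis=0, keepdims=True)
```

After CMN, any constant additive bias in the cepstral domain (i.e., a fixed convolutional channel in the signal domain) is removed from every frame.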
international symposium elmar | 2014
Eva Kiktova; Martin Lojka; Jozef Juhár; Anton Cizmar
This paper presents a comparison of mutual information based feature selection algorithms for the acoustic event detection system EAR-TUKE. High-dimensional feature vectors were reduced according to different selection criteria. The proposed features were used to train Hidden Markov Models (HMMs), which were evaluated with a Viterbi-based decoding algorithm. The comparison of the applied selection criteria, their corresponding performances and the identification of suitable features are demonstrated via representative experimental results.
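Mutual information based criteria of the kind compared here (e.g., mRMR, used in the group's earlier work) greedily pick the feature most relevant to the class labels while least redundant with already chosen features. A small sketch over discretized features, as an illustration only and not the paper's specific criteria:

```python
import numpy as np

def mutual_info(x, y):
    """Mutual information between two discrete sequences, in nats."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

def mrmr(features, labels, k):
    """Greedy mRMR: maximize relevance MI(feature, labels) minus mean
    redundancy MI(feature, already selected). `features` is a discrete
    (samples x dims) matrix; returns k selected column indices."""
    dims = features.shape[1]
    relevance = [mutual_info(features[:, j], labels) for j in range(dims)]
    selected = [int(np.argmax(relevance))]          # most relevant first
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(dims):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(features[:, j], features[:, s])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Continuous acoustic features would first need to be discretized (or the MI estimated with a density-based estimator) before such a criterion can be applied.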
network-based information systems | 2018
Martin Lojka; Peter Viszlay; Ján Staš; Daniel Hládek; Jozef Juhár
We have developed a working prototype of an automatic subtitling system for transcription, archiving and indexing of Slovak audiovisual recordings, such as lectures, talks, discussions or broadcast news. To go further in development and research, we had to incorporate more modern speech technologies and embrace today's deep learning techniques. This paper describes the transition and changes made to our working prototype regarding the speech recognition core replacement, architecture changes and a new web-based user interface. We used the state-of-the-art speech toolkit Kaldi and a distributed architecture to achieve better responsiveness of the interface and faster processing of the audiovisual recordings. Using acoustic models based on time delay deep neural networks, we were able to lower the system's average word error rate from the previously reported 24% to 15% absolute.