Jan Vanek | Researchain

Publication

Featured research published by Jan Vanek.

IEEE Transactions on Audio, Speech, and Language Processing | 2012

Optimized Acoustic Likelihoods Computation for NVIDIA and ATI/AMD Graphics Processors

Jan Vanek; Jan Trmal; Josef Psutka

In this paper, we describe an optimized version of a Gaussian-mixture-based acoustic model likelihood evaluation algorithm for graphics processing units (GPUs). The evaluation of these likelihoods is one of the most computationally intensive parts of automatic speech recognizers, but it can be parallelized and offloaded to GPU devices. Our approach offers a significant speed-up over recently published approaches because it utilizes the GPU architecture more effectively. All recent implementations have been intended only for NVIDIA graphics processors, programmed in either the CUDA or the OpenCL GPU programming framework. We present results for both CUDA and OpenCL. Further, we have developed an OpenCL implementation optimized for ATI/AMD GPUs. Results suggest that even very large acoustic models can be used in real-time speech recognition engines on computers equipped with a low-end GPU or on laptops. In addition, the completely asynchronous GPU management frees additional CPU resources for the decoder part of the large-vocabulary continuous speech recognition (LVCSR) system. The optimized implementation enables us to apply fusion techniques together with evaluating many (10 or even more) speaker-specific acoustic models. We apply this technique to a real-time parliamentary speech recognition system where the speaker changes frequently.
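
For readers unfamiliar with the computation being accelerated, the following is a minimal CPU sketch of a diagonal-covariance GMM log-likelihood for a single feature vector. It is not the paper's GPU implementation; the function name `diag_gmm_loglik` and its argument layout are assumptions made for illustration only.

```python
import math

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM.

    Illustrative sketch only: names and argument layout are hypothetical.
    """
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalisation constant for this component.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Variance-scaled squared distance between x and the component mean.
        dist = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        comp_logs.append(math.log(w) + log_norm - 0.5 * dist)
    # Log-sum-exp keeps the sum over components numerically stable.
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))
```

In a recognizer this function would be called for every frame and every acoustic-model state, which is why the evaluation dominates the running time and parallelizes well across frames and components.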

Parallel and Distributed Computing: Applications and Technologies | 2011

Fast Estimation of Gaussian Mixture Model Parameters on GPU Using CUDA

Lukáš Machlica; Jan Vanek; Zbynek Zajic

Gaussian Mixture Models (GMMs) are widely used among scientists, e.g. in statistics toolkits and data mining procedures. To estimate the parameters of a GMM, Maximum Likelihood (ML) training is often used, more precisely the Expectation-Maximization (EM) algorithm. Nowadays, many tasks work with huge datasets, which makes the estimation process time-consuming (mainly for complex mixture models containing hundreds of components). The paper presents an efficient and robust implementation of the estimation of GMM statistics used in the EM algorithm on GPU using NVIDIA's Compute Unified Device Architecture (CUDA). An augmentation of the standard CPU version utilizing SSE instructions is also proposed. The running times of the presented methods are tested on a large dataset of real speech data from the NIST Speaker Recognition Evaluation (SRE) 2008. Estimation on GPU proves to be more than 400 times faster than the standard CPU version and 130 times faster than the SSE version; thus a huge speed-up was achieved without any approximations made in the estimation formulas. The proposed implementation was also compared to implementations developed by other groups around the world and proved to be the fastest (at least 5 times faster than the best recently published implementation).
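
As a rough illustration of the "GMM statistics used in the EM algorithm", the sketch below accumulates the zeroth-, first-, and second-order statistics for a diagonal-covariance GMM on the CPU and then re-estimates the parameters. It is a textbook formulation, not the paper's CUDA code; `accumulate_stats` and `update_params` are hypothetical names, and no variance flooring is applied.

```python
import math

def accumulate_stats(data, weights, means, variances):
    """E-step statistics for a diagonal-covariance GMM (one EM iteration)."""
    n_comp, dim = len(weights), len(means[0])
    zero = [0.0] * n_comp                          # soft counts per component
    first = [[0.0] * dim for _ in range(n_comp)]   # posterior-weighted sums of x
    second = [[0.0] * dim for _ in range(n_comp)]  # posterior-weighted sums of x^2
    for x in data:
        logs = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, mi, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v)
            logs.append(ll)
        # Normalise in the log domain to obtain component posteriors.
        top = max(logs)
        post = [math.exp(lg - top) for lg in logs]
        total = sum(post)
        for k in range(n_comp):
            g = post[k] / total   # posterior of component k given x
            zero[k] += g
            for d in range(dim):
                first[k][d] += g * x[d]
                second[k][d] += g * x[d] * x[d]
    return zero, first, second

def update_params(zero, first, second):
    """M-step: re-estimate weights, means and variances from the statistics."""
    n = sum(zero)
    weights = [z / n for z in zero]
    means = [[f / z for f in fk] for fk, z in zip(first, zero)]
    variances = [[s / z - m * m for s, m in zip(sk, mk)]
                 for sk, mk, z in zip(second, means, zero)]
    return weights, means, variances
```

The accumulation loop over data points is the part that grows with dataset size, and each data point contributes independently to the sums, which is what makes this step a natural candidate for GPU parallelization.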

International Symposium on Signal Processing and Information Technology | 2012

Full covariance Gaussian mixture models evaluation on GPU

Jan Vanek; Jan Trmal; Josef Psutka

Gaussian mixture models (GMMs) are often used in various data processing and classification tasks to model a continuous probability density in a multi-dimensional space. In cases where the dimension of the feature space is relatively high (e.g. in automatic speech recognition (ASR)), a GMM with a higher number of Gaussians with diagonal covariances (DC) is used instead of one with full covariances (FC), for two reasons. The first reason is the difficulty of estimating robust FC matrices from a limited training data set. The second reason is the much higher computational cost of GMM evaluation. The first reason has been addressed in many recent publications. In contrast, this paper addresses the second reason by describing an efficient implementation of FC-GMM evaluation on a graphics processing unit (GPU). The performance was tested on acoustic models for ASR, and it is shown that even a low-end laptop GPU is capable of evaluating a large acoustic model in a fraction of the real speech time. Three variants of the algorithm were implemented and compared on various GPUs: NVIDIA CUDA, NVIDIA OpenCL, and ATI/AMD OpenCL.
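
To make the cost difference concrete, here is a minimal CPU sketch of the log-density of a single full-covariance Gaussian, computed through a Cholesky factorization. This is one common formulation and is assumed here for illustration; the abstract does not say which formulation the GPU implementation uses, and the function names are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def full_cov_logpdf(x, mean, cov):
    """Log-density of x under a single full-covariance Gaussian."""
    n = len(x)
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Forward substitution solves low * y = diff, so y.y is the Mahalanobis term.
    y = []
    for i in range(n):
        y.append((diff[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    # log|cov| is twice the sum of the logs of the Cholesky diagonal.
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in y))
```

In practice the factorization and log-determinant would be precomputed once per Gaussian. Even so, the per-frame triangular solve costs on the order of D^2 operations for a D-dimensional feature vector, compared with order D for a diagonal covariance.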

International Conference on Signal Processing | 2014

An open-source GPU-accelerated feature extraction tool

Josef Michalek; Jan Vanek

Extracting feature vectors from a speech audio signal is a computationally intensive task. Nevertheless, MFCC and PLP features have remained the most popular for more than a decade. We made a GPU-accelerated implementation of the feature extraction processing. The implementation produces features identical to those of the reference Hidden Markov Model Toolkit (HTK), but in a fraction of the elapsed time. The saved time can be invested elsewhere and can thus speed up research. The implementation was developed in CUDA, which supports NVIDIA GPUs only, so we added an OpenCL implementation to support any current GPU. The project is an open-source package, so the research community can modify or adapt the implementation to its needs.

International Conference on Signal Processing | 2014

Sports video classification in continuous TV broadcasts

Pavel Campr; Milan Herbig; Jan Vanek; Josef Psutka

This paper focuses on the classification of video footage or continuous TV broadcasts by content. The considered classification categories (topics) are either general (talk show, sport, movie, cartoon...) or more specific (summer and winter Olympic sports, e.g. cycling, tennis, archery, boxing...). First, each frame of the video is classified separately. It is shown that the classification results are more accurate and robust when the per-frame results are filtered in the time domain. The main part of the paper deals with the selection of robust image features and classifiers. It is shown that simple feature extractors are surpassed by complex features based on convolutional neural networks.
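
The idea of filtering per-frame results in the time domain can be sketched as follows. The abstract does not specify the filter, so a sliding-window majority vote is assumed here purely for illustration; `smooth_labels` and its `window` parameter are hypothetical.

```python
from collections import Counter

def smooth_labels(labels, window=5):
    """Replace each per-frame label with the majority label in a centred window.

    Assumed filter for illustration; ties go to the label seen first in the window.
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        # Clip the window at the start and end of the sequence.
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed
```

For example, a single "movie" frame surrounded by "sport" frames is relabelled as "sport" with `window=3`, which removes isolated per-frame misclassifications at the cost of slightly delayed detection of genuine topic changes.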

Conference of the International Speech Communication Association | 2011