
Publication


Featured research published by Ondrej Glembek.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2009

Comparison of scoring methods used in speaker recognition with Joint Factor Analysis

Ondrej Glembek; Lukas Burget; Najim Dehak; Niko Brümmer; Patrick Kenny

The aim of this paper is to compare the different log-likelihood scoring methods that different sites used in the latest state-of-the-art Joint Factor Analysis (JFA) speaker recognition systems. The algorithms use various assumptions and have been derived from various approximations of the objective functions of JFA. We compare the techniques in terms of speed and performance. We show that approximations of the true log-likelihood ratio (LLR) may lead to significant speedups without any loss in performance.
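As a hedged illustration of the quantity the paper's faster methods approximate, the sketch below computes a frame-averaged log-likelihood ratio between a target-speaker GMM and a UBM with diagonal covariances. The function names and toy parameters are hypothetical; real JFA systems score against channel-compensated models rather than plain GMMs.

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Frame-averaged log-likelihood under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]               # (T, C, D)
    expo = -0.5 * np.sum(diff ** 2 / variances, axis=2)         # (T, C)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_comp = np.log(weights) + log_norm + expo                # (T, C)
    m = log_comp.max(axis=1, keepdims=True)                     # log-sum-exp
    return float(np.mean(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def llr_score(frames, target, ubm):
    """True LLR: log p(X | target model) - log p(X | UBM)."""
    return gmm_loglik(frames, *target) - gmm_loglik(frames, *ubm)
```

Frames drawn near the target model's means yield a positive LLR; frames better explained by the UBM yield a negative one.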


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification

Lukas Burget; Oldrich Plchot; Sandro Cumani; Ondrej Glembek; Pavel Matejka; Niko Brümmer

Recently, i-vector extraction and Probabilistic Linear Discriminant Analysis (PLDA) have proven to provide state-of-the-art speaker verification performance. In this paper, the speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model. In our case, however, the parameters of this function are estimated with a discriminative training criterion. We propose an objective function that directly addresses the speaker verification task: discrimination between same-speaker and different-speaker trials. Compared with a baseline that uses a generatively trained PLDA model, discriminative training provides up to 40% relative improvement on the NIST SRE 2010 evaluation task.
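The PLDA-derived functional form is, up to parameterization, a quadratic function of the two i-vectors: a bilinear cross term, per-vector quadratic terms, a linear term, and a bias. A minimal sketch of evaluating that score, with hypothetical names; the parameters `Lambda`, `Gamma`, `c`, `k` stand for quantities that would be estimated with the discriminative criterion (e.g. logistic regression over labeled trials), not values from the paper:

```python
import numpy as np

def plda_verification_score(phi1, phi2, Lambda, Gamma, c, k):
    """Quadratic PLDA-style score for the trial (phi1, phi2).

    Lambda : cross-term matrix, Gamma : per-vector quadratic term,
    c : linear term, k : bias -- all assumed trained discriminatively
    to separate same-speaker from different-speaker trials.
    """
    return float(phi1 @ Lambda @ phi2
                 + phi1 @ Gamma @ phi1
                 + phi2 @ Gamma @ phi2
                 + c @ (phi1 + phi2)
                 + k)
```

With `Lambda` set to the identity and the other terms zeroed, the score reduces to a plain inner product, which makes the role of each term easy to check.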


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification

Pavel Matejka; Ondrej Glembek; Fabio Castaldo; Md. Jahangir Alam; Oldrich Plchot; Patrick Kenny; Lukas Burget; Jan Cernocky

In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBM) with full-covariance matrices is suggested and thoroughly tested experimentally. The i-vectors are scored using a simple cosine distance as well as advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and a heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate dimensionality reduction of the i-vectors before PLDA-HT modeling. The results are very competitive: on the NIST 2010 SRE task, the results of a single full-covariance LDA-PLDA-HT system approach those of a complex fused system.
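The simple cosine-distance scoring mentioned above reduces a trial to the angle between the two i-vectors. A minimal sketch (hypothetical function name; real systems calibrate a decision threshold on development data):

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine similarity between enrollment and test i-vectors.
    Higher scores indicate a same-speaker trial."""
    w_enroll = np.asarray(w_enroll, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    return float(w_enroll @ w_test /
                 (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))
```

Because the score depends only on direction, any length normalization of the i-vectors leaves it unchanged.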


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2011

Simplification and optimization of i-vector extraction

Ondrej Glembek; Lukas Burget; Pavel Matejka; Martin Karafiát; Patrick Kenny

This paper introduces some simplifications to i-vector speaker recognition systems. I-vector extraction, as well as training of the i-vector extractor, can be an expensive task in terms of both memory and speed. Under certain assumptions, the formulas for i-vector extraction (also used in i-vector extractor training) can be simplified, leading to faster and more memory-efficient code. The first assumption is that the GMM component alignment is constant across utterances and is given by the UBM GMM weights. The second assumption is that the i-vector extractor matrix can be linearly transformed so that its per-Gaussian components are orthogonal. We use PCA and HLDA to estimate this transform.
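To make the first assumption concrete: in standard extraction, the R x R precision matrix L = I + T' Sigma^-1 diag(N) T must be rebuilt per utterance from the zero-order statistics N, whereas with the alignment fixed to the UBM weights, N_c = n * w_c and L depends on the utterance only through the frame count n, so the expensive quadratic term can be precomputed once. A toy sketch with made-up dimensions (not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, R = 4, 3, 2                      # Gaussians, feature dim, i-vector dim
T = rng.standard_normal((C * D, R))    # i-vector extractor matrix
Sigma_inv = 1.0 / rng.uniform(0.5, 2.0, C * D)  # diagonal UBM precisions
ubm_w = np.full(C, 1.0 / C)            # UBM weights
f = rng.standard_normal(C * D)         # centered first-order statistics
n_frames = 100.0                       # frames in this toy utterance

def ivector_exact(N):
    """Standard point estimate: w = L^-1 T' Sigma^-1 f, with
    L = I + T' Sigma^-1 diag(N) T rebuilt for every utterance."""
    Nd = np.repeat(N, D)               # expand per-Gaussian counts over dims
    L = np.eye(R) + (T * (Sigma_inv * Nd)[:, None]).T @ T
    return np.linalg.solve(L, T.T @ (Sigma_inv * f))

# Assumption 1: alignment fixed to the UBM weights, N_c = n_frames * w_c.
# The quadratic term becomes n_frames * B, with B precomputed once:
B = (T * (Sigma_inv * np.repeat(ubm_w, D))[:, None]).T @ T
w_fast = np.linalg.solve(np.eye(R) + n_frames * B, T.T @ (Sigma_inv * f))

w_exact = ivector_exact(n_frames * ubm_w)  # agrees when the assumption holds
```

The speedup comes from replacing a per-utterance O(CDR^2) accumulation with a single scalar-times-matrix operation; the paper's second (orthogonalization) assumption goes further and makes L diagonal.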


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2009

Support vector machines and Joint Factor Analysis for speaker verification

Najim Dehak; Patrick Kenny; Réda Dehak; Ondrej Glembek; Pierre Dumouchel; Lukas Burget; Valiantsina Hubeika; Fabio Castaldo

This article presents several techniques for combining Support Vector Machines (SVM) with the Joint Factor Analysis (JFA) model for speaker verification. In this combination, the SVMs are applied to different sources of information produced by the JFA: the Gaussian Mixture Model supervectors and the speaker and common factors. We found that using SVMs on the JFA factors gave the best results, especially when the within-class covariance normalization method is applied to compensate for channel effects. The new combination results are comparable to other classical JFA scoring techniques.
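Within-class covariance normalization, used above to compensate channel effects before the SVM, whitens the factor space with the inverse of the average within-speaker covariance. A hedged sketch (hypothetical helper name; real systems apply the resulting transform inside the SVM kernel):

```python
import numpy as np

def wccn_transform(vectors, speaker_ids):
    """Return A such that mapping x -> A.T @ x makes the average
    within-speaker covariance the identity, i.e. realizes the
    WCCN kernel k(x, y) = x' W^-1 y."""
    speakers = np.unique(speaker_ids)
    dim = vectors.shape[1]
    W = np.zeros((dim, dim))
    for s in speakers:                    # average per-speaker covariance
        X = vectors[speaker_ids == s]
        Xc = X - X.mean(axis=0)
        W += Xc.T @ Xc / len(X)
    W /= len(speakers)
    return np.linalg.cholesky(np.linalg.inv(W))   # A with A A' = W^-1
```

After the mapping, directions of high within-speaker (channel) variability are shrunk, so the SVM margin is spent on between-speaker differences instead.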


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2011

iVector-based discriminative adaptation for automatic speech recognition

Martin Karafiát; Lukas Burget; Pavel Matejka; Ondrej Glembek; Jan Cernocky

We present a novel technique for discriminative feature-level adaptation of an automatic speech recognition system. The concept of iVectors, popular in speaker recognition, is used to extract information about the speaker or acoustic environment from a speech segment. An iVector is a low-dimensional fixed-length vector representing such information. To utilize iVectors for adaptation, Region Dependent Linear Transforms (RDLT) are discriminatively trained using the MPE criterion on a large amount of annotated data to extract the relevant information from the iVectors and to compensate the speech features. The approach was tested on standard CTS data and found to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation we reached an additional 0.8% absolute WER improvement.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2009

Neural network based language models for highly inflective languages

Tomas Mikolov; Jiri Kopecky; Lukas Burget; Ondrej Glembek; Jan Cernocky

Speech recognition of inflectional and morphologically rich languages like Czech is currently a quite challenging task, because simple n-gram techniques are unable to capture important regularities in the data. Several possible solutions have been proposed, namely class-based models, factored models, decision trees, and neural networks. This paper describes improvements obtained in the recognition of spoken Czech lectures using language models based on neural networks. Relative reductions in word error rate are more than 15% over the baseline obtained with an adapted 4-gram backoff language model using modified Kneser-Ney smoothing.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2016

Analysis of DNN approaches to speaker identification

Pavel Matejka; Ondrej Glembek; Ondrej Novotny; Oldrich Plchot; Frantisek Grezl; Lukas Burget; Jan Cernocky

This work studies the use of Deep Neural Network (DNN) bottleneck (BN) features together with traditional MFCC features in the task of i-vector-based speaker recognition. We decouple the sufficient-statistics extraction by using separate GMM models for frame alignment and for statistics normalization, and we analyze the use of BN and MFCC features (and their concatenation) in the two stages. We also show the effect of using full-covariance GMM models, and, as a contrast, we compare the results to the recent DNN-alignment approach. On the NIST SRE 2010 telephone condition, we show a 60% relative gain over the traditional MFCC baseline in EER (and similar gains in the NIST DCF metrics), resulting in 0.94% EER.
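The EER reported above is the operating point at which the miss rate and false-alarm rate are equal. A minimal sketch of computing it from raw trial scores (a hypothetical helper, not the NIST scoring tool):

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Sweep the decision threshold over all scores and return the
    point where miss rate ~= false-alarm rate."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    labels = labels[np.argsort(scores)]        # sort trials by score
    miss = np.cumsum(labels) / labels.sum()    # targets rejected below thr.
    fa = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()  # nontargets kept
    i = int(np.argmin(np.abs(miss - fa)))
    return float((miss[i] + fa[i]) / 2)
```

Perfectly separated score distributions give an EER of 0; completely overlapping ones give roughly 0.5.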


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

Developing a speaker identification system for the DARPA RATS project

Oldrich Plchot; Spyros Matsoukas; Pavel Matejka; Najim Dehak; Jeff Z. Ma; Sandro Cumani; Ondrej Glembek; Hynek Hermansky; Sri Harish Reddy Mallidi; Nima Mesgarani; Richard M. Schwartz; Mehdi Soufifar; Zheng-Hua Tan; Samuel Thomas; Bing Zhang; Xinhui Zhou

This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity detection (VAD) and feature extraction. We show that (a) unsupervised VAD performs as well as supervised methods in terms of downstream SID performance, (b) noise-robust feature extraction methods such as CFCCs outperform MFCC front-ends on noisy audio, and (c) fusion of multiple systems provides a 24% relative improvement in EER compared to the single best system when using a novel SVM-based fusion algorithm that uses side information such as gender, language, and channel ID.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

STBU System for the NIST 2006 Speaker Recognition Evaluation

P. Matejka; Lukas Burget; Petr Schwarz; Ondrej Glembek; Martin Karafiát; Frantisek Grezl; Jan Cernocky; D.A. van Leeuwen; Niko Brümmer; A. Strasheim

This paper describes the STBU 2006 speaker recognition system, which performed well in the NIST 2006 speaker recognition evaluation. STBU is a consortium of four partners: Spescom DataVoice (South Africa), TNO (Netherlands), BUT (Czech Republic), and the University of Stellenbosch (South Africa). The primary system is a combination of three main kinds of systems: (1) GMM, with short-time MFCC or PLP features, (2) GMM-SVM, using GMM mean supervectors as input, and (3) MLLR-SVM, using MLLR speaker adaptation coefficients derived from an English LVCSR system. In this paper, we describe these sub-systems and present results for each system alone and in combination on the NIST Speaker Recognition Evaluation (SRE) 2006 development and evaluation data sets.

Collaboration


Dive into Ondrej Glembek's collaborations.

Top Co-Authors

Lukas Burget (Brno University of Technology)
Pavel Matejka (Brno University of Technology)
Oldrich Plchot (Brno University of Technology)
Martin Karafiát (Brno University of Technology)
Jan Cernocky (Brno University of Technology)
Petr Schwarz (Brno University of Technology)
Frantisek Grezl (Brno University of Technology)
Valiantsina Hubeika (Brno University of Technology)