
Publication


Featured research published by Lukas Burget.


international conference on acoustics, speech, and signal processing | 2011

Extensions of recurrent neural network language model

Tomas Mikolov; Stefan Kombrink; Lukas Burget; Jan Cernocky; Sanjeev Khudanpur

We present several modifications of the original recurrent neural network language model (RNN LM). While this model has been shown to significantly outperform many competitive language modeling techniques in terms of accuracy, its remaining problem is computational complexity. In this work, we show approaches that lead to a more than 15-fold speedup of both the training and testing phases. Next, we show the importance of using a backpropagation-through-time algorithm. An empirical comparison with feedforward networks is also provided. Finally, we discuss possibilities for reducing the number of parameters in the model. The resulting RNN model can thus be smaller, faster during both training and testing, and more accurate than the basic one.
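One well-known source of such speedups is factorizing the output layer into word classes, so that only a class softmax plus a within-class softmax is evaluated instead of a softmax over the full vocabulary. The sketch below is a minimal illustration of that idea with toy sizes, random weights, and our own variable names (the real model would assign classes by word frequency and use a trained recurrent hidden state):

```python
import numpy as np

# Toy sketch of a class-factorized output layer:
#   P(w | h) = P(class(w) | h) * P(w | class(w), h)
# so only about C + V/C outputs are evaluated per word instead of all V.
rng = np.random.default_rng(0)
V, H, C = 10000, 100, 100              # vocabulary, hidden size, number of classes
word2class = rng.integers(0, C, V)     # toy class assignment (frequency-based in practice)
W_class = rng.standard_normal((C, H)) * 0.1   # hidden -> class logits
W_word = rng.standard_normal((V, H)) * 0.1    # hidden -> word logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def word_prob(h, w):
    """P(w | hidden state h), touching only one class's words."""
    c = word2class[w]
    p_class = softmax(W_class @ h)[c]
    members = np.flatnonzero(word2class == c)       # words in class c (sorted indices)
    p_within = softmax(W_word[members] @ h)[np.searchsorted(members, w)]
    return p_class * p_within

h = rng.standard_normal(H)   # stand-in for the recurrent hidden state
p = word_prob(h, 42)
```

Summing word_prob over the whole vocabulary still yields 1, so the factorization gives a proper distribution at a fraction of the per-word cost.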


ieee automatic speech recognition and understanding workshop | 2011

Strategies for training large scale neural network language models

Tomas Mikolov; Anoop Deoras; Daniel Povey; Lukas Burget; Jan Cernocky

We describe how to effectively train neural-network-based language models on large data sets. Fast convergence during training and better overall performance are observed when the training data are sorted by their relevance. We introduce a hash-based implementation of a maximum entropy model that can be trained as part of the neural network model, which leads to a significant reduction in computational complexity. We achieved around a 10% relative reduction in word error rate on an English Broadcast News speech recognition task against a large 4-gram model trained on 400M tokens.
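The hashing idea can be sketched in a few lines: n-gram features are mapped straight into a fixed-size weight table instead of an explicit feature dictionary. The table size, feature templates, and function names below are our illustrative choices, not the paper's:

```python
import numpy as np

TABLE_SIZE = 2 ** 16            # fixed weight table; hash collisions are tolerated
weights = np.zeros(TABLE_SIZE)

def slot(feature):
    """Map an n-gram feature directly to a weight index, with no feature dictionary."""
    return hash(feature) % TABLE_SIZE

def me_score(history, word):
    """Sum of hashed unigram/bigram/trigram feature weights for predicting `word`."""
    feats = [(word,), (history[-1], word), (history[-2], history[-1], word)]
    return float(sum(weights[slot(f)] for f in feats))

base = me_score(("the", "cat"), "sat")   # all weights still zero, so the score is 0.0
```

Because the table has fixed size, memory no longer grows with the number of distinct n-grams, which is what makes training on large corpora tractable.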


Computer Speech & Language | 2011

The subspace Gaussian mixture model - A structured model for speech recognition

Daniel Povey; Lukas Burget; Mohit Agarwal; Pinar Akyazi; Feng Kai; Arnab Ghoshal; Ondřej Glembek; Nagendra Goel; Martin Karafiát; Ariya Rastrow; Richard C. Rose; Petr Schwarz; Samuel Thomas

We describe a new approach to speech recognition in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure, with the same number of Gaussians in each state. The model is defined by a vector associated with each state, with a dimension of, say, 50, together with a global mapping from this vector space to the space of GMM parameters. This model appears to give better results than a conventional one, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.
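The core parameter tying can be sketched in a few lines. Dimensions here are illustrative (39-dimensional features, a 50-dimensional state vector, a handful of Gaussians), and the random projection matrices are stand-ins for trained ones:

```python
import numpy as np

rng = np.random.default_rng(1)
D, S, I = 39, 50, 4                    # feature dim, state-vector dim, Gaussians per state (toy I)
M = rng.standard_normal((I, D, S))     # globally shared projections M_i, one per Gaussian index
v_j = rng.standard_normal(S)           # the only state-specific parameters: S numbers

# Mean of Gaussian i in state j: mu_{ji} = M_i @ v_j, giving an (I, D) array of means.
means_j = np.einsum('ids,s->id', M, v_j)
```

Each new state thus adds only S parameters rather than a full set of I·D mean parameters, which is what makes the shared structure compact.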


international conference on acoustics, speech, and signal processing | 2007

The AMI System for the Transcription of Speech in Meetings

Thomas Hain; Vincent Wan; Lukas Burget; Martin Karafiát; John Dines; Jithendra Vepa; Giulia Garau; Mike Lincoln

In this paper we describe the 2005 AMI system for the transcription of speech in meetings, used in the 2005 NIST RT evaluations. The system was designed for participation in the speech-to-text part of the evaluations, in particular for transcription of speech recorded with multiple distant microphones and independent headset microphones. System performance was tested on both conference-room and lecture-style meetings. Although input sources are processed using different front-ends, the recognition process is based on a unified system architecture. The system operates in multiple passes and makes use of state-of-the-art technologies such as discriminative training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, speaker adaptation with maximum likelihood linear regression, and minimum word error rate decoding. We describe the system's performance on the official development and test sets for the NIST RT05s evaluations. The system was jointly developed in less than 10 months by a multi-site team and was shown to achieve competitive performance.


international conference on acoustics, speech, and signal processing | 2009

Comparison of scoring methods used in speaker recognition with Joint Factor Analysis

Ondrej Glembek; Lukas Burget; Najim Dehak; Niko Brümmer; Patrick Kenny

The aim of this paper is to compare the different log-likelihood scoring methods that different sites used in the latest state-of-the-art Joint Factor Analysis (JFA) speaker recognition systems. The algorithms use various assumptions and have been derived from various approximations of the objective functions of JFA. We compare the techniques in terms of speed and performance, and show that approximations of the true log-likelihood ratio (LLR) may lead to a significant speedup without any loss in performance.


international conference on acoustics, speech, and signal processing | 2011

Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification

Lukas Burget; Oldrich Plchot; Sandro Cumani; Ondrej Glembek; Pavel Matejka; Niko Brümmer

Recently, i-vector extraction and Probabilistic Linear Discriminant Analysis (PLDA) have been shown to provide state-of-the-art speaker verification performance. In this paper, the speaker verification score for a pair of i-vectors representing a trial is computed with a functional form derived from the successful PLDA generative model. In our case, however, the parameters of this function are estimated with a discriminative training criterion: the objective function directly addresses the speaker verification task, namely discrimination between same-speaker and different-speaker trials. Compared with a baseline that uses a generatively trained PLDA model, discriminative training provides up to a 40% relative improvement on the NIST SRE 2010 evaluation task.
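A commonly used functional form for such PLDA-derived trial scoring is a quadratic function of the i-vector pair. The sketch below uses random toy parameters (Lambda, Gamma, c, k) in place of discriminatively trained ones, so only the shape of the function, not its values, is meaningful:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8                                                         # toy i-vector dimension
Lam = rng.standard_normal((d, d)); Lam = (Lam + Lam.T) / 2    # cross-vector (between-trial) term
Gam = rng.standard_normal((d, d)); Gam = (Gam + Gam.T) / 2    # within-vector term
c = rng.standard_normal(d)
k = 0.5

def score(x1, x2):
    """Verification score for the i-vector trial (x1, x2); symmetric in its arguments."""
    return (x1 @ Lam @ x2 + x2 @ Lam @ x1
            + x1 @ Gam @ x1 + x2 @ Gam @ x2
            + (x1 + x2) @ c + k)

x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
```

In the discriminative setting these parameters would be optimized with, for example, a logistic-regression-style objective over labeled same-speaker and different-speaker trials.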


international conference on acoustics, speech, and signal processing | 2011

Full-covariance UBM and heavy-tailed PLDA in i-vector speaker verification

Pavel Matejka; Ondrej Glembek; Fabio Castaldo; Md. Jahangir Alam; Oldrich Plchot; Patrick Kenny; Lukas Burget; Jan Cernocky

In this paper, we describe recent progress in i-vector based speaker verification. The use of universal background models (UBMs) with full-covariance matrices is suggested and thoroughly experimentally tested. The i-vectors are scored using a simple cosine distance as well as advanced techniques such as Probabilistic Linear Discriminant Analysis (PLDA) and a heavy-tailed variant of PLDA (PLDA-HT). Finally, we investigate dimensionality reduction of the i-vectors before PLDA-HT modeling. The results are very competitive: on the NIST 2010 SRE task, the results of a single full-covariance LDA-PLDA-HT system approach those of a complex fused system.
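The simple cosine-distance baseline mentioned above can be stated exactly; a minimal version:

```python
import numpy as np

def cosine_score(w1, w2):
    """Cosine similarity of two i-vectors; a higher score suggests the same speaker."""
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))
```

A verification decision is then made by thresholding the score; by construction it is invariant to the length of either i-vector, which is one reason it is a robust baseline.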


international conference on acoustics, speech, and signal processing | 2011

Simplification and optimization of i-vector extraction

Ondrej Glembek; Lukas Burget; Pavel Matejka; Martin Karafiát; Patrick Kenny

This paper introduces some simplifications to i-vector speaker recognition systems. I-vector extraction, as well as training of the i-vector extractor, can be expensive in terms of both memory and speed. Under certain assumptions, the formulas for i-vector extraction (also used in i-vector extractor training) can be simplified to yield faster and more memory-efficient code. The first assumption is that the GMM component alignment is constant across utterances and given by the UBM GMM weights. The second is that the i-vector extractor matrix can be linearly transformed so that its per-Gaussian components are orthogonal; we use PCA and HLDA to estimate this transform.
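The first assumption can be illustrated directly on the standard i-vector point estimate, w = (I + Σ_c N_c T_c' Σ_c⁻¹ T_c)⁻¹ T' Σ⁻¹ f: if the alignment N_c is taken proportional to the UBM weights, the matrix to invert depends on the utterance only through the frame count and its expensive part can be precomputed once. The sketch below uses toy dimensions and random parameters of our choosing:

```python
import numpy as np

rng = np.random.default_rng(3)
C, D, R = 8, 5, 4                                  # UBM components, feature dim, i-vector dim (toy)
T = rng.standard_normal((C, D, R))                 # per-component blocks of the T matrix
Sigma_inv = 1.0 / rng.uniform(0.5, 2.0, (C, D))    # inverted diagonal UBM covariances
ubm_w = np.full(C, 1.0 / C)                        # UBM mixture weights

def extract(N, F):
    """Exact i-vector from zero-order stats N (C,) and centered first-order stats F (C, D)."""
    L = np.eye(R) + sum(N[c] * T[c].T @ (Sigma_inv[c, :, None] * T[c]) for c in range(C))
    b = sum(T[c].T @ (Sigma_inv[c] * F[c]) for c in range(C))
    return np.linalg.solve(L, b)

# Assumption 1: alignment follows the UBM weights, N_c = n * w_c, so the costly
# per-component sum in L is utterance-independent and computed once.
L_unit = sum(ubm_w[c] * T[c].T @ (Sigma_inv[c, :, None] * T[c]) for c in range(C))

def extract_fast(n, F):
    """Simplified extraction: L = I + n * L_unit, with L_unit precomputed."""
    L = np.eye(R) + n * L_unit
    b = sum(T[c].T @ (Sigma_inv[c] * F[c]) for c in range(C))
    return np.linalg.solve(L, b)
```

When the alignment really does follow the UBM weights, the two functions agree exactly; in practice the assumption trades a small accuracy loss for large savings.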


international conference on acoustics, speech, and signal processing | 2006

Discriminative Training Techniques for Acoustic Language Identification

Lukas Burget; Pavel Matejka; Jan Cernocky

This paper presents a comparison of maximum likelihood (ML) and discriminative maximum mutual information (MMI) training for acoustic modeling in language identification (LID). Both approaches are compared on state-of-the-art shifted delta-cepstra features, and the results are reported on data from the NIST 2003 evaluations. A clear advantage of MMI over ML training is shown. Further improvements of acoustic LID are discussed: heteroscedastic linear discriminant analysis (HLDA) for feature decorrelation and dimensionality reduction, and ergodic hidden Markov models (EHMMs) for better modeling of dynamics in the acoustic space. The final error rate compares favorably to other results published on NIST 2003 data.
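The contrast between the two criteria can be sketched with toy one-dimensional Gaussian "language models" (the setup and names are ours): ML rewards only the correct language's likelihood, while MMI rewards the posterior of the correct language against all competitors, which directly penalizes confusable models.

```python
import numpy as np

def loglik(x, mu, var=1.0):
    # log N(x; mu, var): a one-dimensional toy acoustic model for one language
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def ml_objective(data, labels, mus):
    # ML criterion: sum of log-likelihoods under the correct language's model only
    return sum(loglik(x, mus[y]) for x, y in zip(data, labels))

def mmi_objective(data, labels, mus):
    # MMI criterion: sum of log posteriors of the correct language vs. all languages
    total = 0.0
    for x, y in zip(data, labels):
        lls = np.array([loglik(x, m) for m in mus])
        total += lls[y] - np.logaddexp.reduce(lls)
    return total
```

The MMI objective is bounded above by zero (a log posterior) and approaches zero only when the competing models are well separated, which is exactly the discrimination the task needs.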


IEEE Transactions on Audio, Speech, and Language Processing | 2012

Transcribing Meetings With the AMIDA Systems

Thomas Hain; Lukas Burget; John Dines; Philip N. Garner; Frantisek Grezl; Asmaa El Hannani; Marijn Huijbregts; Martin Karafiát; Mike Lincoln; Vincent Wan

In this paper, we give an overview of the AMIDA systems for transcription of conference and lecture room meetings. The systems were developed for participation in the Rich Transcription evaluations conducted by the National Institute of Standards and Technology in 2007 and 2009, and can process close-talking and far-field microphone recordings. The paper first discusses fundamental properties of meeting data, with special focus on the AMI/AMIDA corpora. This is followed by a description and analysis of improved processing and modeling, focusing on techniques specifically addressing meeting transcription issues such as multi-room recordings or domain variability. In 2007 and 2009, two different system building strategies were followed. While in 2007 we used our traditional system design based on cross-adaptation, the 2009 systems were constructed semi-automatically, supported by improved decoders and a new method for system representation. Overall, these changes gave a 6%-13% relative reduction in word error rate compared to our 2007 results, while requiring less training material and reducing the real-time factor by a factor of five. The meeting transcription systems are available at www.webasr.org.

Collaboration


Dive into Lukas Burget's collaboration.

Top Co-Authors

Martin Karafiát, Brno University of Technology
Pavel Matejka, Brno University of Technology
Jan Cernocký, Brno University of Technology
Ondrej Glembek, Brno University of Technology
Petr Schwarz, Brno University of Technology
Frantisek Grezl, Brno University of Technology
Oldrich Plchot, Brno University of Technology
Igor Szöke, Brno University of Technology
Tomas Mikolov, Brno University of Technology