Publication


Featured research published by Petr Motlicek.


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Generating exact lattices in the WFST framework

Daniel Povey; Mirko Hannemann; Gilles Boulianne; Lukas Burget; Arnab Ghoshal; Milos Janda; Martin Karafiát; Stefan Kombrink; Petr Motlicek; Yanmin Qian; Korbinian Riedhammer; Karel Vesely; Ngoc Thang Vu

We describe a lattice generation method that is exact, i.e. it satisfies all the natural properties we would want from a lattice of alternative transcriptions of an utterance. This method does not introduce substantial overhead above one-best decoding. Our method is most directly applicable when using WFST decoders where the WFST is “fully expanded”, i.e. where the arcs correspond to HMM transitions. It outputs lattices that include HMM-state-level alignments as well as word labels. The general idea is to create a state-level lattice during decoding, and to do a special form of determinization that retains only the best-scoring path for each word sequence. This special determinization algorithm is a solution to the following problem: Given a WFST A, compute a WFST B that, for each input-symbol-sequence of A, contains just the lowest-cost path through A.
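
The core invariant of the special determinization is that each distinct word sequence keeps only its best-scoring path. A minimal Python sketch of that invariant follows; it is not the paper's WFST algorithm, and the path tuples are hypothetical decoder output used for illustration:

```python
# Toy sketch of the pruning invariant: for each word sequence, keep only the
# lowest-cost path (together with its state-level alignment).

def prune_lattice(paths):
    """paths: iterable of (word_sequence_tuple, cost, alignment).
    Returns the lowest-cost entry for each distinct word sequence."""
    best = {}
    for words, cost, alignment in paths:
        if words not in best or cost < best[words][0]:
            best[words] = (cost, alignment)
    return [(words, cost, align) for words, (cost, align) in best.items()]

# Example: two paths share the word sequence ("the", "cat"); only the
# cheaper one survives in the pruned lattice.
paths = [
    (("the", "cat"), 12.3, "align1"),
    (("the", "cat"), 10.9, "align2"),
    (("a", "cat"), 11.5, "align3"),
]
print(prune_lattice(paths))
```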


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2014

Multilingual Deep Neural Network Based Acoustic Modeling for Rapid Language Adaptation

Ngoc Thang Vu; David Imseng; Daniel Povey; Petr Motlicek; Tanja Schultz

This paper presents a study of multilingual deep neural network (DNN) based acoustic modeling and its application to new languages. We investigate the effect of phone merging on multilingual DNNs in the context of rapid language adaptation. Moreover, the combination of multilingual DNNs with Kullback-Leibler divergence based acoustic modeling (KL-HMM) is explored. Using ten different languages from the GlobalPhone database, our studies reveal that crosslingual acoustic model transfer through multilingual DNNs is superior to unsupervised RBM pre-training and greedy layer-wise supervised training. We also found that KL-HMM based decoding consistently outperforms conventional hybrid decoding, especially in low-resource scenarios. Furthermore, the experiments indicate that multilingual DNN training benefits equally from simple phoneset concatenation and manually derived universal phonesets.
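
For intuition, the KL-HMM local score can be sketched as a KL divergence between a state's categorical distribution and a frame's MLP posterior vector. This is a toy illustration only; the directionality and smoothing choices in actual KL-HMM systems may differ, and the values below are made up:

```python
import numpy as np

def kl_local_score(state_dist, frame_posterior, eps=1e-10):
    """Frame-level cost: KL(state distribution || MLP posterior)."""
    p = np.clip(state_dist, eps, 1.0)
    q = np.clip(frame_posterior, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

state = np.array([0.7, 0.2, 0.1])   # HMM state's categorical distribution
frame = np.array([0.6, 0.3, 0.1])   # MLP posterior for one frame
print(kl_local_score(state, frame)) # lower score = better match
```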


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2007

Unsupervised Speech/Non-Speech Detection for Automatic Speech Recognition in Meeting Rooms

Hari Krishna Maganti; Petr Motlicek; Daniel Gatica-Perez

The goal of this work is to provide robust and accurate speech detection for automatic speech recognition (ASR) in meeting room settings. The solution is based on computing the long-term modulation spectrum and examining a specific frequency range for dominant speech components, in order to classify speech and non-speech signals in a given audio signal. For comparison, we test manually segmented speech segments, short-term energy based and combined short-term energy and zero-crossing based segmentation techniques, and a recently proposed multi-layer perceptron (MLP) classifier system. Speech recognition evaluations of the segmentation methods are performed on a standard database in conditions where the signal-to-noise ratio (SNR) varies considerably, as with close-talking headset, lapel, distant microphone array output, and distant microphone recordings. The results reveal that the proposed method is more reliable and less sensitive to the mode of signal acquisition and to unforeseen conditions.
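
A rough sketch of the detection idea follows, assuming 16 kHz audio, a standard 25 ms/10 ms framing, and a syllabic-rate modulation band of roughly 2-8 Hz; the paper's exact band and decision threshold are not reproduced here:

```python
import numpy as np

def speech_score(signal, fs=16000, frame_len=0.025, hop=0.010):
    """Fraction of modulation-spectrum energy in the assumed speech band."""
    n, h = int(fs * frame_len), int(fs * hop)
    # Short-term energy envelope of the signal.
    frames = [signal[i:i + n] for i in range(0, len(signal) - n, h)]
    envelope = np.array([np.sum(f ** 2) for f in frames])
    envelope -= envelope.mean()
    # Modulation spectrum = spectrum of the energy envelope
    # (envelope sample spacing is the hop, i.e., 10 ms).
    spec = np.abs(np.fft.rfft(envelope)) ** 2
    freqs = np.fft.rfftfreq(len(envelope), d=hop)
    band = (freqs >= 2.0) & (freqs <= 8.0)
    return spec[band].sum() / (spec.sum() + 1e-10)

rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)   # 1 s of white noise
print(speech_score(noise))           # a low score is expected for noise
```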


IEEE Signal Processing Letters | 2013

A Simple Continuous Pitch Estimation Algorithm

Philip N. Garner; Milos Cernak; Petr Motlicek

Recent work in text-to-speech synthesis has pointed to the benefit of using a continuous pitch estimate; that is, one that records pitch even when voicing is not present. Such an approach typically requires interpolation. The purpose of this letter is to show that a continuous pitch estimate is available from a combination of otherwise well known techniques. Further, in the case of an autocorrelation based estimate, the continuity requirement negates the need for other heuristics to correct common errors. An algorithm is suggested, illustrated, and demonstrated using a parametric vocoder.
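
A minimal autocorrelation-based frame estimator in the spirit of the letter: a pitch value is returned for every frame, with no voiced/unvoiced gating, so the track is continuous by construction. The search range and frame length below are assumptions:

```python
import numpy as np

def frame_pitch(frame, fs=16000, fmin=60.0, fmax=400.0):
    """Return a pitch estimate (Hz) for one frame, voiced or not."""
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))   # best lag in the allowed range
    return fs / lag

fs = 16000
t = np.arange(int(0.04 * fs)) / fs
tone = np.sin(2 * np.pi * 120.0 * t)       # synthetic 120 Hz "voiced" frame
print(frame_pitch(tone, fs))               # approximately 120
```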


Speech Communication | 2014

Using out-of-language data to improve an under-resourced speech recognizer

David Imseng; Petr Motlicek; Philip N. Garner

Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we report how to boost the performance of an Afrikaans automatic speech recognition system by using already available Dutch data. We successfully exploit available multilingual resources through (1) posterior features, estimated by multilayer perceptrons (MLP), and (2) subspace Gaussian mixture models (SGMMs). Both the MLPs and the SGMMs can be trained on out-of-language data. We use three different acoustic modeling techniques, namely Tandem, Kullback-Leibler divergence based HMMs (KL-HMM), and SGMMs, and show that the proposed multilingual systems yield a 12% relative improvement over a conventional monolingual HMM/GMM system trained only on Afrikaans. We also show that KL-HMMs are extremely powerful for under-resourced languages: using only six minutes of Afrikaans data (in combination with out-of-language data), the KL-HMM yields about a 30% relative improvement over conventional maximum likelihood linear regression and maximum a posteriori based acoustic model adaptation.
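
As a toy illustration of the Tandem idea used here, MLP posteriors can be turned into features for a conventional HMM/GMM via a log transform and a decorrelating projection. PCA is used below as the projection; the actual pipeline details may differ, and the posteriors are random stand-ins for MLP outputs:

```python
import numpy as np

def tandem_features(posteriors, n_components=5):
    """Log posteriors followed by a PCA projection for decorrelation."""
    x = np.log(np.clip(posteriors, 1e-10, 1.0))
    x = x - x.mean(axis=0)
    # PCA via SVD; keep the top components as the Tandem feature vector.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T

rng = np.random.default_rng(1)
post = rng.dirichlet(np.ones(20), size=100)  # 100 frames, 20 phone classes
print(tandem_features(post).shape)           # (100, 5)
```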


Journal of the Acoustical Society of America | 2007

Multistream network feature processing for a distributed speech recognition system

Harinath Garudadri; Sunil Sivadas; Hynek Hermansky; Nelson Morgan; Chuck Wooters; André Gustavo Adami; Maria Carmen Benitez Ortuzar; Lukas Burget; Stephane N. Dupont; Frantisek Grezl; Pratibha Jain; Sachin S. Kajarekar; Petr Motlicek

A distributed voice recognition system and method for obtaining acoustic features and speech activity at multiple frequencies by extracting high-frequency components thereof on a device, such as a subscriber station, and transmitting them to a network server having multiple stream processing capability, including cepstral feature processing, MLP nonlinear transformation processing, and multiband temporal pattern architecture processing. The features received at the network server are processed using all three streams, wherein each of the three streams provides benefits not available in the other two, thereby enhancing feature interpretation. Feature extraction and feature interpretation may operate at multiple frequencies, including but not limited to 8 kHz, 11 kHz, and 16 kHz.
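
As a loose illustration of the multistream idea, per-stream scores can be fused at the frame level on the server side. The three streams below are hypothetical stand-ins named after the patent's processing types, and the fusion rule (simple averaging) is an assumption for illustration, not the patent's specified method:

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_classes = 10, 5
# Hypothetical per-frame class scores from each of the three streams.
streams = {
    "cepstral": rng.random((n_frames, n_classes)),
    "mlp_transform": rng.random((n_frames, n_classes)),
    "temporal_pattern": rng.random((n_frames, n_classes)),
}
# Server-side combination: average the stream scores frame by frame.
fused = np.mean(list(streams.values()), axis=0)
print(fused.shape, fused[0])
```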


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2012

Improving acoustic based keyword spotting using LVCSR lattices

Petr Motlicek; Fabio Valente; Igor Szöke

This paper investigates the detection of English keywords in a conversational scenario using a combination of acoustic and LVCSR based keyword spotting (KWS) systems. Acoustic KWS systems search for predefined words in parameterized spoken data. The corresponding confidences are represented by likelihood ratios given the keyword models and a background model. First, because of the especially high number of false alarms, the acoustic KWS system is augmented with confidence measures estimated from corresponding LVCSR lattices. Then, various strategies to combine the scores estimated by the acoustic and several LVCSR based KWS systems are explored. We show that a linear regression based combination significantly outperforms other (model-based) techniques: the relative number of false alarms of the combined KWS system decreased by more than 50% compared to the acoustic KWS system. Finally, attention is also paid to the complexity of the KWS systems, enabling them to potentially be exploited in real detection tasks.
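
The combination step can be sketched as an ordinary least-squares fit from per-detection system scores to hit/false-alarm targets. Everything below is synthetic; the paper's actual score features and training protocol are not reproduced:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical per-detection scores: acoustic KWS + two LVCSR-based systems.
scores = rng.normal(size=(n, 3))
# Synthetic hit (1) / false-alarm (0) labels, loosely tied to the scores.
labels = (scores.sum(axis=1) + 0.3 * rng.normal(size=n) > 0).astype(float)

X = np.hstack([scores, np.ones((n, 1))])   # append a bias column
w, *_ = np.linalg.lstsq(X, labels, rcond=None)
combined = X @ w                           # fused confidence per detection
print(combined[:5])
```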


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2013

On the (UN)importance of the contextual factors in HMM-based speech synthesis and coding

Milos Cernak; Petr Motlicek; Philip N. Garner

This paper presents an evaluation of the contextual factors in HMM-based speech synthesis and coding systems. Two experimental setups are proposed, based on successive context addition from phonetic to full context. The aim is to investigate the impact of the individual contextual factors on speech quality. In this way, important and unimportant (i.e., weak, not having a significant impact on speech quality) contextual factors are identified. The results imply that in speech coding an improvement in quality can be achieved with reconstruction of syllable contexts alone. The sentence and utterance contexts are unimportant on the decoder side, and it is not necessary to deal with them. Although wider context was not necessary in speech coding, in speech synthesis the current syllable and utterance contexts are more important than others (previous and next word/phrase contexts).
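
The successive-context-addition setup can be illustrated by progressively widening a context label from the phone level up to the utterance level. The factor names and ordering below loosely follow HTS-style full-context labels and are assumptions, not the paper's exact factor set:

```python
# Build context labels of increasing width, from phonetic to full context.
LEVELS = ["phone", "syllable", "word", "phrase", "utterance"]

def build_label(factors, up_to):
    """factors: dict mapping level name -> string; up_to: widest level kept."""
    keep = LEVELS[:LEVELS.index(up_to) + 1]
    return "/".join(f"{lvl}:{factors[lvl]}" for lvl in keep)

factors = {"phone": "a-b+c", "syllable": "stress1", "word": "pos2of5",
           "phrase": "len4", "utterance": "len12"}
for level in LEVELS:
    print(build_label(factors, level))
```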


International Conference on Acoustics, Speech, and Signal Processing (ICASSP) | 2015

Learning feature mapping using deep neural network bottleneck features for distant large vocabulary speech recognition

Ivan Himawan; Petr Motlicek; David Imseng; Blaise Potard; Nam-hoon Kim; Jae-won Lee

Automatic speech recognition from distant microphones is a difficult task because recordings are affected by reverberation and background noise. First, the application of deep neural network (DNN)/hidden Markov model (HMM) hybrid acoustic models to the distant speech recognition task is investigated using the AMI meeting corpus. This paper then proposes a feature transformation for removing reverberation and background noise artefacts from bottleneck features, using a DNN trained to learn the mapping between distant-talking speech features and close-talking speech bottleneck features. Experimental results on the AMI meeting corpus reveal that the mismatch between close-talking and distant-talking conditions is largely reduced, with about a 16% relative improvement over a conventional bottleneck system (trained on close-talking speech). If the feature mapping is applied to close-talking speech, only a minor degradation of 4% relative is observed.
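
A hedged sketch of the feature-mapping network: a small regression DNN trained with an MSE objective to map distant-talking features to close-talking bottleneck targets. The layer sizes, activations, and training loop are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Regression DNN: distant-talking features in, close-talk bottleneck
# features out; trained with mean squared error on synthetic data.
mapper = nn.Sequential(
    nn.Linear(40, 512), nn.Sigmoid(),
    nn.Linear(512, 512), nn.Sigmoid(),
    nn.Linear(512, 40),
)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

distant = torch.randn(64, 40)    # synthetic distant-talking features
close_bn = torch.randn(64, 40)   # synthetic close-talking bottleneck targets
for _ in range(5):               # a few toy training steps
    opt.zero_grad()
    loss = loss_fn(mapper(distant), close_bn)
    loss.backward()
    opt.step()
print(loss.item())
```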


IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) | 2013

Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition

David Imseng; Petr Motlicek; Philip N. Garner

Posterior based acoustic modeling techniques such as Kullback-Leibler divergence based HMM (KL-HMM) and Tandem are able to exploit out-of-language data through posterior features, estimated by a multi-layer perceptron (MLP). In this paper, we investigate the performance of posterior based approaches in the context of under-resourced speech recognition when a standard three-layer MLP is replaced by a deeper five-layer MLP. The deeper MLP architecture yields similar gains of about 15% (relative) for Tandem and KL-HMM, as well as for a hybrid HMM/MLP system that directly uses the posterior estimates as emission probabilities. The best performing system, a bilingual KL-HMM based on a deep MLP and jointly trained on Afrikaans and Dutch data, performs 13% better than a hybrid system using the same bilingual MLP and 26% better than a subspace Gaussian mixture system trained only on Afrikaans data.
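
The architectural contrast can be made concrete with a helper that builds either a three-layer MLP (one hidden layer) or a five-layer MLP (three hidden layers), both ending in a softmax over phone classes. The dimensions below are illustrative, not the paper's configuration:

```python
import torch.nn as nn

def mlp(num_layers, in_dim=351, hidden=1024, out_dim=40):
    """Build an MLP with num_layers total layers (input and output included)."""
    n_hidden = num_layers - 2
    dims = [in_dim] + [hidden] * n_hidden
    layers = []
    for a, b in zip(dims, dims[1:]):
        layers += [nn.Linear(a, b), nn.Sigmoid()]
    layers += [nn.Linear(dims[-1], out_dim), nn.Softmax(dim=-1)]
    return nn.Sequential(*layers)

shallow, deep = mlp(3), mlp(5)   # three-layer vs. five-layer architectures
print(shallow)
print(deep)
```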

Collaboration


Dive into Petr Motlicek's collaborations.

Top Co-Authors

David Imseng, Idiap Research Institute
Subhadeep Dey, Idiap Research Institute
Marc Ferras, Idiap Research Institute
Srikanth R. Madikeri, Indian Institute of Technology Madras
Fabio Valente, Idiap Research Institute
Lukas Burget, Brno University of Technology