Michael Heck
Nara Institute of Science and Technology
Publication
Featured research published by Michael Heck.
Procedia Computer Science | 2016
Michael Heck; Sakriani Sakti; Satoshi Nakamura
In this work we make use of unsupervised linear discriminant analysis (LDA) to support acoustic unit discovery in a zero resource scenario. The idea is to automatically find a mapping of feature vectors into a subspace that is more suitable for Dirichlet process Gaussian mixture model (DPGMM) based clustering, without the need for supervision. Supervised acoustic modeling typically makes use of feature transformations such as LDA to minimize intra-class variability, to maximize inter-class discriminability, and to extract relevant information from high-dimensional features spanning larger contexts. The need for class labels makes it difficult to use this technique in a zero resource setting, where the classes and even their number are unknown. To overcome this issue we use a first iteration of DPGMM clustering on standard features to generate labels for the data, which serve as a basis for learning a proper transformation. A second clustering then operates on the transformed features. The application of unsupervised LDA demonstrably leads to better clustering results on the unsupervised data. We show that the improved input features consistently outperform our baseline input features.
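The two-pass scheme described in the abstract can be sketched with off-the-shelf tools. The snippet below is a toy illustration, not the paper's implementation: scikit-learn's BayesianGaussianMixture with a Dirichlet-process prior stands in for the DPGMM sampler, and synthetic vectors stand in for speech features.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# toy stand-in for speech feature vectors (e.g. stacked MFCC frames)
X = np.vstack([rng.normal(m, 1.0, size=(100, 13)) for m in (-2.0, 0.0, 2.0)])

def dp_cluster(data, seed=0):
    # variational stand-in for DPGMM sampling
    return BayesianGaussianMixture(
        n_components=10, weight_concentration_prior_type="dirichlet_process",
        random_state=seed, max_iter=200).fit_predict(data)

# first pass: clustering on standard features yields pseudo class labels
labels = dp_cluster(X)

# learn an LDA transform from the pseudo labels (no true supervision used);
# LDA allows at most (number of classes - 1) output dimensions
n_dims = min(2, len(set(labels)) - 1)
lda = LinearDiscriminantAnalysis(n_components=n_dims).fit(X, labels)

# second pass: cluster again in the learned subspace
labels2 = dp_cluster(lda.transform(X), seed=1)
print(lda.transform(X).shape)
```

The cluster count, dimensionalities, and data here are arbitrary choices for the sketch; the paper works on real speech features.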
international conference on speech and computer | 2013
Michael Heck; Christian Mohr; Sebastian Stüker; Markus Müller; Kevin Kilgour; Jonas Gehring; Quoc Bao Nguyen; Van Huy Nguyen; Alex Waibel
In this paper we investigate the automatic segmentation of recorded telephone conversations based on models for speech and non-speech to find sentence-like chunks for use in speech recognition systems. Presented are two different approaches, based on Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs), respectively. The proposed methods provide segmentations that allow for competitive speech recognition performance in terms of word error rate (WER) compared to manual segmentation.
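The GMM-based variant of such a segmenter can be illustrated with a minimal sketch: train one GMM on speech frames and one on non-speech frames, classify each frame by log-likelihood ratio, and merge consecutive identical decisions into segments. The features and thresholds below are toy assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# toy frame features: "speech" frames drawn from a higher-mean distribution
speech = rng.normal(3.0, 1.0, size=(200, 4))
nonspeech = rng.normal(0.0, 1.0, size=(200, 4))

gmm_speech = GaussianMixture(n_components=2, random_state=0).fit(speech)
gmm_nonspeech = GaussianMixture(n_components=2, random_state=0).fit(nonspeech)

# frame-wise log-likelihood ratio decides speech vs. non-speech
frames = np.vstack([nonspeech[:50], speech[:100], nonspeech[50:100]])
llr = gmm_speech.score_samples(frames) - gmm_nonspeech.score_samples(frames)
is_speech = llr > 0

# merge consecutive frames with the same decision into segments
segments = []
start = 0
for i in range(1, len(is_speech) + 1):
    if i == len(is_speech) or is_speech[i] != is_speech[start]:
        segments.append((start, i, bool(is_speech[start])))
        start = i
print(segments)
```

A real system would add smoothing (e.g. minimum segment durations) so that single mis-classified frames do not split segments.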
international conference on acoustics, speech, and signal processing | 2012
Michael Heck; Sebastian Stüker; Alex Waibel
In this paper we describe our work on constructing a language identification system for use in our simultaneous lecture translation system. We first built PPR and PPRLM baseline systems that produce score-fusing language cue feature vectors for language discrimination and utilize an SVM back-end classifier for the actual language identification. On our bilingual lecture tasks the PPRLM system clearly outperforms the PPR system across various segment length conditions, albeit at the cost of a slower run-time. By using lexical information in the form of keyword spotting and additional language models, we show ways to improve the performance of both baseline systems. To combine the faster run-time of the PPR system with the better performance of the PPRLM system, we finally built a hybrid of both approaches that clearly outperforms the PPR system while not adding any computing time. Due to its fast run-time and good performance, this hybrid system is our choice for use in the lecture translation system.
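The SVM back-end step can be pictured in isolation: each utterance is reduced to a language cue score vector (e.g. per-language phonotactic model scores), and an SVM classifies that vector. The two-dimensional toy vectors and class labels below are illustrative assumptions, not the paper's features.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# toy "language cue" score vectors for two languages
german = rng.normal([2.0, 0.0], 0.5, size=(100, 2))
english = rng.normal([0.0, 2.0], 0.5, size=(100, 2))
X = np.vstack([german, english])
y = np.array([0] * 100 + [1] * 100)  # 0 = German, 1 = English

# SVM back-end classifier over the score vectors
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([[2.0, 0.0], [0.0, 2.0]]))  # → [0 1]
```

In the actual system the cue vectors come from PPR/PPRLM front-ends scoring real audio segments.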
conference of the international speech communication association | 2016
Michael Heck; Sakriani Sakti; Satoshi Nakamura
In this work we utilize a supervised acoustic model training pipeline without supervision to improve Dirichlet process Gaussian mixture model (DPGMM) based feature vector clustering. We exploit methods common in supervised acoustic modeling to learn, without supervision, feature transformations that are applied to the input data prior to clustering. The idea is to automatically find mappings of feature vectors into sub-spaces that are more robust to channel, context and speaker variability. The need for labels makes it difficult to use these techniques in a zero resource setting. To overcome this issue we utilize a first iteration of DPGMM clustering to generate frame-based class labels for the target data. The labels serve as a basis for learning an acoustic model in the form of hidden Markov models (HMMs) using linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT) and speaker adaptive training (SAT). We show that the learned transformations lead to features that consistently outperform untransformed features on the ABX sound class discriminability task. We also demonstrate that combining multiple clustering runs is a suitable method to further enhance sound class discriminability.
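One concrete way to combine multiple clustering runs, sketched below under assumptions of my own, is to concatenate the per-run posteriorgrams into one feature vector per frame. This uses scikit-learn's BayesianGaussianMixture as a DPGMM stand-in and synthetic data; the paper's combination scheme may differ in detail.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)
# toy stand-in for speech feature vectors
X = np.vstack([rng.normal(m, 1.0, size=(80, 8)) for m in (-2.0, 2.0)])

# run clustering several times with different seeds and concatenate
# each run's posteriorgram (frame-wise component posteriors)
posteriors = []
for seed in (0, 1, 2):
    m = BayesianGaussianMixture(
        n_components=6, weight_concentration_prior_type="dirichlet_process",
        random_state=seed, max_iter=200).fit(X)
    posteriors.append(m.predict_proba(X))
combined = np.hstack(posteriors)
print(combined.shape)  # (160, 18)
```

Because each run's posteriors sum to one per frame, the combined vector sums to the number of runs; downstream distance metrics for ABX evaluation would operate on these concatenated features.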
ieee automatic speech recognition and understanding workshop | 2015
Quoc Truong Do; Michael Heck; Sakriani Sakti; Graham Neubig; Tomoki Toda; Satoshi Nakamura
The Multi-Genre Broadcast challenge is an official challenge of the IEEE Automatic Speech Recognition and Understanding Workshop. This paper presents NAIST's contribution to the first edition of this challenge. The presented speech-to-text system for English makes use of various front-ends (e.g., MFCC, i-vector and FBANK features), DNN acoustic models and several language models for decoding and rescoring (n-gram, RNNLM). Subsets of the training data of varying sizes were evaluated with respect to the overall training quality. Two speech segmentation systems were developed for the challenge, based on DNNs and GMM-HMMs. Recognition was performed in three stages: decoding, lattice rescoring and system combination. This paper focuses on the system combination experiments and presents a rank-score based system weighting approach, which gave better performance than a standard system combination strategy. The DNN-based ASR system trained on MFCC + i-vector features with the sMBR training criterion gives the best performance at 27.8% WER and thus significantly outperforms the baseline sMBR DNN-HMM, which yields 33.7% WER.
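The abstract does not spell out the rank-score weighting formula, but the general idea of weighting systems by their rank rather than by raw scores can be illustrated as follows; the inverse-rank formula here is a hypothetical choice for the sketch, not the paper's.

```python
def rank_weights(n_systems):
    """Hypothetical rank-based weights for systems ordered best to worst.

    Weight is proportional to the inverse rank and normalized to sum to 1,
    so better-ranked systems dominate the combination.
    """
    raw = [1.0 / (rank + 1) for rank in range(n_systems)]
    total = sum(raw)
    return [w / total for w in raw]

print(rank_weights(3))
```

In a combination framework such weights would scale each system's votes or confidence scores before merging hypotheses.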
spoken language technology workshop | 2016
Michael Heck; Sakriani Sakti; Satoshi Nakamura
In this paper we propose a framework for building a full-fledged acoustic unit recognizer in a zero resource setting, i.e., without any provided labels. For that, we combine an iterative Dirichlet process Gaussian mixture model (DPGMM) clustering framework with a standard pipeline for supervised GMM-HMM acoustic model (AM) and n-gram language model (LM) training, enhanced by a scheme for iterative model re-training. We use the DPGMM to cluster feature vectors into a dynamically sized set of acoustic units. The frame-based class labels serve as transcriptions of the audio data and are used as input to the AM and LM training pipeline. We show that iterative unsupervised re-training of this DPGMM-HMM acoustic unit recognizer improves performance on an ABX sound class discriminability task. Our results show that the learned models generalize well and that sound class discriminability benefits from the contextual information introduced by the language model. Our systems are competitive with phone recognizers trained with supervision, and can beat the baseline set by DPGMM clustering.
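The iterative re-training loop can be caricatured in a few lines. Below, scikit-learn's BayesianGaussianMixture stands in for the DPGMM and a Gaussian naive Bayes classifier stands in for the GMM-HMM acoustic model; both substitutions and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
# toy stand-in for speech feature vectors
X = np.vstack([rng.normal(m, 1.0, size=(60, 6)) for m in (-3.0, 0.0, 3.0)])

# initial pseudo transcriptions from Dirichlet-process-like clustering
labels = BayesianGaussianMixture(
    n_components=8, weight_concentration_prior_type="dirichlet_process",
    random_state=0, max_iter=200).fit_predict(X)

# iterative re-training: fit a model on the current labels, then re-label
# the data with that model and repeat
for _ in range(3):
    model = GaussianNB().fit(X, labels)
    labels = model.predict(X)
print(len(set(labels)))
```

In the actual framework each iteration trains full GMM-HMM acoustic and n-gram language models on the current frame labels and re-decodes the audio to obtain the next labeling.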
Proceedings of the 2010 international workshop on Searching spontaneous conversational speech | 2010
Sebastian Stüker; Michael Heck; Katja Renner; Alex Waibel
In this paper we present our work on expanding the View4You system developed at the Interactive Systems Laboratories (ISL). The View4You system allows the user to retrieve automatically found news clips from recorded German broadcast news via natural spoken queries. While modular in design, the architecture has so far required the components to run at least in a common file space. By utilizing Flash technology we turned this single-machine setup into a distributed setup that gives us access to our news database over the World Wide Web. The client side of our architecture requires only a web browser with a Flash extension in order to record and send the speech of the queries to the servers and to display the retrieved news clips. Our future work will focus on turning the monolingual German system into a multilingual system that provides cross-lingual access and retrieval in multiple languages.
IWSLT | 2012
Christian Saam; Christian Mohr; Kevin Kilgour; Michael Heck; Matthias Sperber; Keigo Kubo; Sebastian Stüker; Sakriani Sakti; Graham Neubig; Tomoki Toda; Satoshi Nakamura; Alex Waibel
IWSLT | 2012
Michael Heck; Keigo Kubo; Matthias Sperber; Sakriani Sakti; Christian Saam; Kevin Kilgour; Christian Mohr; Graham Neubig; Tomoki Toda; Satoshi Nakamura; Alex Waibel
IEICE Transactions on Information and Systems | 2018
Michael Heck; Sakriani Sakti; Satoshi Nakamura