Publications


Featured research published by Kevin Kilgour.


IEEE Automatic Speech Recognition and Understanding Workshop | 2013

Models of tone for tonal and non-tonal languages

Florian Metze; Zaid A. W. Sheikh; Alex Waibel; Jonas Gehring; Kevin Kilgour; Quoc Bao Nguyen; Van Huy Nguyen

Conventional wisdom in automatic speech recognition asserts that pitch information is not helpful in building speech recognizers for non-tonal languages and contributes only modestly to performance in speech recognizers for tonal languages. To maintain consistency between different systems, pitch is therefore often ignored, trading the slight performance benefits for greater system uniformity and simplicity. In this paper, we report results that challenge this conventional approach. We present new models of tone that deliver consistent performance improvements for tonal languages (Cantonese, Vietnamese) and even modest improvements for non-tonal languages. Using neural networks for feature integration and fusion, these models achieve significant gains throughout, and provide us with system uniformity and standardization across all languages, tonal and non-tonal.
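
As a hedged illustration of the feature integration described above, the sketch below concatenates pitch-related values with spectral features and feeds them to a small feed-forward network; all dimensions, feature choices and layer sizes are assumptions for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Illustrative dimensions: 13 MFCCs plus 3 pitch-related values (F0, delta-F0,
# voicing) per frame, stacked over an 11-frame context window.
n_mfcc, n_pitch, context = 13, 3, 11
input_dim = (n_mfcc + n_pitch) * context
n_targets = 1000  # number of acoustic-model target states (illustrative)

# Feed-forward fusion network: pitch and spectral features are concatenated
# at the input and integrated by the hidden layers.
fusion_net = nn.Sequential(
    nn.Linear(input_dim, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, n_targets),
)

frames = torch.randn(32, input_dim)                       # one toy batch
state_posteriors = torch.log_softmax(fusion_net(frames), dim=-1)
```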


IEEE International Conference on High Performance Computing, Data and Analytics | 2011

Quaero Speech-to-Text and Text Translation Evaluation Systems

Sebastian Stüker; Kevin Kilgour; Jan Niehues

Our laboratory has used the HP XC4000, the high performance computer of the federal state of Baden-Württemberg, in order to participate in the second Quaero evaluation for automatic speech recognition (ASR) and machine translation (MT). State-of-the-art automatic speech recognition and machine translation systems use stochastic models that are trained on large amounts of training data using techniques from the field of machine learning. Using these techniques, the systems search for the most likely speech recognition hypothesis or translation hypothesis, respectively.
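
The "most likely hypothesis" search mentioned above follows the standard Bayes decision rule of statistical ASR and MT; the textbook formulation below is added for orientation and is not taken from the paper itself.

```latex
% ASR: choose the word sequence W that maximizes the posterior given the
% acoustic observations X, factored into acoustic model and language model.
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid X)
        = \operatorname*{arg\,max}_{W} \, p(X \mid W)\, P(W)

% MT: choose the target sentence e that maximizes the posterior given the
% source sentence f.
\hat{e} = \operatorname*{arg\,max}_{e} P(e \mid f)
```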


The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF) | 2013

Optimizing deep bottleneck feature extraction

Quoc Bao Nguyen; Jonas Gehring; Kevin Kilgour; Alex Waibel

We investigate several optimizations to a recently published architecture for extracting bottleneck features for large-vocabulary speech recognition with deep neural networks. We are able to improve recognition performance of first-pass systems from a 12% relative word error rate reduction reported previously to 21%, compared to MFCC baselines on a Tagalog conversational telephone speech corpus. This is achieved by using different input features, training the network to predict context-dependent targets, employing an efficient learning rate schedule and varying several architectural details. Evaluations on two larger German and French speech transcription tasks show that the optimizations proposed are universally applicable and yield comparable gains on other corpora (19.9% and 22.8%, respectively).
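
A minimal sketch of a deep bottleneck feature extractor may help place the optimizations above: several wide hidden layers, a narrow bottleneck layer whose activations are used as features, and a classification head trained against context-dependent targets. Layer sizes, activations and target counts below are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class BottleneckExtractor(nn.Module):
    """Generic deep bottleneck network (illustrative dimensions)."""
    def __init__(self, input_dim=440, bottleneck_dim=42, n_targets=6000):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim, 1200), nn.Tanh(),
            nn.Linear(1200, 1200), nn.Tanh(),
            nn.Linear(1200, bottleneck_dim),      # narrow bottleneck layer
        )
        self.head = nn.Sequential(
            nn.Tanh(),
            nn.Linear(bottleneck_dim, 1200), nn.Tanh(),
            nn.Linear(1200, n_targets),           # context-dependent targets
        )

    def forward(self, x):
        bottleneck = self.body(x)       # these activations become the features
        logits = self.head(bottleneck)  # used only while training the network
        return bottleneck, logits

net = BottleneckExtractor()
features, logits = net(torch.randn(8, 440))
```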


IEEE International Conference on High Performance Computing, Data and Analytics | 2012

Quaero 2010 Speech-to-Text Evaluation Systems

Sebastian Stüker; Kevin Kilgour; Florian Kraft

Quaero is a French program with German participation, within which KIT is also working on the problem of Automatic Speech Recognition for audio data from various sources on the World Wide Web. In this paper we describe the development of our English and German speech recognition systems for the 2010 Quaero evaluation, for which, at least in part, we have utilized the XC4000 HPC cluster at KIT. Both recognition systems were trained with the help of the Janus Recognition Toolkit developed at the Interactive Systems Laboratory, and both are expansions of the 2009 evaluation systems. Both systems use various front-ends, state-of-the-art acoustic models that include discriminative training, and very large language models which require the use of shared memory. Both systems also make use of domain-specific acoustic and language model training material which became available for the 2010 evaluation. In total, the expansion of the systems and the addition of domain-dependent training material led to significantly improved performance over the 2009 systems.


International Conference on Speech and Computer | 2013

Segmentation of Telephone Speech Based on Speech and Non-speech Models

Michael Heck; Christian Mohr; Sebastian Stüker; Markus Müller; Kevin Kilgour; Jonas Gehring; Quoc Bao Nguyen; Van Huy Nguyen; Alex Waibel

In this paper we investigate the automatic segmentation of recorded telephone conversations based on models for speech and non-speech to find sentence-like chunks for use in speech recognition systems. Presented are two different approaches, based on Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs), respectively. The proposed methods provide segmentations that allow for competitive speech recognition performance in terms of word error rate (WER) compared to manual segmentation.
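
For the GMM-based variant, a minimal sketch is given below: one mixture model per class fitted on frame-level features, a frame-wise log-likelihood comparison, and a simple majority-vote smoothing to obtain contiguous segments. Feature choice, model sizes and smoothing window are assumptions for illustration only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_models(speech_feats, nonspeech_feats, n_components=16):
    """Fit one GMM per class on frame-level features (e.g. MFCC frames)."""
    gmm_speech = GaussianMixture(n_components).fit(speech_feats)
    gmm_nonspeech = GaussianMixture(n_components).fit(nonspeech_feats)
    return gmm_speech, gmm_nonspeech

def segment(feats, gmm_speech, gmm_nonspeech, win=25):
    """Label each frame by the log-likelihood ratio of the two models, then
    smooth the frame decisions with a sliding majority vote."""
    llr = gmm_speech.score_samples(feats) - gmm_nonspeech.score_samples(feats)
    raw = (llr > 0).astype(float)
    smoothed = np.convolve(raw, np.ones(win) / win, mode="same") > 0.5
    return smoothed  # True = speech frame, False = non-speech frame
```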


Conference of the International Speech Communication Association | 2016

Dynamic Transcription for Low-Latency Speech Translation

Jan Niehues; Thai Son Nguyen; Eunah Cho; Thanh-Le Ha; Kevin Kilgour; Markus Müller; Matthias Sperber; Sebastian Stüker; Alex Waibel

Latency is one of the main challenges in the task of simultaneous spoken language translation. While significant improvements in recent years have led to high-quality automatic translations, their usefulness in real-time settings is still severely limited due to the large delay between the input speech and the delivered translation. In this paper, we present a novel scheme which drastically reduces the latency of a large-scale speech translation system. Within this scheme, the transcribed text and its translation can be updated when more context is available, even after they are presented to the user. Thereby, this scheme allows us to display an initial transcript and its translation to the user with a very low latency. If necessary, both transcript and translation can later be updated to better, more accurate versions until eventually the final versions are displayed. Using this framework, we are able to reduce the latency of the source language transcript by half. For the translation, an average delay of 3.3s was achieved, which is more than twice as fast as our initial system.
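
The updating display described above can be thought of as a small protocol: partial hypotheses are pushed to the user immediately and later revised in place once more right context changes the decoding. The message format and helper below are hypothetical, sketched only to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass
class SegmentUpdate:
    """One display update: a segment id plus its current best text.
    Re-sending an id with new text overwrites the earlier version."""
    segment_id: int
    text: str
    is_final: bool

def render(display, update):
    # Low latency: show the first hypothesis immediately; later updates for
    # the same segment replace it instead of being appended.
    display[update.segment_id] = update.text
    return " ".join(display[k] for k in sorted(display))

display = {}
print(render(display, SegmentUpdate(0, "we can reduce the late", False)))
print(render(display, SegmentUpdate(0, "we can reduce the latency", True)))
```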


International Conference on Acoustics, Speech, and Signal Processing | 2013

Warped Minimum Variance Distortionless Response based bottle neck features for LVCSR

Kevin Kilgour; Igor Tseyzer; Quoc Bao Nguyen; Alex Waibel

This paper presents the results of our experiments on bottleneck features applied to a wMVDR (Warped Minimum Variance Distortionless Response) front-end. We examine how best to optimize wMVDR-BNF features and wMVDR combined with MFCC bottleneck features (wMVDR+MFCC-BNF). Our wMVDR+MFCC-BNF front-end improves a single-pass system to 18.1% word error rate, compared to 18.7% for an MFCC-BNF system and 20.7% for an MFCC system, tested on the Quaero 2010 German evaluation set. When used in a system combination, our wMVDR-BNF and wMVDR+MFCC-BNF systems reduced the overall WER from 14.3% to 13.3% on the IWSLT 2010 test set while at the same time reducing the number of systems needed from 9 to 5. Our result of 11.9% on the 2012 IWSLT test set is better than the best result submitted during the evaluation campaign.


International Conference on Speech and Computer | 2014

A Neural Network Keyword Search System for Telephone Speech

Kevin Kilgour; Alex Waibel

In this paper we propose a pure “neural network” (NN) based keyword search system developed in the IARPA Babel program for conversational telephone speech. Using a common keyword search evaluation metric, “actual term weighted value” (ATWV), we demonstrate that our NN keyword search system can achieve performance similar to that of a comparable but more complex and slower “hybrid deep neural network - hidden Markov model” (DNN-HMM hybrid) based speech recognition system without using either an HMM decoder or a language model.
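
ATWV is the standard metric of the Babel keyword-search evaluations; the sketch below follows its usual definition (per-keyword miss and false-alarm probabilities, weighted by a constant beta of roughly 1000), with simplified counting assumed for illustration.

```python
def atwv(results, total_speech_seconds, beta=999.9):
    """results maps each keyword to (n_correct, n_false_alarm, n_reference).
    ATWV = 1 - mean over keywords of (P_miss + beta * P_false_alarm)."""
    terms = []
    for n_correct, n_false_alarm, n_reference in results.values():
        if n_reference == 0:
            continue  # keywords with no reference occurrences are skipped
        p_miss = 1.0 - n_correct / n_reference
        # Non-target trials approximated as one per second of speech minus
        # the true occurrences, as in the usual NIST-style definition.
        p_false_alarm = n_false_alarm / (total_speech_seconds - n_reference)
        terms.append(p_miss + beta * p_false_alarm)
    return 1.0 - sum(terms) / len(terms)

print(atwv({"hello": (8, 2, 10), "world": (3, 0, 5)}, total_speech_seconds=36000))
```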


IEEE-RAS International Conference on Humanoid Robots | 2010

Towards social integration of humanoid robots by conversational concept learning

Florian Kraft; Kevin Kilgour; Rainer Saam; Sebastian Stüker; Matthias Wölfel; Tamim Asfour; Alex Waibel

Several real-world applications of humanoid robots will require continuous service over a long time period. A humanoid robot operating in different environments over a long period of time means that (a) there will be a lot of variation in the speech it has to ground semantically, and (b) it has to know when a conversation is of interest in order to respond.


International Conference on Human-Computer Interaction | 2015

Using Neural Networks for Data-Driven Backchannel Prediction: A Survey on Input Features and Training Techniques

Markus Müller; David Leuschner; Lars Briem; Maria Schmidt; Kevin Kilgour; Sebastian Stüker; Alex Waibel

In order to make human-computer interaction more social, the use of supporting backchannel cues can be beneficial. Such cues can be delivered through different channels like vision, speech or gestures. In this work, we focus on the prediction of acoustic backchannels in terms of speech. Previously, this prediction has been accomplished using rule-based approaches, but like every rule-based implementation, they depend on a fixed set of handwritten rules which have to be changed every time the mechanism is adjusted or different data is used. In this paper we want to overcome these limitations by making use of recent advancements in the field of machine learning. We show that backchannel predictions can be generated by means of a neural network based approach. Such a method has the advantage of depending only on the training data, without the need for handwritten rules.
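
A data-driven replacement for handwritten backchannel rules can be sketched as a small classifier over acoustic features of the interlocutor's recent speech; the synthetic features, window size and classifier below are illustrative assumptions rather than the setup used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for acoustic feature windows (e.g. pitch and energy
# contours over the last two seconds, flattened into one vector each).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 200))        # feature windows
y = rng.integers(0, 2, size=5000)       # 1 = backchannel opportunity

# A small feed-forward network takes the place of handwritten rules;
# adapting to new data means retraining instead of rewriting rules.
clf = MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=50).fit(X, y)
backchannel_prob = clf.predict_proba(X[:3])[:, 1]   # probability per window
```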

Collaboration


Dive into Kevin Kilgour's collaboration.

Top Co-Authors

Alex Waibel (Karlsruhe Institute of Technology)
Sebastian Stüker (Karlsruhe Institute of Technology)
Christian Mohr (Karlsruhe Institute of Technology)
Christian Saam (Karlsruhe Institute of Technology)
Jonas Gehring (Karlsruhe Institute of Technology)
Matthias Sperber (Karlsruhe Institute of Technology)
Florian Kraft (Karlsruhe Institute of Technology)
Jan Niehues (Karlsruhe Institute of Technology)
Quoc Bao Nguyen (Karlsruhe Institute of Technology)
Eunah Cho (Karlsruhe Institute of Technology)