Florian Kraft
Karlsruhe Institute of Technology
Publications
Featured research published by Florian Kraft.
IEEE Transactions on Robotics | 2007
Rainer Stiefelhagen; Hazim Kemal Ekenel; Christian Fügen; Petra Gieselmann; Hartwig Holzapfel; Florian Kraft; Kai Nickel; Michael Voit; Alex Waibel
In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the recognition of a person's head orientation. Each of the components is described in the paper and experimental results are presented. We also present several experiments on multimodal human-robot interaction, such as interaction using speech and gestures, the automatic determination of the addressee during human-human-robot interaction, as well as interactive learning of dialogue strategies. The work and the components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human-robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.
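One of the components mentioned above, addressee determination, can be illustrated with a minimal sketch: fuse speech activity with the estimated head orientation and treat an utterance as robot-directed when the speaker is roughly facing the robot. This is an illustrative assumption, not the paper's implementation; the data format, function names, and angle threshold are made up for the example.

```python
# Minimal sketch (not the authors' system): decide whether an utterance is
# addressed to the robot by combining speech activity with head orientation.
# The 30-degree threshold and the Observation format are assumptions.
from dataclasses import dataclass

@dataclass
class Observation:
    speaking: bool        # voice activity detected for this person
    head_yaw_deg: float   # estimated head orientation relative to the robot (0 = facing it)

def addressed_to_robot(obs: Observation, yaw_threshold_deg: float = 30.0) -> bool:
    """Classify an utterance as robot-directed when the speaker is active
    and roughly facing the robot."""
    return obs.speaking and abs(obs.head_yaw_deg) <= yaw_threshold_deg

if __name__ == "__main__":
    print(addressed_to_robot(Observation(speaking=True, head_yaw_deg=12.0)))  # True
    print(addressed_to_robot(Observation(speaking=True, head_yaw_deg=75.0)))  # False
```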
International Conference on Machine Learning | 2006
Christian Fügen; Shajith Ikbal; Florian Kraft; Kenichi Kumatani; Kornel Laskowski; John W. McDonough; Mari Ostendorf; Sebastian Stüker; Matthias Wölfel
This paper describes the 2006 lecture and conference meeting speech-to-text system developed at the Interactive Systems Laboratories (ISL) for the individual head-mounted microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions, which was evaluated in the RT-06S Rich Transcription Meeting Evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely improved acoustic and language models, cross adaptation between systems with different front-ends and phoneme sets, and the use of various automatic speech segmentation algorithms.
IEEE International Conference on High Performance Computing, Data and Analytics | 2012
Sebastian Stüker; Kevin Kilgour; Florian Kraft
Quaero is a French program with German participation, within which KIT is also working on the problem of automatic speech recognition for audio data from various sources on the World Wide Web. In this paper we describe the development of our English and German speech recognition systems for the 2010 Quaero evaluation, for which, at least in part, we have utilized the XC4000 HPC cluster at KIT. Both recognition systems were trained with the help of the Janus Recognition Toolkit developed at the Interactive Systems Laboratory, and both are expansions of the 2009 evaluation systems. Both systems use various front-ends, state-of-the-art acoustic models that include discriminative training, and very large language models which require the use of shared memory. Both systems also make use of domain-specific acoustic and language model training material which became available for the 2010 evaluation. In total, the expansion of the systems and the addition of domain-dependent training material led to significantly improved performance over the 2009 systems.
International Conference on Acoustics, Speech, and Signal Processing | 2007
Daniel Gärtner; Florian Kraft; Thomas Schaaf
Nowadays, a large part of all music ever recorded is digitally available, and thanks to MP3, tens of thousands of songs can already be carried around on a mobile device. Intelligent automatic song selection is increasingly needed as an alternative to random selection or manual playlist generation. We propose a system that generates playlists including songs similar to accepted ones and discarding songs similar to rejected ones, where similarity refers to timbre. Additional adaptivity is achieved with a user-adaptive distance function, which in our case requires modeling features separately. After a seed song (which is the first accepted song) is given by the user, the distance function is used by a song selection strategy to select songs. Minimal user feedback is collected with a skip button that is pressed to jump directly to the next song and explicitly reject the current one, while acceptance is implicitly given by listening to a song.
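The selection idea can be sketched in a few lines: prefer candidates close (in timbre-feature space) to accepted songs and far from skipped ones. This is only an illustration under assumed inputs (precomputed feature vectors, a plain Euclidean distance, a simple nearest-accepted-minus-nearest-rejected score); it is not the paper's adaptive distance function or exact selection strategy.

```python
# Illustrative playlist-selection sketch; features, distance, and scoring
# rule are assumptions, not the paper's user-adaptive distance function.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def next_song(candidates, accepted, rejected):
    """Pick the unplayed song closest to accepted songs and far from skipped ones."""
    def score(features):
        d_acc = min(euclidean(features, a) for a in accepted)
        d_rej = min((euclidean(features, r) for r in rejected), default=0.0)
        return d_acc - d_rej  # lower is better
    return min(candidates, key=lambda item: score(item[1]))

if __name__ == "__main__":
    library = {
        "seed":   [0.10, 0.20],   # first accepted song
        "song_a": [0.15, 0.22],
        "song_b": [0.90, 0.80],   # skipped via the skip button
        "song_c": [0.85, 0.75],
    }
    accepted = [library["seed"]]
    rejected = [library["song_b"]]
    candidates = [(n, f) for n, f in library.items() if n in ("song_a", "song_c")]
    print(next_song(candidates, accepted, rejected)[0])  # song_a
```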
Spoken Language Technology Workshop | 2008
Matthias Wölfel; Muntsin Kolss; Florian Kraft; Jan Niehues; Matthias Paulik; Alex Waibel
An increasingly globalized world fosters the exchange of students, researchers, and employees. As a result, situations in which people of different native tongues are listening to the same lecture become more and more frequent. In many such situations, human interpreters are prohibitively expensive or simply not available. For this reason, and because first prototypes have already demonstrated the feasibility of such systems, automatic translation of lectures is receiving increasing attention. A large vocabulary and strong variations in speaking style make lecture translation a challenging, though not hopeless, task. The scope of this paper is to investigate a variety of challenges and to highlight possible solutions in building a system for simultaneous translation of lectures from German to English. While some of the investigated challenges are more general, e.g. environment robustness, other challenges are more specific to this particular task, e.g. pronunciation of foreign words or sentence segmentation. We also report our progress in building an end-to-end system and analyze its performance in terms of objective and subjective measures.
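To make the pipeline structure concrete, here is a minimal streaming sketch: an incoming word stream is cut into segments (standing in for the sentence-segmentation step the abstract highlights) and each segment is handed to a translation stage. The segmentation rule, thresholds, and the stub translation function are assumptions for illustration only, not components of the system described in the paper.

```python
# Minimal sketch of a streaming ASR -> segmentation -> MT pipeline.
# The length/punctuation segmenter and the MT stub are placeholders.
from typing import Iterable, Iterator, List

def segment(words: Iterable[str], max_len: int = 8) -> Iterator[List[str]]:
    """Emit a segment at sentence-final punctuation or after max_len words."""
    buf: List[str] = []
    for w in words:
        buf.append(w)
        if w.endswith((".", "?", "!")) or len(buf) >= max_len:
            yield buf
            buf = []
    if buf:
        yield buf

def translate(segment_words: List[str]) -> str:
    """Placeholder for the German-to-English MT component."""
    return "<en> " + " ".join(segment_words)

if __name__ == "__main__":
    asr_stream = "heute sprechen wir über maschinelle übersetzung .".split()
    for seg in segment(asr_stream):
        print(translate(seg))
```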
Lecture Notes in Computer Science | 2008
Matthias Wölfel; Sebastian Stüker; Florian Kraft
This paper describes the 2007 meeting speech-to-text system for lecture rooms developed at the Interactive Systems Laboratories (ISL) for the multiple distant microphone condition, which has been evaluated in the RT-07 Rich Transcription Meeting Evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely the use of a signal-adaptive front-end (realized by warped-twice minimum variance distortionless response spectral estimation), improved acoustic models (including maximum mutual information estimation) and language models, cross adaptation between systems which differ in the front-end as well as the phoneme set, the use of a discriminative criterion instead of the signal-to-noise ratio for the selection of the channel to be used, and the use of decoder-based speech segmentation.
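The channel-selection step can be pictured as scoring each distant-microphone channel and keeping the best one. The sketch below uses a crude energy-based SNR proxy purely as a stand-in scoring function; the discriminative criterion the paper actually uses is not reproduced here, and the noise-floor constant is an assumption.

```python
# Sketch of channel selection among multiple distant microphones: score each
# channel, keep the argmax. The SNR-style score is a stand-in, not the
# discriminative criterion described in the paper.
import numpy as np

def snr_estimate(signal: np.ndarray, noise_floor: float = 1e-4) -> float:
    """Crude SNR proxy: frame energy relative to an assumed noise floor (dB)."""
    energy = float(np.mean(signal ** 2))
    return 10.0 * np.log10(max(energy, noise_floor) / noise_floor)

def select_channel(channels, score=snr_estimate) -> int:
    """Return the index of the channel with the highest score."""
    return int(np.argmax([score(ch) for ch in channels]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quiet = 0.01 * rng.standard_normal(16000)
    loud = 0.10 * rng.standard_normal(16000)
    print(select_channel([quiet, loud]))  # 1
```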
IEEE-RAS International Conference on Humanoid Robots | 2010
Florian Kraft; Kevin Kilgour; Rainer Saam; Sebastian Stüker; Matthias Wölfel; Tamim Asfour; Alex Waibel
Many real-world applications of humanoid robots will require continuous service over a long time period. A humanoid robot operating in different environments over a long period of time means that A) there will be a lot of variation in the speech it has to ground semantically, and B) it has to know when a conversation is of interest in order to respond.
Intelligent Robots and Systems | 2007
Florian Kraft; Matthias Wölfel
Automatic speech recognition on a humanoid robot is exposed to numerous known noises produced by the robot's own motion system as well as to background noises such as fans. These noises interfere with the target speech through an unknown transfer function and at high distortion levels, since some noise sources may be closer to the robot's microphones than the target speech sources. In this paper we show how to remedy those distortions with a speech feature enhancement technique based on recently proposed particle filters. A significant increase in recognition accuracy was achieved at different distances for both engine and background noises.
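For orientation, the following is a generic bootstrap particle filter tracking a scalar quantity through noisy observations; it only illustrates the class of estimator the abstract refers to. It is not the authors' speech-feature enhancement model, and the random-walk state model, Gaussian observation model, and all noise parameters are assumptions.

```python
# Generic bootstrap particle filter on a scalar random-walk state, shown as
# an illustration only; parameters and models are assumptions.
import numpy as np

def particle_filter(observations, n_particles=500, proc_std=0.1, obs_std=0.5, seed=0):
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # Propagate particles with the random-walk process model.
        particles = particles + rng.normal(0.0, proc_std, n_particles)
        # Weight by the Gaussian observation likelihood and normalize.
        weights = np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        weights /= weights.sum()
        estimates.append(float(np.sum(weights * particles)))
        # Resample to avoid weight degeneracy.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return estimates

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    clean = np.sin(np.linspace(0, 3, 50))            # hidden "clean" feature track
    noisy = clean + rng.normal(0, 0.5, clean.shape)  # distorted observations
    print(np.round(particle_filter(noisy)[-5:], 2))
```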
International Conference on Spoken Language Processing | 2006
Szu-Chen Stan Jou; Tanja Schultz; Matthias Walliczek; Florian Kraft; Alex Waibel
Conference of the International Speech Communication Association | 2005
Florian Kraft; Robert G. Malkin; Thomas Schaaf; Alex Waibel