
Publication


Featured research published by Christian Fügen.


IEEE Automatic Speech Recognition and Understanding Workshop | 2001

A one-pass decoder based on polymorphic linguistic context assignment

Hagen Soltau; Florian Metze; Christian Fügen; Alex Waibel

In this study, we examine how fast decoding of conversational speech with large vocabularies profits from efficient use of linguistic information, i.e., language models and grammars. Based on a re-entrant single pronunciation prefix tree, we use the concept of linguistic context polymorphism to allow an early incorporation of language model information. This approach allows us to use all available language model information in a one-pass decoder, using the same engine to decode with statistical n-gram language models as well as with context-free grammars, or to rescore lattices, in an efficient way. We compare this approach to our previous decoder, which needed three passes to incorporate all available information. The results on a very large vocabulary task show that the search can be sped up by almost a factor of three without introducing additional search errors.
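
The central mechanism, a shared pronunciation prefix tree in which every active hypothesis carries its own language-model history so that LM scores can be applied as soon as a word identity is known, can be illustrated compactly. The following is a minimal sketch of that idea under toy assumptions (a dictionary-based tree, a hand-written bigram table, no acoustic scoring); it is not the Ibis decoder itself.

```python
# Toy sketch of one-pass decoding over a shared pronunciation prefix tree.
# Each token carries its own LM history ("linguistic context polymorphism"),
# so the bigram score is applied the moment a word end is reached, with no
# later rescoring passes. Lexicon, LM, and the absence of acoustic scores
# are illustrative simplifications.
LEXICON = {"a": ["AH"], "cat": ["K", "AE", "T"], "can": ["K", "AE", "N"]}
BIGRAM = {("<s>", "a"): -0.4, ("a", "cat"): -0.7, ("a", "can"): -1.2}

def lm_score(prev, word):
    return BIGRAM.get((prev, word), -5.0)  # floor for unseen bigrams

def build_tree():
    tree = {}
    for word, phones in LEXICON.items():
        node = tree
        for p in phones:
            node = node.setdefault(p, {})
        node.setdefault("$words", []).append(word)  # word-end marker
    return tree

def decode(phones):
    tree = build_tree()
    # A token is (tree_node, lm_history, words_so_far, log_score).
    tokens = [(tree, "<s>", [], 0.0)]
    for ph in phones:
        new_tokens = []
        for node, hist, words, score in tokens:
            child = node.get(ph)
            if child is None:
                continue  # token dies: phone not in this subtree
            for w in child.get("$words", []):
                # Word end: apply the LM for THIS token's history right away
                # and re-enter the shared tree at its root.
                new_tokens.append(
                    (tree, w, words + [w], score + lm_score(hist, w)))
            if any(k != "$words" for k in child):
                new_tokens.append((child, hist, words, score))
        tokens = new_tokens
    finished = [t for t in tokens if t[0] is tree]  # tokens at a word boundary
    return max(finished, key=lambda t: t[3])[2]

print(decode(["AH", "K", "AE", "T"]))  # -> ['a', 'cat']
```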


IEEE/RSJ International Conference on Intelligent Robots and Systems | 2004

Natural human-robot interaction using speech, head pose and gestures

Rainer Stiefelhagen; Christian Fügen; R. Gieselmann; Hartwig Holzapfel; Kai Nickel; Alex Waibel

In this paper we present our ongoing work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes the recognition of pointing gestures as well as the recognition of a person's head orientation. Each of the components is described in the paper, and experimental results are presented. In order to demonstrate and measure the usefulness of such technologies for human-robot interaction, all components have been integrated on a mobile robot platform and have been used for real-time human-robot interaction in a kitchen scenario.
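
As one concrete example of the visual perception components listed above, a pointing gesture can be resolved to a target object by comparing the pointing direction with the direction from the hand to each candidate object. The sketch below is a toy illustration with made-up object coordinates, not the authors' gesture recognizer.

```python
# Toy sketch of deictic-reference resolution: choose the object whose
# direction from the hand best matches the estimated pointing direction.
# Object names and coordinates are made up for illustration.
import math

def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return math.acos(max(-1.0, min(1.0, dot / (norm(u) * norm(v)))))

def resolve_pointing(hand, direction, objects):
    """hand: 3D hand position; direction: pointing vector;
    objects: name -> 3D position. Returns the best-matching object name."""
    def deviation(name):
        ray = [p - h for p, h in zip(objects[name], hand)]
        return angle_between(direction, ray)
    return min(objects, key=deviation)

objects = {"cup": (1.0, 0.2, 0.0), "fridge": (-1.0, 1.5, 0.0)}
print(resolve_pointing((0.0, 0.0, 0.0), (1.0, 0.25, 0.0), objects))  # -> cup
```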


IEEE Transactions on Robotics | 2007

Enabling Multimodal Human–Robot Interaction for the Karlsruhe Humanoid Robot

Rainer Stiefelhagen; Hazim Kemal Ekenel; Christian Fügen; Petra Gieselmann; Hartwig Holzapfel; Florian Kraft; Kai Nickel; Michael Voit; Alex Waibel

In this paper, we present our work in building technologies for natural multimodal human-robot interaction. We present our systems for spontaneous speech recognition, multimodal dialogue processing, and visual perception of a user, which includes localization, tracking, and identification of the user, recognition of pointing gestures, as well as the recognition of a person's head orientation. Each of the components is described in the paper, and experimental results are presented. We also present several experiments on multimodal human-robot interaction, such as interaction using speech and gestures, the automatic determination of the addressee during human-human-robot interaction, and the interactive learning of dialogue strategies. The work and the components presented here constitute the core building blocks for audiovisual perception of humans and multimodal human-robot interaction used for the humanoid robot developed within the German research project (Sonderforschungsbereich) on humanoid cooperative robots.
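
The addressee-determination experiments mentioned above build on the observation that speakers tend to orient their head towards the party they address. A minimal sketch of that idea follows; the cone width and frame fraction are illustrative assumptions, not values from the paper.

```python
# Toy sketch of head-pose-based addressee detection: the robot counts as
# addressed when the speaker's head yaw stays inside a cone around the
# robot's direction for most of the utterance. Thresholds are illustrative.
def robot_is_addressed(head_yaw_deg, robot_yaw_deg=0.0,
                       cone_deg=15.0, min_fraction=0.6):
    """head_yaw_deg: per-frame head yaw estimates over one utterance."""
    inside = [abs(y - robot_yaw_deg) <= cone_deg for y in head_yaw_deg]
    return sum(inside) / len(inside) >= min_fraction

print(robot_is_addressed([2.0, -5.0, 8.0, 40.0]))  # 3 of 4 frames -> True
```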


Machine Translation | 2007

Simultaneous translation of lectures and speeches

Christian Fügen; Alex Waibel; Muntsin Kolss

With increasing globalization, communication across language and cultural boundaries is becoming an essential requirement of doing business, delivering education, and providing public services. Due to the considerable cost of human translation services, only a small fraction of text documents and an even smaller percentage of spoken encounters, such as international meetings and conferences, are translated, with most resorting to the use of a common language (e.g., English) or not taking place at all. Technology may provide a potentially revolutionary way out if real-time, domain-independent, simultaneous speech translation can be realized. In this paper, we present a simultaneous speech translation system based on statistical recognition and translation technology. We discuss the technology and various system improvements, and propose mechanisms for user-friendly delivery of the result. Through extensive component and end-to-end system evaluations and comparisons with human translation performance, we conclude that machines can already deliver comprehensible simultaneous translation output. Moreover, while machine performance is affected by recognition errors (and thus can be improved), human performance is limited by the cognitive challenge of performing the task in real time.
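
Architecturally, the system is a cascade of streaming components: the recognizer emits words continuously, the word stream is segmented into translatable chunks at detected boundaries rather than at sentence ends, and each chunk is translated as soon as it closes, which keeps latency bounded. The following toy sketch illustrates this flow with stub ASR and MT components; the marker token and function names are assumptions, not the paper's interfaces.

```python
# Toy sketch of the cascade: stream recognizer output, cut it into chunks at
# detected boundaries ("<pause>" here), and translate each chunk as soon as
# it closes instead of waiting for full sentences. All components are stubs.
def asr_stream():
    # A real system would yield words incrementally from the recognizer.
    yield from ["welcome", "to", "the", "lecture", "<pause>",
                "today", "we", "discuss", "speech", "translation", "<pause>"]

def translate(chunk):
    return "<DE: " + " ".join(chunk) + ">"  # stub for the MT component

def simultaneous_translate(stream):
    chunk = []
    for token in stream:
        if token == "<pause>":
            if chunk:
                yield translate(chunk)  # emit immediately: bounded latency
                chunk = []
        else:
            chunk.append(token)
    if chunk:
        yield translate(chunk)

for output in simultaneous_translate(asr_stream()):
    print(output)
```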


IEEE Automatic Speech Recognition and Understanding Workshop | 2005

Speech translation enhanced automatic speech recognition

M. Paulik; Sebastian Stüker; Christian Fügen; Tanja Schultz; Thomas Schaaf; Alex Waibel

Nowadays, official documents have to be made available in many languages, as for example in the EU with its 20 official languages. Therefore, the need for effective tools to aid the multitude of human translators in their work becomes easily apparent. An ASR system that enables the human translator to speak his translation in an unrestricted manner, instead of typing it, constitutes such a tool. In this work we improve the recognition performance of such an ASR system on the target language of the human translator by taking advantage of an either written or spoken source language representation. To do so, machine translation techniques are used to translate between the different languages, and the involved ASR systems are then biased towards the gained knowledge. We present an iterative approach for ASR improvement and outperform our baseline system by a relative word error rate reduction of 35.8%/29.9% in the case of a written/spoken source language representation. Further, we show how multiple target languages, as provided for example by different simultaneous translators during European Parliament debates, can be incorporated into our system design for an improvement of all involved ASR systems.
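
The biasing step can be pictured as follows: the source-language representation is machine-translated into the target language, and the translator's ASR language model is interpolated with a cache model estimated from that translation, so that words the translator is likely to speak receive higher probability. The sketch below is a toy unigram version of this idea; the paper's actual iterative, multi-system setup is considerably more involved.

```python
# Toy sketch of the biasing step: interpolate the base LM with a cache model
# estimated from the MT output of the source document. Probabilities and the
# interpolation weight are made up; a real system works on full n-grams.
import math
from collections import Counter

def biased_logprob(word, base_lm, mt_output_words, lam=0.3):
    """log((1 - lam) * P_base(word) + lam * P_cache(word))."""
    cache = Counter(mt_output_words)
    p_cache = cache[word] / sum(cache.values())
    p_base = base_lm.get(word, 1e-6)
    return math.log((1 - lam) * p_base + lam * p_cache)

base_lm = {"parliament": 1e-4, "fish": 1e-4}
mt_words = "the parliament adopted the resolution".split()
print(biased_logprob("parliament", base_lm, mt_words))  # strongly boosted
print(biased_logprob("fish", base_lm, mt_words))        # essentially unchanged
```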


IEEE Signal Processing Magazine | 2008

Spoken language translation

Alex Waibel; Christian Fügen

In this article we have reviewed state-of-the-art speech translation systems. We have discussed issues of performance as well as deployment, and we reviewed the history and technical underpinnings of this growing and challenging research area. The field provides a plethora of fascinating research challenges for scientists as well as opportunities for true impact in the society of tomorrow.


International Conference on Acoustics, Speech, and Signal Processing | 2000

Integrating dynamic speech modalities into context decision trees

Christian Fügen; Ivica Rogina

Context decision trees are widely used in the speech recognition community. Besides questions about the phonetic classes of a phone's context, questions about its position within a word and questions about the gender of the current speaker have been used so far. In this paper, we additionally incorporate questions about current modalities of the spoken utterance, such as the speaker's dialect, the speaking rate, and the signal-to-noise ratio, the latter two of which may change within a single utterance. We present a framework that treats all these modalities in a uniform way. Experiments with the Janus speech recognizer have produced error rate reductions of up to 10% when compared to systems that do not use modality questions.
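
The uniform treatment can be pictured as a single question set in which phonetic-context questions and modality questions are all just predicates over the same context object, so that tree induction and state lookup handle both kinds identically. A minimal sketch with a hand-built toy tree follows; the feature names and thresholds are illustrative, not those of the Janus recognizer.

```python
# Toy sketch of the uniform question framework: phonetic-context questions
# and modality questions are all predicates over one context object, so the
# tree needs no special handling for either kind. Names are illustrative.
VOWELS = {"AH", "AE", "IY"}

QUESTIONS = {
    "left_is_vowel": lambda ctx: ctx["left_phone"] in VOWELS,
    "word_initial":  lambda ctx: ctx["position"] == "initial",
    "fast_speech":   lambda ctx: ctx["speaking_rate"] > 1.2,   # modality
    "noisy":         lambda ctx: ctx["snr_db"] < 10.0,         # modality
}

# A hand-built toy tree: (question, yes_subtree, no_subtree) or a leaf id.
TREE = ("noisy",
        ("left_is_vowel", "model_noisy_vowel", "model_noisy_other"),
        ("fast_speech", "model_fast", "model_default"))

def lookup(tree, ctx):
    while isinstance(tree, tuple):
        question, yes, no = tree
        tree = yes if QUESTIONS[question](ctx) else no
    return tree

ctx = {"left_phone": "AH", "position": "initial",
       "speaking_rate": 1.4, "snr_db": 25.0}
print(lookup(TREE, ctx))  # -> model_fast
```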


International Conference on Acoustics, Speech, and Signal Processing | 2006

Open Domain Speech Recognition & Translation: Lectures and Speeches

Christian Fügen; Muntsin Kolss; Dietmar Bernreuther; Matthias Paulik; Sebastian Stüker; Stephan Vogel; Alex Waibel

For years, speech translation has focused on the recognition and translation of discourses in limited domains, such as hotel reservations or scheduling tasks. Only recently have research projects been started to tackle the problem of open domain speech recognition and translation of complex tasks such as lectures and speeches. In this paper, we present the ongoing work at our laboratory in open domain speech translation of lectures and parliamentary speeches. Starting from a translation system for European parliamentary plenary sessions and a lecture speech recognition system, we show how both components perform in unison on speech translation of lectures.


Machine Learning for Multimodal Interaction (MLMI) | 2006

The ISL RT-06S speech-to-text system

Christian Fügen; Shajith Ikbal; Florian Kraft; Kenichi Kumatani; Kornel Laskowski; John W. McDonough; Mari Ostendorf; Sebastian Stüker; Matthias Wölfel

This paper describes the 2006 lecture and conference meeting speech-to-text system developed at the Interactive Systems Laboratories (ISL) for the individual head-mounted microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions, which was evaluated in the RT-06S Rich Transcription Meeting Evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely improved acoustic and language models, cross adaptation between systems with different front-ends and phoneme sets, and the use of various automatic speech segmentation algorithms.
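
Cross adaptation, one of the improvements listed, means adapting each system on the hypotheses produced by a different system, so that recognizers with different front-ends and phoneme sets correct rather than reinforce each other's errors. A minimal sketch of the control flow follows, with stub decode and adaptation functions standing in for full ASR components.

```python
# Toy sketch of cross adaptation between two recognizers A and B: each
# system is adapted (stubbed here) on the OTHER system's output, then
# decodes again. Stub functions stand in for full ASR components.
def decode(system, audio):
    return f"hypothesis of {system} on {audio}"  # stub recognizer

def adapt(system, transcript):
    return f"{system}+adapted"  # stub: re-estimate models on the transcript

def cross_adapt(sys_a, sys_b, audio):
    hyp_a = decode(sys_a, audio)
    hyp_b = decode(sys_b, audio)
    # Swap hypotheses: A adapts on B's output and vice versa.
    sys_a2 = adapt(sys_a, hyp_b)
    sys_b2 = adapt(sys_b, hyp_a)
    return decode(sys_a2, audio), decode(sys_b2, audio)

print(cross_adapt("A", "B", "meeting.wav"))
```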


ieee automatic speech recognition and understanding workshop | 2003

Efficient handling of multilingual language models

Christian Fügen; Sebastian Stüker; Hagen Soltau; Florian Metze; Tanja Schultz

We introduce techniques for building a multilingual speech recognizer. More specifically, we present a new language model method that allows for the combination of several monolingual language models into one multilingual language model. Furthermore, we extend our techniques to the concept of grammars. All linguistic knowledge sources share one common interface to the search engine. As a consequence, new language model types can be easily integrated into our Ibis decoder. Based on a multilingual acoustic model, we compare multilingual statistical n-gram language models with multilingual grammars. Results are given in terms of recognition performance as well as resource requirements. They show that: (a) n-gram LMs can be easily combined at the meta level without major loss in performance; (b) grammars are very suitable to model multilinguality; (c) language switches can be significantly reduced by using the introduced techniques; (d) the resource overhead for handling multiple languages in one language model is acceptable; and (e) language identification can be done implicitly during decoding.
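
The meta-level combination can be pictured as follows: each monolingual model stays intact, a top-level distribution weights the languages, and a penalty on language switches suppresses spurious changes; the language that wins the maximization is an implicit language-identification decision. The following toy sketch uses unigram stand-ins for the full monolingual n-gram models.

```python
# Toy sketch of meta-level combination: each monolingual LM is kept intact,
# a prior weights the languages, and a switch penalty discourages spurious
# language changes during decoding. Probabilities are made up.
import math

LMS = {  # unigram stand-ins for full monolingual n-gram models
    "en": {"the": 0.05, "house": 0.01},
    "de": {"das": 0.05, "haus": 0.01},
}
PRIOR = {"en": 0.5, "de": 0.5}    # top-level language weights
SWITCH_LOGPEN = math.log(0.1)     # applied when the language changes

def multilingual_logprob(word, prev_lang=None):
    best = None
    for lang, lm in LMS.items():
        if word not in lm:
            continue
        lp = math.log(PRIOR[lang]) + math.log(lm[word])
        if prev_lang is not None and lang != prev_lang:
            lp += SWITCH_LOGPEN
        if best is None or lp > best[0]:
            best = (lp, lang)
    return best  # (log-prob, language): language id falls out implicitly

print(multilingual_logprob("haus", prev_lang="en"))  # penalized switch to 'de'
```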

Collaboration


Dive into Christian Fügen's collaborations.

Top Co-Authors

Alex Waibel, Karlsruhe Institute of Technology
Sebastian Stüker, Karlsruhe Institute of Technology
Muntsin Kolss, Karlsruhe Institute of Technology
Florian Metze, Carnegie Mellon University
Matthias Wölfel, Carnegie Mellon University
John W. McDonough, Carnegie Mellon University
Matthias Paulik, Karlsruhe Institute of Technology
Stephan Vogel, Carnegie Mellon University