Petra Geutner
Karlsruhe Institute of Technology
Publications
Featured research published by Petra Geutner.
international conference on acoustics, speech, and signal processing | 1997
Michael Finke; Petra Geutner; Hermann Hild; Thomas Kemp; Klaus Ries; Martin Westphal
Verbmobil, a German research project, aims at machine translation of spontaneous speech input. The ultimate goal is the development of a portable machine translator that will allow people to negotiate in their native language. Within this project the University of Karlsruhe has developed a speech recognition engine that has been evaluated on a yearly basis and shows very promising word accuracy results on large-vocabulary spontaneous speech. We introduce the Janus Speech Recognition Toolkit underlying the speech recognizer. The main new contributions to the acoustic modeling part of our 1996 evaluation system (speaker normalization, channel normalization, and polyphonic clustering) are discussed and evaluated. Besides the acoustic models, we delineate the different language models used in our evaluation system: word trigram models interpolated with class-based models, and a separate spelling language model. As a result of using the toolkit and integrating all these parts into the recognition engine, the word error rate on the German spontaneous scheduling task (GSST) was reduced from 30% in 1995 to 13.8% in 1996.
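The interpolation of a word trigram model with a class-based model can be sketched as a simple linear mixture. The probabilities and the weight `lam` below are illustrative placeholders, not values from the evaluation system.

```python
# Sketch of linearly interpolating a word trigram model with a class-based
# model. The weight lam would normally be tuned on held-out data.

def interpolate(p_trigram: float, p_class: float, lam: float = 0.7) -> float:
    """P(w | h) = lam * P_trigram(w | h) + (1 - lam) * P_class(w | h)."""
    return lam * p_trigram + (1.0 - lam) * p_class

# Example: mix a confident trigram estimate with a smoother class estimate.
p = interpolate(p_trigram=0.02, p_class=0.005, lam=0.7)
```

The class-based term smooths the trigram estimate for rare histories, which is why the mixture helps on sparse spontaneous-speech data.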
Proceedings of the IEEE | 2000
Alex Waibel; Petra Geutner; Laura Mayfield Tomokiyo; Tanja Schultz; Monika Woszczyna
Building modern speech and language systems currently requires large data resources such as texts, voice recordings, pronunciation lexicons, morphological decomposition information and parsing grammars. Based on a study of the most important differences between language groups, we introduce approaches to deal efficiently with the enormous task of covering even a small percentage of the world's languages. For speech recognition, we have reduced the resource requirements by applying acoustic model combination, bootstrapping and adaptation techniques. Similar algorithms have been applied to improve the recognition of foreign accents. Segmenting language into appropriate units reduces the amount of data required to robustly estimate statistical models. The underlying morphological principles are also used to automatically adapt the coverage of our speech recognition dictionaries with the Hypothesis-Driven Lexical Adaptation (HDLA) algorithm, which reduces the out-of-vocabulary problems encountered in agglutinative languages. Speech recognition results are reported for the read GlobalPhone database and some broadcast news data. For speech translation, using a task-oriented interlingua allows a system with N languages to be built with linear rather than quadratic effort. We have introduced a modular grammar design to maximize reusability and portability. End-to-end translation results are reported on a travel-domain task in the framework of C-STAR.
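The linear-versus-quadratic effort argument for an interlingua comes down to simple counting: direct translation needs one system per ordered language pair, while an interlingua needs only one analysis and one generation component per language. A minimal check:

```python
# Number of translation components needed for n languages.

def direct_pairs(n: int) -> int:
    """Direct translation: one system per ordered language pair."""
    return n * (n - 1)

def interlingua_systems(n: int) -> int:
    """Interlingua: one analysis and one generation component per language."""
    return 2 * n

# For 5 languages: 20 direct systems vs. 10 interlingua components.
```

The gap widens quickly: adding a sixth language costs 10 extra direct systems but only 2 extra interlingua components.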
international conference on acoustics, speech, and signal processing | 1995
Petra Geutner
To guarantee unrestricted natural language processing, state-of-the-art speech recognition systems require huge dictionaries that increase the search space and degrade performance. This is especially true for languages with a large number of inflections and compound words, such as German and Spanish. One way to maintain decent recognition results with a growing vocabulary is to use base units other than words. We compare different decomposition methods, originally based on morphological decomposition, for the German language. Not only do they counteract the immense vocabulary growth that comes with an increasing amount of training data; the rate of out-of-vocabulary words, which significantly worsens recognition performance in German, is also decreased. The smaller dictionary also leads to a roughly 30% speed improvement during recognition. Moreover, even a large amount of training data is often not enough to guarantee robust language model estimates, whereas morpheme-based models are capable of providing them.
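A minimal sketch of how lexicon-driven decomposition can split German compounds into smaller base units. The tiny lexicon and the greedy longest-match strategy are illustrative assumptions, not the decomposition methods actually compared in the paper.

```python
# Greedy longest-match compound splitting against a small morpheme lexicon.
# Illustrative only: real decomposition uses full morphological analysis.

LEXICON = {"sprach", "erkennung", "system", "wort", "fehler", "rate"}

def split_compound(word: str, lexicon=LEXICON):
    """Split word into lexicon units, longest prefix first; None if impossible."""
    parts, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest prefix first
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            return None  # some stretch of the word is not covered
    return parts

# split_compound("wortfehlerrate") -> ["wort", "fehler", "rate"]
```

Recognizing the three units instead of the full compound keeps "wortfehlerrate" in-vocabulary even if the compound itself never occurred in training.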
international conference on acoustics, speech, and signal processing | 2000
Kenan Çarki; Petra Geutner; Tanja Schultz
Turkish belongs to the Turkic language family, whose members are close to one another in linguistic structure. Typological similarities include vowel harmony, verb-final word order and agglutinative morphology. The latter property causes very fast vocabulary growth, resulting in a large number of out-of-vocabulary (OOV) words. In this paper we describe our first experiments with a speaker-independent LVCSR engine for Modern Standard Turkish and present first results on our Turkish speech recognition system. The currently best system shows very promising results, achieving a 16.9% word error rate. To overcome the OOV problem we propose a morpheme-based approach and the Hypothesis-Driven Lexical Adaptation approach. The final Turkish system is integrated into the multilingual recognition engine of the GlobalPhone project.
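The OOV problem caused by agglutinative morphology can be quantified with a simple coverage check over test tokens. The Turkish word forms below are illustrative examples, not data from the paper.

```python
# Fraction of test tokens not covered by the recognition vocabulary.
# Agglutinative suffixing (e.g. "evlerimizden", "from our houses") produces
# many forms that a word-based vocabulary never saw in training.

def oov_rate(test_tokens, vocabulary):
    """Return the out-of-vocabulary rate of test_tokens w.r.t. vocabulary."""
    oov = sum(1 for t in test_tokens if t not in vocabulary)
    return oov / len(test_tokens)

vocab = {"ev", "evler", "gel"}
tokens = ["ev", "evlerimizden", "gel", "geldiklerinde"]
# Two of the four inflected tokens are out of vocabulary -> rate 0.5.
```

Morpheme-based units attack exactly this gap: the stems "ev" and "gel" are covered even when their fully suffixed forms are not.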
international conference on acoustics, speech, and signal processing | 1998
Petra Geutner; Michael Finke; Peter Scheytt
One of the most prevalent problems of large-vocabulary speech recognition systems is the large number of out-of-vocabulary words. This is especially the case when automatically transcribing broadcast news in languages other than English that have a large number of inflections and compound words. We introduce a set of techniques to decrease the number of out-of-vocabulary words during recognition by using linguistic knowledge about morphology and a two-pass recognition approach, in which the first pass only serves to dynamically adapt the recognition dictionary to the speech segment to be recognized. A second recognition run is then carried out on the adapted vocabulary. With the proposed techniques we were able to reduce the OOV rate by more than 40%, thereby also improving the recognition results by an absolute 5.8%, from 64% word accuracy to 69.8%.
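The two-pass idea can be sketched as follows: words from the first-pass hypothesis pull related entries (e.g. inflected variants) out of a large fallback lexicon into an adapted dictionary for the second pass. The shared-prefix criterion used here is a simple stand-in for the morphological knowledge actually applied.

```python
# Sketch of dictionary adaptation between two recognition passes.
# FALLBACK stands in for the large fallback lexicon; the prefix-match
# similarity is an illustrative assumption.

FALLBACK = ["haus", "hauses", "hausbau", "geht", "gehen", "ging", "auto"]

def adapt_vocabulary(hypothesis_words, fallback=FALLBACK, prefix_len=3):
    """Build a second-pass vocabulary from first-pass hypothesis words."""
    adapted = set(hypothesis_words)
    for hyp in hypothesis_words:
        stem = hyp[:prefix_len]
        # Pull in every fallback entry that shares the (crude) stem.
        adapted.update(w for w in fallback if w.startswith(stem))
    return adapted

# If the first pass recognized "haus", the adapted dictionary also gains
# "hauses" and "hausbau" for the second pass.
```

The second decoding run then searches only this adapted, utterance-specific vocabulary instead of an impractically large static one.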
international conference on acoustics, speech, and signal processing | 1999
Petra Geutner; Michael Finke; Alex Waibel
Adapting the vocabulary of a speech recognizer to the utterance to be recognized has proven successful in reducing both high out-of-vocabulary rates and word error rates. This applies especially to languages with rapid vocabulary growth due to a large number of inflections and compound words. This paper presents various adaptation methods within the hypothesis-driven lexical adaptation (HDLA) framework, which allow speech recognition on a virtually unlimited vocabulary. Selection criteria for the adaptation process are based either on morphological knowledge or on distance measures at the phoneme or grapheme level. Different methods are introduced for determining distances between phoneme pairs and for creating the large fallback lexicon from which the adapted vocabulary is chosen. HDLA reduces the out-of-vocabulary rate by 55% for Serbo-Croatian, 35% for German and 27% for Turkish. The reduced out-of-vocabulary rate also decreases the word error rate by an absolute 4.1%, to 25.4%, on Serbo-Croatian broadcast news data.
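One of the distance measures mentioned, grapheme-level comparison, can be sketched as a plain Levenshtein edit distance between a hypothesis word and a fallback-lexicon entry. Unit edit costs are an assumption here; the paper also considers phoneme-level distances.

```python
# Grapheme-level Levenshtein distance with unit costs, usable as a
# similarity criterion for selecting fallback-lexicon candidates.

def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# edit_distance("kitten", "sitting") == 3
```

Candidates within a small distance of a first-pass hypothesis word are then admitted to the adapted vocabulary.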
international conference on acoustics, speech, and signal processing | 1998
Peter Scheytt; Petra Geutner; Alex Waibel
This paper describes the development of a Serbo-Croatian dictation and broadcast news speech recognizer. The intention is to generate an automatic text transcription of a news show, which will be submitted to a multilingual Informedia database. We outline the complete system development process using the JanusRTk, beginning with data collection, design and training of the parameters, tuning and evaluation. We report on general recognition techniques such as segmentation, adaptation and language model interpolation, as well as language-specific problems, e.g. a high OOV rate due to inflected word forms. We show that even a small amount of acoustic training data, combined with Web-based interpolated language models, is sufficient to build a fairly reliable automatic news transcription system, which yields a word error rate (WER) of 36.0%.
international conference on spoken language processing | 1996
Petra Geutner
Building robust stochastic language models is a major issue in speech recognition systems. Conventional word-based n-gram models do not capture any linguistic constraints inherent in speech. In this paper, the notion of function and content words (closed/open word classes) is used to provide linguistic knowledge that can be incorporated into language models. Function words are articles, prepositions and personal pronouns; content words are nouns, verbs, adjectives and adverbs. Based on this class definition, function and content word markers are derived and a new language model is defined. A combination of the word-based model with this new model is introduced. The combined model shows modest improvements in both perplexity and recognition performance.
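A minimal sketch of the function/content distinction: tokens are tagged via a closed function-word list, and everything else is treated as an open-class content word. The tiny word list is illustrative, not the paper's class inventory.

```python
# Tag tokens as function words (closed class) or content words (open class),
# producing the class markers a class-based model could condition on.

FUNCTION_WORDS = {"the", "a", "an", "of", "in", "to", "he", "she", "it"}

def word_class(token: str) -> str:
    """Return 'FUNC' for closed-class words, 'CONT' for everything else."""
    return "FUNC" if token.lower() in FUNCTION_WORDS else "CONT"

def mark_sequence(tokens):
    """Attach a class marker to each token, e.g. for a class-based n-gram."""
    return [(t, word_class(t)) for t in tokens]

# mark_sequence(["the", "system", "works"])
# -> [("the", "FUNC"), ("system", "CONT"), ("works", "CONT")]
```

A class-marked n-gram over these tags can then be combined with the word-based model, in the spirit of the combination the abstract describes.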
international joint conference on artificial intelligence | 1996
Petra Geutner; Bernhard Suhm; Finn Dag Buø; Thomas Kemp; Laura Mayfield; Arthur E. McNair; Ivica Rogina; Tanja Schultz; Tilo Sloboda; Wayne H. Ward; Monika Woszczyna; Alex Waibel
Building multilingual spoken language translation systems requires knowledge about both acoustic models and language models for each language to be translated. Our multilingual translation system JANUS-2 is able to translate English and German spoken input into English, German, Spanish, Japanese or Korean output. Obtaining optimal acoustic and language models as well as developing adequate dictionaries for all these languages requires a lot of hand-tuning and is time-consuming and labor-intensive. In this paper we present learning techniques that improve acoustic models by automatically adapting codebook sizes, a learning algorithm that grows and adapts phonetic dictionaries for the recognition process, and a statistically based language model with some linguistic knowledge that increases recognition performance. To ensure a robust translation system, semantic rather than syntactic analysis is performed. Concept-based speech translation and a connectionist parser that learns to parse into feature structures are introduced. Furthermore, different repair mechanisms to recover from recognition errors are described.
Archive | 1997
Michael Finke; Jürgen Fritsch; Petra Geutner; Klaus Ries; Torsten Zeppenfeld; Alex Waibel