Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Tim Schlippe is active.

Publication


Featured research published by Tim Schlippe.


International Conference on Acoustics, Speech, and Signal Processing | 2012

A first speech recognition system for Mandarin-English code-switch conversational speech

Ngoc Thang Vu; Dau-Cheng Lyu; Jochen Weiner; Dominic Telaar; Tim Schlippe; Fabian Blaicher; Eng Siong Chng; Tanja Schultz; Haizhou Li

This paper presents first steps toward a large vocabulary continuous speech recognition system (LVCSR) for conversational Mandarin-English code-switching (CS) speech. We applied state-of-the-art techniques such as speaker adaptive and discriminative training to build the first baseline system on the SEAME corpus [1] (South East Asia Mandarin-English). For acoustic modeling, we applied different phone merging approaches based on the International Phonetic Alphabet (IPA) and Bhattacharyya distance in combination with discriminative training to improve accuracy. On the language model level, we investigated statistical machine translation (SMT)-based text generation approaches for building code-switching language models. Furthermore, we integrated information provided by a language identification (LID) system into the decoding process using a multi-stream approach. Our best 2-pass system achieves a Mixed Error Rate (MER) of 36.6% on the SEAME development set.
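
The Bhattacharyya-distance-based phone merging mentioned above can be illustrated with a short sketch. This is a minimal example, assuming each phone is modeled by a single Gaussian (real systems use HMMs with Gaussian mixtures) and using a hypothetical merge threshold:

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussians N(mu1, cov1) and N(mu2, cov2)."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term2 = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term1 + term2

# Hypothetical threshold; in practice it would be tuned on development data.
MERGE_THRESHOLD = 0.5

def should_merge(phone_a, phone_b, models):
    """models maps a phone name to its (mean, covariance) estimate."""
    mu_a, cov_a = models[phone_a]
    mu_b, cov_b = models[phone_b]
    return bhattacharyya_distance(mu_a, cov_a, mu_b, cov_b) < MERGE_THRESHOLD
```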


International Conference on Acoustics, Speech, and Signal Processing | 2013

GlobalPhone: A multilingual text & speech database in 20 languages

Tanja Schultz; Ngoc Thang Vu; Tim Schlippe

This paper describes advances in GlobalPhone, a multilingual database of high-quality read speech with corresponding transcriptions and pronunciation dictionaries in 20 languages. GlobalPhone was designed to be uniform across languages with respect to the amount of data, speech quality, the collection scenario, and the transcription and phone set conventions. With more than 400 hours of transcribed audio data from more than 2000 native speakers, GlobalPhone supplies an excellent basis for research in multilingual speech recognition, rapid deployment of speech processing systems to yet unsupported languages, language identification, speaker recognition in multiple languages, multilingual speech synthesis, and monolingual speech recognition in a large variety of languages.


International Conference on Acoustics, Speech, and Signal Processing | 2012

Grapheme-to-phoneme model generation for Indo-European languages

Tim Schlippe; Sebastian Ochs; Tanja Schultz

In this paper, we evaluate grapheme-to-phoneme (g2p) models across languages and of varying quality. We created g2p models for Indo-European languages with word-pronunciation pairs from the GlobalPhone project and from Wiktionary [1]. We then checked their quality in terms of consistency and complexity, as well as their impact on Czech, English, French, Spanish, Polish, and German ASR. While the GlobalPhone dictionaries were manually cross-checked and have been used successfully in LVCSR, Wiktionary pronunciations are provided by the Internet community and can be used to rapidly and economically create pronunciation dictionaries for new languages and domains.
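
As a rough illustration of the g2p idea (not the joint-sequence models evaluated in the paper), the sketch below learns a trivial context-free grapheme-to-phoneme mapping from pre-aligned word-pronunciation pairs; all data in it is made up:

```python
from collections import Counter, defaultdict

def train_g2p(aligned_pairs):
    """Trivial context-free g2p: map each grapheme to its most frequent phoneme.

    aligned_pairs: iterable of (grapheme, phoneme) pairs, assumed to be
    1-to-1 aligned already. Real joint-sequence models also learn the
    alignment and condition on grapheme context."""
    counts = defaultdict(Counter)
    for g, p in aligned_pairs:
        counts[g][p] += 1
    return {g: c.most_common(1)[0][0] for g, c in counts.items()}

def apply_g2p(model, word):
    return [model.get(g, "<unk>") for g in word]

# Made-up toy data: "casa" -> /k a s a/, "coco" -> /k o k o/
pairs = [("c", "k"), ("a", "a"), ("s", "s"), ("a", "a"),
         ("c", "k"), ("o", "o"), ("c", "k"), ("o", "o")]
model = train_g2p(pairs)
print(apply_g2p(model, "casa"))  # ['k', 'a', 's', 'a']
```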


International Conference on Acoustics, Speech, and Signal Processing | 2013

Recurrent neural network language modeling for code switching conversational speech

Heike Adel; Ngoc Thang Vu; Franziska Kraus; Tim Schlippe; Haizhou Li; Tanja Schultz

Code-switching is a very common phenomenon in multilingual communities. In this paper, we investigate language modeling for conversational Mandarin-English code-switching (CS) speech recognition. First, we investigate the prediction of code switches based on textual features, with a focus on Part-of-Speech (POS) tags and trigger words. Second, we propose a recurrent neural network structure to predict code switches. We extend the networks by adding POS information to the input layer and by factorizing the output layer into languages. The resulting models are applied to our task of code-switching language modeling. The final models yield a 10.8% relative improvement in perplexity on the SEAME development set, which translates into a 2% relative improvement in Mixed Error Rate (MER), and a 16.9% relative improvement in perplexity on the evaluation set, which leads to a 2.7% relative improvement in MER.
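
A minimal PyTorch sketch of the factorized architecture described above: the output layer is split into a language predictor and per-language word distributions, and POS tags enter at the input layer. Layer sizes and the exact wiring are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FactorizedRNNLM(nn.Module):
    """RNN LM with a language-factorized output layer:
    P(w | h) = P(lang | h) * P(w | lang, h), with POS tags as extra input."""

    def __init__(self, vocab_sizes, n_pos, emb_dim=128, pos_dim=16, hidden=256):
        super().__init__()
        self.word_emb = nn.Embedding(sum(vocab_sizes), emb_dim)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        self.rnn = nn.GRU(emb_dim + pos_dim, hidden, batch_first=True)
        self.lang_head = nn.Linear(hidden, len(vocab_sizes))        # P(lang | h)
        self.word_heads = nn.ModuleList(
            [nn.Linear(hidden, v) for v in vocab_sizes])            # P(w | lang, h)

    def forward(self, words, pos_tags):
        x = torch.cat([self.word_emb(words), self.pos_emb(pos_tags)], dim=-1)
        h, _ = self.rnn(x)
        lang_logp = torch.log_softmax(self.lang_head(h), dim=-1)
        word_logps = [torch.log_softmax(head(h), dim=-1) for head in self.word_heads]
        return lang_logp, word_logps

# Two languages (e.g. Mandarin and English) with hypothetical vocabulary sizes:
lm = FactorizedRNNLM(vocab_sizes=[10000, 8000], n_pos=40)
```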


Speech Communication | 2014

Web-based tools and methods for rapid pronunciation dictionary creation

Tim Schlippe; Sebastian Ochs; Tanja Schultz

In this paper we study the potential as well as the challenges of using the World Wide Web as a seed for the rapid generation of pronunciation dictionaries in new languages. In particular, we describe Wiktionary, a community-driven resource of pronunciations in IPA notation, which is available in many different languages. First, we analyze Wiktionary in terms of language and vocabulary coverage and compare it in terms of quality and coverage with another source of pronunciation dictionaries in multiple languages (GlobalPhone). Second, we investigate the performance of statistical grapheme-to-phoneme models in ten different languages and measure the model performance as a function of the amount of training data. The results show that for the studied languages about 15k phone tokens are sufficient to train stable grapheme-to-phoneme models. Third, we create grapheme-to-phoneme models for ten languages using both the GlobalPhone and the Wiktionary resources. The resulting pronunciation dictionaries are carefully evaluated with several quality checks, i.e., consistency, complexity, model confidence, grapheme n-gram coverage, and phoneme perplexity. Fourth, as a crucial prerequisite for a fully automated process of dictionary generation, we implement and evaluate methods to automatically remove flawed and inconsistent pronunciations from dictionaries. Last but not least, speech recognition experiments in six languages evaluate the usefulness of the dictionaries in terms of word error rates. Our results indicate that the web resources of Wiktionary can be successfully leveraged to fully automatically create pronunciation dictionaries in new languages.
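
One way to implement the automatic removal of flawed pronunciations, sketched under assumptions: the paper evaluates several filter methods, while here an entry is simply dropped when it disagrees too strongly with a g2p model trained on the full dictionary, with a hypothetical threshold:

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, y in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (x != y))
    return dp[-1]

def filter_dictionary(entries, g2p, max_per=0.4):
    """Keep entries whose pronunciation roughly agrees with the g2p hypothesis.

    max_per is a hypothetical phoneme-error-rate threshold; g2p is any
    callable mapping a word to a phoneme sequence."""
    kept = {}
    for word, pron in entries.items():
        hyp = g2p(word)
        per = edit_distance(pron, hyp) / max(len(pron), 1)
        if per <= max_per:
            kept[word] = pron
    return kept
```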


Spoken Language Technology Workshop | 2012

Word segmentation through cross-lingual word-to-phoneme alignment

Felix Stahlberg; Tim Schlippe; Stephan Vogel; Tanja Schultz

We present our new alignment model Model 3P for cross-lingual word-to-phoneme alignment, and show that unsupervised learning of word segmentation is more accurate when information from another language is used. Word segmentation with cross-lingual information is highly relevant to bootstrap pronunciation dictionaries from audio data for Automatic Speech Recognition, bypass the written form in Speech-to-Speech Translation, or build the vocabulary of an unseen language, particularly in the context of under-resourced languages. Using Model 3P for the alignment between English words and Spanish phonemes outperforms a state-of-the-art monolingual word segmentation approach [1] on the BTEC corpus [2] by up to 42% absolute in F-Score on the phoneme level, and a GIZA++ alignment based on IBM Model 3 by up to 17%.
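
The phoneme-level F-score used above can be computed over word-boundary positions; the sketch below assumes that definition (precision and recall over boundary positions), which may differ in detail from the paper's exact metric:

```python
def boundaries(words):
    """Word-boundary positions of a segmentation given as a list of words,
    e.g. [['h', 'e'], ['l', 'o']] -> {2}."""
    bounds, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        bounds.add(pos)
    return bounds

def boundary_f_score(hyp, ref):
    """F-score of hypothesized boundaries against reference boundaries."""
    h, r = boundaries(hyp), boundaries(ref)
    if not h or not r:
        return 0.0
    p, c = len(h & r) / len(h), len(h & r) / len(r)
    return 2 * p * c / (p + c) if p + c else 0.0

hyp = [list("elgato"), list("duerme")]            # hypothesized segmentation
ref = [list("el"), list("gato"), list("duerme")]  # reference segmentation
print(boundary_f_score(hyp, ref))  # 0.666...
```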


International Conference on Acoustics, Speech, and Signal Processing | 2013

Rapid bootstrapping of a Ukrainian large vocabulary continuous speech recognition system

Tim Schlippe; Mykola Volovyk; Kateryna Yurchenko; Tanja Schultz

We report on our efforts toward an LVCSR system for the Slavic language Ukrainian. We describe the Ukrainian text and speech database recently collected as part of our GlobalPhone corpus [1] with our Rapid Language Adaptation Toolkit [2]. The data was complemented by a large collection of text data crawled from various Ukrainian websites. For the production of the pronunciation dictionary, we investigate strategies using grapheme-to-phoneme (g2p) models derived from existing dictionaries of other languages, thereby severely reducing the necessary manual effort. Russian and Bulgarian g2p models even decrease the number of pronunciation rules to one fifth. We achieve significant improvements by applying state-of-the-art techniques for acoustic modeling and our day-wise text collection and language model interpolation strategy [3]. Our best system achieves a word error rate of 11.21% on the read newspaper speech test set.
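
The language model interpolation part of that strategy typically estimates mixture weights on a development set. Below is a minimal EM sketch for linear interpolation weights, assuming per-token probabilities from each component LM have already been computed:

```python
def em_interpolation_weights(probs, iters=20):
    """EM for linear-interpolation weights of several language models.

    probs[k][i] is component LM k's probability of development token i;
    returns weights that (locally) maximize dev-set likelihood."""
    k, n = len(probs), len(probs[0])
    w = [1.0 / k] * k
    for _ in range(iters):
        post = [0.0] * k
        for i in range(n):
            mix = sum(w[m] * probs[m][i] for m in range(k))
            for m in range(k):
                post[m] += w[m] * probs[m][i] / mix
        w = [s / n for s in post]
    return w

# Two hypothetical component LMs scored on four dev tokens:
print(em_interpolation_weights([[0.1, 0.2, 0.3, 0.1],
                                [0.2, 0.1, 0.1, 0.3]]))
```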


SLSP'13: Proceedings of the First International Conference on Statistical Language and Speech Processing | 2013

Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

Felix Stahlberg; Tim Schlippe; Stephan Vogel; Tanja Schultz

With the help of written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units using our new alignment model Model 3P [17]. From this, we deduce phonetic transcriptions of target language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant for bootstrapping dictionaries from audio data for Automatic Speech Recognition and for bypassing the written form in Speech-to-Speech Translation, particularly in the context of under-resourced languages and those which are not written at all. Analyzing 14 translations in 9 languages to build a dictionary for English shows that the quality of the resulting dictionary is better with similar vocabulary sizes in source and target language, shorter sentences, more word repetitions, and formally equivalent translations.


Computer Speech & Language | 2016

Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

Felix Stahlberg; Tim Schlippe; Stephan Vogel; Tanja Schultz

Highlights: human translations guide language discovery for speech processing; pronunciation extraction for non-written languages using cross-lingual information; alignment model Model 3P for cross-lingual word-to-phoneme alignment; an algorithm to deduce phonetic transcriptions of words from Model 3P alignments; analysis of appropriate source languages based on efficient evaluation measures.

In this paper, we study methods to discover words and extract their pronunciations from audio data for non-written and under-resourced languages. We examine the potential and the challenges of pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment. In our scenario, a human translator produces utterances in the (non-written) target language from prompts in a resource-rich source language. We add the resource-rich source language prompts to help the word discovery and pronunciation extraction process. By aligning the source language words to the target language phonemes, we segment the phoneme sequences into word-like chunks. The resulting chunks are interpreted as putative word pronunciations but are very prone to alignment and phoneme recognition errors. Thus we suggest our alignment model Model 3P, which is particularly designed for cross-lingual word-to-phoneme alignment. We present and compare two different methods (source word dependent and independent clustering) that extract word pronunciations from word-to-phoneme alignments. We show that both methods compensate for phoneme recognition and alignment errors. We also extract a parallel corpus consisting of 15 different translations in 10 languages from the Christian Bible to evaluate our alignment model and error recovery methods. For example, based on noisy target language phoneme sequences with 45.1% errors, we build a dictionary for an English Bible with a Spanish Bible translation with a 4.5% OOV rate, where 64% of the extracted pronunciations contain no more than one wrong phoneme. Finally, we use the extracted pronunciations in an automatic speech recognition system for the target language and report promising word error rates, given that pronunciation dictionary and language model are learned completely unsupervised and no written form of the target language is required for our approach.
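
As a toy illustration of source-word-dependent extraction, the sketch below groups the phoneme chunks aligned to each source word and keeps the most frequent chunk per word, a majority vote that is a strong simplification of the clustering methods in the paper:

```python
from collections import Counter, defaultdict

def extract_pronunciations(alignments):
    """Majority-vote pronunciation extraction from word-to-phoneme alignments.

    alignments: iterable of (source_word, phoneme_chunk) pairs as produced
    by a cross-lingual aligner; returns one pronunciation per source word."""
    groups = defaultdict(Counter)
    for source_word, chunk in alignments:
        groups[source_word][tuple(chunk)] += 1
    return {w: max(c, key=c.get) for w, c in groups.items()}

# Made-up alignment output with one noisy chunk for "water":
alignments = [("water", ["w", "a", "t", "a"]),
              ("water", ["w", "a", "t", "a"]),
              ("water", ["b", "a", "t", "a"])]
print(extract_pronunciations(alignments))  # {'water': ('w', 'a', 't', 'a')}
```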


International Conference on Acoustics, Speech, and Signal Processing | 2013

Statistical machine translation based text normalization with crowdsourcing

Tim Schlippe; Chenfei Zhu; Daniel Lemcke; Tanja Schultz

In [1], we proposed systems for text normalization based on statistical machine translation (SMT) methods, which are constructed with the support of Internet users, and evaluated them on French texts. Internet users normalize text displayed in a web interface in an annotation process, thereby providing a parallel corpus of normalized and non-normalized text. With this corpus, SMT models are generated to translate non-normalized into normalized text. In this paper, we analyze their efficiency for other languages. Additionally, we embedded the English annotation process for training data in Amazon Mechanical Turk and compare the quality of texts thoroughly annotated in our lab to those annotated by the Turkers. Finally, we investigate how to reduce the user effort by iteratively applying an SMT system, built from the sentences annotated so far, to the next sentences to be edited.
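
A drastically simplified sketch of the direction of this mapping: learn word-level substitution rules from a parallel corpus of raw and normalized sentences and apply them to new text. Real phrase-based SMT also handles reordering, context, and one-to-many mappings; the example data is made up:

```python
from collections import Counter, defaultdict

def learn_rules(parallel):
    """Learn word-level normalization rules from (raw, normalized) pairs.

    Only equal-length sentence pairs are used here; a real SMT aligner
    handles length mismatches."""
    counts = defaultdict(Counter)
    for raw, norm in parallel:
        raw_toks, norm_toks = raw.split(), norm.split()
        if len(raw_toks) != len(norm_toks):
            continue
        for r, n in zip(raw_toks, norm_toks):
            counts[r][n] += 1
    return {r: c.most_common(1)[0][0] for r, c in counts.items()}

def normalize(rules, sentence):
    return " ".join(rules.get(tok, tok) for tok in sentence.split())

# Made-up annotated pair, as produced by the web interface described above:
rules = learn_rules([("c u 2morrow", "see you tomorrow")])
print(normalize(rules, "c u"))  # see you
```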

Collaboration


Dive into Tim Schlippe's collaborations.

Top Co-Authors

Ngoc Thang Vu (Karlsruhe Institute of Technology)
Felix Stahlberg (Karlsruhe Institute of Technology)
Sebastian Ochs (Karlsruhe Institute of Technology)
Stephan Vogel (Qatar Computing Research Institute)
Dominic Telaar (Karlsruhe Institute of Technology)
Chenfei Zhu (Karlsruhe Institute of Technology)
Franziska Kraus (Karlsruhe Institute of Technology)
Haizhou Li (National University of Singapore)