Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Stephan Vogel is active.

Publication


Featured research published by Stephan Vogel.


spoken language technology workshop | 2014

A complete KALDI recipe for building Arabic speech recognition systems

Ahmed M. Ali; Yifan Zhang; Patrick Cardinal; Najim Dehak; Stephan Vogel; James R. Glass

In this paper we present a recipe and language resources for training and testing Arabic speech recognition systems using the KALDI toolkit. We built a prototype broadcast news system using 200 hours of GALE data that are publicly available through the LDC. We describe in detail the decisions made in building the system: using the MADA toolkit for text normalization and vowelization; why we use 36 phonemes; how we generate pronunciations; and how we build the language model. We report results using state-of-the-art modeling and decoding techniques. The scripts are released through KALDI, and the resources are made available on QCRI's language resources web portal. This is the first effort to share reproducible, sizable training and testing results on an MSA system.
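
The pronunciation-generation step is the easiest to illustrate. Below is a minimal sketch in Python, assuming a one-to-one map from vowelized characters to phonemes; the character-to-phoneme table is a hypothetical fragment for illustration, not the paper's actual MADA-derived 36-phoneme inventory.

```python
# Hypothetical fragment of a vowelized-grapheme -> phoneme table; the
# real recipe derives its 36 phonemes from MADA-vowelized text.
G2P = {
    "b": "B", "t": "T", "m": "M", "s": "S",
    "a": "AE", "i": "IH", "u": "UH",  # short vowels from diacritics
}

def pronounce(vowelized_word):
    """Map each vowelized character to a phoneme, skipping unknowns."""
    return [G2P[ch] for ch in vowelized_word if ch in G2P]

# A lexicon.txt-style line as Kaldi expects: word, then its phonemes
word = "bataba"  # hypothetical romanized, vowelized form
print(word, " ".join(pronounce(word)))
```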


empirical methods in natural language processing | 2015

How to Avoid Unwanted Pregnancies: Domain Adaptation using Neural Network Models

Shafiq R. Joty; Hassan Sajjad; Nadir Durrani; Kamla Al-Mannai; Ahmed Abdelali; Stephan Vogel

We present novel models for domain adaptation based on the neural network joint model (NNJM). Our models maximize the cross entropy by regularizing the loss function with respect to the in-domain model. Domain adaptation is carried out by assigning higher weights to out-domain sequences that are similar to the in-domain data. In our alternative model, we take a more restrictive approach by additionally penalizing sequences similar to the out-domain data. Our models achieve better perplexities than the baseline NNJM models and give improvements of up to 0.5 and 0.6 BLEU points on the Arabic-to-English and English-to-German language pairs, on a standard task of translating TED talks.
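
A rough sketch of the instance-weighting idea, assuming the per-sequence weight is a likelihood ratio between in-domain and out-domain language models; the paper's exact regularizer and weighting scheme are not reproduced here, and `in_lm`, `out_lm`, and `model.logprob` are hypothetical stand-ins.

```python
import math

def sequence_weight(logp_in, logp_out):
    """Hypothetical similarity weight: out-domain sequences that the
    in-domain language model scores relatively well get larger weights."""
    return math.exp(logp_in - logp_out)

def weighted_nll(batch, in_lm, out_lm, model):
    """Weighted negative log-likelihood over out-domain sequences;
    in_lm, out_lm, and model.logprob are assumed to return log-probs."""
    total = 0.0
    for seq in batch:
        w = sequence_weight(in_lm(seq), out_lm(seq))
        total += -w * model.logprob(seq)
    return total / len(batch)

# Toy demo with stand-in log-probability functions
in_lm = lambda s: -2.0 if "talk" in s else -8.0
out_lm = lambda s: -4.0
class StubModel:
    def logprob(self, s):
        return -3.0
print(weighted_nll(["a ted talk", "patent claim"], in_lm, out_lm, StubModel()))
```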


spoken language technology workshop | 2012

Word segmentation through cross-lingual word-to-phoneme alignment

Felix Stahlberg; Tim Schlippe; Stephan Vogel; Tanja Schultz

We present our new alignment model, Model 3P, for cross-lingual word-to-phoneme alignment, and show that unsupervised learning of word segmentation is more accurate when information from another language is used. Word segmentation with cross-lingual information is highly relevant for bootstrapping pronunciation dictionaries from audio data for Automatic Speech Recognition, bypassing the written form in Speech-to-Speech Translation, or building the vocabulary of an unseen language, particularly in the context of under-resourced languages. Using Model 3P for the alignment between English words and Spanish phonemes outperforms a state-of-the-art monolingual word segmentation approach [1] on the BTEC corpus [2] by up to 42% absolute in F-score on the phoneme level, and a GIZA++ alignment based on IBM Model 3 by up to 17%.
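
Segmentation quality at the phoneme level can be scored by comparing word-boundary positions between a hypothesis and a reference. A minimal sketch of a boundary F-score follows, under the assumption that the paper's phoneme-level F-score is boundary-based; the exact definition used in the evaluation may differ.

```python
def boundaries(segments):
    """Boundary positions (in phonemes) implied by a list of word chunks."""
    pos, b = 0, set()
    for seg in segments[:-1]:
        pos += len(seg)
        b.add(pos)
    return b

def boundary_f1(ref_segments, hyp_segments):
    """F1 over word-boundary positions in the phoneme stream."""
    ref, hyp = boundaries(ref_segments), boundaries(hyp_segments)
    tp = len(ref & hyp)
    prec = tp / len(hyp) if hyp else 1.0
    rec = tp / len(ref) if ref else 1.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Toy example: reference vs. hypothesis segmentation of the same phonemes
ref = [["o", "l", "a"], ["m", "u", "n", "d", "o"]]
hyp = [["o", "l"], ["a", "m", "u", "n", "d", "o"]]
print(boundary_f1(ref, hyp))
```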


international conference on image analysis and processing | 2015

The QCRI recognition system for handwritten Arabic

Felix Stahlberg; Stephan Vogel

This paper describes our recognition system for handwritten Arabic. We propose novel text line image normalization procedures and a new feature extraction method. Our recognition system is based on the Kaldi toolkit, which is widely used in automatic speech recognition (ASR) research. We show that the combination of sophisticated text image normalization and state-of-the-art techniques originating from ASR results in a very robust and accurate recognizer. Our system outperforms the best systems in the literature by over 20% relative on the abcde-s configuration of the IFN/ENIT database and achieves comparable performance on other configurations. On the KHATT corpus, we report an 11% relative improvement over the best system in the literature.
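
The ASR analogy can be made concrete: HMM-based handwriting recognizers typically turn a normalized text-line image into a left-to-right sequence of feature frames, much like acoustic frames. A minimal sketch, assuming a plain sliding window over pixel columns; the paper's actual normalization and features are more sophisticated and are not reproduced here.

```python
import numpy as np

def sliding_window_frames(line_img, win=3, shift=1):
    """Turn a normalized text-line image (H x W, foreground=1) into a
    sequence of feature frames, analogous to acoustic frames in ASR."""
    H, W = line_img.shape
    frames = []
    for x in range(0, W - win + 1, shift):
        window = line_img[:, x:x + win]
        frames.append(window.flatten().astype(float))
    return np.stack(frames)  # shape: (num_frames, H * win)

line = np.random.randint(0, 2, size=(32, 100))  # dummy binarized line
print(sliding_window_frames(line).shape)
```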


SLSP'13: Proceedings of the First International Conference on Statistical Language and Speech Processing | 2013

Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

Felix Stahlberg; Tim Schlippe; Stephan Vogel; Tanja Schultz

With the help of written translations in a source language, we cross-lingually segment phoneme sequences in a target language into word units using our new alignment model Model 3P [17]. From this, we deduce phonetic transcriptions of target language words, introduce the vocabulary in terms of word IDs, and extract a pronunciation dictionary. Our approach is highly relevant for bootstrapping dictionaries from audio data for Automatic Speech Recognition and for bypassing the written form in Speech-to-Speech Translation, particularly in the context of under-resourced languages and those that are not written at all.

Analyzing 14 translations in 9 languages to build a dictionary for English shows that the quality of the resulting dictionary is better when the source and target language have similar vocabulary sizes, the sentences are shorter, words are repeated more often, and the translations are formally equivalent.
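
The extraction step can be pictured as a vote over aligned chunks. A minimal sketch, assuming the alignments have already been harvested as (source word, phoneme chunk) pairs and that the most frequent chunk wins; this simplifies the paper's actual extraction procedure.

```python
from collections import Counter, defaultdict

def extract_dictionary(aligned_pairs):
    """aligned_pairs: iterable of (source_word, phoneme_chunk) tuples from
    cross-lingual word-to-phoneme alignments. Keeps the most frequent
    chunk per word as its pronunciation."""
    candidates = defaultdict(Counter)
    for word, chunk in aligned_pairs:
        candidates[word][tuple(chunk)] += 1
    return {w: max(c, key=c.get) for w, c in candidates.items()}

pairs = [("hello", ["HH", "AH", "L", "OW"]),
         ("hello", ["HH", "AH", "L", "OW"]),
         ("hello", ["AH", "L", "OW"])]   # noisy alignment variant
print(extract_dictionary(pairs))
```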


Computer Speech & Language | 2016

Word segmentation and pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment

Felix Stahlberg; Tim Schlippe; Stephan Vogel; Tanja Schultz

Highlights: human translations guide language discovery for speech processing; pronunciation extraction for non-written languages using cross-lingual information; the alignment model Model 3P for cross-lingual word-to-phoneme alignment; an algorithm to deduce phonetic transcriptions of words from Model 3P alignments; an analysis of appropriate source languages based on efficient evaluation measures.

In this paper, we study methods to discover words and extract their pronunciations from audio data for non-written and under-resourced languages. We examine the potential and the challenges of pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment. In our scenario, a human translator produces utterances in the (non-written) target language from prompts in a resource-rich source language. We add the resource-rich source language prompts to help the word discovery and pronunciation extraction process. By aligning the source language words to the target language phonemes, we segment the phoneme sequences into word-like chunks. The resulting chunks are interpreted as putative word pronunciations but are very prone to alignment and phoneme recognition errors. Thus, we suggest our alignment model Model 3P, which is particularly designed for cross-lingual word-to-phoneme alignment. We present and compare two different methods (source word dependent and independent clustering) that extract word pronunciations from word-to-phoneme alignments. We show that both methods compensate for phoneme recognition and alignment errors. We also extract a parallel corpus consisting of 15 different translations in 10 languages from the Christian Bible to evaluate our alignment model and error recovery methods. For example, based on noisy target language phoneme sequences with 45.1% errors, we build a dictionary for an English Bible from a Spanish Bible translation with a 4.5% OOV rate, where 64% of the extracted pronunciations contain no more than one wrong phoneme. Finally, we use the extracted pronunciations in an automatic speech recognition system for the target language and report promising word error rates, given that the pronunciation dictionary and language model are learned completely unsupervised and no written form of the target language is required for our approach.
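
One way the source-word-dependent clustering can compensate for noisy candidates is to pick a representative pronunciation that is close to all observed variants. A crude stand-in sketch follows, using the medoid under Levenshtein distance; the clustering methods in the paper are more elaborate.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, pb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (pa != pb))
    return dp[-1]

def medoid(candidates):
    """Candidate with the smallest total distance to all others -- a
    crude stand-in for source-word-dependent clustering."""
    return min(candidates,
               key=lambda c: sum(edit_distance(c, o) for o in candidates))

cands = [("HH", "AH", "L", "OW"), ("AH", "L", "OW"),
         ("HH", "AH", "L", "OW")]  # noisy variants of one word
print(medoid(cands))
```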


international conference on document analysis and recognition | 2015

Detecting dense foreground stripes in Arabic handwriting for accurate baseline positioning

Felix Stahlberg; Stephan Vogel

Since Arabic script has a strong baseline, many state-of-the-art recognition systems for handwritten Arabic make use of baseline-dependent features. For printed Arabic, the baseline can be detected reliably by finding the maximum in the horizontal projection profile or the Hough transformed image. However, the performance of these methods drops significantly on handwritten Arabic. In this work, we present a novel approach to baseline detection in handwritten Arabic which is based on the detection of stripes in the image with dense foreground. Such a stripe usually corresponds to the area between lower and upper baseline. Our method outperforms a previous method by 22.4% relative for the task of finding acceptable baselines in Tunisian town names in the IFN/ENIT database.
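
Both the printed-Arabic heuristic the paper starts from and the dense-stripe idea are easy to sketch on a binarized image. The projection-profile baseline is exactly as described in the abstract; the stripe detector below is a simplified sketch with an assumed fixed stripe height, whereas the paper's detector is more elaborate.

```python
import numpy as np

def projection_baseline(binary_img):
    """Printed-Arabic heuristic: baseline = row with the maximum
    horizontal projection (foreground pixels per row)."""
    return int(np.argmax(binary_img.sum(axis=1)))

def densest_stripe(binary_img, height=8):
    """Sketch of the stripe idea: the band of rows with the highest
    foreground density, taken as the lower-to-upper baseline area."""
    profile = binary_img.sum(axis=1)
    sums = np.convolve(profile, np.ones(height), mode="valid")
    top = int(np.argmax(sums))
    return top, top + height  # rows bounding the dense stripe

img = np.zeros((40, 200), dtype=int)
img[18:25, :] = 1  # dummy dense writing band
print(projection_baseline(img), densest_stripe(img))
```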


Studies in health technology and informatics | 2015

BrailleEasy: One-handed Braille Keyboard for Smartphones.

Barbara Sepic; Abdurrahman Ghanem; Stephan Vogel

The evolution of mobile technology is moving at a very fast pace. Smartphones are currently considered a primary communication platform where people exchange voice calls, text messages and emails. The human-smartphone interaction, however, is generally optimized for sighted people through the use of visual cues on the touchscreen, e.g., typing text by tapping on a visual keyboard. Unfortunately, this interaction scheme renders smartphone technology largely inaccessible to visually impaired people, as it results in slow typing and higher error rates. Apple and some third-party applications provide solutions specifically for blind people that enable them to use Braille on smartphones. These applications usually require both hands for typing. However, Brailling with both hands while holding the phone is not very comfortable. Furthermore, two-handed Brailling is not possible on smartwatches, which will be used more pervasively in the future. Therefore, we developed a platform for one-handed Brailling consisting of a custom keyboard called BrailleEasy, for entering Arabic or English Braille codes within any application, and a BrailleTutor application for practicing. Our platform currently supports grade 1 Braille and will be extended to support contractions, spelling correction, and more languages. Preliminary analysis of user studies with blind participants showed that after less than two hours of practice, participants were able to type significantly faster with the BrailleEasy keyboard than with the standard QWERTY keyboard.
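
At its core, grade 1 Braille decoding maps a set of raised dots (numbered 1-6) to a character. The sketch below assumes a hypothetical two-stroke one-handed input scheme (left column of the cell, then right column); the paper does not specify BrailleEasy's actual tap scheme here, so treat the input convention as an assumption, while the dot-to-letter table is standard grade 1 Braille.

```python
# Grade 1 Braille: each letter is a set of raised dots (numbered 1-6).
BRAILLE = {
    frozenset({1}): "a", frozenset({1, 2}): "b", frozenset({1, 4}): "c",
    frozenset({1, 4, 5}): "d", frozenset({1, 5}): "e",
    frozenset({1, 2, 4}): "f", frozenset({1, 2, 5}): "h",
}

def decode(stroke1, stroke2):
    """Combine two hypothetical one-handed strokes (dots 1-3, then
    dots 4-6) into one Braille cell and look up the letter."""
    cell = frozenset(stroke1) | frozenset(stroke2)
    return BRAILLE.get(cell, "?")

print(decode({1, 2}, set()))  # -> 'b'
print(decode({1}, {4, 5}))    # -> 'd'
```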


north american chapter of the association for computational linguistics | 2016

Eyes Don't Lie: Predicting Machine Translation Quality Using Eye Movement

Hassan Sajjad; Francisco Guzmán; Nadir Durrani; Ahmed Abdelali; Houda Bouamor; Irina P. Temnikova; Stephan Vogel

Poorly translated text is often disfluent and difficult to read. In contrast, well-formed translations require less time to process. In this paper, we model the differences in reading patterns of Machine Translation (MT) evaluators using novel features extracted from their gaze data, and we learn to predict the quality scores given by those evaluators. We test our predictions in a pairwise ranking scenario, measuring Kendall’s tau correlation with the judgments. We show that our features provide information beyond fluency, and can be combined with BLEU for better predictions. Furthermore, our results show that reading patterns can be used to build semi-automatic metrics that anticipate the scores given by the evaluators.
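
The pairwise evaluation in the paper reduces to rank correlation between predicted scores and human judgments. A minimal sketch, with made-up per-segment numbers and a simple average standing in for the learned combination of gaze features with BLEU.

```python
from scipy.stats import kendalltau

# Hypothetical per-segment scores: human quality judgments, a
# gaze-based predictor, and BLEU (all values invented for illustration).
human = [4, 2, 5, 3, 1]
gaze = [0.8, 0.3, 0.9, 0.5, 0.2]
bleu = [0.45, 0.20, 0.60, 0.45, 0.10]

# Naive combination; the paper learns this instead of averaging.
combined = [(g + b) / 2 for g, b in zip(gaze, bleu)]

for name, scores in [("gaze", gaze), ("BLEU", bleu),
                     ("gaze+BLEU", combined)]:
    tau, _ = kendalltau(human, scores)
    print(f"{name}: tau = {tau:.2f}")
```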


RANLP 2017 - Workshop on Human-Informed Translation and Interpreting Technology | 2017

Interpreting Strategies Annotation in the WAW Corpus

Irina P. Temnikova; Ahmed Abdelali; Samy Hedaya; Stephan Vogel; Aishah Al Daher

With the aim of teaching our automatic speech-to-text translation system human interpreting strategies, our first step is to identify which interpreting strategies are most often used in the language pair of our interest (English-Arabic). In this article we run an automatic analysis of a corpus of parallel speeches and their human interpretations, and provide the results of manually annotating the human interpreting strategies in a sample of the corpus. We give a glimpse of the corpus, whose value goes beyond the fact that it contains a large number of scientific speeches with their interpretations from English into Arabic, as it also provides rich information about the interpreters. We also discuss the difficulties we encountered along the way, as well as our solutions to them: our methodology for manual re-segmentation and alignment of parallel segments, the choice of annotation tool, and the annotation procedure. Our annotation findings explain the specific statistical features previously extracted from the interpreted corpus (compared with a translated one), as well as the quality of interpretation provided by different interpreters.

Collaboration


Dive into Stephan Vogel's collaborations.

Top Co-Authors

Francisco Guzmán
Qatar Computing Research Institute

Ahmed Abdelali
Qatar Computing Research Institute

Ahmed M. Ali
Qatar Computing Research Institute

Preslav Nakov
Qatar Computing Research Institute

Felix Stahlberg
Qatar Computing Research Institute

Hassan Sajjad
Qatar Computing Research Institute

Yifan Zhang
Qatar Computing Research Institute

Tim Schlippe
Karlsruhe Institute of Technology