Saša Hasan
RWTH Aachen University
Publications
Featured research published by Saša Hasan.
Workshop on Statistical Machine Translation | 2009
Thomas Deselaers; Saša Hasan; Oliver Bender; Hermann Ney
In this paper we present a novel transliteration technique which is based on deep belief networks. Common approaches use finite state machines or other methods similar to conventional machine translation. Instead of using conventional NLP techniques, the approach presented here builds on deep belief networks, a technique which was shown to work well for other machine learning problems. We show that deep belief networks have certain properties which are very interesting for transliteration and possibly also for translation and that a combination with conventional techniques leads to an improvement over both components on an Arabic-English transliteration task.
Empirical Methods in Natural Language Processing | 2009
Arne Mauser; Saša Hasan; Hermann Ney
In this work, we propose two extensions of standard word lexicons in statistical machine translation: a discriminative word lexicon that uses sentence-level source information to predict the target words, and a trigger-based lexicon model that extends IBM model 1 with a second trigger, allowing for a more fine-grained lexical choice of target words. The models capture dependencies that go beyond the scope of conventional SMT models such as phrase- and language models. We show that the models improve translation quality by 1% in BLEU over a competitive baseline on a large-scale task.
Empirical Methods in Natural Language Processing | 2008
Saša Hasan; Juri Ganitkevitch; Hermann Ney; Jesús Andrés-Ferrer
This paper describes a lexical trigger model for statistical machine translation. We present various methods using triplets incorporating long-distance dependencies that can go beyond the local context of phrases or n-gram based language models. We evaluate the presented methods on two translation tasks in a reranking framework and compare them to the related IBM model 1. We show slightly improved translation quality in terms of BLEU and TER, and address various constraints to speed up the training based on Expectation-Maximization and to lower the overall number of triplets without loss in translation performance.
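The triplet models above build on IBM model 1 trained with Expectation-Maximization. As background, here is a minimal sketch of the plain Model 1 EM loop; the toy bitext, function name, and iteration count are illustrative, not taken from the paper:

```python
from collections import defaultdict

def train_ibm1(bitext, iterations=5):
    """Estimate word translation probabilities t(e|f) with IBM Model 1 EM."""
    # Uniform (unnormalized) initialization over all word pairs.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for f_sent, e_sent in bitext:
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_sent)  # normalizer over source words
                for f in f_sent:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize counts into probabilities.
        t = defaultdict(float)
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

# Toy parallel data (German-English).
bitext = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = train_ibm1(bitext)
# After a few iterations, t[("book", "buch")] dominates t[("the", "buch")].
```

The triplet extension replaces t(e|f) with a two-trigger distribution p(e|f, f'), but the EM skeleton stays the same.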
IEEE Automatic Speech Recognition and Understanding Workshop | 2007
Oliver Bender; Stefan Hahn; Saša Hasan; Shahram Khadivi; Hermann Ney
We present the RWTH phrase-based statistical machine translation system designed for the translation of Arabic speech into English text. This system was used in the Global Autonomous Language Exploitation (GALE) Go/No-Go Translation Evaluation 2007. Using a two-pass approach, we first generate n-best translation candidates and then rerank these candidates using additional models. We give a short review of the decoder as well as of the models used in both passes. We stress the difficulties of spoken language translation, i.e. how to combine the recognition and translation systems and how to compensate for missing punctuation. In addition, we cover our work on domain adaptation for the applied language models. We present translation results for the official GALE 2006 evaluation set and the GALE 2007 development set.
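The second pass described above rescores n-best candidates with additional models. A minimal sketch of log-linear reranking; the feature names (`tm`, `lm`), weights, and candidates are made up for illustration and not taken from the system:

```python
def rerank(nbest, weights):
    """Return the candidate with the highest weighted log-linear score."""
    def score(candidate):
        # candidate["features"] maps model names to log-scores.
        return sum(weights[name] * value
                   for name, value in candidate["features"].items())
    return max(nbest, key=score)

# Hypothetical candidates with translation-model and language-model log-scores.
nbest = [
    {"text": "the meeting ended", "features": {"tm": -2.0, "lm": -1.5}},
    {"text": "the meeting over",  "features": {"tm": -1.8, "lm": -3.0}},
]
weights = {"tm": 1.0, "lm": 0.8}
best = rerank(nbest, weights)
```

In practice the weights would be tuned on a development set, e.g. with minimum error rate training.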
International Conference on Acoustics, Speech, and Signal Processing | 2009
Murat Akbacak; Horacio Franco; Michael W. Frandsen; Saša Hasan; Huda Jameel; Andreas Kathol; Shahram Khadivi; Xin Lei; Arindam Mandal; Saab Mansour; Kristin Precoda; Colleen Richey; Dimitra Vergyri; Wen Wang; Mei Yang; Jing Zheng
We summarize recent progress on SRI's IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system. In the past year we made substantial developments in our speech recognition and machine translation technology, leading to significant improvements in both the accuracy and speed of the IraqComm system. On the 2008 NIST evaluation dataset, our two-way speech-to-text (S2T) system achieved a 6% to 8% absolute improvement in BLEU in both directions, compared to our previous year's system [1].
North American Chapter of the Association for Computational Linguistics | 2007
Saša Hasan; Richard Zens; Hermann Ney
This paper describes an efficient method to extract large n-best lists from a word graph produced by a statistical machine translation system. The extraction is based on the k shortest paths algorithm which is efficient even for very large k. We show that, although we can generate large amounts of distinct translation hypotheses, these numerous candidates are not able to significantly improve overall system performance. We conclude that large n-best lists would benefit from better discriminating models.
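As a rough illustration of n-best extraction from a word graph, here is a naive heap-based path enumeration over a toy acyclic graph. The paper uses an efficient k shortest paths algorithm that scales to very large k; this sketch only conveys the idea, and the graph encoding is an assumption:

```python
import heapq

def k_best_paths(graph, start, goal, k):
    """Enumerate the k lowest-cost paths through a weighted word graph.

    graph: {node: [(edge_cost, word, next_node), ...]} — assumed acyclic.
    Returns up to k (total_cost, word_sequence) pairs, best first.
    """
    heap = [(0.0, start, [])]
    results = []
    while heap and len(results) < k:
        cost, node, words = heapq.heappop(heap)
        if node == goal:
            results.append((cost, words))
            continue
        for edge_cost, word, nxt in graph.get(node, []):
            heapq.heappush(heap, (cost + edge_cost, nxt, words + [word]))
    return results

# Toy word graph with two paths from node 0 to node 2.
graph = {0: [(1.0, "the", 1), (2.0, "a", 1)],
         1: [(1.0, "house", 2)]}
paths = k_best_paths(graph, 0, 2, 2)
# paths[0] is the cheapest hypothesis, paths[1] the runner-up.
```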
North American Chapter of the Association for Computational Linguistics | 2009
Saša Hasan; Hermann Ney
We show how the integration of an extended lexicon model into the decoder can improve translation performance. The model is based on lexical triggers that capture long-distance dependencies on the sentence level. The results are compared to variants of the model that are applied in reranking of n-best lists. We present how a combined application of these models in search and rescoring gives promising results. Experiments are reported on the GALE Chinese-English task with improvements of up to +0.9% BLEU and -1.5% TER absolute on a competitive baseline.
Conference of the European Chapter of the Association for Computational Linguistics | 2003
Karin Harbusch; Saša Hasan; Hajo Hoffmann; Michael Kühn; Bernhard Schüler
In this paper, we investigate whether and how domain-specific corpora increase the precision of word disambiguation for typing on an ambiguous keyboard. The disambiguation for our ambiguous keyboard with three letter keys is based on language-specific word frequencies from the CELEX lexicon (this study deals with English and German). More specific frequency information is extracted from texts in the special domains of school homework in three subjects and articles in two different scientific areas. Overall, deploying domain-specific predictions did not always yield better performance. As a general solution, we propose an interpolated language model combining the general and the specific language models. For all our domains, this method achieved good results compared to an ideal prediction based on all available models.
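The proposed interpolation combines the general and domain-specific models linearly. A minimal unigram sketch; the probabilities and the interpolation weight are made up for illustration:

```python
def interpolated_prob(word, general_lm, domain_lm, lam=0.5):
    """Linear interpolation of a domain-specific and a general unigram model."""
    return lam * domain_lm.get(word, 0.0) + (1 - lam) * general_lm.get(word, 0.0)

# Hypothetical unigram probabilities: a domain term is rare in general text
# but frequent in, say, biology homework.
general = {"the": 0.06, "photosynthesis": 0.00001}
domain  = {"the": 0.05, "photosynthesis": 0.002}
p = interpolated_prob("photosynthesis", general, domain, lam=0.7)
```

The weight lam would typically be tuned on held-out text from the target domain.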
Empirical Methods in Natural Language Processing | 2015
Saša Hasan; Carmen Heger; Saab Mansour
We use character-based statistical machine translation to correct user search queries in the e-commerce domain. The training data is automatically extracted from event logs in which users re-issue their search queries with potentially corrected spelling within the same session. We show results on a human-annotated test set and compare against the online autocorrection capabilities of three other websites. Overall, the methods presented in this paper outperform fully productized spellchecking and autocorrection services in terms of accuracy and F1 score. We also propose novel evaluation steps based on the retrieved search results of the corrected queries, in terms of quantity and relevance.
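A sketch of how accuracy and F1 might be computed for query autocorrection, under one plausible definition of a "correction" (an output that differs from the input query); the paper's exact metric definitions may differ, and all data here is made up:

```python
def correction_metrics(queries, gold, predicted):
    """Accuracy and F1 for query autocorrection.

    Precision: proposed corrections that match the gold query.
    Recall: queries needing correction that were fixed correctly.
    """
    accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(queries)
    triples = list(zip(queries, predicted, gold))
    proposed = [(p, g) for q, p, g in triples if p != q]  # system changed the query
    needed = [(p, g) for q, p, g in triples if g != q]    # query actually misspelled
    tp = sum(p == g for p, g in proposed)                 # correct fixes
    precision = tp / len(proposed) if proposed else 0.0
    recall = tp / len(needed) if needed else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1

# Toy e-commerce queries.
queries   = ["ipohne case", "red shoes", "bluetooht speaker"]
gold      = ["iphone case", "red shoes", "bluetooth speaker"]
predicted = ["iphone case", "red shoes", "bluetooht speaker"]
acc, f1 = correction_metrics(queries, gold, predicted)
```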
Machine Translation | 2012
Saša Hasan; Saab Mansour; Hermann Ney
In this article, we investigate different methodologies of Arabic segmentation for statistical machine translation by comparing a rule-based segmenter to different statistically-based segmenters. We also present a method for segmentation that serves the needs of a real-time translation system without impairing translation accuracy. In addition, we report on extended lexicon models based on triplets that incorporate sentence-level context during the decoding process. Results are presented on different translation tasks and show improvements in both BLEU and TER scores.