Mahmoud Ghoneim
George Washington University
Publication
Featured research published by Mahmoud Ghoneim.
workshop on computational approaches to code switching | 2014
Thamar Solorio; Elizabeth Blair; Suraj Maharjan; Steven Bethard; Mona T. Diab; Mahmoud Ghoneim; Abdelati Hawwari; Fahad AlGhamdi; Julia Hirschberg; Alison Chang; Pascale Fung
We present an overview of the first shared task on language identification in code-switched data. The shared task included code-switched data from four language pairs: Modern Standard Arabic-Dialectal Arabic (MSA-DA), Mandarin-English (MAN-EN), Nepali-English (NEP-EN), and Spanish-English (SPA-EN). A total of seven teams participated in the task and submitted 42 system runs. The evaluation showed that token-level language identification is more difficult when the languages present are closely related, as in the case of MSA-DA, where prediction performance was the lowest among all language pairs. In contrast, the language pairs with the highest F-measure were SPA-EN and NEP-EN. The task made evident that language identification in code-switched data is still far from solved and warrants further research.
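Token-level evaluation of this kind scores each token's predicted language tag against a gold tag and reports an F-measure per label. The following is a minimal sketch of that computation, assuming gold and predicted tags are aligned one per token; the helper `per_label_f1` and the label names `lang1`/`lang2` are hypothetical illustrations, not the shared task's official scorer.

```python
from collections import Counter


def per_label_f1(gold: list[str], pred: list[str]) -> dict[str, float]:
    """Per-label F-measure for token-aligned gold and predicted language tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must align token by token")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct tag for this token
        else:
            fp[p] += 1          # predicted label was wrong
            fn[g] += 1          # gold label was missed
    scores = {}
    for label in set(gold) | set(pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical four-token example: one lang1 token mislabeled as lang2.
gold = ["lang1", "lang1", "lang2", "lang2"]
pred = ["lang1", "lang2", "lang2", "lang2"]
scores = per_label_f1(gold, pred)
print({k: round(v, 3) for k, v in sorted(scores.items())})
# {'lang1': 0.667, 'lang2': 0.8}
```

A single confusion between two labels lowers the recall of one and the precision of the other, which is why closely related pairs such as MSA-DA, where such confusions are frequent, tend to score lower.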
meeting of the association for computational linguistics | 2015
Houda Bouamor; Wajdi Zaghouani; Mona T. Diab; Ossama Obeid; Kemal Oflazer; Mahmoud Ghoneim; Abdelati Hawwari
Arabic script writing is typically underspecified for short vowels and other markup, referred to as diacritics. Apart from the lexical ambiguity found in words, similar to that exhibited in other languages, the lack of diacritics in written Arabic script adds another layer of ambiguity that is an artifact of the orthography. Diacritization of written text has a significant impact on Arabic NLP applications. In this paper, we present a pilot study on building a diacritized multi-genre corpus in Arabic. We annotate a sample of non-diacritized words extracted from five text genres. We explore different annotation strategies: Basic, where we present only the bare undiacritized forms to the annotators; Intermediate (Basic forms plus their POS tags); and Advanced (automatically diacritized words). We present the impact of the annotation strategy on annotation quality. Moreover, we study different diacritization schemes in the process.
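The orthographic ambiguity described above can be illustrated by stripping diacritics, which is how a bare form like the one shown in the Basic strategy relates to a fully diacritized word. This is a minimal sketch, assuming that "diacritics" means the eight common harakat (U+064B fathatan through U+0652 sukun) plus U+0670 superscript alef; a real pipeline may need to cover additional marks, and `strip_diacritics` is a hypothetical helper, not part of the authors' tooling.

```python
import re

# Assumption: the eight common harakat (U+064B fathatan .. U+0652 sukun)
# plus U+0670 superscript (dagger) alef are treated as diacritics.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the bare (undiacritized) form of an Arabic string."""
    return ARABIC_DIACRITICS.sub("", text)


kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote"
kutub = "\u0643\u064f\u062a\u064f\u0628"         # kutub, "books"

# Two distinct diacritized words collapse to the same bare form (k-t-b),
# so the undiacritized string alone cannot tell them apart.
print(strip_diacritics(kataba) == strip_diacritics(kutub))  # True
```

Going in the other direction, from the bare form back to one of several possible diacritized forms, is the disambiguation task that annotators (and automatic diacritizers) must perform.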
empirical methods in natural language processing | 2014
Maryam Aminian; Mahmoud Ghoneim; Mona T. Diab
Dialects and standard forms of a language typically share a set of cognates that could bear the same meaning in both varieties or only be shared homographs that serve as faux amis. Moreover, there are words that are used exclusively in the dialect or the standard variety. Both phenomena, faux amis and exclusive vocabulary, are considered out-of-vocabulary (OOV) phenomena. In this paper, we present this OOV problem in the context of machine translation. We present a new approach for enhancing dialect-to-English Statistical Machine Translation (SMT) based on normalizing dialectal language into standard form, providing equivalents that address both aspects of the OOV problem posed by dialectal language use. We specifically focus on Arabic-to-English SMT. We use two publicly available dialect identification tools, AIDA and MADAMIRA, to identify dialectal Arabic OOV words and replace them with their Modern Standard Arabic (MSA) equivalents. Evaluation on two blind test sets shows that using AIDA to identify and replace MSA equivalents improves translation results by 0.4% absolute BLEU (1.6% relative BLEU), and using MADAMIRA achieves a 0.3% absolute BLEU (1.2% relative BLEU) improvement over the baseline. We show that our replacement scheme yields a noticeable improvement in SMT performance for faux amis words.
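The absolute and relative BLEU figures above are related by relative gain = absolute gain / baseline score. Assuming that usual convention (the abstract does not state it explicitly), the reported pairs imply a baseline score, which the following sketch back-computes as a consistency check. The helper `implied_baseline` is hypothetical, and because the reported figures are rounded, the result is approximate rather than a number the paper reports.

```python
def implied_baseline(absolute_gain: float, relative_gain_pct: float) -> float:
    """Baseline B such that absolute_gain / B == relative_gain_pct / 100."""
    if relative_gain_pct <= 0:
        raise ValueError("relative gain must be positive")
    return absolute_gain / (relative_gain_pct / 100.0)


# (tool, absolute BLEU gain, relative BLEU gain in percent) from the abstract.
for tool, absolute, relative in [("AIDA", 0.4, 1.6), ("MADAMIRA", 0.3, 1.2)]:
    print(f"{tool}: implied baseline ~ {implied_baseline(absolute, relative):.1f} BLEU")
```

Both pairs work out to roughly 25 BLEU, which is consistent with the two gains being measured against the same baseline.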
north american chapter of the association for computational linguistics | 2015
Maryam Aminian; Mahmoud Ghoneim; Mona T. Diab
Lexical false friends (FF) are words that look the same but do not have the same meaning or lexical usage. FF pose several challenges to statistical machine translation (SMT). We present a methodology that exploits word context modeling, as well as information provided by word alignments, to identify false friends and choose the right sense for them in context. We show that our approach enhances SMT lexical choice for false friends across language variants. We demonstrate that our approach reduces word error rate (WER) and position-independent error rate (PER) for Egyptian-English SMT by 0.6% and 0.1%, respectively, compared to the baseline.
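WER and PER differ in whether word order matters. The following is a minimal sketch of both metrics, assuming whitespace tokenization, WER as the word-level Levenshtein distance divided by the reference length, and PER as the commonly used bag-of-words variant, (max(N_ref, N_hyp) - matches) / N_ref. The abstract does not say which scoring tool was used, so the function names here are hypothetical illustrations.

```python
from collections import Counter


def word_error_rate(ref: str, hyp: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))  # distances from the empty reference prefix
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[len(h)] / len(r)


def position_independent_error_rate(ref: str, hyp: str) -> float:
    """Like WER, but compares bags of words, ignoring word order."""
    r, h = ref.split(), hyp.split()
    matches = sum((Counter(r) & Counter(h)).values())
    return (max(len(r), len(h)) - matches) / len(r)


ref, hyp = "the cat sat", "sat the cat"
print(word_error_rate(ref, hyp))                  # 0.6666666666666666
print(position_independent_error_rate(ref, hyp))  # 0.0
```

A reordered but otherwise correct hypothesis is penalized by WER but not by PER, so PER isolates lexical-choice errors such as mistranslated false friends from reordering errors.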
international joint conference on natural language processing | 2013
Mahmoud Ghoneim; Mona T. Diab
language resources and evaluation | 2016
Wajdi Zaghouani; Houda Bouamor; Abdelati Hawwari; Mona T. Diab; Ossama Obeid; Mahmoud Ghoneim; Sawsan Alqahtani; Kemal Oflazer
Archive | 2015
Houda Bouamor; Wajdi Zaghouani; Mona T. Diab; Ossama Obeid; Kemal Oflazer; Mahmoud Ghoneim; Abdelati Hawwari
language resources and evaluation | 2016
Mona T. Diab; Mahmoud Ghoneim; Abdelati Hawwari; Fahad AlGhamdi; Nada AlMarwani; Mohamed Al-Badrashiny
international conference on computational linguistics | 2016
Mohamed Al-Badrashiny; Abdelati Hawwari; Mahmoud Ghoneim; Mona T. Diab
language resources and evaluation | 2016
Abdelati Hawwari; Mohammed Attia; Mahmoud Ghoneim; Mona T. Diab