Younes Samih
University of Düsseldorf
Publication
Featured research published by Younes Samih.
workshop on computational approaches to code switching | 2016
Younes Samih; Suraj Maharjan; Mohammed Attia; Laura Kallmeyer; Thamar Solorio
This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second Workshop on Computational Approaches to Code Switching. Our system ranked first for Arabic (MSA-Egyptian) with an F1-score of 0.83 and second for Spanish-English with an F1-score of 0.90. The HHU-UH-G system introduces a novel unified neural network architecture for language identification in code-switched tweets for both Spanish-English and the MSA-Egyptian dialect. The system makes use of word- and character-level representations to identify code-switching. For the MSA-Egyptian dialect, the system does not rely on any kind of language-specific knowledge or linguistic resources such as part-of-speech (POS) taggers, morphological analyzers, gazetteers or word lists to obtain state-of-the-art performance.
Proceedings of the Third Arabic Natural Language Processing Workshop | 2017
Younes Samih; Mohammed Attia; Mohamed Eldesouki; Ahmed Abdelali; Hamdy Mubarak; Laura Kallmeyer; Kareem Darwish
The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmentation of words into their constituent tokens is an important processing step for natural language processing. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks without any normalization or reliance on lexical features or linguistic resources. We deal with segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that heavily depend on additional resources.
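As a rough illustration of treating segmentation as character-level sequence labeling, the sketch below converts a segmented word into per-character labels and back. The two-label scheme (B for the first character of a segment, I otherwise), the "+" separator, the function names, and the transliterated example word are all assumptions made for illustration; they are not taken from the paper, and no neural model is included.

```python
def to_char_labels(segmented, sep="+"):
    """Turn a segmented word such as 'b+yqwl+w' into characters and labels.

    Labels (illustrative scheme): 'B' marks the first character of a
    segment, 'I' marks every other character.
    """
    chars, labels = [], []
    for segment in segmented.split(sep):
        for i, ch in enumerate(segment):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def from_char_labels(chars, labels, sep="+"):
    """Rebuild the segmented word from characters and predicted labels."""
    out = []
    for ch, label in zip(chars, labels):
        # A 'B' label after the first character opens a new segment.
        if label == "B" and out:
            out.append(sep)
        out.append(ch)
    return "".join(out)


# Hypothetical transliterated example: prefix + stem + suffix.
chars, labels = to_char_labels("b+yqwl+w")
restored = from_char_labels(chars, labels)
```

A sequence labeler trained on such pairs would predict one label per character, and the decoding step would then recover the segment boundaries.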
Natural Language Engineering | 2016
Mohammed Attia; Pavel Pecina; Younes Samih; Khaled Shaalan; Josef van Genabith
A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
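The interplay of the three components can be sketched in a few lines. This is a toy illustration, not the authors' system: the dictionary is a small hypothetical word list, the error model is plain Levenshtein distance, and the language model is reduced to unigram counts used as a tie-breaker. The function names, the `max_dist` threshold, and the counts are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct(word, freq, max_dist=2):
    """Return the best dictionary candidate for `word`.

    `freq` maps dictionary words to (hypothetical) corpus counts.
    Candidates are ranked by edit distance first, then by frequency.
    """
    if word in freq:
        return word  # already a dictionary word
    scored = [(edit_distance(word, w), -count, w) for w, count in freq.items()]
    candidates = [s for s in scored if s[0] <= max_dist]
    return min(candidates)[2] if candidates else word


# Hypothetical toy dictionary with unigram counts.
toy_freq = {"the": 100, "then": 20, "language": 50}
suggestion = correct("teh", toy_freq)
```

Here "the" and "then" are both two edits away from "teh", so the frequency term decides between them. A realistic system would replace each piece with something stronger, for example a re-ranker over error types and an n-gram language model over context.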
conference on computational natural language learning | 2017
Younes Samih; Mohamed Eldesouki; Mohammed Attia; Kareem Darwish; Ahmed Abdelali; Hamdy Mubarak; Laura Kallmeyer
Arabic dialects do not just share a common koine; they also exhibit shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We find that linguistic relatedness is contingent on geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.
workshop on computational approaches to code switching | 2016
Younes Samih; Wolfgang Maier; Laura Kallmeyer
We present SAWT, a web-based tool for the annotation of token sequences with an arbitrary set of labels. The key property of the tool is simplicity and ease of use for both annotators and administrators. SAWT runs in any modern browser, including browsers on mobile devices, and only has minimal server-side requirements.
language resources and evaluation | 2012
Khaled Shaalan; Mohammed Attia; Pavel Pecina; Younes Samih; Josef van Genabith
international conference on computational linguistics | 2012
Mohammed Attia; Pavel Pecina; Younes Samih; Khaled Shaalan; Josef van Genabith
international conference on computational linguistics | 2012
Mohammed Attia; Younes Samih; Khaled Shaalan; Josef van Genabith
language resources and evaluation | 2016
Younes Samih; Wolfgang Maier
international conference on computational linguistics | 2016
Mohammed Attia; Suraj Maharjan; Younes Samih; Laura Kallmeyer; Thamar Solorio