Publication


Featured research published by Younes Samih.


Workshop on Computational Approaches to Code Switching | 2016

Multilingual Code-switching Identification via LSTM Recurrent Neural Networks

Younes Samih; Suraj Maharjan; Mohammed Attia; Laura Kallmeyer; Thamar Solorio

This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second Workshop on Computational Approaches to Code Switching. Our system ranked first for Arabic (MSA-Egyptian) with an F1-score of 0.83 and second for Spanish-English with an F1-score of 0.90. The HHU-UH-G system introduces a novel unified neural network architecture for language identification in code-switched tweets for both Spanish-English and MSA-Egyptian dialect. The system makes use of word- and character-level representations to identify code-switching. For the MSA-Egyptian dialect, the system does not rely on any kind of language-specific knowledge or linguistic resources, such as part-of-speech (POS) taggers, morphological analyzers, gazetteers, or word lists, to obtain state-of-the-art performance.


Proceedings of the Third Arabic Natural Language Processing Workshop | 2017

A Neural Architecture for Dialectal Arabic Segmentation

Younes Samih; Mohammed Attia; Mohamed Eldesouki; Ahmed Abdelali; Hamdy Mubarak; Laura Kallmeyer; Kareem Darwish

The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmentation of words into their constituent tokens is an important processing step for natural language processing. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks without any normalization or reliance on lexical features or linguistic resources. We deal with segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that heavily depend on additional resources.
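
To illustrate what treating segmentation as character-level sequence labeling can look like, here is a minimal sketch. It assumes a simple two-label scheme (`B` for a character that begins a segment, `I` for a character inside one); the paper's actual label set is not given in the abstract, and the transliterated example word and the function names are hypothetical.

```python
from typing import List, Tuple


def encode(segments: List[str]) -> Tuple[List[str], List[str]]:
    """Turn a gold segmentation into per-character labels.

    Assumed label scheme (not taken from the paper):
    "B" marks the first character of a segment, "I" any other character.
    """
    chars: List[str] = []
    labels: List[str] = []
    for seg in segments:
        for i, ch in enumerate(seg):
            chars.append(ch)
            labels.append("B" if i == 0 else "I")
    return chars, labels


def decode(chars: List[str], labels: List[str]) -> List[str]:
    """Rebuild segments from characters and predicted labels."""
    segments: List[str] = []
    for ch, lab in zip(chars, labels):
        if lab == "B" or not segments:
            segments.append(ch)   # start a new segment
        else:
            segments[-1] += ch    # extend the current segment
    return segments


# Hypothetical transliterated word split into four segments.
gold = ["w", "b", "yqwl", "wA"]
chars, labels = encode(gold)
restored = decode(chars, labels)
```

A sequence model would be trained to predict `labels` from `chars`; `decode` then turns its predictions back into a segmentation.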


Natural Language Engineering | 2016

Arabic spelling error detection and correction

Mohammed Attia; Pavel Pecina; Younes Samih; Khaled Shaalan; Josef van Genabith

A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
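
The interplay of the three components can be sketched as follows. This is a toy illustration, not the authors' system: the dictionary is a handful of transliterated words, the "language model" is reduced to made-up unigram counts, the error model is plain Levenshtein distance, and the names `edit_distance` and `suggest` are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def suggest(word: str, counts: Dict[str, int], max_dist: int = 2) -> List[str]:
    """Return ranked corrections, or [] if the word is in the dictionary.

    Detection: a word missing from the dictionary is flagged as an error.
    Correction: candidates within max_dist edits are ranked by distance
    first, then by descending count (a stand-in for a language model).
    """
    if word in counts:
        return []
    scored = []
    for cand, freq in counts.items():
        dist = edit_distance(word, cand)
        if dist <= max_dist:
            scored.append((dist, -freq, cand))
    return [cand for _, _, cand in sorted(scored)]


# Made-up counts for illustration only.
toy_counts = {"kitab": 50, "katib": 30, "kutub": 20, "maktab": 10}
ranked = suggest("kitb", toy_counts)
```

In this sketch a better dictionary changes which words are flagged, a better error model changes the candidate ordering, and a better language model changes the tie-breaking, which mirrors the abstract's point that each component contributes separately.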


Conference on Computational Natural Language Learning | 2017

Learning from Relatives: Unified Dialectal Arabic Segmentation

Younes Samih; Mohamed Eldesouki; Mohammed Attia; Kareem Darwish; Ahmed Abdelali; Hamdy Mubarak; Laura Kallmeyer

Arabic dialects not only share a common koine; they also exhibit shared pan-dialectal linguistic phenomena that allow computational models for dialects to learn from each other. In this paper we build a unified segmentation model where the training data for different dialects are combined and a single model is trained. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We found that linguistic relatedness correlates with geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.


Workshop on Computational Approaches to Code Switching | 2016

SAWT: Sequence Annotation Web Tool

Younes Samih; Wolfgang Maier; Laura Kallmeyer

We present SAWT, a web-based tool for the annotation of token sequences with an arbitrary set of labels. The key properties of the tool are simplicity and ease of use for both annotators and administrators. SAWT runs in any modern browser, including browsers on mobile devices, and has only minimal server-side requirements.


Language Resources and Evaluation | 2012

Arabic Word Generation and Modelling for Spell Checking

Khaled Shaalan; Mohammed Attia; Pavel Pecina; Younes Samih; Josef van Genabith


International Conference on Computational Linguistics | 2012

Improved Spelling Error Detection and Correction for Arabic

Mohammed Attia; Pavel Pecina; Younes Samih; Khaled Shaalan; Josef van Genabith


International Conference on Computational Linguistics | 2012

The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the Detection and Lemmatization of Unknown Words

Mohammed Attia; Younes Samih; Khaled Shaalan; Josef van Genabith


Language Resources and Evaluation | 2016

An Arabic-Moroccan Darija Code-Switched Corpus

Younes Samih; Wolfgang Maier


International Conference on Computational Linguistics | 2016

CogALex-V Shared Task: GHHH - Detecting Semantic Relations via Word Embeddings

Mohammed Attia; Suraj Maharjan; Younes Samih; Laura Kallmeyer; Thamar Solorio

Collaboration


Dive into Younes Samih's collaborations.

Top Co-Authors

Laura Kallmeyer

University of Düsseldorf

Ahmed Abdelali

Qatar Computing Research Institute

Kareem Darwish

Qatar Computing Research Institute

Mohamed Eldesouki

Qatar Computing Research Institute

Khaled Shaalan

British University in Dubai

Pavel Pecina

Charles University in Prague
