Publication


Featured research published by Hamdy Mubarak.


North American Chapter of the Association for Computational Linguistics | 2016

SemEval-2016 Task 3: Community Question Answering

Preslav Nakov; Lluís Màrquez; Alessandro Moschitti; Walid Magdy; Hamdy Mubarak; Abed Alhakim Freihat; James R. Glass; Bilal Randeree

This paper describes the SemEval–2016 Task 3 on Community Question Answering, which we offered in English and Arabic. For English, we had three subtasks: Question–Comment Similarity (subtask A), Question–Question Similarity (B), and Question–External Comment Similarity (C). For Arabic, we had another subtask: Rerank the correct answers for a new question (D). Eighteen teams participated in the task, submitting a total of 95 runs (38 primary and 57 contrastive) for the four subtasks. A variety of approaches and features were used by the participating systems to address the different subtasks, which are summarized in this paper. The best systems achieved an official score (MAP) of 79.19, 76.70, 55.41, and 45.83 in subtasks A, B, C, and D, respectively. These scores are significantly better than those for the baselines that we provided. For subtask A, the best system improved over the 2015 winner by 3 points absolute in terms of Accuracy.


Empirical Methods in Natural Language Processing | 2014

Using Twitter to Collect a Multi-Dialectal Corpus of Arabic

Hamdy Mubarak; Kareem Darwish

This paper describes the collection and classification of a multi-dialectal corpus of Arabic based on the geographical information of tweets. We mapped information of user locations to one of the Arab countries, and extracted tweets that have dialectal word(s). Manual evaluation of the extracted corpus shows that the accuracy of assignment of tweets to some countries (like Saudi Arabia and Egypt) is above 93%, while the accuracy for other countries, such as Algeria and Syria, is below 70%.


North American Chapter of the Association for Computational Linguistics | 2015

QCRI: Answer Selection for Community Question Answering - Experiments for Arabic and English

Massimo Nicosia; Simone Filice; Alberto Barrón-Cedeño; Iman Saleh; Hamdy Mubarak; Wei Gao; Preslav Nakov; Giovanni Da San Martino; Alessandro Moschitti; Kareem Darwish; Lluís Màrquez; Shafiq R. Joty; Walid Magdy

This paper describes QCRI’s participation in SemEval-2015 Task 3 “Answer Selection in Community Question Answering”, which targeted real-life Web forums, and was offered in both Arabic and English. We apply a supervised machine learning approach considering a manifold of features including among others word n-grams, text similarity, sentiment analysis, the presence of specific words, and the context of a comment. Our approach was the best performing one in the Arabic subtask and the third best in the two English subtasks.


North American Chapter of the Association for Computational Linguistics | 2016

Farasa: A Fast and Furious Segmenter for Arabic

Ahmed Abdelali; Kareem Darwish; Nadir Durrani; Hamdy Mubarak

In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is based on SVM-rank using linear kernels. We measure the performance of the segmenter in terms of accuracy and efficiency, in two NLP tasks, namely Machine Translation (MT) and Information Retrieval (IR). Farasa outperforms or is at par with the state-of-the-art Arabic segmenters (Stanford and MADAMIRA), while being more than one order of magnitude faster.


Empirical Methods in Natural Language Processing | 2014

Verifiably Effective Arabic Dialect Identification

Kareem Darwish; Hassan Sajjad; Hamdy Mubarak

Several recent papers on Arabic dialect identification have hinted that using a word unigram model is sufficient and effective for the task. However, most previous work was done on a standard fairly homogeneous dataset of dialectal user comments. In this paper, we show that training on the standard dataset does not generalize, because a unigram model may be tuned to topics in the comments and does not capture the distinguishing features of dialects. We show that effective dialect identification requires that we account for the distinguishing lexical, morphological, and phonological phenomena of dialects. We show that accounting for such can improve dialect detection accuracy by nearly 10% absolute.


Spoken Language Technology Workshop | 2016

The MGB-2 challenge: Arabic multi-dialect broadcast media recognition

Ahmed M. Ali; Peter Bell; James R. Glass; Yacine Messaoui; Hamdy Mubarak; Steve Renals; Yifan Zhang

This paper describes the Arabic Multi-Genre Broadcast (MGB-2) Challenge for SLT-2016. Unlike last year's English MGB Challenge, which focused on recognition of diverse TV genres, this year's challenge has an emphasis on handling the diversity in dialect in Arabic speech. Audio data comes from 19 distinct programmes from the Aljazeera Arabic TV channel between March 2005 and December 2015. Programmes are split into three groups: conversations, interviews, and reports. A total of 1,200 hours have been released with lightly supervised transcriptions for the acoustic modelling. For language modelling, we made available over 110M words crawled from the Aljazeera Arabic website Aljazeera.net for a 10-year duration, 2000−2011. Two lexicons have been provided, one phoneme based and one grapheme based. Finally, two tasks were proposed for this year's challenge: standard speech transcription, and word alignment. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained.


Empirical Methods in Natural Language Processing | 2014

Automatic Correction of Arabic Text: a Cascaded Approach

Hamdy Mubarak; Kareem Darwish

This paper describes the error correction model that we used for the Automatic Correction of Arabic Text shared task. We employed two correction models, namely a character-level model and a case-specific model, and two punctuation recovery models, namely a simple statistical model and a CRF model. Our results on the development set suggest that using a cascaded correction model yields the best results.


Meeting of the Association for Computational Linguistics | 2015

Best Practices for Crowdsourcing Dialectal Arabic Speech Transcription

Samantha Wray; Hamdy Mubarak; Ahmed M. Ali

In this paper, we investigate different approaches in crowdsourcing transcriptions of Dialectal Arabic speech with automatic quality control to ensure good transcription at the source. Since Dialectal Arabic has no standard orthographic representation, it is very challenging to perform quality control. We propose a complete recipe for speech transcription quality control that includes using output of an Automatic Speech Recognition system. We evaluated the quality of the transcribed speech and through this recipe, we achieved a reduction in transcription error of 1.0% compared with 13.2% baseline with no quality control for Egyptian data, and down to 4% compared with 7.8% for the North African dialect.


Proceedings of the Third Arabic Natural Language Processing Workshop | 2017

A Neural Architecture for Dialectal Arabic Segmentation

Younes Samih; Mohammed Attia; Mohamed Eldesouki; Ahmed Abdelali; Hamdy Mubarak; Laura Kallmeyer; Kareem Darwish

The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmentation of words into their constituent tokens is an important processing step for natural language processing. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks without any normalization or reliance on lexical features or linguistic resources. We deal with segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that heavily depend on additional resources.


Proceedings of the First Workshop on Abusive Language Online | 2017

Abusive Language Detection on Arabic Social Media

Hamdy Mubarak; Kareem Darwish; Walid Magdy

In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether they use any of these words or not in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean). We make this dataset freely available for research, in addition to the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site due to violations of the site’s rules and guidelines.

Collaboration


Dive into Hamdy Mubarak's collaborations.

Top Co-Authors

Kareem Darwish
Qatar Computing Research Institute

Ahmed Abdelali
Qatar Computing Research Institute

Younes Samih
University of Düsseldorf

Mohamed Eldesouki
Qatar Computing Research Institute

Walid Magdy
Qatar Computing Research Institute

Laura Kallmeyer
University of Düsseldorf

Ahmed M. Ali
Qatar Computing Research Institute

Alessandro Moschitti
Qatar Computing Research Institute

Yifan Zhang
Qatar Computing Research Institute