Mahmoud El-Haj
Lancaster University
Publications
Featured research published by Mahmoud El-Haj.
International Conference on Natural Language Processing | 2008
Mahmoud El-Haj; Bassam Hammo
In this paper, we present and analyze the results of applying the Arabic query-based text summarization system (AQBTSS) to produce a query-oriented summary of a single Arabic document. For this task, we adapted the traditional vector space model (VSM) and the cosine similarity measure to find the most relevant passages extracted from an Arabic document and produce a text summary. We aim to use the short summaries in natural language (NL) tasks such as generating answers for an Arabic open-domain question answering system (AQAS), as well as in experiments on categorizing Arabic scripts. The obtained results indicate that our simple approach to text summarization is promising.
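The core ranking step described in this abstract — scoring candidate passages against a query with the vector space model and cosine similarity — can be illustrated in a few lines. The following is a minimal Python sketch, not the authors' AQBTSS implementation; tokenisation and term weighting (raw term frequency here) are deliberately simplified:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def query_summary(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Rank passages by similarity to the query and keep the top k
    as an extractive, query-oriented summary."""
    q_vec = Counter(query.split())
    ranked = sorted(passages,
                    key=lambda p: cosine(q_vec, Counter(p.split())),
                    reverse=True)
    return ranked[:k]
```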
Asia Information Retrieval Symposium | 2011
Mahmoud El-Haj; Udo Kruschwitz; Chris Fox
In this paper we explore clustering for multi-document Arabic summarisation. For our evaluation we use an Arabic version of the DUC-2002 dataset that we previously generated using Google Translate. We explore how clustering (at the sentence level) can be applied to multi-document summarisation and to redundancy elimination within that process. We use different parameter settings, including the cluster size and the selection model applied in the extractive summarisation process. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. The results we achieve are compared with the top five systems in the DUC-2002 multi-document summarisation task.
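One way to read the clustering-for-redundancy idea above: group similar sentences, then emit a single representative per cluster so the summary does not repeat itself. Below is a hedged sketch of a greedy single-pass variant; the paper's exact clustering algorithm and parameters may differ, and the 0.5 threshold is illustrative:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def deduplicate(sentences: list[str], threshold: float = 0.5) -> list[str]:
    """Greedy single-pass clustering: each sentence joins the first
    cluster whose centroid it resembles, else starts a new cluster.
    Keeping one sentence per cluster removes near-duplicates."""
    clusters = []  # list of (centroid vector, member sentences)
    for s in sentences:
        vec = Counter(s.split())
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                centroid.update(vec)  # fold the sentence into the centroid
                members.append(s)
                break
        else:
            clusters.append((Counter(vec), [s]))
    return [members[0] for _, members in clusters]
```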
Language Resources and Evaluation | 2015
Mahmoud El-Haj; Udo Kruschwitz; Chris Fox
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.
Language and Technology Conference | 2009
Mahmoud El-Haj; Udo Kruschwitz; Chris Fox
The volume of information available on the Web is increasing rapidly, making systems that can automatically summarise documents ever more desirable. For this reason, text summarisation has quickly grown into a major research area, as illustrated by the DUC and TAC conference series. Summarisation systems for Arabic are, however, still not as sophisticated or as reliable as those developed for languages like English. In this paper we discuss two summarisation systems for Arabic and report on a large user study performed on these systems. The first system, the Arabic Query-Based Text Summarisation System (AQBTSS), uses standard retrieval methods to map a query against a document collection and to create a summary. The second system, the Arabic Concept-Based Text Summarisation System (ACBTSS), creates a query-independent document summary. Five groups of users of different ages and educational levels participated in evaluating our systems.
Computer Science and Electronic Engineering Conference | 2011
Mahmoud El-Haj; Udo Kruschwitz; Chris Fox
In this paper we present our generic extractive Arabic and English multi-document summarisers. We also describe the use of machine translation for evaluating the generated Arabic multi-document summaries using English extractive gold standards. In this work we first address the lack of Arabic multi-document corpora for summarisation and the absence of automatic and manual Arabic gold-standard summaries. These are required to evaluate any automatic Arabic summarisers. Second, we demonstrate the use of Google Translate in creating an Arabic version of the DUC-2002 dataset. The parallel Arabic/English dataset is summarised using the Arabic and English summarisation systems. The automatically generated summaries are evaluated using the ROUGE metric, as well as precision and recall. The results we achieve are compared with the top five systems in the DUC-2002 multi-document summarisation task.
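The ROUGE, precision and recall evaluation mentioned in this and the preceding abstracts boils down to n-gram overlap between a system summary and a gold standard. A simplified illustration of ROUGE-N-style scoring follows; this is not the official ROUGE toolkit, which adds stemming, stopword handling and multiple references:

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1):
    """ROUGE-N-style overlap: returns (recall, precision, F1)
    over clipped n-gram counts."""
    def ngrams(text: str) -> Counter:
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped match counts
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = (2 * recall * precision / (recall + precision)) if overlap else 0.0
    return recall, precision, f1
```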
Computer Science and Electronic Engineering Conference | 2013
Mahmoud El-Haj; Lorna Balkan; Suzanne Barbalet; Lucy Bell; John Shepherdson
In this paper we present the tools, techniques and evaluation results of an automatic indexing experiment we conducted on the UK Data Archive/UK Data Service data-related document collection, as part of the Jisc-funded SKOS-HASSET project. We examined the quality of an automatic indexer based on a controlled vocabulary called the Humanities and Social Science Electronic Thesaurus (HASSET). We used the Keyphrase Extraction Algorithm (KEA), a text mining and machine learning tool. KEA builds a classifier model from training documents with known keywords, which is then applied to help assign keywords to new documents. We performed extensive manual and automatic evaluation of the results using recall, precision and F1 scores. The quality of the KEA indexing was measured a) automatically, by the degree of overlap between the automated indexing decisions and those originally made by the human indexer, and b) manually, by comparing KEA's output with the source text. This paper explains how and why we applied the chosen technical solutions, and how we intend to take forward lessons learned from this work in the future.
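For readers unfamiliar with KEA: it scores candidate phrases on a small set of features — classically TF-IDF and relative position of first occurrence — which are fed to a Naive Bayes classifier trained on documents with known keywords. A hypothetical sketch of those two feature computations (not KEA's actual Java implementation; `doc_freq` and `n_docs` stand in for corpus statistics):

```python
import math

def kea_style_features(phrase: str, doc: str, doc_freq: dict, n_docs: int):
    """Compute TF-IDF and relative first occurrence for a candidate
    phrase -- the two classic KEA features fed to its classifier."""
    text = doc.lower()
    words = text.split()
    tf = text.count(phrase.lower()) / max(len(words), 1)
    idf = math.log(n_docs / (1 + doc_freq.get(phrase.lower(), 0)))
    pos = text.find(phrase.lower())
    first_occurrence = pos / max(len(text), 1) if pos >= 0 else 1.0
    return tf * idf, first_occurrence
```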
ACS International Conference on Computer Systems and Applications (AICCSA) | 2013
Rim Koulali; Mahmoud El-Haj; Abdelouafi Meziane
With the exponential growth of online Arabic documents, classifying and processing large Arabic corpora has become a challenging task. The presence of noisy information embedded in these documents makes it even more difficult to obtain accurate results from a Topic Detection (TD) process. To address this problem, a proper feature-selection approach is needed to enhance topic detection accuracy. In this paper, we explore the impact of using an automatic summarisation technique alongside feature selection to enhance Arabic Topic Detection. We show that automatic summarisation reduces noisy information, yields a significant enhancement of the topic detection process and therefore increases the performance of our TD system. This is achieved by our summariser's ability to reduce document size, which speeds up the detection process.
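The pipeline this abstract argues for is simply: summarise first, classify second, so the topic detector sees a shorter, less noisy input. A toy sketch with hypothetical stand-ins for the summariser and classifier (the actual components used in the paper differ):

```python
def detect_topic(document: str, summarise, classify) -> str:
    """Run topic detection on an automatic summary rather than the
    full text: the summary filters noise and shrinks the input."""
    return classify(summarise(document))

# Toy usage with trivial stand-ins for the two components:
first_sentences = lambda d: " ".join(d.split(". ")[:2])
keyword_label = lambda s: "sport" if "match" in s else "other"
print(detect_topic("The match ended late. Fans cheered. Traffic was heavy.",
                   first_sentences, keyword_label))
```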
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications | 2017
Mahmoud El-Haj; Paul Rayson; Scott Piao; Stephen Wattam
Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging, time-consuming manual task. This has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods employing native speakers to generate a list of coarse-grained senses under a common multilingual semantic taxonomy for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants manually and semantically annotated 250 words for the Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. To avoid erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process in which users were asked to identify synonyms in the target language and to remove erroneous senses.
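The two-phase spam filter described above can be pictured as a control-item check: workers who fail to recognise known synonyms in the target language are excluded before their sense annotations are used. A hedged sketch under assumed data shapes — `gold_synonyms`, the 0.8 threshold and the response format are illustrative, not the paper's exact setup:

```python
def filter_workers(responses: dict, gold_synonyms: dict,
                   min_accuracy: float = 0.8) -> dict:
    """Keep only workers who correctly identify enough known
    synonyms (control items); the rest are treated as spam."""
    kept = {}
    for worker, answers in responses.items():  # answers: [(word, choice)]
        correct = sum(1 for word, choice in answers
                      if choice in gold_synonyms.get(word, set()))
        if answers and correct / len(answers) >= min_accuracy:
            kept[worker] = answers
    return kept
```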
text, speech and dialogue | 2014
Mahmoud El-Haj; Paul Rayson; David Hall
We present quantitative and qualitative results of automatic and manual comparisons of translations of the originally French novel “The Stranger” (French: L’Etranger). We provide a novel approach to evaluating translation performance across languages without the need for reference translations or comparable corpora. Our approach examines the consistency of the translation at various document levels, including chapters, parts and sentences. In our experiments we analyse four expert translations of the French novel and use Google’s machine translation output as a baseline. We analyse the translations using readability metrics, rank correlation comparisons and Word Error Rate (WER).
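Word Error Rate, the last metric mentioned, is the word-level Levenshtein distance between a hypothesis and a reference, normalised by the reference length. A self-contained implementation of the standard dynamic-programming formulation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: minimum insertions, deletions and
    substitutions to turn the hypothesis into the reference,
    divided by the number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)
```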
Text Analysis Conference (TAC) | 2011
George Giannakopoulos; Mahmoud El-Haj; Benoit Favre; Marianna Litvak; Josef Steinberger; Vasudeva Varma