
Publication


Featured research published by Khaled Shaalan.


IEEE Transactions on Knowledge and Data Engineering | 2006

A Survey of Web Information Extraction Systems

Chia-Hui Chang; M. Kayed; R. Girgis; Khaled Shaalan

The Internet presents a huge amount of useful information, which is usually formatted for its users; this makes it difficult to extract relevant data from various sources. Therefore, robust, flexible information extraction (IE) systems that transform Web pages into program-friendly structures, such as a relational database, are becoming a necessity. Although many approaches for extracting data from Web pages have been developed, there has been limited effort to compare such tools. Unfortunately, the results generated by distinct tools can be directly compared in only a few cases, since the extraction tasks they address differ. This paper surveys the major Web data extraction approaches and compares them in three dimensions: the task domain, the techniques used, and the degree of automation. The criteria of the first dimension explain why an IE system fails to handle Web sites with particular structures. The criteria of the second dimension classify IE systems based on the techniques used. The criteria of the third dimension measure the degree of automation of IE systems. We believe these criteria provide qualitative measures for evaluating various IE approaches.


ACM Transactions on Asian Language Information Processing | 2009

Arabic Natural Language Processing: Challenges and Solutions

Ali Farghaly; Khaled Shaalan

The Arabic language presents researchers and developers of natural language processing (NLP) applications for Arabic text and speech with serious challenges. The purpose of this article is to describe some of these challenges and to present some solutions that would guide current and future practitioners in the field of Arabic natural language processing (ANLP). We begin with general features of the Arabic language in Sections 1, 2, and 3 and then move to more specific properties of the language in the rest of the article. In Section 1 we highlight the significance of the Arabic language today and describe its general properties. Section 2 presents the feature of Arabic diglossia, showing how the sociolinguistic aspects of the Arabic language differ from those of other languages. The stability of Arabic diglossia and its implications for ANLP applications are discussed, and ways to deal with this problematic property are proposed. Section 3 deals with the properties of the Arabic script and the explosion of ambiguity that results from the absence of short-vowel representations and overt case markers in contemporary Arabic texts. In Section 4 we present specific features of the Arabic language, such as the nonconcatenative property of Arabic morphology, Arabic as an agglutinative language, and Arabic as a pro-drop language, and the challenges these properties pose to ANLP. We also present solutions that have already been adopted by some pioneering researchers in the field. In Section 5 we point out the lack of formal and explicit grammars of Modern Standard Arabic, which impedes the progress of more advanced ANLP systems. In Section 6 we draw our conclusions.
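The ambiguity caused by missing short vowels can be illustrated with a small sketch. Assuming that contemporary text simply omits the harakat (here taken as the Unicode range U+064B FATHATAN through U+0652 SUKUN), two different words, kataba ("he wrote") and kutub ("books"), collapse onto the same unvowelled string. The function name `strip_diacritics` is hypothetical and not taken from the article.

```python
import re

# Arabic diacritics assumed for this sketch: U+064B FATHATAN through
# U+0652 SUKUN (includes fatha, damma, kasra, shadda).
HARAKAT = re.compile("[\u064B-\u0652]")

def strip_diacritics(word: str) -> str:
    """Remove short-vowel and related diacritics, as unvowelled text does."""
    return HARAKAT.sub("", word)

kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # كَتَبَ "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"         # كُتُب "books"

# Both vowelled forms reduce to the same three-letter string كتب,
# so a system reading unvowelled text cannot tell them apart by form alone.
print(strip_diacritics(kataba), strip_diacritics(kutub))
```

An analyser working on the stripped form has to recover the intended reading from context, which is one concrete face of the ambiguity the article describes.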


Journal of the Association for Information Science and Technology | 2009

NERA: Named Entity Recognition for Arabic

Khaled Shaalan; Hafsa Raza

It cannot be overemphasized that changes in concepts have far more impact than new discoveries


Computational Linguistics | 2014

A Survey of Arabic Named Entity Recognition and Classification

Khaled Shaalan

As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic language family, make NER a challenge. The performance of the Arabic NER component has a positive effect on the overall performance of the NLP system that uses it. This article describes in detail the recent increase in interest and the progress made in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in the Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, the state of the art in Arabic NER research is reviewed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.
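For the evaluation metrics, one widely used convention (for example in the CoNLL shared tasks) is exact-match, entity-level scoring. The sketch below assumes that convention: an entity is a hypothetical `(start, end, type)` tuple, and a prediction counts as correct only if both its boundaries and its type match a gold entity. The survey may also cover other variants, such as partial-match scoring.

```python
def prf(gold: set, predicted: set) -> tuple:
    """Exact-match entity-level precision, recall and F-measure.

    Entities are (start, end, type) tuples; a predicted entity is a true
    positive only if the identical tuple appears in the gold set.
    """
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

gold = {(0, 2, "PERS"), (5, 6, "LOC"), (8, 9, "ORG")}
predicted = {(0, 2, "PERS"), (5, 6, "ORG")}  # second entity has the wrong type
print(prf(gold, predicted))
```

Here the mistyped entity counts as both a false positive and a false negative, which is why exact-match scoring is the strictest of the common variants.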


Meeting of the Association for Computational Linguistics | 2007

Person Name Entity Recognition for Arabic

Khaled Shaalan; Hafsa Raza

Named entity recognition (NER) is nowadays an important task, responsible for identifying proper names in text and classifying them into types of named entity such as people, locations, and organizations. In this paper, we present our attempt at recognizing and extracting the most important proper-name entity, the person name, for the Arabic language. We developed the system, Person Name Entity Recognition for Arabic (PERA), using a rule-based approach. The system consists of a lexicon, in the form of gazetteer name lists, and a grammar, in the form of regular expressions, which are responsible for recognizing person name entities. PERA is evaluated on a corpus that was tagged in a semi-automated way. The performance results achieved were satisfactory and conform to the targets set for precision, recall, and f-measure.
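The gazetteer-plus-regular-expression design can be sketched in a few lines. This is a minimal illustration, not PERA itself: the four-name gazetteer, the single rule (a listed first name optionally followed by the patronymic "بن", "son of", and one more word), and the function name `find_person_names` are all hypothetical.

```python
import re

# Hypothetical gazetteer; a real system would use large name lists.
FIRST_NAMES = ["محمد", "أحمد", "خالد", "علي"]

# Hypothetical grammar rule: a gazetteer first name, optionally followed by
# the patronymic "بن" and one more word. In Python 3, \w and \b are
# Unicode-aware, so they work on Arabic letters.
_names = "|".join(re.escape(name) for name in FIRST_NAMES)
PERSON_RE = re.compile(rf"\b(?:{_names})(?:\s+بن\s+\w+)?\b")

def find_person_names(text: str) -> list:
    """Return every span matched by the gazetteer + regex rule, in order."""
    return [m.group(0) for m in PERSON_RE.finditer(text)]

print(find_person_names("قال محمد بن سلمان إن الاجتماع انتهى"))
```

Keeping the lexicon and the grammar separate means new names can be added to the list without touching the rules, and new rules (for titles or family names, say) can reuse the same list.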


Computer Assisted Language Learning | 2005

An Intelligent Computer Assisted Language Learning System for Arabic Learners

Khaled Shaalan

This paper describes the development of an intelligent computer-assisted language learning (ICALL) system for learning Arabic. The system could be used by students at primary schools or by learners of Arabic as a second or foreign language. It explores the use of Natural Language Processing (NLP) techniques for learning Arabic. Learners are encouraged to produce sentences freely in various situations and contexts, and are guided to recognise by themselves the erroneous or inappropriate functions of their misused expressions. The system uses NLP tools (including a morphological analyser and a syntax analyser) and an error analyser to issue feedback to the learner. Furthermore, we propose a correction mechanism that allows learners to correct the typed sentence independently and helps them realise what the error is.


International Journal of Computer Processing of Languages | 2004

Machine Translation of English Noun Phrases into Arabic

Khaled Shaalan; Ahmed Rafea; Azza Abdel Moneim; Hoda Baraka

The present work reports our attempt at automating the translation of English noun phrases (NPs) into Arabic. Translating NPs is an important step toward sentence translation, since NPs make up the majority of the textual content of scientific and technical documents. The system is implemented in Prolog, and the parser is written in the Definite Clause Grammar (DCG) formalism. The paper also describes our experience with the developed MT system and reports the results of applying it to real thesis titles from the computer science domain.
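To make the NP transfer problem concrete, here is a toy sketch in Python (standing in for the paper's Prolog/DCG implementation, which is not reproduced). It assumes only two transfer rules: the Arabic adjective follows the noun, and a definite NP places the article "ال" on both noun and adjective. The lexicon, the restriction to "(Det) (Adj) Noun", and the name `translate_np` are hypothetical; gender and number agreement are ignored.

```python
# Hypothetical toy lexicon (all entries masculine singular).
NOUNS = {"house": "بيت", "book": "كتاب"}
ADJECTIVES = {"big": "كبير", "new": "جديد"}

def translate_np(phrase: str) -> str:
    """Translate a toy English NP of the form (Det) (Adj) Noun into Arabic."""
    tokens = phrase.lower().split()
    definite = False
    if tokens and tokens[0] in ("the", "a", "an"):
        definite = tokens[0] == "the"
        tokens = tokens[1:]
    # Parse the remainder: either a bare noun or one adjective plus a noun.
    if len(tokens) == 1 and tokens[0] in NOUNS:
        adjective, noun = None, tokens[0]
    elif len(tokens) == 2 and tokens[0] in ADJECTIVES and tokens[1] in NOUNS:
        adjective, noun = tokens
    else:
        raise ValueError(f"unsupported noun phrase: {phrase!r}")
    # Transfer: noun first, then adjective; definiteness marked on both.
    article = "ال" if definite else ""
    words = [article + NOUNS[noun]]
    if adjective is not None:
        words.append(article + ADJECTIVES[adjective])
    return " ".join(words)

print(translate_np("the big house"))
```

Even this toy case shows why NP translation needs structure rather than word-by-word lookup: both the word order and the placement of the article change.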


International Conference on Natural Language Processing | 2008

Arabic Named Entity Recognition from Diverse Text Types

Khaled Shaalan; Hafsa Raza

Name identification has been worked on quite intensively for the past few years and has been incorporated into several products. Many researchers have attacked this problem in a variety of languages, but only a few studies have focused on Named Entity Recognition (NER) for Arabic text, owing to the lack of resources for Arabic named entities and the limited progress made in Arabic natural language processing in general. In this paper, we present the results of our attempt at recognizing and extracting the 10 most important named entity types in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system, Named Entity Recognition for Arabic (NERA), using a rule-based approach. The system consists of a whitelist representing a dictionary of names, and a grammar, in the form of regular expressions, which are responsible for recognizing the named entities. NERA is evaluated on our own corpora, tagged in a semi-automated way, and the performance results achieved were satisfactory in terms of precision, recall, and f-measure.
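Several of the ten entity types (date, ISBN, file name) are pattern-like and suit a regular-expression grammar particularly well. The sketch below covers just those three with hypothetical patterns written for illustration; they are not NERA's rules. One useful property: in Python 3, `\d` matches any Unicode decimal digit, so the date rule also accepts Arabic-Indic digits.

```python
import re

# Hypothetical patterns for three of the entity types; illustrative only.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "ISBN": re.compile(r"\bISBN:?\s*(?:\d-?){9,12}[\dX]\b"),
    "FILE_NAME": re.compile(r"\b[\w-]+\.(?:txt|pdf|docx?)\b"),
}

def tag_entities(text: str) -> list:
    """Return (type, surface string) pairs in the order they occur in text."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group(0)))
    # Sort by start offset so output follows reading order of the text.
    return [(label, surface) for _, label, surface in sorted(found)]

text = "صدر الكتاب ISBN 978-0-13-110362-7 بتاريخ 12/05/2008 ضمن الملف report.pdf"
for label, surface in tag_entities(text):
    print(label, surface)
```

This sketch does not resolve overlapping matches between patterns; a fuller grammar would need a priority or longest-match policy for that.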


International Conference on Computational Linguistics | 2012

Integrating Rule-Based System with Classification for Arabic Named Entity Recognition

Sherief Abdallah; Khaled Shaalan; Muhammad Shoaib

Named Entity Recognition (NER) is a subtask of information extraction that seeks to recognize and classify named entities in unstructured text into predefined categories such as the names of persons, organizations, locations, etc. Most researchers have used machine learning, while few have used handcrafted rules to solve the NER problem. We focus here on NER for the Arabic language (NERA), an important language with its own distinct challenges. This paper proposes a simple method for integrating machine learning with rule-based systems and implements this proposal using the state-of-the-art rule-based system for NERA. Experimental evaluation shows that our integrated approach increases the F-measure by 8 to 14% compared with both the original (pure) rule-based system and the (pure) machine learning approach, and that the improvement is statistically significant on different datasets. More importantly, our system outperforms the state-of-the-art machine-learning system for NERA on a benchmark dataset.
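The abstract does not spell out the integration mechanism. One common way to combine the two, and the one assumed in this sketch, is to pass the rule-based system's tag to a statistical classifier as one more feature, so the classifier can learn when to trust or override the rules. The feature set, the toy data, and the choice of a small categorical Naive Bayes classifier are all assumptions for illustration; the paper's own classifier and features may differ.

```python
import math
from collections import Counter, defaultdict

def featurize(tokens, rule_tags, i):
    """Hypothetical features for token i: the rule-based tag plus simple context."""
    return {
        "rule": rule_tags[i],
        "word": tokens[i],
        "prev": tokens[i - 1] if i > 0 else "<s>",
    }

class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # (label, feature) -> value counts
        self.values = defaultdict(set)           # feature -> values seen in training

    def fit(self, examples):
        for feats, label in examples:
            self.label_counts[label] += 1
            for name, value in feats.items():
                self.feat_counts[(label, name)][value] += 1
                self.values[name].add(value)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            for name, value in feats.items():
                seen = self.feat_counts[(label, name)][value]
                # Add-one smoothing; the extra +1 leaves room for one unseen value.
                score += math.log((seen + 1) / (count + len(self.values[name]) + 1))
            if score > best_score:
                best, best_score = label, score
        return best

# Toy data: the rule-based tagger finds person names but misses locations.
TRAIN = [
    (["يقيم", "خالد", "في", "دبي"], ["O", "PERS", "O", "O"], ["O", "PERS", "O", "LOC"]),
    (["يعمل", "أحمد", "في", "القاهرة"], ["O", "PERS", "O", "O"], ["O", "PERS", "O", "LOC"]),
]
examples = [
    (featurize(tokens, rules, i), gold[i])
    for tokens, rules, gold in TRAIN
    for i in range(len(tokens))
]
model = NaiveBayes().fit(examples)
```

On an unseen sentence where the rules again tag a location as `O`, the context feature ("في", "in") can outweigh the rule tag, while the rule tag still carries most of the evidence for person names.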


Expert Systems With Applications | 2004

A Multiagent Approach for Diagnostic Expert Systems via the Internet

Khaled Shaalan; Mona El-Badry; Ahmed Rafea

In recent years there has been considerable interest in building complex problem-solving systems as groups of co-operating experts. This has led us to develop a multiagent expert system capable of running on servers that can support a large group of users (clients) who communicate with the system over the network. The system provides an architecture to coordinate the behavior of several specific agent types. Two types of agents are involved: one type works on the server computer and the other works on the client computers. The society of agents on the server side consists of expert system agents (diagnosis agents and a treatment agent), each of which contains an autonomous knowledge-based system. Typically, agents have expertise in distinct but related domains. The whole system is capable of solving problems that require the cumulative expertise of the agent community. In addition, a user interface agent, which employs an intelligent data collector (the so-called communication model in KADS), works on the client side. We took advantage of successful pre-existing expert systems developed at CLAES (Central Laboratory for Agricultural Expert Systems, Egypt) to construct an architecture for a community of cooperating agents. This paper describes our experience in decomposing the diagnosis expert systems into a multiagent system. Experiments were performed on a set of test cases from real agricultural expert systems. The expert system agents are implemented in the Knowledge Representation Object Language (KROL) and Java, using the KADS knowledge engineering methodology on the WWW platform.

Collaboration

Top Co-Authors

Ahmed Rafea, American University in Cairo
Mostafa Al-Emran, Universiti Malaysia Pahang
Said A. Salloum, British University in Dubai
Farhad Oroumchian, University of Wollongong in Dubai
Sanjeera Siddiqui, British University in Dubai
Mai Oudah, Masdar Institute of Science and Technology