Kais Haddar
University of Sfax
Publication
Featured research published by Kais Haddar.
International Conference on Automatic Processing of Natural-Language Electronic Texts with NooJ | 2015
Mohamed Aly Fall Seideh; Hela Fehri; Kais Haddar
Given the adverse health effects of chemical drugs and antibiotics, herbal medicine has seen a resurgence of interest in recent years. The use of medicinal plants is thus increasingly regarded as an effective and lucrative treatment, especially in Asia and Africa. The objective of this work is to build a system that identifies medicinal plant names in French-Arabic parallel corpora. The corpora consist of texts drawn from the multilingual encyclopedia Wikipedia. Named Entities are identified by several types of patterns, which are represented by a set of transducers. The prototype is implemented in the NooJ linguistic platform using a set of morphological and syntactic grammars, and it is evaluated on a French-Arabic parallel corpus collected from Wikipedia. The results are promising according to the evaluation measures.
Text, Speech and Dialogue | 2014
Raja Bensalem Bahloul; Marwa Elkarwi; Kais Haddar; Philippe Blache
This paper presents a survey of Arabic treebanks to facilitate their reuse in building new linguistic resources. In our case, we automatically induced a Property Grammar (GP) from a treebank, so we discuss the characteristics of these treebanks in order to choose the appropriate one. To build our resource, we adopted an automatic technique: first acquiring a context-free grammar (CFG) from the chosen treebank, and then inducing a GP by generating relations between the grammatical units described in the CFG.
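The two-step technique described in this abstract (CFG acquisition, then property induction) can be illustrated with a minimal sketch. It is not the authors' implementation: the bracketed toy trees, the tag names, and the helpers `parse_tree`, `productions`, and `induce_properties` are assumptions made for illustration, and only two property types (constituency and linearity) are derived.

```python
from collections import defaultdict

def parse_tree(s):
    """Parse a bracketed tree such as '(S (NP (N x)) (VP (V y)))'
    into nested (label, children) tuples; leaf words stay plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()

def productions(tree, rules=None):
    """Step 1: collect CFG rules (lhs, rhs) over non-lexical nodes."""
    if rules is None:
        rules = set()
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if kids:
        rules.add((label, tuple(k[0] for k in kids)))
        for k in kids:
            productions(k, rules)
    return rules

def induce_properties(rules):
    """Step 2: derive relations between the units of each rule set.
    constituency: categories that may appear under a given lhs.
    linearity: pairs (a, b) where a precedes b and never the reverse."""
    by_lhs = defaultdict(list)
    for lhs, rhs in rules:
        by_lhs[lhs].append(rhs)
    props = {}
    for lhs, rhss in by_lhs.items():
        constituency = {c for rhs in rhss for c in rhs}
        before = set()
        for rhs in rhss:
            for i, a in enumerate(rhs):
                for b in rhs[i + 1:]:
                    if a != b:
                        before.add((a, b))
        linearity = {(a, b) for (a, b) in before if (b, a) not in before}
        props[lhs] = {"constituency": constituency, "linearity": linearity}
    return props

# Hypothetical toy treebank (English tags used only for readability).
toy_treebank = [
    "(S (NP (DET the) (N cat)) (VP (V sleeps)))",
    "(S (NP (N cats)) (VP (V sleep)))",
]
rules = set()
for line in toy_treebank:
    productions(parse_tree(line), rules)
props = induce_properties(rules)
```

A real Property Grammar also includes properties such as uniqueness, obligation, requirement, and exclusion; they could be derived from the same rule sets in a similar way.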
International NooJ Conference | 2016
Nadia Ghezaiel Hammouda; Kais Haddar
Automatic annotation of Arabic corpora plays an important role in many applications of Natural Language Processing (NLP). In this context, we are interested in the automatic annotation of Arabic corpora using a set of transducers implemented in the NooJ platform. To achieve this aim, the annotation phase must be preceded by a segmentation phase. This segmentation phase will, on the one hand, reduce the complexity of the analysis and, on the other hand, improve the functionalities of the NooJ platform. For the annotation phase, we identify different types of lexical ambiguities and then propose an appropriate set of rules. In addition, we evaluate this phase on a test corpus with the NooJ platform. The results are promising and can be improved by adding other rules and heuristics.
International Journal of Computer Processing of Languages | 2009
Kais Haddar; Abdelmajid Ben Hamadou
The ellipsis phenomenon is one of the important topics of study in natural language processing because it appears frequently in dialogues as well as in written texts. In this context, the present article proposes an ellipsis processing approach for the Arabic language. Our first contribution is a formal characterisation of the ellipsis phenomenon, which forms the basis of the method proposed for detecting elliptical sentence parts. We then present a clause grammar that makes it possible to distinguish between well-formed clauses and those with missing constituents. For resolution, the proposed method relies on a classification of elliptical sentences that underlies three different resolution processes: propagation, cascade, and alternation. In this paper, we also try to resolve some ambiguities concerning ellipsis resolution and to study the phenomenon of anaphora, which can interact with ellipsis. To prove the feasibility of the proposed approaches, we have developed a prototype called ERASE (Ellipsis Resolution of Arabic Sentences) and tested it on a corpus of elliptical Arabic sentences. The results obtained are satisfactory.
Procedia Computer Science | 2017
Fatma Ben Mesmia; Fatma Zid; Kais Haddar; Denis Maurel
Since MUC-7, Semantic Relation (SR) extraction has aimed to detect the significant links between Named Entities (NEs). This task has evolved in many domains to serve several objectives, such as the enrichment of corpora and electronic NE dictionaries. In this context, we propose a rule-based system called ASRextractor, which extracts and annotates SRs relating Arabic NEs (ANEs). The SR extraction is based on an annotated Arabic Wikipedia corpus, and it helps us identify 18 SR types such as synonymy and origin. For the SR annotation, our proposed system relies on an annotation syntax that respects the TEI (Text Encoding Initiative) recommendation. Moreover, ASRextractor is based on finite state transducers, which carry out both the extraction and the annotation process. The established transducers are grouped into an analysis cascade in a predefined order. The metric values show that our results are encouraging.
International Conference on Automatic Processing of Natural-Language Electronic Texts with NooJ | 2017
Nadia Ghezaiel Hammouda; Kais Haddar
Parsing Arabic corpora is an important task that aims to understand the Arabic language, enrich and enhance electronic resources, and increase the efficiency of natural language applications such as translation or recognition. In this paper, we propose a parsing approach for Arabic sentences, especially nominal ones. To do this, we first study the typology of the Arabic nominal sentence. Then, we develop a set of rules generating different nominal sentences. After that, we present our parsing approach based on transducers and on our tag set. In addition, we transform the recursive graph of transducers into a transducer cascade to reduce the complexity. Finally, we present the implementation and evaluation of our approach in the NooJ platform. The results are satisfactory.
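The idea of replacing a recursive graph with a cascade can be sketched as an ordered list of rewrite stages, where each stage consumes the output of the previous one. This is only an illustration written with Python regular expressions, not NooJ grammars: the tag names (`DET`, `N`, `ADJ`, `NP`, `NOMSENT`), the two stages, and the helper `run_cascade` are hypothetical and do not reproduce the authors' tag set or rules.

```python
import re

# Each stage is (pattern over the tag sequence, replacement tag).
# Stages are applied once each, in the listed order.
STAGES = [
    (r"\bDET N\b", "NP"),        # stage 1: group determiner + noun
    (r"\bNP ADJ\b", "NOMSENT"),  # stage 2: noun phrase + adjectival predicate
]

def run_cascade(tags, stages=STAGES):
    """Apply every stage in sequence to a list of tags and return the result."""
    sequence = " ".join(tags)
    for pattern, replacement in stages:
        sequence = re.sub(pattern, replacement, sequence)
    return sequence.split()

in_order = run_cascade(["DET", "N", "ADJ"])
reversed_order = run_cascade(["DET", "N", "ADJ"], list(reversed(STAGES)))
```

Because each stage runs exactly once, the order of the stages matters: with the stages reversed, the second rule never sees an `NP` and the sentence-level tag is not produced.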
ICALP 2017 - The 6th International Conference on Arabic Language Processing | 2017
Hajer Maraoui; Kais Haddar; Laurent Romary
The standardization of Al-Hadith Al-Shareef can guarantee interoperability and interchangeability with other textual sources and takes the processing of the Al-Hadith corpus to a higher level. Still, research on Hadith corpora has not previously treated standardization as a real objective, especially with respect to standards such as TEI (Text Encoding Initiative). In this context, we aim at the standardization of Al-Hadith Al-Shareef on the basis of the TEI guidelines. To achieve this objective, we elaborated a TEI model that we customized for the Hadith structure. Then we developed a prototype for encoding Hadith text. This prototype analyses Hadith texts and automatically generates a standardized version of the Hadith in TEI format. The evaluation of the TEI model and the prototype is based on a Hadith corpus collected from Sahih Bukhari. The results were encouraging despite some flaws related to exceptional cases of Hadith structure.
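As a rough illustration of what generating a TEI-encoded Hadith might look like, the sketch below builds a small XML fragment with Python's standard `xml.etree.ElementTree`. The element choices (`div`, `seg`, `persName`), the `type` values `isnad` (chain of narrators) and `matn` (body text), the helper `encode_hadith`, and the placeholder input are all assumptions; the authors' customized TEI model is not described in the abstract and may differ.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"

def encode_hadith(number, narrators, matn):
    """Build a <div> holding the narrator chain and the body text.
    The input is assumed to be already split into narrators and matn."""
    div = ET.Element(tei("div"), {"type": "hadith", "n": str(number)})
    isnad = ET.SubElement(div, tei("seg"), {"type": "isnad"})
    for name in narrators:
        ET.SubElement(isnad, tei("persName")).text = name
    ET.SubElement(div, tei("seg"), {"type": "matn"}).text = matn
    return div

# Placeholder content, not taken from any real corpus.
element = encode_hadith(1, ["Narrator A", "Narrator B"], "Text of the report.")
xml_text = ET.tostring(element, encoding="unicode")
```

In practice, the hard part is the analysis step that splits raw text into the narrator chain and the body; this sketch deliberately leaves it out.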
International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2015
Raja Bensalem Bahloul; Kais Haddar; Philippe Blache
The enrichment of an Arabic treebank with syntactic properties can facilitate many types of parsing processes. This enrichment also increases the treebank's use in different NLP applications, supports the acquisition of new linguistic resources, and eases probabilistic parsing by using statistics to limit the properties to the satisfied ones or to the most frequent ones. In this context, our proposed enrichment method is based on a formalization phase, a phase of Property Grammar induction from a source treebank, and a treebank regeneration phase with a new syntactic property-based representation. Starting with a formalization phase can help the resolution procedure succeed: it limits the specification of the data sets and the interactions between them to those actually used, which avoids any duplication. The formalization also makes it possible to anticipate the constraints that must be respected in the problem. The implementation of this enrichment method is evaluated mainly on the Arabic treebank ATB. This experiment yields good and encouraging results and various properties of different types.
International Conference on Automatic Processing of Natural-Language Electronic Texts with NooJ | 2015
Nadia Ghezaiel; Kais Haddar
Lexical analysis can be a way to remove ambiguities in the Arabic language, and resolving these ambiguities is an important task in several domains of Natural Language Processing (NLP). This paper is situated in that context. Our proposed resolution method is based essentially on the use of transducers on text automata. These transducers specify the lexical rules of the Arabic language that allow corpus disambiguation. To carry out our resolution method, we identify and study different types of lexical ambiguities and then propose an appropriate set of rules. After that, we represent all specified rules in NooJ. In addition, we present an experiment with the NooJ platform, conducted with various linguistic resources, to obtain disambiguated syntactic structures suitable for the analysis. The results are promising and can be improved by adding other rules and heuristics.
Text, Speech and Dialogue | 2018
Samia Ben Ismail; Sirine Boukedi; Kais Haddar
The treatment of the Broken Plural (BP) of Arabic nouns using a unification grammar is an important task in Natural Language Processing (NLP). This treatment contributes to the construction of extensional lexicons with large coverage. In this context, the main objective of this work is to develop a morphological analyzer for Arabic that treats BP with Head-driven Phrase Structure Grammar (HPSG). After a linguistic study, we start by identifying different patterns of BP and representing them with HPSG. The designed grammar was specified in Type Description Language (TDL) and then tested with the LKB system. The results were encouraging and satisfactory: our system can generate all BP forms that an Arabic singular noun can have.