Mariem Ellouze
University of Sfax
Publication
Featured research published by Mariem Ellouze.
international conference natural language processing | 2009
Wissal Brini; Mariem Ellouze; Slim Mesfar; Lamia Hadrich Belguith
In this paper, we propose an Arabic Question-Answering (Q-A) system called QASAL «Question-Answering system for Arabic Language». QASAL accepts as input a natural language question written in Modern Standard Arabic (MSA) and generates as output the most efficient and appropriate answer. The proposed system is composed of three modules: a question analysis module, a passage retrieval module and an answer extraction module. To implement these three modules, we use the NooJ platform, a linguistic development environment.
conference on intelligent text processing and computational linguistics | 2015
Abir Masmoudi; Nizar Habash; Mariem Ellouze; Yannick Estève; Lamia Hadrich Belguith
In this paper, we describe the process of converting Tunisian Dialect text written in Latin script (also called Arabizi) into Arabic script following the CODA orthography convention for Dialectal Arabic. Our input consists of messages and comments taken from SMS, social networks and broadcast videos. The language used in social media and SMS messaging is characterized by informal and non-standard vocabulary, such as repeated letters for emphasis, typos and non-standard abbreviations, and by nonlinguistic content, such as emoticons. There is a high degree of variation in spelling in Arabic dialects due to the lack of widely supported orthographic standards in both Arabic and Latin scripts. In the context of natural language processing, transliterating from Arabizi to Arabic script is a necessary step, since most currently available tools for processing Arabic dialects expect Arabic script input.
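One of the phenomena mentioned above, repeated letters for emphasis, is commonly handled with a small normalization pass before transliteration. The snippet below is only a minimal sketch of that idea and is not taken from the paper: the function name `collapse_repeats`, the threshold of three repetitions, and the sample strings are all assumptions made for illustration.

```python
import re

# Hypothetical helper (not from the paper): any character repeated three or
# more times in a row is treated as emphasis and collapsed to one occurrence.
# Doubled characters are left alone, since a double letter may be legitimate.
_REPEAT = re.compile(r"(.)\1{2,}")


def collapse_repeats(text: str) -> str:
    """Collapse runs of 3+ identical characters to a single character."""
    return _REPEAT.sub(r"\1", text)


if __name__ == "__main__":
    # Made-up example strings, used only to exercise the function.
    for sample in ["barchaaaa", "cool", "heeeey!!!"]:
        print(sample, "->", collapse_repeats(sample))
```

Because the rule is applied to every character, it also collapses repeated punctuation and digits. A real system would likely need a more careful, language-aware policy.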
international conference natural language processing | 2002
Mariem Ellouze; Abdelmajid Ben Hamadou
Automatic summaries are often subject to several criticisms (e.g., lack of cohesion and coherence). In this paper, we propose an approach that uses coherent Summary-Schemas (templates) conceived from the rhetorical structure of scientific papers including their abstracts. The Summary-Schemas embed rhetorical roles specified by signatures (sets of positional, structural, linguistic and thematic features) that guide the search for appropriate sentences in the source text.
applications of natural language to data bases | 2014
Rahma Boujelbane; Mariem Mallek; Mariem Ellouze; Lamia Hadrich Belguith
Arabic Dialects (AD) have recently begun to receive more attention from the speech science and technology communities. The use of dialects in language technologies will help improve the development process and the usability of applications such as speech recognition, speech comprehension, or speech synthesis. However, AD suffer from a lack of resources compared to Modern Standard Arabic (MSA). This paper deals with the problem of tagging an AD: the Tunisian Dialect (TD). We present a method for building a fine-grained part-of-speech (POS) tagger for the TD. This method consists of adapting an MSA POS tagger by generating a TD training corpus from an MSA corpus using a bilingual MSA-TD lexicon. The evaluation of the TD tagger on a corpus of text transcriptions achieved an accuracy of 78.5%.
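The corpus-generation idea described above can be illustrated with a toy projection step. This is a simplified sketch under strong assumptions, not the authors' procedure: it assumes a one-to-one word mapping, keeps the MSA tag unchanged, and leaves out-of-lexicon words as they are. The function name `project_corpus`, the tag names and the placeholder tokens are hypothetical.

```python
from typing import Dict, List, Tuple

# A tagged sentence is a list of (word, POS tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def project_corpus(
    msa_corpus: List[TaggedSentence],
    lexicon: Dict[str, str],
) -> List[TaggedSentence]:
    """Replace each MSA word by its lexicon translation, keeping the tag.

    Assumption for this sketch: words missing from the lexicon are copied
    unchanged, and every lexicon entry maps one word to one word.
    """
    td_corpus: List[TaggedSentence] = []
    for sentence in msa_corpus:
        td_sentence = [(lexicon.get(word, word), tag) for word, tag in sentence]
        td_corpus.append(td_sentence)
    return td_corpus


if __name__ == "__main__":
    # Placeholder tokens stand in for real MSA and TD word forms.
    msa = [[("msa_word_1", "NOUN"), ("msa_word_2", "VERB")]]
    lex = {"msa_word_1": "td_word_1"}
    print(project_corpus(msa, lex))
```

The projected corpus could then be used to train any standard supervised tagger. A realistic mapping would also have to deal with one-to-many translations and word-order differences, which this sketch ignores.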
Archive | 2016
Mariem Ellouze; Sami Sayadi
A huge amount of hazardous organopollutants, often persistent and toxic, is produced annually worldwide and may contaminate soil, water, groundwater, and air. Originating from various sources such as wastewater, landfill leachates, and solid residues, xenobiotics include phenols, plastics, hydrocarbons, paints, dyes, pesticides and insecticides, paper and pulp mills, and pharmaceuticals. Among biological processes for the degradation of xenobiotics, fungal ones, being eco-friendly and low-cost, have been investigated extensively because most basidiomycetes are more tolerant to high concentrations of pollutants. Fungal bioremediation is a promising technology that uses the metabolic potential of fungi to remove or reduce xenobiotics. Basidiomycetes are unique microorganisms that show a high capacity for degrading a wide range of toxic xenobiotics. They act via extracellular ligninolytic enzymes, including laccase, manganese peroxidase, and lignin peroxidase. Their capacity to remove xenobiotic substances and produce polymeric products makes them a useful tool for bioremediation purposes. During fungal remediation, they utilize hazardous compounds, even insoluble ones, as a nutrient source and convert them to simple fragmented forms. The aim of this chapter is to elucidate the ability of basidiomycetes to degrade xenobiotics. It presents an overview of the importance of extracellular enzymes for efficient bioremediation of a large variety of xenobiotics.
International Conference on Advanced Intelligent Systems and Informatics | 2016
Imen Touati; Marwa Graja; Mariem Ellouze; Lamia Hadrich Belguith
This paper presents an approach to fine-grained opinion categorization in Arabic news articles. This approach is based on lexical semantic analysis. We propose to categorize every opinion expression using a typology of four top-level semantic categories: reporting, judgment, advice and sentiment. Each word or opinion expression is annotated with a semantic representation that takes into consideration the specificities of the Arabic language. To the best of our knowledge, there is no annotated Arabic opinion corpus with the proposed semantic representation. The categorization task is treated as a classification problem. We therefore use Conditional Random Fields (CRF) as a discriminative model, which we consider a useful contribution given the lack of similar fine-grained opinion categorization performed with CRF. The obtained results show that the integration of CRF models is important for opinion classification in Arabic.
language resources and evaluation | 2018
Abir Masmoudi; Fethi Bougares; Mariem Ellouze; Yannick Estève; Lamia Hadrich Belguith
Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of speech tools and resources required for the development of an Automatic Speech Recognition system for the Tunisian dialect. The development of such a system faces the challenges of the lack of annotated resources and tools, apart from the lack of standardization at all linguistic levels (phonological, morphological, syntactic and lexical) together with the mispronunciation dictionary needed for ASR development. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we go deeper into the details of Tunisian dialect corpus creation. This corpus is finally approved and used to build the first ASR system for Tunisian dialect with a Word Error Rate of 22.6%.
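For readers unfamiliar with the metric, Word Error Rate is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below implements that standard definition. It is not the evaluation code used in the paper, and the function name `word_error_rate` and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption for this sketch: words are separated by whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.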
Proceedings of the 2nd Mediterranean Conference on Pattern Recognition and Artificial Intelligence | 2018
Imen Touati; Marwa Graja; Mariem Ellouze; Lamia Hadrich Belguith
Target identification is one of the important tasks related to opinion mining. Few works in this field deal with the Arabic language because of the lack of annotated corpora. In this paper, we investigate the problem of opinion target identification in Arabic news articles using Conditional Random Fields (CRF) as a discriminative framework. The opinion target recognition task consists of determining the terms that form the target span. To the best of our knowledge, there is no similar work in this field for the Arabic language, and especially for news articles. Experiments show that we can achieve excellent results by considering semantic correlation between words and without relying on deep syntactic features. Our proposed method identifies the opinion target with a 95% F-measure for a given opinion word, using bi-gram features, words in context and other features.
conference of the international speech communication association | 2016
Abir Masmoudi; Mariem Ellouze; Fethi Bougares; Yannick Estève; Lamia Hadrich Belguith
Conditional Random Fields (CRFs) represent an effective approach for monotone string-to-string translation tasks. In this work, we apply the CRF model to perform grapheme-to-phoneme (G2P) conversion for the Tunisian Dialect. This choice is motivated by the fact that CRFs give a long-term prediction and assume relaxed state independence conditions compared to HMMs [7]. The CRF model needs to be trained on a 1-to-1 alignment between graphemes and phonemes. Alignments are generated using the Joint-Multigram Model (JMM) and the GIZA++ toolkit. We trained a CRF model for each generated alignment. We then compared our models to state-of-the-art G2P systems based on the Sequitur G2P and Phonetisaurus toolkits. We also investigate the CRF prediction quality with different training sizes. Our results show that CRFs perform slightly better using the JMM alignment and outperform both the Sequitur and Phonetisaurus systems across different training sizes. Finally, our system achieves a phone error rate of 14.09%.
Proceedings of the Mediterranean Conference on Pattern Recognition and Artificial Intelligence | 2016
Imen Touati; Marwa Graja; Mariem Ellouze; Lamia Hadrich Belguith
Arabic opinion mining is a challenging task because Arabic is a morphologically and semantically rich language. In this paper, we are interested in analyzing opinions in Arabic news articles. We propose to use a machine learning technique to classify opinions or sentiments at the expression level. Our approach involves determining the semantic category of the expression. It also includes the classification of the opinion expression as positive or negative and the classification of its intensity as high, medium or low. Our method relies on a wide range of features used in the literature, such as n-grams, morphological features and stylistic features. In addition, we propose new features inspired by contextual and semantic information, and others specific to the Arabic language. In the same context, we aim to contribute to opinion mining in Arabic by proposing the use of Conditional Random Fields as a discriminative model. We carry out many experiments combining different sets of features to find the combination that yields the best results. We evaluate our method at the expression level using a corpus of Arabic news articles. Our method achieves a good result that reaches 84.93% for contextual polarity classification and 87.54% for semantic opinion expression categorization.