Maher Jaoua
University of Sfax
Publication
Featured research published by Maher Jaoua.
International Conference on Computational Linguistics | 2003
Maher Jaoua; Abdelmajid Ben Hamadou
In this paper, we propose a summarization method that creates indicative summaries of scientific papers. Unlike conventional methods, which extract important sentences, our method treats the whole extract as the minimal unit of extraction and proceeds in two steps: generation and classification. The first step combines sentences from the text to produce a population of extracts. The second step evaluates each extract against global criteria in order to select the best one. These criteria are defined over the whole extract rather than over individual sentences. We have developed a prototype summarization system for French, called ExtraGen, that implements a genetic algorithm simulating the mechanism of generation and classification.
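The generate-then-classify idea described above can be illustrated with a small genetic algorithm over whole extracts. This is only a sketch under stated assumptions, not the ExtraGen implementation: the scoring criteria (keyword coverage and a length penalty), the operators, and all names such as `score_extract` and `best_extract` are hypothetical choices made for illustration.

```python
import random


def score_extract(extract, sentences, keywords, target_len):
    """Score a whole extract (a set of sentence indices), not single sentences.

    Assumed criteria: keyword coverage minus a penalty for missing target_len.
    """
    words = [w.lower().strip(".,;:") for i in extract for w in sentences[i].split()]
    coverage = len(keywords & set(words)) / len(keywords) if keywords else 0.0
    length_penalty = abs(len(words) - target_len) / target_len
    return coverage - length_penalty


def crossover(a, b, n_sentences, rng):
    """Combine the early sentences of one extract with the late ones of another."""
    cut = rng.randrange(1, n_sentences)
    child = {i for i in a if i < cut} | {i for i in b if i >= cut}
    return frozenset(child) if child else a


def mutate(extract, n_sentences, rng):
    """Add or remove one randomly chosen sentence; never return an empty extract."""
    child = set(extract) ^ {rng.randrange(n_sentences)}
    return frozenset(child) if child else frozenset(extract)


def best_extract(sentences, keywords, target_len,
                 generations=30, pop_size=20, seed=0):
    """Evolve a population of extracts and return the best one as sorted indices.

    Requires at least two sentences.
    """
    rng = random.Random(seed)
    n = len(sentences)

    def fitness(extract):
        return score_extract(extract, sentences, keywords, target_len)

    # Generation step: a random initial population of non-empty extracts.
    population = [frozenset(rng.sample(range(n), rng.randint(1, n)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Classification step: rank extracts by the global score, keep the top half.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, n, rng), n, rng))
        population = parents + children
    return sorted(max(population, key=fitness))


if __name__ == "__main__":
    text = [
        "We propose a summarization method based on a genetic algorithm.",
        "The weather during the conference was pleasant.",
        "Each extract is evaluated with global criteria.",
        "Lunch was served at noon.",
    ]
    chosen = best_extract(text, {"summarization", "genetic", "extract", "criteria"}, 17)
    print([text[i] for i in chosen])
```

Because the fitness function sees the whole extract, criteria such as overall length or keyword coverage can be expressed directly, which is not possible when sentences are scored one at a time.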
International Conference on Computational Linguistics | 2013
Inès Zribi; Marwa Graja; Mariem Ellouze Khmekhem; Maher Jaoua; Lamia Hadrich Belguith
Transcribing spoken Arabic dialects is an important task for building speech corpora. Transcribing speech data therefore requires a well-defined orthography and a well-defined annotation scheme. In this paper, we present OTTA, Orthographic Transcription for Tunisian Arabic. The convention adopts rules based on the standard Arabic transcription conventions and defines an additional set of conventions that preserve the particularities of the Tunisian dialect.
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing | 2013
Marwa Graja; Maher Jaoua; Lamia Hadrich Belguith
In this paper, we evaluate the performance of a discriminative model for semantically labeling spoken Tunisian dialect turns that are not segmented into utterances. The discriminative algorithm we evaluate is based on Conditional Random Fields (CRF). We first measure the performance of the CRF model on concept labeling of raw Tunisian dialect data that have not been analyzed in advance. We then compare this performance across different levels of data preprocessing, up to fully processed data. The CRF model proved able to improve the accuracy of the labeling task for spoken language understanding of unsegmented and unprocessed speech in Tunisian dialect.
IEEE Transactions on Audio, Speech, and Language Processing | 2015
Marwa Graja; Maher Jaoua; L. Hadrich Belguith
In this paper, we propose a hybrid method for understanding spoken Tunisian dialect within a limited task. The method couples a discriminative statistical method with a domain ontology. The statistical method is based on conditional random field (CRF) models, learned from a small corpus, that perform the conceptual labeling task. These models are able to detect the semantic dependencies between words. The domain ontology, in turn, is used to add prior knowledge about the task. Our experiments are based on a real spoken Tunisian dialect corpus. The results show that integrating the domain ontology improves the performance of CRF models for speech understanding. Our method can be applied to under-resourced languages and Arabic dialects to overcome the lack of linguistic resources.
International Conference on Neural Information Processing | 2011
Marwa Graja; Maher Jaoua; Lamia Hadrich Belguith
This paper presents a method for semantic interpretation designed for the Tunisian dialect. The method relies on lexical semantics to overcome the lack of resources for the dialect. It is ontology-based, which makes it possible to exploit ontological concepts for semantic annotation and ontological relations for interpretation. This combination reduces inaccuracies and increases the rate of comprehension. The paper also details the process of building the ontology used for the annotation and interpretation of Tunisian dialect utterances in the context of speech understanding in dialogue systems.
Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres | 2017
Samira Ellouze; Maher Jaoua; Lamia Hadrich Belguith
This paper introduces a new MultiLing text summary evaluation method. The method relies on a machine learning approach that combines multiple features to build models that predict the human score (overall responsiveness) of a new summary. We tried several single and "ensemble learning" classifiers to build the best model. We tested our method in summary-level evaluation, where the quality of each text summary is evaluated separately. The correlation between the built models and the human score is better than the correlation between the baselines and the manual score.
Applications of Natural Language to Data Bases | 2016
Samira Ellouze; Maher Jaoua; Lamia Hadrich Belguith
Evaluating the linguistic quality of a summary is a difficult task because several linguistic aspects (e.g., grammaticality and coherence) must be verified to ensure that the summary is well formed. In this paper, we report the results of combining "Adapted ROUGE" scores and linguistic quality features to assess linguistic quality. We build and evaluate models that predict the manual linguistic quality score using linear regression. We construct models for evaluating the quality of each text summary (summary-level evaluation) and of each summarizing system (system-level evaluation). The performance of a summarizing system is assessed from the quality of a set of summaries generated by that system. All models are evaluated using the Pearson correlation and the root mean squared error.
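The two evaluation measures named above can be computed as in the following minimal sketch. The function names and the example scores are illustrative assumptions and do not come from the paper; the formulas are the standard definitions of the Pearson correlation coefficient and the root mean squared error.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def rmse(predicted, manual):
    """Root mean squared error between predicted and manual scores."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, manual))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Hypothetical scores for four summaries (made-up numbers for illustration).
    predicted = [3.1, 2.4, 4.0, 1.8]
    manual = [3.0, 2.0, 4.0, 2.0]
    print(pearson(predicted, manual), rmse(predicted, manual))
```

The two measures are complementary: the correlation rewards models that rank summaries in the same order as the human judges, while the error measures how far the predicted scores are from the manual ones in absolute terms.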
Document numérique | 2012
Maher Jaoua; Fatma Kallel Jaoua; Lamia Hadrich Belguith; Abdelmajid Ben Hamadou
In this article, we evaluate the impact of integrating compression and filtering steps into the automatic summarization chain. The evaluation is based on a number of experiments that we conducted on sub-corpora distributed at the DUC-TAC conferences. To carry out these experiments, we adopted an extraction method that treats summarization as an optimization problem in which the goal is to determine the best partition satisfying predetermined criteria. The results show the importance of integrating the filtering and compression steps.
Applications of Natural Language to Data Bases | 2016
Houssem Safi; Maher Jaoua; Lamia Hadrich Belguith
The work presented in this paper aims to develop a Personalized Information Retrieval system for Arabic Texts ("PIRAT") based on the user's preferences and interests. To this end, we propose a user model and a personalized document-query matching method. The user model is based on a hybrid representation of the user profile. In this approach, we introduce an algorithm that automatically builds a hierarchical user profile representing the user's implicit personal interests and domain. The interests and the domain are represented as a conceptual network of nodes linked through relationships that respect the linking topology defined in domain hierarchies and ontologies (hyperonymy, hyponymy, and synonymy). We then address the lack of available language resources by building (i) a large Arabic text corpus entitled "WCAT" and (ii) our own Arabic query corpus entitled "AQC2", in order to evaluate the proposed PIRAT system and the AXON system. The results of this evaluation are promising.
International Conference on Computer and Electrical Engineering | 2009
Fatma Kallel Jaoua; Lamia Hadrich Belguith; Maher Jaoua; Abdelmajid Ben Hamadou
In this paper, we compare two strategies for integrating a compression module into the automatic summarization chain. The first strategy, which we call pre-compression, applies sentence compression in the first stage of summarization by producing all reduced forms of the original sentences. The second strategy, called post-compression, reduces the sentences of the extract in order to generate the final extract. Experimental results are presented on a document set taken from the DUC'04 evaluation conference.
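The difference between the two strategies is a matter of where compression sits in the pipeline, which the following toy sketch tries to make concrete. Everything in it is an assumption made for illustration: the compression rule (dropping parenthetical asides), the greedy keyword-based selector, and the function names are hypothetical, and the sketch produces a single reduced form per sentence rather than all reduced forms as in the paper.

```python
import re


def compress(sentence):
    """Toy compression: drop parenthetical asides (an illustrative rule only)."""
    return re.sub(r"\s*\([^)]*\)", "", sentence)


def select(sentences, keywords, budget):
    """Greedy toy extractor: keep keyword-bearing sentences within a word budget."""
    def hits(sentence):
        return sum(w.lower().strip(".,") in keywords for w in sentence.split())

    chosen, used = [], 0
    for sentence in sorted(sentences, key=hits, reverse=True):
        length = len(sentence.split())
        if hits(sentence) > 0 and used + length <= budget:
            chosen.append(sentence)
            used += length
    return chosen


def pre_compression(sentences, keywords, budget):
    """Compress every sentence first, then select the extract."""
    return select([compress(s) for s in sentences], keywords, budget)


def post_compression(sentences, keywords, budget):
    """Select the extract from the original sentences, then compress it."""
    return [compress(s) for s in select(sentences, keywords, budget)]


if __name__ == "__main__":
    doc = [
        "Compression shortens sentences (as many systems do).",
        "Summaries select sentences.",
    ]
    keys = {"compression", "sentences", "summaries"}
    print(pre_compression(doc, keys, 6))
    print(post_compression(doc, keys, 6))
```

In this toy setting, pre-compression lets the long first sentence fit into the word budget once it has been shortened, whereas post-compression rejects it before it is ever compressed. Real systems would differ in both the compression and the selection components.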