Mohamed Ali Hadj Taieb
University of Sousse
Publication
Featured research published by Mohamed Ali Hadj Taieb.
Knowledge and Information Systems | 2014
Mohamed Ali Hadj Taieb; Mohamed Ben Aouicha; Abdelmajid Ben Hamadou
Computing semantic similarity/relatedness between concepts and words is an important issue in many research fields. Information-theoretic approaches exploit the notion of Information Content (IC), which provides a better understanding of a concept's semantics. In this paper, we present a complete survey of IC metrics with a critical study. We then propose a new intrinsic IC computing method that uses taxonomical features extracted from an ontology for a particular concept. This approach quantifies the subgraph formed by the concept's subsumers, using depth and descendant count as taxonomical parameters. In a second part, we integrate this IC metric into a new parameterized multistrategy approach for measuring word semantic relatedness. This measure exploits WordNet features such as the noun “is a” taxonomy, the nominalization relation (allowing the use of the verb “is a” taxonomy) and the shared words (overlaps) in glosses. Our work has been evaluated and compared with related works using a wide set of benchmarks conceived for word semantic similarity/relatedness tasks. The results show that our IC method and the new relatedness measure correlate better with human judgments than related works.
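The subsumer-subgraph idea lends itself to a compact sketch. The Python snippet below (using NLTK's WordNet interface) scores a concept by walking its hypernym subgraph and combining each subsumer's depth with its descendant count; the exact combination function is an illustrative assumption on our part, not the paper's formula.

    # Minimal sketch of an intrinsic IC built from a concept's subsumer
    # subgraph, via NLTK's WordNet (requires nltk.download('wordnet')).
    # The depth/descendant weighting below is illustrative only; the
    # paper's exact formula may differ.
    import math
    from nltk.corpus import wordnet as wn

    def subsumers(synset):
        """All hypernyms of a synset: its subsumer subgraph."""
        return list(synset.closure(lambda s: s.hypernyms()))

    def descendant_count(synset):
        """Number of concepts the synset subsumes in the taxonomy."""
        return sum(1 for _ in synset.closure(lambda s: s.hyponyms()))

    def intrinsic_ic(synset):
        """Sum a depth-vs-descendants score over every subsumer."""
        return sum(s.max_depth() / math.log(descendant_count(s) + 2)
                   for s in subsumers(synset))

    print(intrinsic_ic(wn.synset('dog.n.01')))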
Data and Knowledge Engineering | 2012
Mohamed Ali Hadj Taieb; Mohamed Ben Aouicha; Mohamed Tmar; Abdelmajid Ben Hamadou
Computing semantic relatedness is a key component of information retrieval tasks and natural language processing applications. Wikipedia provides a knowledge base for computing word relatedness with more coverage than WordNet. In this paper we use a new intrinsic information content (IC) metric with the Wikipedia category graph (WCG) to measure the semantic relatedness between words. We have developed an efficient algorithm to extract the categories assigned to a given word from the WCG. This extraction strategy is coupled with a new intrinsic IC metric based on the subgraph composed of the hypernyms of a given concept, together with a process to quantify the information content of this subgraph. When tested on a common benchmark of similarity ratings, the proposed approach shows a good correlation value compared to other computational models.
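As a rough picture of the extraction step, the toy snippet below walks a miniature category graph upward from a starting category to collect the hypernym subgraph over which the IC is computed. The adjacency data is invented; the real WCG is orders of magnitude larger.

    # Toy walk up a category graph (child -> parent categories) to
    # collect the hypernym subgraph used for the IC. The graph below is
    # invented for illustration only.
    from collections import deque

    wcg = {
        "Dog": ["Domesticated animals", "Canids"],
        "Domesticated animals": ["Animals"],
        "Canids": ["Carnivorans"],
        "Carnivorans": ["Animals"],
        "Animals": [],
    }

    def hypernym_subgraph(category):
        """Breadth-first collection of every ancestor category."""
        seen, queue = set(), deque([category])
        while queue:
            for parent in wcg.get(queue.popleft(), []):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen

    print(hypernym_subgraph("Dog"))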
Applied Intelligence | 2016
Mohamed Ben Aouicha; Mohamed Ali Hadj Taieb; Abdelmajid Ben Hamadou
Computing the semantic similarity/relatedness between terms is an important research area for several disciplines, including artificial intelligence, cognitive science, linguistics, psychology, biomedicine and information retrieval. These measures exploit knowledge bases to express the semantics of concepts. Some approaches, such as the information-theoretic ones, rely on knowledge structure, while others, such as the gloss-based ones, use knowledge content. Firstly, based on structure, we propose a new intrinsic Information Content (IC) computing method that quantifies the subgraph formed by the ancestors of the target concept. Taxonomic measures, including the IC-based ones, require topological parameters extracted from taxonomies treated as Directed Acyclic Graphs (DAGs). Accordingly, we propose a set of graph algorithms that provide the basic parameters, such as depth, ancestors, descendants and Lowest Common Subsumer (LCS). The IC-computing method is assessed using several knowledge structures: the noun and verb WordNet “is a” taxonomies, the Wikipedia Category Graph (WCG), and the MeSH taxonomy. We also propose an aggregation schema that exploits the WordNet “is a” taxonomy and the WCG in a complementary way, through the IC-based measures, to improve coverage capacity. Secondly, taking content into consideration, we propose a gloss-based semantic similarity measure that operates on a noun-weighting mechanism using our IC-computing method, as well as on the WordNet, Wiktionary and Wikipedia resources. Further evaluation is performed on various items, including nouns, verbs, multiword expressions and biomedical datasets, using well-recognized benchmarks. The results indicate an improvement in similarity and relatedness assessment accuracy.
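The basic DAG routines listed above (depth, ancestors, LCS) can be sketched over a child-to-parents map as follows. Picking the deepest common ancestor as the LCS is an assumed tie-breaking rule, and the tiny taxonomy is invented.

    # Sketch of the basic taxonomy routines the paper lists, over a DAG
    # stored as a child -> parents map. Choosing the deepest common
    # ancestor as the LCS is an assumption.
    from functools import lru_cache

    parents = {"dog": ["canid", "pet"], "canid": ["animal"],
               "pet": ["animal"], "cat": ["pet"], "animal": []}

    def ancestors(node):
        """All nodes reachable by following parent links."""
        found, stack = set(), [node]
        while stack:
            for p in parents.get(stack.pop(), []):
                if p not in found:
                    found.add(p)
                    stack.append(p)
        return found

    @lru_cache(maxsize=None)
    def depth(node):
        """Length of the longest upward path to a root."""
        ps = parents.get(node, [])
        return 0 if not ps else 1 + max(depth(p) for p in ps)

    def lcs(a, b):
        """Deepest subsumer shared by both concepts."""
        common = (ancestors(a) | {a}) & (ancestors(b) | {b})
        return max(common, key=depth) if common else None

    print(lcs("dog", "cat"))  # -> 'pet'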
Hybrid Artificial Intelligence Systems | 2015
Mohamed Ali Hadj Taieb; Mohamed Ben Aouicha; Yosra Bourouis
Measuring Semantic Similarity (SS) between sentences aims to find a method that can simulate the human thinking process; it has become an important task in several applications, including Artificial Intelligence and Natural Language Processing. Though this task depends strongly on word SS, the latter is not the only important feature. This paper presents a new method for computing sentence semantic similarity by exploiting a set of sentence characteristics, namely the Features-based Measure of Sentences Semantic Similarity (FM3S). The proposed method aggregates three components in a non-linear function: noun-based SS including compound nouns, verb-based SS using tense information, and common word order similarity. It measures the semantic similarity between concepts that play the same syntactic role. For word-based semantic similarity, an information content-based measure is used to estimate the SS degree between words by exploiting the WordNet “is a” taxonomy. The proposed method yielded competitive results compared to previously proposed measures on Li's benchmark, showing a high correlation with human ratings. Further experiments performed on the Microsoft Paraphrase Corpus showed the best F-measure values compared to other measures for high similarity thresholds. The results of FM3S prove the importance of syntactic information, compound nouns, and verb tense in computing sentence semantic similarity.
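For concreteness, the snippet below sketches one way to combine the three component scores non-linearly. The weighted-product form and the exponents are our assumptions, not the exact FM3S definitions; word_order_sim follows the classical position-vector formulation.

    # Hedged sketch of a non-linear three-way aggregation in the spirit
    # of FM3S; the product form and exponents are assumptions.
    import math

    def word_order_sim(r1, r2):
        """1 - ||r1 - r2|| / ||r1 + r2|| over word-position vectors."""
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
        total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
        return 1 - diff / total if total else 1.0

    def sentence_sim(noun_ss, verb_ss, order_sim, a=0.55, b=0.25):
        """Weighted product: any near-zero component drags the score
        down, unlike a flat average."""
        return (noun_ss ** a) * (verb_ss ** b) * (order_sim ** (1 - a - b))

    print(sentence_sim(0.9, 0.8, word_order_sim([1, 2, 3], [1, 3, 2])))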
Soft Computing | 2018
Mohamed Ben Aouicha; Mohamed Ali Hadj Taieb; Abdelmajid Ben Hamadou
Semantic similarity and relatedness measures have increasingly become core elements in recent research within the semantic technology community. Nowadays, the search for efficient meaning-centered applications that exploit computational semantics has become a necessity. Researchers have therefore become increasingly interested in developing a model that can simulate the human thinking process and measure the semantic similarity/relatedness between lexical terms, including concepts and words. Knowledge resources are fundamental to quantifying semantic similarity or relatedness and achieving the best expression of semantic content. No fully developed system that centralizes these approaches is currently available to the research and industrial communities. In this paper, we propose a System for Integrating Semantic Relatedness and similarity measures, SISR, which aims to provide a variety of tools for computing semantic similarity and relatedness. This system is the first to treat the topic of computing semantic relatedness with a view to integrating different key stakeholders in a parameterized way. As an instance of the proposed architecture, we propose WNetSS, a Java API allowing the use of a wide range of WordNet-based semantic similarity measures pertaining to different categories, including taxonomic-based, features-based and IC-based measures. It is the first API that allows the extraction of the topological parameters from the WordNet “is a” taxonomy, which are used to express the semantics of concepts. Moreover, an evaluation module is proposed to assess the reproducibility of the measures' accuracy, which can be evaluated against 10 widely used benchmarks through correlation coefficients.
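WNetSS itself is a Java API, but the evaluation module's logic is easy to mirror: score a benchmark's word pairs with a measure, then correlate the scores with the human ratings. The sketch below does this in Python, with NLTK's Wu-Palmer measure as a stand-in for a WNetSS measure, scipy assumed available, and a few RG65-style rows as data.

    # Python mirror of the evaluation-module idea: run a measure over a
    # benchmark and report a correlation coefficient. Wu-Palmer stands
    # in for a WNetSS measure; the rows are RG65-style examples.
    from nltk.corpus import wordnet as wn
    from scipy.stats import spearmanr

    def wup(w1, w2):
        """Best Wu-Palmer score over all noun sense pairs."""
        scores = [s1.wup_similarity(s2) or 0.0
                  for s1 in wn.synsets(w1, pos=wn.NOUN)
                  for s2 in wn.synsets(w2, pos=wn.NOUN)]
        return max(scores, default=0.0)

    benchmark = [("car", "automobile", 3.92), ("noon", "string", 0.08),
                 ("food", "fruit", 3.08)]
    predicted = [wup(a, b) for a, b, _ in benchmark]
    human = [h for _, _, h in benchmark]
    rho, _ = spearmanr(predicted, human)
    print(rho)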
Computational Intelligence and Security | 2011
Mohamed Ali Hadj Taieb; Mohamed Ben Aouicha; Mohamed Tmar; Abdelmajid Ben Hamadou
Semantic similarity techniques compute the semantic similarity (common shared information) between two concepts according to certain language or domain resources such as ontologies, taxonomies and corpora. They constitute important components in most Information Retrieval (IR) and knowledge-based systems. Taking semantics into account requires the use of external semantic resources coupled with the initial documentation, over which semantic similarity measures are needed to carry out comparisons between concepts. This paper presents a new approach for measuring semantic relatedness between words and concepts. It combines a new information content (IC) metric using the WordNet thesaurus with the nominalization relation provided by the Java WordNet Library (JWNL). Specifically, the proposed method makes thorough use of the hypernym/hyponym relation (the noun and verb “is a” taxonomies) without external corpus statistics. Mainly, we use the subgraph formed by the hypernyms of the concerned concept, which inherits all the features of its hypernyms, and we quantify the contribution of each concept in this subgraph to its information content. When tested on a common dataset of word-pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (0.70) with a benchmark based on human similarity judgments, notably the large dataset composed of 260 Finkelstein word pairs (Appendices 1 and 2).
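A quick way to see how such an IC plugs into a relatedness score is Resnik-style: take the IC of the lowest common subsumer of the two words' senses. The sketch below assumes an intrinsic_ic function like the one sketched earlier; the paper's own way of quantifying subgraph contributions may differ.

    # Resnik-style use of an intrinsic IC: relatedness as the IC of the
    # lowest common subsumer over all noun sense pairs. intrinsic_ic is
    # assumed to be the hypernym-subgraph sketch given earlier.
    from nltk.corpus import wordnet as wn

    def relatedness(w1, w2, ic):
        best = 0.0
        for s1 in wn.synsets(w1, pos=wn.NOUN):
            for s2 in wn.synsets(w2, pos=wn.NOUN):
                for sub in s1.lowest_common_hypernyms(s2):
                    best = max(best, ic(sub))
        return best

    # e.g. relatedness("tiger", "cat", intrinsic_ic)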
Neurocomputing | 2016
Mohamed Ben Aouicha; Mohamed Ali Hadj Taieb; Abdelmajid Ben Hamadou
The measurement of semantic relatedness between words has gained increasing interest in several research fields, including cognitive science, artificial intelligence, biology, and linguistics. The development of efficient measures is based on knowledge resources, such as Wikipedia, a huge and living encyclopedia supplied by net surfers. In this paper, we propose a novel approach based on a multi-Layered Wikipedia representation for Computing word Relatedness (LWCR), exploiting a weighting scheme based on the Wikipedia Category Graph (WCG): Term Frequency-Inverse Category Frequency (tf×icf). Our proposal provides, for each category pertaining to the WCG, a Category Description Vector (CDV) containing the weights of stems extracted from the articles assigned to that category. The semantic relatedness degree is computed using the cosine measure between the CDVs assigned to the target word pair. This basic idea is followed by enhancement modules that exploit other Wikipedia features, such as article titles, the redirection mechanism, and neighborhood category enrichment, to better quantify the semantic relatedness between words. To the best of our knowledge, this is the first attempt to incorporate a WCG-based term-weighting scheme (tf×icf) into a computational model of semantic relatedness. It is also the first work to exploit 17 datasets in the assessment process, divided into two sets. The first includes datasets designed for semantic similarity purposes: RG65, MC30, AG203, WP300, SimLexNoun666 and GeReSiD50Sim; the second includes datasets for semantic relatedness evaluation: WordSim353, GM30, Zeigler25, Zeigler30, MTurk287, MTurk771, MEN3000, Rel122, ReWord26, GeReSiD50 and SCWS1229. The results are compared to WordNet-based measures and to the distributional measures cosine and PMI computed on Wikipedia articles. Experiments show that our approach provides consistent improvements over state-of-the-art results on multiple benchmarks.
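The weighting and comparison steps can be pictured as follows: each category gets a vector of stem weights (term frequency scaled by inverse category frequency), and two CDVs are compared with the cosine. The data shapes and the absence of smoothing are our assumptions.

    # Sketch of tf x icf weighting and CDV cosine. category_freq counts
    # in how many categories a stem appears; data shapes are assumptions.
    import math
    from collections import Counter

    def cdv(stems, n_categories, category_freq):
        """Category Description Vector: tf x icf weight per stem."""
        tf = Counter(stems)
        return {t: f * math.log(n_categories / category_freq[t])
                for t, f in tf.items()}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0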
International Conference on Computational Collective Intelligence | 2016
Mohamed Ben Aouicha; Mohamed Ali Hadj Taieb; Hania Ibn Marai
Word sense disambiguation (WSD) is the task of identifying the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. Because of the semantic understanding it provides, it is used in applications such as information retrieval, machine translation, and information extraction. This paper describes the proposed approach (WSD-TIC), which is based on the words surrounding the polysemous word in a context. Each meaning of these words is represented by a vector composed of nouns weighted using taxonomic information content. The main emphasis of this paper is feature selection for disambiguation purposes. The assessment of WSD systems is discussed in the context of the Senseval campaign, aiming at an objective comparison of our proposal with the systems participating in several different disambiguation tasks.
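The core selection loop can be sketched with gloss-derived vectors: represent each candidate sense of the target word by the words of its gloss and pick the sense that overlaps most with the context. The uniform weights below stand in for the paper's taxonomic-IC weights.

    # Minimal sense-selection loop in the spirit of WSD-TIC, using NLTK
    # glosses and uniform weights in place of taxonomic-IC weights.
    from nltk.corpus import wordnet as wn

    def gloss_bag(synset):
        return set(synset.definition().lower().split())

    def disambiguate(target, context_words):
        """Pick the sense whose gloss overlaps the context most."""
        context = set(w.lower() for w in context_words)
        return max(wn.synsets(target),
                   key=lambda s: len(gloss_bag(s) & context),
                   default=None)

    sense = disambiguate("bank", ["money", "deposit", "loan"])
    print(sense, sense.definition() if sense else None)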
International Symposium on Innovations in Intelligent Systems and Applications | 2016
Mohamed Ben Aouicha; Mohamed Ali Hadj Taieb; Sameh Beyaoui
Quantifying the semantic relation between words is a key element in several applications, including treatments at the meaning level. A great variety of approaches have been proposed to quantify the semantic proximity between concepts or words. These approaches exploit computational models that include the hierarchical and textual information of semantic resources. Among these models, the distributional approaches quantify semantic relations based on co-occurrence information about the target words. In this paper, we study the distributional semantics of three resources, the collaborative resources Wiktionary and Wikipedia and the thesaurus WordNet, through the word relatedness task. We exploit the glosses of WordNet and Wiktionary as a corpus formed by short and precise texts, and the contents of Wikipedia articles. The experiments are performed using the well-known measures PMI and cosine and a list of well-known benchmarks for the semantic relatedness task. The results show that a small corpus formed by well-formed sentences can lead to good correlations but limited coverage capacity. Despite the improvement in coverage capacity using Wikipedia, the correlations between human judgments and computed values do not improve to the same degree.
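The distributional side reduces to counting. The snippet below computes a sentence-level PMI between two words from co-occurrence counts; the add-one smoothing and the toy corpus are assumptions.

    # Sentence-level PMI from co-occurrence counts; add-one smoothing is
    # an assumption to avoid log(0) on sparse corpora.
    import math

    def pmi(sentences, w1, w2):
        n = len(sentences)
        c1 = sum(1 for s in sentences if w1 in s)
        c2 = sum(1 for s in sentences if w2 in s)
        c12 = sum(1 for s in sentences if w1 in s and w2 in s)
        return math.log(((c12 + 1) * n) / ((c1 + 1) * (c2 + 1)))

    corpus = [{"bank", "money"}, {"bank", "river"}, {"money", "loan"}]
    print(pmi(corpus, "bank", "money"))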
Transactions on Computational Collective Intelligence | 2018
Mohamed Ben Aouicha; Mohamed Ali Hadj Taieb; Hania Ibn Marai
Word sense disambiguation (WSD) is the task of identifying the meaning of words in context in a computational manner. WSD is considered a task whose solution is at least as hard as the most difficult problems in artificial intelligence. Because of the semantic understanding it provides, it is used in applications such as information retrieval, machine translation, and information extraction. This paper describes the proposed approach W3SD (an extended version of our work [4] published in the 8th International Conference on Computational Collective Intelligence), which is based on the words surrounding the polysemous word in a context. Each meaning of these words is represented by a vector composed of weighted nouns built from WordNet and Wiktionary features: taxonomic information content from WordNet and glosses from Wiktionary. The main emphasis of this paper is feature selection for disambiguation purposes. The assessment of WSD systems is discussed in the context of the Senseval campaign, aiming at an objective comparison of our proposal with the systems participating in several different disambiguation tasks.
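W3SD's twist over WSD-TIC is combining two resources per sense. A hedged sketch: merge the WordNet gloss with a Wiktionary gloss into one weighted-noun bag before the overlap comparison. wiktionary_gloss below is a hypothetical lookup, since there is no standard bundled Wiktionary API, and the uniform default weight stands in for the taxonomic-IC weighting.

    # Hedged sketch of a two-resource sense vector in the spirit of W3SD.
    # wiktionary_gloss is a hypothetical stand-in; real Wiktionary access
    # would need a dump or a third-party client.
    from collections import Counter
    from nltk.corpus import wordnet as wn

    def wiktionary_gloss(lemma, sense_id):
        """Hypothetical Wiktionary definition lookup (returns text)."""
        return ""

    def sense_vector(synset, ic_weight=lambda w: 1.0):
        """Weighted bag of words from both glosses; ic_weight stands in
        for the paper's taxonomic-IC noun weighting."""
        words = synset.definition().lower().split()
        words += wiktionary_gloss(synset.lemmas()[0].name(),
                                  synset.name()).split()
        return Counter({w: ic_weight(w) for w in words})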