Engelbert Mephu Nguifo
Blaise Pascal University
Publication
Featured research published by Engelbert Mephu Nguifo.
Mexican International Conference on Artificial Intelligence | 2008
Philippe Fournier-Viger; Roger Nkambou; Engelbert Mephu Nguifo
Domain experts should provide relevant domain knowledge to an Intelligent Tutoring System (ITS) so that it can guide a learner during problem-solving learning activities. However, in many ill-defined domains, the domain knowledge is hard to define explicitly. In previous work, we showed how sequential pattern mining can be used to extract a partial problem space from logged user interactions, and how it can support tutoring services during problem-solving exercises. This article describes an extension of this approach that extracts a problem space that is richer and better suited to supporting tutoring services. We combined sequential pattern mining with (1) dimensional pattern mining, (2) time intervals, (3) the automatic clustering of valued actions, and (4) closed sequence mining. Some tutoring services have been implemented, and an experiment has been conducted in a tutoring system.
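The base mining step lends itself to a compact illustration. Below is a minimal, self-contained sketch of a PrefixSpan-style sequential pattern miner over logged learner actions; the action names and the example log are hypothetical, and the paper's extensions (dimensional patterns, time intervals, clustering of valued actions, closed sequences) are omitted.

```python
# Minimal sketch: mining frequent sequential patterns from logged learner
# actions, in the spirit of the approach described above. Action names and
# the example log are hypothetical.

from collections import defaultdict

def prefixspan(sequences, min_support, prefix=None):
    """Recursively enumerate frequent sequential patterns (PrefixSpan-style).

    sequences   -- list of action sequences (lists of hashable items)
    min_support -- minimum number of sequences a pattern must occur in
    """
    prefix = prefix or []
    patterns = []
    # Count, for each item, the number of sequences in which it occurs.
    support = defaultdict(int)
    for seq in sequences:
        for item in set(seq):
            support[item] += 1
    for item, count in support.items():
        if count < min_support:
            continue
        new_prefix = prefix + [item]
        patterns.append((new_prefix, count))
        # Projected database: suffixes after the item's first occurrence.
        projected = []
        for seq in sequences:
            if item in seq:
                suffix = seq[seq.index(item) + 1:]
                if suffix:
                    projected.append(suffix)
        patterns.extend(prefixspan(projected, min_support, new_prefix))
    return patterns

# Hypothetical interaction log: each list is one learner's recorded actions.
log = [
    ["open_problem", "select_tool", "apply_rule", "check_solution"],
    ["open_problem", "apply_rule", "check_solution"],
    ["open_problem", "select_tool", "check_solution"],
]
for pattern, count in prefixspan(log, min_support=2):
    print(pattern, count)
```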
BMC Bioinformatics | 2010
Rabie Saidi; Mondher Maddouri; Engelbert Mephu Nguifo
Background: This paper deals with the preprocessing of protein sequences for supervised classification. Motif extraction is one way to address that task. It has been widely used to encode biological sequences into feature vectors so that well-known machine-learning classifiers requiring this format can be applied. However, designing a suitable feature space for a set of proteins is not a trivial task. For this purpose, we propose a novel encoding method that uses amino-acid substitution matrices to define similarity between motifs during the extraction step.

Results: To demonstrate the efficiency of such an approach, we compare several encoding methods using several machine-learning classifiers. The experimental results showed that our encoding method outperforms the others in terms of classification accuracy and number of generated attributes. We also compared the classifiers in terms of accuracy. The results indicated that SVM generally outperforms the other classifiers with any encoding method, and that SVM, coupled with our encoding method, can be an efficient protein classification system. In addition, we studied the effect of varying the substitution matrix on the quality of our method and hence on classification quality. Our method yields good classification accuracy with all the substitution matrices, and the variance of the accuracies obtained across substitution matrices is slight; however, the number of generated features varies from one substitution matrix to another. Furthermore, the use of already published datasets allowed us to carry out a comparison with several related works.

Conclusions: The outcomes of our comparative experiments confirm the efficiency of our encoding method for representing protein sequences in classification tasks.
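The encoding idea can be sketched compactly. The toy example below builds a binary feature vector over a set of motifs, marking a motif as present when some substring of the sequence scores above a threshold against it under a substitution matrix. The four-letter alphabet, the hand-made matrix, and the motifs are hypothetical stand-ins for the 20 amino acids and matrices such as BLOSUM62.

```python
# Minimal sketch: encoding sequences as feature vectors using
# substitution-matrix similarity between motifs. The alphabet, matrix,
# motifs and threshold below are all hypothetical simplifications.

# Toy substitution matrix: higher score = more similar residues.
SUBST = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 1, ("A", "T"): 0,
    ("C", "C"): 4, ("C", "G"): 0, ("C", "T"): 1,
    ("G", "G"): 4, ("G", "T"): 0,
    ("T", "T"): 4,
}

def score(a, b):
    """Symmetric lookup in the substitution matrix."""
    return SUBST.get((a, b), SUBST.get((b, a), 0))

def motif_match(window, motif, threshold):
    """True if a window of residues is similar enough to the motif."""
    return sum(score(x, y) for x, y in zip(window, motif)) >= threshold

def encode(sequence, motifs, threshold):
    """Binary feature vector: one attribute per motif, set to 1 when the
    sequence contains a substring similar to that motif."""
    vector = []
    for motif in motifs:
        k = len(motif)
        hit = any(
            motif_match(sequence[i:i + k], motif, threshold)
            for i in range(len(sequence) - k + 1)
        )
        vector.append(1 if hit else 0)
    return vector

motifs = ["ACG", "GGT"]                          # hypothetical motifs
print(encode("TTACGGTA", motifs, threshold=9))   # -> [1, 1]
```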
International Conference on Formal Concept Analysis | 2004
Huaiguo Fu; Engelbert Mephu Nguifo
One of the most effective ways to handle large data in data analysis and data mining is to develop parallel algorithms. Although formal concept analysis is an effective tool for data analysis and knowledge discovery, concept lattice structures scale poorly to very large data. We therefore propose a new parallel algorithm, based on the NextClosure algorithm, to generate formal concepts from large data.
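For reference, a minimal serial version of the NextClosure algorithm on which the parallel approach builds is sketched below; it enumerates all concept intents (closed attribute sets) of a formal context in lectic order. The toy context is hypothetical, and the partitioning of this enumeration across machines, which is the paper's contribution, is not shown.

```python
# Minimal sketch of Ganter's NextClosure algorithm over a toy formal
# context; the parallel approach described above distributes this
# enumeration, which is not shown here.

def closure(attrs, context, n_attrs):
    """Intent closure: attributes shared by all objects having `attrs`."""
    objects = [row for row in context if attrs <= row]
    if not objects:
        return frozenset(range(n_attrs))
    return frozenset.intersection(*objects)

def next_closure(attrs, context, n_attrs):
    """Smallest closed set lectically greater than `attrs` (None if none)."""
    for i in reversed(range(n_attrs)):
        if i in attrs:
            attrs = attrs - {i}
        else:
            candidate = closure(attrs | {i}, context, n_attrs)
            # Lectic condition: no new attribute smaller than i was added.
            if all(j >= i or j in attrs for j in candidate):
                return candidate
    return None

def all_intents(context, n_attrs):
    intent = closure(frozenset(), context, n_attrs)
    while intent is not None:
        yield intent
        intent = next_closure(intent, context, n_attrs)

# Toy context: each object is the frozenset of attribute indices it has.
context = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})]
for intent in all_intents(context, n_attrs=3):
    print(sorted(intent))
```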
Intelligent Tutoring Systems | 2010
Philippe Fournier-Viger; Roger Nkambou; Engelbert Mephu Nguifo
Domains in which traditional approaches for building tutoring systems are not applicable or do not work well have been termed “ill-defined domains.” This chapter provides an updated overview of the problems and solutions for building intelligent tutoring systems for these domains. It adopts a presentation based on the following three complementary and important perspectives: the characteristics of ill-defined domains, the approaches to represent and reason with domain knowledge in these domains, and suitable teaching models. Numerous examples are given throughout the chapter to illustrate the discussion.
Information Systems | 2015
Sabeur Aridhi; Laurent d'Orazio; Mondher Maddouri; Engelbert Mephu Nguifo
Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. In this scope, one of the most challenging tasks is frequent subgraph discovery, motivated by the tremendously increasing size of existing graph databases. This creates an urgent need for efficient and scalable approaches to frequent subgraph discovery in large clusters. In such clusters, however, failures are the norm rather than the exception; the MapReduce framework was designed so that node failures are handled automatically. In this paper, we propose a large-scale, fault-tolerant approach to subgraph mining by means of a density-based partitioning technique, using the MapReduce framework. Our partitioning aims to balance the computational load on a collection of machines. We experimentally show that our approach significantly decreases the execution time and scales the subgraph discovery process to large graph databases.
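One plausible reading of the density-based partitioning step can be sketched as follows: graphs are ranked by edge density and dealt round-robin into partitions so that each machine receives a comparable mix of sparse and dense graphs. The graph representation and example database below are hypothetical, and this is an illustrative simplification rather than the paper's exact algorithm.

```python
# Minimal sketch of a density-based partitioning of a graph database; in
# the paper's setting each partition would feed one mapper of a MapReduce
# subgraph-mining job. The representation and database are hypothetical.

def density(n_nodes, n_edges):
    """Edge density of an undirected simple graph."""
    if n_nodes < 2:
        return 0.0
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))

def partition_by_density(graphs, n_partitions):
    """graphs: list of (graph_id, n_nodes, n_edges) triples."""
    ranked = sorted(graphs, key=lambda g: density(g[1], g[2]))
    partitions = [[] for _ in range(n_partitions)]
    # Round-robin over the density ranking balances load per partition.
    for i, g in enumerate(ranked):
        partitions[i % n_partitions].append(g[0])
    return partitions

db = [("g1", 4, 3), ("g2", 5, 10), ("g3", 6, 5), ("g4", 3, 3)]
print(partition_by_density(db, n_partitions=2))
# -> [['g3', 'g2'], ['g1', 'g4']]
```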
International Conference on Formal Concept Analysis | 2004
Huaiyu Fu; Huaiguo Fu; Patrick Njiwoua; Engelbert Mephu Nguifo
Several FCA-based classification algorithms have been proposed, such as GRAND, LEGAL, GALOIS, RULEARNER, CIBLe, and CLNN & CLNB. These classifiers have been compared to standard classification algorithms such as C4.5, Naive Bayes or IB1, but they have never been compared with each other on the same platform, except for LEGAL and CIBLe. Here we compare them with one another, both theoretically and experimentally, as well as with the standard machine learning algorithm C4.5. Experimental results are discussed.
IEEE International Conference on Information Visualization | 2007
Olivier Couturier; T. Hamrouni; S. Ben Yahia; Engelbert Mephu Nguifo
Providing efficient and easy-to-use graphical tools to users is a promising challenge of data mining (DM). These tools must be able to generate explicit knowledge and to present it in an intelligible form. Visualization techniques have been shown to be an efficient way to achieve this goal. Although considered a key step in the mining process, the visualization of association rules has received far less attention than their extraction. Nevertheless, some graphical tools have been developed to extract and visualize association rules. In these tools, various approaches are proposed to filter the huge number of association rules before the visualization step. However, both DM steps (association rule extraction and visualization) are treated separately in a one-way process. Our approach differs in that it uses meta-knowledge to guide the user during the mining process. Standing at the crossroads of DM and Human-Computer Interaction (HCI), we present an integrated framework covering both steps of the DM process. Furthermore, our approach can easily integrate previous techniques for association rule visualization.
International Conference on Tools with Artificial Intelligence | 2012
Slim Bouker; Rabie Saidi; Sadok Ben Yahia; Engelbert Mephu Nguifo
The huge number of association rules is the main obstacle a decision maker faces. To overcome it, an efficient selection of rules has to be performed. Since selection is necessarily based on evaluation, many interestingness measures have been proposed. However, the abundance of these measures gave rise to a new problem, namely the heterogeneity of the evaluation results, which creates confusion for the decision maker. In this respect, we propose a novel approach to discover interesting association rules without favoring or excluding any measure, by adopting the notion of dominance between association rules. Our approach bypasses the problem of measure heterogeneity and finds a compromise between their evaluations. Interestingly enough, the proposed approach also avoids another non-trivial problem: the specification of threshold values.
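The dominance notion corresponds to a Pareto (skyline) filter over the measure values. The sketch below keeps a rule only if no other rule is at least as good on every measure and strictly better on at least one; the rule names and measure values are hypothetical.

```python
# Minimal sketch: selecting association rules by dominance over several
# interestingness measures (a skyline query). Rule names and measure
# values (e.g. confidence, lift, leverage) are hypothetical.

def dominates(a, b):
    """True if score vector `a` dominates `b` (>= everywhere, > somewhere)."""
    return all(x >= y for x, y in zip(a, b)) and \
           any(x > y for x, y in zip(a, b))

def undominated(rules):
    """rules: dict mapping rule id -> tuple of measure values.
    Returns the non-dominated rules; note that no per-measure
    threshold is required."""
    return [
        r for r, scores in rules.items()
        if not any(dominates(other, scores)
                   for o, other in rules.items() if o != r)
    ]

rules = {
    "A=>B": (0.90, 1.8, 0.10),
    "A=>C": (0.85, 2.1, 0.12),
    "B=>C": (0.80, 1.5, 0.05),   # dominated by both rules above
}
print(undominated(rules))        # -> ['A=>B', 'A=>C']
```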
International Journal on Artificial Intelligence Tools | 2014
Slim Bouker; Rabie Saidi; Sadok Ben Yahia; Engelbert Mephu Nguifo
The increasing growth of databases raises an urgent need for more accurate methods to better understand the stored data. In this scope, association rules have been extensively used for the analysis and comprehension of huge amounts of data. However, the number of generated rules is too large to be efficiently analyzed and explored in any further process. To overcome this obstacle, an efficient selection of rules has to be performed. Since selection is necessarily based on evaluation, many interestingness measures have been proposed. However, the abundance of these measures gave rise to a new problem, namely the heterogeneity of the evaluation results, which creates confusion for the decision maker. In this respect, we propose a novel approach to discover interesting association rules without favoring or excluding any measure, by adopting the notion of dominance between association rules. Our approach bypasses the problem of measure heterogeneity and finds a compromise between their evaluations. Interestingly enough, the proposed approach also avoids another non-trivial problem: the specification of threshold values. Extensive experiments carried out on benchmark datasets show the benefits of the introduced approach.
International Journal of Foundations of Computer Science | 2008
Tarek Hamrouni; Sadok Ben Yahia; Engelbert Mephu Nguifo
In data mining applications, very large contexts are handled, which usually results in a considerably large set of frequent itemsets, even for high values of the minimum support threshold. An interesting solution then consists in applying an appropriate closure operator that structures frequent itemsets into equivalence classes, such that two itemsets belong to the same class if they appear in the same sets of objects. Among equivalent itemsets, the minimal elements (w.r.t. the number of items) are called minimal generators (MGs), while their shared closure is called the closed itemset (CI) and is the largest itemset within the corresponding equivalence class. The pairs composed of MGs and their associated CIs thus make it easier to localize each itemset, since it is necessarily encompassed by an MG and a CI. In addition, they offer informative implication/association rules, with minimal premises and maximal conclusions, which losslessly represent the entire rule set. These important concepts (MG and CI) were hence at the origin of various works. Nevertheless, the inherent absence of a unique MG associated with a given CI leads to an intra-class combinatorial redundancy that entails exhaustive storage and impractical use. This motivated an in-depth study towards a lossless reduction of this redundancy. This study was started by Dong et al., who introduced the succinct system of minimal generators (SSMG) as an attempt to eliminate the redundancy within this set. In this paper, we give a thorough study of the SSMG as formerly defined by Dong et al. and show that this system suffers from some flaws. As a remedy, we introduce a new lossless reduction of the MG set that overcomes its limitations. The new SSMG is then incorporated into the framework of generic bases of association rules, making it possible to maintain only succinct and informative rules. After that, we give a thorough formal study of the related inference mechanisms allowing the derivation of all redundant association rules from the maintained ones. Finally, an experimental evaluation shows the utility of our approach in eliminating an important rate of redundant information.
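The two central notions, closed itemsets and minimal generators, are easy to illustrate. The sketch below computes the closure of an itemset over a toy transaction database and tests whether an itemset is a minimal generator, i.e., whether no proper subset has the same closure; the transaction database is hypothetical.

```python
# Minimal sketch of the two notions discussed above: the closure of an
# itemset (yielding its closed itemset, CI) and the test for a minimal
# generator (MG). The toy transaction database is hypothetical.

from itertools import combinations

def closure(itemset, transactions):
    """Closed itemset: items common to all transactions containing `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset()
    return frozenset.intersection(*covering)

def is_minimal_generator(itemset, transactions):
    """True if every proper subset has a strictly different closure."""
    c = closure(itemset, transactions)
    return all(
        closure(frozenset(sub), transactions) != c
        for k in range(len(itemset))
        for sub in combinations(itemset, k)
    )

transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
]
print(sorted(closure(frozenset("bc"), transactions)))        # ['a', 'b', 'c']
print(is_minimal_generator(frozenset("bc"), transactions))   # True
print(is_minimal_generator(frozenset("abc"), transactions))  # False
```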