Fabrice Guillet
University of Nantes
Publication
Featured research published by Fabrice Guillet.
Archive | 2009
Fabrice Guillet
Data mining analyzes large amounts of data to discover knowledge relevant to decision making. Typically, numerous pieces of knowledge are extracted by a data mining system and presented to a human user, who may be a decision-maker or a data-analyst. The user is confronted with the task of selecting the pieces of knowledge that are of the highest quality or interest according to his or her preferences. Since this selection is sometimes a daunting task, designing quality and interestingness measures has become an important challenge for data mining researchers in the last decade. This volume presents the state of the art concerning quality and interestingness measures for data mining. The book summarizes recent developments and presents original research on this topic. The chapters include surveys, comparative studies of existing measures, proposals of new measures, simulations, and case studies. Both theoretical and applied chapters are included. Papers for this book were selected and reviewed for correctness and completeness by an international review committee.
IEEE Transactions on Knowledge and Data Engineering | 2010
Claudia Marinica; Fabrice Guillet
In data mining, the usefulness of association rules is strongly limited by the huge number of rules delivered. To overcome this drawback, several methods have been proposed in the literature, such as concise itemset representations, redundancy reduction, and postprocessing. However, being generally based on statistical information, most of these methods do not guarantee that the extracted rules are interesting for the user. Thus, it is crucial to help the decision-maker with an efficient postprocessing step in order to reduce the number of rules. This paper proposes a new interactive approach to prune and filter discovered rules. First, we propose to use ontologies in order to improve the integration of user knowledge in the postprocessing task. Second, we propose the Rule Schema formalism, extending the specification language proposed by Liu et al. for user expectations. Furthermore, an interactive framework is designed to assist the user throughout the analysis task. Applying our new approach to voluminous sets of rules, we were able, by integrating domain expert knowledge in the postprocessing step, to reduce the number of rules to several dozen or fewer. Moreover, the quality of the filtered rules was validated by the domain expert at various points in the interactive process.
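As a rough illustration of filtering rules against user expectations expressed over an ontology, the sketch below keeps only the rules that match a schema of concepts. It is a simplified stand-in, not the Rule Schema formalism of the paper: the toy ontology, the item names, and the functions `side_conforms`, `conforms` and `filter_rules` are all hypothetical.

```python
# Hypothetical toy ontology: each concept maps to the items it covers.
ONTOLOGY = {
    "Fruit": {"apple", "pear"},
    "Dairy": {"milk", "cheese"},
}


def side_conforms(items, concepts, ontology):
    """True if every concept is matched by at least one item on this side."""
    item_set = set(items)
    return all(ontology[concept] & item_set for concept in concepts)


def conforms(rule, schema, ontology):
    """Check one rule (antecedent, consequent) against a schema of concepts."""
    (antecedent, consequent) = rule
    (schema_antecedent, schema_consequent) = schema
    return (side_conforms(antecedent, schema_antecedent, ontology)
            and side_conforms(consequent, schema_consequent, ontology))


def filter_rules(rules, schema, ontology):
    """Keep only the rules that conform to the schema (assumed semantics)."""
    return [rule for rule in rules if conforms(rule, schema, ontology)]


# Invented rules, written as (antecedent items, consequent items).
RULES = [
    (("apple", "bread"), ("milk",)),
    (("bread",), ("cheese",)),
    (("pear",), ("beer",)),
]

# Schema "Fruit -> Dairy": the user expects fruit purchases to imply dairy.
kept = filter_rules(RULES, (("Fruit",), ("Dairy",)), ONTOLOGY)
```

An empty side in the schema places no constraint on that side of the rule, which is a design choice of this sketch rather than something stated in the abstract.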
Conference on Information and Knowledge Management | 2006
Jérôme David; Fabrice Guillet; Henri Briand
This paper presents a simple and adaptable matching method for web directories, catalogs and OWL ontologies. By using a well-known Knowledge Discovery in Databases model, the association rule paradigm, this method is original in being both extensional and asymmetric. It works at the terminological level (by selecting concept-relevant terms contained in documents) and makes it possible to discover both equivalence and subsumption relations holding between entities (concepts and properties). The method relies on the implication intensity measure, a probabilistic model of deviation from independence. The selection of significant rules between concepts (or properties) is guided by two criteria that assess, respectively, the implication quality and the generativity of the rule. Finally, the proposed method is evaluated on two benchmarks. The first contains two conceptual hierarchies populated with textual documents, and the second is composed of OWL ontologies.
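To make "deviation from independence" concrete, here is a minimal sketch of implication intensity under one commonly used formulation, the Poisson model, in which the number of counterexamples to a rule a -> b is compared with what independence would produce. The abstract does not say which variant the paper uses, so the formula choice and the function name `implication_intensity` are assumptions.

```python
import math


def implication_intensity(n, n_a, n_b, n_a_not_b):
    """Poisson-model implication intensity of the rule a -> b (assumed variant).

    n          total number of examples
    n_a, n_b   number of examples containing a, respectively b
    n_a_not_b  observed counterexamples (a present, b absent)
    """
    # Expected number of counterexamples if a and b were independent.
    lam = n_a * (n - n_b) / n
    # Cumulative probability P(N <= n_a_not_b) for N ~ Poisson(lam).
    term = math.exp(-lam)
    cdf = term
    for k in range(1, n_a_not_b + 1):
        term *= lam / k
        cdf += term
    # Few counterexamples relative to lam give an intensity close to 1.
    return max(0.0, 1.0 - cdf)


strong = implication_intensity(n=100, n_a=40, n_b=50, n_a_not_b=5)
weak = implication_intensity(n=100, n_a=40, n_b=50, n_a_not_b=20)
```

The measure is asymmetric: swapping the roles of a and b changes the expected number of counterexamples, so a -> b and b -> a generally receive different values, which is what allows subsumption to be told apart from equivalence.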
Knowledge and Information Systems | 2007
Julien Blanchard; Fabrice Guillet; Henri Briand
On account of the enormous number of rules that can be produced by data mining algorithms, knowledge post-processing is a difficult stage in an association rule discovery process. In order to find relevant knowledge for decision making, the user (a decision maker specialized in the data studied) needs to rummage through the rules. To assist him/her in this task, we propose the rule-focusing methodology, an interactive methodology for the visual post-processing of association rules. It allows the user to explore large sets of rules freely by focusing his/her attention on limited subsets. This new approach relies on rule interestingness measures, on a visual representation, and on interactive navigation among the rules. We have implemented the rule-focusing methodology in a prototype system called ARVis. It exploits the user's focus to guide the generation of the rules by means of a specific constraint-based rule-mining algorithm.
European Conference on Principles of Data Mining and Knowledge Discovery | 2000
Pascale Kuntz; Fabrice Guillet; Rémi Lehn; Henri Briand
This paper describes the components of a human-centered process for discovering association rules, in which the user is considered as a heuristic that drives the mining algorithms via a well-adapted interface. In this approach, inspired by experimental work on behaviors during a discovery stage, the rule extraction is dynamic: at each step, the user can focus on a subset of potentially interesting items and launch an algorithm for extracting the relevant associated rules according to statistical measures. The discovered rules are represented by a graph updated at each step, and the mining algorithm is an adaptation of the well-known Apriori algorithm in which rules are computed locally. Experimental results on a real corpus built from marketing data illustrate the different steps of this process.
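The idea of computing rules locally, around the items the user is focusing on, can be sketched as follows. This is a brute-force enumeration restricted to the focus set, assuming support and confidence as the statistical measures; it is not the adapted Apriori algorithm of the paper, and the function names, thresholds and transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions (sets of items) containing every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def local_rules(transactions, focus, min_support=0.3, min_confidence=0.6):
    """Enumerate rules X -> y whose items all belong to the user's focus."""
    rules = []
    focus = sorted(focus)
    for size in range(2, len(focus) + 1):
        for itemset in combinations(focus, size):
            supp = support(itemset, transactions)
            if supp == 0 or supp < min_support:
                continue  # infrequent itemsets yield no rules
            for y in itemset:
                x = tuple(i for i in itemset if i != y)
                conf = supp / support(x, transactions)
                if conf >= min_confidence:
                    rules.append((x, y, supp, conf))
    return rules


# Invented market-basket data; each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "d"},
]

# One interaction step: the user focuses on items "a" and "b" only.
step = local_rules(TRANSACTIONS, focus={"a", "b"})
```

Each new focus chosen by the user triggers another call on a small item subset, so the full rule set is never materialized; a real implementation would also prune candidates level by level as Apriori does.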
Quality Measures in Data Mining | 2007
Xuan-Hiep Huynh; Fabrice Guillet; Julien Blanchard; Pascale Kuntz; Henri Briand; Régis Gras
Finding interestingness measures to evaluate association rules has become an important knowledge quality issue in KDD. Many interestingness measures may be found in the literature, and many authors have discussed and compared interestingness properties in order to improve the choice of the most suitable measures for a given application. As interestingness depends both on the data structure and on the decision-maker's goals, some measures may be relevant in some contexts but not in others. Therefore, it is necessary to design new contextual approaches in order to help the decision-maker select the most suitable interestingness measures. In this paper, we present a new approach implemented by a new tool, ARQAT, for making comparisons. The approach is based on the analysis of a correlation graph presenting the clustering of objective interestingness measures and reflecting the post-processing of association rules. This graph-based clustering approach is used to compare and discuss the behavior of thirty-six interestingness measures on two prototypical and opposite datasets: a highly correlated one and a lowly correlated one. We focus on the discovery of the stable clusters obtained from the data analyzed between these thirty-six measures.
Discovery Science | 2005
Xuan-Hiep Huynh; Fabrice Guillet; Henri Briand
In recent years, the problem of finding the different aspects existing in a dataset has attracted many authors in the domain of knowledge quality in KDD. The discovery of knowledge in the form of association rules has become an important research topic. One of the most difficult issues is that an enormous number of association rules are discovered, so it is not easy to choose the best association rules or knowledge for a given dataset. Some methods have been proposed for choosing the best rules with an interestingness measure, or for matching properties of interestingness measures for a given set of interestingness measures. In this paper, we propose a new approach to discover the clusters of interestingness measures existing in a dataset. Our approach is based on the evaluation of the distance computed between interestingness measures. We use two techniques, agglomerative hierarchical clustering (AHC) and partitioning around medoids (PAM), to help the user graphically evaluate the behavior of interestingness measures.
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems | 2006
Xuan-Hiep Huynh; Fabrice Guillet; Henri Briand
Making comparisons from the post-processing of association rules has become a research challenge in data mining. By evaluating interestingness values calculated from interestingness measures on association rules, a new approach based on Pearson’s correlation coefficient is proposed to answer the question: how can we capture the stable behaviors of interestingness measures on different datasets? In this paper, a correlation graph is used to evaluate the behavior of 36 interestingness measures on two datasets.
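A correlation graph of this kind can be sketched in a few lines: compute Pearson's correlation between the value vectors of every pair of measures, link the pairs whose absolute correlation reaches a threshold, and read clusters off the connected components. The threshold of 0.85, the measure names `m1` to `m3` and their values are illustrative assumptions, not figures from the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant measure is treated as uncorrelated
    return cov / (norm_x * norm_y)


def correlation_graph(values, threshold=0.85):
    """Edges between measures whose |correlation| reaches the threshold."""
    return {
        (m1, m2)
        for m1, m2 in combinations(sorted(values), 2)
        if abs(pearson(values[m1], values[m2])) >= threshold
    }


def clusters(nodes, edges):
    """Connected components of the graph, as a list of sets of measures."""
    adjacent = {node: set() for node in nodes}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, components = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(adjacent[current] - component)
        seen |= component
        components.append(component)
    return components


# Invented values: one list per measure, one entry per association rule.
VALUES = {"m1": [1, 2, 3, 4], "m2": [2, 4, 6, 8], "m3": [1, -1, 1, -1]}
EDGES = correlation_graph(VALUES)
GROUPS = clusters(VALUES, EDGES)
```

Running the same procedure on several datasets and intersecting the resulting components is one way to look for clusters that stay stable across data, under the assumptions above.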
Computational Intelligence and Data Mining | 2009
Andrei Olaru; Claudia Marinica; Fabrice Guillet
One of the central problems in Knowledge Discovery in Databases, more precisely in the field of Association Rule Mining, lies in the very large number of rules that classic rule mining systems extract. This problem is usually addressed by means of a post-processing step that filters the entire volume of extracted rules in order to output only a few potentially interesting ones. This article presents a new approach that allows the user to explore the rule space locally and incrementally, without the need to extract and post-process all rules in the database. This solution is based on Rule Schemas, a new formalism designed to improve the representation of user beliefs and expectations, and on a novel algorithm for local association rule mining starting from Schemas. The proposed algorithm has been successfully tested on the database provided by Nantes Habitat.
Int. Federation of Classification Societies, IFCS'2004 | 2004
Raphaël Couturier; Régis Gras; Fabrice Guillet
The interpretation of complex data with data mining techniques is often a difficult task. Nevertheless, this task may be simplified by reducing variables that can be considered equivalent. The aim of this paper is to describe a new method for reducing the number of variables in a large set of data. Implicative analysis, which builds association rules with a measure more powerful than conditional probability, is used to detect quasi-equivalent variables. This technique has some advantages over traditional similarity analysis.