Jean-François Boulicaut

Publication

Featured research published by Jean-François Boulicaut.

Archive | 2004

Machine Learning, ECML 2004

Jean-François Boulicaut; Floriana Esposito; Fosca Giannotti; Dino Pedreschi

We show how carefully crafted random matrices can achieve distance-preserving dimensionality reduction, accelerate spectral computations, and reduce the sample complexity of certain kernel methods.

Archive | 2004

Knowledge Discovery in Databases: PKDD 2004

Jean-François Boulicaut; Floriana Esposito; Fosca Giannotti; Dino Pedreschi

Genome Biology | 2002

Strong-association-rule mining for large-scale gene-expression data analysis: a case study on human SAGE data

Céline Becquet; Sylvain Blachon; Baptiste Jeudy; Jean-François Boulicaut; Olivier Gandrillon

Background: The association-rules discovery (ARD) technique has yet to be applied to gene-expression data analysis. Even in the absence of previous biological knowledge, it should identify sets of genes whose expression is correlated. The first association-rule miners appeared six years ago and proved efficient at dealing with sparse and weakly correlated data. A huge international research effort has led to new algorithms for tackling difficult contexts and these are particularly suited to analysis of large gene-expression matrices. To validate the ARD technique we have applied it to freely available human serial analysis of gene expression (SAGE) data. Results: The approach described here enables us to designate sets of strong association rules. We normalized the SAGE data before applying our association rule miner. Depending on the discretization algorithm used, different properties of the data were highlighted. Both common and specific interpretations could be made from the extracted rules. In each and every case the extracted collections of rules indicated that a very strong co-regulation of mRNA encoding ribosomal proteins occurs in the dataset. Several rules associating proteins involved in signal transduction were obtained and analyzed, some pointing to yet-unexplored directions. Furthermore, by examining a subset of these rules, we were able both to reassign a wrongly labeled tag, and to propose a function for an expressed sequence tag encoding a protein of unknown function. Conclusions: We show that ARD is a promising technique that turns out to be complementary to existing gene-expression clustering techniques.

European Conference on Principles of Data Mining and Knowledge Discovery | 2000

Approximation of Frequency Queries by Means of Free-Sets

Jean-François Boulicaut; Artur Bykowski; Christophe Rigotti

Given a large collection of transactions containing items, a basic and common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate any itemset support (i.e., the number of transactions containing the itemset), and we formalize this notion in the framework of ε-adequate representation [10]. We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments run on real dense data sets show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable. Finally, we show that the error made when approximating frequent itemset support remains very low in practice.
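The notions of support and free-set in the abstract above can be illustrated with a small brute-force sketch. It assumes the definition commonly used in the free-sets literature (an itemset is δ-free when no rule X ⇒ a with at most δ exceptions holds among its items); the abstract itself does not spell this out. The function names and the toy transactions are hypothetical, and the code is not the pruning-based extraction algorithm of the paper.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def is_delta_free(itemset, transactions, delta=0):
    """Assumed definition: `itemset` is delta-free if it contains no rule
    X => a (X a subset of the itemset, a another item of it) that has at
    most `delta` exceptions, i.e. support(X) - support(X + a) <= delta."""
    items = frozenset(itemset)
    for a in items:
        rest = sorted(items - {a})
        for k in range(len(rest) + 1):
            for x in combinations(rest, k):
                exceptions = (support(x, transactions)
                              - support(set(x) | {a}, transactions))
                if exceptions <= delta:
                    return False
    return True


# Hypothetical toy data: every transaction containing "b" also contains "a".
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"c"}),
    frozenset({"a", "b", "c"}),
]

print(support({"a", "b"}, transactions))        # 3
print(is_delta_free({"a", "c"}, transactions))  # True
print(is_delta_free({"a", "b"}, transactions))  # False, since b => a holds exactly
```

In this toy data `{"a", "b"}` is not free, and its support equals the support of `{"b"}`, which hints at why the supports of non-free itemsets can be derived or approximated from the free ones.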

ACM Transactions on Knowledge Discovery From Data | 2009

Closed patterns meet n-ary relations

Loïc Cerf; Jérémy Besson; Céline Robardet; Jean-François Boulicaut

Set pattern discovery from binary relations has been extensively studied during the last decade. In particular, many complete and efficient algorithms for frequent closed set mining are now available. Generalizing such a task to n-ary relations (n ≥ 2) appears as a timely challenge. It may be important for many applications, for example, when adding the time dimension to the popular objects × features binary case. The generality of the task (no assumption being made on the relation arity or on the size of its attribute domains) makes it computationally challenging. We introduce an algorithm called Data-Peeler. From an n-ary relation, it extracts all closed n-sets satisfying given piecewise (anti)monotonic constraints. This new class of constraints generalizes both monotonic and antimonotonic constraints. Considering the special case of ternary relations, Data-Peeler outperforms the state-of-the-art algorithms CubeMiner and Trias by orders of magnitude. This good performance is due to a new, clever enumeration strategy that efficiently enforces the closedness property. The relevance of the extracted closed n-sets is assessed on real-life 3- and 4-ary relations. Beyond natural 3- or 4-ary relations, expanding a relation with an additional attribute can help in enforcing rather abstract constraints such as the robustness with respect to binarization. Furthermore, a collection of closed n-sets is shown to be an excellent starting point to compute a tiling of the dataset.
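To make the notion of a closed n-set concrete, here is a minimal checker, assuming the usual reading: a pattern is a tuple of n sets whose Cartesian product lies entirely inside the relation, and it is closed when no single component can be enlarged without leaving the relation. This only tests a given pattern by brute force; it is not Data-Peeler and says nothing about its enumeration strategy. The function names and the ternary toy relation are hypothetical.

```python
from itertools import product


def is_n_set(pattern, relation):
    """True if every tuple of the pattern's Cartesian product is in `relation`."""
    return all(tup in relation for tup in product(*pattern))


def is_closed_n_set(pattern, relation):
    """True if `pattern` is an n-set and no component can be extended
    by one more element while staying inside `relation`."""
    if not is_n_set(pattern, relation):
        return False
    arity = len(pattern)
    # Assumption: each attribute domain is the set of values seen in the relation.
    domains = [{tup[i] for tup in relation} for i in range(arity)]
    for i in range(arity):
        for extra in domains[i] - set(pattern[i]):
            extended = list(pattern)
            extended[i] = set(pattern[i]) | {extra}
            if is_n_set(extended, relation):
                return False
    return True


# Hypothetical ternary relation: objects x features x time stamps.
relation = {
    ("o1", "f1", "t1"), ("o1", "f2", "t1"),
    ("o2", "f1", "t1"), ("o2", "f2", "t1"),
    ("o1", "f1", "t2"), ("o2", "f1", "t2"),
}

print(is_closed_n_set(({"o1", "o2"}, {"f1"}, {"t1", "t2"}), relation))  # True
print(is_closed_n_set(({"o1"}, {"f1"}, {"t1"}), relation))              # False, "o2" can be added
```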

Lecture Notes in Computer Science | 2006

Knowledge Discovery in Inductive Databases

Francesco Bonchi; Jean-François Boulicaut

Invited Papers.- Data Mining in Inductive Databases.- Mining Databases and Data Streams with Query Languages and Rules.- Contributed Papers.- Memory-Aware Frequent k-Itemset Mining.- Constraint-Based Mining of Fault-Tolerant Patterns from Boolean Data.- Experiment Databases: A Novel Methodology for Experimental Research.- Quick Inclusion-Exclusion.- Towards Mining Frequent Queries in Star Schemes.- Inductive Databases in the Relational Model: The Data as the Bridge.- Transaction Databases, Frequent Itemsets, and Their Condensed Representations.- Multi-class Correlated Pattern Mining.- Shaping SQL-Based Frequent Pattern Mining Algorithms.- Exploiting Virtual Patterns for Automatically Pruning the Search Space.- Constraint Based Induction of Multi-objective Regression Trees.- Learning Predictive Clustering Rules.

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2000

Frequent Closures as a Concise Representation for Binary Data Mining

Jean-François Boulicaut; Artur Bykowski

Frequent set discovery from binary data is an important problem in data mining. It concerns the discovery of a concise representation of large tables from which descriptive rules can be derived, e.g., the popular association rules. Our work concerns the study of two representations, namely frequent sets and frequent closures. N. Pasquier and colleagues designed the Close algorithm, which provides frequent sets via the discovery of frequent closures. When one mines highly correlated data, Apriori-based algorithms clearly fail while Close remains tractable. We discuss our implementation of Close and the experimental evidence we obtained from two real-life binary data mining processes. Then, we introduce the concept of almost-closure (generation of every frequent set from frequent almost-closures remains possible, but with a bounded error on frequency). To the best of our knowledge, this is a new concept and, here again, we provide some experimental evidence of its added value.
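The idea of a closure can be sketched in a few lines, assuming the standard definition: the closure of an itemset is the set of items shared by all transactions that contain it. The brute-force enumeration below is only an illustration on hypothetical toy data; it is not the Close algorithm, and it does not cover almost-closures.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Items common to every transaction that contains `itemset`."""
    items = frozenset(itemset)
    covering = [t for t in transactions if items <= t]
    if not covering:
        # Convention assumed here: an unsupported set closes to all items.
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


def frequent_closed_sets(transactions, min_support):
    """Brute force: map each frequent closed set to its support."""
    universe = sorted(frozenset().union(*transactions))
    result = {}
    for k in range(len(universe) + 1):
        for candidate in combinations(universe, k):
            count = sum(1 for t in transactions if frozenset(candidate) <= t)
            if count >= min_support:
                # A set and its closure are contained in the same transactions.
                result[closure(candidate, transactions)] = count
    return result


# Hypothetical toy data: every transaction containing "b" also contains "a".
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"c"}),
    frozenset({"a", "b", "c"}),
]

print(closure({"b"}, transactions) == frozenset({"a", "b"}))  # True
print(len(frequent_closed_sets(transactions, 3)))             # 5
```

With a minimum support of 3, this toy data has six frequent sets but only five frequent closed sets, because `{"b"}` and `{"a", "b"}` share the same closure. On highly correlated data the gap is typically much larger, which is the motivation the abstract gives for working with closures.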

International Conference on Entity-Relationship Approach | 1994

Using Queries to Improve Database Reverse Engineering

Jean-Marc Petit; Jacques Kouloumdjian; Jean-François Boulicaut; Farouk Toumani

This paper describes a technique that supports Extended Entity-Relationship (EER) schema extraction from an operating relational database. In this reverse engineering context, the two major decisions that have to be taken are the assumptions on the initial schema and where the data semantics are extracted from. Original aspects of our method are manifold. First, it is based on realistic assumptions, e.g., there are no constraints on the uniqueness of the attribute names. Second, the dependencies between the attributes are not supposed to be known a priori. The method starts from the database schema as stored in the DBMS dictionary, i.e., the relation names, the attribute names and their basic characteristics (uniqueness of values, not-null values). Finally, semantics extraction is supported by the analysis of available queries. It is shown how specific kinds of queries can help to build an EER schema including is-a relationships and aggregates.

Archive | 2006

Constraint-based mining and inductive databases

Jean-François Boulicaut; Luc De Raedt; Heikki Mannila

The Hows, Whys, and Whens of Constraints in Itemset and Rule Discovery.- A Relational Query Primitive for Constraint-Based Pattern Mining.- To See the Wood for the Trees: Mining Frequent Tree Patterns.- A Survey on Condensed Representations for Frequent Sets.- Adaptive Strategies for Mining the Positive Border of Interesting Patterns: Application to Inclusion Dependencies in Databases.- Computation of Mining Queries: An Algebraic Approach.- Inductive Queries on Polynomial Equations.- Mining Constrained Graphs: The Case of Workflow Systems.- CrossMine: Efficient Classification Across Multiple Database Relations.- Remarks on the Industrial Application of Inductive Database Technologies.- How to Quickly Find a Witness.- Relevancy in Constraint-Based Subgroup Discovery.- A Novel Incremental Approach to Association Rules Mining in Inductive Databases.- Employing Inductive Databases in Concrete Applications.- Contribution to Gene Expression Data Analysis by Means of Set Pattern Mining.- Boolean Formulas and Frequent Sets.- Generic Pattern Mining Via Data Mining Template Library.- Inductive Querying for Discovering Subgroups and Clusters.

Data Mining and Knowledge Discovery | 2005

Constraint-based Data Mining

Jean-François Boulicaut; Baptiste Jeudy

Knowledge Discovery in Databases (KDD) is a complex interactive process. The promising theoretical framework of inductive databases considers that this is essentially a querying process. It is enabled by a query language which can deal either with raw data or with patterns which hold in the data. Mining patterns turns out to be the so-called inductive query evaluation process, for which constraint-based Data Mining techniques have to be designed. An inductive query specifies declaratively the desired constraints, and algorithms are used to compute the patterns satisfying the constraints in the data. We survey important results of this active research domain. This chapter emphasizes a real breakthrough for hard problems concerning local pattern mining under various constraints, and it points out the current directions of research as well.
