Cyrille Masson

Institut national des sciences Appliquées de Lyon

Publication

Featured research published by Cyrille Masson.

Data Mining and Knowledge Discovery | 2005

Data Mining Query Languages

Jean-François Boulicaut; Cyrille Masson

Many data mining algorithms can extract different types of patterns from data (e.g., local patterns like itemsets and association rules, models like classifiers). To support the whole knowledge discovery process, we need integrated systems that can deal with both patterns and data. The inductive database approach has emerged as a unifying framework for such systems. Following this database perspective, knowledge discovery processes become querying processes for which query languages have to be designed. In the prolific field of association rule mining, different query languages have been proposed to support the more or less declarative specification of both data and pattern manipulations. In this chapter, we survey some of these proposals. The survey lets us identify current shortcomings and point out some promising directions of research in this area.

Lecture Notes in Computer Science | 2004

Query languages supporting descriptive rule mining: A comparative study

Marco Botta; Jean-François Boulicaut; Cyrille Masson; Rosa Meo

Recently, inductive databases (IDBs) have been proposed to tackle the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a query language powerful enough to support all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing. We provide a comparison between three query languages (MSQL, DMQL and MINE RULE) that have been proposed for descriptive rule mining and discuss their common features and differences. These query languages look like extensions of SQL. We present them using a set of examples taken from the real practice of rule mining. We also discuss OLE DB for Data Mining and the Predictive Model Markup Language, two recent proposals that, like the first three query languages, respectively provide native support for data mining primitives and a description of statistical and data mining models in a standard language.

Data Warehousing and Knowledge Discovery | 2002

A Comparison between Query Languages for the Extraction of Association Rules

Marco Botta; Jean-François Boulicaut; Cyrille Masson; Rosa Meo

Recently, inductive databases (IDBs) have been proposed to tackle the problem of knowledge discovery from huge databases. With an IDB, the user/analyst performs a set of very different operations on data using a special-purpose language powerful enough to perform all the required manipulations, such as data preprocessing, pattern discovery and pattern post-processing. In this paper, we present a comparison between query languages (MSQL, DMQL and MINE RULE) that have been proposed for association rule extraction in recent years and discuss their common features and differences. We present them using a set of examples taken from the real practice of data mining. This allows us to define language design guidelines, with particular attention to the open issues on IDBs.

Intelligent Data Engineering and Automated Learning | 2002

Mining Frequent Sequential Patterns under a Similarity Constraint

Matthieu Capelle; Cyrille Masson; Jean-François Boulicaut

Many practical applications are related to frequent sequential pattern mining, ranging from Web usage mining to bioinformatics. To ensure an appropriate extraction cost for useful mining tasks, a key issue is to push the user-defined constraints deep inside the mining algorithms. In this paper, we study the search for frequent sequential patterns that are also similar to a user-defined reference pattern. While the effective processing of frequency constraints is well understood, our contribution concerns the identification of a relaxation of the similarity constraint into a convertible anti-monotone constraint. Both constraints are then used to prune the search space during a levelwise search. Preliminary experimental validations have confirmed the algorithm's efficiency.
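
The pruning idea in this abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper: the similarity measure (position-wise mismatches against the reference), the function names, and the parameters `min_support` and `max_mismatches` are all assumptions chosen for illustration. The stand-in constraint has the one property the sketch relies on: once a pattern violates it, every extension of that pattern violates it too, so the whole branch can be dropped during the levelwise search.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def mismatches(pattern: Sequence[str], reference: Sequence[str]) -> int:
    """Hypothetical similarity measure: position-wise mismatches, with
    every position beyond the end of the reference counted as a mismatch.
    The count never decreases when the pattern is extended on the right."""
    differing = sum(1 for p, r in zip(pattern, reference) if p != r)
    return differing + max(0, len(pattern) - len(reference))


def mine(database: List[Sequence[str]], reference: Sequence[str],
         min_support: int, max_mismatches: int) -> List[Pattern]:
    """Levelwise search: extend surviving patterns by one item per level,
    pruning with both the similarity and the frequency constraint."""
    alphabet = sorted({item for seq in database for item in seq})
    level: List[Pattern] = [()]
    results: List[Pattern] = []
    while level:
        next_level: List[Pattern] = []
        for pattern in level:
            for item in alphabet:
                candidate = pattern + (item,)
                # Cheap check first: no database scan needed.
                if mismatches(candidate, reference) > max_mismatches:
                    continue
                if support(candidate, database) < min_support:
                    continue
                next_level.append(candidate)
        results.extend(next_level)
        level = next_level
    return results
```

Checking the similarity constraint before counting support avoids a database scan for candidates that could never qualify, which is the practical benefit of pushing constraints into the search.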

Inductive Logic Programming | 2002

Mining frequent logical sequences with SPIRIT-LoG

Cyrille Masson; François Jacquenet

Sequence mining is an active research field of data mining because algorithms designed in that domain lead to various valuable applications. To increase the efficiency of basic sequence mining algorithms, which are generally based on a levelwise approach, more recent algorithms introduce constraints to prune the search space during the discovery process. Nevertheless, existing algorithms are currently limited to extracting frequent sequences made up of items of a database. In this paper, we generalize the notion of sequence to define what we call a logical sequence, where each element of a sequence may contain some logical variables. Then we show how we can extend constrained sequence mining to constrained frequent logical sequence mining.

Data Warehousing and Knowledge Discovery | 2003

Comprehensive Log Compression with Frequent Patterns

Kimmo Hätönen; Jean-François Boulicaut; Mika Klemettinen; Markus Miettinen; Cyrille Masson

In this paper, we present a comprehensive log compression (CLC) method that uses frequent patterns and their condensed representations to identify repetitive information in large log files generated by communications networks. We also show how the identified information can be used to separate and filter out frequently occurring events that hide other events that are unique or occur only a few times. The identification can be done without any prior knowledge about the domain or the events. For example, no pre-defined patterns or value combinations are needed. This separation makes it easier for a human observer to perceive and analyse large amounts of log data. The applicability of the CLC method is demonstrated with real-world examples from data communication networks.
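
As a loose illustration of the separation described here, and not the CLC method itself, the sketch below treats each log entry as a set of hypothetical `field=value` tokens, finds frequent closed sets by brute-force enumeration, and splits the log into entries covered by such a pattern and a residual of rare entries. The function names, the token format, and the `min_support` and `min_size` parameters are assumptions, and the enumeration is only practical for entries with a handful of fields.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Entry = FrozenSet[str]


def frequent_closed_sets(entries: List[Entry],
                         min_support: int) -> Dict[Entry, int]:
    """Count every non-empty subset of every entry, keep the frequent
    ones, then keep only closed sets (no proper superset has the same
    support)."""
    counts: Counter = Counter()
    for entry in entries:
        items = sorted(entry)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[frozenset(subset)] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


def split_log(entries: List[Entry], min_support: int,
              min_size: int = 2) -> Tuple[List[Entry], List[Entry], List[Entry]]:
    """Return (patterns, summarized, residual): entries containing a
    sufficiently large frequent closed set are summarized by it, the
    rest are left for a human to inspect."""
    patterns = [p for p in frequent_closed_sets(entries, min_support)
                if len(p) >= min_size]
    summarized: List[Entry] = []
    residual: List[Entry] = []
    for entry in entries:
        if any(p <= entry for p in patterns):
            summarized.append(entry)
        else:
            residual.append(entry)
    return patterns, summarized, residual
```

No domain knowledge goes into the split: the patterns come only from how often value combinations repeat, which mirrors the claim that no pre-defined patterns are needed.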

ACM Symposium on Applied Computing | 2004

Optimizing subset queries: a step towards SQL-based inductive databases for itemsets

Cyrille Masson; Céline Robardet; Jean-François Boulicaut

Storing sets and querying them (e.g., subset queries that provide all supersets of a given set) is known to be difficult within relational databases. We consider that being able to efficiently query both transactional data and materialized collections of sets by means of a standard query language is an important step towards practical inductive databases. Indeed, data mining query languages like MINE RULE extract collections of association rules, whose components are sets, into relational tables. Post-processing phases often make extensive use of subset queries, which SQL servers cannot process efficiently. In this paper, we propose a new way to handle sets in relational databases. It is based on a data structure that partially encodes the inclusion relationship between sets. It is an extension of the hash group bitmap key proposed by Morzy et al. [8]. Our experiments show an interesting improvement for these useful subset queries.
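
The general idea of a bitmap key that partially encodes set inclusion can be sketched as follows. This is a generic signature filter written under stated assumptions, not the extension proposed in the paper nor the exact structure of [8]: the hash function (`zlib.crc32`), the signature width `n_bits`, and all function names are hypothetical choices. Each item sets one bit of the key, so a stored set can only be a superset of the query if its key contains every bit of the query's key. Collisions can produce false positives, which is why an exact check follows the bitmap test.

```python
import zlib
from typing import FrozenSet, Iterable, List, Tuple

IndexedSet = Tuple[int, FrozenSet[str]]


def signature(items: Iterable[str], n_bits: int = 32) -> int:
    """Bitmap key: each item sets one bit chosen by a deterministic hash."""
    sig = 0
    for item in items:
        sig |= 1 << (zlib.crc32(item.encode("utf-8")) % n_bits)
    return sig


def build_index(sets: Iterable[Iterable[str]],
                n_bits: int = 32) -> List[IndexedSet]:
    """Store each set next to its precomputed bitmap key."""
    return [(signature(s, n_bits), frozenset(s)) for s in sets]


def supersets_of(query: Iterable[str], index: List[IndexedSet],
                 n_bits: int = 32) -> List[FrozenSet[str]]:
    """Subset query: return every stored set that contains the query."""
    q = frozenset(query)
    q_sig = signature(q, n_bits)
    hits: List[FrozenSet[str]] = []
    for sig, stored in index:
        # Necessary condition: all query bits are present in the key.
        if (sig & q_sig) != q_sig:
            continue
        # Exact check removes false positives caused by hash collisions.
        if q <= stored:
            hits.append(stored)
    return hits
```

In a relational setting, the key would typically live in an integer column so that the bitwise test can run before any join on the item table. That placement is an assumption of this sketch rather than a detail taken from the abstract.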

Archive | 2003