Artur Bykowski

Institut National des Sciences Appliquées de Lyon

Publications

Featured research published by Artur Bykowski.

Symposium on Principles of Database Systems | 2001

A condensed representation to find frequent patterns

Artur Bykowski; Christophe Rigotti

Given a large set of data, a common data mining problem is to extract the frequent patterns occurring in this set. The idea presented in this paper is to extract a condensed representation of the frequent patterns called disjunction-free sets, instead of extracting the whole frequent pattern collection. We show that this condensed representation can be used to regenerate all frequent patterns and their exact frequencies. Moreover, this regeneration can be performed without any access to the original data. Practical experiments show that this representation can be extracted very efficiently even in difficult cases. We compared it with another representation of frequent patterns previously investigated in the literature called frequent closed sets. In nearly all experiments we have run, the disjunction-free sets have been extracted much more efficiently than frequent closed sets.
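The claim that exact frequencies can be regenerated without the original data rests on rules with a disjunction in the head. The sketch below is not the paper's algorithm; it only illustrates the underlying inclusion-exclusion identity, assuming a disjunctive rule of the form "every transaction containing Y contains a or b". The toy data and all function names are hypothetical.

```python
# Toy transactions (hypothetical data): every transaction containing "x"
# also contains "a" or "b", so the rule {x} => a OR b holds.
TRANSACTIONS = [
    {"x", "a"},
    {"x", "b"},
    {"x", "a", "b"},
    {"x", "a", "b"},
    {"c"},
]

def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def disjunctive_rule_holds(base, a, b, transactions):
    """True if every transaction containing `base` also contains a or b."""
    base = set(base)
    return all(a in t or b in t for t in transactions if base <= t)

def inferred_support(base, a, b, transactions):
    """Support of base | {a, b} derived from three smaller supports.

    Only valid when the rule base => a OR b holds (inclusion-exclusion).
    """
    base = set(base)
    return (support(base | {a}, transactions)
            + support(base | {b}, transactions)
            - support(base, transactions))

if __name__ == "__main__":
    print(disjunctive_rule_holds({"x"}, "a", "b", TRANSACTIONS))
    print(inferred_support({"x"}, "a", "b", TRANSACTIONS),
          support({"x", "a", "b"}, TRANSACTIONS))
```

In a real condensed representation the three smaller supports would come from the stored collection rather than from a scan of the data; here they are recomputed only to keep the example self-contained.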

European Conference on Principles of Data Mining and Knowledge Discovery | 2000

Approximation of Frequency Queries by Means of Free-Sets

Jean-François Boulicaut; Artur Bykowski; Christophe Rigotti

Given a large collection of transactions containing items, a basic common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate any itemset support (i.e., the number of transactions containing the itemset), and we formalize this notion in the framework of ε-adequate representation [10]. We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments run on real dense data sets show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable. Finally, we show that the error made when approximating frequent itemset support remains very low in practice.
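The two definitions used in this abstract (itemset support and frequent itemsets) can be made concrete with a small sketch. It is a brute-force enumeration on hypothetical toy data, not the pruning-based extraction the paper describes, and the function names are illustrative.

```python
from itertools import combinations

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def frequent_itemsets(transactions, min_support):
    """All non-empty itemsets whose support is at least `min_support`.

    Brute force over every subset of the item universe, so it is only
    usable on tiny examples.
    """
    universe = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            count = support(candidate, transactions)
            if count >= min_support:
                result[frozenset(candidate)] = count
    return result

if __name__ == "__main__":
    for itemset, count in frequent_itemsets(TRANSACTIONS, 3).items():
        print(sorted(itemset), count)
```

The exponential number of candidates in this naive version is the reason condensed structures such as free-sets are of interest on dense data.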

Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2000

Frequent Closures as a Concise Representation for Binary Data Mining

Jean-François Boulicaut; Artur Bykowski

Frequent set discovery from binary data is an important problem in data mining. It concerns the discovery of a concise representation of large tables from which descriptive rules can be derived, e.g., the popular association rules. Our work concerns the study of two representations, namely frequent sets and frequent closures. N. Pasquier and colleagues designed the Close algorithm, which provides frequent sets via the discovery of frequent closures. When one mines highly correlated data, Apriori-based algorithms clearly fail while Close remains tractable. We discuss our implementation of Close and the experimental evidence we obtained from two real-life binary data mining processes. Then, we introduce the concept of almost-closure (generation of every frequent set from frequent almost-closures remains possible but with a bounded error on frequency). To the best of our knowledge, this is a new concept and, here again, we provide some experimental evidence of its added value.
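As a rough illustration of what a closure is, the sketch below assumes the usual definition: the closure of an itemset is the intersection of all transactions that contain it, so a set and its closure always have the same support. This is not the Close algorithm itself; the data and function names are hypothetical.

```python
# Toy binary data (hypothetical): each transaction is the set of true attributes.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
]

def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)

def closure(itemset, transactions):
    """Intersection of all transactions containing `itemset`.

    Returns None when no transaction contains the itemset, since the
    convention for that case varies.
    """
    items = set(itemset)
    covering = [t for t in transactions if items <= t]
    if not covering:
        return None
    return set(covering[0]).intersection(*covering[1:])

def is_closed(itemset, transactions):
    """True if the itemset equals its own closure."""
    return closure(itemset, transactions) == set(itemset)

if __name__ == "__main__":
    print(sorted(closure({"a"}, TRANSACTIONS)))
    print(is_closed({"a", "b"}, TRANSACTIONS))
```

Because {a} and its closure {a, b} occur in exactly the same transactions, storing only the closed set with its support is enough to recover the support of {a}.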

Information Systems | 2003

DBC: a condensed representation of frequent patterns for efficient mining

Artur Bykowski; Christophe Rigotti

Given a large set of data, a common data mining problem is to extract the frequent patterns occurring in this set. The idea presented in this paper is to extract a condensed representation of the frequent patterns called disjunction-bordered condensation (DBC), instead of extracting the whole frequent pattern collection. We show that this condensed representation can be used to regenerate all frequent patterns and their exact frequencies. Moreover, this regeneration can be performed without any access to the original data. Practical experiments show that the DBC can be extracted very efficiently even in difficult cases and that this extraction and the regeneration of the frequent patterns are much more efficient than the direct extraction of the frequent patterns themselves. We compared the DBC with another representation of frequent patterns previously investigated in the literature called frequent closed sets. In nearly all experiments we have run, the DBC has been extracted much more efficiently than frequent closed sets. In the other cases, the extraction times are very close.

Flexible Query Answering Systems | 2001

Towards the Tractable Discovery of Association Rules with Negations

Jean-François Boulicaut; Artur Bykowski; Baptiste Jeudy

Frequent association rules (e.g., A∧B⇒C to say that when properties A and B are true in a record, then C tends to be also true) have become a popular way to summarize huge datasets. Over the last 5 years, there has been a lot of research on association rule mining and, more precisely, the tractable discovery of interesting rules among the frequent ones. We consider now the problem of mining association rules that may involve negations, e.g., A∧B⇒¬C or ¬A∧B⇒C. Mining such rules is difficult and remains an open problem. We identify several possibilities for a tractable approach in practical cases. Among others, we discuss the active use of constraints. We propose a generic algorithm and discuss the use of constraints to mine the generalized sets from which rules with negations can be derived.
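To make the rule notation concrete, the following sketch evaluates one rule with a negated literal on toy records. It only computes the usual support and confidence measures for a given rule; it is not the generic mining algorithm proposed in the paper. The literal encoding `(item, is_positive)` and all names are assumptions made for this example.

```python
# Toy records (hypothetical): each record is the set of properties that are true.
RECORDS = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
    {"A"},
]

def satisfies(record, literals):
    """True if the record agrees with every (item, is_positive) literal."""
    return all((item in record) == positive for item, positive in literals)

def rule_stats(body, head, records):
    """Return (support, confidence) of the rule body => head.

    Support is the number of records satisfying body and head together;
    confidence is that count divided by the number of records satisfying
    the body (None if the body never holds).
    """
    body_count = sum(1 for r in records if satisfies(r, body))
    both_count = sum(1 for r in records if satisfies(r, body + head))
    confidence = both_count / body_count if body_count else None
    return both_count, confidence

if __name__ == "__main__":
    # A and B => not C
    print(rule_stats([("A", True), ("B", True)], [("C", False)], RECORDS))
    # not A and B => C
    print(rule_stats([("A", False), ("B", True)], [("C", True)], RECORDS))
```

Allowing negated literals roughly doubles the number of literals per property, which gives a sense of why the search space grows so quickly for such rules.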

Lecture Notes in Computer Science | 2004

Model-Independent Bounding of the Supports of Boolean Formulae in Binary Data

Artur Bykowski; Jouni K. Seppänen; Jaakko Hollmén

Data mining algorithms such as the Apriori method for finding frequent sets in sparse binary data can be used for efficient computation of a large number of summaries from huge data sets. The collection of frequent sets gives a collection of marginal frequencies about the underlying data set. Sometimes, we would like to use a collection of such marginal frequencies instead of the entire data set (e.g., when the original data is inaccessible for confidentiality reasons) to compute other interesting summaries. Using combinatorial arguments, we may obtain tight upper and lower bounds on the values of inferred summaries. In this paper, we consider a class of summaries wider than frequent sets, namely that of frequencies of arbitrary Boolean formulae. Given the frequencies of a number of arbitrary Boolean formulae, we consider the problem of finding tight bounds on the frequency of another arbitrary formula. We give a general formulation of the problem of bounding formula frequencies given some background information, and show how the bounds can be obtained by solving a linear programming problem. We illustrate the accuracy of the bounds by giving empirical results on real data sets.
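One standard way to write such a bounding problem as a linear program is sketched below; the paper's exact formulation may differ. The notation is an assumption made for this sketch: there are $n$ binary attributes, $x_w$ is the unknown fraction of rows equal to the truth assignment $w \in \{0,1\}^n$, the formulae $\varphi_1, \dots, \varphi_k$ have known frequencies $f_1, \dots, f_k$, and $\psi$ is the formula whose frequency is to be bounded.

```latex
\begin{align*}
\text{minimize (resp. maximize)} \quad & \sum_{w \models \psi} x_w \\
\text{subject to} \quad & \sum_{w \models \varphi_i} x_w = f_i, \qquad i = 1, \dots, k, \\
& \sum_{w \in \{0,1\}^n} x_w = 1, \\
& x_w \ge 0 \qquad \text{for all } w \in \{0,1\}^n .
\end{align*}
```

The minimum gives the lower bound and the maximum the upper bound. As a sanity check, with $\varphi_1 = A$, $\varphi_2 = B$ and $\psi = A \wedge B$, the two optima are $\max(0, f_1 + f_2 - 1)$ and $\min(f_1, f_2)$.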

Lecture Notes in Computer Science | 2004

Integrity Constraints over Association Rules

Artur Bykowski; Thomas Daurel; Nicolas Méger; Christophe Rigotti

In this paper, we propose to investigate the notion of integrity constraints in inductive databases. We argue that integrity constraints can be used in this context as an abstract concept to encompass common data mining tasks such as the detection of corrupted data or of patterns that contradict expert beliefs. To illustrate this possibility, we propose a form of constraints called association map constraints to specify authorized confidence variations among the association rules. These constraints are easy to read and thus can be used to write clear specifications. We also present experiments showing that their satisfaction can be tested in practice.

Data Mining and Knowledge Discovery | 2003