Fabien De Marchi
University of Lyon
Publication
Featured research published by Fabien De Marchi.
advanced information networking and applications | 2009
Florian Daniel; Fabio Casati; Vincenzo D'Andrea; Emmanuel Mulo; Uwe Zdun; Schahram Dustdar; Steve Strauch; David Schumm; Frank Leymann; Samir Sebahi; Fabien De Marchi; Mohand-Said Hacid
Governing business compliance with regulations, laws, best practices, contracts, and the like is not an easy task, and so far only a limited number of software products help a company express compliance rules and analyze its compliance state. We argue that today’s SOA-based way of implementing and conducting business (e.g., using Web services and business process engines) lends itself very well to the development of a comprehensive compliance governance solution that effectively aids companies in being compliant. In this paper, we contextualize the compliance problem in SOA-based businesses, highlight the most salient research challenges that need to be addressed, and describe our approach to compliance governance, spanning design, execution, and evaluation concerns.
extending database technology | 2002
Fabien De Marchi; Stéphane Lopes; Jean-Marc Petit
Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, algorithms need to be devised to discover foreign keys. One of the underlying problems is known to be the inclusion dependency (IND) inference problem. In this paper a new data mining algorithm for computing unary INDs is given. From unary INDs, we also propose a levelwise algorithm to discover all remaining INDs, where candidate INDs of size i + 1 are generated from satisfied INDs of size i (i > 0). These algorithms have been implemented and tested against synthetic databases. To our knowledge, this paper is the first to address this data mining problem comprehensively, from algorithms to experimental results.
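To make the notion of a unary IND concrete, here is a minimal sketch of a naive discovery procedure. It is not the algorithm from the paper, only a direct set-containment check; the function name `unary_inds` and the dictionary-based database layout are assumptions made for illustration.

```python
from itertools import product

def unary_inds(db):
    """Return all unary INDs (lhs, rhs) with values(lhs) included in values(rhs).

    db is assumed to map a relation name to a list of rows (dicts);
    this layout is hypothetical, chosen only for the example.
    """
    # Collect the distinct values of every (relation, attribute) column.
    columns = {}
    for rel, rows in db.items():
        for row in rows:
            for attr, value in row.items():
                columns.setdefault((rel, attr), set()).add(value)
    # Naive pairwise test: quadratic in the number of columns.
    return sorted(
        (lhs, rhs)
        for lhs, rhs in product(columns, repeat=2)
        if lhs != rhs and columns[lhs] <= columns[rhs]
    )

db = {
    "orders": [{"cust": 1, "item": "a"}, {"cust": 2, "item": "b"}],
    "customers": [{"id": 1}, {"id": 2}, {"id": 3}],
}
inds = unary_inds(db)  # orders.cust is included in customers.id
```

In this toy database the only unary IND is orders[cust] ⊆ customers[id], which is exactly the kind of dependency a foreign key would declare.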
intelligent information systems | 2009
Fabien De Marchi; Stéphane Lopes; Jean-Marc Petit
Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, the discovery of foreign keys turns out to be an important and challenging task. The underlying problem is known to be the inclusion dependency (IND) inference problem. In this paper, data-mining algorithms are devised for IND inference in a given database. We propose a two-step approach. In the first step, unary INDs are discovered thanks to a new preprocessing stage which leads to a new algorithm and to an efficient implementation. In the second step, n-ary IND inference is achieved. This step fits in the framework of levelwise algorithms used in many data-mining algorithms. Since real-world databases can suffer from some data inconsistencies, approximate INDs, i.e. INDs which almost hold, are considered. We show how they can be safely integrated into our unary and n-ary discovery algorithms. These algorithms have been implemented and tested against both synthetic and real-life databases. To our knowledge, no other algorithm exists to solve this data-mining problem.
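The idea of an IND that "almost holds" can be sketched as follows. The abstract does not state which error measure the paper uses, so the measure below (the fraction of distinct left-hand-side values missing from the right-hand side) is an assumption, as are the names `ind_error`, `approx_ind_holds` and `epsilon`.

```python
def ind_error(lhs_values, rhs_values):
    """Fraction of distinct lhs values that do not occur in rhs.

    Assumed error measure for illustration; 0.0 means the IND holds exactly.
    """
    lhs, rhs = set(lhs_values), set(rhs_values)
    if not lhs:
        return 0.0  # an empty column is trivially included in any column
    return len(lhs - rhs) / len(lhs)

def approx_ind_holds(lhs_values, rhs_values, epsilon):
    """True if the IND holds up to the user-chosen error threshold epsilon."""
    return ind_error(lhs_values, rhs_values) <= epsilon

# One dangling value (4) out of four distinct values.
error = ind_error([1, 2, 3, 4, 4], [1, 2, 3, 9])
```

With a threshold of 0.25 this column pair would be accepted as an approximate IND, whereas an exact check would reject it because of the single inconsistent value.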
Information Systems | 2007
Fabien De Marchi; Jean-Marc Petit
Functional dependencies (FDs) and inclusion dependencies (INDs) convey most of the data semantics in relational databases and are very useful in practice since they generalize keys and foreign keys. Nevertheless, FDs and INDs are often unavailable, obsolete or lost in real-life databases. Several algorithms have been proposed for mining these dependencies, but the output is always in the same format: a simple list of dependencies, hard for the user to understand. In this paper, we define informative Armstrong databases (IADBs) as small subsets of an existing database that satisfy exactly the same FDs and INDs. They extend the classical notion of Armstrong databases, but are more suitable for understanding dependencies, since their tuples are real-world tuples. The main result of this paper is a bound on the size of an IADB in the case of non-circular INDs. A constructive proof of this result is given, from which an algorithm has been devised. An implementation and experiments against a real-life database were performed; the obtained database contains only 0.6% of the tuples of the initial database. More importantly, such semantic sampling of databases appears to be a key feature for understanding existing databases at the logical level.
international conference on management of data | 2003
Fabien De Marchi; Stéphane Lopes; Jean-Marc Petit; Farouk Toumani
Whereas physical database tuning has received a lot of attention over the last decade, logical database tuning seems to be under-studied. We have developed a project called DBA Companion devoted to the understanding of logical database constraints, from which logical database tuning can be achieved. In this setting, two main data mining issues need to be addressed: the first is the design of efficient algorithms for inferring functional dependencies and inclusion dependencies, and the second is the interestingness of the discovered knowledge. In this paper, we point out some relationships between database analysis and data mining, and we sketch the underlying themes of our approach. Some database applications that could benefit from our project are also described, including logical database tuning.
Lecture Notes in Computer Science | 2004
Fabien De Marchi; Frédéric Flouvat; Jean-Marc Petit
Given the theoretical framework of Mannila and Toivonen [26], we are interested in the discovery of the positive border of interesting patterns, also called the most specific interesting patterns. Many approaches have been proposed, among which we cite the levelwise algorithm and the Dualize and Advance algorithm. In this paper, we propose an adaptive strategy, complementary to these two algorithms, based on four steps: 1) To initialize the discovery, eliciting some elements of the negative border, for instance using a levelwise strategy up to a certain level k. 2) From the negative border found so far, inferring the optimistic positive border by dualization, i.e. the set of patterns all of whose specializations are known to be uninteresting. 3) Estimating the distance between the positive border to be discovered and the optimistic positive border. 4) Based on these estimates, carrying out an adaptive search either bottom-up (the jump was too optimistic) or top-down (the solution should be very close). We have instantiated this proposition on the problem of inclusion dependency (IND) discovery. INDs generalize the well-known concept of foreign keys in databases and are very important in practice. We first point out how the problem of IND discovery fits into the theoretical framework of [26]. Then, we describe an instantiation of our adaptive strategy for IND discovery, called Zigzag, with which some experiments were conducted on synthetic databases. The underlying application of this work takes place in a project called DBA Companion, devoted to the understanding of existing databases at the logical level using data mining techniques.
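The positive and negative borders mentioned above can be illustrated with a plain levelwise search over itemsets. This is a sketch of the generic levelwise strategy only, not of Zigzag or of Dualize and Advance; the function name `levelwise_borders` and the toy predicate are assumptions, and the predicate is assumed to be anti-monotone (every subset of an interesting set is interesting).

```python
from itertools import combinations

def levelwise_borders(items, interesting):
    """Return (positive_border, negative_border) for an anti-monotone predicate."""
    items = sorted(items)
    found, negative = set(), set()
    level = {frozenset([i]) for i in items}
    while level:
        kept = {c for c in level if interesting(c)}
        negative |= level - kept  # minimal uninteresting patterns
        found |= kept
        size = len(next(iter(level))) + 1
        # Next candidates: sets whose subsets of the previous size were all kept.
        # Brute-force enumeration, acceptable only for small item sets.
        level = {
            frozenset(c)
            for c in combinations(items, size)
            if all(frozenset(s) in kept for s in combinations(c, size - 1))
        }
    # Positive border: interesting patterns with no interesting proper superset.
    positive = {p for p in found if not any(p < q for q in found)}
    return positive, negative

# Toy predicate: a set is interesting if it fits inside {a, b, c} or {c, d}.
maximal = [frozenset("abc"), frozenset("cd")]
positive, negative = levelwise_borders("abcd", lambda s: any(s <= m for m in maximal))
```

Here the search returns {abc, cd} as the positive border and {ad, bd} as the negative border: the most specific interesting sets and the most general uninteresting ones.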
intelligent information systems | 2010
Frédéric Flouvat; Fabien De Marchi; Jean-Marc Petit
The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms' behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemset size, together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets that is invariant w.r.t. minsup variations and robust enough to explain the efficiency of several implementations.
intelligent information systems | 2005
Fabien De Marchi; Jean-Marc Petit
Approximating a collection of patterns is a new and active area of research in data mining. The main motivation lies in two observations: the number of mined patterns is often too large to be useful for any end user, and the user-defined input parameters of many data mining algorithms are most of the time almost arbitrarily defined (e.g. the frequency threshold).
international symposium on methodologies for intelligent systems | 2002
Fabien De Marchi; Stéphane Lopes; Jean-Marc Petit
Sampling techniques from statistics have been proposed, and some of them have proved very useful in many database applications. Rather surprisingly, it seems these works never consider the preservation of data semantics. Since functional dependencies (FDs) are known to convey most of the data semantics, an interesting issue is to construct samples preserving the FDs satisfied in existing relations. To cope with this issue, we propose in this paper to define Informative Armstrong Relations (IARs); a relation s is an IAR for a relation r if s is a subset of r and if the FDs satisfied in s are exactly the same as the FDs satisfied in r. Such a relation always exists since r is obviously an IAR for itself; moreover, we point out that small IARs with interesting bounded sizes exist. Experiments on relations available in the KDD archive were conducted and highlight the interest of IARs for sampling existing relations.
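The IAR definition can be checked directly on small relations. The sketch below is a brute-force test of the definition, exponential in the number of attributes, and is not the construction from the paper; the names `holds`, `satisfied_fds` and `is_iar` and the list-of-dicts relation layout are assumptions.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """True if the FD lhs -> rhs is satisfied by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # Two rows agreeing on lhs must agree on rhs.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def satisfied_fds(rows, attrs):
    """All non-trivial FDs X -> A over attrs that rows satisfy (brute force)."""
    return {
        (lhs, rhs)
        for k in range(len(attrs))
        for lhs in combinations(attrs, k)
        for rhs in attrs
        if rhs not in lhs and holds(rows, lhs, rhs)
    }

def is_iar(s, r, attrs):
    """True if s is a subset of r satisfying exactly the same FDs as r."""
    return all(row in r for row in s) and satisfied_fds(s, attrs) == satisfied_fds(r, attrs)

attrs = ("a", "b", "c")
r = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "b": 1, "c": 2},
    {"a": 2, "b": 2, "c": 2},
]
three_rows_ok = is_iar(r[:3], r, attrs)
two_rows_ok = is_iar(r[:2], r, attrs)
```

In this toy relation the first three tuples already form an IAR, while the first two do not: with only two tuples, attribute a becomes constant, so the sample satisfies FDs that r violates.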
international symposium on multimedia | 2009
Ahmed Azough; Alexandre Delteil; Mohand-Said Hacid; Fabien De Marchi
Uncertainty is one of the major challenges related to the semantic gap in multimedia data description and retrieval. It is due not only to errors and imprecisions in content classification but also to the extended range of user queries. In this paper, an extension of fuzzy conceptual graphs, suitable for handling uncertainty in visual event description and retrieval, is presented. We deal with two types of graphs according to the sources of uncertainty. An extension of fuzzy spatial and temporal relationships is defined to capture imprecision in video content spatiotemporal descriptions. Moreover, similarity measures and matching algorithms are defined to assess the degree of match between the descriptions of video objects and the event model and then to detect events within video segments.