Publication


Featured research published by Arnaud Soulet.


International Conference on Data Mining | 2011

Mining Dominant Patterns in the Sky

Arnaud Soulet; Chedy Raïssi; Marc Plantevit; Bruno Crémilleux

Pattern discovery is at the core of numerous data mining tasks. Although many methods focus on efficiency in pattern mining, they still suffer from the problem of choosing a threshold that influences the final extraction result. The goal of our study is to make the results of pattern mining useful from a user-preference point of view. To this end, we integrate the idea of skyline queries into the pattern discovery process in order to mine skyline patterns in a threshold-free manner. Because skyline patterns satisfy a formal dominance property, they are not only of global interest but also have semantics that are easily understood by the user. In this work, we first establish theoretical relationships between pattern condensed representations and skyline pattern mining. We also show that it is possible to automatically compute a subset of the measures involved in the user query that allows the patterns to be condensed and thus facilitates the computation of the skyline patterns. This forms the basis for a novel approach to mining skyline patterns. We illustrate the efficiency of our approach on several data sets, including a use case from chemoinformatics, and show that small sets of dominant patterns are produced under various measures.
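To make the dominance idea concrete, here is a minimal sketch of a skyline computation over patterns scored by several measures. It is a naive pairwise scan, not the approach of the paper (which relies on condensed representations); the toy patterns, the choice of two measures, and the names `dominates` and `skyline` are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Each pattern is scored by a tuple of measures where larger is better.
# The measure values below are hypothetical toy numbers, not results from the paper.
Measures = Tuple[float, ...]


def dominates(a: Measures, b: Measures) -> bool:
    """Return True if a is at least as good as b on every measure and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(patterns: Dict[str, Measures]) -> List[str]:
    """Return the sorted names of patterns that no other pattern dominates (naive O(n^2) scan)."""
    return sorted(
        name
        for name, measures in patterns.items()
        if not any(dominates(other, measures) for other in patterns.values())
    )


# Toy input: pattern name -> (measure 1, measure 2).
toy_patterns = {
    "A": (6, 6),
    "AB": (4, 8),
    "ABC": (2, 6),
    "B": (5, 5),
}

skyline_patterns = skyline(toy_patterns)
print(skyline_patterns)
```

No threshold appears anywhere in this sketch: a pattern is kept only because nothing else beats it on all measures at once, which is what makes the result threshold-free.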


Knowledge Discovery and Data Mining | 2005

An efficient framework for mining flexible constraints

Arnaud Soulet; Bruno Crémilleux

Constraint-based mining is an active field of research and a key ingredient of interactive and successful KDD processes. Nevertheless, usual solvers are limited to particular kinds of constraints because they rely on pruning properties that are mutually incompatible. In this paper, we provide a general framework dedicated to a large set of constraints described by SQL-like and syntactic primitives. This set of constraints covers the usual classes and introduces new tough and flexible constraints. We define a pruning operator which prunes the search space by automatically taking into account the characteristics of the constraint at hand. Finally, we propose an algorithm which makes efficient use of this framework. Experimental results show that both usual and new complex constraints can be mined in large datasets.


Pacific-Asia Conference on Knowledge Discovery and Data Mining | 2004

Condensed Representation of Emerging Patterns

Arnaud Soulet; Bruno Crémilleux; François Rioult

Emerging patterns (EPs) are associations of features whose frequencies increase significantly from one class to another. They have proven useful for building powerful classifiers and for helping to establish diagnoses. Because of the huge search space, mining and representing EPs is a hard task for large datasets. Using recent results on condensed representations of frequent closed patterns, we propose here an exact condensed representation of EPs. We also give a method to provide the EPs with the highest growth rates, which we call strong emerging patterns (SEPs). Experiments carried out in collaboration with the Philips company show the value of SEPs.
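As an illustration of the growth rate mentioned above, the following sketch computes it under the commonly used definition: the ratio of a pattern's relative frequency in a target class to its relative frequency in the remaining data. The abstract does not spell out the definition, so this is an assumption, and the toy transactions and the helper names `frequency` and `growth_rate` are hypothetical.

```python
import math
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def frequency(pattern: FrozenSet[str], dataset: List[Transaction]) -> float:
    """Relative frequency: fraction of transactions that contain every item of the pattern."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if pattern <= t) / len(dataset)


def growth_rate(pattern: FrozenSet[str],
                target: List[Transaction],
                rest: List[Transaction]) -> float:
    """Ratio of the pattern's frequency in the target class to its frequency elsewhere.

    Assumed conventions: 0 if the pattern is absent from the target class,
    infinity if it occurs in the target class but nowhere else.
    """
    freq_target = frequency(pattern, target)
    freq_rest = frequency(pattern, rest)
    if freq_target == 0:
        return 0.0
    if freq_rest == 0:
        return math.inf
    return freq_target / freq_rest


# Hypothetical toy data: two classes of transactions over the items a, b, c.
target_class = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
other_class = [frozenset("a"), frozenset("c"), frozenset("bc"), frozenset("ab")]

rate_ab = growth_rate(frozenset("ab"), target_class, other_class)
rate_ac = growth_rate(frozenset("ac"), target_class, other_class)
print(rate_ab, rate_ac)
```

A pattern is then reported as emerging when its growth rate exceeds a user-chosen bound; the sketch leaves that bound out because the abstract does not give one.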


Data Mining and Knowledge Discovery | 2008

Adequate condensed representations of patterns

Arnaud Soulet; Bruno Crémilleux

Patterns are at the core of much of the knowledge discovered from data, but their use is limited by their huge number and their mining cost. During the last decade, many works have addressed the concept of condensed representation w.r.t. frequency queries. Such representations are several orders of magnitude smaller than the whole collection of patterns, and also enable us to regenerate the frequency information of any pattern. In this paper, we propose a framework for condensed representations w.r.t. a large set of new and varied queries, named condensable functions, based on interestingness measures (e.g., frequency, lift, minimum). Such condensed representations are obtained thanks to new closure operators automatically derived from each condensable function. We propose a generic algorithm, MicMac, to efficiently mine the adequate condensed representations. Experiments show both the conciseness of the adequate condensed representations and the efficiency of our algorithm.
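The sketch below illustrates the classical frequency-based case that this abstract generalizes: closed patterns, obtained with the usual closure operator (the intersection of all transactions containing a pattern), are enough to regenerate the support of any pattern. It is a brute-force illustration on a hypothetical toy dataset, not the algorithm or the generalized closure operators proposed in the paper; all function names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def support(pattern: Itemset, dataset: List[Itemset]) -> int:
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in dataset if pattern <= t)


def closure(pattern: Itemset, dataset: List[Itemset]) -> Itemset:
    """Classical closure: intersection of all transactions containing the pattern.

    Only called on patterns with non-zero support in this sketch.
    """
    covering = [t for t in dataset if pattern <= t]
    return frozenset.intersection(*covering)


def closed_patterns(dataset: List[Itemset]) -> Dict[Itemset, int]:
    """Brute-force enumeration of closed patterns with their supports (toy sizes only)."""
    items = sorted(set().union(*dataset))
    closed: Dict[Itemset, int] = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            if support(pattern, dataset) > 0:
                c = closure(pattern, dataset)
                closed[c] = support(c, dataset)
    return closed


def regenerate_support(pattern: Itemset, closed: Dict[Itemset, int]) -> int:
    """Support of any pattern = largest support among the closed patterns that contain it."""
    return max((s for c, s in closed.items() if pattern <= c), default=0)


# Hypothetical toy dataset with four transactions over the items a, b, c.
toy_dataset = [frozenset("abc"), frozenset("ab"), frozenset("ab"), frozenset("c")]
condensed = closed_patterns(toy_dataset)
print(len(condensed), regenerate_support(frozenset("a"), condensed))
```

On this toy dataset the eight possible patterns collapse to four closed ones, and the lookup in `regenerate_support` recovers every support without rescanning the data.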


Intelligent Data Analysis | 2009

Mining constraint-based patterns using automatic relaxation

Arnaud Soulet; Bruno Crémilleux

Constraint-based mining is an active field of research and a necessary step towards interactive and successful KDD processes. The limitations of the task lie in the restricted languages available to describe the mined patterns and in the limited ability to express varied constraints. In practice, current approaches focus on a single language, and the most generic frameworks mine, individually or simultaneously, a monotone and an anti-monotone constraint. In this paper, we propose a generic framework dealing with any partially ordered language and a large set of constraints. We prove that this set of constraints, called primitive-based constraints, is a superclass not only of both kinds of monotone constraints and their boolean combinations but also of other classes such as convertible and succinct constraints. We show that primitive-based constraints can be mined efficiently thanks to a relaxation method based on virtual patterns which summarize the specificities of the search space. Indeed, this approach automatically deduces pruning conditions having suitable monotone properties, and these conditions can thus be pushed into usual constraint mining algorithms. We study the optimal relaxations. Finally, we illustrate the efficiency of our proposal experimentally on several contexts.


International Conference on Computational Science and Its Applications | 2008

Discovering Knowledge from Local Patterns with Global Constraints

Bruno Crémilleux; Arnaud Soulet

It is well known that local patterns are at the core of much of the knowledge that may be discovered from data. Nevertheless, the use of local patterns is limited by their huge number and computational costs. Several approaches (e.g., condensed representations, pattern set discovery) aim at selecting or grouping local patterns to provide a global view of the data. In this paper, we propose the idea of global constraints to write queries addressing global patterns as sets of local patterns. The usefulness of global constraints lies in taking into account relationships between local patterns, such relations expressing a user bias according to the user's expectations (e.g., search for exceptions, top-k patterns). We think that global constraints are a powerful way to get meaningful patterns. We propose the generic Approximate-and-Push approach to mine patterns under global constraints and we give a method for the case of the top-k patterns w.r.t. any measure. Experiments show its efficiency, since mining such patterns was not feasible before.


KDID'04 Proceedings of the Third International Conference on Knowledge Discovery in Inductive Databases | 2004

Condensed representation of EPs and patterns quantified by frequency-based measures

Arnaud Soulet; Bruno Crémilleux; François Rioult

Emerging patterns (EPs) are associations of features whose frequencies increase significantly from one class to another. They have proven useful for building powerful classifiers and for helping to establish diagnoses. Because of the huge search space, mining and representing EPs is a hard and complex task for large datasets. Using recent results on condensed representations of frequent closed patterns, we propose here an exact condensed representation of EPs (i.e., all EPs and their growth rates). From this condensed representation, we give a method to provide interesting EPs, namely those with the highest growth rates. We call these EPs strong emerging patterns (SEPs). We also highlight a property characterizing the jumping emerging patterns. Experiments quantify the value of SEPs (smaller number, ability to extract longer and less frequent patterns) and show their usefulness (in collaboration with the Philips company, SEPs made it possible to identify the failures of a production chain of silicon plates). These concepts of condensed representation and “strong patterns” with respect to a measure are generalized to other interestingness measures based on frequencies.


KDID'06 Proceedings of the 5th International Conference on Knowledge Discovery in Inductive Databases | 2006

Efficient mining under rich constraints derived from various datasets

Arnaud Soulet; Jiří Kléma; Bruno Crémilleux

Mining patterns under many kinds of constraints is key to successfully obtaining new knowledge. In this paper, we propose an efficient new algorithm, MUSIC-DFS, which soundly and completely mines patterns with various constraints from large data and takes into account external data represented by several heterogeneous datasets. Constraints are freely built from a large set of primitives and make it possible to link the information scattered across various knowledge sources. Efficiency is achieved thanks to a new closure operator providing an interval pruning strategy applied during the depth-first search of the pattern space. A transcriptomic case study shows the effectiveness and scalability of our approach. It also demonstrates a way to employ background knowledge, such as free texts or gene ontologies, in the discovery of meaningful patterns.


Data Warehousing and Knowledge Discovery | 2012

Mining contextual preference rules for building user profiles

Sandra de Amo; Mouhamadou Saliou Diallo; Cheikh Talibouya Diop; Arnaud Giacometti; Haoyuan D. Li; Arnaud Soulet

The emergence of ubiquitous computing technologies in recent years has given rise to a new field of research that consists in incorporating context-aware preference querying facilities into database systems. One important step in this setting is the preference elicitation task, which consists in providing the user with ways to express his/her choice on pairs of objects with minimal effort. In this paper, we propose an automatic preference elicitation method based on mining techniques. The method consists in extracting a user profile from a set of user preference samples. In our setting, a profile is specified by a set of contextual preference rules satisfying properties of soundness and conciseness. We evaluate the efficacy of the proposed method in a series of experiments executed on a real-world database of user preferences about movies.


Computer-Based Medical Systems | 2006

Mining Plausible Patterns from Genomic Data

Jiří Kléma; Arnaud Soulet; Bruno Crémilleux; Sylvain Blachon; Olivier Gandrillon

The discovery of biologically interpretable knowledge from gene expression data is one of the largest contemporary genomic challenges. As large volumes of expression data are being generated, there is a great need for automated tools that provide the means to analyze them. However, the same tools can produce an overwhelming number of candidate hypotheses which can hardly be exploited manually by an expert. Additional knowledge that helps to focus automatically on the most plausible candidates can increase the value of the experiment significantly. Background knowledge available in literature databases, biological ontologies and other sources can be used for this purpose. In this paper, we propose and verify a methodology that makes it possible to effectively mine and represent meaningful over-expression patterns. Each pattern represents a bi-set of a gene group over-expressed in a set of biological situations. The originality of the framework lies in its constraint-based nature and an effective cross-fertilization of constraints based on expression data and background knowledge. The result is a limited set of candidate patterns that are most likely to be interpretable by biologists. Supplemental automatic interpretations serve to ease this process. Various constraints can generate plausible pattern sets with different characteristics.

Collaboration


Top Co-Authors

Arnaud Giacometti
François Rabelais University

Patrick Marcel
François Rabelais University

Damien Nouvel
François Rabelais University

Dominique Haoyuan Li
François Rabelais University

Jean-Yves Antoine
François Rabelais University

Nathalie Friburger
François Rabelais University

Cheikh Talibouya Diop
François Rabelais University

Bruno Crémilleux
Centre national de la recherche scientifique

Marie Ndiaye
François Rabelais University