François Jacquenet | Researchain

Publication

Featured research published by François Jacquenet.

Machine Learning | 2009

Mining probabilistic automata: a statistical view of sequential pattern mining

Stéphanie Jacquemont; François Jacquenet; Marc Sebban

During the past decade, sequential pattern mining has been the core of numerous research efforts. It is now possible to efficiently extract knowledge of users’ behavior from a huge set of sequences collected over time. This has applications in various domains such as purchases in supermarkets, Web site visits, etc. However, sequence mining algorithms do little to control the risks of extracting false discoveries or overlooking true knowledge. In this paper, the theoretical conditions to achieve a relevant sequence mining process are examined. Then, the article offers a statistical view of sequence mining which has the following advantages: First, it uses a compact and generalized representation of the original sequences in the form of a probabilistic automaton. Second, it integrates statistical constraints to guarantee the extraction of significant patterns. Finally, it provides an interesting solution in a privacy-preserving context in order to protect individuals’ information. An application in car flow modeling is presented, showing the ability of our algorithm (ACSM) to discover frequent routes without any private information. Comparisons with a classical sequence mining algorithm (SPAM) are made, showing the effectiveness of our approach.
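To make the idea of representing sequences with a probabilistic automaton more concrete, here is a minimal sketch of a frequency prefix tree, a common starting point when learning probabilistic automata. It is not the algorithm from the paper: the class name `PrefixTreeAutomaton`, its methods, and the toy routes are hypothetical, and no state merging (the step that would provide generalization) is performed.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


class PrefixTreeAutomaton:
    """Frequency prefix tree over sequences (hypothetical illustration only)."""

    def __init__(self) -> None:
        # (state, symbol) -> next state; state 0 is the root.
        self.transitions: Dict[Tuple[int, str], int] = {}
        # (state, symbol) -> number of sequences that took this transition.
        self.transition_counts: Dict[Tuple[int, str], int] = defaultdict(int)
        # state -> number of sequences that reached this state.
        self.visits: Dict[int, int] = defaultdict(int)
        self.n_states = 1

    def add(self, sequence: Sequence[str]) -> None:
        """Insert one sequence and update the counts along its path."""
        state = 0
        self.visits[state] += 1
        for symbol in sequence:
            key = (state, symbol)
            if key not in self.transitions:
                self.transitions[key] = self.n_states
                self.n_states += 1
            self.transition_counts[key] += 1
            state = self.transitions[key]
            self.visits[state] += 1

    def prefix_probability(self, prefix: Sequence[str]) -> float:
        """Estimated probability that a sequence starts with `prefix`."""
        state = 0
        probability = 1.0
        for symbol in prefix:
            key = (state, symbol)
            if key not in self.transitions:
                return 0.0
            probability *= self.transition_counts[key] / self.visits[state]
            state = self.transitions[key]
        return probability


# Toy "routes" (made-up data): each list is one observed sequence of places.
routes = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
automaton = PrefixTreeAutomaton()
for route in routes:
    automaton.add(route)

print(automaton.prefix_probability(["A", "B"]))
```

Only aggregated counts are stored, not the individual sequences, which gives a rough intuition for why an automaton-based summary can be attractive when the raw sequences are sensitive.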

Inductive Logic Programming | 2002

Mining frequent logical sequences with SPIRIT-LoG

Cyrille Masson; François Jacquenet

Sequence mining is an active research field of data mining because algorithms designed in that domain lead to various valuable applications. To increase the efficiency of basic sequence mining algorithms, generally based on a levelwise approach, more recent algorithms introduce constraints to prune the search space during the discovery process. Nevertheless, existing algorithms are currently limited to extracting frequent sequences made up of items of a database. In this paper, we generalize the notion of sequence to define what we call a logical sequence, where each element of a sequence may contain some logical variables. Then we show how we can extend constrained sequence mining to constrained frequent logical sequence mining.
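The levelwise approach mentioned in this abstract can be sketched as follows. This is a generic illustration over plain item sequences with a minimum-support constraint, not SPIRIT-LoG itself and without logical variables; the function names and the toy database are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the pattern's items occur in the sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def levelwise_mine(database: List[Sequence[str]], min_support: int) -> Dict[Pattern, int]:
    """Return every frequent subsequence with its support, one length at a time."""
    items = sorted({item for seq in database for item in seq})

    # Level 1: frequent single items.
    level: Dict[Pattern, int] = {}
    for item in items:
        count = support((item,), database)
        if count >= min_support:
            level[(item,)] = count
    frequent_items = [pattern[0] for pattern in level]

    frequent: Dict[Pattern, int] = {}
    while level:
        frequent.update(level)
        next_level: Dict[Pattern, int] = {}
        for pattern in level:
            for item in frequent_items:
                candidate = pattern + (item,)
                # Pruning: a frequent pattern needs a frequent suffix, so skip
                # candidates whose suffix was not frequent at this level.
                if candidate[1:] not in level:
                    continue
                count = support(candidate, database)
                if count >= min_support:
                    next_level[candidate] = count
        level = next_level
    return frequent


# Toy database (made-up data).
database = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
result = levelwise_mine(database, min_support=2)
for pattern, count in sorted(result.items()):
    print(pattern, count)
```

The suffix check is the simplest form of search-space pruning; constraint-based miners add further conditions at the same point in the loop so that fewer candidates need to be counted against the database.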

Knowledge Based Systems | 2009

Discovering unexpected documents in corpora

François Jacquenet; Christine Largeron

Text mining is widely used to discover frequent patterns in large corpora of documents. Hence, many classical data mining techniques that have proven fruitful in the context of data stored in relational databases are now successfully used in the context of textual data. Nevertheless, there are many situations where it is more valuable to discover unexpected information rather than frequent information. In the context of technology watch, for example, we may want to discover new trends in specific markets, or discover what competitors are planning in the near future, etc. This paper belongs to that research context. We have proposed several unexpectedness measures and implemented them in a prototype, called UnexpectedMiner, that can be used by watchers to discover unexpected documents in large corpora of documents (patents, datasheets, advertisements, scientific papers, etc.). UnexpectedMiner is able to take into account the structure of documents during the discovery of unexpected information. Many experiments have been performed in order to validate our measures and demonstrate the value of our system.

Web Intelligence | 2007

Correct your text with Google

Stéphanie Jacquemont; François Jacquenet; Marc Sebban

With the increasing number of text files that are produced nowadays, spell checkers have become essential tools for the everyday tasks of millions of end users. Over the years, several tools have been designed that show decent performance. Of course, grammatical checkers may improve the correction of texts; nevertheless, this requires large resources. We think that basic spell checking may be improved, as a step in that direction, by using the Web as a corpus and taking into account the context of words that are identified as potential misspellings. We propose to use the Google search engine and some machine learning techniques in order to design a flexible and dynamic spell checker that may evolve over time with new linguistic features.

International Colloquium on Grammatical Inference | 2002

Generalized Stochastic Tree Automata for Multi-relational Data Mining

Amaury Habrard; Marc Bernard; François Jacquenet

This paper addresses the problem of learning a statistical distribution of data in a relational database. The data we focus on are represented as trees, which are a quite natural way to represent structured information. These trees are then used to infer a stochastic tree automaton, using a well-known grammatical inference algorithm. We propose two extensions of this algorithm: the use of sorts and the generalization of the inferred automaton according to a local criterion. We show in experiments that our approach scales to large databases and improves both the predictive power of the learned model and the convergence of the learning algorithm.

Engineering Societies in the Agents World IX | 2009

Sensitive Data Transaction in Hippocratic Multi-Agent Systems

Ludivine Crépin; Yves Demazeau; Olivier Boissier; François Jacquenet

The current evolution of Information Technology leads to an increase in automatic data processing over multiple information systems. The data we deal with concern sensitive information about users or groups of users. A typical problem in this context is the disclosure of confidential identity data. To tackle this difficulty, we consider in this paper the context of Hippocratic Multi-Agent Systems (HiMAS), a model designed for privacy management. In this context, we propose, on the one hand, a common content language combining meta-policies and application context data and, on the other hand, an interaction protocol for the exchange of sensitive data. Based on this proposal, agents providing sensitive data are able to check the compliance of the consumers with the HiMAS principles. The protocol that we propose is validated on a distributed calendar management application.

Artificial Intelligence in Medicine in Europe | 2003

Multi-Relational Data Mining in Medical Databases

Amaury Habrard; Marc Bernard; François Jacquenet

This paper presents the application of a method for mining data in a multi-relational database that contains information about patients suffering from chronic hepatitis. Our approach may be used on any kind of multi-relational database and aims at extracting probabilistic tree patterns from a database using Grammatical Inference techniques. We propose to use a representation of the database by trees in order to extract these patterns. Trees provide a natural way to represent structured information, taking into account the statistical distribution of the data. In this work we try to show how they can be useful for interpreting knowledge in the medical domain.

European Conference on Principles of Data Mining and Knowledge Discovery | 2004

Discovering unexpected information for technology watch

François Jacquenet; Christine Largeron

The purpose of technology watch is to gather, process and integrate the scientific and technical information that is useful to economic players. In this article, we propose to use text mining techniques to automate the processing of data found in scientific text databases. The watch activity introduces an unusual difficulty compared with conventional areas of application for text mining techniques since, instead of searching for frequent knowledge hidden in the texts, the target is unexpected knowledge. As a result, the usual measures used for knowledge discovery have to be revised. For that purpose, we have developed the UnexpectedMiner system, which uses new measures to estimate the unexpectedness of a document. Our system is evaluated using a base that contains articles relating to the field of machine learning.

Pattern Recognition Letters | 2009

A lower bound on the sample size needed to perform a significant frequent pattern mining task

Stéphanie Jacquemont; François Jacquenet; Marc Sebban

During the past few years, the problem of assessing the statistical significance of frequent patterns extracted from a given set S of data has received much attention. Considering that S always consists of a sample drawn from an unknown underlying distribution, two types of risks can arise during a frequent pattern mining process: accepting a false frequent pattern or rejecting a true one. In this context, many approaches presented in the literature assume that the dataset size is an application-dependent parameter. In this case, there is a trade-off between both errors leading to solutions that only control one risk to the detriment of the other one. On the other hand, many sampling-based methods have attempted to determine the optimal size of S ensuring a good approximation of the original (potentially infinite) database from which S is drawn. However, these approaches often resort to Chernoff bounds that do not allow the independent control of the two risks. In this paper, we overcome the mentioned drawbacks by providing a lower bound on the sample size required to control both risks and achieve a significant frequent pattern mining task.
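For context on the sampling-based methods this abstract contrasts itself with, here is a minimal sketch of a classic Chernoff-type sample size computed from the two-sided Hoeffding inequality, P(|p̂ - p| ≥ ε) ≤ 2·exp(-2nε²). It is not the lower bound derived in the paper; the function name and the example values are hypothetical.

```python
import math


def hoeffding_sample_size(epsilon: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: tolerated gap between a pattern's sample frequency and its
             true frequency.
    delta:   tolerated probability that the gap exceeds epsilon.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


# Example values chosen for illustration only.
n = hoeffding_sample_size(epsilon=0.05, delta=0.05)
print(n)  # 738
```

Because a single pair (ε, δ) bounds deviations in both directions at once, a bound of this kind does not by itself let the risk of accepting a false frequent pattern and the risk of rejecting a true one be set separately, which appears to be the limitation the abstract refers to.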

European Conference on Artificial Intelligence | 2016

Relational grounded language learning

Leonor Becerra-Bonache; Hendrik Blockeel; María Galván; François Jacquenet

In the past, research on learning language models mainly used syntactic information during the learning process, but in recent years researchers have begun to use semantic information as well. This paper presents such an approach, where the input of our learning algorithm is a dataset of pairs made up of sentences and the contexts in which they are produced. The system we present is based on inductive logic programming techniques that aim to learn a mapping between n-grams and a semantic representation of their associated meaning. Experiments have shown that we can learn such a mapping, which later made it possible to generate relevant descriptions of images or to learn the meaning of words without any linguistic resource.
