Rémi Gilleron

French Institute for Research in Computer Science and Automation

Publication

Featured research published by Rémi Gilleron.

machine learning and data mining in pattern recognition | 2003

Learning multi-label alternating decision trees from texts and data

Francesco De Comité; Rémi Gilleron; Marc Tommasi

Multi-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer's AdaBoost.MH and produces sets of rules that can be viewed as trees like Alternating Decision Trees (invented by Freund and Mason). Experiments show that we gain in both performance and readability by combining boosting techniques with tree representations of large sets of rules. Moreover, a key feature of our algorithm is its ability to handle heterogeneous input data: discrete values, continuous values, and text data.

extending database technology | 2009

Retrieving meaningful relaxed tightest fragments for XML keyword search

Lingbo Kong; Rémi Gilleron; Aurélien Lemay

Adapting keyword search to XML data, generalized as XML keyword search (XKS), has attracted attention recently. One of its key tasks is to return meaningful fragments as the result. [1] is the latest work following this trend, and it focuses on returning the fragments rooted at SLCA (Smallest LCA, where LCA stands for Lowest Common Ancestor) nodes. To guarantee that the fragments only contain interesting nodes, [1] proposes a contributor-based filtering mechanism in its MaxMatch algorithm. However, the filtering mechanism is not sufficient. It suffers from the false positive problem (discarding interesting nodes) and the redundancy problem (keeping uninteresting nodes). In this paper, we propose a framework for retrieving meaningful fragments rooted not only at the SLCA nodes, but at all LCA nodes. We begin by introducing the concept of Relaxed Tightest Fragment (RTF) as the basic result type. Then we propose a new filtering mechanism to overcome those two problems in MaxMatch. Its kernel is the concept of valid contributor, which helps to distinguish the interesting children of a node. The new filtering mechanism then prunes the nodes in an RTF that are not valid contributors to their parents. Based on the valid contributor concept, our ValidRTF algorithm not only overcomes those two problems in MaxMatch, but also satisfies the axiomatic properties deduced in [1] that an XKS technique should satisfy. We compare ValidRTF with MaxMatch on real and synthetic XML data. The results support our claims and show the effectiveness of our valid-contributor-based filtering mechanism.
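To make the LCA and SLCA notions used in this abstract concrete, here is a minimal brute-force sketch. It is not MaxMatch or ValidRTF, and it does no contributor filtering. It assumes (this is not stated in the abstract) that nodes are identified by Dewey labels, written as tuples of child indexes, and that each keyword comes with the list of nodes matching it. The function names and the sample data are hypothetical.

```python
from itertools import product

def lca(a, b):
    """Lowest common ancestor of two Dewey labels: their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def lca_nodes(match_lists):
    """All LCA nodes: one LCA per way of picking one match for each keyword.

    Brute force: the number of combinations is the product of the list sizes.
    """
    result = set()
    for combo in product(*match_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        result.add(node)
    return result

def slca_nodes(match_lists):
    """SLCA nodes: LCA nodes that have no proper descendant among the LCA nodes."""
    lcas = lca_nodes(match_lists)

    def is_proper_ancestor(a, d):
        return len(a) < len(d) and d[:len(a)] == a

    return {n for n in lcas if not any(is_proper_ancestor(n, m) for m in lcas)}

# Hypothetical data: nodes matching the first keyword, then the second keyword.
matches = [
    [(0, 0, 1), (0, 1, 0)],
    [(0, 0, 2), (0, 2)],
]
print(sorted(lca_nodes(matches)))
print(sorted(slca_nodes(matches)))
```

On this sample the root `(0,)` is an LCA node but not an SLCA node, because its descendant `(0, 0)` already covers both keywords. Fragments rooted at such non-smallest LCA nodes are the ones an SLCA-only approach never returns.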

foundations of computer science | 1993

Solving systems of set constraints with negated subset relationships

Rémi Gilleron; Sophie Tison; Marc Tommasi

We present a decision procedure, based on tree automata techniques, for satisfiability of systems of set constraints including negated subset relationships. This result extends all previous work on set constraint solving and solves a problem which was left open by L. Bachmair et al. (1993). We prove in a constructive way that a nonempty set of solutions always contains a regular solution, that is, a tuple of regular tree languages. Moreover, we think that the new class of tree automata described here could be interesting in its own right.

web information and data management | 2008

Automatic wrapper induction from hidden-web sources with domain knowledge

Pierre Senellart; Avin Mittal; Daniel Muschick; Rémi Gilleron; Marc Tommasi

We present an original approach to the automatic induction of wrappers for sources of the hidden Web that does not need any human supervision. Our approach only needs domain knowledge expressed as a set of concept names and concept instances. There are two parts in extracting valuable data from hidden-Web sources: understanding the structure of a given HTML form and relating its fields to concepts of the domain, and understanding how resulting records are represented in an HTML result page. For the former problem, we use a combination of heuristics and of probing with domain instances; for the latter, we use a supervised machine learning technique adapted to tree-like information, applied to an automatic, imperfect, and imprecise annotation obtained from the domain knowledge. We show experiments that demonstrate the validity and potential of the approach.

symposium on theoretical aspects of computer science | 1991

Decision problems for term rewriting systems and recognizable tree languages

Rémi Gilleron

We study the connections between recognizable tree languages and rewrite systems. We investigate some decision problems. In particular, let us consider the property (P): a rewrite system S is such that, for every recognizable tree language F, the set of S-normal forms of terms in F is recognizable too. We prove that the property (P) is undecidable. We prove that the existential fragment of the theory of ground term algebras modulo a congruence \(\leftrightarrow^*_E\) generated by a set E of equations such that there exists a finite, noetherian, confluent rewrite system S satisfying (P) with \(\leftrightarrow^*_S = \leftrightarrow^*_E\) is undecidable. Nevertheless, we develop a decision procedure for the validity of linear formulas in a fragment of such a theory.

international conference on machine learning and applications | 2010

Boosting Multi-Task Weak Learners with Applications to Textual and Social Data

Jean Baptiste Faddoul; Boris Chidlovskii; Fabien Torre; Rémi Gilleron

Learning multiple related tasks from data simultaneously can improve predictive performance relative to learning these tasks independently. In this paper we propose a novel multi-task learning algorithm called MT-Adaboost: it extends the Adaboost algorithm [Freund, 1999] to the multi-task setting, using a multi-task decision stump as its multi-task weak classifier. This makes it possible to learn different dependencies between tasks for different regions of the learning space. Thus, we relax the conventional hypothesis that tasks behave similarly in the whole learning space. Moreover, MT-Adaboost can learn multiple tasks without imposing the constraint of sharing the same label set and/or examples between tasks. A theoretical analysis is derived from the analysis of the original Adaboost. Experiments for multiple tasks over large-scale textual data sets with social context (Enron and Tobacco) give very promising results.

european conference on machine learning | 2012

Learning multiple tasks with boosted decision trees

Jean Baptiste Faddoul; Boris Chidlovskii; Rémi Gilleron; Fabien Torre

We address the problem of multi-task learning with no label correspondence among tasks. Learning multiple related tasks simultaneously, by exploiting their shared knowledge, can improve the predictive performance on every task. We develop the multi-task Adaboost environment with Multi-Task Decision Trees as weak classifiers. We first adapt well-known decision tree learning to the multi-task setting. We revise the information gain rule for learning decision trees in the multi-task setting. We use this feature to develop a novel criterion for learning Multi-Task Decision Trees. The criterion guides the tree construction by learning the decision rules from data of different tasks, and representing different degrees of task relatedness. We then modify MT-Adaboost to combine Multi-Task Decision Trees as weak learners. We experimentally validate the advantage of the new technique; we report results of experiments conducted on several multi-task datasets, including the Enron email set and a spam filtering collection.

language and automata theory and applications | 2008

Efficient Inclusion Checking for Deterministic Tree Automata and DTDs

Jérôme Champavère; Rémi Gilleron; Aurélien Lemay; Joachim Niehren

We present a new algorithm for testing language inclusion L(A) ⊆ L(B) between tree automata in time O(|A|*|B|) where B is deterministic. We extend this algorithm for testing inclusion between automata for unranked trees A and deterministic DTDs D in time O(|A|*|Σ|*|D|). No previous algorithms with these complexities exist.
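To illustrate the problem being solved (not the paper's algorithm, and without its O(|A|*|B|) bound), here is a naive fixpoint sketch for ranked trees. It assumes bottom-up automata: A is given as a list of rules `(symbol, child_states, target)` and may be nondeterministic, while the deterministic B is a dictionary from `(symbol, child_states)` to a single state and may be incomplete. The encoding and all names are hypothetical. The idea is to collect every pair (q, p) such that some tree evaluates to q in A and to p in B; inclusion holds exactly when no pair has q final in A and p non-final in B.

```python
from itertools import product

SINK = None  # stands for "B has no run on this tree" (B may be incomplete)

def included(rules_a, final_a, rules_b, final_b):
    """Return True if L(A) is a subset of L(B), for a deterministic B."""
    reachable = set()  # pairs (q, p): some tree reaches q in A and p in B
    changed = True
    while changed:
        changed = False
        for symbol, children, target in rules_a:
            # For each child position, the B-states already paired with that A-state.
            options = [[p for (q, p) in reachable if q == child]
                       for child in children]
            for b_children in product(*options):
                if SINK in b_children:
                    p = SINK  # a subtree already failed in B
                else:
                    p = rules_b.get((symbol, b_children), SINK)
                if (target, p) not in reachable:
                    reachable.add((target, p))
                    changed = True
    # SINK is never in final_b, so a tree accepted by A but rejected by B is caught.
    return all(p in final_b for (q, p) in reachable if q in final_a)

# Hypothetical automata over a leaf symbol "a" and a binary symbol "f".
all_trees = [("a", (), "q"), ("f", ("q", "q"), "q")]        # A: accepts every tree
only_leaf_b = {("a", ()): "p"}                               # B: accepts only "a"
every_tree_b = {("a", ()): "p", ("f", ("p", "p")): "p"}      # B: accepts every tree

print(included(all_trees, {"q"}, every_tree_b, {"p"}))
print(included(all_trees, {"q"}, only_leaf_b, {"p"}))
```

The loop terminates because at most |Q_A| * (|Q_B| + 1) pairs can be added. Its cost per pass is higher than the bound stated in the abstract, since each rule re-enumerates combinations of child pairs.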

international colloquium on grammatical inference | 2006

Learning n-ary node selecting tree transducers from completely annotated examples

Aurélien Lemay; Joachim Niehren; Rémi Gilleron

We present the first algorithm for learning n-ary node selection queries in trees from completely annotated examples by methods of grammatical inference. We propose to represent n-ary queries by deterministic n-ary node selecting tree transducers (n-NSTTs). These are tree automata that capture the class of monadic second-order definable n-ary queries. We show that polynomially bounded n-ary queries defined by n-NSTTs can be learned from polynomial time and data. An application in Web information extraction yields encouraging results.

International Workshop of the Initiative for the Evaluation of XML Retrieval | 2006

XML Document Transformation with Conditional Random Fields

Rémi Gilleron; Florent Jousse; Isabelle Tellier; Marc Tommasi

We address the problem of structure mapping that arises in XML data exchange or XML document transformation. Our approach relies on XML annotation with semantic labels that describe local tree edit operations. We propose XML Conditional Random Fields (XCRFs), a framework for building conditional models for labeling XML documents. We equip XCRFs with efficient algorithms for inference and parameter estimation. We provide theoretical arguments and practical experiments that illustrate their expressivity and efficiency. Experiments on the Structure Mapping movie datasets of the INEX XML Document Mining Challenge yield very good results.
