
Marc Tommasi

French Institute for Research in Computer Science and Automation

Publication

Featured research published by Marc Tommasi.

machine learning and data mining in pattern recognition | 2003

Learning multi-label alternating decision trees from texts and data

Francesco De Comité; Rémi Gilleron; Marc Tommasi

Multi-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer's AdaBoost.MH and produces sets of rules that can be viewed as trees, like Alternating Decision Trees (invented by Freund and Mason). Experiments show that combining boosting techniques with tree representations of large sets of rules yields both good performance and readability. Moreover, a key feature of our algorithm is its ability to handle heterogeneous input data: discrete and continuous values as well as text data.

rewriting techniques and applications | 2004

Querying Unranked Trees with Stepwise Tree Automata

Julien Carme; Joachim Niehren; Marc Tommasi

The problem of selecting nodes in unranked trees is the most basic querying problem for XML. We propose stepwise tree automata for querying unranked trees. Stepwise tree automata can express the same monadic queries as monadic Datalog and monadic second-order logic. We prove this result by reduction to the ranked case, via a new systematic correspondence that relates unranked and ranked queries.

foundations of computer science | 1993

Solving systems of set constraints with negated subset relationships

Rémi Gilleron; Sophie Tison; Marc Tommasi

We present a decision procedure, based on tree automata techniques, for satisfiability of systems of set constraints including negated subset relationships. This result extends all previous work on set constraint solving and solves a problem which was left open by L. Bachmair et al. (1993). We prove in a constructive way that a nonempty set of solutions always contains a regular solution, that is, a tuple of regular tree languages. Moreover, we think that the new class of tree automata described here could be of interest in its own right.

International Workshop of the Initiative for the Evaluation of XML Retrieval | 2006

XML Document Transformation with Conditional Random Fields

Rémi Gilleron; Florent Jousse; Isabelle Tellier; Marc Tommasi

We address the problem of structure mapping that arises in XML data exchange or XML document transformation. Our approach relies on XML annotation with semantic labels that describe local tree edit operations. We propose XML Conditional Random Fields (XCRFs), a framework for building conditional models for labeling XML documents. We equip XCRFs with efficient algorithms for inference and parameter estimation. We provide theoretical arguments and practical experiments that illustrate their expressivity and efficiency. Experiments on the Structure Mapping movie datasets of the INEX XML Document Mining Challenge yield very good results.

web intelligence | 2006

Interactive Tuples Extraction from Semi-Structured Data

Rémi Gilleron; Patrick Marty; Marc Tommasi; Fabien Torre

This paper studies, from a machine learning viewpoint, the problem of extracting tuples of a target n-ary relation from tree-structured data such as XML or XHTML documents. Our system can extract, without any post-processing, tuples for all data structures including nested, rotated and cross tables. The wrapper induction algorithm we propose is based on two main ideas. It is incremental: partial tuples are extracted by increasing length. It is based on a representation-enrichment procedure: partial tuples of length i are encoded with the knowledge of extracted tuples of length i-1. The algorithm is then embedded in a user-friendly interactive wrapper induction system for Web documents. We evaluate our system on several information extraction tasks over corporate Web sites. It achieves state-of-the-art results on simple data structures and succeeds on complex data structures where previous approaches fail. Experiments also show that our interactive framework significantly reduces the number of user interactions needed to build a wrapper.

european conference on machine learning | 2014

Fast Gaussian pairwise constrained spectral clustering

David Chatel; Pascal Denis; Marc Tommasi

We consider the problem of spectral clustering with partial supervision in the form of must-link and cannot-link constraints. Such pairwise constraints are common in problems like coreference resolution in natural language processing. The approach developed in this paper is to learn a new representation space for the data together with a distance in this new space. The representation space is obtained through a constraint-driven linear transformation of a spectral embedding of the data. Constraints are expressed with a Gaussian function that locally reweights the similarities in the projected space. A global, non-convex optimization objective is then derived and the model is learned via gradient descent techniques. Our algorithm is evaluated on standard datasets and compared with state-of-the-art algorithms such as [14,18,31]. Results on these datasets, as well as on the CoNLL-2012 coreference resolution shared task dataset, show that our algorithm significantly outperforms related approaches and is also much more scalable.

developments in language theory | 2003

Residual finite tree automata

Julien Carme; Rémi Gilleron; Aurélien Lemay; Alain Terlutte; Marc Tommasi

Tree-automata-based algorithms are essential in many fields of computer science, such as verification, specification, and program analysis. They have also become essential for databases with the development of standards such as XML. In this paper, we define new classes of non-deterministic tree automata, namely residual finite tree automata (RFTA). In the bottom-up case, we obtain a new characterization of regular tree languages. In the top-down case, we obtain a subclass of regular tree languages which contains the class of languages recognized by deterministic top-down tree automata. RFTA also guarantee the existence of canonical non-deterministic tree automata.

international colloquium on grammatical inference | 2002

A Tool for Language Learning Based on Categorial Grammars and Semantic Information

Daniela Dudau Sofronie; Isabelle Tellier; Marc Tommasi

Natural language learning remains an open problem, although many approaches have emerged from current research. We take up this challenge and present here a prototype of a tool. We first clarify that we focus on the syntactic level. We intend to find a grammar (or set of grammars) that recognizes new correct sentences (in the sense of correct word order), starting from some initial correct examples and using a strategy that deduces, at each step, the grammar(s) consistent with the examples. In this model, grammars are the support of languages, so the learning process is a process of grammatical inference. In NLP approaches, natural language is usually represented by lexicalized grammars, because the power of the language lies in the information provided by the words and their combination schemas. This is why we adopt the formal model of a categorial grammar, which assigns a category to every word and provides general combination schemas for categories. In our model, however, strings of words are not sufficient for inference, so additional information is needed. In Kanazawa's work [3], the additional information is the internal structure of each sentence, given as a Structural Example. We try instead to provide more lexicalized information of a semantic nature: the semantic type of words. Its provenance, as well as its psycho-linguistic motivation, can be found in [1] and [2].

european conference on machine learning | 2014

Hypernode graphs for spectral learning on binary relations over sets

Thomas Ricatte; Rémi Gilleron; Marc Tommasi

We introduce hypernode graphs as weighted binary relations between sets of nodes: a hypernode is a set of nodes, a hyperedge is a pair of hypernodes, and each node in a hypernode of a hyperedge is given a non-negative weight that represents the node's contribution to the relation. Hypernode graphs model binary relations between sets of individuals while allowing reasoning at the level of individuals. We present a spectral theory for hypernode graphs that allows us to introduce an unnormalized Laplacian and a smoothness semi-norm. In this framework, we are able to extend spectral graph learning algorithms to the case of hypernode graphs. We show that hypernode graphs are a proper extension of graphs from both the expressive power point of view and the spectral analysis point of view. Hypernode graphs therefore make it possible to model higher-order relations, which is not the case for hypergraphs, as shown in [1]. To demonstrate the potential of the model, we represent multiplayer games with hypernode graphs and introduce a novel method to infer skill ratings from game outcomes. We show that spectral learning algorithms over hypernode graphs obtain results competitive with specialized skill rating algorithms such as Elo duelling and TrueSkill.

international conference on machine learning and applications | 2013

Learning from Multiple Graphs Using a Sigmoid Kernel

Thomas Ricatte; Gemma C. Garriga; Rémi Gilleron; Marc Tommasi

This paper studies the problem of learning from a set of input graphs, each of them representing a different relation over the same set of nodes. Our goal is to merge those input graphs by embedding them into a Euclidean space related to the commute time distance in the original graphs. This is done with the help of a small number of labeled nodes. Our algorithm outputs a combined kernel that can be used for different graph learning tasks. We consider two combination methods: the (classical) linear combination and the sigmoid combination. We compare the combination methods on node classification tasks using different semi-supervised graph learning algorithms. We note that the sigmoid combination method exhibits very positive results.
